Collector causing high dbengine memory usage?

Problem/Question

Recently I noticed Netdata was using over 20 GB of memory, and on inspection I saw that a significant portion of it was allocated to the db engine.

This was a gradual increase that took about 3 months before raising alarms. I have since restarted Netdata on the 29th, but memory usage is still increasing steadily, consuming around 1 GB every 24 hours.

Some screenshots from Grafana:

The odd thing is, we have this exact deployment in another location with the same uptime, and there Netdata only uses about 300 MB. Both locations run an unmodified varnish.chart.py, with identical Netdata versions and identical configurations.

The only difference I can see is that the bad location has about double the output from varnishstat (roughly 2200 lines vs 1000 lines).
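
A quick way to compare the two hosts is to count the counters varnishstat reports; a minimal sketch, assuming varnishstat -1 prints one counter per line:

import subprocess

# Count the counters varnishstat exposes on this host. Each line of
# `varnishstat -1` output is one counter, which roughly tracks how many
# metrics the varnish collector has available to turn into dimensions.
out = subprocess.run(["varnishstat", "-1"], capture_output=True, text=True, check=True)
print(len(out.stdout.splitlines()), "varnishstat counters on this host")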

I've checked the docs and used the resource calculator with the following inputs, and it tells me 156.25 MiB of total disk space and 188 MiB of system memory (a rough reconstruction of that arithmetic is sketched after the list). My configuration is well within these boundaries.

  • days needed for storage: 0
  • update every: 30
  • Metrics collected: 20k
  • streaming nodes: 0
  • compression ratio: 70
  • page cache size: 32

Could the extra dimensions coming from the varnish plugin be to blame? If so, shouldn't Netdata be limiting these resources?
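
One way to check, rather than guessing, is to count how many dimensions each agent is actually collecting and how many of them come from the varnish charts. A minimal sketch using the /api/v1/charts endpoint (this needs the agent's web server reachable, i.e. [web] mode set to something other than none, and assumes the default port 19999; the "varnish" chart prefix is also an assumption about how varnish.chart.py names its charts):

import json
import urllib.request

# Count the charts and dimensions a running agent reports.
with urllib.request.urlopen("http://localhost:19999/api/v1/charts") as resp:
    charts = json.load(resp)["charts"]

total_dims = sum(len(c["dimensions"]) for c in charts.values())
varnish_dims = sum(len(c["dimensions"]) for cid, c in charts.items()
                   if cid.startswith("varnish"))

print(f"{len(charts)} charts, {total_dims} dimensions in total")
print(f"{varnish_dims} dimensions from varnish charts")

Comparing these numbers between the two locations should show whether the extra varnishstat output really translates into extra dimensions.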

Any help in troubleshooting this would be greatly appreciated. Thanks

Additional steps

I've since installed Netdata v1.30.1 on one of the affected nodes and purged any legacy settings; however, it still seems to have the same issue.

Configuration

[global]
	hostname = varnish01.example.com
	run as user = netdata
	history = 300
	process scheduling policy = idle
	OOM score = 1000
	update every = 30
	memory mode = dbengine
	page cache size = 32
	dbengine disk space = 256
	
[web]
	web files owner = root
	web files group = netdata
	bind to = localhost
	mode = none


[plugins]
	proc = yes
	diskspace = yes
	cgroups = no
	tc = no
	idlejitter = yes
	enable running new plugins = no
	check for new plugins every = 60
	slabinfo = no
	apps = yes
	charts.d = no
	ebpf = no
	fping = no
	go.d = no
	ioping = no
	node.d = no
	perf = no
	python.d = yes

[health]
	enabled = no

[registry]
	enabled = no

[backend]
	enabled = yes
	data source = average
	type = graphite
	destination = graphite.example.com:2003
	prefix = netdata
	hostname = varnish01_example_com
	update every = 60
	buffer on failures = 10
	timeout ms = 20000
	send charts matching = *

[statsd]
	enabled = no

[plugin:proc]
	netdata server resources = yes
	/proc/pagetypeinfo = no
	/proc/stat = yes
	/proc/uptime = yes
	/proc/loadavg = yes
	/proc/sys/kernel/random/entropy_avail = yes
	/proc/pressure = yes
	/proc/interrupts = yes
	/proc/softirqs = yes
	/proc/vmstat = yes
	/proc/meminfo = yes
	/sys/kernel/mm/ksm = yes
	/sys/block/zram = yes
	/sys/devices/system/edac/mc = yes
	/sys/devices/system/node = yes
	/proc/net/dev = yes
	/proc/net/sockstat = yes
	/proc/net/sockstat6 = yes
	/proc/net/netstat = yes
	/proc/net/snmp = no
	/proc/net/snmp6 = no
	/proc/net/sctp/snmp = no
	/proc/net/softnet_stat = yes
	/proc/net/ip_vs/stats = yes
	/sys/class/infiniband = no
	/proc/net/stat/conntrack = no
	/proc/net/stat/synproxy = no
	/proc/diskstats = yes
	/proc/mdstat = yes
	/proc/net/rpc/nfsd = no
	/proc/net/rpc/nfs = no
	/proc/spl/kstat/zfs/arcstats = no
	/sys/fs/btrfs = no
	ipc = yes
	/sys/class/power_supply = no

[plugin:proc:diskspace]
	update every = 30
	check for new mount points every = 60

[plugin:apps]
	update every = 30

[plugin:python.d]
	update every = 30

[netdata.statsd_metrics]
	enabled = no

[netdata.statsd_useful_metrics]
	enabled = no

[netdata.statsd_events]
	enabled = no

[netdata.statsd_reads]
	enabled = no

[netdata.statsd_bytes]
	enabled = no

[netdata.statsd_packets]
	enabled = no

[netdata.tcp_connects]
	enabled = no

[netdata.tcp_connected]
	enabled = no

[netdata.private_charts]
	enabled = no

[netdata.plugin_statsd_charting_cpu]
	enabled = no

[netdata.plugin_statsd_collector1_cpu]
	enabled = no

Environment

CentOS 7.8
Netdata 1.26.0

Hey @wca,

Thanks for sharing this very detailed report. Pinging @Stelios_Fragkakis and @ilyam8 who will be able to assist further.

In general, we are working towards further optimizing Netdata, so you can expect improvements in the coming months.