Collector causing high dbengine memory usage?

wca · April 30, 2021, 10:43pm

Problem/Question

Recently I noticed netdata was using over 20 GB of memory, and on inspection I saw that a significant share of it was allocated to the dbengine.

The increase was gradual and took about 3 months to raise alarms. I restarted netdata on the 29th, but memory usage is climbing steadily again, by around 1 GB every 24 hours.
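To put a number on the growth, something like the following could be run against periodic memory samples. This is only a minimal sketch: `growth_mib_per_day` and `read_rss_mib` are hypothetical helper names, not part of netdata, and it assumes a Linux host where `/proc/<pid>/status` exposes `VmRSS`.

```python
from typing import Sequence, Tuple


def growth_mib_per_day(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (unix_seconds, rss_mib) samples, in MiB per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must differ")
    return num / den * 86400  # seconds per day


def read_rss_mib(pid: int) -> float:
    """Read the resident set size of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # the kernel reports kB
    raise RuntimeError("VmRSS not found")
```

Sampling `read_rss_mib` for the netdata PID every few minutes and feeding the pairs to `growth_mib_per_day` on both locations would show whether the healthy site is flat or just growing more slowly.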

Some screenshots from grafana:

The odd thing is that we have this exact deployment in another location with the same uptime, and there netdata only uses about 300 MB. Both locations use an unmodified varnish.chart.py with identical netdata versions and identical configurations.

The only difference I can see is that the bad location has about double the output from varnishstat (2200 lines vs 1000 lines).
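To see where the extra lines come from, the `varnishstat -1` output could be grouped by its leading section name. The sketch below assumes the usual one-counter-per-line format whose first field is a dotted name such as `MAIN.uptime`; `count_by_section` is a hypothetical helper and `SAMPLE` is made-up data, not output from either location. Whether varnish.chart.py turns every such line into a dimension is not confirmed here.

```python
from collections import Counter


def count_by_section(varnishstat_output: str) -> Counter:
    """Count counters per top-level section (MAIN, VBE, SMA, ...)."""
    counts = Counter()
    for line in varnishstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The first field is the counter name; the section is the part before the first dot.
        section = fields[0].split(".", 1)[0]
        counts[section] += 1
    return counts


# Made-up sample in the assumed `varnishstat -1` layout.
SAMPLE = """\
MAIN.uptime 1000 1.00 Child process uptime
MAIN.sess_conn 50 0.05 Sessions accepted
VBE.boot.web01.happy 0 . Happy health probes
VBE.boot.web02.happy 0 . Happy health probes
SMA.s0.g_bytes 1024 . Bytes outstanding
"""

if __name__ == "__main__":
    for section, count in count_by_section(SAMPLE).most_common():
        print(section, count)
```

Running this on the real output from both locations (for example, saved with `varnishstat -1 > stats.txt`) and comparing the counts would show whether the difference is concentrated in one section, such as per-backend counters.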

I've checked the docs and used the resource calculator with the following inputs; it estimates 156.25 MiB of total disk space and 188 MiB of system memory. My configuration is well within these boundaries.

days needed for storage: 0
update every: 30
Metrics collected: 20k
streaming nodes: 0
compression ratio: 70
page cache size: 32

Could the extra dimensions coming from the varnish plugin be to blame? If so, shouldn’t netdata be limiting these resources?

Any help in troubleshooting this would be greatly appreciated. Thanks

Additional steps

I've since installed netdata v1.30.1 on one of the affected nodes and purged any legacy settings, but it still seems to have the same issue.

Configuration

[global]
	hostname = varnish01.example.com
	run as user = netdata
	history = 300
	process scheduling policy = idle
	OOM score = 1000
	update every = 30
	memory mode = dbengine
	page cache size = 32
	dbengine disk space = 256
	
[web]
	web files owner = root
	web files group = netdata
	bind to = localhost
	mode = none


[plugins]
	proc = yes
	diskspace = yes
	cgroups = no
	tc = no
	idlejitter = yes
	enable running new plugins = no
	check for new plugins every = 60
	slabinfo = no
	apps = yes
	charts.d = no
	ebpf = no
	fping = no
	go.d = no
	ioping = no
	node.d = no
	perf = no
	python.d = yes

[health]
	enabled = no

[registry]
	enabled = no

[backend]
	enabled = yes
	data source = average
	type = graphite
	destination = graphite.example.com:2003
	prefix = netdata
	hostname = varnish01_example_com
	update every = 60
	buffer on failures = 10
	timeout ms = 20000
	send charts matching = *

[statsd]
	enabled = no

[plugin:proc]
	netdata server resources = yes
	/proc/pagetypeinfo = no
	/proc/stat = yes
	/proc/uptime = yes
	/proc/loadavg = yes
	/proc/sys/kernel/random/entropy_avail = yes
	/proc/pressure = yes
	/proc/interrupts = yes
	/proc/softirqs = yes
	/proc/vmstat = yes
	/proc/meminfo = yes
	/sys/kernel/mm/ksm = yes
	/sys/block/zram = yes
	/sys/devices/system/edac/mc = yes
	/sys/devices/system/node = yes
	/proc/net/dev = yes
	/proc/net/sockstat = yes
	/proc/net/sockstat6 = yes
	/proc/net/netstat = yes
	/proc/net/snmp = no
	/proc/net/snmp6 = no
	/proc/net/sctp/snmp = no
	/proc/net/softnet_stat = yes
	/proc/net/ip_vs/stats = yes
	/sys/class/infiniband = no
	/proc/net/stat/conntrack = no
	/proc/net/stat/synproxy = no
	/proc/diskstats = yes
	/proc/mdstat = yes
	/proc/net/rpc/nfsd = no
	/proc/net/rpc/nfs = no
	/proc/spl/kstat/zfs/arcstats = no
	/sys/fs/btrfs = no
	ipc = yes
	/sys/class/power_supply = no

[plugin:proc:diskspace]
	update every = 30
	check for new mount points every = 60

[plugin:apps]
	update every = 30

[plugin:python.d]
	update every = 30

[netdata.statsd_metrics]
	enabled = no

[netdata.statsd_useful_metrics]
	enabled = no

[netdata.statsd_events]
	enabled = no

[netdata.statsd_reads]
	enabled = no

[netdata.statsd_bytes]
	enabled = no

[netdata.statsd_packets]
	enabled = no

[netdata.tcp_connects]
	enabled = no

[netdata.tcp_connected]
	enabled = no

[netdata.private_charts]
	enabled = no

[netdata.plugin_statsd_charting_cpu]
	enabled = no

[netdata.plugin_statsd_collector1_cpu]
	enabled = no

Environment

centos 7.8
netdata 1.26.0

OdysLam · May 6, 2021, 12:47pm

Hey @wca,

Thanks for sharing this very detailed report. Pinging @Stelios_Fragkakis and @ilyam8 who will be able to assist further.

In general, we are working towards further optimizing Netdata, so you can expect improvements in the following months.
