Netdata consuming a high amount of RAM

Good news! I think ebpf is the plugin causing the memory issues.

I went through the docs and disabled each of these plugins one at a time:

[plugins]
#proc = no
#diskspace = no
#cgroups = no
#tc = no
#idlejitter = no
#fping = no
#ioping = no
#node.d = no
#python.d = no
#go.d = no
#apps = no
#perf = no
#charts.d = no
ebpf = no

Each time, the memory would jump back up, but I noticed that memory usage was flapping between too high and normal, shown below. I saw in htop that the ebpf plugin was starting and stopping at the same moments the memory usage jumped. After disabling it, memory usage has stayed at a normal level.


Yeah, ebpf was my suspect after checking your snapshot. I noticed that from time to time its CPU usage spikes to 95% (Applications). At the same time I saw a lot of do_fork calls (other dimensions) on the process (eBPF) chart.

We did some tests and discovered several problems with ebpf.plugin.

It does contribute a lot to kernel memory usage. That is why we see nothing in the ps output: it is not memory used by the processes.
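As a rough cross-check of that point, one could total the resident memory that ps reports for all processes and compare it with overall usage from free or /proc/meminfo. The helper below is only a sketch; the name sum_rss_kb is made up for illustration. RSS counts shared pages once per process, so the total overstates process memory. A large gap in the other direction (system usage far above the RSS total) hints at kernel-side memory.

```shell
# sum_rss_kb (hypothetical helper): read one RSS value in kB per line
# on stdin and print the total. Prints 0 for empty input.
sum_rss_kb() {
    awk '{ total += $1 } END { printf "%d\n", total }'
}

# Example usage (procps ps; "rss=" suppresses the header line):
#   ps -eo rss= | sum_rss_kb
```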

The issue in the netdata repo: high kernel memory usage when using ebpf plugin · Issue #10949 · netdata/netdata · GitHub

@haydenseitz ebpf.plugin crashes on your server, those are the periods when memory usage drops.

Do you have cachestat collector enabled in the ebpf.d.conf?

We were able to make ebpf.plugin crash under the following conditions:

  • cachestat collector enabled (ebpf.d.conf)
  • spawn processes every second (while true; do sleep 1; for i in $(seq 1 200); do sleep 1 & done; done)
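For anyone who wants to try the second condition without leaving an endless loop running, here is a bounded sketch of the same idea. The function name spawn_burst and its two arguments are assumptions for illustration, not something from the Netdata tooling. Unlike the one-liner above, it waits for each batch to exit instead of sleeping between batches.

```shell
# spawn_burst ROUNDS PER_ROUND (hypothetical helper):
# each round starts PER_ROUND short-lived background processes,
# waits for them to exit, then moves on. Prints how many were started.
spawn_burst() {
    rounds=${1:-5}
    per_round=${2:-200}
    started=0
    r=0
    while [ "$r" -lt "$rounds" ]; do
        i=0
        while [ "$i" -lt "$per_round" ]; do
            sleep 1 &                 # same short-lived child as the one-liner
            i=$((i + 1))
            started=$((started + 1))
        done
        wait                          # let this batch finish before the next
        r=$((r + 1))
    done
    echo "$started"
}

# Example usage: 30 rounds of 200 processes, roughly 30 seconds
#   spawn_burst 30 200
```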

Thanks a lot @haydenseitz for your effort in identifying this. Without your analysis, it would have taken us considerably more time to root-cause this.

Very nice job! :heart:


I’ve not changed any settings related to ebpf (aside from disabling it on most systems), so I would assume cachestat was in its default state.

  • we added a Percpu dimension to the mem.kernel chart (Memory->kernel->Memory Used by Kernel). ebpf.plugin needs this memory to create hashtables [PR]

Memory allocated to the percpu allocator used to back percpu allocations. This stat excludes the cost of metadata

  • and we significantly decreased the amount of used Percpu memory by default (103=>29.8 MB on my server) [PR]
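On an agent version without the new dimension, the same figure can be read straight from /proc/meminfo on kernels that expose a Percpu line. This is a minimal sketch; the helper name meminfo_kb is made up here, and the optional second argument exists only so the function can be pointed at a saved copy of the file.

```shell
# meminfo_kb FIELD [FILE] (hypothetical helper): print the value in kB of one
# field from a /proc/meminfo-style file. Returns non-zero if the field is
# missing, for example on kernels that do not report Percpu.
meminfo_kb() {
    field=$1
    file=${2:-/proc/meminfo}
    awk -v key="$field:" '$1 == key { print $2; found = 1 } END { exit !found }' "$file"
}

# Example usage:
#   meminfo_kb Percpu
```

Running it before and after restarting Netdata with ebpf enabled should show whether the plugin is what moves the number.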

I consider this case of Netdata consuming high RAM fixed.

:exclamation: Keep in mind that those changes are not in the latest stable (v1.30.1), but in the master branch.


@Lcarrillo @haydenseitz thanks for highlighting the problem and helping us to resolve it.

@Thiago_Marques_0 thanks for the fix :wink:


I think it’s still leaking memory; look at this picture of the output from the ps_mem command.

@Ahmed_Kamel, hey. If you can give us more details (Netdata Agent version, your setup (parent/child/standalone), install method, etc.), it’d be helpful. If you think there is a bug, consider creating a bug report in the netdata/netdata repo.

I believe the same thing is happening to me on two different systems. Netdata memory usage seems rather high:
---------------------------------------------------------------------------------------------
Install method:
wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh
---------------------------------------------------------------------------------------------
[plugins]
	# PATH environment variable = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
	# PYTHONPATH environment variable = 
	# checks = no
	# idlejitter = yes
	# timex = yes
	# tc = yes
	# diskspace = yes
	# proc = yes
	# cgroups = yes
	# enable running new plugins = yes
	# check for new plugins every = 60
	# slabinfo = no
	# python.d = yes
	# node.d = yes
	# nfacct = yes
	# perf = yes
	# charts.d = yes
	# fping = yes
	# go.d = yes
	# ioping = yes
	# apps = yes
	# ebpf = yes
---------------------------------------------------------------------------------------------
Ubuntu 20.04.3 LTS \n \l
---------------------------------------------------------------------------------------------
root@foo:~# netdata -v
netdata v1.33.0-21-g4ec90eea7
---------------------------------------------------------------------------------------------
root@foo:~# ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -30
11.3  0.5 2858976 251405 /usr/sbin/mysqld
 1.6  1.9 341332  437313 /usr/sbin/netdata -D
 1.5  0.0 1457644   1068 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
 1.3  0.0 1242580    715 /usr/lib/snapd/snapd