Good news! I think ebpf is the plugin causing the memory issues.
I went through the docs and disabled each of these plugins one at a time:
[plugins]
#proc = no
#diskspace = no
#cgroups = no
#tc = no
#idlejitter = no
#fping = no
#ioping = no
#node.d = no
#python.d = no
#go.d = no
#apps = no
#perf = no
#charts.d = no
ebpf = no
Each time the memory would jump back up, but I noticed that the memory usage was flapping between too high and normal, shown below. I saw in htop that it was the ebpf plugin that was starting and stopping at the same time the memory usage was jumping. After disabling that, it has stayed at a normal level.
ilyam8
April 9, 2021, 7:53am
23
Yeah, ebpf was my suspect after checking your snapshot. I noticed that from time to time its CPU usage spikes to 95% (Applications). At the same time I saw a lot of do_fork calls (other dimensions) on the process (eBPF) chart.
ilyam8
April 9, 2021, 12:09pm
24
We did some tests and discovered several problems with ebpf.plugin.
It contributes a lot to kernel memory usage, indeed. That is why we see nothing in the ps output: it is not memory used by the processes.
The issue in the netdata repo: high kernel memory usage when using ebpf plugin · Issue #10949 · netdata/netdata · GitHub
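To look at the kernel-side memory that ps does not attribute to any process, one option is to read /proc/meminfo directly. The following is a minimal Python sketch, not part of Netdata: the names kernel_memory_kib and read_kernel_memory and the choice of fields are assumptions made for illustration, and a field such as Percpu only shows up on kernels that export it.

```python
# Hypothetical helper (not Netdata code): pick kernel-side counters out of
# /proc/meminfo text. The field selection is an assumption for illustration.
KERNEL_FIELDS = ("Slab", "KernelStack", "PageTables", "VmallocUsed", "Percpu")


def kernel_memory_kib(meminfo_text):
    """Return {field: kB value} for the kernel-side fields found in the text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or name not in KERNEL_FIELDS:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name] = int(parts[0])  # /proc/meminfo reports these in kB
    return values


def read_kernel_memory(path="/proc/meminfo"):
    """Read the live file (Linux only) and return the same mapping."""
    with open(path) as f:
        return kernel_memory_kib(f.read())
```

Calling read_kernel_memory() once with ebpf enabled and once with it disabled may help show whether these counters move with the plugin.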
@haydenseitz ebpf.plugin crashes on your server; those are the periods when memory usage drops.
Do you have the cachestat collector enabled in ebpf.d.conf?
We could make ebpf.plugin crash under the following conditions:
cachestat collector enabled (ebpf.d.conf)
spawn processes every second (while true; do sleep 1; for i in $(seq 1 200); do sleep 1 & done; done)
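The process-spawning half of that reproduction can also be sketched in Python, which makes it easy to bound the number of rounds instead of looping forever. The function names spawn_batch and spawn_rounds are made up for this example; the batch size of 200 and the one-second pacing mirror the shell loop, and the default command assumes a sleep binary on PATH.

```python
import subprocess
import time


def spawn_batch(count, argv):
    """Start `count` child processes running `argv` and return the handles."""
    return [subprocess.Popen(argv) for _ in range(count)]


def spawn_rounds(rounds, count=200, argv=("sleep", "1"), pause=1.0):
    """Each round: wait `pause` seconds, then start `count` short-lived processes."""
    started = 0
    pending = []
    for _ in range(rounds):
        time.sleep(pause)
        # Drop handles of children that already exited (poll() reaps them).
        pending = [p for p in pending if p.poll() is None]
        pending.extend(spawn_batch(count, list(argv)))
        started += count
    for proc in pending:
        proc.wait()  # reap the remaining children before returning
    return started
```

For example, spawn_rounds(60) would keep up the load for roughly a minute, which is an arbitrary duration chosen here, not one taken from the tests described above.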
1 Like
Thanks a lot @haydenseitz for your effort in identifying this. Without your analysis, it would have taken us considerably more time to root-cause this.
Very nice job!
2 Likes
I’ve not changed any settings related to ebpf (aside from disabling it on most systems), so I would assume cachestat was at its default state.
ilyam8
April 15, 2021, 6:55pm
27
We added a Percpu dimension to the mem.kernel chart (Memory->kernel->Memory Used by Kernel). ebpf.plugin needs this memory to create hashtables [PR ]
Memory allocated to the percpu allocator used to back percpu allocations. This stat excludes the cost of metadata.
We also significantly decreased the amount of Percpu memory used by default (103=>29.8 MB on my server) [PR ]
I consider this case of Netdata consuming high RAM fixed.
Keep in mind that those changes are not in the latest stable (v1.30.1), but in the master branch.
ilyam8
April 15, 2021, 6:56pm
28
@Lcarrillo @haydenseitz thanks for highlighting the problem and helping us to resolve it.
@Thiago_Marques_0 thanks for the fix
2 Likes
I think it’s still leaking memory; look at this picture of the output from the ps_mem command.
ilyam8
January 11, 2022, 11:51am
30
@Ahmed_Kamel, hey. If you can give us more details (Netdata Agent version, your setup (parent/child/standalone), install method, etc.), it’d be helpful. If you think there is a bug, consider creating a bug report in the netdata/netdata repo.
I believe the same thing is happening to me on two different systems. Netdata memory usage seems rather high:
---------------------------------------------------------------------------------------------
Install method:
wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh
---------------------------------------------------------------------------------------------
[plugins]
# PATH environment variable = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
# PYTHONPATH environment variable =
# checks = no
# idlejitter = yes
# timex = yes
# tc = yes
# diskspace = yes
# proc = yes
# cgroups = yes
# enable running new plugins = yes
# check for new plugins every = 60
# slabinfo = no
# python.d = yes
# node.d = yes
# nfacct = yes
# perf = yes
# charts.d = yes
# fping = yes
# go.d = yes
# ioping = yes
# apps = yes
# ebpf = yes
---------------------------------------------------------------------------------------------
Ubuntu 20.04.3 LTS \n \l
---------------------------------------------------------------------------------------------
root@foo:~# netdata -v
netdata v1.33.0-21-g4ec90eea7
---------------------------------------------------------------------------------------------
root@foo:~# ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -30
11.3 0.5 2858976 251405 /usr/sbin/mysqld
1.6 1.9 341332 437313 /usr/sbin/netdata -D
1.5 0.0 1457644 1068 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
1.3 0.0 1242580 715 /usr/lib/snapd/snapd