Apps.plugin high CPU usage

Environment

OS: Debian Linux, Bullseye (testing)
Netdata version: v1.31.0-56-nightly
CPU: VPS, with 3 cores of an E5-2680 v2 (2.8 GHz)

Problem/Question

Netdata’s apps.plugin, as well as the netdata executable itself, are using a lot of CPU on one of my machines: on average around 8-9% for apps.plugin and 4-5% for Netdata itself.

This seems excessive, given the very low CPU usage I’ve seen in the past. This is problematic because this VPS has “fair use” CPU and only has 33% CPU dedicated (that is, I can use one core constantly, and have occasional spikes of usage higher than that), and Netdata is taking quite a big chunk of that.

On another one of my systems, apps.plugin only takes 0.7% CPU and Netdata itself only takes 1%. The configuration is very similar on that machine.

How should I debug this?

What I expected to happen

CPU usage should not be this high.

I changed the update every setting for both the apps and ebpf plugins from its default of 1 to 10, which reduced the CPU usage of those plugins (they still use a bit of CPU, just less frequently). However, the CPU usage of netdata itself is still quite high.

Please restart Netdata, wait 2-3 minutes, and make a snapshot. Send it to vlad@netdata.cloud.

Related issue: "apps.plugin consuming high cpu on medium load server" (issue #11164 in netdata/netdata on GitHub)

Thanks Vladimir! I just sent you an email :slight_smile:

I looked at the snapshot you sent. There are ~1500 active processes on your machine, and new processes are created constantly, so the increased load is expected, as mentioned in #11164.

I ran 30 docker containers on my VM and increased the number of processes using stress --vm 1000 --vm-bytes 1M. In this scenario, apps.plugin consumes pretty much the same amount of CPU time as in your case.

I couldn’t reproduce your results for the netdata process itself, but I suspect there is some dynamic load on your machine that causes slightly increased CPU utilization.

The main factor that affects apps.plugin CPU usage is the number of active processes. For instance, if I start 20k processes (sleep), apps.plugin uses about 60% of a single core.
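
For anyone who wants to try a similar experiment, here is a minimal sketch of how a batch of idle sleep processes could be started and cleaned up again. The helper names (spawn_sleepers, stop_sleepers) and the counts are made up for illustration; this is not the script used for the numbers above, and it assumes a `sleep` binary on the PATH.

```python
import subprocess


def spawn_sleepers(count, seconds=60):
    """Start `count` idle `sleep` processes and return their Popen handles."""
    return [subprocess.Popen(["sleep", str(seconds)]) for _ in range(count)]


def stop_sleepers(procs):
    """Terminate the processes started by spawn_sleepers and reap them."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        proc.wait()  # reap, so no zombies are left behind


# Example use (hypothetical count; raise it gradually and mind ulimit -u):
#   sleepers = spawn_sleepers(1000, seconds=300)
#   ... watch the apps.plugin CPU usage in the dashboard ...
#   stop_sleepers(sleepers)
```

Starting with a small count and increasing it step by step makes it easier to see how the plugin's CPU usage scales with the number of processes.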

@Daniel the apps.plugin CPU usage you get is expected with that workload (number of processes, process creation rate, etc.). Increasing update every for apps.plugin was the right call.

We will think about possible optimizations for apps.plugin.


On another one of my systems, apps.plugin only takes 0.7% CPU and Netdata itself only takes 1%.

Compare the workload and the number of collected metrics, charts, and alarms on both systems.


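apps.plugin spends most of its time parsing these files for every process (the examples below are in the format of /proc/<pid>/status and /proc/<pid>/stat):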

Name:   systemd
Umask:  0000
State:  S (sleeping)
Tgid:   1
Ngid:   0
Pid:    1
PPid:   0
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 128
Groups:
NStgid: 1
NSpid:  1
NSpgid: 1
NSsid:  1
VmPeak:   231872 kB
VmSize:   166476 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     11284 kB
VmRSS:     11284 kB
RssAnon:            3292 kB
RssFile:            7992 kB
RssShmem:              0 kB
VmData:    19532 kB
VmStk:      1036 kB
VmExe:       816 kB
VmLib:      8548 kB
VmPTE:        92 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
THP_enabled:    1
Threads:        1
SigQ:   0/63964
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 7be3c0fe28014a03
SigIgn: 0000000000001000
SigCgt: 00000001800004ec
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Seccomp_filters:        0
Speculation_Store_Bypass:       thread vulnerable
SpeculationIndirectBranch:      conditional enabled
Cpus_allowed:   ff
Cpus_allowed_list:      0-7
Mems_allowed:   00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        131607
nonvoluntary_ctxt_switches:     958

and

1 (systemd) S 0 1 1 0 -1 4194560 29436 69844781 149 11133 525 326 419586 98402 20 0 1 0 7 170471424 2821 18446744073709551615 1 1 0 0 0 0 671173123 4096 1260 0 0 0 17 4 0 0 6 0 0 0 0 0 0 0 0 0 0
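
To give an idea of the per-process work involved, here is a rough Python sketch of parsing both formats. It is only an illustration: apps.plugin itself is written in C and its parser is not shown in this thread. The function names and the selection of fields are my own; the field positions follow the proc(5) man page.

```python
def parse_stat_line(line):
    """Parse one /proc/<pid>/stat line into a dict of selected fields.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split around the LAST ')' rather than on
    whitespace alone.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    rest = line[close_paren + 1:].split()  # rest[0] is field 3 (state)
    return {
        "pid": pid,
        "comm": comm,
        "state": rest[0],
        "ppid": int(rest[1]),          # field 4
        "utime_ticks": int(rest[11]),  # field 14
        "stime_ticks": int(rest[12]),  # field 15
        "num_threads": int(rest[17]),  # field 20
        "vsize_bytes": int(rest[20]),  # field 23
        "rss_pages": int(rest[21]),    # field 24
    }


def parse_status_text(text):
    """Parse /proc/<pid>/status content ("Key:<tab>value" lines) into a dict."""
    fields = {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if sep:  # skip anything that is not a "Key: value" line
            fields[key.strip()] = value.strip()
    return fields
```

Applied to the stat line above, this gives utime 525 and stime 326 clock ticks, a vsize of 170471424 bytes (166476 kB, the same as VmSize in the status output) and an rss of 2821 pages (11284 kB with 4 kB pages, the same as VmRSS). Doing this for every process on every update is why the cost grows with the process count.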

Interesting! Thanks for looking into it :slight_smile:

Interestingly, htop only reports 375 “tasks” (which I assume means processes), whereas top reports 468 :thinking:

ps aux | wc -l agrees with top, so I’m not sure why htop’s count is lower.

In any case, the count seems way lower than 1500?

We get the total number of processes from /proc/loadavg:

[ilyam@pc ~]$ cat /proc/loadavg
0.17 0.31 0.19 1/404 4320

The first three columns measure CPU and IO utilization of the last one, five, and 15 minute periods. The fourth column shows the number of currently running processes and the total number of processes. The last column displays the last process ID used. (source)
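
As a small illustration of how that line splits up, here is a Python sketch (the LoadAvg type and parse_loadavg name are just for this example, not Netdata code). Note that proc(5) describes the fourth column as counting kernel scheduling entities, i.e. processes and threads.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    load1: float
    load5: float
    load15: float
    running: int   # currently runnable scheduling entities
    total: int     # all existing scheduling entities (processes and threads)
    last_pid: int  # most recently assigned PID


def parse_loadavg(text):
    """Parse the single line of /proc/loadavg into a LoadAvg tuple."""
    one, five, fifteen, entities, last_pid = text.split()
    running, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(running), int(total), int(last_pid))
```

For the output above, parse_loadavg("0.17 0.31 0.19 1/404 4320") yields running=1, total=404 and last_pid=4320.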

The chart is System Overview->processes->system.active_processes.


I googled a bit and it seems the value (loadavg 4th column) is processes + kernel threads + user threads.

@Daniel to show them in htop:

  • H: hide/show user process threads.
  • K: hide/show kernel threads.

And I see top shows processes + kernel threads as Tasks (doesn’t count user threads).

:thinking:

Ok, it seems the source of my confusion was my poor understanding of threads vs processes on Linux. Here is a very good topic that sheds light on the subject. (In short: to the Linux kernel, threads and processes are both tasks.)
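
To see the difference between the two ways of counting on a given machine, something like the following Python sketch could be used. It is an assumption-laden illustration (the function name is made up, and it is not how htop, top or Netdata count): numeric directories under /proc are one per process (kernel threads included), while the entries under each /proc/<pid>/task are one per thread.

```python
import os


def count_processes_and_tasks(proc_root="/proc"):
    """Return (processes, tasks) found under proc_root.

    processes: numeric directories directly under proc_root.
    tasks: entries under each <pid>/task directory, i.e. every thread.
    """
    processes = 0
    tasks = 0
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as "self" or "loadavg"
        processes += 1
        try:
            tasks += len(os.listdir(os.path.join(proc_root, name, "task")))
        except OSError:
            # The task directory is missing or unreadable (for example the
            # process exited in the meantime); count it as a single task.
            tasks += 1
    return processes, tasks
```

On a live system the second number should be in the same ballpark as the total in the fourth column of /proc/loadavg, and the first closer to what ps aux | wc -l prints, though both change from moment to moment.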

what is that shading magic! my mind is blown!

Wow, I didn’t know about this! Thanks for the link.

I usually turn off display of threads (both kernel threads and user threads) in htop because it bloats the display a bit, as some apps have a lot of threads. Good to know.

@Daniel, could you profile the apps plugin in your environment?

If you don’t mind doing that, please

  1. Run sudo valgrind --tool=callgrind ./apps.plugin > /dev/null in a folder where you’ve built Netdata.
  2. Wait for a while.
  3. Open callgrind.out.<PID> using KCachegrind (or QCacheGrind on Windows).
  4. Share the resulting map with us :slight_smile: