High CPU use of Netdata

Problem/Question

I am running 2 Docker swarms that are both monitored by Netdata. I decided to edit the config files in order to reduce the memory footprint. It worked fine on all worker nodes, but the two Docker manager nodes showed close to 100% CPU use after the restart. I edited the Docker stacks to reduce the CPU allowance to 30%, and both have been sitting at that 30% CPU limit for more than a day already. I tried to delete the Netdata config file to see if it was a config issue, but the problem remains. Those 2 nodes are manager nodes in the Docker swarm, but as far as Netdata is concerned they are all independent nodes. Here is a screenshot:

Environment/Browser/Agent’s version etc

Netdata agent: 1.44.1
running in a Docker container

What I expected to happen

For reference, here is an htop output:

Hey, @Martin_Neumann. The htop output is not useful without thread names. This is likely a duplicate of netdata eating a lot of CPU

Alright, what do I need to do to show that?

Check this topic netdata eating a lot of CPU - #2 by ilyam8

Yes, deleting the content in the cache folder did resolve the problem for now, but I also lost all historical data. If you give me instructions on how to check for the thread name, I could check on the other instance whether the cause was the same.

@Martin_Neumann there is a link in that thread, click on “using htop”
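
For reference, one way to see per-thread CPU usage together with thread names is from the host, since the container’s processes are visible there. This is only a sketch, assuming GNU ps and that the Netdata daemon is the oldest process named netdata on the host; in htop the equivalent is enabling “Show custom thread names” under F2 → Display options.

```bash
# List the netdata threads sorted by CPU usage; the COMMAND column shows
# the thread name (e.g. ACLKSYNC), which a plain htop view hides.
ps -T -p "$(pgrep -o netdata)" -o spid,comm,%cpu --sort=-%cpu | head -n 15
```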

Anyway, I am pretty sure it will be “ACLKSYNC”. That is a bug we know about.

You can follow on GitHub:

  • issue
  • PR - I see it says “part 1”, which means merging it != fixing the problem.

deleting the content in the cache folder did resolve the problem for now

The problem is in netdata-meta.db (SQLite); deleting the cache content resolves it because you delete that file. We will provide instructions on how to resolve the issue without deleting historical data - a workaround until the issue is fixed.
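
Until the official instructions land, a rough sketch of that kind of workaround: stop the agent, remove only the SQLite metadata files, keep the dbengine/ directory (which holds the historical metrics), and start the agent again. The container name netdata and the cache path are placeholders, not the official procedure; adjust them to wherever your stack mounts /var/cache/netdata.

```bash
# Hedged sketch, not the official workaround: remove only the metadata DB
# and its WAL/SHM sidecar files (if present), keep dbengine/ untouched.
docker stop netdata
rm -f /path/to/netdata/cache/netdata-meta.db \
      /path/to/netdata/cache/netdata-meta.db-wal \
      /path/to/netdata/cache/netdata-meta.db-shm
docker start netdata
```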

Alright, I can now confirm that it is ACLKSYNC. Interestingly, the problem appeared only on the two manager nodes of the Docker swarm. All of the worker nodes updated fine without causing the same problem.