I am running two Docker swarms that are both monitored by Netdata. I edited the config files to reduce the memory footprint. This worked fine on all worker nodes, but the two Docker manager nodes showed close to 100% CPU usage after the restart. I edited the Docker stacks to cap the CPU allowance at 30%, and both nodes have been pinned at that 30% of CPU for more than a day now. I tried deleting the Netdata config file to rule out a configuration issue, but the problem remains. These two nodes are manager nodes in the Docker swarm, but as far as Netdata is concerned they are all independent nodes. Here is a screenshot:
Yes, deleting the content of the cache folder did resolve the problem for now, but I also lost all historical data. If you give me instructions on how to check the thread name, I can check on the other instance whether the cause was the same.
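One way to check the thread name, assuming a Linux host with GNU `ps` (Netdata names its internal threads, so a busy worker shows up in the COMM column). This is a generic sketch, not an official Netdata procedure; the `show_threads` helper is just a convenience wrapper:

```shell
# List the threads of a process, busiest first; the COMM column shows
# the thread name (e.g. a Netdata worker such as ACLKSYNC).
show_threads() {
    ps -L -p "$1" -o tid,comm,pcpu --sort=-pcpu
}

# On a node running Netdata:
#   show_threads "$(pidof netdata)"
# or watch it live with:  top -H -p "$(pidof netdata)"
show_threads $$   # demo on the current shell
```

Whichever thread sits at the top of that list while CPU is pegged is the likely culprit.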
Regarding the PR: I see it says “part 1”, which means merging it != fixing the problem.
> deleting the content in the cache folder did resolve the problem for now
The problem is in netdata-meta.db (SQLite); deleting the cache content resolves it because you delete that file along with everything else. We will provide instructions on how to resolve the issue without losing historical data, as a workaround until the underlying issue is fixed.
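Until the official instructions land, a minimal sketch of the idea: remove only the SQLite metadata database from the cache directory and leave the dbengine files (which hold the historical metrics) in place. The `clear_netdata_meta` helper and the `/var/cache/netdata` path are assumptions here; verify the cache path on your own system, and stop Netdata first so the database is not in use:

```shell
# Hypothetical helper: delete only the SQLite metadata DB (and its
# possible WAL/SHM sidecar files) from a Netdata cache directory,
# leaving the dbengine data, i.e. the historical metrics, untouched.
clear_netdata_meta() {
    cache_dir="$1"
    rm -f "$cache_dir/netdata-meta.db" \
          "$cache_dir/netdata-meta.db-wal" \
          "$cache_dir/netdata-meta.db-shm"
}

# On a real node (default cache path shown; confirm yours first):
#   sudo systemctl stop netdata
#   clear_netdata_meta /var/cache/netdata
#   sudo systemctl start netdata
```

On restart, Netdata should rebuild the metadata database while the retained dbengine files keep the history available.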
Alright, I can now confirm that it is ACLKSYNC. Interestingly, the problem appeared only on the two manager nodes of my Docker swarm; all of the worker nodes updated fine without showing the same problem.