In the Netdata logs I only see cgroup-name.sh once: INFO: cgroup ‘systemd’ is called ‘systemd’, labels ‘’
On another node I see all kinds of logs starting with cgroup-name.sh and then the pod name.
Any ideas why this is skipped on some nodes? Is there some place on the filesystem I can look to see why Netdata skips this step?
Hey, it is probably due to log flood protection. You can disable it in netdata.conf by setting errors flood protection period to
Netdata runs cgroup-name.sh for all cgroups that match the following patterns (can be found in
- enable by default cgroups matching
- run script to rename cgroups matching
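To check by hand whether a given cgroup path would be caught by those two options, here is a minimal Python sketch. It assumes Netdata's "simple pattern" semantics as I understand them (space-separated globs, `*` as the only wildcard, a leading `!` negates, first match wins, whole-string match); the real matcher may differ in edge cases. The function names and the example patterns are hypothetical, not Netdata's code or its defaults.

```python
import re
from typing import Optional


def _glob_to_regex(glob: str) -> "re.Pattern[str]":
    # Treat only '*' as a wildcard; escape everything else so it matches literally.
    return re.compile(re.escape(glob).replace(r"\*", ".*"))


def simple_pattern_match(patterns: str, value: str) -> Optional[bool]:
    """Approximate Netdata-style simple-pattern matching (assumed semantics).

    Patterns are space-separated, '*' is the only wildcard, a leading '!'
    negates, the first matching pattern wins, and a pattern must match the
    whole value. Returns None when no pattern matches.
    """
    for token in patterns.split():
        negative = token.startswith("!")
        glob = token[1:] if negative else token
        if _glob_to_regex(glob).fullmatch(value):
            return not negative
    return None


# Hypothetical patterns for illustration only; copy the real values from your netdata.conf.
example = "!/systemd* /kubepods* *"
print(simple_pattern_match(example, "/kubepods/burstable/pod1234"))  # True
print(simple_pattern_match(example, "/systemd/journal"))             # False
```

Running the paths you see under /sys/fs/cgroup through both option values (one call per option) shows whether a pod cgroup is excluded before cgroup-name.sh is ever invoked.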
I don’t believe it’s a logging problem. It seems that Netdata isn’t getting pod metrics at all; they aren’t appearing in the dashboard.
It’d be nice if there was a place I could look on the node filesystem to see why Netdata isn’t picking up the data. I looked in cgroup-name.sh, but I don’t know the code well enough to figure out why Netdata seems to be skipping this step.
What’s weird is it works fine on other nodes.
I don’t believe it’s a logging problem.
Then just check whether there is a line containing FLOOD in the logs, e.g.
2023-08-09 14:01:43: netdata LOG FLOOD PROTECTION too many logs (201 logs in 13 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process ‘netdata’ for 1187 seconds
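To scan a whole log for such lines (and see how long logging was suppressed), here is a small Python sketch. The regex is built only from the single sample line above, so treat its exact wording as an assumption; other Netdata versions may phrase the message differently. `FLOOD_RE` and `find_flood_events` are hypothetical names.

```python
import re
from typing import Dict, Iterable, List

# Built from the one sample line above; other Netdata versions may word it differently.
FLOOD_RE = re.compile(
    r"LOG FLOOD PROTECTION too many logs "
    r"\((?P<logs>\d+) logs in (?P<seconds>\d+) seconds, "
    r"threshold is set to (?P<limit>\d+) logs in (?P<period>\d+) seconds\)"
    r".*? for (?P<suppressed_for>\d+) seconds"
)


def find_flood_events(lines: Iterable[str]) -> List[Dict[str, int]]:
    """Return the parsed counters of every flood-protection line in `lines`."""
    events = []
    for line in lines:
        match = FLOOD_RE.search(line)
        if match:
            events.append({key: int(val) for key, val in match.groupdict().items()})
    return events
```

Pass it any iterable of lines, such as an open file handle. An empty result means flood protection did not kick in for the lines scanned; a non-empty one tells you how many seconds of logging (`suppressed_for`) were dropped.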
Also, I suggest you provide details about your setup; it helps when debugging issues. Based on the info provided, my only suggestion is to check the logs:
- disable flood protection
- grep all lines that have
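Since the thread is about cgroup-name.sh output, here is a Python sketch that pulls those lines out and parses them, assuming the `cgroup 'X' is called 'Y', labels 'Z'` format quoted at the top of the thread. It accepts both straight and typographic quotes, because copied log lines often get "smartened". The function name and the default filter string are my own choices, not anything Netdata defines.

```python
import re
from typing import Iterable, List, Tuple

# Accept straight (') or typographic (‘ ’) quotes around each field.
RENAME_RE = re.compile(
    r"cgroup ['‘](?P<cgroup>[^'’]*)['’] is called "
    r"['‘](?P<name>[^'’]*)['’], labels ['‘](?P<labels>[^'’]*)['’]"
)


def cgroup_renames(lines: Iterable[str], needle: str = "cgroup-name.sh") -> List[Tuple[str, str, str]]:
    """Return (cgroup, name, labels) for each line containing `needle` that matches RENAME_RE."""
    renames = []
    for line in lines:
        if needle not in line:
            continue  # cheap substring filter, like a plain grep
        match = RENAME_RE.search(line)
        if match:
            renames.append((match["cgroup"], match["name"], match["labels"]))
    return renames
```

Comparing the output from a working node with the output from a failing one shows at a glance whether pod cgroups ever reach the rename step. A labels value that itself contains a single quote would cut the parse short.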
OK, I disabled flood protection and now I don’t even see calls to cgroup-name.sh. The dashboard also just shows node metrics without any pod metrics. Unfortunately, I can’t bring files down from my private network. It looks like it calls the k8s API to get the pods and then looks in /sys/fs/cgroup. For some reason some nodes don’t process the pods to get metrics, while other nodes that are almost identical do. It’d be nice if there was a log line that said ‘calls to /api/pods found N pods’.
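One thing that can be checked directly on a node, without Netdata, is whether pod cgroups exist under /sys/fs/cgroup at all. The Python sketch below assumes the usual kubelet naming conventions (`pod<uid>` with the cgroupfs driver, `kubepods[-<qos>]-pod<uid>.slice` with the systemd driver); if your runtime names them differently, adjust `POD_DIR_RE`. Both names in the code are hypothetical.

```python
import os
import re
from typing import List

# Assumed kubelet naming: "pod<uid>" (cgroupfs driver, UID with '-') or
# "kubepods[-<qos>]-pod<uid>.slice" (systemd driver, UID with '_').
POD_DIR_RE = re.compile(r"pod[0-9a-f]{8}[-_][0-9a-f]{4}[-_]")


def find_pod_cgroups(root: str = "/sys/fs/cgroup") -> List[str]:
    """Return the unique names of directories under `root` that look like pod cgroups."""
    pods = set()
    for dirpath, dirnames, _filenames in os.walk(root):
        name = os.path.basename(dirpath)
        if POD_DIR_RE.search(name):
            # With cgroup v1 the same pod shows up once per controller; the set dedupes it.
            pods.add(name)
            dirnames[:] = []  # container cgroups sit below the pod; no need to descend
    return sorted(pods)


if __name__ == "__main__":
    found = find_pod_cgroups()
    print(f"{len(found)} pod cgroup(s) found")
```

If this finds pods on a working node but none on a failing one, the difference is in the cgroup layout (driver or cgroup version) rather than in Netdata's Kubernetes API call.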
It would be really great if, on initial startup, it logged something like ‘hey, I tried to get pods by calling this URL and this many came back’.
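Until such a log line exists, the "this many came back" number can be produced from outside Netdata. The thread doesn't say exactly which URL Netdata queries, so this Python sketch does not try to reproduce its call; it only counts pods per node in a standard Kubernetes PodList (the `items[].spec.nodeName` fields), such as the output of `kubectl get pods -A -o json`. The function name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict


def pods_per_node(pod_list: Dict[str, Any]) -> Counter:
    """Count pods per spec.nodeName in a Kubernetes PodList object."""
    counts: Counter = Counter()
    for pod in pod_list.get("items", []):
        # Pods not yet scheduled have no nodeName; group them under a placeholder key.
        node = (pod.get("spec") or {}).get("nodeName") or "<unscheduled>"
        counts[node] += 1
    return counts
```

For example, save `kubectl get pods -A -o json > pods.json`, load it with `json.load`, and look up the failing node's name in the result; that count can then be compared with the pod cgroups actually present under /sys/fs/cgroup on that node.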
I got the netdata-debug container and set debug flags = 0x0000000000100000, but I don’t see anything new.
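The debug flags value is a bitmask, and 0x0000000000100000 turns on exactly one bit, bit 20 (`1 << 20`). This Python sketch only does that arithmetic; which Netdata subsystem each bit maps to is not stated in the thread and should be checked in Netdata's source for the version you run. Several flags can be combined by OR-ing their masks with `|`.

```python
from typing import List


def set_bits(mask: int) -> List[int]:
    """Return the positions (0 = least significant) of all bits set in `mask`."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


print(set_bits(0x0000000000100000))  # [20]
```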