I don’t believe it’s a logging problem. It seems that it’s not getting pod metrics. The pod metrics aren’t appearing in the dashboard.
It’d be nice if there was a place I could look on the node filesystem to see why netdata isn’t picking up the data. I looked in cgroup-name.sh, but I don’t know the code well enough to figure out why netdata seems to be skipping this step.
What’s weird is it works fine on other nodes.
Then just check whether you have a line with FLOOD in the logs, e.g.
2023-08-09 14:01:43: netdata LOG FLOOD PROTECTION too many logs (201 logs in 13 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process ‘netdata’ for 1187 seconds
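If grepping by hand is awkward, here is a minimal Python sketch that scans log lines for that message. It only assumes the message format shown in the example line above; the function name `find_flood_lines` is made up for illustration, and you need to point it at wherever your netdata logs actually live.

```python
import re

# Matches the flood-protection message format from the example log line above.
FLOOD_RE = re.compile(
    r"LOG FLOOD PROTECTION too many logs \((\d+) logs in (\d+) seconds"
)


def find_flood_lines(lines):
    """Return (line_number, log_count, seconds) for each flood-protection message.

    `lines` can be any iterable of strings, e.g. an open log file.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = FLOOD_RE.search(line)
        if match:
            hits.append((number, int(match.group(1)), int(match.group(2))))
    return hits
```

Pass it an open file handle for the log you want to check. Any hit means later messages from that process were suppressed, so the log may be missing the lines you are looking for.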
Also, I suggest you provide details about your setup, since that helps with debugging. Based on the info provided so far, my only suggestion is to check the logs:
OK, I disabled flood protection and now I don’t even see calls to cgroup-name.sh. The dashboard also just shows node metrics without any pod metrics. Unfortunately I can’t bring files down from my private network. It looks like it calls the k8s API to get the pods and then looks in /sys/fs/cgroup. For some reason some nodes don’t process the pods to get metrics, while other, nearly identical nodes do. It’d be nice if there was a log line that said ‘calls to /api/pods found N pods’.
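In the meantime, one way to compare a working node with a broken one is to count the pod cgroup directories on each. The Python sketch below is not what netdata does internally; it just walks /sys/fs/cgroup and collects directories that look like pod cgroups. The naming pattern is an assumption based on the usual kubelet layouts (`pod<uid>` with the cgroupfs driver, `...-pod<uid>.slice` with the systemd driver), and `find_pod_cgroups` is a hypothetical helper name.

```python
import os
import re

# Assumption: pod cgroup directory names contain "pod" followed by the pod UID,
# written with "-" (cgroupfs driver) or "_" (systemd driver) as separators.
POD_DIR_RE = re.compile(r"pod([0-9a-f]{8}[-_][0-9a-f_-]+)")


def find_pod_cgroups(root="/sys/fs/cgroup"):
    """Map each pod UID found under `root` to the cgroup directories naming it."""
    pods = {}
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            match = POD_DIR_RE.search(name)
            if match:
                # Normalize systemd-style "_" separators to the usual "-" form.
                uid = match.group(1).replace("_", "-")
                pods.setdefault(uid, []).append(os.path.join(dirpath, name))
    return pods


if __name__ == "__main__":
    found = find_pod_cgroups()
    print(f"found {len(found)} pod cgroups")
    for uid, paths in sorted(found.items()):
        print(uid, paths[0])
```

If both nodes report a similar number of pod cgroups, the difference is more likely in how the pods are being resolved to names than in the cgroup tree itself.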