As stated in the title, container metrics (lxc containers) are not visible in the Netdata dashboard. We’re using Netdata v1.37.1. Netdata is installed on the host, and several lxc containers run on that same host. Netdata shows metrics for the host itself but not for the lxc containers.
Interestingly, we have the same version of Netdata on another host and there we do not see this problem.
Relevant docs you followed/actions you took to solve the issue
We tried uninstalling Netdata completely and re-installed it.
What I expected to happen
The dashboard should show lxc container metrics.
Below I’ve shared the errors we see in the error.log file on the host where container metrics are not shown.
(I’m hitting the body character limit, hence the pastebin link)
On host where the issue is:
On host where we see Netdata showing container info but some error related to cgroup-network:
Please ask if you need more information, happy to provide!
Here is the output from /var/log/netdata/error.log
2022-12-14 11:14:16: ebpf.plugin INFO : EBPF CGROUP : thread with task id 452551 finished
2022-12-14 11:14:16: netdata INFO : MAIN : EXIT: Stopping main thread: PLUGIN[cgroups]
2022-12-14 11:14:16: netdata INFO : PLUGIN[cgroups] : cleaning up...
2022-12-14 11:14:16: netdata INFO : PLUGIN[cgroups] : stopping discovery thread worker
2022-12-14 11:14:16: netdata INFO : PLUGIN[cgroups] : waiting for discovery thread to finish...
2022-12-14 11:14:19: netdata INFO : PLUGIN[cgroups] : thread with task id 452394 finished
2022-12-14 11:14:23: netdata INFO : PLUGIN[cgroups] : thread created with task id 610283
2022-12-14 11:14:23: netdata INFO : PLUGIN[cgroups] : set name of thread 610283 to PLUGIN[cgroups]
2022-12-14 11:14:23: netdata INFO : PLUGIN[cgroups] : cgroups v2 (unified cgroups) is available but are disabled on this system.
2022-12-14 11:14:23: netdata INFO : PLUGIN[cgroups] : use unified cgroups false
2022-12-14 11:14:23: ebpf.plugin INFO : EBPF CGROUP : thread created with task id 610446
2022-12-14 11:14:23: ebpf.plugin INFO : EBPF CGROUP : set name of thread 610446 to EBPF CGROUP
2022-12-14 11:14:23: apps.plugin ERROR : MAIN : PROCFILE: Cannot open file '/etc/netdata/apps_groups.conf' (errno 2, No such file or directory)
2022-12-14 11:14:23: apps.plugin INFO : MAIN : Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
2022-12-14 11:14:23: apps.plugin INFO : MAIN : Loaded config file '/usr/lib/netdata/conf.d/apps_groups.conf'
2022-12-14 11:14:24: go.d ERROR: prometheus[kafka_consumer_group_exporter_local] Get "http://127.0.0.1:9208/metrics": dial tcp 127.0.0.1:9208: connect: connection refused
2022-12-14 11:14:24: go.d ERROR: prometheus[kafka_consumer_group_exporter_local] check failed
2022-12-14 11:14:24: go.d ERROR: prometheus[exporter_for_grouped_process_local] Get "http://127.0.0.1:9644/metrics": dial tcp 127.0.0.1:9644: connect: connection refused
2022-12-14 11:14:24: go.d ERROR: prometheus[exporter_for_grouped_process_local] check failed
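When triaging a log like the one above, it can help to count ERROR lines per source so the actual failures stand out from the INFO noise. Below is a minimal sketch, assuming every line follows the "timestamp: source LEVEL : message" shape seen in this excerpt; the function name count_errors and the regular expression are illustrative and not part of Netdata.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   2022-12-14 11:14:23: apps.plugin ERROR : MAIN : PROCFILE: ...
#   2022-12-14 11:14:24: go.d ERROR: prometheus[...] check failed
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<source>\S+) (?P<level>INFO|ERROR)\s*:\s*(?P<rest>.*)$"
)

def count_errors(lines):
    """Count ERROR lines per source (hypothetical helper, not a Netdata tool)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines that do not fit the assumed shape are skipped silently.
        if match and match.group("level") == "ERROR":
            counts[match.group("source")] += 1
    return counts

sample = [
    "2022-12-14 11:14:23: netdata INFO : PLUGIN[cgroups] : use unified cgroups false",
    "2022-12-14 11:14:23: apps.plugin ERROR : MAIN : PROCFILE: Cannot open file '/etc/netdata/apps_groups.conf' (errno 2, No such file or directory)",
    "2022-12-14 11:14:24: go.d ERROR: prometheus[kafka_consumer_group_exporter_local] check failed",
    "2022-12-14 11:14:24: go.d ERROR: prometheus[exporter_for_grouped_process_local] check failed",
]
print(count_errors(sample))
```

In this excerpt the only ERROR lines come from apps.plugin and go.d; none of them come from the cgroups plugin.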
Some log lines may be missing if you haven’t disabled log flood protection (errors flood protection period). If you have disabled it, then I don’t know what is happening.
We can add cgroups.plugin debug logging, but that requires compiling Netdata with debugging enabled. An alternative is to create a Docker container with a custom Netdata image (I can do it).
Additionally, can you send me a snapshot to ilya@netdata.cloud?
Interesting, and you don’t see the container metrics?
It looks like all lxc.payload.X cgroups have been found and renamed, and are being collected:
Downloads $ grep "lxc\.payload\..*" netdata_docker_logs.txt | grep "is called"
2022-12-15 10:39:27: cgroup-name.sh: INFO: cgroup 'lxc.payload.hydrogen' is called 'hydrogen'
2022-12-15 10:39:28: cgroup-name.sh: INFO: cgroup 'lxc.payload.planb' is called 'planb'
2022-12-15 10:39:30: cgroup-name.sh: INFO: cgroup 'lxc.payload.microsrvc2' is called 'microsrvc2'
2022-12-15 10:39:32: cgroup-name.sh: INFO: cgroup 'lxc.payload.mynode7' is called 'mynode7'
2022-12-15 10:39:37: cgroup-name.sh: INFO: cgroup 'lxc.payload.sapphirecap4' is called 'sapphirecap4'
2022-12-15 10:39:39: cgroup-name.sh: INFO: cgroup 'lxc.payload.emailserver' is called 'emailserver'
2022-12-15 10:39:41: cgroup-name.sh: INFO: cgroup 'lxc.payload.reverseprox1' is called 'reverseprox1'
2022-12-15 10:39:43: cgroup-name.sh: INFO: cgroup 'lxc.payload.butterflyeu' is called 'butterflyeu'
2022-12-15 10:39:49: cgroup-name.sh: INFO: cgroup 'lxc.payload.peng' is called 'peng'
2022-12-15 10:39:51: cgroup-name.sh: INFO: cgroup 'lxc.payload.bipradix' is called 'bipradix'
2022-12-15 10:39:52: cgroup-name.sh: INFO: cgroup 'lxc.payload.ireallydo' is called 'ireallydo'
2022-12-15 10:39:54: cgroup-name.sh: INFO: cgroup 'lxc.payload.ads' is called 'ads'
...
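The same check as the grep above can be sketched in Python, which makes it easy to compare the renamed cgroups against the containers you expect to see. This assumes the cgroup-name.sh line format shown in the output above; renamed_cgroups is a hypothetical helper name.

```python
import re

# Matches the part of a cgroup-name.sh line that looks like:
#   cgroup 'lxc.payload.hydrogen' is called 'hydrogen'
RENAME_RE = re.compile(r"cgroup '(lxc\.payload\.[^']+)' is called '([^']+)'")

def renamed_cgroups(lines):
    """Return {cgroup id: display name} for every lxc.payload.* rename line."""
    found = {}
    for line in lines:
        match = RENAME_RE.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found

sample = [
    "2022-12-15 10:39:27: cgroup-name.sh: INFO: cgroup 'lxc.payload.hydrogen' is called 'hydrogen'",
    "2022-12-15 10:39:28: cgroup-name.sh: INFO: cgroup 'lxc.payload.planb' is called 'planb'",
]
print(renamed_cgroups(sample))
```

Reading the real file instead of the sample is a matter of passing an open file object, for example open("netdata_docker_logs.txt"), to renamed_cgroups.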
Yes, after creating the container I visited the agent dashboard and didn’t find the container metrics (usually I would find them in the bottom right).
I think you need lxc.payload.* (no config changes are needed), not lxc.monitor.*. I don’t know what is happening. I see no problems in the logs. I will add more debug logs and build another custom image.
Sorry to bother you, but is that the right syntax? I guess an “!” means not to consider that, so I removed it and restarted Netdata, but still nothing: no container metrics.
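On the “!” question: Netdata simple patterns are space-separated, use * as a wildcard, treat a leading ! as a negative match, and stop at the first pattern that matches. Below is a rough Python sketch of that behaviour, not Netdata's actual implementation. The pattern string and cgroup paths are hypothetical examples, not the configuration discussed in this thread.

```python
import re

def simple_pattern_match(patterns, value):
    """Approximate Netdata simple patterns: the first matching pattern decides."""
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        # Turn each '*' into '.*' and treat every other character literally.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, value):
            return not negative
    return False  # nothing matched

# Hypothetical pattern list, for illustration only.
patterns = "!*lxc.monitor* *lxc.payload*"
print(simple_pattern_match(patterns, "/lxc.payload.hydrogen"))  # True
print(simple_pattern_match(patterns, "/lxc.monitor.hydrogen"))  # False
```

In this sketch, removing the “!” turns the exclusion into an inclusion for lxc.monitor.* paths; it does not change how lxc.payload.* paths are matched.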