Netdata has started now. The error log showed that it couldn’t bind to the default Netdata port; it seems some processes were still running. I killed the already-running processes, and now Netdata is running.
The metrics are SHOWN.
However, I added the same search pattern on another host that’s also affected (and running v1.37.1), but there the metrics are still missing.
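For the bind failure mentioned above, a quick way to confirm that a stale process is still holding the port before restarting is to try binding it yourself. This is a minimal sketch, assuming Netdata is on its default port 19999 and listening on an address that covers 127.0.0.1; `port_is_free` is a hypothetical helper name, not part of Netdata.

```python
import socket

# Assumption: Netdata is configured with its default web port.
NETDATA_DEFAULT_PORT = 19999


def port_is_free(port=NETDATA_DEFAULT_PORT, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # bind() raises OSError (typically EADDRINUSE) when another
            # process is already listening on this address and port.
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

If `port_is_free()` returns `False` while the Netdata service is stopped, something else (possibly a leftover Netdata process) still owns the port.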
I suspect that Netdata wasn’t properly restarted, which is another possible bug we have on your specific OS/kernel.
The same pattern working on one machine and not the other screams permission issues, but let’s not go further here; this is complicated enough to require a real-time discussion. See Ilya’s message above and get on a Discord conversation with him. We’ll probably have to replicate the setup in our lab.
The issue is back, Ilya. The host (Kes) that we made the changes on at the end of the call doesn’t seem to work for some reason. We’ve not made any changes since then, but it suddenly stopped working.
So we uninstalled and followed the procedure to install the static-only version from the stable channel again; still no luck. There’s no cgroup find worker process running. We tried installing the nightly version as well, again as a static build, with the same results.
I’ve attached the error log. Please have a look.
So maybe changing the home dir after installing the static version wasn’t the fix - maybe this is why on another host even the static version didn’t show the LXC container metrics.
Netdata fails to start, and systemctl status netdata shows that the main PID exited (code=exited, status=1/FAILURE).
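To capture the same failure state in a form that is easy to paste into the thread, the unit properties can be read with `systemctl show`. This is a rough sketch, assuming the unit is named `netdata` and systemd is in use; `parse_show_output` and `unit_state` are hypothetical helper names.

```python
import subprocess


def parse_show_output(text):
    """Turn the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that do not contain '='
            props[key.strip()] = value.strip()
    return props


def unit_state(unit="netdata"):
    """Ask systemd for the state and main-process exit status of a unit."""
    result = subprocess.run(
        ["systemctl", "show", unit,
         "--property=ActiveState,Result,ExecMainStatus"],
        capture_output=True, text=True, check=False,
    )
    return parse_show_output(result.stdout)
```

Calling `unit_state()` on the affected host returns the three properties as a dict; a non-zero `ExecMainStatus` would match the `status=1/FAILURE` reported above.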
The issue is back, Ilya. The host (Kes) that we made the changes on at the end of the call doesn’t seem to work for some reason. We’ve not made any changes since then, but it suddenly stopped working.
Can you please elaborate a bit? Do I understand correctly:
You switched from native packages (deb) to a static install on N servers (stable version).
It worked (we switched 2 servers during our call).
After some time, all container metrics just disappear? On all servers? And a Netdata restart doesn’t help?
I see this issue even on the latest stable version of Netdata. The container metrics suddenly disappear. Had to reinstall Netdata. Even then, after some time or after a few days, the metrics disappear suddenly. This recently happened on two of our hosts.