Q: How to monitor a running container and trigger an alert when the container stops or dies?
A: I don’t see us having anything really good in this area. I found how to alert on specific virtual network interfaces, a suggestion to use last_collected_sec of any collector here, and a different though useful alert called docker_unhealthy_containers. The problem seems to be properly detecting a crash, as opposed to a normal termination. The only suggestion I’ve seen for distinguishing the two is here. We will discuss what we can do with @ilyam8.
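To make the crash-versus-normal-termination problem concrete, here is a rough Python sketch that classifies the State object reported by `docker inspect`. It is not a Netdata feature and not necessarily the approach from the linked suggestion. The function names and the rule "non-zero exit code or OOM kill means crash" are my own assumptions. Note that `docker stop` can leave exit codes such as 143 or 137, which is why the set of expected exit codes is a parameter.

```python
import json
import subprocess
from typing import Any, Mapping, Sequence


def classify_container_state(
    state: Mapping[str, Any],
    expected_exit_codes: Sequence[int] = (0,),
) -> str:
    """Return "running", "stopped" or "crashed" for a docker inspect State object.

    Assumption (not from Netdata): a container that is not running counts as
    crashed if it was OOM-killed or exited with a code outside
    expected_exit_codes; otherwise it counts as a normal stop.
    """
    if state.get("Running"):
        return "running"
    if state.get("OOMKilled"):
        return "crashed"
    if state.get("ExitCode", 0) in expected_exit_codes:
        return "stopped"
    return "crashed"


def inspect_state(container: str) -> Mapping[str, Any]:
    """Fetch the State object of a container via the docker CLI."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

For example, `classify_container_state(inspect_state("my-app"))` (where `my-app` is a placeholder container name) could be run from a cron job or a custom collector, alerting only on "crashed". If you stop containers with `docker stop` on purpose, pass something like `expected_exit_codes=(0, 143)` so those stops are not reported as crashes.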
Q: How to monitor whether a systemd service stops?
A: We now have a decent systemd units collector, which I suggest configuring to specifically monitor each service you care about. This used to produce problematic charts in Netdata Cloud, but thanks to @ilyam8 they are now easier to understand (this is still only in the nightlies; it will be in the next stable version too). We already have alerts for the failed state, and it’s easy to add more for the inactive state as well (copy/paste the entire list of alerts, then change $failed to $inactive and rename them).
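As an illustration of the failed versus inactive distinction outside of Netdata, here is a small Python sketch that reads a unit’s ActiveState with `systemctl show` and decides whether to alert. The function names and the default set of alerting states are assumptions for the example; the Netdata alerts themselves are defined in its health configuration, not in code like this.

```python
import subprocess
from typing import Sequence

# Assumption for this sketch: alert on both states discussed above.
ALERT_STATES = ("failed", "inactive")


def parse_active_state(show_output: str) -> str:
    """Extract the value of ActiveState from `systemctl show` output."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ActiveState":
            return value.strip()
    raise ValueError("no ActiveState property in output")


def should_alert(active_state: str, alert_states: Sequence[str] = ALERT_STATES) -> bool:
    """Return True if the unit state is one we want to be alerted about."""
    return active_state in alert_states


def unit_active_state(unit: str) -> str:
    """Query systemd for the current ActiveState of a unit."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_active_state(result.stdout)
```

Calling `should_alert(unit_active_state("nginx.service"))` (the unit name is just an example) mirrors what the two alerts do: with the default states it fires for both a crashed and a cleanly stopped service, while `alert_states=("failed",)` restricts it to failures only.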
Hi @Christopher_Akritid1
Thank you for this. How do I see this in the dashboard after configuring the collector on the machine? I have tried restarting the Netdata service (sudo systemctl restart netdata), but it is still not showing in any of the dashboards.
If it’s not showing up after changing the config file for these standard collectors, it almost always means there’s an error with that configuration and we can look into that in a different thread. Depending on the type of error, you may see something useful in /var/log/netdata/error.log.
When you have issues, it generally helps to run the collector in debug mode, so you don’t have to restart Netdata and wait each time. Instructions on how to debug are in every collector’s documentation page.
In Netdata Cloud, open your personal settings from the button on the bottom left. Under the “Notifications” tab, choose to receive “All alerts and unreachable” for the rooms that contain the nodes you want to be notified about. You will then receive emails when the nodes in those rooms are disconnected from or reconnected to the cloud.