How to Monitor X questions

I received these questions from a user today, and most of them are common enough to warrant a public answer:

Q: How to monitor and alert for endpoints, e.g. when no request is made to http://localhost:1234, or when the address returns a 400 or 500 status error page.

A: With the HTTP Check collector.
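
To make concrete what such a check looks at, here is a minimal Python sketch of a single probe. It is not the collector's code or its configuration. The function names, the `accepted` default, and the state labels are assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request


def classify_status(code, accepted=(200,)):
    """Map an HTTP status code to a coarse state (labels are illustrative)."""
    return "success" if code in accepted else "bad_status"


def check_endpoint(url, timeout=1.0, accepted=(200,)):
    """Probe a URL once and report a coarse state."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status, accepted)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still available.
        return classify_status(err.code, accepted)
    except urllib.error.URLError as err:
        # Connect-phase failures are wrapped in URLError; inspect the reason.
        if isinstance(err.reason, (TimeoutError, socket.timeout)):
            return "timeout"
        return "no_connection"
    except (TimeoutError, socket.timeout):
        # Timeouts while reading the response are raised directly.
        return "timeout"
    except OSError:
        return "no_connection"
```

Calling `check_endpoint("http://localhost:1234")` would cover both cases from the question: an unreachable address and an error status page. An alert would then fire on any state other than `success`.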


Q: How to monitor DNS or URLs, e.g. when a URL such as example.com does not serve its requested application.

A: With the DNS Query Collector.
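
For a rough idea of the simplest possible DNS check, the Python sketch below asks whether a name resolves at all, using only the standard library. It is an illustration, not how the collector works; `resolves` and its `resolver` parameter are hypothetical names. A name that resolves says nothing about whether the application behind it is being served, so a check like this would normally be paired with an HTTP check.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False
```

The `resolver` argument exists only so the function can be exercised without network access.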


Q: How to monitor a running container and trigger an alert when the container stops or dies.

A: I don’t think we have anything really good in this area. I found three things: how to alert on specific virtual network interfaces, a suggestion to use last_collected_sec of any collector here, and a different though useful alert called docker_unhealthy_containers. The problem seems to be properly detecting a crash, as opposed to a normal termination. The only suggestion I’ve seen for distinguishing the two is here. We will discuss what we can do with @ilyam8.
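
To illustrate why crash detection is awkward, here is one possible heuristic as a Python sketch. It is my own assumption, not necessarily what the suggestion linked above proposes. It classifies the `State` object that `docker inspect` prints for a container, using its `Running`, `OOMKilled` and `ExitCode` fields; the function name and the labels are hypothetical.

```python
def classify_container_state(state):
    """Classify the `State` object from `docker inspect` (heuristic only)."""
    if state.get("Running"):
        return "running"
    if state.get("OOMKilled"):
        # The kernel killed the container for exceeding its memory limit.
        return "oom_killed"
    if state.get("ExitCode", 0) == 0:
        return "clean_exit"
    # Non-zero exit: may be a crash, but also a forced stop (e.g. 137 = SIGKILL).
    return "nonzero_exit"
```

Note the ambiguity in the last branch: a container that was stopped on purpose and had to be killed also ends with a non-zero code, so `nonzero_exit` alone is not proof of a crash.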


Q: How to monitor if a systemd service stops.

A: We now have a decent systemd units collector, which I suggest configuring to specifically monitor each service you want. This used to produce problematic charts on the cloud, but thanks to @ilyam8 they are now easier to understand (this is still only in the nightlies, but it will be in the next stable version too). We already have alerts for the failed state, and it’s easy to add more for the inactive state as well: copy and paste the entire list, then change $failed to $inactive and adjust the alert names to match.
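
The copy-and-rename step can be sketched in Python as a plain text substitution. The stanza below is an abbreviated, illustrative stand-in for a failed-state alert, not a copy of the file Netdata ships, so treat its field values as assumptions; `to_inactive` is a hypothetical helper name.

```python
# Illustrative stanza only: an abbreviated stand-in for a "failed state" alert,
# not copied from the health configuration Netdata ships.
FAILED_ALERT = """\
 template: systemd_service_unit_failed_state
       on: systemd.service_unit_state
     calc: $failed
     warn: $this != nan AND $this == 1
     info: systemd service unit in the failed state
"""


def to_inactive(stanza):
    """Derive the inactive-state variant by renaming every 'failed'."""
    return stanza.replace("failed", "inactive")


print(to_inactive(FAILED_ALERT))
```

The same substitution covers both the `$failed` variable and the alert name, which is all the manual edit amounts to. Keep the original stanzas alongside the derived ones if you want alerts for both states.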

I wonder if we should build an FAQ page like this in the docs somewhere?

Or maybe search in here is enough, but it could be nice to have a list somewhere in Learn.

Informative. Thank you.

Q: How to monitor SSL certificate expirations

A: Via the x509 certificate collector.
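
For intuition about what an expiry check computes, here is a minimal Python sketch using only the standard library. It is not the collector's implementation; `days_until_expiry` and `fetch_not_after` are hypothetical names, and the port and timeout defaults are assumptions.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Whole days from `now` (epoch seconds) until a notAfter timestamp."""
    now = time.time() if now is None else now
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)


def fetch_not_after(host, port=443, timeout=5.0):
    """Read the notAfter field of the certificate a server presents."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

An alert would compare `days_until_expiry(fetch_not_after("example.com"))` against a threshold of your choosing. Because the default context verifies the certificate, an already expired certificate makes the handshake fail instead of returning a date, so a real check needs to treat that error as an alert condition too.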

Hi @Christopher_Akritid1
Thank you for this. Please, how do I see this in the dashboard after configuring the collector on the machine? I have tried restarting the Netdata service (sudo systemctl restart netdata), but it is still not showing in any of the dashboards.

If it’s not showing up after changing the config file for these standard collectors, it almost always means there’s an error with that configuration and we can look into that in a different thread. Depending on the type of error, you may see something useful in /var/log/netdata/error.log.

When you have issues, it generally helps to run the collector in debug mode, so you don’t have to restart Netdata and wait each time. Instructions on how to debug are in every collector’s documentation page.

Thanks. I have checked and could not find anything, but it is still not showing in the dashboard.

Please create a new discussion with the config and the output of the debug command.

Q: How to be alerted when a node goes offline

A: In Netdata Cloud open your personal settings from the button on the bottom left. Under the tab “Notifications” select to receive “All alerts and unreachable” for the rooms that contain the nodes you want to be notified about. You will receive emails when the nodes in those rooms are disconnected from/reconnected to the cloud.