How to Monitor X questions

Christopher_Akritid1 · September 19, 2022, 1:51pm

Received this from a user today and most questions are common enough to warrant a public answer:

Q: How to monitor and alert on endpoints, e.g. when no request is made to http://localhost:1234, or when the address returns a 400 or 500 status error page?

A: With the HTTP Check collector.
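To make concrete what such a check evaluates, here is a minimal Python sketch. It is not the collector's implementation: the function names, the 5-second timeout, the result labels, and the choice to treat only 200 as healthy are all assumptions for illustration. The URL is the one from the question.

```python
import urllib.error
import urllib.request
from typing import Optional


def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status (None means no response) to a coarse check result."""
    if status is None:
        return "no_connection"
    if status == 200:
        return "success"
    # Covers the 400 and 500 error pages from the question (assumed policy).
    return "bad_status"


def probe(url: str, timeout: float = 5.0) -> str:
    """Request the URL once and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return classify_status(err.code)
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return classify_status(None)
```

Calling `probe("http://localhost:1234")` returns one of the three labels, and an alert would fire on anything other than `success`.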

Q: How to monitor DNS or URLs, e.g. when a URL such as example.com does not serve its requested application?

A: With the DNS Query Collector.

Q: How to monitor a running container and trigger an alert when the container stops or dies?

A: I don’t see us having anything really good in this area. I found how to alert on specific virtual network interfaces, a suggestion to use last_collected_sec of any collector here, and a different though useful alert called docker_unhealthy_containers. The problem seems to be properly detecting a crash, as opposed to a normal termination. The only suggestion I’ve seen for distinguishing the two is here. We will discuss what we can do with @ilyam8.
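One way to reason about the crash-versus-normal-termination distinction is through the container's exit code. The Python sketch below is purely illustrative: it is not something Netdata does, and not necessarily the suggestion linked above. It assumes the common convention that 0 means a clean stop and that codes above 128 mean the process was killed by a signal (128 plus the signal number). The function name and the labels are hypothetical.

```python
import signal


def classify_exit(exit_code: int) -> str:
    """Guess why a container stopped, based only on its exit code (assumed convention)."""
    if exit_code == 0:
        return "normal"
    if exit_code > 128:
        signum = exit_code - 128
        try:
            # Translate the number into a name such as SIGKILL where the platform knows it.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return "application error"
```

A deliberate `docker stop` can also end in 143 (SIGTERM) or 137 (SIGKILL), so exit codes alone do not fully separate a crash from an intentional stop.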

Q: How to monitor if a systemd service stops?

A: We now have a decent systemd units collector, which I suggest configuring to specifically monitor each service you want. This used to produce problematic charts on the cloud, but thanks to @ilyam8 they are now easier to understand (still on the nightlies, it will be in the next stable version too). We already have alerts for the failed state and it’s easy to add more for the inactive state as well (copy / paste the entire list and change $failed to $inactive plus the names).
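The copy, paste and rename step can be sketched as a small Python text transform. The stanza in `EXAMPLE` is an abbreviated, hypothetical stand-in, not the alert definition shipped with Netdata, and `failed_to_inactive` is an illustrative helper name; only the `$failed` to `$inactive` swap comes from the answer above.

```python
def failed_to_inactive(stanza: str) -> str:
    """Return a copy of an alert stanza that checks $inactive instead of $failed."""
    # Swap the variable the alert evaluates.
    out = stanza.replace("$failed", "$inactive")
    # Rename the alert so it does not clash with the original failed-state alert.
    out = out.replace("failed_state", "inactive_state")
    # Keep the human-readable description in sync.
    out = out.replace("failed state", "inactive state")
    return out


# Hypothetical, abbreviated stanza for illustration; real definitions have more fields.
EXAMPLE = """\
template: my_unit_failed_state
    calc: $failed
    info: unit in the failed state
"""

INACTIVE = failed_to_inactive(EXAMPLE)
```

The original stanza stays in place and the transformed copy is appended after it, so both the failed and the inactive state raise alerts.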

andrewm4894 · September 19, 2022, 7:05pm

I wonder if we should build an FAQ page like this in the docs somewhere?

Or maybe the search in here is enough, but it could be nice to have a list somewhere in Learn.

dare · September 20, 2022, 10:19am

Informative. Thank you.

Christopher_Akritid1 · September 20, 2022, 1:01pm

Q: How to monitor SSL certificate expirations?

A: Via the x509 certificate collector.
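The quantity worth alerting on is the time left until the certificate's notAfter date. As a rough Python sketch of that arithmetic (not the collector's code; `days_until_expiry` is a hypothetical helper name), the standard library's `ssl.cert_time_to_seconds` converts the notAfter string, in the format returned by `SSLSocket.getpeercert()`, to epoch seconds:

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days between `now` (epoch seconds) and a certificate's notAfter time."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    # Floor division: an already expired certificate yields a negative number.
    return int((expires - now) // 86400)
```

An alert would then compare the result against a threshold of your choosing, for example warning when fewer than 14 days remain (the 14 is an arbitrary example value).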

dare · September 21, 2022, 2:51am

Hi @Christopher_Akritid1
Thank you for this. How do I see this in the dashboard after configuring the collector on the machine? I have tried restarting the Netdata service (sudo systemctl restart netdata), but it is still not showing in any of the dashboards.

Christopher_Akritid1 · September 21, 2022, 2:01pm

If it’s not showing up after changing the config file for these standard collectors, it almost always means there’s an error with that configuration and we can look into that in a different thread. Depending on the type of error, you may see something useful in /var/log/netdata/error.log.

When you have issues, it generally helps to run the collector in debug mode, so you don’t have to restart Netdata and wait. Instructions on how to debug are in every collector’s documentation page.

dare · September 23, 2022, 4:04pm

Thanks. I have checked and could not find anything, but it is still not showing in the dashboard.

Christopher_Akritid1 · September 23, 2022, 6:49pm

Please create a new discussion with the config and the output of the debug command.

Christopher_Akritid1 · December 16, 2022, 10:30am

How to be alerted when a node goes offline

In Netdata Cloud open your personal settings from the button on the bottom left. Under the tab “Notifications” select to receive “All alerts and unreachable” for the rooms that contain the nodes you want to be notified about. You will receive emails when the nodes in those rooms are disconnected from/reconnected to the cloud.
