Alert when non-ephemeral nodes go stale

mkesper · February 5, 2024, 2:51pm

Problem/Question

Two nodes configured, one as child and one as parent. Configured alerts and notifications are working properly. No alert is triggered when the child cannot reach the parent, though.

Relevant docs you followed/actions you took to solve the issue

Environment/Browser/Agent’s version etc

AmazonLinux2023
netdata v1.44.1

What I expected to happen

An alert should be triggered by default when a child agent vanishes as then this node will go completely unmonitored!

hugo · February 6, 2024, 6:49pm

Hi @mkesper ,

We actually have this capability on Netdata Cloud through the Reachability notifications sent by Cloud.
Could you perhaps open a feature request for the Parent - Child case on the agent?

thomasjsn · February 10, 2024, 9:32pm

I would also like an alert if a child disconnects from its parent. Was this covered by the Reachability notifications? Or does it require a feature request, if so — was one created?

thomasjsn · February 15, 2024, 8:33pm

OK, so this works. I just hadn’t enabled alerts for unreachable.

I had it set to All Alerts, but after I changed it to All Alerts and unreachable, I do get notifications when child nodes go offline.

mkesper · February 27, 2024, 9:14am

I honestly didn’t think this needed a feature request, but I have added one: [Feat]: Add notification on child node going stale · Issue #17070 · netdata/netdata · GitHub

mkesper · February 27, 2024, 9:15am

This depends on Netdata Cloud, right?

thomasjsn · February 27, 2024, 11:57am

I am using Netdata Cloud, yes.

mkesper · April 5, 2024, 7:38am

OK, for anyone facing the same problem: have a look at Using Netdata with Prometheus | Learn Netdata and be sure to use the prometheus_all_hosts format, e.g. http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus_all_hosts&source=as-collected, so that all hosts are included. That endpoint can easily be scraped even with simple tools like AWS Route 53 health checks or (if you don’t want to expose your Netdata to the internet) a simple check by a web server like nginx, which can then itself be checked from outside.
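For anyone who would rather script the check than rely on an external health checker, here is a minimal sketch in Python. It rests on assumptions I have not verified against every Netdata version: that the prometheus_all_hosts output tags each metric line with an instance="<hostname>" label, and that a stale child's metrics eventually drop out of that output. The function names (hosts_in_metrics, missing_hosts, check) and the host names in the usage note are made up for illustration.

```python
import re
import urllib.request

# Assumption: each metric line carries an instance="<hostname>" label.
INSTANCE_RE = re.compile(r'instance="([^"]*)"')


def hosts_in_metrics(text):
    """Return the set of instance label values found on metric lines."""
    hosts = set()
    for line in text.splitlines():
        # Skip comment lines such as "# HELP" / "# TYPE".
        if line.startswith("#"):
            continue
        match = INSTANCE_RE.search(line)
        if match:
            hosts.add(match.group(1))
    return hosts


def missing_hosts(text, expected):
    """Return the expected hosts that do not appear in the metrics text."""
    return set(expected) - hosts_in_metrics(text)


def fetch_metrics(url, timeout=10):
    """Download the Prometheus-format metrics from the parent as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check(url, expected):
    """Fetch the metrics once and return the set of hosts that are absent."""
    return missing_hosts(fetch_metrics(url), expected)
```

You could call check("http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus_all_hosts&source=as-collected", {"parent-host", "child-host"}) from cron and send a mail or exit non-zero whenever the returned set is not empty. Test it by actually stopping a child first, since how quickly a stale host disappears from the output (if at all) is exactly the assumption this depends on.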
