Alert when non-ephemeral nodes go stale

Problem/Question

I have two nodes configured, one as child and one as parent. The configured alerts and notifications are working properly. However, no alert is triggered when the child cannot reach the parent.

Relevant docs you followed/actions you took to solve the issue

Environment/Browser/Agent’s version etc

AmazonLinux2023
netdata v1.44.1

What I expected to happen

An alert should be triggered by default when a child agent vanishes, because otherwise that node goes completely unmonitored!

Hi @mkesper,

We actually have this capability on Netdata Cloud through the Reachability notifications sent by Cloud.
Could you perhaps open a feature request for the Parent - Child case on the agent?

I would also like an alert if a child disconnects from its parent. Is this covered by the Reachability notifications, or does it require a feature request? If so, was one created?

Ok, so this works. I just hadn’t enabled alerts for unreachable:

[image: screenshot of the notification setting]

I had it set to "All Alerts", but after I changed it to "All Alerts and unreachable" I do get notifications when child nodes go offline.

I honestly didn’t think this needed a feature request, but I added one: [Feat]: Add notification on child node going stale (Issue #17070, netdata/netdata on GitHub)

This depends on Netdata Cloud, right?

I am using Netdata Cloud, yes.

OK, for anyone facing the same problem: have a look at "Using Netdata with Prometheus" on Learn Netdata, and be sure to use the prometheus_all_hosts format to include all hosts, for example: http://your.netdata.ip:19999/api/v1/allmetrics?format=prometheus_all_hosts&source=as-collected

That endpoint can easily be checked even with simple tools like AWS Route 53 health checks or, if you don’t want to expose your Netdata to the internet, by a web server like nginx that performs the check itself and can then be checked from outside.
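For anyone who wants to script such a check, here is a minimal Python sketch of the idea. It rests on assumptions that should be verified against your own endpoint output: that every sample line carries an instance="hostname" label, that a millisecond timestamp follows the value, and that a stale child either disappears from the output or stops getting newer timestamps. The function names, host names, and the 120-second threshold are made up for illustration.

```python
import re
import time
import urllib.request

# Assumption: sample lines look like
#   metric_name{...,instance="hostname",...} value [timestamp_ms]
SAMPLE_RE = re.compile(r'^[^#\s{]+\{(?P<labels>.*)\}\s+\S+(?:\s+(?P<ts>\d+))?\s*$')
INSTANCE_RE = re.compile(r'(?:^|,)\s*instance="(?P<host>[^"]*)"')


def latest_sample_per_host(text):
    """Map each host (instance label) to its newest timestamp in ms, or None."""
    latest = {}
    for line in text.splitlines():
        sample = SAMPLE_RE.match(line)
        if not sample:
            continue  # comments, blank lines, samples without labels
        instance = INSTANCE_RE.search(sample.group("labels"))
        if not instance:
            continue
        host = instance.group("host")
        ts = int(sample.group("ts")) if sample.group("ts") else None
        previous = latest.get(host)
        if host not in latest or (ts is not None and (previous is None or ts > previous)):
            latest[host] = ts
    return latest


def stale_hosts(latest, expected_hosts, now_ms, max_age_ms=120_000):
    """Return expected hosts that are missing or whose newest sample is too old."""
    stale = []
    for host in expected_hosts:
        ts = latest.get(host)
        if host not in latest or (ts is not None and now_ms - ts > max_age_ms):
            stale.append(host)
    return stale


def check(url, expected_hosts):
    """Fetch the metrics page and list the expected hosts that look stale."""
    with urllib.request.urlopen(url, timeout=10) as response:
        text = response.read().decode("utf-8", errors="replace")
    now_ms = int(time.time() * 1000)
    return stale_hosts(latest_sample_per_host(text), expected_hosts, now_ms)
```

Calling check() with the URL above and a list of the host names you expect (for example ["parent", "child1"], both hypothetical) returns the hosts that look stale. A cron job or a small web handler could then turn a non-empty result into a failing health check.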