Reachability alerts not particularly reliable

matiss · January 24, 2024, 4:38pm

For a few months now I have been receiving false "unreachable" notifications. This happens across different hosts and different client versions. Reachability tests seem to be unstable, possibly depending on network conditions.

Hosts that are online, with the Netdata Agent running (I have confirmed there are no errors, the host is reachable locally, and the service is up), are reported as offline at seemingly random times and intervals.
I often receive "not reachable" notifications (webhooks) while the matching "reachable again" webhooks never arrive or arrive with a significant delay (>10 min).

Sometimes I am logged in to the host at the very moment I receive an alert saying it is unreachable…

Is there something I am missing, such as a tunable that could prevent these alerts?

Wojtek_X · January 25, 2024, 4:06pm

That's a known issue, I'm afraid. Nothing has been done to solve it for well over a year.

If you are not new to coding, you could open a PR on GitHub. Otherwise, as a regular user, there is not much you can do.

Luis_Johnstone · February 13, 2024, 6:22pm

You could share some of the networking variables that you suspect might be involved; that might help build a picture of the issue. I, for example, never see the kind of issue you are describing, with nodes both on-prem and in Azure. My hosts are all in the UK South region. I know Netdata's ingress isn't globally distributed yet, so maybe that has something to do with it?

You could also raise an issue on GitHub with some specific timestamps so that the Netdata SRE team can look into what's going on.
