I have connected all of my nodes to Netdata Cloud. Sometimes my nodes lose internet connectivity for a few seconds (5-60 seconds), and Netdata triggers an email notification.
This is a common issue with my ISP, and it causes a lot of unwanted notifications to my email.
So I was wondering: is there any setting in Netdata that I can configure to delay the up/down notification by X minutes, just like for regular alarms?
Example: it should not send any email unless my node has been down for more than X minutes (say, 10 minutes).
Is that possible? If that is not available right now
We do plan to make the delay configurable at some point, just not sure when.
One workaround I can think of is to disable cloud reachability notifications completely and have those nodes stream to a parent. I believe that the parent can be configured to generate an alert when a child stops streaming (if it doesn’t have such an alert already, sometimes a go.d.plugin alert can be such an indication). If it sounds acceptable, I could investigate a bit more.
I couldn’t reliably get any alerts from child disconnections. We will need to work on this too.
The only other alternatives are to configure fping (see "fping.plugin" on Learn Netdata) and/or httpcheck on at least one node, so that it monitors the others.
I would still:
- Configure streaming with all your nodes replicating their data to a parent node.
- Configure both fping and httpcheck on that parent, so it alerts on the health of the children. Even if you don’t have web servers running on those children, you can httpcheck the agent’s Netdata UI on port 19999.
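To make the behaviour requested in the original question concrete, here is a minimal Python sketch of the "only notify after X minutes down" logic, applied to an HTTP check against an agent on port 19999. This is not a Netdata feature or API: `agent_reachable` and `DownDelay` are hypothetical names invented for illustration, and any URL you pass in is a placeholder for your own child node.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


def agent_reachable(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if an HTTP GET to the agent URL succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or HTTP error status.
        return False


@dataclass
class DownDelay:
    """Hold back 'down' notifications until an outage outlasts delay_seconds."""
    delay_seconds: float
    down_since: Optional[float] = None  # when the current outage started
    notified: bool = False              # whether 'down' was already reported

    def update(self, reachable: bool, now: float) -> Optional[str]:
        """Feed one check result; return 'down', 'recovered', or None."""
        if reachable:
            was_notified = self.notified
            self.down_since = None
            self.notified = False
            # Only announce recovery if the outage was announced earlier.
            return "recovered" if was_notified else None
        if self.down_since is None:
            self.down_since = now
        if not self.notified and now - self.down_since >= self.delay_seconds:
            self.notified = True
            return "down"
        return None
```

In use, you would call `update(agent_reachable(url), time.monotonic())` on a fixed interval with `delay_seconds=600` for a 10-minute delay, and send an email only when it returns a non-`None` value. Blips of 5-60 seconds then produce no notification at all, because the node recovers before the delay expires.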