Does Netdata Cloud make different decisions as to which alarm notifications to send than the individual nodes do? I am asking because some of the alarms that I am notified of by emails sent by Netdata Cloud (level: ‘All alerts and unreachable’) are not sent to Slack by the individual nodes. I’m wondering if there’s a different delay calculation being done by the nodes than by Netdata Cloud, or if I have a misconfiguration.
For clarity: I have all my nodes monitored through Netdata Cloud. I also have Slack notifications set up individually on each of the nodes, in the /etc/netdata/health_alarm_notify.conf file.
I’ve tested the Slack integration on all eight of the nodes I have running, and it works. When I run (as the netdata user):
/usr/libexec/netdata/plugins.d/alarm-notify.sh test
I receive the three sample messages in Slack, as I should.
I also get some of the alarms that show up in Netdata Cloud and in the emails that Netdata Cloud sends out, but not all. This morning, for example, Netdata Cloud notified me of both a web_log_1m_unmatched Warning and the associated Clear, but neither came through to Slack.
I’ve looked through the error.log on each of the eight servers, and I can’t see any reason why some messages would be sent and others would not. Frankly, I’m a little stumped, but I’m sure I’m missing something obvious.
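When reading error.log by hand, it can help to filter for notification-related lines and then for the specific alert. A minimal sketch, assuming that alarm-notify.sh's output ends up in that log and mentions either "alarm-notify" or "slack" (the exact wording differs between Netdata versions, so the pattern is deliberately loose); notify_log_lines is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: show notification-related lines from a Netdata error log,
# optionally narrowed to one alert name.
# Assumption: alarm-notify.sh output lands in this log and mentions
# "alarm-notify" or "slack"; exact wording varies by Netdata version.

# notify_log_lines LOGFILE [ALERT_NAME]   (hypothetical helper)
notify_log_lines() {
  local log="$1" alert="${2:-}"
  # -i: ignore case; -E: extended regex for the alternation.
  # The second grep keeps only lines containing the alert name;
  # with no alert given, the empty fixed-string pattern matches every line.
  grep -iE 'alarm-notify|slack' "$log" | grep -F -- "$alert"
}
```

On a node this would be run as, for example, notify_log_lines /var/log/netdata/error.log web_log_1m_unmatched, and repeated on each of the eight servers.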
Thanks for your help.
Yes. Agents can be configured to send their own notifications, and agent notifications are completely independent of the Cloud notifications. It’s a bit confusing, but necessary so as not to force the use of Netdata Cloud.
Once we provide enough notification methods and features in Netdata Cloud (silencing, etc.), we may reconsider whether it makes sense to allow both to be active at the same time, but for now we are letting them coexist.
But back to the specific issue you are facing: you may have configured Slack notifications on the agent only for critical alerts, which would explain not receiving them for warnings.
Thank you for your reply.
I wondered about that, but from what I can see it still has the default settings. Here’s the config in health_alarm_notify.conf on each node:
# slack (slack.com) global notification options
# multiple recipients can be given like this:
# "RECIPIENT1 RECIPIENT2 ..."
# enable/disable sending slack notifications
SEND_SLACK="YES"
# Login to your slack.com workspace and create an incoming webhook, using the "Incoming Webhooks" App: https://slack.com/apps/A0F7XDUAZ-incoming-webhooks
# Do not use the instructions in https://api.slack.com/incoming-webhooks#enable_webhooks, as those webhooks work only for a single channel.
# You need only one for all your netdata servers (or you can have one for each of your netdata).
# Without the app and a webhook, netdata cannot send slack notifications.
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/<redacted>"
# if a role's recipients are not configured, a notification will be send to:
# - A slack channel (syntax: '#channel' or 'channel')
# - A slack user (syntax: '@user')
# - The channel or user defined in slack for the webhook (syntax: '#')
# empty = do not send a notification for unconfigured roles
DEFAULT_RECIPIENT_SLACK="#"
I believe that, to have limited it, I’d have had to amend DEFAULT_RECIPIENT_SLACK="#" so that it read DEFAULT_RECIPIENT_SLACK="#|critical"?
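The "|critical" suffix acts as a per-recipient severity filter. The sketch below approximates it to show why a WARNING would be dropped for "#|critical" but not for "#". would_notify is a hypothetical helper, and the rule that the CLEAR following a critical alert is still delivered is an assumption, not a copy of alarm-notify.sh's actual logic:

```bash
#!/usr/bin/env bash
# Sketch (approximation, not Netdata's code): would a Slack recipient string
# such as "#" or "#|critical" receive a notification with this status?
# Assumption: "|critical" means critical alerts only, plus the CLEAR that
# follows a critical alert.

# would_notify RECIPIENT STATUS OLD_STATUS -> exit status 0 if it would be sent
would_notify() {
  local recipient="$1" status="$2" old_status="$3"
  case "$recipient" in
    *"|critical")
      [ "$status" = "CRITICAL" ] && return 0
      [ "$status" = "CLEAR" ] && [ "$old_status" = "CRITICAL" ] && return 0
      return 1
      ;;
    *)
      # No severity filter: every status change is sent.
      return 0
      ;;
  esac
}

for r in "#" "#|critical"; do
  if would_notify "$r" WARNING CLEAR; then
    echo "$r: WARNING sent"      # printed for "#"
  else
    echo "$r: WARNING skipped"   # printed for "#|critical"
  fi
done
```

With the unmodified DEFAULT_RECIPIENT_SLACK="#", nothing in this model would filter out a WARNING.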
Is there anywhere else I should look?
Thank you.
I don’t see why it would fail, other than because of temporary errors, which would be recorded in /var/log/netdata/error.log. There is an additional complication for this specific alert:
In web_log.conf in the netdata/netdata repository on GitHub (master branch), you can see that this specific alert is configured by default to go to the role “webmaster” (as are many others further down in that file).
By default, that webmaster role also sends the notifications to ${DEFAULT_RECIPIENT_SLACK}; see health_alarm_notify.conf in the same repository (master branch).
So if you had inadvertently changed that part of the file, alerts going to the webmaster role would not reach Slack, but that sounds a bit far-fetched. Perhaps attach the whole conf file, with the sensitive parts redacted of course, just so we can be sure?
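One way to check the role routing without posting the file is to source it and print what each role's Slack recipient resolves to. A minimal sketch, assuming the file is plain bash (alarm-notify.sh sources it) and that it assigns an associative array named role_recipients_slack as the stock file does; resolve_slack_roles is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: print the Slack recipient each role resolves to in a
# health_alarm_notify.conf-style file.
# Assumption: the file is bash and assigns role_recipients_slack[ROLE]=...

# resolve_slack_roles CONF ROLE...   (hypothetical helper)
resolve_slack_roles() {
  local conf="$1"; shift
  (
    # Run in a subshell so sourcing does not leak variables into the caller.
    # Declare the array first; otherwise bash would treat
    # role_recipients_slack[webmaster] as an indexed-array assignment.
    declare -A role_recipients_slack=()
    # shellcheck disable=SC1090
    source "$conf"
    for role in "$@"; do
      # "<unset>" marks a role the file never assigns.
      printf '%s=%s\n' "$role" "${role_recipients_slack[$role]-<unset>}"
    done
  )
}
```

For example, resolve_slack_roles /etc/netdata/health_alarm_notify.conf sysadmin webmaster should print the same recipient for both roles if the defaults are intact. On a live node, the end-to-end equivalent is to run /usr/libexec/netdata/plugins.d/alarm-notify.sh test "webmaster" as the netdata user, which sends the test notifications to the webmaster role instead of the default sysadmin role.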
Thanks for the reply. I’m not sure why, but this seems to be working now. I’ll keep an eye on it and come back if I have another issue. The only thing I changed in the interim was to explicitly set all the other options in the file (everything except Slack) to "NO".
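To confirm which notification methods a node actually has enabled after such a change, a quick audit of the SEND_ lines can help. A minimal sketch, assuming one SEND_METHOD="YES" assignment per line as in the stock file; enabled_methods is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: list the notification methods a health_alarm_notify.conf-style file
# enables, i.e. every uncommented SEND_<METHOD>="YES" line.
# Assumption: one assignment per line, value exactly YES (quoted or not).

# enabled_methods CONF   (hypothetical helper)
enabled_methods() {
  local conf="$1"
  # -n with /p prints only matching lines; the capture group keeps the method name.
  # Lines starting with "#" never match, so commented-out settings are ignored.
  sed -nE 's/^[[:space:]]*SEND_([A-Z0-9_]+)="?YES"?[[:space:]]*$/\1/p' "$conf"
}
```

Running enabled_methods /etc/netdata/health_alarm_notify.conf on each node should print only SLACK after the change described above.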