I cannot understand why not all alarms are sent to my Telegram. When I used centralized alarm notifications via email, I got all notifications from the alarms log. After that I disabled the centralized alarm notifications of Netdata Cloud and set up Netdata Agent Telegram notifications following the documentation section "Telegram | Learn Netdata". I left all the health configuration at its defaults. It is working fine, but I only get system.ram alarms, while in the dashboard alarms log I still see a lot more alarms that are not sent to Telegram.
Testing the notifications (Alarm notifications | Learn Netdata) goes well: everything works. Do Netdata Cloud and Netdata Agent have separate alarm settings? If so, how can I configure Netdata Agent to receive all notifications from the dashboard alarms log? Thank you!
Hello @itohin ,
Welcome to our community.
I am not sure about the Cloud configuration, but all the information Cloud has comes from your agents.
About the problem with Telegram, can you please run the following command and give us the output:
grep alarm-notify.sh /var/log/netdata/error.log
Another thing: please do not forget to remove your Telegram channel; it does not need to be public.
That is likely because of the delay option.
This is used to provide optional hysteresis settings for the notifications, to defend against notification floods. These settings do not affect the actual alarm - only the time the exec script is executed.
So it delays sending a notification (up: CLEAR → WARN/CRIT, down: WARN/CRIT → CLEAR).
The log reflects the actual state of the alarm. Netdata Cloud uses the actual state and doesn’t respect delay (WIP to implement it).
Do Netdata Cloud and Netdata Agent have separate alarm settings?
They are two different notification dispatching pipelines.
@ilyam8 Thank you so much! Now everything is clear for me)
@itohin The alarm log contains a lot of values, we show only a few of them.
You can add additional columns using this menu
By default we don’t show alarm notification script execution info at all.
I added the following columns:
- Script Run At
- Script Return Value
- Script Delay
- Script Delay Run At
The script is alarm-notify.sh; we use it for sending notifications.
Let’s take the last entry:
- Event Date: means the alarm got CLEARed at 17:53:07
- Script Run At: "-" means the script hasn’t been executed so far (notifications haven’t been sent)
- Script Delay: "1 min" tells us that the current delay is 1 minute
- Script Delay Run At: means the script is going to be executed at 17:54:07 (1-minute delay)
- Script Return Value: means the script exit code (yes, we haven’t sent a notification, so there shouldn’t be any value; it shows (OK), which is a bug)
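To make the relationship between the columns concrete, here is a minimal sketch of the arithmetic: Script Delay Run At is the Event Date plus the current Script Delay. The function name `scheduled_run` is hypothetical and only illustrates the calculation; it is not part of Netdata.

```python
from datetime import datetime, timedelta


def scheduled_run(event_time: str, delay_seconds: int) -> str:
    """Return the time the notification script is due, as HH:MM:SS.

    event_time    -- the "Event Date" column (time of the state change)
    delay_seconds -- the "Script Delay" column, in seconds
    """
    event = datetime.strptime(event_time, "%H:%M:%S")
    due = event + timedelta(seconds=delay_seconds)
    return due.strftime("%H:%M:%S")


# The entry discussed above: CLEARed at 17:53:07 with a 1-minute delay.
print(scheduled_run("17:53:07", 60))  # 17:54:07
```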
Following your explanation, I fail to understand why, in my case, the recovered notification is not being fired.
I do however, receive
Eventually, the script was never triggered even though it has a Script Delay Run At with a timestamp.
@oddWill it is hard to say whether there is a problem or not. According to your screenshot, the httpcheck alarm is CLEARed at 10:01:28, and the notification script is going to be executed at 10:08:58 (the delay is 7 mins 30 secs).
If the alarm goes into WARN/CRIT during the delay period, no CLEAR notification is sent, and that is by design.
That alarm's delay option is:
delay: down 5m multiplier 1.5 max 1h
multiplier 1.5 means that the delay period is multiplied by 1.5 when the alarm changes state while a notification is delayed (that is why it is 7 mins 30 secs: 5 mins × 1.5).
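As a rough illustration of that arithmetic, the sketch below applies the multiplier once per state change and caps the result at the max value. It only models the explanation given in this thread; the function names are hypothetical, and exactly when Netdata applies the multiplier internally is an assumption here.

```python
def delay_after_changes(base_seconds, multiplier, max_seconds, state_changes):
    """Model the notification delay after a number of state changes.

    Assumption: each state change that happens while a notification is
    delayed multiplies the delay once, and the result never exceeds max.
    """
    delay = base_seconds
    for _ in range(state_changes):
        delay = min(delay * multiplier, max_seconds)
    return delay


def fmt(seconds):
    """Format a number of seconds as 'Xm Ys'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s"


# delay: down 5m multiplier 1.5 max 1h
print(fmt(delay_after_changes(300, 1.5, 3600, 0)))   # 5m 0s
print(fmt(delay_after_changes(300, 1.5, 3600, 1)))   # 7m 30s
print(fmt(delay_after_changes(300, 1.5, 3600, 10)))  # 60m 0s (capped by max 1h)
```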
@oddWill consider removing delay from the alarm; let’s see how it works for you without the option.
Thanks for the reply, @ilyam8
I guess your suggestion to disable delay made me check health.d/httpcheck.conf again, which helped me notice that options: no-clear-notification is set on the bad content check.
Removing it sorted out the recovered notification for me!
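For anyone who wants to check their own health configuration for the same option, here is a minimal sketch that lists the alarms or templates carrying no-clear-notification. The sample text, the entity names in it, and the helper `entities_without_clear` are all hypothetical; the parsing assumes the usual "key: value" line layout of health.d files and is not a full parser.

```python
# Hypothetical sample in the style of a health.d file (not the real httpcheck.conf).
SAMPLE = """\
 template: example_bad_content
       on: httpcheck.status
  options: no-clear-notification
    delay: down 5m multiplier 1.5 max 1h

 template: example_timeout
       on: httpcheck.status
    delay: down 5m multiplier 1.5 max 1h
"""


def entities_without_clear(text):
    """Return names of alarms/templates that set no-clear-notification."""
    found = []
    current = None  # name of the alarm/template block being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blanks, comments, and lines without a key
        key, value = (part.strip() for part in line.split(":", 1))
        if key in ("alarm", "template"):
            current = value
        elif key == "options" and "no-clear-notification" in value.split():
            if current is not None:
                found.append(current)
    return found


print(entities_without_clear(SAMPLE))  # ['example_bad_content']
```

To use it on a real file, read the contents of your health.d/httpcheck.conf into a string and pass that string instead of SAMPLE.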
Great find, @oddWill! Somehow I managed to miss it.