I seem to be fundamentally misunderstanding some part of alerting. I have a disk_usage health alarm set up, and I see it in the web UI, but my “script to execute on alarm” (which just writes to a log file) never gets executed. I do see it executed for another alert, a random ipv4_tcp_resets problem I seem to get every time I start netdata, so I know the script works. The only thing I can find that would interfere with this is setting the “exec” config entry on the particular alarm, which I’m not doing. What would cause an alarm to show up in the UI but not get passed to the script I have registered for all alarms? Thanks!
I left netdata running during the day, and did eventually see a disk_usage alert when it switched from WARNING to CRITICAL; it looks like it’s just that alarms that are true on startup don’t trigger alerts? Is that correct/intended behavior?
I posted my error.log; there’s a lot of noise in it but I didn’t see anything relevant.
All right, @mrozekma!
Let us know if we can help you.
I cleared some output files to get a fresh state to post and this time it seems to have worked. I’m realizing from the state of my script-generated log file that the script is running twice at the same time (processing two different alarms), which didn’t occur to me; that may have been causing issues. I’ll post again if I mess things up again; I thought I was just not setting up the alarm correctly, but clearly that wasn’t it. Thanks very much for the help!
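Because Netdata can run the notification script several times at once (one process per alarm), a script that appends to a shared log file should serialize its writes. Below is a minimal Python sketch, assuming it is registered as the “script to execute on alarm” and that logging the raw arguments is enough; it deliberately does not rely on any particular argument order. The default log path and the `ALARM_LOG` environment variable are hypothetical. It takes an exclusive `fcntl.flock` lock around each append (Unix only), so two simultaneous runs cannot interleave their output.

```python
#!/usr/bin/env python3
"""Hypothetical custom alarm script: appends one line per invocation to a log file.

Assumption: Netdata passes alarm details as positional arguments; this sketch
logs them verbatim instead of interpreting specific positions.
"""
import datetime
import fcntl
import os
import sys

# Hypothetical log location; override with the ALARM_LOG environment variable.
LOG_PATH = os.environ.get("ALARM_LOG", "/tmp/netdata-alarm.log")


def log_alarm(args, path=LOG_PATH):
    """Append one line describing this invocation while holding an exclusive lock."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    line = f"{stamp} pid={os.getpid()} args={' '.join(args)}\n"
    with open(path, "a", encoding="utf-8") as f:
        # Block until no other invocation holds the lock, so concurrent
        # runs (one per alarm) cannot interleave partial lines.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(line)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return line


if __name__ == "__main__":
    log_alarm(sys.argv[1:])
```

For a single short `write` in append mode the lock is mostly a safeguard; it becomes important once the script does several writes per alarm (for example, a header line followed by details). Remember to make the script executable (`chmod +x`).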
One possible reason your script is not executed is its permissions. It is not only a missing execute bit: the file's owner and group can also prevent the script from being executed. Please compare the permissions of your script with those of /usr/libexec/netdata/plugins.d/alarm-notify.sh.
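A rough way to do that comparison programmatically is the Python sketch below, which reports differences in mode, owner and group between your script and alarm-notify.sh. Only the alarm-notify.sh path comes from the thread; your script's path is passed on the command line, and nothing here is part of Netdata. Matching permissions is a heuristic: what ultimately matters is that the user Netdata runs as (typically `netdata`) can execute the script.

```python
import grp
import os
import pwd
import stat
import sys

# Reference path from the thread; the script to check is supplied by the user.
REFERENCE = "/usr/libexec/netdata/plugins.d/alarm-notify.sh"


def _name(lookup, ident):
    """Resolve a uid/gid to a name, falling back to the number if unknown."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def describe(path):
    """Return (mode string, owner name, group name) for a file."""
    st = os.stat(path)
    return (
        stat.filemode(st.st_mode),
        _name(pwd.getpwuid, st.st_uid),
        _name(grp.getgrgid, st.st_gid),
    )


def compare(script, reference=REFERENCE):
    """List human-readable differences between two files' permissions."""
    labels = ("mode", "owner", "group")
    diffs = []
    for label, a, b in zip(labels, describe(script), describe(reference)):
        if a != b:
            diffs.append(f"{label}: {script}={a} vs {reference}={b}")
    return diffs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 compare_perms.py /path/to/your-alarm-script
    for line in compare(sys.argv[1]) or ["mode, owner and group match"]:
        print(line)
```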
Would it be possible to post your error.log (see the docs)? Please also post the health configuration files for both alarms.
We will get to the bottom of this.