Hello,
I am facing the following issue:
- Notifications are sent even when alarms are set to “silent”
- Notifications are sent even when alarms are fully commented out
The notifications are sent to a Telegram group and we want to disable the ones that are less relevant to us because they are spamming the channel, hiding the important ones.
This happens on multiple servers with health checks from the following files:
- /etc/netdata/health.d/tcp_listen.conf → commenting the alarms seems to have worked here
- /etc/netdata/health.d/web_log.conf → commenting the alarms only worked for some hosts
- /etc/netdata/health.d/postgres.conf → commenting the alarms doesn’t seem to work at all
Observed with the following versions of Netdata:
- v1.43.2
- v1.43.0-210-gc672d8ab1
I restarted the netdata service after every change made to the health check files, but I keep receiving notifications for those alarms.
Surely I’m doing something wrong or forgetting something but I can’t put my finger on it.
Any idea what I could do to investigate this further?
Thank you
Hi @Kevin_Thierry
Do note that even if you comment out or delete alerts from /etc/netdata/health.d, their respective stock alerts will still be loaded from the default directory (usually /usr/lib/netdata/conf.d/health.d/).
A better way to disable some alerts is to use the enabled alarms config option (more info here → Configure alerts | Learn Netdata)
Thanks for the quick reply @Manolis_Vasilakis, I will check the enabled alarms config option.
What about the fact that the “to: silent” parameter does not work?
I also tried removing the critical alert and leaving only the warning one for those health checks, and I still receive critical alert notifications…
I thought that my custom postgres.conf was being ignored, but I have some alerts from a custom plugin inside it and those are working, so this is not the issue.
I also added the following to /etc/netdata/netdata.conf:
[health]
enabled alarms = !postgres_* *
And I still get notifications from the postgres_* alarms…
Hi @Kevin_Thierry
Ok, one thing to check: please look at http://localhost:19999/api/v1/alarms?all
This should list all running alerts currently on the agent.
Also, are we talking about a single agent? Any streaming of other agents to it? Is it connected to the cloud?
Thanks
Hi @Manolis_Vasilakis,
Thank you for your reply.
This is a child node streaming to a parent node.
The API call does not return any alarms:
curl http://localhost:19999/api/v1/alarms?all?
{
"hostname": "xxxxxx",
"latest_alarm_log_unique_id": 1692624511,
"status": true,
"now": 1700035940,
"alarms": {
}
}
Ah, could you remove the final ? from the URL? It should be http://localhost:19999/api/v1/alarms?all
Also, just a note to perhaps clear up how alerts work with streaming:
Both the child and the parent run health, each for its own metrics, charts, etc.
In addition, the parent will also run health for the child.
When you configure an alert on the parent, it applies to the parent itself and to the child. However, the child will also run its own health configuration.
So if, for example, you set up the parent to not load any postgres alerts, the child might continue to run them if a similar configuration is not made in the child’s netdata.conf.
Is it possible that this would be the case here?
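To compare what each agent is actually running, the API check above can be scripted and run on both the child and the parent. This is only a minimal sketch: the helper names `fetch_alarms` and `matching_alarms` are hypothetical, and it assumes each entry under "alarms" in the response carries a "name" field (it falls back to the entry's key if not).

```python
import json
from urllib.request import urlopen


def fetch_alarms(base_url="http://localhost:19999"):
    """Fetch the alarms payload from an agent (hypothetical helper).

    Note there is no trailing '?' after 'all'.
    """
    with urlopen(f"{base_url}/api/v1/alarms?all", timeout=10) as resp:
        return json.load(resp)


def matching_alarms(payload, prefix):
    """Return the sorted names of alarms whose name starts with `prefix`.

    Assumes each entry under "alarms" has a "name" field; falls back to
    the entry's key when it does not.
    """
    alarms = payload.get("alarms", {})
    return sorted(
        alarm.get("name", key)
        for key, alarm in alarms.items()
        if alarm.get("name", key).startswith(prefix)
    )
```

Calling `matching_alarms(fetch_alarms(), "postgres_")` on an agent should then show whether any postgres_* alarms are loaded for that agent. An empty list only says something about the alarms that particular endpoint reports.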
Thanks a lot for all the information.
I just found out that there are many errors like the one below for my postgres.conf file:
2023-11-15 08:25:12: netdata ERROR : HEALTH : Health configuration at line 4 of file '/etc/netdata/health.d/postgres.conf' has unknown key 'on'. Expected either 'alarm' or 'template'
But I can’t figure out what the error is, these are the top lines of the file:
# you can disable an alarm notification by setting the 'to' line to: silent
template: postgres_total_connection_utilization
on: postgres.connections_utilization
class: Utilization
type: Database
component: PostgreSQL
hosts: *
lookup: average -1m unaligned of used
units: %
every: 1m
warn: $this > (($status >= $WARNING) ? (50) : (55))
crit: $this > (($status >= $CRITICAL) ? (55) : (60))
delay: down 15m multiplier 1.5 max 1h
info: average total connection utilization over the last minute
to: dba
I ran the API call without the ‘?’ at the end and I get the list of alarms, which doesn’t contain any alarms from the postgres.conf file.
As you suggest, the parent may be involved with this issue. I will check the config running on it too.
Thank you for your help
Please ignore that error. It’s just a by-product of you disabling it via the enabled alarms config option; we will fix it.
There are no health checks for postgres on the parent (no health.d/postgres.conf file, and “curl http://localhost:19999/api/v1/alarms?all” does not return postgres alarms), so I still don’t know where those alarms come from.
@Manolis_Vasilakis, you were right about the parent throwing those alerts. I added a postgres.conf configuration on the parent with the alarms set to silent and it stopped the notifications.
Thanks again for your help!
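As a rough illustration of the fix described above, a silenced entry in /etc/netdata/health.d/postgres.conf on the parent might look like the fragment below. It is only a sketch: it reuses the template quoted earlier in this thread with the to line changed to silent, and it assumes the parent's file mirrors the child's definitions. The exact file that was deployed was not posted.

```
# /etc/netdata/health.d/postgres.conf on the parent (illustrative sketch)
# same template as quoted earlier in the thread; only the 'to' line differs
template: postgres_total_connection_utilization
on: postgres.connections_utilization
class: Utilization
type: Database
component: PostgreSQL
hosts: *
lookup: average -1m unaligned of used
units: %
every: 1m
warn: $this > (($status >= $WARNING) ? (50) : (55))
crit: $this > (($status >= $CRITICAL) ? (55) : (60))
delay: down 15m multiplier 1.5 max 1h
info: average total connection utilization over the last minute
to: silent
```

The same change would need to be repeated for every alarm in the file that should stop notifying, followed by a restart of the netdata service on the parent.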
No problem, thanks for the follow up!