I am trying to raise the alert thresholds for charts with a specific label. Digging through the documentation, I found that if a template and an alarm with the same name both match a chart, the alarm is preferred. My understanding is therefore that I can leave the stock config untouched and add a new config file containing an alarm with the same name, restricted by chart labels to the charts I want to modify, and that this should override the alert limits for those charts.
However, it currently looks like I also need to override the template and explicitly exclude those labels to get it to work; otherwise the stock rules simply keep being applied.
Concrete example:
/usr/lib/netdata/conf.d/health.d/web_log.conf:
template: web_log_web_slow
on: web_log.request_processing_time
class: Latency
type: Web Server
component: Web log
lookup: average -1m unaligned of avg
units: ms
every: 10s
green: 500
red: 1000
warn: ($web_log_1m_requests > 120) ? ($this > $green && $this > ($web_log_10m_response_time * 2) ) : ( 0 )
crit: ($web_log_1m_requests > 120) ? ($this > $red && $this > ($web_log_10m_response_time * 4) ) : ( 0 )
delay: down 15m multiplier 1.5 max 1h
summary: Web log processing time
info: Average HTTP response time over the last 1 minute
options: no-clear-notification
to: webmaster
/etc/netdata/health.d/web_log_exceptions.conf:
alarm: web_log_web_slow
on: web_log.request_processing_time
class: Latency
type: Web Server
component: Web log
lookup: average -5m unaligned of avg
units: ms
every: 30s
green: 500
red: 1000
warn: ($web_log_1m_requests > 120) ? ($this > $green && $this > ($web_log_10m_response_time * 2) ) : ( 0 )
crit: ($web_log_1m_requests > 120) ? ($this > $red && $this > ($web_log_10m_response_time * 4) ) : ( 0 )
delay: down 15m multiplier 1.5 max 1h
summary: Web log processing time for CTAN mirror
info: Average HTTP response time over the last 5 minutes of the CTAN mirror \
(this alert is set to 5 minutes instead of 1 due to some individual \
downloads taking a very long time!)
options: no-clear-notification
to: webmaster
chart labels: _collect_job=nginx_ctan-mirror
This does work once I also include the web_log_web_slow template in my config and explicitly exclude the _collect_job=nginx_ctan-mirror label.
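For illustration, a minimal sketch of what such a template override might look like. The thresholds are copied unchanged from the stock template. The negated pattern in the last line is an assumption about how Netdata's simple patterns apply to chart labels (a leading `!` to exclude, followed by `*` to match everything else), so the exact syntax may differ between versions:

```
# Sketch only: a re-declared stock template placed in /etc/netdata/health.d/
# so that it no longer applies to the CTAN mirror job.
template: web_log_web_slow
on: web_log.request_processing_time
class: Latency
type: Web Server
component: Web log
lookup: average -1m unaligned of avg
units: ms
every: 10s
green: 500
red: 1000
warn: ($web_log_1m_requests > 120) ? ($this > $green && $this > ($web_log_10m_response_time * 2) ) : ( 0 )
crit: ($web_log_1m_requests > 120) ? ($this > $red && $this > ($web_log_10m_response_time * 4) ) : ( 0 )
delay: down 15m multiplier 1.5 max 1h
summary: Web log processing time
info: Average HTTP response time over the last 1 minute
options: no-clear-notification
to: webmaster
# Assumed syntax: exclude the CTAN mirror job, match every other job.
chart labels: _collect_job=!nginx_ctan-mirror *
```

With an override like this in place, the CTAN mirror charts are left to the 5-minute rule above, while every other job keeps the stock 1-minute behaviour.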
Am I misunderstanding the documentation? Why do I need to override the template if I just want to modify individual instances?