With the exception of the alarm editing options, I am a very happy Netdata user. I understand that our environment (containers running on Proxmox VE) needs some special configuration, particularly as we use LVM and create many snapshots for backup purposes, which tend to trigger too many alerts.
I am trying to reduce, rather than silence, most of the many alerts I receive, and I am stuck on this one. I know there is documentation on alarm editing, but from my point of view there is a lot of room to improve it.
I keep getting alerts like this one:
10min_disk_backlog
on xxxxx
124132.79 ms
Details: average of the kernel estimated disk backlog, for the last 15 minutes
Chart: disk_backlog.dm-16-0l080Zl6JNEpMp03Bz3zT6HRhbRSYH8t53LqzJ7r8q3kSQk3JU9KIzvP9kgYO33G-tpool
Context: disk.backlog
Family:
Raised to critical, for 0 seconds
I edited the alert config with:
template: 10min_disk_backlog
on: disk.backlog
os: linux
hosts: *
families: !pve-vm--* !pve-thin-t* !dm-*
lookup: average -20m unaligned
units: ms
every: 10m
green: 6000
red: 20000
warn: $this > $green * (($status >= $WARNING) ? (0.7) : (1))
crit: $this > $red * (($status == $CRITICAL) ? (0.7) : (1))
delay: down 60m multiplier 1.2 max 2h
info: average of the kernel estimated disk backlog, for the last 60 minutes
to: sysadmin
But obviously, as the alert has no “family” on it, I understand the families filter is not being applied, so I am not sure how to remove all the dm-?? alerts.
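For illustration, here is a minimal sketch of one direction that might work: filtering on the chart ID instead of the family. It rests on two assumptions that are not confirmed for v1.43.1: that this version still honors a `charts:` line in templates, and that the line is matched against chart IDs such as the `disk_backlog.dm-16-...` one shown in the alert. The trailing `*` is there because a Netdata simple pattern made up only of negative (`!`) entries never produces a positive match. All other values are copied unchanged from the config above.

```
# Sketch only. Assumes "charts:" is supported by this Netdata version
# and is matched against chart IDs (e.g. disk_backlog.dm-16-...).
template: 10min_disk_backlog
on: disk.backlog
os: linux
hosts: *
# Exclude every device-mapper chart, then accept everything else.
charts: !disk_backlog.dm-* *
lookup: average -20m unaligned
units: ms
every: 10m
green: 6000
red: 20000
warn: $this > $green * (($status >= $WARNING) ? (0.7) : (1))
crit: $this > $red * (($status == $CRITICAL) ? (0.7) : (1))
delay: down 60m multiplier 1.2 max 2h
info: average of the kernel estimated disk backlog, for the last 60 minutes
to: sysadmin
```

If `charts:` turns out not to be honored, I believe newer versions document a `chart labels:` line for the same kind of filtering, but I have not verified which label would carry the dm-* device name.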
I also have the impression that this alert started with a recent update. I am currently running v1.43.1.
An alert editing feature, or at least an assistant, would be to me the most needed improvement to Netdata.