Hi, @pk1966. One way to fix this is to:
- exclude your RAID device from the default alarm;
- create a custom alarm for your RAID device (trigger only if down > 1).
To do that, edit the “health.d/md.conf” file: copy the “mdstat_disks” template, paste it under a new name, and add a charts filter to both copies:
template: mdstat_disks
on: md.disks
class: Errors
type: System
component: RAID
charts: !*md0* *
units: failed devices
every: 10s
calc: $down
crit: $this > 0
info: number of devices in the down state for the $family array. \
Any number > 0 indicates that the array is degraded.
to: sysadmin

template: mdstat_disks_md0
on: md.disks
class: Errors
type: System
component: RAID
charts: *md0* !*
units: failed devices
every: 10s
calc: $down
crit: $this > 1
info: number of devices in the down state for the $family array. \
Any number > 0 indicates that the array is degraded.
to: sysadmin
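
If it helps to see why the two charts filters split the arrays between the alarms, here is a small Python sketch of how I understand Netdata's simple patterns to work: the space-separated patterns are tried left to right, the first one that matches decides, and a leading "!" turns a match into an exclusion. This is only an approximation, not Netdata's own code: it uses Python's fnmatch for the "*" wildcard, and the chart names are made up for illustration, so check the real ones on your dashboard.

```python
from fnmatch import fnmatchcase


def simple_pattern_match(patterns: str, name: str) -> bool:
    """Approximate Netdata simple-pattern matching for a single name.

    Assumption: patterns are space-separated, evaluated left to right,
    and the first pattern that matches (positive or negative) wins.
    """
    for pat in patterns.split():
        negative = pat.startswith("!")
        glob = pat[1:] if negative else pat
        if fnmatchcase(name, glob):
            # The first matching pattern decides; "!" means "exclude".
            return not negative
    # No pattern matched at all: treat the name as not selected.
    return False


# Hypothetical chart names, for illustration only.
charts = ["mdstat.md0_disks", "mdstat.md1_disks"]

default_filter = "!*md0* *"  # charts line of mdstat_disks
md0_filter = "*md0* !*"      # charts line of mdstat_disks_md0

for chart in charts:
    print(
        chart,
        "default alarm:", simple_pattern_match(default_filter, chart),
        "md0 alarm:", simple_pattern_match(md0_filter, chart),
    )
```

With these filters each chart ends up in exactly one alarm: anything containing "md0" goes to mdstat_disks_md0 (crit if down > 1), and everything else stays with the default mdstat_disks (crit if down > 0).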