Netdata Community

Feature Request: Repeat delays new alarms of same type

Disclaimer

This feature request was originally posted on GitHub:

Bug report summary

When the repeat option is used and an alarm is triggered, then cleared, and then triggered again, the notification for the last trigger is delayed by the repeat duration (I noticed this with Slack notifications). I think every triggered alarm should have a unique ID, so that a new alarm of the same type is treated as a new alarm and is not delayed by the repeat value. The repeat delay should apply only while an existing alarm is still open (and therefore still has the same ID).

OS / Environment

Linux 9310 5.11.0-13-generic #14-Ubuntu SMP Fri Mar 19 16:55:27 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
/etc/lsb-release:DISTRIB_ID=Ubuntu
/etc/lsb-release:DISTRIB_RELEASE=21.04
/etc/lsb-release:DISTRIB_CODENAME=hirsute
/etc/lsb-release:DISTRIB_DESCRIPTION="Ubuntu Hirsute Hippo (development branch)"
/etc/os-release:NAME="Ubuntu"
/etc/os-release:VERSION="21.04 (Hirsute Hippo)"
/etc/os-release:ID=ubuntu
/etc/os-release:ID_LIKE=debian
/etc/os-release:PRETTY_NAME="Ubuntu Hirsute Hippo (development branch)"
/etc/os-release:VERSION_ID="21.04"
/etc/os-release:HOME_URL="https://www.ubuntu.com/"
/etc/os-release:SUPPORT_URL="https://help.ubuntu.com/"
/etc/os-release:BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
/etc/os-release:PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
/etc/os-release:VERSION_CODENAME=hirsute
/etc/os-release:UBUNTU_CODENAME=hirsute

Netdata version

Version: netdata v1.30.1
Configure options:  '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--with-user=netdata' '--with-math' '--with-zlib' '--with-webdir=/var/lib/netdata/www' '--with-bundled-lws=externaldeps/libwebsockets' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security'
Features:
    dbengine:                YES
    Native HTTPS:            YES
    Netdata Cloud:           YES 
    Cloud Implementation:    Legacy
    TLS Host Verification:   YES
Libraries:
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    LWS:                     YES static v3.2.2
    mosquitto:               YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    YES
    EBPF:                    YES
    IPMI:                    YES
    NFACCT:                  YES
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: YES

Installation method

Netdata was installed from the packagecloud apt package.

Component Name

Repeat config

Steps To Reproduce
  1. Configure health_alarm_notify.conf so that alarm notifications are delivered, then edit netdata.conf:
[health]
    default repeat warning = 3m
  2. Trigger a high RAM alarm (for example, on a system with 8 GB of RAM): stress -m 1 --vm-bytes 6G --vm-keep
  3. After the alarm triggers a notification, stop the stress test; the clear notification arrives immediately. Then run the stress test again: the next warning notification is delayed by 3 minutes.

This is a problem when a repeat warning of 24h is set as a reminder, because you will then not be notified about new alarms of that type for 24h!

Expected behavior

Do not delay new alarms when the previous alarm of the same type has already been cleared. Only delay alarms that are still open and have not been cleared in the meantime.
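
To make the request concrete, here is a rough Python sketch of the behavior I have in mind. It is only a toy model, not Netdata's actual code: the class `AlarmNotifier`, its fields, and the message strings are all made up for illustration. The idea is that each trigger-to-clear cycle gets its own ID, and the repeat interval is only measured inside one such cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmNotifier:
    """Toy model of the requested repeat behavior (hypothetical, not Netdata code)."""

    repeat_every: float                # repeat interval in seconds, e.g. 180 for "3m"
    next_id: int = 1                   # ID that the next new alarm will receive
    open_id: Optional[int] = None      # ID of the open alarm, None while clear
    last_sent: Optional[float] = None  # time of the last notification for the open alarm

    def update(self, raised: bool, now: float) -> Optional[str]:
        """Return the notification to send at time `now`, or None if nothing is sent."""
        if not raised:
            if self.open_id is None:
                return None  # already clear, nothing to report
            cleared = self.open_id
            # Clearing closes the alarm and forgets its repeat timer.
            self.open_id = None
            self.last_sent = None
            return f"CLEAR #{cleared}"

        if self.open_id is None:
            # A new alarm gets a fresh ID and is notified immediately.
            self.open_id = self.next_id
            self.next_id += 1
            self.last_sent = now
            return f"WARNING #{self.open_id}"

        # The alarm is still open: only now does the repeat interval apply.
        if now - self.last_sent >= self.repeat_every:
            self.last_sent = now
            return f"WARNING #{self.open_id} (repeat)"
        return None


if __name__ == "__main__":
    notifier = AlarmNotifier(repeat_every=180)
    # (time in seconds, is the alarm condition raised?)
    timeline = [(0, True), (60, True), (90, False), (100, True), (280, True)]
    for t, raised in timeline:
        print(t, notifier.update(raised, t))
```

In this timeline the second trigger at t=100 is reported right away because it belongs to a new alarm (#2), while the repeat for the still-open alarm #2 only fires once 180 seconds have passed since its own first notification.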

Thanks in advance :slight_smile: