False Alarm?



  • I have the following alert configuration:
    alarm: apps_cpu_Primetime
    on: apps.cpu
    os: linux
    hosts: *
    lookup: average -10m unaligned of Primetime
    unit: %
    every: 1m
    warn: $this > (($status >= $WARNING) ? (180) : (190))
    crit: $this > (($status == $CRITICAL) ? (190) : (290))
    delay: down 15m multiplier 1.5 max 1h
    info: App CPU Usage above 190 as warning and 290 as critical for the last 10 minutes
    to: sysadmin

    And I got this email alert:
    Chart: apps.cpu
    Value: apps cpu Primetime = 304.1%
    Alarm: App CPU Usage above 190 as warning and 290 as critical for the last 10 minutes
    Family: cpu
    Severity: CRITICAL
    Time: Wed Sep 2 08:42:00 PDT 2020
    Evaluated expression: $this > (($status == $CRITICAL) ? (190) : (290))
    Expression variables: [ $this = 304.06178 ] [ $status = 1 ] [ $CRITICAL = 4 ]
    The host has 6 WARNING and 1 CRITICAL alarm(s) raised.

    AFAIK, “average” in the “lookup:” line takes the average of the collected data for the Primetime dimension over the last 10 minutes (I set the update interval to 5 seconds). I looked at the chart several times, and I don’t see how the average CPU over the last 10 minutes could go above 290% (see screenshot).

    Is this a false alarm (a bug)? Or am I missing something?

    Thanks.

    [Screenshot: 1013d5fe-67cf-4e16-b252-44863fb3f6e2-image.png]



  • This post can be closed: the fix was to use “min” instead of “average” in the lookup.

    Thanks!
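
    For anyone finding this later, here is a minimal sketch of what that change might look like. It assumes only the lookup method was edited and every other line of the alarm stayed as posted above; the final config was not shared in this thread, so treat it as an illustration rather than the exact fix.

    ```
    # Assumed change: "average" replaced by "min" in the lookup line.
    # All other lines (alarm, on, warn, crit, delay, to, ...) unchanged.
    lookup: min -10m unaligned of Primetime
    ```

    With “min”, $this should be the lowest Primetime value seen in the 10-minute window instead of the mean, so the warn and crit expressions only trip when usage stays above the threshold for the whole window. Short spikes that can pull an average up no longer trigger the alarm. The info: text may need rewording to match.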

