I have the following alert configuration:
lookup: average -10m unaligned of Primetime
warn: $this > (($status >= $WARNING) ? (180) : (190))
crit: $this > (($status == $CRITICAL) ? (190) : (290))
delay: down 15m multiplier 1.5 max 1h
info: App CPU Usage above 190 as warning and 290 as critical for the last 10 minutes
And I received this email alert:
apps cpu Primetime = 304.1%
ALARM: App CPU Usage above 190 as warning and 290 as critical for the last 10 minutes
TIME: Wed Sep 2 08:42:00 PDT 2020
EVALUATED EXPRESSION: $this > (($status == $CRITICAL) ? (190) : (290))
EXPRESSION VARIABLES: [ $this = 304.06178 ] [ $status = 1 ] [ $CRITICAL = 4 ]
The host has 6 WARNING and 1 CRITICAL alarm(s) raised.
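For anyone following along, the evaluated expression in the email can be checked by hand. Below is a minimal Python sketch that mirrors the crit line using only the values shown under EXPRESSION VARIABLES. The function name `crit_threshold` is made up for illustration; this is not Netdata code, just the same ternary written out.

```python
# Values copied from the EXPRESSION VARIABLES line of the email.
this = 304.06178
status = 1
CRITICAL = 4


def crit_threshold(status: int, critical: int = CRITICAL) -> int:
    """Mirror of ($status == $CRITICAL) ? (190) : (290) from the crit line.

    Hypothetical helper for illustration only, not part of Netdata.
    """
    return 190 if status == critical else 290


# The alarm was not already CRITICAL (status 1 != 4), so the higher
# threshold applies.
threshold = crit_threshold(status)
crit_fires = this > threshold

print(threshold, crit_fires)
```

So, given the reported value of 304.06178, the expression itself evaluated as written: the alarm was not already critical, the 290 threshold applied, and the value was above it. The open question is only why the lookup returned that value.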
AFAIK, “average” in the “lookup:” line takes the average of the values collected for the Primetime dimension over the last 10 minutes (I set “update every” to 5 seconds). I looked at the chart several times, and I don’t see how the average CPU over the last 10 minutes could have gone above 290% (see screenshot).
Is this a false alarm (a bug)? Or am I missing something?
This post can be closed: the fix was to use “min” instead of “average” in the “lookup:” line.
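To illustrate why “min” behaves differently from “average” over the same window, here is a rough Python sketch. The sample values are invented for the example (they are not taken from my chart), and this is plain arithmetic, not how Netdata actually performs the lookup; it only assumes 120 samples, i.e. 10 minutes at one sample every 5 seconds.

```python
from statistics import mean

THRESHOLD = 290  # critical threshold from the crit line

# Hypothetical 10-minute window at one sample per 5 seconds (120 samples):
# mostly moderate usage, plus a short burst of very high usage.
samples = [250] * 90 + [460] * 30

window_average = mean(samples)  # a burst can pull the average up
window_min = min(samples)       # the minimum ignores the burst

# With "average", one burst is enough to cross the threshold.
average_fires = window_average > THRESHOLD

# With "min", every sample in the window must be above the threshold.
min_fires = window_min > THRESHOLD

print(window_average, average_fires)
print(window_min, min_fires)
```

In this made-up window the average crosses 290 while the minimum does not, so an alarm based on “min” only fires when usage stays above the threshold for the whole 10 minutes.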