Alert Configuration Question

I used the configuration below to try to catch an app (Spectre) that has been running above 290% CPU for the last 10 minutes. The email alert works fine when a job runs above 290% CPU constantly for 10 minutes or longer. However, I also received an email alert for another job that ran at over 1200% CPU for a little over 2.5 minutes. Mathematically, both alerts are correct, but the latter is not one I want.

How do I configure the alarm to catch the app only when it runs above 290% for the full 10 minutes? In other words, it should trigger only when every metric captured for Spectre in that window is equal to or greater than 290%. Using “average” in the lookup does not work for me when spikes occur.

Thanks!

alarm: apps_cpu_Spectre
on: apps.cpu
os: linux
hosts: *
lookup: average -10m unaligned of Spectre
unit: %
every: 1m
warn: $this > (($status >= $WARNING) ? (280) : (290))
crit: $this > (($status == $CRITICAL) ? (290) : (390))
delay: down 15m multiplier 1.5 max 1h
to: sysadmin
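
To illustrate why the average-based lookup fires on the short spike, here is a rough sketch of the arithmetic in Python. The per-second sampling and the assumption that the job is otherwise idle (0%) for the rest of the window are mine, not taken from the real data.

```python
# Hypothetical per-second CPU samples (percent) covering a 10-minute window.
# Assumption: the job sits at 0% except for a 2.5-minute burst at 1200%.
WINDOW_SECONDS = 600
SPIKE_SECONDS = 150  # 2.5 minutes

idle = [0.0] * (WINDOW_SECONDS - SPIKE_SECONDS)
spike = [1200.0] * SPIKE_SECONDS
window = idle + spike

average = sum(window) / len(window)  # what "lookup: average" reduces to
minimum = min(window)                # what "lookup: min" reduces to

print(average)        # 300.0
print(average > 290)  # True  -> the warning threshold is crossed
print(minimum > 290)  # False -> a min-based check stays quiet
```

A burst of 1200% for a quarter of the window averages out to 300%, which is already above 290%, so the alert is mathematically correct even though the load was not sustained.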

Cool, thank you for letting us know!

Update:

I just replaced “average” with “min” in the above example and the alert worked as expected.

Thanks!

Thank you for the reply. But how do I use “min”? AFAIK, if I have a line like the following:

lookup: min -10m unaligned of Spectre

a single value will be returned in the $this variable. Can I query the dataset from the last 10 minutes from within that health configuration file and then post-process it?
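
As far as I can tell, no post-processing of the dataset should be needed: comparing the single minimum value against a threshold is equivalent to checking every sample in the window. A small Python sketch of that equivalence, using made-up sample values and a hypothetical helper name:

```python
THRESHOLD = 290.0

def sustained_above(samples, threshold):
    """Return True only if every sample in the window exceeds the threshold."""
    return all(s > threshold for s in samples)

# Made-up windows for illustration only.
steady = [295.0, 310.0, 301.0, 292.0]  # always above the threshold
dipped = [295.0, 310.0, 120.0, 292.0]  # one sample dips below it

for window in (steady, dipped):
    # The single min value answers the same question as the per-sample check.
    assert (min(window) > THRESHOLD) == sustained_above(window, THRESHOLD)

print(min(steady) > THRESHOLD)  # True
print(min(dipped) > THRESHOLD)  # False
```

So with “min” in the lookup, a condition such as `$this > 290` is true only when all values in the last 10 minutes are above 290. Note that this is a strict comparison, so a sample at exactly 290 would not count.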

Thanks.

You may have to use a more complex expression.

For example, instead of average, you might also want to use min.