Alert Configuration Question

pleung2 · September 2, 2020, 3:59am

I used the following configuration to try to catch an app (Spectre) running at over 290% CPU for the last 10 minutes. The email alert works fine if a job runs at over 290% CPU CONSTANTLY for 10 minutes or longer. However, I also received an email alert for another job that ran at over 1200% CPU for a little over 2.5 minutes. Mathematically, both alerts are correct, but the latter is not one I want.

How do I configure the alert to fire only when the app runs at over 290% for the full 10 minutes? In other words, it should trigger only when every collected value for Spectre in that window is equal to or greater than 290%. Using “average” in the lookup does not work for me when spikes occur.

Thanks!

alarm: apps_cpu_Spectre
on: apps.cpu
os: linux
hosts: *
lookup: average -10m unaligned of Spectre
unit: %
every: 1m
warn: $this > (($status >= $WARNING) ? (280) : (290))
crit: $this > (($status == $CRITICAL) ? (290) : (390))
delay: down 15m multiplier 1.5 max 1h
to: sysadmin

OdysLam · September 7, 2020, 1:23pm

Cool, thank you for letting us know!

pleung2 · September 4, 2020, 2:45pm

Update:

I just replaced “average” with “min” in the above example and the alert worked as expected.
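
For reference, here is a sketch of what the alarm from the first post looks like with only the lookup method changed. It assumes every other line stays exactly as originally posted. With “average”, a 1200% spike lasting 2.5 minutes averages out to roughly 1200 × 2.5 / 10 = 300% over the 10-minute window (assuming near-zero usage for the rest of it), which is above the 290% threshold. With “min”, $this holds the lowest value in the window, so it exceeds 290 only if every sample did.

```
alarm: apps_cpu_Spectre
on: apps.cpu
os: linux
hosts: *
lookup: min -10m unaligned of Spectre
unit: %
every: 1m
warn: $this > (($status >= $WARNING) ? (280) : (290))
crit: $this > (($status == $CRITICAL) ? (290) : (390))
delay: down 15m multiplier 1.5 max 1h
to: sysadmin
```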

Thanks!

pleung2 · September 3, 2020, 1:33am

Thank you for the reply. But how do I use “min”? AFAIK, if I have a line like the following:

lookup: min -10m unaligned of Spectre

a single value will be returned to the $this variable. Can I query the dataset for the last 10 minutes from within the health configuration file and then post-process it?

Thanks.

zack · September 3, 2020, 12:59am

You may have to use a more complex expression.

For example, instead of average, you might also want to use min.
