cpu alarm question

Juan_Jose_Garcia_Gon · November 10, 2022, 7:26pm

I have a file, cpu.conf, which I wrote after reviewing "Configure health alarms | Learn Netdata" and running some tests:
alarm: cpu_template
on: system.cpu
lookup: average -10s percentage foreach system,user,nice
every: 10s
warn: $this > 60
crit: $this > 90
repeat: warning 7200s critical 3630s
to: sysadmin

The alarm goes off before the value has exceeded the warning percentage or the time window that I configured for it. Example:

Could it be that trying to control this with a window of seconds is a bit risky because of small variations? Have I misconfigured it? Thank you very much, community, for your time.

ilyam8 · November 11, 2022, 9:19am

Hi, @Juan_Jose_Garcia_Gon.

The value is 68.2 and the warning threshold is 60 - no problems here.
The values on the chart are correct too. The default aggregation algorithm is average, so the values you see are averages over the last 10+ minutes. The more you zoom out (the bigger the timeframe), the more heavily averaged the values become. Choose MAX instead of AVG (each as) and you will see 68.2.

ilyam8 · November 11, 2022, 9:23am

Because of the short lookup time and the lack of a delay option, you can get flooded with notifications: small variations in a value that hovers close to the threshold will trigger the alarm again and again.

Consider using:

conditional operator in warn/crit
delay
bigger lookup timeframe
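
As a rough sketch, the three suggestions above could be combined in the original cpu.conf roughly as follows. The specific numbers (a 1-minute lookup, clearing thresholds of 50 and 80, and the delay values) are illustrative assumptions, not values recommended in this thread; tune them for your own system.

```
alarm: cpu_template
on: system.cpu
# bigger lookup timeframe: average over 1 minute instead of 10 seconds (assumed value)
lookup: average -1m percentage foreach system,user,nice
every: 10s
# conditional operator: raise WARNING above 60, but clear it only below 50 (assumed value)
warn: $this > (($status >= $WARNING) ? (50) : (60))
# same idea for CRITICAL: raise above 90, clear only below 80 (assumed value)
crit: $this > (($status == $CRITICAL) ? (80) : (90))
# delay: hold notifications so short-lived status changes are not sent (assumed values)
delay: up 1m down 5m multiplier 1.5 max 1h
repeat: warning 7200s critical 3630s
to: sysadmin
```

The conditional in warn/crit adds hysteresis: once the alarm is raised, the value has to drop below the lower number before the alarm clears, so a value oscillating around 60 no longer flips the status on every check.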

Juan_Jose_Garcia_Gon · November 18, 2022, 10:43pm

I have changed the lookup. Many thanks, ilyam8!
