Hi,
I think there’s an error in the example rule for alerting on high CPU usage. The documentation says:
alert: node_high_cpu_usage_70
expr: avg(rate(netdata_cpu_cpu_percentage_average{dimension="idle"}[1m])) by (job) > 70
for: 1m
annotations:
  description: '{{ $labels.job }} on ''{{ $labels.job }}'' CPU usage is at {{ humanize $value }}%.'
  summary: CPU alert for container node '{{ $labels.job }}'
but I don’t think this expression returns a sensible value. Are you sure about it? The result seems very low for a CPU usage percentage. I can’t work out the right expression myself, so I’d appreciate help finding the correct one.
# sources: as-collected | raw | average | sum | volume
# default is: average
#source: [as-collected]
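I see that the as-collected source is not used in the install prometheus.yml, but that CPU rule expression expects it (it uses the rate function, which is meant for raw counters). Apart from that, the selector is supposed to be !="idle". With the default average source, something like this should work: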
sum(sum_over_time(netdata_system_cpu_percentage_average{dimension=~"(user|system|softirq|irq|guest)"}[10m])) by (job) / sum(count_over_time(netdata_system_cpu_percentage_average{dimension="idle"}[10m])) by (job)
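For reference, here is a rough sketch of how the expression above could be dropped into the alert rule from the documentation. It keeps the original threshold (70) and the `for: 1m` clause. The group name `netdata_cpu` is made up for illustration, and I'm assuming the metrics are exported with the default `average` source and that each `job` maps to a single node, so treat it as a starting point rather than a tested rule.

```yaml
groups:
  - name: netdata_cpu   # hypothetical group name, pick your own
    rules:
      - alert: node_high_cpu_usage_70
        # Mean busy CPU over 10 minutes: sum of all non-idle samples
        # divided by the number of samples taken in the same window.
        # "/" binds tighter than ">", so the ratio is compared to 70.
        expr: >
          sum(sum_over_time(netdata_system_cpu_percentage_average{dimension=~"(user|system|softirq|irq|guest)"}[10m])) by (job)
          /
          sum(count_over_time(netdata_system_cpu_percentage_average{dimension="idle"}[10m])) by (job)
          > 70
        for: 1m
        annotations:
          description: '{{ $labels.job }} CPU usage is at {{ humanize $value }}%.'
          summary: CPU alert for container node '{{ $labels.job }}'
```

If several nodes share one `job` label, the sum would add their usage together, so in that case grouping by `instance` as well is probably what you want.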