Zscore in a health alarm calc

Hi all, right now I have a typical alarm that notifies me when a threshold is reached. Unfortunately, it’s not very useful given how much the data fluctuates. I’d like to modify the alarm so that it evaluates the zscore instead.

Question:
What functions are available to the calc parameter? The docs only mention a range of expressions, but nothing like avg() or stddev(), for example.

I do see the zscore plugin, but I was hoping to do this within the alarm itself.

Hmm, I know we have stddev and CV (coefficient of variation), which are close to a zscore.

I wonder if we could actually use stddev and mean to derive the zscore as part of the alarm config.

Agreed, though, that having a zscore built in would be nice, as it’s a more standard measure that users would be familiar with.

So I wonder whether something like ($this - mean) / stddev is possible.
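As a quick illustration of the formula being discussed, here is a minimal Python sketch (not Netdata code) that computes a zscore from a window’s mean and standard deviation, with the coefficient of variation alongside for contrast. The helper names and the sample window are made up for illustration, and using the population standard deviation (`statistics.pstdev`) is an assumption; whether Netdata’s `stddev` lookup uses the population or sample formula isn’t covered here.

```python
import statistics


def zscore(value, mean, stddev):
    """How many standard deviations `value` sits above (+) or below (-) `mean`."""
    if stddev == 0:
        raise ValueError("stddev is zero, so the zscore is undefined")
    return (value - mean) / stddev


def coefficient_of_variation(mean, stddev):
    """Spread of the whole window relative to its mean (plain ratio, not a percentage)."""
    return stddev / mean


window = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0]  # made-up sample values
mu = statistics.mean(window)
sigma = statistics.pstdev(window)  # population stddev (assumption)

print(zscore(15.0, mu, sigma))  # one new value compared against the window
print(coefficient_of_variation(mu, sigma))
```

CV describes the spread of the whole window, while the zscore places a single value relative to that window, which is why the two are related but not interchangeable.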

Hi Andrew, thanks for the reply - and sorry for the delay.

I’ve tried this in my alarm’s calc:

# cat test.conf
alarm: system_active_processes
      on: system.active_processes
    calc: ($this - mean) / stddev
   every: 60s
    warn: $this > 1
      to: sysadmin

But I get this error:

2021-12-02 12:26:27: netdata ERROR : MAIN : Health configuration at line 9 of file '/etc/netdata/conf.d/health.d/test.conf' for alarm 'system_active_processes' at key 'calc' has unparse-able expression '($this - mean) / stddev': remaining characters after expression at 'mean) / stddev'

Are these the only expressions available for calc?

Hi @mjtice, I’ve had a play around and asked internally, and below is an example of how to make a zscore-based alarm.

 alarm: cpu_user_mean
    on: system.cpu
lookup: mean -60s of user
 every: 10s

 alarm: cpu_user_stddev
    on: system.cpu
lookup: stddev -60s of user
 every: 10s

 alarm: cpu_user_zscore
    on: system.cpu
lookup: mean -10s of user
  calc: ($this - $cpu_user_mean) / $cpu_user_stddev
 every: 10s
  warn: $this < -2 or $this > 2
  crit: $this < -3 or $this > 3

Here you can see I make one alarm for the mean and one for the stddev, then reference them in my zscore alarm. It then raises a warning if abs(zscore) > 2 and a critical if abs(zscore) > 3.
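To sanity-check thresholds offline, the three alarms above can be mimicked in a small Python sketch. It assumes one sample per second, so the last 60 values stand in for `-60s` and the last 10 for `-10s`. The function name, its parameters, the population stddev, and the handling of a zero stddev are all hypothetical; this is not how the Netdata agent itself evaluates alarms.

```python
import statistics


def evaluate_zscore_alarm(samples, long_window=60, short_window=10,
                          warn_at=2.0, crit_at=3.0):
    """Rough offline mimic of the cpu_user_mean / _stddev / _zscore alarms.

    Assumes `samples` holds one value per second, oldest first.
    Returns (zscore, status); zscore is None when the baseline is flat.
    """
    if len(samples) < long_window:
        raise ValueError("need at least long_window samples")
    baseline = samples[-long_window:]  # stands in for `-60s`
    recent = samples[-short_window:]   # stands in for `-10s`

    baseline_mean = statistics.mean(baseline)   # like $cpu_user_mean
    baseline_std = statistics.pstdev(baseline)  # like $cpu_user_stddev (population formula assumed)
    if baseline_std == 0:
        return None, "UNDEFINED"  # hypothetical handling: flat data gives no zscore

    z = (statistics.mean(recent) - baseline_mean) / baseline_std
    # Same bands as the warn/crit lines: |z| > 3 is critical, |z| > 2 is warning.
    if abs(z) > crit_at:
        status = "CRITICAL"
    elif abs(z) > warn_at:
        status = "WARNING"
    else:
        status = "CLEAR"
    return z, status


steady = [20.0, 22.0] * 30               # 60 s alternating around 21
spike = [20.0, 22.0] * 25 + [30.0] * 10  # last 10 s jump to 30
print(evaluate_zscore_alarm(steady))
print(evaluate_zscore_alarm(spike))
```

Because the 60s baseline also contains the most recent 10s, a sustained spike inflates both the baseline mean and stddev, which dampens the zscore (the spike above lands in the warning band, not critical). Shifting the baseline window so it ends before the recent window (for example with the lookup’s `at` option, if your version supports it) is one possible tweak.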

I think you could play around with this approach to get it working for your use case.

I think it would be best if we just had a zscore function you could use, so I’ll make a feature request for this.

I made a feature request here: https://community.netdata.cloud/t/zscore-in-netdata-agent-health/2274

Anyone interested, please upvote.


I made a PR here to add this as an example in our docs: "add z score alarm example" by andrewm4894, Pull Request #11871, netdata/netdata on GitHub.

Very nice! Thanks, @andrewm4894
