Zookeeper custom alarm when request latency is too high

Hi,

I think everything is in the title. I would like to create a custom alarm for my ZK nodes that fires when the request latency is too high. Could somebody help me, please? (I'm sorry, but I think I still have not properly understood the alarm configuration language.)

Best,
Jerome

Hi, here is an example; adjust the trigger conditions to your needs.

If you have any questions about the health alarm syntax, check our docs.

template: zookeeper_requests_latency
      on: zookeeper.requests_latency
  lookup: average -1m unaligned of avg
   units: milliseconds
   every: 10s
    warn: $this < (($status >= $WARNING)  ? (150) : (200))
    crit: $this < (($status == $CRITICAL) ? (200) : (300))
   delay: up 1m down 15m multiplier 1.5 max 1h
    info: average requests latency for the last minute
      to: sysadmin

Thanks a lot!

Your documentation is really well written, but it takes time to understand all the subtleties.

Hey @jrevillard !

We know that alarm configuration is challenging and we are working hard to improve on that front.

Thanks for the kind words on our docs (kudos to @joel for his phenomenal work).

Side note: Zookeeper restart brake the netada plugin · Issue #10753 · netdata/netdata · GitHub

Best,
Jerome

I tried to set up the alarm. I put it in the health.d directory, but I do not see it in the dashboard. How can I debug this, please?

[root@zk3 [RCC] netdata-configs]# ls -al health.d/
total 8
drwxr-xr-x.  2 netdata netdata   41 Mar 11 15:48 .
drwxr-xr-x. 10 netdata netdata 4096 Mar 11 15:03 ..
-rw-r--r--.  1 root    root     391 Mar 11 15:48 zookeeper_custom_alarm.conf
[root@zk3 [RCC] netdata-configs]# cat health.d/zookeeper_custom_alarm.conf
template: zookeeper_requests_latency
      on: zookeeper.requests_latency
  lookup: average -1m unaligned of avg
   units: milliseconds
   every: 10s
    warn: $this < (($status >= $WARNING)  ? (150) : (200))
    crit: $this < (($status == $CRITICAL) ? (200) : (300))
   delay: up 1m down 15m multiplier 1.5 max 1h
    info: average requests latency for the last minute
      to: webmaster

PS: I set it up on the Netdata agents and am looking at the master Netdata dashboard.

By the way, it should be >, not <. If greater than …

  warn: $this > (($status >= $WARNING)  ? (150) : (200))
  crit: $this > (($status == $CRITICAL) ? (200) : (300))
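For anyone puzzled by the ternary in those two lines: it gives the alarm hysteresis, so it raises at the higher threshold and only clears once the value drops below the lower one. Here is a rough Python sketch of that logic, not Netdata's actual evaluator. The function name and the numeric status values are made up for illustration; only the thresholds and the ordering CLEAR < WARNING < CRITICAL come from the expressions above.

```python
# Sketch of the hysteresis encoded by the corrected warn/crit expressions.
# The numeric status values are illustrative; only their ordering matters.
CLEAR, WARNING, CRITICAL = 0, 1, 2


def next_status(latency_ms, status):
    """Return the new alarm status for one latency sample.

    Mirrors:
      warn: $this > (($status >= $WARNING)  ? (150) : (200))
      crit: $this > (($status == $CRITICAL) ? (200) : (300))
    """
    # Once raised, the alarm uses the lower threshold, so it does not flap.
    warn_threshold = 150 if status >= WARNING else 200
    crit_threshold = 200 if status == CRITICAL else 300
    if latency_ms > crit_threshold:
        return CRITICAL
    if latency_ms > warn_threshold:
        return WARNING
    return CLEAR


if __name__ == "__main__":
    status = CLEAR
    for sample in [120, 210, 160, 140, 310, 250, 190, 100]:
        status = next_status(sample, status)
        print(sample, status)
```

With `<` instead of `>`, the same expressions would raise the alarm whenever latency is low, which is the opposite of what is wanted here.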

It works


I set it up on the Netdata agents and am looking at the master Netdata dashboard.

Could be the case, put it on the master and restart it.

OK, it's not working on the master either.

Is there something I should look for in the logs, please?
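One way to check whether the agent loaded the alarm at all, independent of the dashboard, is to ask its REST API for the list of alarms. The sketch below is hedged: it assumes the agent listens on the default port 19999, that `/api/v1/alarms?all` returns a JSON object with an `alarms` map whose entries carry a `name` field, and the helper names are my own. Adjust it if your version's response looks different.

```python
import json
import urllib.request


def fetch_alarms(base_url="http://localhost:19999"):
    """Fetch all configured alarms from a Netdata agent (assumed endpoint)."""
    with urllib.request.urlopen(base_url + "/api/v1/alarms?all", timeout=5) as resp:
        return json.load(resp)


def find_alarm(payload, alarm_name):
    """Return the keys of alarm entries whose 'name' matches alarm_name.

    Assumes the payload has the shape {"alarms": {key: {"name": ...}, ...}}.
    """
    alarms = payload.get("alarms", {})
    return sorted(
        key for key, entry in alarms.items()
        if entry.get("name") == alarm_name
    )


# Usage (needs a running agent, so it is left as a comment):
#   matches = find_alarm(fetch_alarms(), "zookeeper_requests_latency")
#   print(matches or "alarm not loaded")
```

An empty result would suggest the file was never parsed, or that no chart matches the `on:` line, rather than a dashboard display problem.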

[root@zk3 [RCC] netdata-configs]# ls -al health.d/

netdata-configs

What is that directory? Do you run netdata in a docker container?

Nope, it was installed with the install script:

[root@zk3 [RCC] netdata]# pwd
/opt/netdata
[root@zk3 [RCC] netdata]# ls -al
total 0
drwxrwxr-x. 10 netdata netdata 248 Mar 11 17:30 .
drwxr-xr-x.  4 root    root     35 Feb 16 08:56 ..
drwxrwxr-x.  3 netdata netdata 145 Feb  9 12:31 bin
drwxrwxr-x.  3 netdata netdata  32 Feb  9 12:31 etc
drwxr-xr-x.  3 netdata netdata  18 Feb  9 12:27 include
drwxr-xr-x.  3 netdata netdata  58 Feb  9 12:27 lib
lrwxrwxrwx.  1 netdata netdata  11 Mar 11 17:30 netdata-configs -> etc/netdata
lrwxrwxrwx.  1 netdata netdata  15 Mar 11 17:30 netdata-dbs -> var/lib/netdata
lrwxrwxrwx.  1 netdata netdata  15 Mar 11 17:30 netdata-logs -> var/log/netdata
lrwxrwxrwx.  1 netdata netdata  17 Mar 11 17:30 netdata-metrics -> var/cache/netdata
lrwxrwxrwx.  1 netdata netdata  19 Mar 11 17:30 netdata-plugins -> usr/libexec/netdata
lrwxrwxrwx.  1 netdata netdata  21 Mar 11 17:30 netdata-web-files -> usr/share/netdata/web
lrwxrwxrwx.  1 netdata netdata   3 Mar 11 17:30 sbin -> bin
drwxrwxr-x.  7 netdata netdata  66 Feb  9 12:31 share
drwxrwxr-x.  2 netdata netdata 216 Feb  9 12:31 system
drwxrwxr-x.  5 netdata netdata  81 Mar 11 17:30 usr
drwxrwxr-x.  6 netdata netdata  52 Feb  9 12:31 var

What script? :thinking: