Zookeeper custom alarm when request latency is too high

jrevillard · March 10, 2021, 10:45pm

Hi,

I think everything is in the title. I would like to create a custom alarm for my ZK nodes when the request latency is too high. Could somebody help me, please? (I’m sorry, but I think I still have not properly understood the alarm configuration language.)

Best,
Jerome

ilyam8 · March 11, 2021, 11:58am

Hi, here is an example; adjust the trigger conditions to your needs.

If you have any questions about the health alarm syntax, check our docs.

template: zookeeper_requests_latency
      on: zookeeper.requests_latency
  lookup: average -1m unaligned of avg
   units: milliseconds
   every: 10s
    warn: $this < (($status >= $WARNING)  ? (150) : (200))
    crit: $this < (($status == $CRITICAL) ? (200) : (300))
   delay: up 1m down 15m multiplier 1.5 max 1h
    info: average requests latency for the last minute
      to: sysadmin

jrevillard · March 11, 2021, 1:32pm

Thanks a lot !

You have really well-written documentation, but it takes time to understand all the subtleties.

OdysLam · March 11, 2021, 1:42pm

Hey @jrevillard !

We know that alarm configuration is challenging and we are working hard to improve on that front.

Thanks for the kind words on our docs (kudos to @joel for his phenomenal work).

jrevillard · March 11, 2021, 2:09pm

Side note: Zookeeper restart brake the netada plugin · Issue #10753 · netdata/netdata · GitHub

Best,
Jerome

jrevillard · March 11, 2021, 3:51pm

I tried to set up the alarm. I put it in the health.d directory, but I do not see it in the dashboard. How can I debug this, please?

[root@zk3 [RCC] netdata-configs]# ls -al health.d/
total 8
drwxr-xr-x.  2 netdata netdata   41 Mar 11 15:48 .
drwxr-xr-x. 10 netdata netdata 4096 Mar 11 15:03 ..
-rw-r--r--.  1 root    root     391 Mar 11 15:48 zookeeper_custom_alarm.conf
[root@zk3 [RCC] netdata-configs]# cat health.d/zookeeper_custom_alarm.conf
template: zookeeper_requests_latency
      on: zookeeper.requests_latency
  lookup: average -1m unaligned of avg
   units: milliseconds
   every: 10s
    warn: $this < (($status >= $WARNING)  ? (150) : (200))
    crit: $this < (($status == $CRITICAL) ? (200) : (300))
   delay: up 1m down 15m multiplier 1.5 max 1h
    info: average requests latency for the last minute
      to: webmaster

PS: I set it up on the Netdata agents and am looking at the master Netdata dashboard.

ilyam8 · March 11, 2021, 4:19pm

By the way, it should be >, not <. If greater than …

  warn: $this > (($status >= $WARNING)  ? (150) : (200))
  crit: $this > (($status == $CRITICAL) ? (200) : (300))

It works

I set it up on the Netdata agents and am looking at the master Netdata dashboard.

That could be the cause; put it on the master and restart it.

jrevillard · March 11, 2021, 5:42pm

OK, it’s not working on the master either.

Should I look for something in the logs somewhere, please?

ilyam8 · March 11, 2021, 7:20pm

[root@zk3 [RCC] netdata-configs]# ls -al health.d/

netdata-configs

What is that directory? Do you run netdata in a docker container?

jrevillard · March 11, 2021, 7:35pm

Nope, it was installed from the install script:

[root@zk3 [RCC] netdata]# pwd
/opt/netdata
[root@zk3 [RCC] netdata]# ls -al
total 0
drwxrwxr-x. 10 netdata netdata 248 Mar 11 17:30 .
drwxr-xr-x.  4 root    root     35 Feb 16 08:56 ..
drwxrwxr-x.  3 netdata netdata 145 Feb  9 12:31 bin
drwxrwxr-x.  3 netdata netdata  32 Feb  9 12:31 etc
drwxr-xr-x.  3 netdata netdata  18 Feb  9 12:27 include
drwxr-xr-x.  3 netdata netdata  58 Feb  9 12:27 lib
lrwxrwxrwx.  1 netdata netdata  11 Mar 11 17:30 netdata-configs -> etc/netdata
lrwxrwxrwx.  1 netdata netdata  15 Mar 11 17:30 netdata-dbs -> var/lib/netdata
lrwxrwxrwx.  1 netdata netdata  15 Mar 11 17:30 netdata-logs -> var/log/netdata
lrwxrwxrwx.  1 netdata netdata  17 Mar 11 17:30 netdata-metrics -> var/cache/netdata
lrwxrwxrwx.  1 netdata netdata  19 Mar 11 17:30 netdata-plugins -> usr/libexec/netdata
lrwxrwxrwx.  1 netdata netdata  21 Mar 11 17:30 netdata-web-files -> usr/share/netdata/web
lrwxrwxrwx.  1 netdata netdata   3 Mar 11 17:30 sbin -> bin
drwxrwxr-x.  7 netdata netdata  66 Feb  9 12:31 share
drwxrwxr-x.  2 netdata netdata 216 Feb  9 12:31 system
drwxrwxr-x.  5 netdata netdata  81 Mar 11 17:30 usr
drwxrwxr-x.  6 netdata netdata  52 Feb  9 12:31 var

ilyam8 · March 11, 2021, 8:37pm

What script?

jrevillard · March 11, 2021, 9:26pm
