Netdata Community

How to print the dimension name in the alert notification?

I would like to find out which application is triggering the alert, but I could not find a way to print the dimension name in the alert notification. The following is my cpu.conf:

template: 1min_appcpu_usage
on: apps.cpu
os: linux
hosts: *
lookup: average -1m unaligned of *
units: %
every: 10s
warn: $this > (($status >= $WARNING) ? (190) : (400))
crit: $this > (($status == $CRITICAL) ? (401) : (3200))
delay: down 15m multiplier 1.5 max 1h
info: cpu utilization for the last minute
to: sysadmin

Thanks in advance.

Thanks for the update!

Thank you for the additional information, now I have a clear idea about what is happening.

  1. Installed from EPEL

Netdata's official packages are not distributed through EPEL. We publish them at https://packagecloud.io/netdata/netdata, and you can find more information on our Learn site: https://learn.netdata.cloud/docs/agent/packaging/installer/methods/packages.

About the error you are seeing: I remembered today that after the multihost change was merged, we started to have problems with foreach alarms. The PR https://github.com/netdata/netdata/pull/9712 was merged 8 days ago and fixed the problem. If you install the nightly Netdata, either from the RPMs or by compiling it with kickstart, your alarms should work as expected.

We would like to apologize to you and all our users for this problem.

Best regards!

Answers to your questions:

  1. Yes
  2. Yes.
    {
    "labels": ["time", "netdata", "email", "wifi", "logs", "nfs", "ssh", "time", "cron", "X", "ksmd", "system", "kernel", "other", "Simvision", "Virtuoso", "CFEngine", "Spectre", "Editors"],
    "data":
    [
    [ 1597798483, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6.0001, 0, 0, 0, 284.0053, 0],
    [ 1597798482, 1.8902, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.8368, 0, 0, 0, 299.583, 0],
    [ 1597798481, 1.997, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 300.705, 0],
    [ 1597798480, 1.001, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 299.305, 0],
    [ 1597798479, 3.996, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 299.7063, 0],
  3. Installed from EPEL

rpm -qa |grep netdata

netdata-data-1.23.2-1.el7.noarch
netdata-conf-1.23.2-1.el7.noarch
netdata-1.23.2-1.el7.x86_64

Checked the alarm via localhost:19999/api/v1/alarms?all
{
"hostname": "XXXXXXXXXX",
"latest_alarm_log_unique_id": 1597170798,
"status": true,
"now": 1597798827,
"alarms": {

}

}

No alarms are listed in the response. Again, my conf file is as follows:
more APPS_CPU.conf
alarm: appscpu1min
on: apps.cpu
os: linux
hosts: *
#lookup: average -10m percentage foreach Spectre
#lookup: average -1m percentage foreach Spectre
lookup: average -1m percentage foreach *
unit: %

every: 1m

every: 10s
warn: $this > (($status >= $WARNING) ? (180) : (190))
crit: $this > (($status == $CRITICAL) ? (190) : (290))
delay: down 15m multiplier 1.5 max 1h
info: App CPU Usage above 190 as warning and 290 as critical for the last 10 minutes
to: silent
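The empty "alarms" object in the check above can also be detected from a script. Here is a minimal Python sketch, not an official Netdata tool: `fetch_json` and `summarize_alarms` are hypothetical helper names, and the per-alarm "status" field is assumed to be present in the response of /api/v1/alarms?all.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5):
    """Download a URL and decode the body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_alarms(payload):
    """Return sorted (alarm key, status) pairs from an alarms payload.

    Assumes each entry under "alarms" carries a "status" field;
    entries without one are reported as "UNKNOWN".
    """
    alarms = payload.get("alarms", {})
    return sorted(
        (key, entry.get("status", "UNKNOWN")) for key, entry in alarms.items()
    )


# Example use against a local agent (URL taken from the post above):
#   payload = fetch_json("http://localhost:19999/api/v1/alarms?all")
#   for key, status in summarize_alarms(payload):
#       print(key, status)
```

An empty list from `summarize_alarms` corresponds to the situation described above, where the health configuration produced no alarms at all.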

Hi everybody,

I tested your alarm with two changes to the lookup line. The first change was:

lookup: average -30s percentage foreach *

With this, all alarms were created according to the API (localhost:19999/api/v1/alarms?all). I am copying only a few of them to keep my answer short:

"apps.cpu.1min_appcpu_usage_nfs"
"apps.cpu.1min_appcpu_usage_httpd"
"apps.cpu.1min_appcpu_usage_sql"
"apps.cpu.1min_appcpu_usage_email"

After this I changed the lookup line to:

lookup: average -30s percentage foreach nfs, email

After executing the request again, I saw alarms for the expected dimensions.
Finally, I extended the test to 4 dimensions:

lookup: average -30s percentage foreach nfs, email, ssh, kernel

And again I got all alarms.
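Until the dimension name is available in the notification itself, it can be recovered from the alarm key. This is only a sketch and it assumes the naming seen in the API output above, where each foreach alarm is reported as chart, then alarm name, then "_" and the dimension. `dimension_from_alarm` is a hypothetical helper, not part of Netdata.

```python
def dimension_from_alarm(alarm_key, chart, alarm_name):
    """Extract the dimension suffix from a foreach alarm key.

    Assumes keys shaped like "<chart>.<alarm_name>_<dimension>", as in
    "apps.cpu.1min_appcpu_usage_nfs". Returns None when the key does
    not follow that shape.
    """
    prefix = f"{chart}.{alarm_name}_"
    if alarm_key.startswith(prefix) and len(alarm_key) > len(prefix):
        return alarm_key[len(prefix):]
    return None


# Example with a key copied from the API output above.
print(dimension_from_alarm(
    "apps.cpu.1min_appcpu_usage_nfs", "apps.cpu", "1min_appcpu_usage"
))
```

Stripping the known prefix instead of splitting on "_" keeps this working when the alarm name or the dimension itself contains underscores.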

Please note that if a dimension has not been created, the alarm for it cannot be created either.

Considering the tests that I ran on my Netdata, I have three questions for you:

1 - Are Spectre and Simvision the exact names that you defined inside your apps_groups.conf?
2 - Do you see these dimensions when you request http://localhost:19999/api/v1/data?chart=apps.cpu?
3 - How did you install your Netdata?
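Question 2 can be checked with a few lines of Python. This is a sketch under assumptions: it expects the response of /api/v1/data?chart=apps.cpu to contain a "labels" list whose first entry is the timestamp column, and `missing_dimensions` is a hypothetical helper name.

```python
def missing_dimensions(payload, wanted):
    """Return the names in `wanted` that are not dimensions of the chart.

    `payload` is the decoded JSON of /api/v1/data?chart=apps.cpu.
    The first label is skipped because it is the timestamp column.
    The comparison is case-sensitive, so "spectre" will not match
    a dimension called "Spectre".
    """
    labels = payload.get("labels", [])
    present = set(labels[1:])
    return [name for name in wanted if name not in present]


# Example with a shortened, made-up payload:
sample = {"labels": ["time", "nfs", "email", "ssh", "kernel"], "data": []}
print(missing_dimensions(sample, ["nfs", "Spectre"]))
```

Any name this returns will not get a foreach alarm, because the dimension does not exist on the chart.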

Best regards!

# not working. alarm has dimension names but the status showed uninit and removed and no alarm
# lookup: average -30s percentage foreach Spectre,Simvision

This one should be working. In fact, “foreach” is the only use case where it makes sense to have the specific dimension that triggered the alarm inside the notification. No other alarm configuration has that ambiguity.

So getting it uninitialized sounds like a bug, and we also need to add something new so that the dimension that triggers the alarm appears in the notifications. See https://learn.netdata.cloud/docs/agent/health/notifications/custom for the variables that are currently sent to the notifications script. I suggest we add the dimension name to ${info} and/or ${alarm}.

That’s correct. I don’t see the app name in either the email or the alarm in the web UI. For the time being, I have manually configured a separate alarm for each of the apps defined in apps_groups.conf. That works, but it is tedious.

So basically the problem is that you get the alarm, but you don’t know which app triggers it because the app name doesn’t show up? I thought it would pop up under db lookup in the alarm, but I will have to double check it’s working as expected.

The dynamic alarm setting just does not work. See my configuration below:

alarm: app_cpu
on: apps.cpu
# works but it does not provide dimension name in the alarm
# lookup: average -30s percentage of S*
# not working. alarm has the dimension name but the status showed uninit and removed and no alarm
# lookup: average -30s percentage foreach S*
# not working. alarm has dimension names but the status showed uninit and removed and no alarm
# lookup: average -30s percentage foreach Spectre,Simvision
# works but it does not provide dimension name in the alarm
# lookup: average -30s percentage Spectre,Simvision
lookup: average -30s percentage foreach *
unit: %
every: 3s
warn: $this > 1
crit: $this > 110
to: sysadmin

Goal: I expect the dimension name to appear somewhere in the alarm. Looking into this dynamic alarm takes too much time, so for the time being I will just configure a separate alarm for each app defined in apps_groups.conf.

I also tried a dynamic alarm template, but had no luck. The Netdata version is 1.23.2.