How to print the dimension name in the alert notification?

I would like to find out which application is triggering the alert, but I could not find a way to print the dimension in the alert notification. The following is my cpu.conf:

template: 1min_appcpu_usage
on: apps.cpu
os: linux
hosts: *
lookup: average -1m unaligned of *
units: %
every: 10s
warn: $this > (($status >= $WARNING) ? (190) : (400))
crit: $this > (($status == $CRITICAL) ? (401) : (3200))
delay: down 15m multiplier 1.5 max 1h
info: cpu utilization for the last minute
to: sysadmin

Thanks in advance.

Thanks for the update!

Thank you for the additional information. I now have a clear idea of what is happening.

  1. Installed from EPEL

Official Netdata packages are not distributed through EPEL. We publish them on a separate server (netdata/netdata - Packages · packagecloud), and you can find more information on our Learn site: Install Netdata with kickstart.sh | Learn Netdata.

About the error you are seeing: I remembered today that after multihost was merged, we started having problems with foreach alarms. Eight days ago the PR "Fixed issue with missing alarms" by stelfrag (Pull Request #9712, netdata/netdata on GitHub) was merged and the problem was fixed. If you install the nightly Netdata, either from RPMs or by compiling it with kickstart, your alarms should work as expected.

We would like to apologize to you and all our users for this problem.

Best regards!

Answers to your questions:

  1. Yes
  2. Yes.
    {
    "labels": ["time", "netdata", "email", "wifi", "logs", "nfs", "ssh", "time", "cron", "X", "ksmd", "system", "kernel", "other", "Simvision", "Virtuoso", "CFEngine", "Spectre", "Editors"],
    "data":
    [
    [ 1597798483, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6.0001, 0, 0, 0, 284.0053, 0],
    [ 1597798482, 1.8902, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.8368, 0, 0, 0, 299.583, 0],
    [ 1597798481, 1.997, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 300.705, 0],
    [ 1597798480, 1.001, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 299.305, 0],
    [ 1597798479, 3.996, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 299.7063, 0],
  3. Installed from EPEL

rpm -qa |grep netdata

netdata-data-1.23.2-1.el7.noarch
netdata-conf-1.23.2-1.el7.noarch
netdata-1.23.2-1.el7.x86_64

Checked the alarm via localhost:19999/api/v1/alarms?all
{
"hostname": "XXXXXXXXXX",
"latest_alarm_log_unique_id": 1597170798,
"status": true,
"now": 1597798827,
"alarms": {
}
}

No alarms are listed in the response. Again, my conf file is the following:
more APPS_CPU.conf
alarm: appscpu1min
on: apps.cpu
os: linux
hosts: *
#lookup: average -10m percentage foreach Spectre
#lookup: average -1m percentage foreach Spectre
lookup: average -1m percentage foreach *
unit: %

every: 1m

every: 10s
warn: $this > (($status >= $WARNING) ? (180) : (190))
crit: $this > (($status == $CRITICAL) ? (190) : (290))
delay: down 15m multiplier 1.5 max 1h
info: App CPU Usage above 190 as warning and 290 as critical for the last 10 minutes
to: silent

Hi everybody,

I tested your alarm with basically two changes to the lookup. The first change was:

lookup: average -30s percentage foreach *

With this, all alarms were created according to the API (localhost:19999/api/v1/alarms?all). I am copying only a few of them to keep my answer short:

"apps.cpu.1min_appcpu_usage_nfs"
"apps.cpu.1min_appcpu_usage_httpd"
"apps.cpu.1min_appcpu_usage_sql"
"apps.cpu.1min_appcpu_usage_email"
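
Each foreach alarm key in this output ends with the dimension name, so one way to recover the dimension outside Netdata is to strip the chart and template prefix from the key. The following is a minimal sketch, assuming the key layout `<chart>.<template>_<dimension>` seen in the keys above; `dimension_from_alarm_key` is a hypothetical helper and not part of Netdata.

```python
from typing import Optional


def dimension_from_alarm_key(key: str, chart: str, template: str) -> Optional[str]:
    """Return the dimension suffix of a foreach alarm key, or None if it does not match.

    Assumes keys look like "<chart>.<template>_<dimension>", as in the
    alarm keys copied from /api/v1/alarms?all in this thread.
    """
    prefix = f"{chart}.{template}_"
    if key.startswith(prefix) and len(key) > len(prefix):
        return key[len(prefix):]
    return None


keys = [
    "apps.cpu.1min_appcpu_usage_nfs",
    "apps.cpu.1min_appcpu_usage_httpd",
    "apps.cpu.1min_appcpu_usage_sql",
    "apps.cpu.1min_appcpu_usage_email",
]
dimensions = [dimension_from_alarm_key(k, "apps.cpu", "1min_appcpu_usage") for k in keys]
print(dimensions)
```

This only helps when you can inspect the alarm name itself; it does not change what the notification prints.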

After this I changed the lookup line to:

lookup: average -30s percentage foreach nfs, email

After executing the request again, I observed that I also had the expected dimensions.
Finally, I extended the test to four dimensions:

lookup: average -30s percentage foreach nfs, email, ssh, kernel

And again I got all alarms.

I would like to call attention to the fact that if a dimension was not created, the alarm cannot be created.

Considering the tests that I made on my Netdata, I have three questions for you:

1 - Are Spectre and Simvision the exact names that you defined inside your apps_groups.conf?
2 - Do you see these dimensions when you do the request http://localhost:19999/api/v1/data?chart=apps.cpu?
3 - How did you install your Netdata?
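
For the second question, a quick way to check is to compare the names used in the lookup line against the "labels" array of the data response. The following is a rough sketch, assuming the response has the shape of the apps.cpu output quoted in this thread; `missing_dimensions` is a hypothetical helper, and the exact, case-sensitive comparison is an assumption of the sketch.

```python
from typing import Dict, Iterable, List


def missing_dimensions(data_response: Dict, wanted: Iterable[str]) -> List[str]:
    """List the wanted dimension names that are absent from the chart's "labels".

    The first label is the time column, so it is skipped. Comparison is
    exact and case-sensitive (an assumption of this sketch).
    """
    available = set(data_response.get("labels", [])[1:])
    return [name for name in wanted if name not in available]


# Labels copied from the apps.cpu response quoted in this thread.
sample = {
    "labels": [
        "time", "netdata", "email", "wifi", "logs", "nfs", "ssh", "time",
        "cron", "X", "ksmd", "system", "kernel", "other", "Simvision",
        "Virtuoso", "CFEngine", "Spectre", "Editors",
    ]
}

print(missing_dimensions(sample, ["Spectre", "Simvision"]))
print(missing_dimensions(sample, ["spectre", "httpd"]))
```

An empty list means every name in the lookup line exists as a dimension on the chart.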

Best regards!

#not working. alarm has dimension names but the status showed uninit and removed and no alarm lookup: average -30s percentage foreach Spectre,Simvision

This one should be working. In fact, “foreach” is the only use case where it makes sense to have the specific dimension that triggered the alarm inside the notification. No other alarm configuration has that ambiguity.

So getting it uninitialized sounds like a bug, and we also need to add something new so that the dimension that triggers the alarm appears in the notifications. See Custom | Learn Netdata for the variables that are currently sent to the notification script; I suggest we add the dimension name to ${info} and/or ${alarm}.

That’s correct. I don’t see the app name in either the email or the alarm in the web UI. For the time being, I have manually configured an alarm for each of the apps defined in apps_groups.conf. That works, but it is tedious.
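
The per-app workaround can be scripted instead of typed by hand. The following is a minimal sketch that renders one alarm stanza per app group, putting the app name into both `alarm` and `info` so that it shows up wherever those fields are printed. The function `per_app_alarms`, the `app_cpu_` name prefix, and the thresholds 190 and 290 are illustrative assumptions; the configuration keys are the ones already used in this thread.

```python
import re
from typing import Iterable


def per_app_alarms(apps: Iterable[str], warn: int = 190, crit: int = 290) -> str:
    """Render one health stanza per app group, with the app name in `alarm` and `info`."""
    stanzas = []
    for app in apps:
        # Assumption: restrict alarm names to letters, digits, and underscores.
        safe = re.sub(r"[^A-Za-z0-9_]", "_", app)
        stanzas.append(
            "\n".join(
                [
                    f"alarm: app_cpu_{safe}",
                    "on: apps.cpu",
                    f"lookup: average -1m unaligned of {app}",
                    "units: %",
                    "every: 10s",
                    f"warn: $this > {warn}",
                    f"crit: $this > {crit}",
                    f"info: CPU utilization of {app} for the last minute",
                    "to: sysadmin",
                ]
            )
        )
    return "\n\n".join(stanzas) + "\n"


config = per_app_alarms(["Spectre", "Simvision"])
print(config)
```

The printed text could be saved as a health configuration file and adjusted as needed; the list of app names has to be kept in sync with apps_groups.conf by hand.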

So basically the problem is that you get the alarm, but you don’t know which app triggers it because the app name doesn’t show up? I thought it would pop up under db lookup in the alarm, but I will have to double check it’s working as expected.

The dynamic alarm setting just does not work. See my following configuration:

alarm: app_cpu
on: apps.cpu
#works but it does not provide dimension name in the alarm lookup: average -30s percentage of S*
#not working. alarm has the dimension name but the status showed uninit and removed and no alarm lookup: average -30s percentage foreach S*
#not working. alarm has dimension names but the status showed uninit and removed and no alarm lookup: average -30s percentage foreach Spectre,Simvision
#works but it does not provide dimension name in the alarm lookup: average -30s percentage Spectre,Simvision
lookup: average -30s percentage foreach *
unit: %
every: 3s
warn: $this > 1
crit: $this > 110
to: sysadmin

Goal: I expect the dimension name to appear somewhere in the alarm. Looking into this dynamic alarm takes too much time, so for the time being I just configure an alarm for each app defined in apps_groups.conf.

I also tried the dynamic alarm template setting, but with no luck. The Netdata version is 1.23.2.

It seems to me it’s still not possible to show which dimension triggered the alarm in 1.43.2. Has anyone found a way to do it?

Hi @boutetnico

Could you please share an example of what you would like to have?

Hi @Manolis_Vasilakis, sure!

We monitor unhealthy target groups in our AWS Application Load Balancers. Target groups are dynamic, they come and go, they are not a static list. Each target group status is a dimension in a chart, see below.

We have set an alert rule using foreach * that iterates over the dynamic list of target groups, to check if any group has any unhealthy targets. See below.

template: alb_unhealthy_targets
on: alb.state_unhealthy
class: Utilization
type: System
component: AWS
os: linux
hosts: *
lookup: min -10s foreach *
every: 10s
warn: $this > 0
crit: $this > 0
summary: ALB unhealthy targets
info: ALB unhealthy targets
to: sysadmin

The alert is configured to send a Slack notification. It works; however, the message received on Slack does not contain the name of the dimension that triggered the alert.

How to include the dimension name that triggered the alert in the Slack notification?