How to print the dimension name in the alert notification?



  • I would like to find out which application is triggering the alert, but I could not find a way to print the dimension name in the alert notification. The following is my cpu.conf:

    template: 1min_appcpu_usage
    on: apps.cpu
    os: linux
    hosts: *
    lookup: average -1m unaligned of *
    units: %
    every: 10s
    warn: $this > (($status >= $WARNING) ? (190) : (400))
    crit: $this > (($status == $CRITICAL) ? (401) : (3200))
    delay: down 15m multiplier 1.5 max 1h
    info: cpu utilization for the last minute
    to: sysadmin

    Thanks in advance.



  • I also tried the dynamic alarm template setting, but no luck. The Netdata version is 1.23.2.



  • The dynamic alarm setting just does not work. See my configuration below:

    alarm: app_cpu
    on: apps.cpu
    #works, but it does not provide the dimension name in the alarm: lookup: average -30s percentage of S*
    #not working. The alarm has the dimension name, but the status showed uninit and removed, and no alarm: lookup: average -30s percentage foreach S*
    #not working. The alarm has dimension names, but the status showed uninit and removed, and no alarm: lookup: average -30s percentage foreach Spectre,Simvision
    #works, but it does not provide the dimension name in the alarm: lookup: average -30s percentage Spectre,Simvision
    lookup: average -30s percentage foreach *
    unit: %
    every: 3s
    warn: $this > 1
    crit: $this > 110
    to: sysadmin

    Goal: I expect the dimension name to appear somewhere in the alarm. Looking into this dynamic alarm takes too much time, so for the time being I just need to configure an alarm for each app that is defined in apps_groups.conf.
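
The per-app workaround described above can be generated instead of typed by hand. This is only a minimal sketch under several assumptions: the app names are passed in as a plain list (parsing apps_groups.conf is left out), the thresholds are the ones from the configuration above, `of <dimension>` is used to select a single dimension as in the first configuration, and the `app_cpu_<name>` alarm naming scheme is hypothetical.

```python
def render_app_alarms(apps, warn=1, crit=110):
    """Return health-config text with one alarm stanza per app name."""
    stanzas = []
    for app in apps:
        stanzas.append(
            "\n".join([
                f"alarm: app_cpu_{app}",  # hypothetical naming scheme
                "on: apps.cpu",
                # One dimension per alarm, so the alarm name identifies the app.
                f"lookup: average -30s percentage of {app}",
                "units: %",
                "every: 3s",
                f"warn: $this > {warn}",
                f"crit: $this > {crit}",
                "to: sysadmin",
            ])
        )
    # Blank line between stanzas, trailing newline at the end of the file.
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(render_app_alarms(["Spectre", "Simvision"]))
```

The output could be redirected into a health configuration file; because each alarm covers exactly one dimension, the app name is carried by the alarm name in the notification.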



  • So basically the problem is that you get the alarm, but you don’t know which app triggers it because the app name doesn’t show up? I thought it would pop up under db lookup in the alarm, but I will have to double check it’s working as expected.



  • That’s correct. I don’t see the app name in either the email or the alarm in the web UI. For the time being, I just manually configured each of the apps defined in apps_groups.conf. That works, but it is troublesome.


  • Staff

    #not working. The alarm has dimension names, but the status showed uninit and removed, and no alarm: lookup: average -30s percentage foreach Spectre,Simvision

    This one should be working. In fact, “foreach” is the only use case where it makes sense to have the specific dimension that triggered the alarm inside the notification. No other alarm configuration has that ambiguity.

    So getting it uninitialized sounds like a bug, and we also need to add something new so that the dimension that triggers the alarm appears in the notifications. See https://learn.netdata.cloud/docs/agent/health/notifications/custom for the variables that are currently sent to the notifications script; I suggest we add the dimension name to ${info} and/or ${alarm}.


  • Staff

    Hi everybody,

    I tested your alarm with basically two changes in the lookup. The first change was:

    lookup: average -30s percentage foreach *

    All alarms were created, according to the API (localhost:19999/api/v1/alarms?all). I am copying only a few of them to keep my answer short:

    “apps.cpu.1min_appcpu_usage_nfs”
    “apps.cpu.1min_appcpu_usage_httpd”
    “apps.cpu.1min_appcpu_usage_sql”
    “apps.cpu.1min_appcpu_usage_email”
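
Judging from the keys listed above, a foreach alarm appears to be named `<chart>.<alarm name>_<dimension>`. Assuming that pattern holds (it is inferred from this output, not taken from the documentation), the dimension behind each alarm can be recovered from its key. The helper name below is hypothetical.

```python
def foreach_dimensions(alarm_keys, chart, alarm_name):
    """Map each matching alarm key to the dimension suffix it was created for."""
    prefix = f"{chart}.{alarm_name}_"
    return {
        key: key[len(prefix):]
        for key in alarm_keys
        # Skip unrelated alarms and keys that have no suffix after the prefix.
        if key.startswith(prefix) and len(key) > len(prefix)
    }


if __name__ == "__main__":
    keys = [
        "apps.cpu.1min_appcpu_usage_nfs",
        "apps.cpu.1min_appcpu_usage_httpd",
        "apps.cpu.1min_appcpu_usage_sql",
        "apps.cpu.1min_appcpu_usage_email",
    ]
    for key, dim in foreach_dimensions(keys, "apps.cpu", "1min_appcpu_usage").items():
        print(f"{key} -> {dim}")
```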

    After this I changed the lookup line to:

    lookup: average -30s percentage foreach nfs, email

    After executing the request again, I observed that I also had the expected dimensions.
    Finally, I extended the test to 4 dimensions:

    lookup: average -30s percentage foreach nfs, email, ssh, kernel

    And again I got all alarms.

    I would like to call attention to the fact that if a dimension was not created, its alarm cannot be created either.

    Considering the tests that I made on my Netdata, I have three questions for you:

    1 - Are Spectre and Simvision the exact names that you defined inside your apps_groups.conf?
    2 - Do you see these dimensions when you request http://localhost:19999/api/v1/data?chart=apps.cpu?
    3 - How did you install your Netdata?
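
The first two questions can also be checked with a short script. This is a rough sketch, assuming the data endpoint returns a JSON object whose "labels" list starts with the timestamp column followed by the dimension names, and that the comparison should be case-sensitive. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def missing_dimensions(labels, wanted):
    """Return the names in `wanted` that are not among the chart's dimensions."""
    present = set(labels[1:])  # labels[0] is assumed to be the timestamp column
    return [name for name in wanted if name not in present]


def fetch_labels(base_url="http://localhost:19999", chart="apps.cpu"):
    """Fetch the label list for a chart from a running agent."""
    with urlopen(f"{base_url}/api/v1/data?chart={chart}") as resp:
        return json.load(resp)["labels"]


if __name__ == "__main__":
    missing = missing_dimensions(fetch_labels(), ["Spectre", "Simvision"])
    print("missing dimensions:", missing or "none")
```

An empty result means every name used in the foreach lookup exists as a dimension on the chart, so a missing alarm would have to be explained by something else.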

    Best regards!



  • Answers to your questions:

    1. Yes
    2. Yes.
      {
      "labels": ["time", "netdata", "email", "wifi", "logs", "nfs", "ssh", "time", "cron", "X", "ksmd", "system", "kernel", "other", "Simvision", "Virtuoso", "CFEngine", "Spectre", "Editors"],
      "data":
      [
      [ 1597798483, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6.0001, 0, 0, 0, 284.0053, 0],
      [ 1597798482, 1.8902, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.8368, 0, 0, 0, 299.583, 0],
      [ 1597798481, 1.997, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 300.705, 0],
      [ 1597798480, 1.001, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 299.305, 0],
      [ 1597798479, 3.996, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 299.7063, 0],
    3. Installed from EPEL

    rpm -qa |grep netdata

    netdata-data-1.23.2-1.el7.noarch
    netdata-conf-1.23.2-1.el7.noarch
    netdata-1.23.2-1.el7.x86_64

    I checked the alarms via localhost:19999/api/v1/alarms?all:
    {
    "hostname": "XXXXXXXXXX",
    "latest_alarm_log_unique_id": 1597170798,
    "status": true,
    "now": 1597798827,
    "alarms": {
    }
    }

    No data is found in the alarms. Again, my conf file is the following:
    more APPS_CPU.conf
    alarm: appscpu1min
    on: apps.cpu
    os: linux
    hosts: *
    #lookup: average -10m percentage foreach Spectre
    #lookup: average -1m percentage foreach Spectre
    lookup: average -1m percentage foreach *
    unit: %

    every: 1m

    every: 10s
    warn: $this > (($status >= $WARNING) ? (180) : (190))
    crit: $this > (($status == $CRITICAL) ? (190) : (290))
    delay: down 15m multiplier 1.5 max 1h
    info: App CPU Usage above 190 as warning and 290 as critical for the last 10 minutes
    to: silent


  • Staff

    Thank you for the additional information, now I have a clear idea about what is happening.

    1. Installed from EPEL

    Official Netdata packages are not distributed through EPEL; we use another server, https://packagecloud.io/netdata/netdata, and you can find more information on our learning site: https://learn.netdata.cloud/docs/agent/packaging/installer/methods/packages.

    About the error you are having: I remembered today that when multihost was merged, we started to have problems with foreach alarms, but 8 days ago the PR https://github.com/netdata/netdata/pull/9712 was merged and the problem was fixed. If you install the nightly Netdata from RPMs, or compile Netdata using kickstart, your alarms will work as expected.

    We would like to apologize to you and all our users for this problem.

    Best regards!

    stelfrag created this issue in netdata/netdata

    closed Fixed issue with missing alarms #9712



  • Thanks for the update!

