Disable cgroup_ram_in_use alerts for a specific service

Hi there,

We want to fully disable the cgroup_ram_in_use alert for promtail running through Docker Swarm.

How do we achieve this? We made several attempts but couldn’t find an approach that works.

Here is the most recent config we tried:

/etc/netdata/health.d/cgroup_mem.conf

template: cgroup_ram_in_use
      on: cgroup.mem_usage
   class: Utilization
    type: Cgroups
component: Memory
host labels: _os=linux
chart labels: cgroup_name=!*promtail* *
    calc: ($ram) * 100 / $memory_limit
   units: %
   every: 10s
    warn: $this > (($status >= $WARNING) ? (80) : (90))
    crit: $this > (($status == $CRITICAL) ? (90) : (98))
   delay: down 15m multiplier 1.50 max 1h
 summary: Cgroup ${label:cgroup_name} memory utilization
    info: Cgroup ${label:cgroup_name} memory utilization
      to: silent

Hi @Slind14,
If you want to disable it for all nodes, you can just comment out the whole block.
Another option is to filter by host labels.

Disable on all OSes, i.e. disable it completely:

host labels: !_os=*

Disable on a specific node:

host labels: !_hostname=my-node-hostname

Enable only on a specific node:

host labels: _hostname=my-node-hostname

We want to disable it for a specific cgroup name. See the code above. This isn’t working though. It only works when applied through the UI.

Sorry, I didn’t catch that at first.

I tried it myself and was able to filter the alert out.

Initially, the alert is running.

Then I updated the config to exclude by chart label.
I found the correct label to filter on by checking it in my node’s single node view.

Config:

    template: cgroup_ram_in_use
          on: cgroup.mem_usage
       class: Utilization
        type: Cgroups
   component: Memory
 host labels: _os=linux
chart labels: cgroup_name=!*data_master* *
        calc: ($ram) * 100 / $memory_limit
       units: %
       every: 10s
        warn: $this > (($status >= $WARNING)  ? (80) : (90))
        crit: $this > (($status == $CRITICAL) ? (90) : (98))
     summary: Cgroup ${label:cgroup_name} memory utilization
        info: Cgroup ${label:cgroup_name} memory utilization
          to: sysadmin
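
As a rough illustration of why a value such as `!*data_master* *` (or `!*promtail* *`) excludes one cgroup and keeps the rest, here is a minimal Python sketch of how I understand Netdata's simple patterns to be evaluated: terms are space-separated, `*` is a wildcard, a leading `!` makes a term negative, and the first term that matches decides the result. The function name `simple_pattern_match` and the cgroup names are hypothetical, and this is not the agent's actual implementation.

```python
import re


def simple_pattern_match(pattern: str, value: str) -> bool:
    """Approximate simple-pattern matching (illustrative sketch only)."""
    for term in pattern.split():
        negative = term.startswith("!")
        if negative:
            term = term[1:]
        # Treat only '*' as a wildcard; every other character is literal.
        regex = ".*".join(re.escape(part) for part in term.split("*"))
        if re.fullmatch(regex, value):
            # The first matching term decides: negative terms exclude.
            return not negative
    # No term matched at all.
    return False


# Hypothetical cgroup names, for illustration only.
print(simple_pattern_match("!*promtail* *", "monitoring_promtail.1.abc"))  # False
print(simple_pattern_match("!*promtail* *", "data_master"))                # True
```

Under these assumptions, the trailing `*` matters: without it, no cgroup would match and the alert would not be created for anything.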

After the config change, I restarted the agent and the alert was gone.

Bear in mind that on Cloud, removed alerts can take a while to propagate (up to 10 minutes).

If you are still having issues after that period, let me know which agent version you are running.

Where did you place the file with the single alert override?

Placed in:

/etc/netdata/health.d/cgroups.conf

When placed there, does it override the other cgroup alerts or not?

No, it just overrides the alerts matching the template/alarm key.
