Hi there,
we want to fully disable the cgroup_ram_in_use alert for promtail running through Docker Swarm.
How do we achieve this? We have made several attempts but couldn’t find a way that works.
Here is the most recent config we tried:
/etc/netdata/health.d/cgroup_mem.conf
template: cgroup_ram_in_use
on: cgroup.mem_usage
class: Utilization
type: Cgroups
component: Memory
host labels: _os=linux
chart labels: cgroup_name=!*promtail* *
calc: ($ram) * 100 / $memory_limit
units: %
every: 10s
warn: $this > (($status >= $WARNING) ? (80) : (90))
crit: $this > (($status == $CRITICAL) ? (90) : (98))
delay: down 15m multiplier 1.50 max 1h
summary: Cgroup ${label:cgroup_name} memory utilization
info: Cgroup ${label:cgroup_name} memory utilization
to: silent
car12o
2
Hi @Slind14,
If you want to disable it for all the nodes, you can just comment out the whole block.
Another option is to filter by host labels.
Disable on all OSes, i.e. completely disable:
host labels: !_os=*
Disable on a specific node:
host labels: !_hostname=my-node-hostname
Enable only on a specific node:
host labels: _hostname=my-node-hostname
Slind14
3
We want to disable it for a specific cgroup name. See the code above. This isn’t working though. It only works when applied through the UI.
car12o
4
Sorry, I didn’t catch that at first.
I tried it myself and was able to filter the alert out.
Initially, the alert was running.
Then I updated the config to exclude the cgroup by chart label.
I found the correct label to filter on by checking it in my node’s single-node view.
Config:
template: cgroup_ram_in_use
on: cgroup.mem_usage
class: Utilization
type: Cgroups
component: Memory
host labels: _os=linux
chart labels: cgroup_name=!*data_master* *
calc: ($ram) * 100 / $memory_limit
units: %
every: 10s
warn: $this > (($status >= $WARNING) ? (80) : (90))
crit: $this > (($status == $CRITICAL) ? (90) : (98))
summary: Cgroup ${label:cgroup_name} memory utilization
info: Cgroup ${label:cgroup_name} memory utilization
to: sysadmin
After the config change, I restarted the agent and the alert was gone.
Bear in mind that on Cloud, alert removals take a while to propagate (max 10 min).
If you are still having issues after that period, let me know which agent version you are running.
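For anyone reading along: the chart labels line above relies on Netdata simple patterns, where the space-separated patterns are tried left to right, `*` is a wildcard, a leading `!` makes a pattern negative, and the first pattern that matches decides the result. The Python sketch below is not Netdata's code, only a rough re-implementation of that rule to show why `!*promtail* *` excludes promtail and keeps everything else. The function name `simple_pattern_matches` and the sample cgroup names are made up for illustration.

```python
import re


def simple_pattern_matches(patterns: str, value: str) -> bool:
    """Approximate Netdata simple-pattern matching (illustrative only)."""
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        # '*' is the only wildcard here; every other character is literal.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, value):
            # The first matching pattern decides: negative means "exclude".
            return not negative
    # Nothing matched at all, so the value is not selected.
    return False


PATTERNS = "!*promtail* *"  # same shape as the chart labels value in the config
for name in ("monitoring_promtail.1.abc", "web_nginx.1.xyz"):  # hypothetical names
    print(name, "->", simple_pattern_matches(PATTERNS, name))
```

Order matters in this model: with `* !*promtail*` the catch-all `*` would match first and the promtail cgroup would still be selected, which is why the negative pattern comes before the trailing `*`.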
Slind14
5
Where did you place the file with the single alert override?
car12o
6
Placed in:
/etc/netdata/health.d/cgroups.conf
Slind14
7
When placed there, does it override the other cgroup alerts or not?
car12o
8
No, it just overrides the alerts matching the template/alarm key.