OOM Kills happening on host are shown in Netdata running inside an LXC container

Problem/Question

OOM kills are reported by Netdata (which is installed inside the container) even when the system (an LXC container) has a lot of RAM available, i.e. the container never reached its RAM limit.

We believe the Netdata instance inside the container is actually showing the OOM kills happening on the host.

Relevant docs you followed/actions you took to solve the issue

We checked the LXC container's RAM usage using htop. The RAM usage is very low, and the container should never run out of RAM, as the only thing running is Uptime Kuma (a lightweight website uptime monitor).

What I expected to happen

The OOM kills should not be shown as they are not happening inside the container.

Hi, @philip. Netdata gets the number of OOM kills from /proc/vmstat. The same is true for the other metrics we get from procfs (there are a lot). I am not sure how this can be fixed for LXC containers.
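To see the raw number being discussed, here is a minimal Python sketch that reads the `oom_kill` counter from /proc/vmstat. This is only an illustration and not Netdata's actual collector code; the function names are made up for this example, and it assumes the kernel exposes an `oom_kill` line in that file (it returns `None` otherwise).

```python
from pathlib import Path


def parse_counters(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "key <unsigned integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_oom_kills(path="/proc/vmstat"):
    """Return the oom_kill counter, or None if the file or field is missing."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return parse_counters(text).get("oom_kill")


if __name__ == "__main__":
    print("oom_kill:", read_oom_kills())
```

Running it on the host and inside the container and comparing the two values is one way to check whether the container is seeing the host's counter.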

Have you considered installing Netdata on the host system and monitoring VMs/containers with cgroups.plugin? It gathers CPU, memory, disk, and network statistics for every VM/container.

@ilyam8

Yes, we already do that.

We also want to monitor the LXC container from within, so we installed Netdata inside it, but it showed OOM kills and we wondered where those kills were coming from.

As I mentioned, the system metrics come from reading procfs. I checked on a Proxmox server and see that reading procfs from inside a container returns the host metrics, which is not the case for qemu/kvm VMs. I think this is by design (shared kernel) and not a bug. Netdata reports whatever the kernel reports (procfs).

@ilyam8 Yes, understood.

Thanks for your assistance.

Hello

@ilyam8

I just wanted to let you guys know that there’s a change in LXD that added a new metric for OOM kills. We thought it might be helpful for you.

Thanks, @philip. Netdata gathers container/VM metrics by reading the /sys/fs/cgroup/* directory (see cgroups.plugin). It is easy to add an oom_kills metric/chart, but in my experience it is not really useful: if the OOM killer kills the main process in the container (which is likely), that counter doesn’t get incremented.
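For anyone who wants to look at the per-container counter directly, here is a rough Python sketch. It assumes cgroup v2, where each cgroup directory has a `memory.events` file containing an `oom_kill` line. The function names and the example cgroup path are hypothetical, and this is not how cgroups.plugin is implemented.

```python
from pathlib import Path


def oom_kills_from_events(text):
    """Extract the oom_kill value from memory.events content, or None."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "oom_kill" and parts[1].isdigit():
            return int(parts[1])
    return None


def read_cgroup_oom_kills(cgroup_dir):
    """Read oom_kill from <cgroup_dir>/memory.events (cgroup v2 assumed)."""
    try:
        text = (Path(cgroup_dir) / "memory.events").read_text()
    except OSError:
        return None
    return oom_kills_from_events(text)


if __name__ == "__main__":
    # Hypothetical path; the real location depends on how the host
    # names the container's cgroup.
    print(read_cgroup_oom_kills("/sys/fs/cgroup/lxc.payload.mycontainer"))
```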