The OOM Killer (Out of Memory Killer) is a mechanism that the Linux kernel uses when the system is critically low on memory or a process has reached its memory limits. As the name suggests, its duty is to review all running processes and kill one or more of them in order to free up memory and keep the system running.
Linux kernel 4.19 introduced a cgroup-aware OOM killer implementation, which adds the ability to kill a cgroup as a single unit and to guarantee the integrity of the workload. In a nutshell, cgroups allow limiting memory, disk I/O, and network usage for a group of processes. Furthermore, cgroups may set usage quotas and prioritize a process group to receive more CPU time or memory than other groups. You can read more about cgroups in the cgroups man page.
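As a concrete sketch of such a limit (assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup, root privileges, and a hypothetical group name `demo`):

```shell
# Create a cgroup and cap its memory at 100 MiB (cgroup v2, run as root).
mkdir /sys/fs/cgroup/demo
echo 100M > /sys/fs/cgroup/demo/memory.max
# Move the current shell (and its future children) into the group.
echo $$ > /sys/fs/cgroup/demo/cgroup.procs
```

With memory.max set, when the group exceeds its limit the kernel reclaims and, if needed, OOM-kills within that group rather than endangering the rest of the system.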
The Netdata Agent monitors the number of Out Of Memory (OOM) kills in the last 30 minutes. Receiving this alert indicates that some processes were killed by the OOM Killer.
Troubleshoot issues in the OOM killer
The OOM Killer uses a heuristic to choose a process for termination. It is based on a score associated with each running process, calculated by the oom_badness() call inside the Linux kernel.
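You can read the badness score this heuristic currently assigns to any process directly from procfs; for example, for the current shell:

```shell
# oom_score is the kernel-computed badness value (higher = killed first);
# oom_score_adj is the user-tunable offset in the range -1000..1000.
cat /proc/self/oom_score
cat /proc/self/oom_score_adj
```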
To identify which processes were killed by the OOM Killer, inspect the logs:
root@netdata~ # dmesg -T | egrep -i 'killed process'
The system response looks similar to this:
Jan 7 07:12:33 mysql-server-01 kernel: Out of Memory: Killed process 3154 (mysqld).
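On systemd-based systems, the same kernel messages can also be pulled from the journal (assuming journald is configured to store kernel logs):

```shell
# -k limits output to kernel messages; --grep filters like egrep -i above.
journalctl -k --grep='killed process'
```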
To see the current oom_score (the priority in which the OOM Killer will act upon your processes), run the following script. The script prints all running processes (by PID and name) along with their likelihood of being killed by the OOM Killer (second column). The greater the oom_score (second column), the more likely the process is to be killed by the OOM Killer.
root@netdata~ # while read -r pid comm; do printf '%d\t%d\t%s\n' "$pid" "$(cat /proc/$pid/oom_score)" "$comm"; done < <(ps -e -o pid= -o comm=) | sort -k 2n
You can adjust the oom_score to protect processes, using the choom utility from the util-linux package:
root@netdata~ # choom -p PID -n number
Note: Setting an adjust score value of +500, for example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least 50% more memory. A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task’s allowed memory from being considered as scoring against the task.
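For example (a sketch, assuming PID 3154 is the mysqld process from the log above and that you run as root), to make it less likely to be chosen:

```shell
# Lower the adjust score by 500 (range is -1000..1000; a value of -1000
# exempts the process from OOM-killing entirely), then verify the change.
choom -p 3154 -n -500
cat /proc/3154/oom_score_adj
```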
Once the settings work for your case, make the change permanent: in the unit file of your service, under the [Service] section, set the OOMScoreAdjust= directive.
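A minimal sketch of the [Service] addition, assuming you settled on an adjust value of -500:

```ini
[Service]
OOMScoreAdjust=-500
```

After editing, reload and restart the service: systemctl daemon-reload && systemctl restart <service>.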
Check the per-process RAM usage to find the top consumers
To see which processes are the main RAM consumers, use the top utility. The %MEM column displays RAM consumption as a percentage.
root@netdata~ # top -b -o +%MEM | head -n 22
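If you prefer a one-shot, script-friendly listing, ps can produce the same ranking, sorted descending by %MEM:

```shell
# PID, command name, percent of RAM, and resident set size (KiB),
# sorted by memory usage: top 5 processes plus the header line.
ps -eo pid,comm,%mem,rss --sort=-%mem | head -n 6
```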
Close any of the main consumer processes. Netdata strongly suggests that you know exactly which processes you are closing and are certain that they are not necessary.
Add a temporary swap file
Keep in mind that this requires creating a swap file on one of your disks, and the performance of your system may be affected.
Decide where your swap file will live. It is strongly advised to allocate the swap file under the root directory. A swap file is like an extension of your RAM and should be protected, away from directories accessible to normal users. Run the following command:
root@netdata # dd if=/dev/zero of=<path_in_root> bs=1024 count=<size_in_KiB>
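On filesystems that support it (e.g. ext4, xfs), fallocate is a faster alternative to dd; a sketch assuming a 1 GiB file at the hypothetical path /swapfile:

```shell
# Allocate 1 GiB in one step instead of writing zeros block by block.
fallocate -l 1G /swapfile
```

Note that some filesystems and older kernels may not support swap files created this way; dd is the portable route.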
Grant root-only access to the swap file:
root@netdata # chmod 600 <path_to_the_swap_file_you_created>
Make it a Linux swap area:
root@netdata # mkswap <path_to_the_swap_file_you_created>
Enable the swap with the following command:
root@netdata # swapon <path_to_the_swap_file_you_created>
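You can confirm that the new swap area is active:

```shell
# List active swap devices/files and the overall memory + swap usage.
swapon --show
free -h
```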
If you plan to use it on a regular basis, you should update the /etc/fstab config. The entry you will add would look like:
/swap_file none swap sw 0 0
For more information, see the fstab man page.