Need some help with cgroups configuration for qemu VMs

ovi · April 12, 2021, 11:22am

Hi there,

I have installed netdata on a host where I run plenty of docker containers and out of the box I can see stats for each container:

ovi · April 12, 2021, 11:23am

As a new user I can only add one image per post hence my reply to my own topic.

Next I installed netdata on a host where I run KVMs (using Proxmox, which is based on qemu), but here I don’t see the stats per KVM but rather all together:

I found the instructions a bit beyond my understanding, maybe someone can help me configure netdata so that the resources used are grouped by VM?
=> cgroups.plugin | Learn Netdata

vlvkobal · April 12, 2021, 11:39am

I don’t know if it will help but try to tweak

[plugin:cgroups]
	enable by default cgroups matching =  !*/init.scope  !/system.slice/run-*.scope  *.scope  /machine.slice/*.service  /kubepods/pod*/*  /kubepods/*/pod*/*  !/kubepods*  !*/vcpu*  !*/emulator  !*.mount  !*.partition  !*.service  !*.socket  !*.slice  !*.swap  !*.user  !/  !/docker  !/libvirt  !/lxc  !/lxc/*/*  !/lxc.monitor*  !/lxc.pivot  !/lxc.payload  !/machine  !/qemu  !/system  !/systemd  !/user  * 
	search for cgroups in subpaths matching =  !*/init.scope  !*-qemu  !*.libvirt-qemu  !/init.scope  !/system  !/systemd  !/user  !/user.slice  !/lxc/*/*  !/lxc.monitor  !/lxc.payload/*/*  !/lxc.payload.*  *

in netdata.conf

OdysLam · April 12, 2021, 11:41am

Thanks @vlvkobal, this is the answer. Try to remove the !*-qemu bit in the configuration and then restart Netdata.

ilyam8 · April 12, 2021, 12:31pm

but here I don’t see the stats per KVM but rather all together

@vlvkobal your fix filters out all KVM/qemu stats, doesn’t it?

I found this issue: QEMU metrics incorrect and crashing dashboard · Issue #9254 · netdata/netdata · GitHub

ovi · April 12, 2021, 12:35pm

Thanks everyone. I removed the “!*-qemu” bit in both lines (enable by default cgroups matching = and search for cgroups in subpaths matching =) and restarted netdata.

I am not accessing netdata locally - only through netdata cloud. A few refreshes later I still see:

vlvkobal · April 12, 2021, 12:42pm

That isn’t a fix, it’s just our default config. I don’t know what the structure of containers in QEMU is, which is why I suggested tweaking the configuration parameters rather than providing a more specific suggestion.

ovi · April 12, 2021, 12:45pm

Thanks for your pointer, I understand. I had hoped someone would be using qemu and netdata and might be able to provide more help.

Let’s give it a few days; maybe someone will spot this thread.

ilyam8 · April 12, 2021, 12:52pm

This is the cgroup directory structure on my server:

[ilyam@pc ~]$ tree -d -L 5 /sys/fs/cgroup/cpu/
/sys/fs/cgroup/cpu/
├── dev-hugepages.mount
├── dev-mqueue.mount
├── docker
│   └── d9484ad246a254c96d52f9e3a0b414a80026bf5a17c560cc648d640cd9e98793
├── init.scope
├── -.mount
├── proc-sys-fs-binfmt_misc.mount
├── sys-fs-fuse-connections.mount
├── sys-kernel-config.mount
├── sys-kernel-debug.mount
├── sys-kernel-tracing.mount
├── system.slice
│   ├── apparmor.service
│   ├── avahi-daemon.service
│   ├── avahi-daemon.socket
│   ├── boot-efi.mount
│   ├── cronie.service
│   ├── dbus.service
│   ├── dbus.socket
│   ├── dm-event.socket
│   ├── docker.service
│   ├── docker.socket
│   ├── kmod-static-nodes.service
│   ├── lm_sensors.service
│   ├── lvm2-lvmpolld.socket
│   ├── lvm2-monitor.service
│   ├── ModemManager.service
│   ├── NetworkManager.service
│   ├── NetworkManager-wait-online.service
│   ├── ntpd.service
│   ├── polkit.service
│   ├── rtkit-daemon.service
│   ├── run-user-1000-gvfs.mount
│   ├── run-user-1000.mount
│   ├── sddm.service
│   ├── smartd.service
│   ├── snapd.apparmor.service
│   ├── snapd.socket
│   ├── sshd.service
│   ├── systemd-binfmt.service
│   ├── systemd-coredump.socket
│   ├── systemd-journald-audit.socket
│   ├── systemd-journald-dev-log.socket
│   ├── systemd-journald.service
│   ├── systemd-journald.socket
│   ├── systemd-journal-flush.service
│   ├── systemd-logind.service
│   ├── systemd-modules-load.service
│   ├── systemd-random-seed.service
│   ├── systemd-remount-fs.service
│   ├── systemd-rfkill.socket
│   ├── systemd-sysctl.service
│   ├── systemd-tmpfiles-setup-dev.service
│   ├── systemd-tmpfiles-setup.service
│   ├── systemd-udevd-control.socket
│   ├── systemd-udevd-kernel.socket
│   ├── systemd-udevd.service
│   ├── systemd-udev-trigger.service
│   ├── systemd-update-utmp.service
│   ├── systemd-user-sessions.service
│   ├── system-getty.slice
│   ├── system-modprobe.slice
│   ├── system-systemd\x2dcoredump.slice
│   ├── system-systemd\x2dfsck.slice
│   ├── tlp.service
│   ├── tmp.mount
│   ├── udisks2.service
│   ├── upower.service
│   └── wpa_supplicant.service
└── user.slice

70 directories

I have one docker container running and we can see it

├── docker
│   └── d9484ad246a254c96d52f9e3a0b414a80026bf5a17c560cc648d640cd9e98793

@ovi let’s see what you have

ovi · April 12, 2021, 1:05pm

Thanks!
I cut the output of that command, as the qemu part is at the top. That should give you enough info, or do you need the rest?

tree -d -L 5 /sys/fs/cgroup/cpu/

/sys/fs/cgroup/cpu/
├── init.scope
├── qemu.slice
│   ├── 100.scope
│   ├── 101.scope
│   └── 102.scope
├── system.slice
│   ├── apparmor.service
│   ├── blk-availability.service
│   ├── console-setup.service
│   ├── cron.service

ovi · April 12, 2021, 1:08pm

Just to clarify, I expected to see stats for the 3 VMs grouped separately underneath the qemu menu in netdata, just like it is shown for each docker container.
(Just making sure I explained things properly in my first two posts.)

This is one sample container:

ovi · April 12, 2021, 1:10pm

The qemu stats are all together, not grouped by “slice”:

ilyam8 · April 12, 2021, 1:22pm

@ovi

Please show the output of ls -l /etc/pve/qemu-server/

We expect to see /etc/pve/qemu-server/100.conf (and 101.conf, 102.conf) in there. We also need the content of /etc/pve/qemu-server/100.conf to identify the problem, so please share it.
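The check requested here can also be scripted. The following is a minimal sketch, assuming the layout from ovi's tree output (one <VMID>.scope directory per VM under qemu.slice) and one <VMID>.conf per VM in the Proxmox directory. The function name list_vm_configs and its two directory arguments are made up for illustration and are not part of Netdata or Proxmox.

```shell
#!/bin/sh
# Hypothetical helper: for each <VMID>.scope under a qemu.slice cgroup
# directory, report whether the matching <VMID>.conf is readable by the
# user running this script.
list_vm_configs() {
    slice_dir=$1   # e.g. /sys/fs/cgroup/cpu/qemu.slice
    conf_dir=$2    # e.g. /etc/pve/qemu-server
    for scope in "$slice_dir"/*.scope; do
        [ -d "$scope" ] || continue          # skip if the glob matched nothing
        vmid=$(basename "$scope" .scope)     # 100.scope -> 100
        conf="$conf_dir/$vmid.conf"
        if [ -r "$conf" ]; then
            echo "$vmid readable $conf"
        elif [ -e "$conf" ]; then
            echo "$vmid unreadable $conf"
        else
            echo "$vmid missing $conf"
        fi
    done
}
```

It would be called as list_vm_configs /sys/fs/cgroup/cpu/qemu.slice /etc/pve/qemu-server. The result depends on who runs it: a user who cannot enter the config directory at all will see "missing" rather than "unreadable", because the file is not even visible to that user.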

ovi · April 12, 2021, 1:25pm

ls -l /etc/pve/qemu-server/

total 2
-rw-r----- 1 root www-data  396 Apr 12 05:51 100.conf
-rw-r----- 1 root www-data 1012 Apr 12 08:50 101.conf
-rw-r----- 1 root www-data  382 Apr 12 13:11 102.conf


cat /etc/pve/qemu-server/100.conf

#Ubuntu 20.4
#mailcow
agent: 1
boot: order=ide2;scsi0;net0
cores: 2
ide2: none,media=cdrom
memory: 9216
name: mail
net0: virtio=CA:18:5B:74:78:78,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-100-disk-0,discard=on,size=128G
scsihw: virtio-scsi-pci
smbios1: uuid=34fdaae9-f689-4903-8f17-f683578ee6bb
sockets: 1
startup: up=60,down=60
vmgenid: f8209b9e-fc9e-4a2f-9c51-bb5995a2c5fb

ilyam8 · April 12, 2021, 1:27pm

We suspect that the problem is the cgroup name resolution script.

A quick fix is to disable name resolution for qemu cgroups.

Add this line to the [plugin:cgroups] section of netdata.conf:

run script to rename cgroups matching =  !/  !*.mount  !*.socket  !*.partition  /machine.slice/*.service  !*.service  !*.slice  !*.swap  !*.user  !init.scope  !*.scope/vcpu*  !*.scope/emulator  *.scope  *docker*  *lxc*  !*qemu*  *kubepods*  *.libvirt-qemu  *

Don’t forget to restart netdata service after the changes.

ovi · April 12, 2021, 1:38pm

In my /etc/netdata/netdata.conf I already had that line, but commented out. I uncommented it and compared it to yours: they are identical, but there are still no changes.

Just to summarize:

In these two lines I have removed the “!*-qemu” bits

enable by default cgroups matching
search for cgroups in subpaths matching

while this line:
run script to rename cgroups matching

still contains it.

ilyam8 · April 12, 2021, 1:40pm

The default run script to rename cgroups matching contains *qemu*; I changed it to !*qemu*.
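Netdata evaluates these space-separated lists as simple patterns: entries are tried left to right, * is a wildcard, a leading ! negates, and the first entry that matches decides the result. The sketch below imitates that rule with shell globs so the suggested line can be checked by hand against one of ovi's cgroup paths. It is an illustration, not Netdata's own code: the function name simple_pattern_match is made up, it assumes the plugin matches the full cgroup path (such as /qemu.slice/100.scope), and shell globs also treat ? and [ as special, which does not matter for the patterns used here.

```shell
#!/bin/sh
# Sketch of first-match-wins evaluation for a Netdata-style simple pattern
# list. Returns 0 if the first matching pattern is positive, 1 if it is
# negated with "!" or if nothing matches.
simple_pattern_match() {
    patterns=$1
    name=$2
    set -f                       # keep "*" literal while splitting the list
    for p in $patterns; do
        negate=0
        case $p in
            '!'*) negate=1; p=${p#?} ;;   # strip the leading "!"
        esac
        # Unquoted $p is used as a glob here; "*" also matches "/".
        case $name in
            $p) set +f; return "$negate" ;;
        esac
    done
    set +f
    return 1
}

# The rename list suggested in this thread (copied from the post above).
suggested='!/ !*.mount !*.socket !*.partition /machine.slice/*.service !*.service !*.slice !*.swap !*.user !init.scope !*.scope/vcpu* !*.scope/emulator *.scope *docker* *lxc* !*qemu* *kubepods* *.libvirt-qemu *'

if simple_pattern_match "$suggested" /qemu.slice/100.scope; then
    echo "rename script would still run"
else
    echo "rename script skipped"
fi
```

Under these assumptions the path is accepted by *.scope before !*qemu* is ever reached, so the rename script would still run for the Proxmox VMs. For the negation to take effect in this sketch, !*qemu* would have to appear earlier in the list than *.scope.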

ovi · April 12, 2021, 1:44pm

Sorry, my mistake, thanks for spotting it. But even with that change I see no result.

ilyam8 · April 12, 2021, 1:45pm

Just to be sure, you’ve restarted the netdata service after the changes?

Anyway, let’s check logs

grep -i cgroup-name error.log

ovi · April 12, 2021, 1:54pm

Yes I did, after every change.

Thanks for the pointer with the netdata logs, I guess now all is clear:

/var/log/netdata# grep -i cgroup-name error.log
2021-04-12 13:05:24: cgroup-name.sh: ERROR: proxmox config file missing /etc/pve/qemu-server/100.conf or netdata does not have read access.  Please ensure netdata is a member of www-data group.

Fixed by adding the netdata user to the www-data group:
adduser netdata www-data

check:
groups netdata
netdata : netdata adm proxy www-data ceph

restarted netdata and it works:

Do you reckon I still need all those edits we made during this thread except for adding netdata to the www-data group?
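For anyone retracing the fix: the error message shows that cgroup-name.sh needs to read /etc/pve/qemu-server/<VMID>.conf, presumably to pick up the name: field (name: mail in the 100.conf shown earlier). The sketch below is a rough stand-in for that lookup, not the script's actual code. The function name vm_name_from_conf is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "name:" line in a
# Proxmox VM config, or fail if the file cannot be read by the current user.
vm_name_from_conf() {
    conf=$1
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    # Split each line on ": " and print the second field of the first
    # line whose key is exactly "name".
    awk -F': ' '$1 == "name" { print $2; exit }' "$conf"
}
```

Run as the netdata user (for instance through sudo -u netdata), the read check would fail before the group change and pass after it. Group membership is only picked up by processes started after the change, which is why restarting netdata is part of the fix.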
