Hi there,
I have installed netdata on a host where I run plenty of docker containers and out of the box I can see stats for each container:
As a new user I can only add one image per post, hence my reply to my own topic.
Next I installed netdata on a host where I run KVMs (using Proxmox, which is based on qemu), but here I don't see the stats per KVM but rather all together:
I found the instructions a bit beyond my understanding, maybe someone can help me configure netdata so that the resources used are grouped by VM?
=> cgroups.plugin | Learn Netdata
I don't know if it will help but try to tweak
[plugin:cgroups]
enable by default cgroups matching = !*/init.scope !/system.slice/run-*.scope *.scope /machine.slice/*.service /kubepods/pod*/* /kubepods/*/pod*/* !/kubepods* !*/vcpu* !*/emulator !*.mount !*.partition !*.service !*.socket !*.slice !*.swap !*.user !/ !/docker !/libvirt !/lxc !/lxc/*/* !/lxc.monitor* !/lxc.pivot !/lxc.payload !/machine !/qemu !/system !/systemd !/user *
search for cgroups in subpaths matching = !*/init.scope !*-qemu !*.libvirt-qemu !/init.scope !/system !/systemd !/user !/user.slice !/lxc/*/* !/lxc.monitor !/lxc.payload/*/* !/lxc.payload.* *
in netdata.conf
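If it helps, this is roughly how I would open the config and apply the change (a sketch assuming the stock /etc/netdata layout and a systemd service; adjust the paths if your install differs):
cd /etc/netdata
sudo ./edit-config netdata.conf   # tweak the two lines under [plugin:cgroups]
sudo systemctl restart netdata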
Thanks @vlvkobal, this is the answer. Try to remove the !*-qemu
bit in the configuration and then restart Netdata.
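To find where that bit sits first, something like this should work (assuming the stock config path):
grep -n 'qemu' /etc/netdata/netdata.conf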
but here I don't see the stats per KVM but rather all together
@vlvkobal your fix filters out all KVM/qemu stats, doesn't it?
I found the issue QEMU metrics incorrect and crashing dashboard · Issue #9254 · netdata/netdata · GitHub
Thanks everyone. I removed the "!*-qemu" bit in both lines (enable by default cgroups matching = and search for cgroups in subpaths matching =) and restarted netdata.
I am not accessing netdata locally - only through netdata cloud. A few refreshes later I still see:
That isn't a fix, it's just our default config. I don't know what the structure of containers in QEMU is, which is why I suggested tweaking the configuration parameters rather than providing a more specific suggestion.
Thanks for your pointer, I understand. I had hoped someone would be using qemu and netdata and might be able to provide more help.
Let's give it a few days, maybe someone will spot this thread.
That is the cgroup directory structure on my server:
[ilyam@pc ~]$ tree -d -L 5 /sys/fs/cgroup/cpu/
/sys/fs/cgroup/cpu/
├── dev-hugepages.mount
├── dev-mqueue.mount
├── docker
│   └── d9484ad246a254c96d52f9e3a0b414a80026bf5a17c560cc648d640cd9e98793
├── init.scope
├── -.mount
├── proc-sys-fs-binfmt_misc.mount
├── sys-fs-fuse-connections.mount
├── sys-kernel-config.mount
├── sys-kernel-debug.mount
├── sys-kernel-tracing.mount
├── system.slice
│   ├── apparmor.service
│   ├── avahi-daemon.service
│   ├── avahi-daemon.socket
│   ├── boot-efi.mount
│   ├── cronie.service
│   ├── dbus.service
│   ├── dbus.socket
│   ├── dm-event.socket
│   ├── docker.service
│   ├── docker.socket
│   ├── kmod-static-nodes.service
│   ├── lm_sensors.service
│   ├── lvm2-lvmpolld.socket
│   ├── lvm2-monitor.service
│   ├── ModemManager.service
│   ├── NetworkManager.service
│   ├── NetworkManager-wait-online.service
│   ├── ntpd.service
│   ├── polkit.service
│   ├── rtkit-daemon.service
│   ├── run-user-1000-gvfs.mount
│   ├── run-user-1000.mount
│   ├── sddm.service
│   ├── smartd.service
│   ├── snapd.apparmor.service
│   ├── snapd.socket
│   ├── sshd.service
│   ├── systemd-binfmt.service
│   ├── systemd-coredump.socket
│   ├── systemd-journald-audit.socket
│   ├── systemd-journald-dev-log.socket
│   ├── systemd-journald.service
│   ├── systemd-journald.socket
│   ├── systemd-journal-flush.service
│   ├── systemd-logind.service
│   ├── systemd-modules-load.service
│   ├── systemd-random-seed.service
│   ├── systemd-remount-fs.service
│   ├── systemd-rfkill.socket
│   ├── systemd-sysctl.service
│   ├── systemd-tmpfiles-setup-dev.service
│   ├── systemd-tmpfiles-setup.service
│   ├── systemd-udevd-control.socket
│   ├── systemd-udevd-kernel.socket
│   ├── systemd-udevd.service
│   ├── systemd-udev-trigger.service
│   ├── systemd-update-utmp.service
│   ├── systemd-user-sessions.service
│   ├── system-getty.slice
│   ├── system-modprobe.slice
│   ├── system-systemd\x2dcoredump.slice
│   ├── system-systemd\x2dfsck.slice
│   ├── tlp.service
│   ├── tmp.mount
│   ├── udisks2.service
│   ├── upower.service
│   └── wpa_supplicant.service
└── user.slice
70 directories
I have one docker container running and we can see it:
├── docker
│   └── d9484ad246a254c96d52f9e3a0b414a80026bf5a17c560cc648d640cd9e98793
@ovi let's see what you have
Thanks!
I cut the output of that command as the qemu part is at the top. That should give you enough info, or do you need the rest?
tree -d -L 5 /sys/fs/cgroup/cpu/
/sys/fs/cgroup/cpu/
├── init.scope
├── qemu.slice
│   ├── 100.scope
│   ├── 101.scope
│   └── 102.scope
├── system.slice
│   ├── apparmor.service
│   ├── blk-availability.service
│   ├── console-setup.service
│   ├── cron.service
Just to clarify, I expected to see stats for the 3 VMs separately grouped underneath the qemu menu in netdata, just like it is shown for each docker container.
(Just making sure I properly explained things in my first two posts.)
This is one sample container:
The qemu stats are all together, not grouped by 'slice':
Show ls -l /etc/pve/qemu-server/
We expect to see /etc/pve/qemu-server/100.conf (101, 102) in there. And we need the content of /etc/pve/qemu-server/100.conf to identify the problem, please share.
ls -l /etc/pve/qemu-server/
total 2
-rw-r----- 1 root www-data 396 Apr 12 05:51 100.conf
-rw-r----- 1 root www-data 1012 Apr 12 08:50 101.conf
-rw-r----- 1 root www-data 382 Apr 12 13:11 102.conf
cat /etc/pve/qemu-server/100.conf
#Ubuntu 20.4
#mailcow
agent: 1
boot: order=ide2;scsi0;net0
cores: 2
ide2: none,media=cdrom
memory: 9216
name: mail
net0: virtio=CA:18:5B:74:78:78,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-100-disk-0,discard=on,size=128G
scsihw: virtio-scsi-pci
smbios1: uuid=34fdaae9-f689-4903-8f17-f683578ee6bb
sockets: 1
startup: up=60,down=60
vmgenid: f8209b9e-fc9e-4a2f-9c51-bb5995a2c5fb
We suspect that the problem is the cgroups name resolution script. A quick fix is to disable name resolution for qemu cgroups: add this line to netdata.conf, in the [plugin:cgroups] section:
run script to rename cgroups matching = !/ !*.mount !*.socket !*.partition /machine.slice/*.service !*.service !*.slice !*.swap !*.user !init.scope !*.scope/vcpu* !*.scope/emulator *.scope *docker* *lxc* !*qemu* *kubepods* *.libvirt-qemu *
Don't forget to restart the netdata service after the changes.
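On a systemd-based system (which Proxmox is) that's typically:
systemctl restart netdata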
In my /etc/netdata/netdata.conf I had that line already, but commented out. I un-commented it and compared it to yours - they are identical, but there are still no changes.
Just to summarize:
In these two lines I have removed the "!*-qemu" bits:
enable by default cgroups matching
search for cgroups in subpaths matching
while this line:
run script to rename cgroups matching
still contains it.
The default run script to rename cgroups matching contains *qemu*; I changed it to !*qemu*.
Sorry, my mistake, thanks for spotting it, but even with that change I see no result.
Just to be sure, you've restarted the netdata service after the changes?
Anyway, let's check the logs:
grep -i cgroup-name error.log
Yes I did, after every change.
Thanks for the pointer with the netdata logs, I guess now all is clear:
/var/log/netdata# grep -i cgroup-name error.log
2021-04-12 13:05:24: cgroup-name.sh: ERROR: proxmox config file missing /etc/pve/qemu-server/100.conf or netdata does not have read access. Please ensure netdata is a member of www-data group.
fixed by adding the user to the group:
adduser netdata www-data
check:
groups netdata
netdata : netdata adm proxy www-data ceph
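As an extra sanity check (assuming sudo is available; 100.conf taken from the listing above), reading a VM config as the netdata user should now work:
sudo -u netdata head -n 3 /etc/pve/qemu-server/100.conf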
restarted netdata and it works:
Do you reckon I still need all those edits we made during this thread, apart from adding netdata to the www-data group?