Environment
We want to set up a fleet of devices running Netdata on Balena.
We currently have two issues.
Problem/Question
Issue 1:
On our deployment, we have a few containers including the netdata container. When we deploy to a new device, the cgroups are mostly not showing on the dashboard. When we do some reboots and restarts of the netdata container, they start to appear. But then they can disappear again after a reboot or container restart.
Issue 2:
The hostname shown at the top of the netdata dashboard is a random number on Balena. When the container restarts, a new number is generated. We know we can set it using the docker hostname argument, but we want to use Balena’s device name for each device. I tried to set it inside the launch.sh script before netdata starts, but that doesn’t seem to work.
I created a minimal deployment for testing on GitHub: mrtncls/netdata-on-balena-test
What I expected to happen
The cgroups should always be shown for each container.
The hostname should be the Balena device name.
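For Issue 2, one possible approach is to write the hostname into Netdata's configuration before the daemon starts. The sketch below is only an illustration under assumptions: `write_hostname_conf` is a hypothetical helper, the config path `/etc/netdata/netdata.conf` is assumed, and it relies on the Balena supervisor exposing the device name as `BALENA_DEVICE_NAME_AT_INIT` and on the `hostname` option in the `[global]` section of netdata.conf. Note that it overwrites the target file, so an existing config would need to be merged instead.

```shell
#!/bin/sh
# Hypothetical helper: write a [global] hostname override for Netdata.
#   $1 = hostname to use
#   $2 = path of the config file to create (assumed location, overwritten)
write_hostname_conf() {
    name="$1"
    conf="$2"
    # Refuse an empty name so we never write a blank hostname.
    [ -n "$name" ] || return 1
    printf '[global]\n    hostname = %s\n' "$name" > "$conf"
}

# Assumed usage in launch.sh, before netdata is started:
#   write_hostname_conf "${BALENA_DEVICE_NAME_AT_INIT:-}" /etc/netdata/netdata.conf
```

Because the variable holds the name as it was when the container started, a device renamed later would presumably keep the old hostname until the container restarts.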
cgroup discovery can take quite a long time (up to several minutes), especially when there are a lot of cgroups on the monitored host (the total number, not just the visible ones). There is also a timeout between discoveries and a limit on the total number of processed cgroups.
[plugin:cgroups]
check for new cgroups every = 10
max cgroups to allow = 1000
Try refreshing the dashboard instead of restarting the Netdata container.
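To get a rough idea of whether the host is anywhere near the `max cgroups to allow` limit, the cgroup directories can be counted. This is only a sketch: `count_cgroup_dirs` is a hypothetical helper, the path `/host/sys/fs/cgroup` is an assumption about where the host's /sys is mounted inside the container, and the result is an estimate rather than the exact number Netdata processes (on cgroup v1 each controller has its own hierarchy).

```shell
#!/bin/sh
# Hypothetical helper: count directories under a cgroup root.
#   $1 = root directory to scan (the root itself is included in the count)
count_cgroup_dirs() {
    # tr strips the padding some wc implementations add.
    find "$1" -type d | wc -l | tr -d ' '
}

# Assumed usage inside the netdata container:
#   count_cgroup_dirs /host/sys/fs/cgroup
```

A count far below 1000 would suggest the limit is not what hides the containers.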
The container has now been running for 25 minutes.
I did a Ctrl+F5 refresh in Chrome, but there is still no data.
The times it worked after reboot, the cgroups were almost immediately available.
The log shows the plugin started but nothing else related to cgroups.
Can I activate verbose logging for this plugin?
What log should I look for?
2022-01-27 16:25:23: netdata INFO : PLUGIN[tc] : thread created with task id 223
2022-01-27 16:25:23: netdata INFO : PLUGIN[tc] : set name of thread 223 to PLUGIN[tc]
2022-01-27 16:25:23: netdata INFO : PLUGIN[cgroups] : thread created with task id 222
2022-01-27 16:25:23: netdata INFO : PLUGIN[cgroups] : set name of thread 222 to PLUGIN[cgroups]
2022-01-27 16:25:23: netdata INFO : ACLK_Main : thread created with task id 226
2022-01-27 16:25:23: netdata INFO : ACLK_Main : set name of thread 226 to ACLK_Main
2022-01-27 16:25:23: netdata INFO : ACLK_Main : Starting ACLK-NG
...
2022-01-27 16:50:05: 80: 229 '[52.4.252.97]:56820' 'DATA' (sent/all = 127/130 bytes -2%, prep/sent/total = 0.06/0.07/0.13 ms) 200 '//api/v1/alarms'
2022-01-27 16:50:16: 80: 229 '[52.4.252.97]:56820' 'DATA' (sent/all = 126/130 bytes -3%, prep/sent/total = 0.04/0.06/0.10 ms) 200 '//api/v1/alarms'
2022-01-27 16:50:27: 80: 229 '[52.4.252.97]:56820' 'DATA' (sent/all = 127/130 bytes -2%, prep/sent/total = 0.10/0.08/0.18 ms) 200 '//api/v1/alarms'
2022-01-27 16:50:28: 82: 229 '[localhost]:51792' 'CONNECTED'
2022-01-27 16:50:28: 82: 229 '[localhost]:51792' 'DISCONNECTED'
2022-01-27 16:50:28: 82: 229 '[localhost]:51792' 'DATA' (sent/all = 4659/4659 bytes -0%, prep/sent/total = 0.30/0.44/0.74 ms) 200 '/api/v1/info'
2022-01-27 16:50:38: 80: 229 '[52.4.252.97]:56820' 'DATA' (sent/all = 126/130 bytes -3%, prep/sent/total = 0.06/0.05/0.11 ms) 200 '//api/v1/alarms'
2022-01-27 16:50:49: 80: 229 '[52.4.252.97]:56820' 'DATA' (sent/all = 127/130 bytes -2%, prep/sent/total = 0.04/0.06/0.10 ms) 200 '//api/v1/alarms'
2022-01-27 16:51:00: 80: 229 '[52.4.252.97]:56820' 'DATA' (sent/all = 126/130 bytes -3%, prep/sent/total = 0.06/0.06/0.12 ms) 200 '//api/v1/alarms'
2022-01-27 16:51:03: 81: 229 '[52.4.252.97]:56866' 'DISCONNECTED'
2022-01-27 16:51:03: 79: 229 '[52.4.252.97]:56800' 'DISCONNECTED'
2022-01-27 16:51:03: 78: 229 '[52.4.252.97]:56794' 'DISCONNECTED'
2022-01-27 16:51:03: 77: 229 '[52.4.252.97]:56782' 'DISCONNECTED'
2022-01-27 16:51:03: 68: 229 '[52.4.252.97]:59864' 'DISCONNECTED'
Well, there is nothing helpful in the log you sent. Please do grep cgroup. Are there any lines with cgroup-name.sh when cgroups are not available in the dashboard? Any information about the thread being stopped?
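The check asked for above can be sketched as a small filter that counts matching log lines. `count_log_matches` is a hypothetical helper name; it reads the log on stdin and matches the string literally.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that contain a fixed string.
#   $1 = literal string to look for
count_log_matches() {
    # -F matches literally; grep -c exits non-zero on zero matches,
    # so "|| true" keeps the helper from failing in that case.
    grep -c -F -- "$1" || true
}

# Usage with the container name from this thread:
#   balena logs netdata_4487554_2052594 2>&1 | count_log_matches 'cgroup-name.sh'
```

A result of 0 while the dashboard shows no containers would mean the naming script was never invoked for any cgroup.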
The cgroups thread keeps running, but it logs nothing.
When the cgroups are not available, I don’t see any cgroup-name.sh lines in the log.
root@289f739:~# balena logs netdata_4487554_2052594 2>&1 | grep cgroup
2022-01-28 07:45:32: netdata INFO : PLUGIN[cgroups] : thread created with task id 220
2022-01-28 07:45:32: netdata INFO : PLUGIN[cgroups] : set name of thread 220 to PLUGIN[cgroups]
root@289f739:~# balena logs netdata_4487554_2052594 2>&1 --tail 4
2022-01-28 08:03:46: 35: 227 '[52.4.252.97]:53968' 'DATA' (sent/all = 1059/6381 bytes -83%, prep/sent/total = 0.41/4.52/4.93 ms) 200 '/api/v1/data'
2022-01-28 08:03:46: 27: 227 '[52.4.252.97]:34794' 'DATA' (sent/all = 128/130 bytes -2%, prep/sent/total = 0.07/0.95/1.02 ms) 200 '//api/v1/alarms'
2022-01-28 08:03:57: 27: 227 '[52.4.252.97]:34794' 'DATA' (sent/all = 127/130 bytes -2%, prep/sent/total = 0.04/0.06/0.10 ms) 200 '//api/v1/alarms'
2022-01-28 08:04:08: 27: 227 '[52.4.252.97]:34794' 'DATA' (sent/all = 127/130 bytes -2%, prep/sent/total = 0.04/0.05/0.09 ms) 200 '//api/v1/alarms'
root@289f739:~# ps aux | grep /netdata
201 1431 1.3 2.2 112236 45668 pts/0 SNsl+ 07:45 0:16 /usr/sbin/netdata -u netdata -D -s /host -p 19999 -W set web web files group root -W set web web files owner root
201 1627 0.0 0.3 30356 7356 pts/0 SNl+ 07:45 0:00 /usr/sbin/netdata --special-spawn-server
201 2021 0.0 0.0 2440 1880 pts/0 SN+ 07:45 0:00 bash /usr/libexec/netdata/plugins.d/tc-qos-helper.sh 1
root 2026 0.2 0.1 54588 3124 pts/0 SN+ 07:45 0:03 /usr/libexec/netdata/plugins.d/apps.plugin 1
201 2029 0.0 1.2 731336 24652 pts/0 SNl+ 07:45 0:00 /usr/libexec/netdata/plugins.d/go.d.plugin 1
201 2030 0.1 1.4 33872 29676 pts/0 SNl+ 07:45 0:01 /usr/bin/python3 /usr/libexec/netdata/plugins.d/python.d.plugin 1
root 7392 0.0 0.0 4236 1060 pts/1 S+ 08:04 0:00 grep /netdata
root@289f739:~# ps -T -p 1431
PID SPID TTY TIME CMD
1431 1431 pts/0 00:00:08 netdata
1431 1626 pts/0 00:00:00 netdata
1431 1983 pts/0 00:00:00 netdata
1431 1986 pts/0 00:00:00 GLOBAL_STATS
1431 1987 pts/0 00:00:01 PLUGIN[proc]
1431 1988 pts/0 00:00:00 PLUGIN[diskspac
1431 1989 pts/0 00:00:00 PLUGIN[timex]
1431 1990 pts/0 00:00:00 PLUGIN[cgroups]
1431 1991 pts/0 00:00:00 PLUGIN[tc]
1431 1992 pts/0 00:00:03 PLUGIN[idlejitt
1431 1993 pts/0 00:00:00 STATSD
1431 1994 pts/0 00:00:00 ACLK_Main
1431 1997 pts/0 00:00:00 WEB_SERVER[stat
1431 1998 pts/0 00:00:00 PLUGINSD
1431 1999 pts/0 00:00:00 HEALTH
1431 2000 pts/0 00:00:00 SERVICE
1431 2001 pts/0 00:00:00 netdata
1431 2003 pts/0 00:00:01 PLUGINSD[apps]
1431 2006 pts/0 00:00:00 PLUGINSD[python
1431 2007 pts/0 00:00:00 PLUGINSD[go.d]
1431 2018 pts/0 00:00:00 STATSD_COLLECTO
1431 6833 pts/0 00:00:00 netdata
1431 6834 pts/0 00:00:00 netdata
1431 6835 pts/0 00:00:00 netdata
1431 6836 pts/0 00:00:00 netdata
root@289f739:~# cat /proc/1431/task/1990/cmdline
/usr/sbin/netdata-unetdata-D-s/host-p19999-Wsetwebweb files grouproot-Wsetwebweb files ownerroot