Why is netdata reloading?

Hello,
We’ve noticed that some agents on the VMs we monitor with netdata seem to reload without any real indication as to why they are doing so.

As a side note, when the agents reload, the journal shows a small problem in the netdata service file, related to this pull request: Fix boolean value for ProtectControlGroups by didier13150 · Pull Request #11281 · netdata/netdata · GitHub
The issue is due to the fact that ProtectControlGroups expects ‘on’ or ‘yes’ but is configured with ‘true’. Not a problem since we can manually update the netdata.service file. OK.

My question is this: when reviewing the journal logs, we see that the netdata agent reloads without any apparent cause. We don’t see this behavior on many of the other nodes that we monitor. We would like to understand why the agents reload and what triggers the reload.

Our sample Journal log is below.

Thanks in advance!

[root@netdata-snmp-agent ~]# journalctl --since "2022-07-25 03:45" --until "2022-07-25 03:55"
-- Logs begin at Thu 2022-07-14 21:04:17 UTC, end at Mon 2022-07-25 13:55:13 UTC. --
Jul 25 03:50:53 netdata-snmp-agent systemd[1]: Reloading.
Jul 25 03:50:53 netdata-snmp-agent systemd[1]: [/usr/lib/systemd/system/netdata.service:50] Failed to parse capability in bounding/ambient set, ignoring: CAP_PERFMON
Jul 25 03:50:53 netdata-snmp-agent systemd[1]: [/usr/lib/systemd/system/netdata.service:69] Unknown lvalue 'ProtectControlGroups' in section 'Service'
Jul 25 03:50:53 netdata-snmp-agent systemd[1]: Reloading.
Jul 25 03:50:53 netdata-snmp-agent systemd[1]: [/usr/lib/systemd/system/netdata.service:50] Failed to parse capability in bounding/ambient set, ignoring: CAP_PERFMON
Jul 25 03:50:53 netdata-snmp-agent systemd[1]: [/usr/lib/systemd/system/netdata.service:69] Unknown lvalue 'ProtectControlGroups' in section 'Service'
Jul 25 03:50:53 netdata-snmp-agent systemd[1]: Stopping Real time performance monitoring...
Jul 25 03:50:56 netdata-snmp-agent systemd[1]: Stopped Real time performance monitoring.
Jul 25 03:50:56 netdata-snmp-agent systemd[1]: Starting Real time performance monitoring...
Jul 25 03:50:56 netdata-snmp-agent systemd[1]: Started Real time performance monitoring.
Jul 25 03:50:56 netdata-snmp-agent yum[8462]: Updated: netdata-1.35.0.211.nightly-1.el7.x86_64
Jul 25 03:50:56 netdata-snmp-agent systemd[1]: Reloading.
Jul 25 03:50:56 netdata-snmp-agent systemd[1]: [/usr/lib/systemd/system/netdata.service:50] Failed to parse capability in bounding/ambient set, ignoring: CAP_PERFMON
Jul 25 03:50:56 netdata-snmp-agent systemd[1]: [/usr/lib/systemd/system/netdata.service:69] Unknown lvalue 'ProtectControlGroups' in section 'Service'
Jul 25 03:50:56 netdata-snmp-agent systemd[1]: Stopping Real time performance monitoring...
Jul 25 03:50:58 netdata-snmp-agent ebpf.plugin[8698]: Does not have a configuration file inside `/etc/netdata/ebpf.d.conf. It will try to load stock file.
Jul 25 03:50:58 netdata-snmp-agent ebpf.plugin[8698]: Name resolution is disabled, collector will not parser "hostnames" list.
Jul 25 03:50:58 netdata-snmp-agent ebpf.plugin[8698]: The network value of CIDR 127.0.0.1/8 was updated for 127.0.0.0 .
Jul 25 03:50:58 netdata-snmp-agent ebpf.plugin[8698]: Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
Jul 25 03:50:58 netdata-snmp-agent ebpf.plugin[8698]: ebpf.plugin should either run as root (now running with uid 997, euid 997) or have special capabilities..
Jul 25 03:51:04 netdata-snmp-agent systemd[1]: Stopped Real time performance monitoring.
Jul 25 03:51:04 netdata-snmp-agent systemd[1]: Starting Real time performance monitoring...
Jul 25 03:51:04 netdata-snmp-agent systemd[1]: Started Real time performance monitoring.
Jul 25 03:51:05 netdata-snmp-agent run-parts(/etc/cron.daily)[9059]: finished netdata-updater
Jul 25 03:51:05 netdata-snmp-agent anacron[7905]: Job `cron.daily` terminated
Jul 25 03:51:05 netdata-snmp-agent anacron[7905]: Normal exit (1 job run)
Jul 25 03:51:05 netdata-snmp-agent systemd[1]: Removed slice User Slice of root.
Jul 25 03:51:06 netdata-snmp-agent ebpf.plugin[9103]: Does not have a configuration file inside `/etc/netdata/ebpf.d.conf. It will try to load stock file.
Jul 25 03:51:06 netdata-snmp-agent ebpf.plugin[9103]: Name resolution is disabled, collector will not parser "hostnames" list.
Jul 25 03:51:06 netdata-snmp-agent ebpf.plugin[9103]: The network value of CIDR 127.0.0.1/8 was updated for 127.0.0.0 .
Jul 25 03:51:07 netdata-snmp-agent ebpf.plugin[9103]: Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
Jul 25 03:51:07 netdata-snmp-agent ebpf.plugin[9103]: ebpf.plugin should either run as root (now running with uid 997, euid 997) or have special capabilities..
[root@netdata-snmp-agent ~]#

By default, the Netdata Agent auto-updates itself to stay on the latest nightly. On nodes that have nightly updates enabled, you will see it restart once per day. This behavior is fully configurable; see Installation guide | Learn Netdata
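
The journal excerpt above already contains the evidence for this: a `yum` line reporting the updated netdata package and a `cron.daily` line reporting that `netdata-updater` finished. As a rough sketch (not an official Netdata tool), a few lines of Python can pull those markers out of saved `journalctl` output. The function name `explain_restarts` and the event labels are made up for illustration, and the regular expression assumes the short syslog-style format shown in the excerpt.

```python
import re

# Assumes the short journalctl format: "Jul 25 03:50:56 host unit[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<unit>[^\[:]+)(?:\[(?P<pid>\d+)\])?:\s(?P<msg>.*)$"
)


def explain_restarts(lines):
    """Return (timestamp, label, message) tuples for update-related events."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # header lines, shell prompts, etc.
        unit, msg = m["unit"], m["msg"]
        if unit == "yum" and msg.startswith("Updated: netdata"):
            events.append((m["ts"], "package-updated", msg))
        elif unit.startswith("run-parts") and "netdata-updater" in msg:
            events.append((m["ts"], "updater-finished", msg))
        elif unit == "systemd" and msg.startswith("Stopping Real time performance monitoring"):
            events.append((m["ts"], "service-stopping", msg))
    return events


# Lines copied from the journal excerpt in this thread.
SAMPLE = [
    "Jul 25 03:50:53 netdata-snmp-agent systemd[1]: Reloading.",
    "Jul 25 03:50:53 netdata-snmp-agent systemd[1]: Stopping Real time performance monitoring...",
    "Jul 25 03:50:56 netdata-snmp-agent yum[8462]: Updated: netdata-1.35.0.211.nightly-1.el7.x86_64",
    "Jul 25 03:51:05 netdata-snmp-agent run-parts(/etc/cron.daily)[9059]: finished netdata-updater",
]

for ts, label, msg in explain_restarts(SAMPLE):
    print(ts, label, msg)
```

If a service stop sits within a few seconds of a package update and an updater run, the restart was most likely triggered by the nightly update rather than by a fault in the agent.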

Should we also look into that unknown lvalue? I don’t see it on my Ubuntu installation.

By coincidence (?) I see the very recent Fix boolean value for ProtectControlGroups by didier13150 · Pull Request #11281 · netdata/netdata · GitHub

Hi Christopher,
Thanks for the reply!
I saw PR11281 but it seems to be blocked due to failing checks. It was opened a year ago and no movement since.

Hi, @amit. What is your system version?

Hi @ilyam8,
CentOS 7.9

Thanks. Let’s see

systemd --version

There’s no such command.

[root@netdata-snmp-agent ~]# systemd --version
-bash: systemd: command not found
[root@netdata-snmp-agent ~]# system
systemctl                       systemd-coredumpctl             systemd-inhibit                 systemd-run
systemd-analyze                 systemd-delta                   systemd-loginctl                systemd-stdio-bridge
systemd-ask-password            systemd-detect-virt             systemd-machine-id-setup        systemd-sysv-convert
systemd-cat                     systemd-escape                  systemd-notify                  systemd-tmpfiles
systemd-cgls                    systemd-firstboot               systemd-nspawn                  systemd-tty-ask-password-agent
systemd-cgtop                   systemd-hwdb                    systemd-path

Ok, systemctl --version then.

@ilyam8

[root@netdata-snmp-agent ~]# systemctl --version
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN

systemd 219

Ok, that is an old version. ProtectControlGroups was added in v232.
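
For anyone who wants to run the same check across several nodes, here is a minimal sketch that parses the first line of `systemctl --version` and compares it with the release that introduced the directive. The table `MIN_VERSION` and the helper names are hypothetical; the only fact taken from this thread is that ProtectControlGroups arrived in v232.

```python
import re

# Minimum systemd release that understands each directive.
# Only the ProtectControlGroups entry comes from this thread (v232).
MIN_VERSION = {"ProtectControlGroups": 232}


def systemd_version(version_output: str) -> int:
    """Extract the numeric version from `systemctl --version` output."""
    m = re.match(r"systemd\s+(\d+)", version_output.strip())
    if m is None:
        raise ValueError("unrecognized systemctl --version output")
    return int(m.group(1))


def supports(directive: str, version_output: str) -> bool:
    """True if the installed systemd is new enough to know the directive."""
    return systemd_version(version_output) >= MIN_VERSION[directive]


# Output pasted earlier in this thread (CentOS 7.9), feature flags shortened.
centos7 = "systemd 219\n+PAM +AUDIT +SELINUX +IMA -APPARMOR"
print(supports("ProtectControlGroups", centos7))
```

On a host where this returns False, systemd logs the "Unknown lvalue" message when it reads the unit file, whatever value the directive is set to.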

The issue is due to the fact that ProtectControlGroups expects ‘on’ or ‘yes’ but is configured with ‘true’. Not a problem since we can manually update the netdata.service file. OK

Does changing the value to on fix the problem?

Yes, it does appear to resolve it as we have not seen any errors in the journal logs since the manual change.

I can’t confirm this by checking the source code:

  • parse_boolean from v219 - true is expected.
  • ProtectControlGroups doesn’t exist in v219 - it was added later. If the option doesn’t exist, then the config parser is not aware that it needs to use config_parse_boolean.
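
To illustrate the first point, here is a small Python model of how boolean values in unit files are interpreted. It is based on the spellings listed in the systemd documentation (1, yes, true, on and 0, no, false, off), not on the v219 C source itself. The function name is hypothetical and the case-insensitive matching is an assumption of this sketch.

```python
# Spellings taken from the systemd documentation on boolean arguments.
TRUE_WORDS = {"1", "yes", "true", "on"}
FALSE_WORDS = {"0", "no", "false", "off"}


def parse_unit_boolean(value: str) -> bool:
    """Interpret a unit-file boolean; raise ValueError for anything else."""
    word = value.strip().lower()  # case-insensitivity is an assumption here
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"not a boolean: {value!r}")


# 'true' and 'on' are interchangeable, so swapping one for the other
# should not change how the unit file is parsed.
print(parse_unit_boolean("true") == parse_unit_boolean("on"))
```

Under this model, replacing ‘true’ with ‘on’ makes no difference, which is consistent with the message being caused by the unknown directive rather than by its value.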

The log message is a cosmetic problem - the function returns 0 (success) after logging the message. See https://github.com/systemd/systemd/blob/a88abde72169ddc2df77df3fa5bed30725022253/src/shared/conf-parser.c#L195-L200 (but Fix boolean value for ProtectControlGroups by didier13150 · Pull Request #11281 · netdata/netdata · GitHub will be merged anyway)

So I don’t see how that minor problem could be the reason for “netdata reloading”.

Hi @ilyam8,
Thanks for looking into this!
You are right.
Last night a couple of VMs were still complaining about the ProtectControlGroups property even though the value had been changed to ‘on’.
Is a restart expected behavior so that the agent checks for updated builds?
If so, does this always trigger a disconnect in the cloud portal?

Is a restart expected behavior so that the agent checks for updated builds?

A restart at least once a day is expected if you

  • are using the nightly channel (not stable) - we publish nightlies every night.
  • have netdata updater enabled.

Short disconnects are expected because of netdata service restarts.