Environment
- Ubuntu 20.04 VPS
- Only has netdata stable installed with minimal config changes
Problem/Question
I’m evaluating the netdata agent and netdata cloud as a potential solution for monitoring a few servers. However, I am receiving email alerts saying “X is unreachable” and then, shortly after, “X is reachable”. This happens roughly 2-3 times during a 24 hr period.
I can observe this outage via netdata cloud:
Digging into this a bit, the cause appears to be the agent restarting. I’m not particularly familiar with netdata and its logs, and I’m hoping for some guidance on whether this is caused by user error or is indeed a bug. (I’m happy to open an issue on GitHub if required.)
I originally saw this issue when using the nightly branch (installed via the kickstart script), but assumed the restarts were due to auto-updates, which is why I switched to the stable branch. Yet I continue to see these odd agent restarts.
The logs I’m sharing are from a new Ubuntu 20.04 VPS with only netdata installed on it.
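In case it helps anyone line up the restarts with the alert emails, here is a rough sketch of how the stop/start times could be pulled out of the journal output. It assumes the systemd messages have the same shape as the ones in my logs ("Stopping/Stopped/Starting/Started Real time performance monitoring"); the function name `lifecycle_events` and the sample lines are only for illustration.

```python
import re

# Matches systemd lifecycle messages for the netdata unit, for example:
# "Feb 24 07:12:36 netdata-test systemd[1]: Stopping Real time performance monitoring..."
# Assumption: the unit description text is the same as in the journal output in this post.
LIFECYCLE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ systemd\[1\]: "
    r"(?P<event>Stopping|Stopped|Starting|Started) Real time performance monitoring"
)


def lifecycle_events(lines):
    """Return (timestamp, event) pairs for every systemd stop/start message."""
    events = []
    for line in lines:
        match = LIFECYCLE.match(line)
        if match:
            events.append((match.group("ts"), match.group("event")))
    return events


# Sample lines copied from the journal output in this post.
sample = [
    "Feb 24 07:12:36 netdata-test systemd[1]: Stopping Real time performance monitoring...",
    "Feb 24 07:12:38 netdata-test systemd[1]: Stopped Real time performance monitoring.",
    "Feb 24 07:12:43 netdata-test systemd[1]: Started Real time performance monitoring.",
    "Feb 24 07:12:43 netdata-test netdata[135193]: SIGNAL: Not enabling reaper",
]

for timestamp, event in lifecycle_events(sample):
    print(timestamp, event)
```

To run it against the real journal, the `sample` list could be replaced with the lines of `journalctl -u netdata.service` output saved to a file.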
What I expected to happen
I don’t expect the netdata agent to “randomly” restart. I’m happy to provide any additional logs.
Logs & config files
/etc/netdata/.environment
# Created by installer
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
CFLAGS="-O2"
LDFLAGS=""
NETDATA_TMPDIR="/tmp"
NETDATA_PREFIX=""
NETDATA_CONFIGURE_OPTIONS=" --with-bundled-lws=externaldeps/libwebsockets"
NETDATA_ADDED_TO_GROUPS=" adm proxy"
INSTALL_UID="0"
NETDATA_GROUP="netdata"
REINSTALL_OPTIONS="--auto-update --disable-telemetry --stable-channel "
RELEASE_CHANNEL="stable"
IS_NETDATA_STATIC_BINARY="no"
NETDATA_LIB_DIR="/var/lib/netdata"
journalctl -u netdata.service
Feb 19 07:09:04 netdata-test ebpf.plugin[84212]: PROCFILE: Cannot open file '/etc/netdata/apps_groups.conf'
Feb 19 07:09:04 netdata-test ebpf.plugin[84212]: Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
Feb 24 07:12:36 netdata-test systemd[1]: /lib/systemd/system/netdata.service:14: PIDFile= references a path below legacy directory /var/run/, updating /var/run/netdata/netdata.pid → /run/netdata/netdata.>
Feb 24 07:12:36 netdata-test systemd[1]: /lib/systemd/system/netdata.service:14: PIDFile= references a path below legacy directory /var/run/, updating /var/run/netdata/netdata.pid → /run/netdata/netdata.>
Feb 24 07:12:36 netdata-test systemd[1]: Stopping Real time performance monitoring...
Feb 24 07:12:38 netdata-test systemd[1]: netdata.service: Succeeded.
Feb 24 07:12:38 netdata-test systemd[1]: Stopped Real time performance monitoring.
Feb 24 07:12:43 netdata-test systemd[1]: Starting Real time performance monitoring...
Feb 24 07:12:43 netdata-test systemd[1]: Started Real time performance monitoring.
Feb 24 07:12:43 netdata-test netdata[135193]: SIGNAL: Not enabling reaper
Feb 24 07:12:43 netdata-test netdata[135193]: 2021-02-24 07:12:43: netdata INFO : MAIN : SIGNAL: Not enabling reaper
Feb 24 07:12:43 netdata-test systemd[1]: Stopping Real time performance monitoring...
Feb 24 07:12:43 netdata-test ebpf.plugin[135278]: Does not have a configuration file inside `/etc/netdata/ebpf.conf. It will try to load stock file.
Feb 24 07:12:43 netdata-test ebpf.plugin[135278]: Name resolution is disabled, collector will not parser "hostnames" list.
Feb 24 07:12:43 netdata-test ebpf.plugin[135278]: The network value of CIDR 127.0.0.1/8 was updated for 127.0.0.0 .
Feb 24 07:12:43 netdata-test ebpf.plugin[135278]: PROCFILE: Cannot open file '/etc/netdata/apps_groups.conf'
Feb 24 07:12:43 netdata-test ebpf.plugin[135278]: Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
Feb 24 07:12:51 netdata-test systemd[1]: netdata.service: Succeeded.
Feb 24 07:12:51 netdata-test systemd[1]: Stopped Real time performance monitoring.
Feb 24 07:12:56 netdata-test systemd[1]: /lib/systemd/system/netdata.service:14: PIDFile= references a path below legacy directory /var/run/, updating /var/run/netdata/netdata.pid → /run/netdata/netdata.pid; please update the unit file accordingly.
/var/log/netdata/error.log
2021-02-24 07:07:57: netdata INFO : PLUGINSD[apps] : RRDSET: chart name 'apps.pipes' on host 'netdata-test' already exists.
2021-02-24 07:12:36: netdata INFO : MAIN : SIGNAL: Received SIGTERM. Cleaning up to exit...
2021-02-24 07:12:36: netdata INFO : MAIN : Shutting down command server.
2021-02-24 07:12:36: netdata ERROR : PLUGINSD[apps] : read failed: end of file
2021-02-24 07:12:36: netdata INFO : PLUGINSD[apps] : PARSER ended
2021-02-24 07:12:36: netdata ERROR : PLUGINSD[apps] : '/usr/libexec/netdata/plugins.d/apps.plugin' (pid 84216) disconnected after 5791614 successful data collections (ENDs).
2021-02-24 07:12:36: netdata ERROR : PLUGINSD[apps] : child pid 84216 killed by signal 15.
2021-02-24 07:12:36: netdata INFO : PLUGINSD[apps] : '/usr/libexec/netdata/plugins.d/apps.plugin' (pid 84216) was killed with SIGTERM. Disabling it.
2021-02-24 07:12:36: netdata ERROR : PLUGIN[tc] : child pid 122172 killed by signal 15.
2021-02-24 07:12:36: netdata INFO : MAIN : Shutting down command event loop.
2021-02-24 07:12:36: netdata INFO : MAIN : Shutting down command loop complete.
2021-02-24 07:12:36: netdata ERROR : PLUGINSD[go.d] : read failed: end of file (errno 9, Bad file descriptor)
2021-02-24 07:12:36: netdata INFO : PLUGINSD[go.d] : PARSER ended
2021-02-24 07:12:36: netdata ERROR : PLUGINSD[go.d] : '/usr/libexec/netdata/plugins.d/go.d.plugin' (pid 84213) disconnected after 0 successful data collections (ENDs).
2021-02-24 07:12:36: netdata INFO : PLUGINSD[go.d] : '/usr/libexec/netdata/plugins.d/go.d.plugin' (pid 84213) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again..
2021-02-24 07:12:36: netdata INFO : MAIN : Command server has stopped.
2021-02-24 07:12:36: netdata INFO : PLUGINSD[apps] : thread with task id 84204 finished
2021-02-24 07:12:36: netdata INFO : MAIN : EXIT: netdata prepares to exit with code 0...
2021-02-24 07:12:36: netdata INFO : MAIN : EXIT: cleaning up the database...
2021-02-24 07:12:36: netdata INFO : MAIN : Cleaning up database [1 hosts(s)]...
2021-02-24 07:12:36: netdata INFO : MAIN : Cleaning up database of host 'netdata-test'...
2021-02-24 07:12:36: netdata INFO : MAIN : EXIT: stopping static threads...
/etc/netdata/netdata.conf (only changes made to default config)
[global]
update every = 5
...
[system.entropy]
enabled = no
...