netdata agent restart

Our agent just restarted and we don't know the reason.

/opt/netdata/usr/bin/netdata -v
netdata v1.42.0-97-ge9989c519

error.log

2023-08-31 00:09:08: netdata INFO  : UV_WORKER[16] : METADATA: Freeing 3 database pages
2023-08-31 00:09:34: netdata INFO  : MAIN : SIGNAL: Received SIGTERM. Cleaning up to exit...

Environment

/opt/netdata//etc/netdata/.environment
# Created by installer
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
CFLAGS="-ffunction-sections -fdata-sections -static -O2 -funroll-loops -I/openssl-static/include -I/libnetfilter-acct-static/include/libnetfilter_acct -I/usr/include/libmnl -pipe"
LDFLAGS="-Wl,--gc-sections -static -L/openssl-static/lib64 -L/libnetfilter-acct-static/lib -lnetfilter_acct -L/usr/lib -lmnl"
MAKEOPTS="-j2"
NETDATA_TMPDIR="/tmp"
NETDATA_PREFIX="/opt/netdata"
NETDATA_CONFIGURE_OPTIONS=" --enable-cloud --without-bundled-protobuf --disable-dependency-tracking --enable-lto"
NETDATA_ADDED_TO_GROUPS=" docker nginx varnish haproxy adm nsd proxy squid ceph nobody"
INSTALL_UID="0"
NETDATA_GROUP="netdata"
REINSTALL_OPTIONS=" --disable-telemetry"
RELEASE_CHANNEL="nightly"
IS_NETDATA_STATIC_BINARY="yes"
NETDATA_LIB_DIR="/opt/netdata/var/lib/netdata"

journalctl -u netdata.service

Aug 29 07:16:31 blue [1143]: NFACCT reached my lifetime expectancy. Exiting to restart.
Aug 29 07:16:32 blue [12498]: Zswap is disabled
Aug 29 07:16:32 blue [12498]: Failed to find powercap zones.
Aug 30 02:03:23 blue [1116]: Cannot process /proc/6162/limits (command 'sed')
Aug 30 07:16:33 blue [13338]: Zswap is disabled
Aug 30 07:16:33 blue [13338]: Failed to find powercap zones.
Aug 30 07:16:33 blue [12505]: NFACCT reached my lifetime expectancy. Exiting to restart.
Aug 31 00:09:34 blue systemd[1]: Stopping Real time performance monitoring...
Aug 31 00:09:34 blue [1133]: thread with task id 1151 finished
Aug 31 00:09:34 blue [1133]: thread with task id 1158 finished
Aug 31 00:09:34 blue [1133]: thread with task id 1154 finished
Aug 31 00:09:35 blue [1133]: thread with task id 1157 finished
Aug 31 00:09:35 blue [1133]: thread with task id 1152 finished
Aug 31 00:09:35 blue [9881]: PROCFILE: Cannot open file '/opt/netdata/etc/netdata/apps_groups.conf'
Aug 31 00:09:35 blue [9881]: Cannot read process groups configuration file '/opt/netdata/etc/netdata/apps_groups.conf'. Will try '/opt>
Aug 31 00:09:35 blue [9881]: Loaded config file '/opt/netdata/usr/lib/netdata/conf.d/apps_groups.conf'
Aug 31 00:09:35 blue [1133]: eBPF cannot unload all threads on time, but it will go away
Aug 31 00:09:35 blue [9881]: started on pid 9881
Aug 31 00:09:35 blue [9881]: set name of thread 9885 to APPS_READER
Aug 31 00:09:35 blue [1133]: thread with task id 1160 finished
Aug 31 00:09:36 blue [9881]: Using now_boottime_usec() for uptime (dt is 8 ms)
Aug 31 00:09:36 blue [9882]: Zswap is disabled
Aug 31 00:09:36 blue [9882]: Failed to find powercap zones.
Aug 31 00:09:36 blue [1133]: thread with task id 1153 finished
Aug 31 00:09:36 blue [1133]: thread with task id 1156 finished
Aug 31 00:09:36 blue [1133]: eBPF cannot unload all threads on time, but it will go away
Aug 31 00:12:04 blue systemd[1]: netdata.service: State 'stop-sigterm' timed out. Killing.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 942 (netdata) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 944 (netdata) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 943 (DAEMON_SPAWN) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 946 (netdata) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1061 (DBENGINE) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1062 (UV_WORKER[2]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1063 (UV_WORKER[1]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1064 (UV_WORKER[16]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1065 (UV_WORKER[3]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1067 (UV_WORKER[5]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1068 (UV_WORKER[13]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1069 (UV_WORKER[7]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1070 (UV_WORKER[6]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1071 (UV_WORKER[8]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1072 (UV_WORKER[10]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1073 (UV_WORKER[14]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1074 (UV_WORKER[9]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1075 (UV_WORKER[12]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1076 (UV_WORKER[11]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1077 (UV_WORKER[15]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1078 (METASYNC) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1104 (DYNCFG) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Main process exited, code=killed, status=9/KILL
Aug 31 00:12:04 blue systemd[1]: netdata.service: Failed with result 'timeout'.
Aug 31 00:12:04 blue systemd[1]: Stopped Real time performance monitoring.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Consumed 2h 11min 20.759s CPU time.
Aug 31 00:12:09 blue systemd[1]: Starting Real time performance monitoring...
Aug 31 00:12:09 blue systemd[1]: Started Real time performance monitoring.
Aug 31 00:12:10 blue [10208]: PROCFILE: Cannot open file '/opt/netdata/etc/netdata/apps_groups.conf'
Aug 31 00:12:10 blue [10208]: Cannot read process groups configuration file '/opt/netdata/etc/netdata/apps_groups.conf'. Will try '/op>
Aug 31 00:12:10 blue [10208]: Loaded config file '/opt/netdata/usr/lib/netdata/conf.d/apps_groups.conf'
Aug 31 00:12:10 blue [10208]: started on pid 10208
Aug 31 00:12:10 blue [10208]: set name of thread 10235 to APPS_READER
Aug 31 00:12:10 blue [10241]: no charts enabled - nothing to do.
Aug 31 00:12:10 blue [10257]: Does not have a configuration file inside `/opt/netdata/etc/netdata/ebpf.d.conf. It will try to load sto>
Aug 31 00:12:10 blue [10257]: Cannot read process groups configuration file '/opt/netdata/etc/netdata/apps_groups.conf'. Will try '/op>
Aug 31 00:12:10 blue [10257]: thread created with task id 10283
Aug 31 00:12:10 blue [10257]: set name of thread 10283 to EBPF OOMKILL
Aug 31 00:12:10 blue [10257]: thread created with task id 10284
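To see at a glance which threads were still alive when systemd gave up waiting, the SIGKILL lines can be tallied by process name. This is only a minimal sketch: `SAMPLE` holds a few lines copied from the journal excerpt above, and `killed_by_name` is a hypothetical helper name, not part of Netdata or systemd.

```python
import re
from collections import Counter

# A few lines copied from the journalctl excerpt above (not the full log).
SAMPLE = """\
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 942 (netdata) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 943 (DAEMON_SPAWN) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1061 (DBENGINE) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1062 (UV_WORKER[2]) with signal SIGKILL.
Aug 31 00:12:04 blue systemd[1]: netdata.service: Killing process 1064 (UV_WORKER[16]) with signal SIGKILL.
"""

# Matches the systemd message format seen in the excerpt.
KILL_RE = re.compile(r"Killing process (\d+) \(([^)]+)\) with signal SIGKILL")


def killed_by_name(journal: str) -> Counter:
    """Count SIGKILLed tasks per name, folding UV_WORKER[n] into UV_WORKER."""
    counts = Counter()
    for line in journal.splitlines():
        match = KILL_RE.search(line)
        if match:
            # Drop a trailing "[n]" index so worker threads group together.
            name = re.sub(r"\[\d+\]$", "", match.group(2))
            counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, count in killed_by_name(SAMPLE).most_common():
        print(f"{name}: {count}")
```

Feeding it the full output of `journalctl -u netdata.service` instead of `SAMPLE` would give the complete tally.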

The config is the default config.

/opt/netdata/var/log/netdata/debug.log is empty.

Is there any way we can figure out what happened?

It took about 5 minutes for the agent to shut down. Does that sound normal, or do we need to fine-tune anything to make restarts quicker?
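The shutdown time can be checked directly against the journal timestamps. The sketch below measures the gap between systemd's "Stopping" and "Stopped" lines. It assumes the default short `journalctl` timestamp format, which omits the year, so 2023 is taken from the error.log timestamps. `JOURNAL` is three lines copied from the excerpt, and the function names are hypothetical.

```python
from datetime import datetime

# Three lines copied from the journalctl excerpt in this post.
JOURNAL = """\
Aug 31 00:09:34 blue systemd[1]: Stopping Real time performance monitoring...
Aug 31 00:12:04 blue systemd[1]: netdata.service: State 'stop-sigterm' timed out. Killing.
Aug 31 00:12:04 blue systemd[1]: Stopped Real time performance monitoring.
"""


def parse_ts(line: str, year: int = 2023) -> datetime:
    """Parse the 15-character 'Mon DD HH:MM:SS' prefix; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def stop_duration_seconds(journal: str):
    """Seconds between the first 'Stopping' line and the next 'Stopped' line."""
    start = end = None
    for line in journal.splitlines():
        if "systemd[1]: Stopping " in line and start is None:
            start = parse_ts(line)
        elif "systemd[1]: Stopped " in line and start is not None:
            end = parse_ts(line)
            break
    if start is None or end is None:
        return None  # the excerpt did not contain a complete stop sequence
    return (end - start).total_seconds()


if __name__ == "__main__":
    print(stop_duration_seconds(JOURNAL))  # 150.0 for the lines above
```

For the lines quoted here the result is 150 seconds, i.e. the stop ran until systemd's 'stop-sigterm' timeout fired and the remaining processes were killed.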

Thanks!

Hi @avrapal

Hmmm, 5 minutes to shut down doesn’t seem normal.

Do you have more of the error.log during that time?

Thanks for looking into this so quickly. Is it possible for me to mail you the log?

Sure, yes, please send to manolis@netdata.cloud

Got it, thanks. We’re looking into it!

Thank you. Much appreciated.