Netdata service restarting randomly on one of the servers

I am facing a similar issue. My netdata service restarts randomly, but no error logs are generated. As a result, I get unreachable alerts even though the server is online.
Can someone help with this? I am not sure what to do at this point.

Netdata logs were recently moved to the systemd journal. Can you check there? You can do this through
Netdata Logging | Learn Netdata or the UI.

Are you guys sane? That's a ridiculous change. It only adds an easily avoidable burden to troubleshooting.

We saw it as an advantage that all the logs are centralized and available directly in the UI.
You can read more about that in the v1.44 release notes.

We have kept the option to log to a file if that suits you better. This can be configured for each log source; please check here.

@hugo I understand, and I am not new to IT, so I will be all right. But look at it from the perspective of a newbie: they install netdata and expect it to log like any other Linux service (to the /var/log directory) without needing to configure anything.

@hugo The last time the service restarted, I checked the logs using the commands from your documentation. Here is the output. Can you please help identify and solve the random restart issue?

root@n2:~# journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata
Jan 26 02:45:59 n2 netdata[1440077]: Netdata agent version "v1.44.1" is starting
Jan 26 02:45:59 n2 netdata[1440077]: IEEE754: system is using IEEE754 DOUBLE PRECISION values
Jan 26 02:45:59 n2 netdata[1440077]: TIMEZONE: using the contents of /etc/timezone
Jan 26 02:45:59 n2 netdata[1440077]: TIMEZONE: fixed as 'Asia/Bangkok'
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: next: initialize ML
Jan 26 02:45:59 n2 netdata[1440077]: ml database version is 2 (no migration needed)
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       3 ms, initialize ML - next: initialize signals
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, initialize signals - next: initialize static threads
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, initialize static threads - next: initialize web server
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, initialize web server - next: initialize h2o server
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, initialize h2o server - next: set resource limits
Jan 26 02:45:59 n2 netdata[1440077]: resources control: allowed file descriptors: soft = 1024, max = 524288
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, set resource limits - next: become daemon
Jan 26 02:45:59 n2 netdata[1440077]: Failed to open pidfile '/var/run/netdata/netdata.pid'.
Jan 26 02:45:59 n2 netdata[1440077]: Out-Of-Memory (OOM) score is already set to the wanted value 0
Jan 26 02:45:59 n2 netdata[1440077]: Adjusted netdata scheduling policy to batch (3), with priority 0.
Jan 26 02:45:59 n2 netdata[1440077]: Running with process scheduling policy 'batch', nice level 19
Jan 26 02:45:59 n2 netdata[1440077]: Cannot chown '/var/run/netdata/netdata.pid' to 109:116
Jan 26 02:45:59 n2 netdata[1440077]: netdata started on pid 1440077.
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       2 ms, become daemon - next: initialize threads after fork
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, initialize threads after fork - next: initialize registry
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, initialize registry - next: fork the spawn server
Jan 26 02:45:59 n2 netdata[1440077]: Initializing spawn client.
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, fork the spawn server - next: collecting system info
Jan 26 02:45:59 n2 netdata[1440079]: Spawn server is up.
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in     108 ms, collecting system info - next: initialize RRD structures
Jan 26 02:45:59 n2 netdata[1440077]: SQLite database /var/cache/netdata/netdata-meta.db initialization
Jan 26 02:45:59 n2 netdata[1440077]: metadata database version is 15 (no migration needed)
Jan 26 02:45:59 n2 netdata[1440077]: SQLite database initialization completed
Jan 26 02:45:59 n2 netdata[1440077]: SQLite database /var/cache/netdata/context-meta.db initialization
Jan 26 02:45:59 n2 netdata[1440077]: context database version is 1 (no migration needed)
Jan 26 02:45:59 n2 netdata[1440077]: Cannot open the file /var/lib/netdata/health.silencers.json, so Netdata will work with the default health configuration.
Jan 26 02:45:59 n2 netdata[1440077]: CONFIG: cannot load user config '/etc/netdata/stream.conf'. Will try stock config.
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: found 68 files in path /var/cache/netdata/dbengine-tier1
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: found 35 files in path /var/cache/netdata/dbengine-tier2
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: found 122 files in path /var/cache/netdata/dbengine
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: loading 12 data/journal of tier 2...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: loading 23 data/journal of tier 1...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: loading 41 data/journal of tier 0...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: populating retention to MRG from 12 journal files of tier 2, using 4 threads...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: populating retention to MRG from 41 journal files of tier 0, using 4 threads...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: populating retention to MRG from 23 journal files of tier 1, using 4 threads...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: tier 0 is ready for data collection and queries
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: tier 1 is ready for data collection and queries
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: tier 2 is ready for data collection and queries
Jan 26 02:45:59 n2 netdata[1440077]: Host 'cirrus.n2' (at registry as 'cirrus.n2') with guid '3cb84fc6-a974-11ee-9b6e-3cecefbf5390' initialized, os 'linux', timezone 'Asia/Bangkok', tags '',>
Jan 26 02:45:59 n2 netdata[1440077]: Creating archived hosts
Jan 26 02:45:59 n2 netdata[1440077]: Created 0 archived hosts
Jan 26 02:45:59 n2 netdata[1440077]: ACLK sync initialization completed
Jan 26 02:45:59 n2 netdata[1440077]: Starting ACLK synchronization thread
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in      79 ms, initialize RRD structures - next: check for incomplete shutdown
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, check for incomplete shutdown - next: collect claiming info
Jan 26 02:45:59 n2 netdata[1440077]: File '/var/lib/netdata/cloud.d/claimed_id' was found. Setting state to AGENT_CLAIMED.
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, collect claiming info - next: collect host labels
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       1 ms, collect host labels - next: start the static threads
Jan 26 02:45:59 n2 netdata[1440077]: CONFIG: cannot load user exporting config '/etc/netdata/exporting.conf'. Will try the stock version.
Jan 26 02:45:59 n2 netdata[1440077]: To use encryption it is necessary to set "ssl certificate" and "ssl key" in [web] !
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 2
Jan 26 02:45:59 n2 netdata[1440077]: Waiting for Cloud to be enabled
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 3
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 4
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 5
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 6
Jan 26 02:45:59 n2 netdata[1440077]: No connector instances to activate
Jan 26 02:45:59 n2 netdata[1440077]: EXPORTING: no exporting connectors configured
Jan 26 02:45:59 n2 netdata[1440077]: cleaning up...
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       0 ms, start the static threads - next: initialize commands API
Jan 26 02:45:59 n2 netdata[1440077]: Initializing command server.
Jan 26 02:45:59 n2 netdata[1440077]: STATSD collector thread started with taskid 1440566
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in       1 ms, initialize commands API - next: ready
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: completed in 199 ms. Enjoy real-time performance monitoring!
Jan 26 02:45:59 n2 netdata[1440077]: use unified cgroups true
Jan 26 02:45:59 n2 perf.plugin[1440544]: no charts enabled - nothing to do.
Jan 26 02:45:59 n2 netdata[1440077]: PLUGINSD: plugin called DISABLE. Disabling it.
Jan 26 02:45:59 n2 apps.plugin[1440564]: PROCFILE: Cannot open file '/etc/netdata/apps_groups.conf'
Jan 26 02:45:59 n2 apps.plugin[1440564]: Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
Jan 26 02:45:59 n2 apps.plugin[1440564]: Loaded config file '/usr/lib/netdata/conf.d/apps_groups.conf'
Jan 26 02:45:59 n2 apps.plugin[1440564]: started on pid 1440564
Jan 26 02:45:59 n2 netdata[1440077]: child pid 1440544 exited with code 1.
Jan 26 02:45:59 n2 netdata[1440077]: PLUGINSD: 'host:cirrus.n2', '/usr/libexec/netdata/plugins.d/perf.plugin' (pid 1440544) exited with error code 1 and haven't collected any data. Disabling>
Jan 26 02:45:59 n2 ebpf.plugin[1440585]: Does not have a configuration file inside `/etc/netdata/ebpf.d.conf. It will try to load stock file.
Jan 26 02:45:59 n2 tc-qos-helper.sh[1440596]: FireQOS is not installed on this system. Use FireQOS to apply traffic QoS and expose the class names to netdata. Check https://github.com/netdat>
Jan 26 02:45:59 n2 netdata[1440077]: PLUGINSD: plugin called DISABLE. Disabling it.
Jan 26 02:45:59 n2 netdata[1440077]: child pid 1440594 exited with code 1.
Jan 26 02:45:59 n2 netdata[1440077]: PLUGINSD: 'host:cirrus.n2', '/usr/libexec/netdata/plugins.d/ioping.plugin' (pid 1440594) exited with error code 1 and haven't collected any data. Disabli>
Jan 26 02:45:59 n2 systemd-journal.plugin[1440538]: heartbeat randomness of 337000 is too big for a tick of 100000 - setting it to 29000
Jan 26 02:45:59 n2 ebpf.plugin[1440585]: Name resolution is disabled, collector will not parse "hostnames" list.
Jan 26 02:45:59 n2 tc-qos-helper.sh[1440617]: Cannot find file '/usr/lib/netdata/conf.d/tc-qos-helper.conf'.
Jan 26 02:45:59 n2 ebpf.plugin[1440585]: Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
Jan 26 02:45:59 n2 tc-qos-helper.sh[1440630]: Cannot find file '/etc/netdata/tc-qos-helper.conf'.
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="env HTTP_PROXY '', HTTPS_PROXY ''" plugin=go.d component=agent
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="instance is started" plugin=go.d component=agent
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="loading config file" plugin=go.d component=agent
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="found '/usr/lib/netdata/conf.d/go.d.conf" plugin=go.d component=agent
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="config successfully loaded" plugin=go.d component=agent
root@n2:~# journalctl -u netdata --namespace=netdata
Jan 26 02:45:14 n2 netdata[810983]: Deleting chart 'cgroup_qemu_minbu-domain-com.cpu_limit' ('cgroup_qemu_minbu-domain-com.cpu_limit') from disk...
Jan 26 02:45:14 n2 netdata[810983]: NETDATA SHUTDOWN: in       0 ms, clean rrdhost database - next: stop aclk threads
Jan 26 02:45:14 n2 netdata[810983]: NETDATA SHUTDOWN: in       0 ms, stop aclk threads - next: stop all remaining worker threads
Jan 26 02:45:14 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 2 services [ COLLECTORS ANALYTICS ] to exit: 'P[cgroups]' (811518), 'ANALYTICS' (814453)
Jan 26 02:45:14 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:15 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:16 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:17 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:18 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:19 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:20 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:21 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:22 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:23 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:24 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:24 n2 netdata[810983]: SERVICE CONTROL: the following 1 service(s) [ COLLECTORS ] take too long to exit: 'P[cgroups]' (811518); giving up on them...
Jan 26 02:45:24 n2 netdata[810983]: NETDATA SHUTDOWN: in   10117 ms, (TIMEOUT) stop all remaining worker threads - next: cancel main threads
Jan 26 02:45:24 n2 netdata[810983]: EXIT: Stopping main thread: DYNCFG
Jan 26 02:45:24 n2 netdata[810983]: EXIT: Stopping main thread: P[cgroups]
Jan 26 02:45:24 n2 netdata[810983]: Waiting 2 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: cleaning up...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:26 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:26 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:26 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:26 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:29 n2 netdata[810983]: Main thread P[cgroups] takes too long to exit. Giving up...
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in    5005 ms, cancel main threads - next: close SQL context db
Jan 26 02:45:29 n2 netdata[810983]: Closing context SQLite database
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in       0 ms, close SQL context db - next: closed SQL main db
Jan 26 02:45:29 n2 netdata[810983]: Closing SQLite database
Jan 26 02:45:29 n2 netdata[810983]: No statements pending to finalize
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in       0 ms, closed SQL main db - next: remove pid file
Jan 26 02:45:29 n2 netdata[810983]: EXIT: cannot unlink pidfile '/var/run/netdata/netdata.pid'.
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in       0 ms, remove pid file - next: free openssl structures
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in       0 ms, free openssl structures - next: remove incomplete shutdown file
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in       0 ms, remove incomplete shutdown file - next: exit
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: completed in 22026 ms - netdata is now exiting - bye bye...
Jan 26 02:45:29 n2 netdata[811011]: EOF found in spawn pipe.
Jan 26 02:45:29 n2 netdata[811011]: Shutting down spawn server event loop.
Jan 26 02:45:29 n2 netdata[811011]: Shutting down spawn server loop complete.
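The second output above shows an orderly shutdown sequence rather than a crash, with most of the 22 seconds spent waiting on the P[cgroups] thread. To make that easier to see in a long journal, here is a minimal sketch that pulls out the "NETDATA SHUTDOWN: in N ms" phase lines and sorts them slowest first. The function name shutdown_phases is hypothetical, and the pattern assumes the log line format shown in this thread (v1.44.1); it does not tell you what requested the shutdown in the first place.

```shell
#!/usr/bin/env bash
# shutdown_phases (hypothetical helper): read journal text on stdin and print
# "<milliseconds> <phase description>" for every NETDATA SHUTDOWN phase line,
# slowest phase first. Assumes the line format seen in this thread.
shutdown_phases() {
  # Keep only lines of the form
  #   "... NETDATA SHUTDOWN: in <N> ms, <phase> - next: <next phase>"
  # and reduce them to "<N> <phase>". The final "completed in ..." summary
  # line has a different shape and is intentionally not matched.
  sed -n -E 's/.*NETDATA SHUTDOWN: in +([0-9]+) ms, (.*) - next: .*/\1 \2/p' |
    sort -rn
}
```

It can be fed from the same command used above, for example: journalctl -u netdata --namespace=netdata --no-pager | shutdown_phases. Piping the output (or passing --no-pager) should also avoid the pager cutting long lines, which is most likely what produced the trailing ">" on some lines in the paste.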

@Eliot_503 Glad you were able to get to the logs. If there isn't anything in them that you can act on, I suggest you open a bug report and include these log details.

@hugo Thank you. I opened a bug report as suggested; hopefully it will be resolved soon.