I have installed the Netdata v2.1.0-119-nightly build on one of our Ubuntu 20.04.6 VMs.
When we select a time range of more than 12 hours, Netdata restarts and starts again from scratch. As a result, all existing data is lost. This is really blocking our testing.
Could anyone please help us resolve this issue or suggest a workaround?
Please let me know if you need any other information.
Hi @ilyam8, thank you for the update. I have not been able to find a pattern for when the service restarts. Yesterday the netdata service also restarted even though we were not doing anything with it.
I observed the messages below in the journalctl log during today's restart:
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“received terminated signal (15). Terminating…” plugin=go.d component=agent
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=“discovery manager”
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=discovery discoverer=file
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=“service discovery” discoverer=net_listeners
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“all discoverers exited” plugin=go.d component=“service discovery” pipeline=“network listeners”
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=“service discovery” pipeline=“network listeners”
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=“filestatus manager”
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=“functions manager”
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=“service discovery”
Jan 13 06:46:58 . netdata[2908192]: level=info msg=stopped plugin=go.d collector=logind job=logind
Jan 13 06:46:58 . netdata[2908192]: level=info msg=stopped plugin=go.d collector=systemdunits job=service-units
Jan 13 06:46:58 . netdata[2908192]: level=info msg=stopped plugin=go.d collector=monit job=local
Jan 13 06:46:58 . netdata[2908192]: level=info msg=stopped plugin=go.d collector=ntpd job=local
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=“job manager”
Jan 13 06:46:58 . netdata[2908192]: level=info msg=“instance is stopped” plugin=go.d component=agent
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: buffered reader not OK (4294967292)
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: ‘host:.’, ‘/usr/libexec/netdata/plugins.d/debugfs.plugin’ (pid 2908195) disconnected after 256026 successful data collections.
Jan 13 06:46:58 . netdata[2907806]: PARSER: read failed: POLLHUP.
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: buffered reader not OK (4294967292)
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: ‘host:.’, ‘/usr/libexec/netdata/plugins.d/apps.plugin’ (pid 2908214) disconnected after 112293111 successful data collections.
Jan 13 06:46:58 . spawn-plugins[2907835]: SPAWN SERVER: child with pid 3645342 (request 48) killed by signal 15: /bin/sh -c “exec /usr/libexec/netdata/plugins.d/tc-qos-helper.sh 1”
Jan 13 06:46:58 . spawn-plugins[2907835]: SPAWN SERVER: child with pid 2908195 (request 10) killed by signal 15: /bin/sh -c "exec /usr/libexec/netdata/plugins.d/debugfs.plugin 1 "
Jan 13 06:46:58 . spawn-plugins[2907835]: SPAWN SERVER: child with pid 2908214 (request 11) killed by signal 15: /bin/sh -c "exec /usr/libexec/netdata/plugins.d/apps.plugin 1 "
Jan 13 06:46:58 . netdata[2907806]: Shutting down command server.
Jan 13 06:46:58 . netdata[2907806]: Shutting down command event loop.
Jan 13 06:46:58 . netdata[2907806]: PARSER: read failed: POLLHUP.
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: buffered reader not OK (4294967292)
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: ‘host:.’, ‘/usr/libexec/netdata/plugins.d/nfacct.plugin’ (pid 2908218) disconnected after 426705 successful data collections.
Jan 13 06:46:58 . netdata[2907806]: Shutting down command loop complete.
Jan 13 06:46:58 . netdata[2907806]: Command server has stopped.
Jan 13 06:46:58 . netdata[2907806]: Flushing DBENGINE dirty pages…
Jan 13 06:46:58 . netdata[2907806]: NETDATA SHUTDOWN: initializing shutdown with code 0…
Jan 13 06:46:58 . netdata[2907806]: Shutdown process started
Jan 13 06:46:58 . netdata[2907806]: shutdown step: [1/24] - {at off} started ‘create shutdown file’…
Jan 13 06:46:58 . netdata[2907806]: PARSER: read failed: POLLHUP.
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: buffered reader not OK (4294967292)
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: ‘host:.’, ‘/usr/libexec/netdata/plugins.d/systemd-journal.plugin’ (pid 2908202) disconnected after 3 successful data collections.
Jan 13 06:46:58 . spawn-plugins[2907835]: SPAWN SERVER: child with pid 2908202 (request 9) killed by signal 15: /bin/sh -c "exec /usr/libexec/netdata/plugins.d/systemd-journal.plugin 1 "
Jan 13 06:46:58 . netdata[2907806]: PARSER: read failed: POLLHUP.
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: buffered reader not OK (4294967292)
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: ‘host:.’, ‘/usr/libexec/netdata/plugins.d/go.d.plugin’ (pid 2908192) disconnected after 3840854 successful data collections.
Jan 13 06:46:58 . netdata[2907806]: stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends
Jan 13 06:46:58 . netdata[2907806]: stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends
Jan 13 06:46:58 . netdata[2907806]: stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends
Jan 13 06:46:58 . netdata[2907806]: closing all web server sockets…
Jan 13 06:46:58 . netdata[2907806]: all static web threads stopped.
Jan 13 06:46:58 . netdata[2907806]: stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends
Jan 13 06:46:58 . netdata[2907806]: stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: cleaning up…
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: ‘host:.’, stopping plugin thread: plugin:ebpf
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: cleanup completed.
Jan 13 06:46:58 . netdata[2907806]: stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends
Jan 13 06:46:58 . netdata[2907806]: PARSER: thread cancelled while waiting for data.
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: buffered reader not OK (4294967288)
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: ‘host:.’, ‘/usr/libexec/netdata/plugins.d/ebpf.plugin’ (pid 2908187) disconnected after 136545 successful data collections.
Jan 13 06:46:58 . ebpf.plugin[2908187]: Read error on stdin
Jan 13 06:46:58 . spawn-plugins[2907835]: SPAWN SERVER: child with pid 2908187 (request 5) exited with exit code 1: /bin/sh -c "exec /usr/libexec/netdata/plugins.d/ebpf.plugin 1 "
Jan 13 06:46:58 . netdata[2907806]: PLUGINSD: ‘host:.’, ‘/usr/libexec/netdata/plugins.d/ebpf.plugin’ (pid 2908187) exited with error code 1, but has given useful output in the past (136545 times). Waiting a bit before starting it again.
Jan 13 06:46:58 . netdata[2907806]: ACLK SYNC: Shutting down ACLK synchronization event loop
Jan 13 06:46:59 . netdata[2907806]: waiting for discovery thread to finish…
Jan 13 06:46:59 . netdata[2907806]: discovery thread stopped
Jan 13 06:46:59 . netdata[2907806]: STATSD: data collection thread 1 found stopped.
Jan 13 06:46:59 . netdata[2907806]: STATSD: closing sockets…
Jan 13 06:46:59 . netdata[2907806]: STATSD: cleanup completed.
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [1/24] - {at off} finished ‘create shutdown file’ in 1s 465ms 659us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [2/24] - {at 1s 466ms 573us} started ‘destroy main spawn server’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [2/24] - {at 1s 466ms 573us} finished ‘destroy main spawn server’ in 18us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [3/24] - {at 1s 466ms 595us} started ‘dbengine exit mode’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [3/24] - {at 1s 466ms 595us} finished ‘dbengine exit mode’ in 4us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [4/24] - {at 1s 466ms 604us} started ‘close webrtc connections’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [4/24] - {at 1s 466ms 604us} finished ‘close webrtc connections’ in 768us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [5/24] - {at 1s 467ms 376us} started ‘disable maintenance, new queries, new web requests, new streaming connections and aclk’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [5/24] - {at 1s 467ms 376us} finished ‘disable maintenance, new queries, new web requests, new streaming connections and aclk’ in 4us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [6/24] - {at 1s 467ms 383us} started ‘stop maintenance thread’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [6/24] - {at 1s 467ms 383us} finished ‘stop maintenance thread’ in 3us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [7/24] - {at 1s 467ms 389us} started ‘stop exporters, health and web servers threads’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [7/24] - {at 1s 467ms 389us} finished ‘stop exporters, health and web servers threads’ in 3us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [8/24] - {at 1s 467ms 394us} started ‘stop collectors and streaming threads’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [8/24] - {at 1s 467ms 394us} finished ‘stop collectors and streaming threads’ in 901us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [9/24] - {at 1s 468ms 307us} started ‘stop replication threads’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [9/24] - {at 1s 468ms 307us} finished ‘stop replication threads’ in 3us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [10/24] - {at 1s 468ms 313us} started ‘disable ML detection and training threads’…
Jan 13 06:46:59 . netdata[2907806]: ML: Closing sqlite database
Jan 13 06:46:59 . netdata[2907806]: All threads finished.
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [10/24] - {at 1s 468ms 313us} finished ‘disable ML detection and training threads’ in 79ms 764us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [11/24] - {at 1s 548ms 129us} started ‘stop context thread’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [11/24] - {at 1s 548ms 129us} finished ‘stop context thread’ in 9us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [12/24] - {at 1s 548ms 142us} started ‘clear web client cache’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [12/24] - {at 1s 548ms 142us} finished ‘clear web client cache’ in 4us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [13/24] - {at 1s 548ms 150us} started ‘stop aclk threads’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [13/24] - {at 1s 548ms 150us} finished ‘stop aclk threads’ in 4us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [14/24] - {at 1s 548ms 158us} started ‘stop all remaining worker threads’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [14/24] - {at 1s 548ms 158us} finished ‘stop all remaining worker threads’ in 4us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [15/24] - {at 1s 548ms 165us} started ‘cancel main threads’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [15/24] - {at 1s 548ms 165us} finished ‘cancel main threads’ in 485us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [16/24] - {at 1s 548ms 654us} started ‘prepare metasync shutdown’…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [16/24] - {at 1s 548ms 654us} finished ‘prepare metasync shutdown’ in 397us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [17/24] - {at 1s 549ms 55us} started ‘stop collection for all hosts’…
Jan 13 06:46:59 . netdata[2907806]: Flushing DBENGINE dirty pages…
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [17/24] - {at 1s 549ms 55us} finished ‘stop collection for all hosts’ in 13ms 996us
Jan 13 06:46:59 . netdata[2907806]: shutdown step: [18/24] - {at 1s 565ms 600us} started ‘wait for dbengine collectors to finish’…
Jan 13 06:46:59 . netdata[2907806]: DBENGINE: flushing at 1.79% { hot: off, dirty: 14.89MiB }…
Jan 13 06:47:00 . netdata[2907806]: DBENGINE: flushing completed!
Jan 13 06:47:00 . netdata[2907806]: shutdown step: [18/24] - {at 1s 565ms 600us} finished ‘wait for dbengine collectors to finish’ in 101ms 857us
Jan 13 06:47:00 . netdata[2907806]: shutdown step: [19/24] - {at 1s 667ms 467us} started ‘stop dbengine tiers’…
Jan 13 06:47:00 . netdata[2907806]: shutdown step: [19/24] - {at 1s 667ms 467us} finished ‘stop dbengine tiers’ in 886us
Jan 13 06:47:00 . netdata[2907806]: shutdown step: [20/24] - {at 1s 668ms 368us} started ‘stop metasync threads’…
Jan 13 06:47:00 . netdata[2907806]: shutdown step: [20/24] - {at 1s 668ms 368us} finished ‘stop metasync threads’ in 112us
Jan 13 06:47:00 . netdata[2907806]: shutdown step: [21/24] - {at 1s 668ms 485us} started ‘close SQL databases’…
Jan 13 06:47:00 . netdata[2907806]: No statements pending to finalize
Jan 13 06:47:00 . netdata[2907806]: CONTEXT: Closing sqlite database
Jan 13 06:47:00 . netdata[2907806]: METADATA: Closing sqlite database
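For what it's worth, this shutdown looks orderly rather than like a crash: it starts with "initializing shutdown with code 0", walks through the numbered steps, and DBENGINE reports "flushing completed!". To summarise the step timings from a longer excerpt, a small Python sketch like the one below could help. The helper names (`duration_us`, `parse_shutdown_steps`) are hypothetical, and the regex assumes exactly the line format shown above, including the curly quotes the forum inserted around step names.

```python
import re

# Matches lines such as:
# shutdown step: [3/24] - {at 1s 466ms 595us} finished ‘dbengine exit mode’ in 4us
# Accepts both curly and straight single quotes around the step name.
STEP_RE = re.compile(
    r"shutdown step: \[(\d+)/(\d+)\] - \{at [^}]*\} finished [‘'](.+?)[’'] in (.+)$"
)
UNIT_US = {"s": 1_000_000, "ms": 1_000, "us": 1}


def duration_us(text):
    """Convert a duration such as '1s 465ms 659us' to microseconds."""
    total = 0
    # 'ms' and 'us' are listed before 's' so '465ms' is not read as '465m' + 's'.
    for value, unit in re.findall(r"(\d+)(ms|us|s)", text):
        total += int(value) * UNIT_US[unit]
    return total


def parse_shutdown_steps(lines):
    """Return (step, total_steps, name, duration_us) for each 'finished' step line."""
    steps = []
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            steps.append(
                (int(m.group(1)), int(m.group(2)), m.group(3), duration_us(m.group(4)))
            )
    return steps


# A few lines copied from the log above.
sample = [
    "Jan 13 06:46:59 . netdata[2907806]: shutdown step: [1/24] - {at off} finished ‘create shutdown file’ in 1s 465ms 659us",
    "Jan 13 06:46:59 . netdata[2907806]: shutdown step: [10/24] - {at 1s 468ms 313us} finished ‘disable ML detection and training threads’ in 79ms 764us",
    "Jan 13 06:47:00 . netdata[2907806]: shutdown step: [18/24] - {at 1s 565ms 600us} finished ‘wait for dbengine collectors to finish’ in 101ms 857us",
]

steps = parse_shutdown_steps(sample)
for step, total, name, us in steps:
    print(f"[{step}/{total}] {name}: {us / 1000:.1f} ms")
```

Feeding it the full journal excerpt instead of the sample would also show where the log stops; the paste above ends inside step 21 of 24.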
Thanks for the update. Netdata is receiving a SIGTERM signal and then restarting the service. I am not sure what is sending this signal.
Jan 15 06:56:03 . netdata[3669415]: DBENGINE: reclaimed 6585848 bytes of disk space.
Jan 15 07:19:00 . cgroup-name.sh[1033662]: cgroup ‘user.slice_user-1000.slice_session-17780.scope’ is called ‘user.slice_user-1000.slice_session-17780.scope’, labels ‘’
Jan 15 07:20:05 . netdata[3669415]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 15 07:20:05 . netdata[3669415]: Shutting down command server.
Jan 15 07:20:06 . netdata[3669813]: level=info msg=“received terminated signal (15). Terminating…” plugin=go.d component=agent
Pattern of the service restarts:
Jan 03 07:02:08 . netdata[11725]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 04 06:53:08 . netdata[14536]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 06 07:13:12 . netdata[776693]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 08 06:59:08 . netdata[2324260]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 08 12:44:03 . netdata[3853911]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 09 06:49:19 . netdata[2425]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 10 07:11:17 . netdata[585917]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 11 06:38:44 . netdata[1369595]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 12 07:04:32 . netdata[2123410]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 13 06:46:58 . netdata[2907806]: SIGNAL: Received SIGTERM. Cleaning up to exit…
Jan 15 07:20:05 . netdata[3669415]: SIGNAL: Received SIGTERM. Cleaning up to exit…
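Laid out by time of day, these SIGTERMs cluster in the early morning: 10 of the 11 arrive between about 06:38 and 07:20, and the only outlier is Jan 08 at 12:44. A roughly daily signal at a similar hour suggests a scheduled trigger rather than something caused by selecting a long time range in the dashboard. The Python sketch below checks this against the lines above; the 06:30 to 07:30 window is an arbitrary assumption, and `sigterm_events` is a hypothetical helper.

```python
import re

# "Received SIGTERM" lines copied from the journal excerpt above.
LOG = [
    "Jan 03 07:02:08 . netdata[11725]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 04 06:53:08 . netdata[14536]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 06 07:13:12 . netdata[776693]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 08 06:59:08 . netdata[2324260]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 08 12:44:03 . netdata[3853911]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 09 06:49:19 . netdata[2425]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 10 07:11:17 . netdata[585917]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 11 06:38:44 . netdata[1369595]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 12 07:04:32 . netdata[2123410]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 13 06:46:58 . netdata[2907806]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
    "Jan 15 07:20:05 . netdata[3669415]: SIGNAL: Received SIGTERM. Cleaning up to exit…",
]

# journalctl's short format has no year, so only day and time of day are parsed.
SIGTERM_RE = re.compile(
    r"^(\w{3} \d{2}) (\d{2}):(\d{2}):(\d{2}) .*SIGNAL: Received SIGTERM"
)


def sigterm_events(lines):
    """Return (day, seconds_since_midnight) for every SIGTERM line."""
    events = []
    for line in lines:
        m = SIGTERM_RE.match(line)
        if m:
            hh, mm, ss = (int(g) for g in m.group(2, 3, 4))
            events.append((m.group(1), hh * 3600 + mm * 60 + ss))
    return events


# Assumed window around the observed restarts (06:30 to 07:30); adjust as needed.
WINDOW = (6 * 3600 + 30 * 60, 7 * 3600 + 30 * 60)

events = sigterm_events(LOG)
inside = [day for day, t in events if WINDOW[0] <= t <= WINDOW[1]]
outside = [day for day, t in events if not WINDOW[0] <= t <= WINDOW[1]]
print(f"{len(inside)} of {len(events)} SIGTERMs between 06:30 and 07:30")
print("outside the window:", outside)
```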
OK, fine. But in that case, why is all the previous data lost? The charts show empty and monitoring starts only from the time of the restart, so our application statistics are lost. Also, please let us know whether there is a way to disable auto-updates.