Hello, I recently installed a fresh copy of Ubuntu 20.04 LTS with a Virtualmin server configured on it, running a couple of websites. I’m having a strange issue with the Netdata dashboard, on both the Agent and the Cloud side: every minute or so, a few seconds of data go missing on every chart. I’ve tried googling this issue and cannot find other reports like mine.
The odd thing is that my server is hardly being utilized. It uses 5% CPU at most and has plenty of RAM and bandwidth available. The gaps also appear to be pretty consistent, occurring every 30-35 seconds. Does anyone have any ideas about what may be causing this?
Hi @Kalradia, welcome to our community!
Could you please share some extra context so I can understand whether this is a bug? Specifically, what are your:
OS / Environment
To get this information, execute the following commands based on your operating system:
– uname -a; grep -Hv "^#" /etc/*release # Linux
– uname -a; uname -K # BSD
– uname -a; sw_vers # macOS
Netdata version
Provide the output of netdata -W buildinfo.
Version: netdata v1.32.1-30-nightly
Configure options: '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--with-bundled-lws' '--with-bundled-protobuf' 'CFLAGS=-O2' 'LDFLAGS='
Features:
dbengine: YES
Native HTTPS: YES
Netdata Cloud: YES
ACLK Next Generation: YES
ACLK-NG New Cloud Protocol: YES
ACLK Legacy: YES
TLS Host Verification: YES
Machine Learning: YES
Libraries:
protobuf: YES (bundled)
jemalloc: NO
JSON-C: YES
libcap: NO
libcrypto: YES
libm: YES
LWS: YES static v3.2.2
mosquitto: YES
tcalloc: NO
zlib: YES
Plugins:
apps: YES
cgroup Network Tracking: YES
CUPS: NO
EBPF: YES
IPMI: NO
NFACCT: NO
perf: YES
slabinfo: YES
Xen: NO
Xen VBD Error Tracking: NO
Exporters:
AWS Kinesis: NO
GCP PubSub: NO
MongoDB: NO
Prometheus Remote Write: NO
Installation method: I installed via the long command that was given when I opened an account on the Netdata Cloud website. It was something about claiming a node.
Error log:
2021-12-29 02:30:40: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 02:30:40: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 02:30:40: netdata INFO : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 02:30:40: netdata INFO : ACLK_Main : Attempting connection now
2021-12-29 02:30:40: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 02:30:40: netdata INFO : ACLK_Main : Getting Cloud /env successful
2021-12-29 02:30:40: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 02:30:40: netdata INFO : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 02:30:41: netdata INFO : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 02:30:41: netdata INFO : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 02:30:41: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 02:30:41: netdata INFO : ACLK_Main : ACLK connection successfully established
2021-12-29 02:30:41: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
2021-12-29 05:49:00: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 05:49:00: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 05:49:00: netdata INFO : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 05:49:00: netdata INFO : ACLK_Main : Attempting connection now
2021-12-29 05:49:00: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 05:49:00: netdata INFO : ACLK_Main : Getting Cloud /env successful
2021-12-29 05:49:00: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 05:49:00: netdata INFO : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 05:49:00: netdata INFO : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 05:49:00: netdata INFO : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 05:49:00: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 05:49:01: netdata INFO : ACLK_Main : ACLK connection successfully established
2021-12-29 05:49:01: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
2021-12-29 06:15:10: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 06:15:10: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 06:15:10: netdata INFO : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 06:15:10: netdata INFO : ACLK_Main : Attempting connection now
2021-12-29 06:15:10: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 06:15:10: netdata INFO : ACLK_Main : Getting Cloud /env successful
2021-12-29 06:15:11: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 06:15:11: netdata INFO : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 06:15:11: netdata INFO : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 06:15:11: netdata INFO : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 06:15:11: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 06:15:11: netdata INFO : ACLK_Main : ACLK connection successfully established
2021-12-29 06:15:11: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
2021-12-29 07:04:52: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 07:04:52: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 07:04:52: netdata INFO : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 07:04:52: netdata INFO : ACLK_Main : Attempting connection now
2021-12-29 07:04:53: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 07:04:53: netdata INFO : ACLK_Main : Getting Cloud /env successful
2021-12-29 07:04:53: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 07:04:53: netdata INFO : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 07:04:53: netdata INFO : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 07:04:53: netdata INFO : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 07:04:53: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 07:04:53: netdata INFO : ACLK_Main : ACLK connection successfully established
2021-12-29 07:04:53: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
2021-12-29 07:29:34: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 07:29:34: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 07:29:34: netdata INFO : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 07:29:34: netdata INFO : ACLK_Main : Attempting connection now
2021-12-29 07:29:34: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 07:29:34: netdata INFO : ACLK_Main : Getting Cloud /env successful
2021-12-29 07:29:34: netdata INFO : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 07:29:34: netdata INFO : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 07:29:35: netdata INFO : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 07:29:35: netdata INFO : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 07:29:35: netdata INFO : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 07:29:35: netdata INFO : ACLK_Main : ACLK connection successfully established
2021-12-29 07:29:35: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
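One way to compare the ACLK reconnects in this log with the 30-35 second gap pattern is to list how far apart the disconnects are. The following is a minimal sketch, not part of Netdata: the function names are hypothetical, it assumes the log format shown above, and it relies on GNU `date -d` for timestamp parsing.

```shell
# Print the timestamp of every ACLK disconnect found in an error log on stdin.
# Assumes lines like:
#   2021-12-29 02:30:40: netdata ERROR : ACLK_Main : Connection Error or Dropped
aclk_disconnects() {
    awk '/ACLK_Main : Connection Error or Dropped/ {
        sub(/:$/, "", $2)   # strip the trailing colon after the time field
        print $1, $2
    }'
}

# Print the number of seconds between consecutive disconnects.
# Requires GNU date (for the -d option).
aclk_disconnect_intervals() {
    aclk_disconnects | {
        prev=""
        while IFS= read -r ts; do
            now=$(date -u -d "$ts" +%s) || exit 1
            if [ -n "$prev" ]; then echo $(( now - prev )); fi
            prev=$now
        done
    }
}
```

It could be run as `aclk_disconnect_intervals < /var/log/netdata/error.log`, where the path is an assumption about where this install writes its log. Intervals measured in thousands of seconds would not line up with gaps that recur every half minute.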
Can you post a screenshot from the Agent as well? I assume the gaps appear at the same spots. This doesn’t seem to be Cloud/ACLK related… The first thing that came to my mind was resource starvation, but you already mentioned the server is mostly idle.
Is that data from the Agent itself? Do you by any chance have streaming (parent/child) enabled, and are those charts from a child?
@underhood Here is a screenshot from the agent. The gaps are roughly in the same spots, yes. I believe I’ve noticed a couple of discrepancies from time to time, but the vast majority are similar.
As far as the parent/child question goes… this is my only server. So I imagine it wouldn’t be a parent/child thing, unless there’s something I’m unaware of. I only ran the install process as normal a few days after configuring my Ubuntu and Virtualmin setup.
Well, this is super weird. I refreshed the agent page and I was able to see everything that was missing before. And the data appears to be continuous now. I was unable to get a screenshot in time, but this one starts around where the previous one ended.
To confuse you even further, I just had an email informing me of an alert where it was unable to get data for 8 seconds. So I followed the link in the email and it opened a page where the data is missing. I now have 2 tabs open where one is showing the full data, and the newest tab is showing data with gaps.
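Since two browser tabs disagree about the same time range, it may help to ask the Agent directly what it has stored, taking the dashboard out of the picture. The sketch below is an illustration under assumptions: the function name is hypothetical, and it assumes the Agent's CSV output marks missing samples as empty or `null` fields.

```shell
# Count data rows on stdin (CSV: time column first, then values) in which
# every value column is empty or "null". Assumes a single header row.
count_empty_rows() {
    awk -F',' '
        NR == 1 { next }    # skip the header row
        NF == 0 { next }    # skip blank lines
        {
            empty = 1
            for (i = 2; i <= NF; i++) {
                v = $i
                gsub(/[ \r"]/, "", v)   # drop spaces, CRs and quotes
                if (v != "" && v != "null") { empty = 0; break }
            }
            if (empty) n++
        }
        END { print n + 0 }
    '
}

# Example (assumes the Agent listens on the default port 19999 and that
# the system.cpu chart exists):
#   curl -s 'http://localhost:19999/api/v1/data?chart=system.cpu&after=-300&format=csv' | count_empty_rows
```

A result of 0 while a dashboard tab shows gaps for the same window would suggest the data is being collected and the problem is on the display side.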
I sent the whole folder in an email to the above support address. Please be aware that the server has been offline for a few days and I just cut it back on this morning.
Local time: Fri 2022-01-07 08:57:48 CST
Universal time: Fri 2022-01-07 14:57:48 UTC
RTC time: Fri 2022-01-07 14:57:47
Time zone: America/Chicago (CST, -0600)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Following up on your last output, could you also paste the output of journalctl -u systemd-timesyncd? It should show whether the timesyncd daemon is correcting your clock. Also, could you watch the output of watch 'timedatectl status && timedatectl timesync-status' for a while and check whether your RTC time drifts?
-- Reboot --
Jan 11 12:00:19 *** systemd[1]: Starting Network Time Synchronization...
Jan 11 12:00:19 *** systemd[1]: Started Network Time Synchronization.
Jan 11 12:00:23 *** systemd-timesyncd[743]: Network configuration changed, trying to establish connection.
Jan 11 12:00:24 *** systemd-timesyncd[743]: Network configuration changed, trying to establish connection.
Jan 11 12:00:52 *** systemd-timesyncd[743]: Initial synchronization to time server 91.189.89.199:123 (ntp.ubuntu.com).
Jan 11 16:00:34 *** systemd[1]: Stopping Network Time Synchronization...
Jan 11 16:00:34 *** systemd[1]: systemd-timesyncd.service: Succeeded.
Jan 11 16:00:34 *** systemd[1]: Stopped Network Time Synchronization.
-- Reboot --
Jan 12 08:03:34 *** systemd[1]: Starting Network Time Synchronization...
Jan 12 08:03:34 *** systemd[1]: Started Network Time Synchronization.
Jan 12 08:03:38 *** systemd-timesyncd[740]: Network configuration changed, trying to establish connection.
Jan 12 08:03:38 *** systemd-timesyncd[740]: Network configuration changed, trying to establish connection.
Jan 12 08:03:39 *** systemd-timesyncd[740]: Network configuration changed, trying to establish connection.
Jan 12 08:04:09 *** systemd-timesyncd[740]: Initial synchronization to time server 91.189.91.157:123 (ntp.ubuntu.com).
It’s not allowing me to run the second command: “Failed to create bus connection: No such file or directory”. However, I did use the date command and didn’t notice anything odd. It stays in sync with my computer clock.
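Because the timedatectl command could not be run here, a rough alternative to eyeballing date is to compare how much the wall clock advanced against how much the kernel's uptime counter advanced over the same interval. This is a minimal sketch with hypothetical function names; it assumes Linux (/proc/uptime) and GNU date, and it checks the system clock rather than the RTC.

```shell
# Difference, in seconds, between elapsed wall-clock time and elapsed uptime.
# Arguments: wall_start uptime_start wall_end uptime_end
clock_skew() {
    awk -v w0="$1" -v u0="$2" -v w1="$3" -v u1="$4" \
        'BEGIN { printf "%.2f\n", (w1 - w0) - (u1 - u0) }'
}

# Sample both clocks, sleep, sample again, and print the skew.
# Argument: interval in seconds (defaults to 30).
sample_clock_skew() {
    interval=${1:-30}
    w0=$(date +%s.%N); u0=$(cut -d' ' -f1 /proc/uptime)
    sleep "$interval"
    w1=$(date +%s.%N); u1=$(cut -d' ' -f1 /proc/uptime)
    clock_skew "$w0" "$u0" "$w1" "$u1"
}
```

Running `sample_clock_skew 35` a few times should print values within a few hundredths of a second of zero, since /proc/uptime only has 0.01 s resolution. A value of a second or more would suggest the wall clock was stepped during the interval.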