Constant gaps in all charts

Hello, I recently installed a fresh copy of Ubuntu LTS 20.04 with a virtualmin server configured on it, running a couple websites. I’m having a strange issue with the netdata dashboard on both the agent and cloud side of things where every minute or so I get a gap of a few seconds of lost data - on everything. I’ve tried googling this issue and cannot seem to find other occurrences like mine.

The odd thing is - my server is hardly being utilized. It is using 5% CPU at max, and has plenty of RAM and bandwidth available. The gaps also do appear to be pretty consistent, appearing every 30-35 seconds. Does anyone have any ideas about what may be causing this?
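For anyone wanting to put a number on "every 30-35 seconds", here is a minimal sketch that measures gap spacing from a list of samples. It assumes one sample per second and that a missing sample shows up as `None`; the `rows` data, `find_gaps`, and `gap_periods` are hypothetical names made up for illustration, not anything Netdata provides.

```python
def find_gaps(rows):
    """Return (first_missing_ts, last_missing_ts) for each run of missing samples."""
    gaps, run = [], []
    for ts, value in rows:
        if value is None:
            run.append(ts)
        elif run:
            gaps.append((run[0], run[-1]))
            run = []
    if run:  # the series ended in the middle of a gap
        gaps.append((run[0], run[-1]))
    return gaps


def gap_periods(gaps):
    """Seconds from the start of one gap to the start of the next."""
    starts = [start for start, _ in gaps]
    return [b - a for a, b in zip(starts, starts[1:])]


# Hypothetical series: 100 one-second samples with 3 samples lost every 33 seconds.
rows = [(t, None if t % 33 >= 30 else 1.0) for t in range(100)]

gaps = find_gaps(rows)
print(gaps)               # [(30, 32), (63, 65), (96, 98)]
print(gap_periods(gaps))  # [33, 33]
```

A steady period in the second list would suggest something running on a timer, while irregular spacing would point more towards load or I/O stalls.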

Hi @Kalradia, welcome to our community!
Could you please share some extra context with me, to understand whether this is a bug? What’s your

  • OS / Environment
    To get this information, execute the following commands based on your operating system:
    uname -a; grep -Hv "^#" /etc/*release # Linux
    uname -a; uname -K # BSD
    uname -a; sw_vers # macOS
  • Netdata version
    Provide output of netdata -W buildinfo.
  • Installation method
  • Error logs
    grep -i ACLK /var/log/netdata/error.log

Also, could you please send me a screenshot of the Cloud UI, so I can see how this gap is visualized?
Thanks!

Hey Georgia, thank you!

OS/Environment:

5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
/etc/lsb-release:DISTRIB_ID=Ubuntu
/etc/lsb-release:DISTRIB_RELEASE=20.04
/etc/lsb-release:DISTRIB_CODENAME=focal
/etc/lsb-release:DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
/etc/os-release:NAME="Ubuntu"
/etc/os-release:VERSION="20.04.3 LTS (Focal Fossa)"
/etc/os-release:ID=ubuntu
/etc/os-release:ID_LIKE=debian
/etc/os-release:PRETTY_NAME="Ubuntu 20.04.3 LTS"
/etc/os-release:VERSION_ID="20.04"
/etc/os-release:HOME_URL="https://www.ubuntu.com/"
/etc/os-release:SUPPORT_URL="https://help.ubuntu.com/"
/etc/os-release:BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
/etc/os-release:PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
/etc/os-release:VERSION_CODENAME=focal
/etc/os-release:UBUNTU_CODENAME=focal

Netdata:

Version: netdata v1.32.1-30-nightly
Configure options:  '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--with-bundled-lws' '--with-bundled-protobuf' 'CFLAGS=-O2' 'LDFLAGS='
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES
    ACLK Next Generation:       YES
    ACLK-NG New Cloud Protocol: YES
    ACLK Legacy:                YES
    TLS Host Verification:      YES
    Machine Learning:           YES
Libraries:
    protobuf:                YES (bundled)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    LWS:                     YES static v3.2.2
    mosquitto:               YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    NO
    EBPF:                    YES
    IPMI:                    NO
    NFACCT:                  NO
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: NO

Installation method: I installed via the long command that was given when I opened an account on the netdata cloud website. It was something about claiming a node.

Error log:

2021-12-29 02:30:40: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 02:30:40: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 02:30:40: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 02:30:40: netdata INFO  : ACLK_Main : Attempting connection now
2021-12-29 02:30:40: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 02:30:40: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2021-12-29 02:30:40: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 02:30:40: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 02:30:41: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 02:30:41: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 02:30:41: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 02:30:41: netdata INFO  : ACLK_Main : ACLK connection successfully established
2021-12-29 02:30:41: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 05:49:00: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : Attempting connection now
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 05:49:00: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 05:49:01: netdata INFO  : ACLK_Main : ACLK connection successfully established
2021-12-29 05:49:01: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
2021-12-29 06:15:10: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 06:15:10: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 06:15:10: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 06:15:10: netdata INFO  : ACLK_Main : Attempting connection now
2021-12-29 06:15:10: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 06:15:10: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2021-12-29 06:15:11: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 06:15:11: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 06:15:11: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 06:15:11: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 06:15:11: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 06:15:11: netdata INFO  : ACLK_Main : ACLK connection successfully established
2021-12-29 06:15:11: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
2021-12-29 07:04:52: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 07:04:52: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 07:04:52: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 07:04:52: netdata INFO  : ACLK_Main : Attempting connection now
2021-12-29 07:04:53: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 07:04:53: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2021-12-29 07:04:53: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 07:04:53: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 07:04:53: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 07:04:53: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 07:04:53: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 07:04:53: netdata INFO  : ACLK_Main : ACLK connection successfully established
2021-12-29 07:04:53: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.
2021-12-29 07:29:34: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.
2021-12-29 07:29:34: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 07:29:34: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-12-29 07:29:34: netdata INFO  : ACLK_Main : Attempting connection now
2021-12-29 07:29:34: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 07:29:34: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2021-12-29 07:29:34: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-12-29 07:29:34: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-12-29 07:29:35: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2021-12-29 07:29:35: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2021-12-29 07:29:35: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-12-29 07:29:35: netdata INFO  : ACLK_Main : ACLK connection successfully established
2021-12-29 07:29:35: netdata ERROR : ACLK_Main : Sending `connect` payload immediately as popcorning was finished already.

Here is what it looks like on the cloud website:

It may also be worth noting that I get a lot of these emails throughout the day: “Warning, python.d_job_last_collected_secs = 6 seconds ago”

Can you post a screenshot on the Agent as well? I assume the gaps appear at the same spots. Doesn’t seem to be Cloud/ACLK related… The first thing that came to my mind was resource starvation but you already mentioned it is mostly idle.

Are those data from the agent itself? Do you by any chance have streaming (parent/child) enabled and those charts are of a child?
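On the Cloud/ACLK point, a quick way to check is to compare how often the ACLK connection drops with how often the gaps appear. The sketch below is only an illustration: it takes the "Connection Error or Dropped" lines from the error log pasted earlier in the thread (trimmed to those lines plus one unrelated line) and prints the seconds between consecutive drops. The helper names `drop_times` and `seconds_between` are made up for this example.

```python
from datetime import datetime

DROP_MARKER = "ACLK_Main : Connection Error or Dropped"

# Trimmed copy of the error log posted above.
LOG = """\
2021-12-29 02:30:40: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 02:30:40: netdata INFO  : ACLK_Main : Attempting connection now
2021-12-29 05:49:00: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 06:15:10: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 07:04:52: netdata ERROR : ACLK_Main : Connection Error or Dropped
2021-12-29 07:29:34: netdata ERROR : ACLK_Main : Connection Error or Dropped
"""


def drop_times(log_text):
    """Timestamps of every line that reports a dropped ACLK connection."""
    times = []
    for line in log_text.splitlines():
        if DROP_MARKER in line:
            stamp = line.split(": netdata", 1)[0]
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    return times


def seconds_between(times):
    """Seconds between each pair of consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(seconds_between(drop_times(LOG)))  # [11900, 1570, 2982, 1482]
```

The drops in that excerpt are roughly 25 minutes to 3 hours apart, while the gaps were reported every 30-35 seconds, so the two do not line up.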

@underhood Here is a screenshot from the agent. The gaps are roughly in the same spots, yes. I believe I’ve noticed a couple discrepancies from time to time, but the vast majority is similar.

As far as the parent/child question goes… this is my only server. So I imagine it wouldn’t be a parent/child thing, unless there’s something I’m unaware of. I only ran the install process as normal a few days after configuring my ubuntu and virtualmin setup.

Would you be willing to share an error.log from this agent? timotej [snake thing] netdata [dot] cloud

or you can find me on our netdata discord thing

Well this is super weird. I refreshed the agent page and I was able to see everything that was missing before. And it appears to be constant now. I was unable to get a screenshot in time, but this one starts around where the previous ended.

And now the cloud is working too. Did you guys just do something? This is so strange.

OK I am confused now TBH :smiley: How long have you had this issue? Could it be some browser quirk that went away after a refresh?

To confuse you even further, I just had an email informing me of an alert where it was unable to get data for 8 seconds. So I followed the link in the email and it opened a page where the data is missing. I now have 2 tabs open where one is showing the full data, and the newest tab is showing data with gaps.

Hey @Kalradia, could you please take a snapshot from the agent? It will really help us troubleshoot this odd issue.
Thanks a lot!

Sure, the agent appears to be doing it again, too. Which is so odd, because I still have the cloud tab open that is working just fine.

Here’s a screenshot of the agent (left) against the cloud tab (right) that is currently working:

You can use the instructions in “Import, export, and print a snapshot” on Learn Netdata to export the snapshot, and then email the file to us at support(@) netdata (.) cloud .

My bad, I misunderstood right after I sent that message. I will email the file in a moment.

Could you share the logs from /var/log/netdata with us? Is Netdata running on a bare-metal machine or in a VM?

I sent the whole folder in an email to the above support address. Please be aware that the server has been offline for a few days and I just cut it back on this morning.

It is a bare metal server, no VM.

Thank you @kalradia, I’ve passed the files to our engineering team.

@Kalradia we haven’t yet been able to pinpoint where the issue is, but we suspect there could be some time drifting/syncing problems.

To investigate this further, could you run the following commands and share the results?

timedatectl status 
timedatectl timesync-status

Here are the command results:

               Local time: Fri 2022-01-07 08:57:48 CST
           Universal time: Fri 2022-01-07 14:57:48 UTC
                 RTC time: Fri 2022-01-07 14:57:47
                Time zone: America/Chicago (CST, -0600)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
       Server: 91.189.91.157 (ntp.ubuntu.com)
Poll interval: 32s (min: 32s; max 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 2
    Reference: 8E036402
    Precision: 1us (-24)
Root distance: 65.360ms (max: 5s)
       Offset: +5.495106s
        Delay: 25.912ms
       Jitter: 721.234ms
 Packet count: 31
    Frequency: +0.253ppm
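
To make the numbers in that output easier to compare, here is a small sketch that converts the time-span fields to seconds. It only handles the unit suffixes visible in the pasted output (`us`, `ms`, `s`, `min`), and the 0.5 s threshold is an arbitrary assumption for illustration, not a documented Netdata or systemd limit. `parse_span` and `timesync_fields` are hypothetical helper names.

```python
import re

# Seconds per unit; only the suffixes that appear in the pasted output.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "min": 60.0}
SPAN_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(us|ms|min|s)\b")


def parse_span(text):
    """Convert a span such as '+5.495106s' or '34min 8s' to seconds."""
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    tokens = SPAN_TOKEN.findall(text)
    if not tokens:
        raise ValueError(f"unrecognised time span: {text!r}")
    return sign * sum(float(num) * UNIT_SECONDS[unit] for num, unit in tokens)


def timesync_fields(output, names=("Poll interval", "Offset", "Delay", "Jitter")):
    """Pick the named fields out of the pasted text and return them in seconds."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in names:
            # Drop any trailing "(min: ...; max ...)" annotation before parsing.
            fields[key] = parse_span(value.split("(")[0])
    return fields


# Lines copied from the output posted above.
SAMPLE = """\
       Server: 91.189.91.157 (ntp.ubuntu.com)
Poll interval: 32s (min: 32s; max 34min 8s)
       Offset: +5.495106s
        Delay: 25.912ms
       Jitter: 721.234ms
"""

fields = timesync_fields(SAMPLE)
MAX_OFFSET = 0.5  # hypothetical threshold, chosen only for this example
for name, seconds in fields.items():
    print(f"{name}: {seconds:.3f} s")
if abs(fields["Offset"]) > MAX_OFFSET:
    print("offset is above the example threshold")
```

With the values posted, the offset comes out at about 5.5 s and the jitter at about 0.72 s, against a 32 s poll interval. That fits the suspicion of a time sync problem but does not prove it on its own.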

Hi @kalradia!

Following up on your last output, could you also paste the output of journalctl -u systemd-timesyncd ? It should show whether the timesyncd daemon is correcting your clock. Also, could you watch the output of watch 'timedatectl status && timedatectl timesync-status' for a while and check whether your RTC time appears to drift over time?

Thank you very much!

Sure thing. Here’s the first command:

-- Reboot --
Jan 11 12:00:19 *** systemd[1]: Starting Network Time Synchronization...
Jan 11 12:00:19 *** systemd[1]: Started Network Time Synchronization.
Jan 11 12:00:23 *** systemd-timesyncd[743]: Network configuration changed, trying to establish connection.
Jan 11 12:00:24 *** systemd-timesyncd[743]: Network configuration changed, trying to establish connection.
Jan 11 12:00:52 *** systemd-timesyncd[743]: Initial synchronization to time server 91.189.89.199:123 (ntp.ubuntu.com).
Jan 11 16:00:34 *** systemd[1]: Stopping Network Time Synchronization...
Jan 11 16:00:34 *** systemd[1]: systemd-timesyncd.service: Succeeded.
Jan 11 16:00:34 *** systemd[1]: Stopped Network Time Synchronization.
-- Reboot --
Jan 12 08:03:34 *** systemd[1]: Starting Network Time Synchronization...
Jan 12 08:03:34 *** systemd[1]: Started Network Time Synchronization.
Jan 12 08:03:38 *** systemd-timesyncd[740]: Network configuration changed, trying to establish connection.
Jan 12 08:03:38 *** systemd-timesyncd[740]: Network configuration changed, trying to establish connection.
Jan 12 08:03:39 *** systemd-timesyncd[740]: Network configuration changed, trying to establish connection.
Jan 12 08:04:09 *** systemd-timesyncd[740]: Initial synchronization to time server 91.189.91.157:123 (ntp.ubuntu.com).

It’s not allowing me to run the second command: “Failed to create bus connection: No such file or directory”. However, I did use the date command and didn’t notice anything funny. It stays in sync with my computer clock.