Node connected and collecting data locally, but Netdata Cloud shows dashes (no metrics): infinite checkpoint/snapshot loop


Hello,

I have two identical VPS servers (OVH, Debian 12, HestiaCP 1.9, Netdata v2.9.0) in the same Space. The second node (vps2.alte.pl) works perfectly and shows all metrics in Netdata Cloud. The first node (vps1.alte.pl) connects successfully but never shows any metrics — only dashes.

Node details:

  • Hostname: vps1.alte.pl
  • node_id: 85b3879e-ee03-402c-a844-79024a79ba4b
  • claim_id: 844e427a-a3dd-4c2c-aaff-689067be12bc
  • uid: 9ca2c3d3-fdf9-4c5b-bf2b-5e2d6abd9d7f

What works:

  • ACLK connection is established successfully (ACLK: connection successfully established)
  • Agent is claimed (agent-claimed: true, aclk-available: true)
  • Local data collection works perfectly — system.cpu, system.ram, system.net, disk.* and 1800+ charts are all available via the local API at http://127.0.0.1:19999
  • Example: curl 'http://127.0.0.1:19999/api/v1/data?chart=system.cpu' returns real values

What does NOT work:

  • Netdata Cloud shows dashes for all metrics (CPU, Memory, Load, Disk Read)
  • The node is stuck in an infinite loop visible in the logs

Error from journalctl:

RRDCONTEXT: received checkpoint command for claim id '844e427a-a3dd-4c2c-aaff-689067be12bc', node id '85b3879e-ee03-402c-a844-79024a79ba4b', while node 'vps1.alte.pl' has an active context streaming.
RRDCONTEXT: received version hash 1710 for host 'vps1.alte.pl', does not match our version hash 1830. Sending snapshot of all contexts.
NODES INFO: 0 nodes loading contexts, 0 receiving replication, 0 sending replication, 1 pending context post processing (host 'vps1.alte.pl')

This loop repeats continuously — Cloud sends a checkpoint command while the agent is already streaming contexts, causing a version hash mismatch, which triggers a new snapshot, which causes another checkpoint, and so on.
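
To put numbers on the loop, the mismatch lines can be counted per hash pair. This is only a minimal sketch: the regular expression is modelled on the log line quoted above (other agent versions may word it differently), and the names `MISMATCH`, `summarize_mismatches` and `sample` are made up for this example.

```python
import re
from collections import Counter

# Pattern modelled on the RRDCONTEXT line quoted above (an assumption:
# other Netdata versions may phrase the message differently).
MISMATCH = re.compile(
    r"received version hash (?P<cloud>\d+) for host '(?P<host>[^']+)', "
    r"does not match our version hash (?P<agent>\d+)"
)


def summarize_mismatches(lines):
    """Count mismatch log lines per (host, cloud hash, agent hash)."""
    counts = Counter()
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            key = (match["host"], int(match["cloud"]), int(match["agent"]))
            counts[key] += 1
    return counts


# Two of the lines from the journal excerpt above, used as sample input.
sample = [
    "RRDCONTEXT: received version hash 1710 for host 'vps1.alte.pl', "
    "does not match our version hash 1830. Sending snapshot of all contexts.",
    "NODES INFO: 0 nodes loading contexts, 0 receiving replication, "
    "0 sending replication, 1 pending context post processing "
    "(host 'vps1.alte.pl')",
]

for (host, cloud, agent), n in summarize_mismatches(sample).items():
    print(f"{host}: cloud={cloud} agent={agent} seen {n} time(s)")
```

Feeding it the lines of the real journal output instead of `sample` would show how often the mismatch occurs and whether the two hashes stay the same or change between snapshots.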

What I have already tried:

  • Multiple reinstalls (full purge including /var/lib/netdata, /var/cache/netdata, /etc/netdata)
  • Clearing all cache: rm -rf /var/cache/netdata/* /var/lib/netdata/context* /var/lib/netdata/db
  • Removing cloud identity: rm -rf /var/lib/netdata/cloud.d/ /var/lib/netdata/registry/
  • Generating a new UUID manually via uuidgen
  • Re-claiming multiple times with netdata-claim.sh
  • Each time a new node_id and claim_id are generated, the same loop immediately starts again

My conclusion:
This appears to be a state issue on the Netdata Cloud backend side. The Cloud keeps sending checkpoint commands that conflict with the agent’s active context streaming, and this cannot be resolved from the agent side alone.

Could you please manually clear/reset the state for this node on the backend? I would really appreciate your help.

Thank you!

Hello,

Thank you for the detailed report and for the troubleshooting steps you’ve already taken.

We’ve investigated on the Cloud backend side and don’t see a loop from our end (only one context snapshot has been processed for this node, and there are no errors in our logs). This suggests the issue may be on the agent side rather than the backend.

To help us narrow this down, could you run the following command on vps1.alte.pl while the loop is actively occurring:

curl -s 'http://127.0.0.1:19999/api/v1/contexts?options=queue,flags'

This will show us whether any specific contexts are constantly being flagged for update on the agent side, which would explain the version hash mismatch you’re seeing. Please share the full output.
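
If it is easier, you can also take two captures of that output a few seconds apart and compare them. The following is a rough sketch rather than an official tool: it assumes the response is a JSON object with a top-level "contexts" object keyed by context name, and the `version` field in the sample data is purely hypothetical. The helper name `changed_contexts` is likewise made up for this example.

```python
import json


def changed_contexts(before, after, key="contexts"):
    """Return the names under `key` whose JSON differs between two captures.

    Assumes both captures are dicts with a `key` object keyed by context
    name; a context present in only one capture also counts as changed.
    """
    old = before.get(key, {})
    new = after.get(key, {})
    names = set(old) | set(new)
    return sorted(name for name in names if old.get(name) != new.get(name))


# Hypothetical, heavily trimmed captures; real field names may differ.
first = json.loads(
    '{"contexts": {"system.cpu": {"version": 1}, "system.ram": {"version": 5}}}'
)
second = json.loads(
    '{"contexts": {"system.cpu": {"version": 1}, "system.ram": {"version": 6}}}'
)

print(changed_contexts(first, second))  # ['system.ram']
```

With real data, each capture would be the saved output of the curl command above, loaded with `json.load`. Contexts that show up as changed on every comparison are the ones we would like to look at first.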

Kind Regards,
Kanela