Missing system.cpu graph in Netdata cloud

Problem/Question

I have a streaming Netdata setup with 1 parent and 3 child nodes. The system.cpu data for one of the nodes (named phoenix below) is missing from Netdata Cloud.

Relevant docs you followed/actions you took to solve the issue

  1. Restarted netdata agent (both parent and child)
  2. Went through error logs on parent, no errors that stand out.
  3. Went through error logs on child, no errors that stand out.
  4. Went through the local agent dashboard, system.cpu shows up fine.
  5. Other metrics also show up fine.

Environment/Browser/Agent’s version etc

Environment: Proxmox VE 7.3
Agent version: v1.39.1
Browsers: Firefox, Edge

What I expected to happen

The system.cpu metric should be displayed.

hi @bluefog, thanks for contacting us,

At the moment we see that the system.cpu chart on phoenix is being collected properly, so data should be returned to the browser. Could you try again, to rule out a temporary issue?

Thanks
Juan

Hey @Juan - thanks for taking a look. It seems like there’s some replication lag/delay that affects different metrics depending on when I look at the dashboard. For example, right now the system.load and system.net graphs are missing data for roughly the last 5 hours for this node.

It looks like something specific for this node. Are there any tweaks I can try to avoid this delay?

Hi @bluefog

A couple of questions: The children are not claimed themselves, right? Only the parent is?

Could you check phoenix’s local dashboard through the parent? I.e. open the parent’s local dashboard, then click on phoenix and check whether the metrics missing from the cloud appear there. I know you checked a local dashboard in point 4, but I’m not sure whether that was the child’s own dashboard or the child as seen through the parent. This check should tell us whether the problem is in the streaming from child to parent.
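
If comparing the two dashboards by eye is awkward, the same check can be sketched as a small script. This is only a rough sketch under assumptions, not an official tool: it assumes the agents listen on the default port 19999, that the child is reachable on the parent under /host/<hostname>/, and that the v1 data API returns rows of the form [timestamp, value, ...] with null for missing samples. The function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def data_url(base, chart, after=-600, host=None):
    """Build an /api/v1/data URL.

    With host=None the URL targets the agent's own charts; with a hostname
    it targets that child as seen through the parent (/host/<hostname>/...).
    """
    prefix = f"/host/{host}" if host else ""
    query = urlencode({"chart": chart, "after": after, "format": "json"})
    return f"{base.rstrip('/')}{prefix}/api/v1/data?{query}"


def find_gaps(rows, update_every=1):
    """Return (last_good, next_good) timestamp pairs that bracket a gap.

    A row counts as valid if at least one of its values is not None.
    Rows may arrive in any order; they are sorted by timestamp first.
    """
    valid = sorted(r[0] for r in rows if any(v is not None for v in r[1:]))
    return [(a, b) for a, b in zip(valid, valid[1:]) if b - a > update_every]


def fetch_rows(url):
    """Fetch the 'data' rows from a Netdata JSON response (network call)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)["data"]


def compare(child_base, parent_base, host, chart, update_every=1):
    """Print gaps seen on the child directly and through the parent."""
    direct = find_gaps(fetch_rows(data_url(child_base, chart)), update_every)
    via_parent = find_gaps(
        fetch_rows(data_url(parent_base, chart, host=host)), update_every
    )
    print("gaps on child itself:", direct)
    print("gaps via parent:     ", via_parent)
```

For example, compare("http://phoenix:19999", "http://artemis:19999", "phoenix", "system.cpu") would print both lists. Gaps that show up only in the second list would point at the child to parent streaming rather than at collection on the child.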

Thanks!

Hey @Manolis_Vasilakis

Yes, only the parent (artemis) is added to cloud, rest are not.

Right now I see the following in the overview (now it’s missing data for a different node).

The agent’s local dashboard (arete) shows up as this:

The dashboard through the parent (artemis) shows up as this (missing data here):

So the problem seems to be in the parent → child connection here? I have reduced the collection rate with update every 5 to see if that helps.

Following up on this: I have checked the graphs a few times since, and the issue seems to be fixed. Reducing the collection rate to once every 5 seconds seems to have helped avoid the replication lag/missing data here. Thanks @Juan @Manolis_Vasilakis for taking a look!