Metrics missing for nodes in cloud

Problem/Question

In Netdata Cloud, the node list shows no metric data for my nodes. Opening a specific node shows no metrics either (blank page).

Environment/Browser

netdata v1.31.0
Brave Version 1.29.77 Chromium: 93.0.4577.63 (Official Build) (64-bit)
Edge Version 93.0.961.38 (Official build) (64-bit)

New Kubernetes install using the Helm chart with claiming options. All nodes are claimed and visible in Cloud.

What I expected to happen

Metrics for nodes to be available and to reflect what is visible locally.

Connecting locally to the parent instance I am able to view the nodes and their metrics without issue.

[Screenshot: Node list]

[Screenshot: Node view]

[Screenshot: Local]

# Helm values.yaml
service:
  type: ClusterIP
  port: 19999
  annotations: {}
# Using ProjectContour/HTTPProxy
ingress:
  enabled: false
rbac:
  create: true
  pspEnabled: true
serviceAccount:
  create: true
  name: netdata
parent:
  env:
    DO_NOT_TRACK: 1
  database:
    persistence: true
    storageclass: "rook-ceph-block"
    volumesize: 8Gi
  alarms:
    persistence: true
    storageclass: "rook-ceph-block"
    volumesize: 200Mi
  claiming:
    enabled: true
    token: "..."
    rooms:
      - "..."
    url: "https://app.netdata.cloud"
child:
  env:
    DO_NOT_TRACK: 1
  claiming:
    enabled: true
    token: "..."
    rooms:
      - "..."
    url: "https://app.netdata.cloud"

Please go to http://yourhost:19999/api/v1/info and look for lines like the following, towards the end:

	"cloud-available": true,
	"aclk-implementation": "legacy",
	"agent-claimed": true,
	"aclk-available": true,

If all of these flags are true, then we’ll need to do some debugging.
You can try restarting the node that has the problem and see whether the charts come back.
Please let us know if it continues to happen, so we can investigate further.
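
The check described above can also be scripted. The following is a minimal sketch, not an official Netdata tool: the helper names `flags_not_true` and `fetch_info` are hypothetical, and `http://yourhost:19999` is the same placeholder host used above. It assumes only that `/api/v1/info` returns JSON containing the boolean flags quoted in this reply.

```python
import json
import urllib.request

# Boolean flags quoted in the reply above.
# "aclk-implementation" is a string ("legacy"), so it is not checked here.
REQUIRED_FLAGS = ("cloud-available", "agent-claimed", "aclk-available")


def flags_not_true(info: dict) -> list:
    """Return the names of required flags that are missing or not exactly True."""
    return [name for name in REQUIRED_FLAGS if info.get(name) is not True]


def fetch_info(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode /api/v1/info; base_url is a placeholder like http://yourhost:19999."""
    url = base_url.rstrip("/") + "/api/v1/info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline sample mirroring the values quoted in this thread.
    sample = {
        "cloud-available": True,
        "aclk-implementation": "legacy",
        "agent-claimed": True,
        "aclk-available": True,
    }
    print(flags_not_true(sample))  # []
```

Against a live agent, `flags_not_true(fetch_info("http://yourhost:19999"))` would return an empty list when every flag is true, which is the case where further debugging is needed.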

Checking the API page I do see those values set:

	"cloud-enabled": true,
	"cloud-available": true,
	"aclk-implementation": "legacy",
	"agent-claimed": true,
	"aclk-available": true,

I have restarted all the netdata pods, but the result is unchanged.

This is very strange. Please take the entire error.log from one of those nodes and send it to chris@netdata.cloud. We’ll need to investigate further.

OK, I know what’s happening. You can see data for the parent only, but not for the children, even though you’ve claimed the children. There is an issue in the Helm chart that we need to address ASAP: the children don’t actually store any of their collected data, and the request made to the parent for that data isn’t handled properly either.

Please first try, in your values.yaml, the suggestions in "Memory modes in parent and child should be changed" (Issue #229, netdata/helmchart on GitHub), especially memory mode ram for the children, and then redeploy the Helm chart. I expect that you’ll start seeing the charts for the child nodes as well after that.

I updated the values for the helm chart as suggested.

parent:
  env:
    DO_NOT_TRACK: 1
  configs:
    netdata:
      enabled: true
      path: /etc/netdata/netdata.conf
      data: |
        [global]
          memory mode = dbengine
        [plugins]
          cgroups = no
          tc = no
          enable running new plugins = no
          check for new plugins every = 72000
          python.d = no
          charts.d = no
          go.d = no
          node.d = no
          apps = no
          proc = no
          idlejitter = no
          diskspace = no

child:
  env:
    DO_NOT_TRACK: 1
  configs:
    netdata:
      enabled: true
      path: /etc/netdata/netdata.conf
      data: |
        [global]
          memory mode = ram
        [health]
          enabled = no
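
Before redeploying, the `data` blocks above can be sanity-checked to confirm that each one sets the intended memory mode. This is a minimal sketch under stated assumptions: `effective_setting` is a hypothetical helper written for this thread, it only reads the text of the override (it does not query a running agent), and it treats lines starting with `#` as comments.

```python
from typing import Optional


def effective_setting(conf_text: str, section: str, key: str) -> Optional[str]:
    """Return the value of `key` inside `[section]`, or None if it is absent.

    Blank lines and lines starting with '#' are skipped.
    """
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section header such as [global]
            current = line[1:-1].strip()
            continue
        if current == section and "=" in line:
            name, _, value = line.partition("=")
            if name.strip() == key:
                return value.strip()
    return None


# The child override from the values shown above.
child_conf = """
[global]
  memory mode = ram
[health]
  enabled = no
"""

print(effective_setting(child_conf, "global", "memory mode"))  # ram
```

The lookup is scoped to a section on purpose, so a `memory mode` line placed under the wrong header would be reported as absent rather than silently accepted.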

After upgrading the deployment with the new values, I am now seeing data in Cloud as expected. Thank you for your assistance.

Awesome! The PR "changing default memory modes" by M4itee (Pull Request #230, netdata/helmchart on GitHub) was just merged and fixes the defaults. Apologies for the inconvenience; we should have caught this earlier.