Unable to see child nodes in Netdata Cloud


I installed Netdata agents on AWS EKS with the official Helm chart, using the command provided from a newly created war room:

```
helm install netdata netdata/netdata
--set image.tag=stable
--set parent.claiming.enabled="true"
--set parent.claiming.token=XXX
--set parent.claiming.rooms=9b74ca59-d5f2-4846-967c-ff3e0283147a
--set child.claiming.enabled="true"
--set child.claiming.token=XXX
--set child.claiming.rooms=9b74ca59-d5f2-4846-967c-ff3e0283147a
```

However, when I open the specific room → Nodes, there are only 2 nodes:

When I open the room with all nodes, I see the 2 nodes from above plus 3 actual nodes (my EKS cluster has 3 nodes). I tried deleting the room and reinstalling everything, but it didn't help. I checked that the DaemonSet for the Netdata agent has the env var NETDATA_CLAIM_ROOMS properly set to the same room id.
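For anyone following along, one way to check that env var on the DaemonSet is the JSONPath query below; the DaemonSet name `netdata-child` is an assumption based on the chart's defaults, so adjust it to your release:

```shell
# Print the NETDATA_CLAIM_ROOMS value from the child DaemonSet's pod template.
# "netdata-child" is an assumed name; list DaemonSets first if unsure:
#   kubectl get daemonsets
kubectl get daemonset netdata-child \
  -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="NETDATA_CLAIM_ROOMS")].value}'
```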

Environment/Browser/Agent’s version etc

AWS EKS v1.23
Helm netdata-3.7.43
App v1.38.1

What I expected to happen

When I open the war room, I should see all Kubernetes nodes.

Apologies for the late reply @alex.grinko , this was missed between some other issues.
It should be impossible for the parent node to be in that room but not the children. We'll investigate and let you know.

Hello @alex.grinko :wave:

thanks for using Netdata and for reaching out to us.

So we tried to reproduce your case using an AWS EKS cluster with k8s version 1.23.16, and we managed to successfully see the child nodes in the provided room (as well as the All nodes room, of course).

A couple of questions/verifications:

  1. Make sure to include a double dash in each set argument (instead of a single one): `--set`, not `-set`.
  2. Also make sure each line of the command ends with a space followed by a backslash, in case you did not execute the command as a single line.
  3. Are you using any kind of a service mesh (for example HCP Consul)?
  4. Could you please share the helm template you used?
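To make points 1 and 2 concrete, here is a sketch of the command with double dashes and trailing backslashes. The `helm` function below is just a local stand-in that counts `--set` flags, so you can see that the line continuations parse as a single command (token values are placeholders):

```shell
#!/bin/sh
# Local stand-in for helm: prints how many --set flags it received, so we
# can verify the continuations and double dashes parse as intended.
helm() {
  n=0
  for arg in "$@"; do
    case "$arg" in --set) n=$((n+1));; esac
  done
  echo "--set flags seen: $n"
}

# The install command from the thread, with proper double dashes and a
# space + backslash ending every continued line:
helm install netdata netdata/netdata \
  --set image.tag=stable \
  --set parent.claiming.enabled="true" \
  --set parent.claiming.token=XXX \
  --set parent.claiming.rooms=9b74ca59-d5f2-4846-967c-ff3e0283147a \
  --set child.claiming.enabled="true" \
  --set child.claiming.token=XXX \
  --set child.claiming.rooms=9b74ca59-d5f2-4846-967c-ff3e0283147a
# → --set flags seen: 7
```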

Thanks in advance.

Hello papazach, and thank you for the reply.
Here are the answers:

  1. I checked that all additional set parameters in the helm command are successfully applied to the netdata DaemonSet. I see the following claiming env vars (names reconstructed from the standard NETDATA_CLAIM_* variables; the room one is the same one I mentioned above):
     NETDATA_CLAIM_URL: https://api.netdata.cloud
     NETDATA_CLAIM_TOKEN: XXX
     NETDATA_CLAIM_ROOMS: 9b74ca59-d5f2-4846-967c-ff3e0283147a

So the room id is correctly passed to the netdata agent.
  2. I tried to execute the helm install command as a single line and it didn't help.
  3. No service mesh. Just a brand new EKS v1.23.14.
  4. I'm using the netdata/netdata helm chart v3.7.45 from the Netdata Agent Helm chart repo for Kubernetes (k8s).
I can invite you to our war room if that gives you more debug information.

Maybe the room id isn't correct? I took it from Room → Nodes → Add nodes → Kubernetes → Helm.

Nope, the room id seems OK.
I've looped in some SRE reinforcements to help us understand what exactly the issue is here.

Could you try and completely remove the chart (making sure that all PVs are removed as well) and retry?

In the meantime I could also provide a slight workaround to achieve the same result as we try to solve this.

You can navigate to room 9b74ca59-d5f2-4846-967c-ff3e0283147a and hit the cog on the top of the screen:

Then navigate to nodes:

and on the right-hand side, from the space's node pool, you can pick the nodes you want to have in the room.

Hey papazach,

I deleted the helm chart and then all PVs with netdata in the name. Installed again and got the same issue.
Thank you for the hint about adding nodes manually, it worked!

Hello @alex.grinko

In the default setup, which you seem to have, children running as a DaemonSet write their data directly into a directory on the Kubernetes node. This is done to keep the claiming data tied to the node itself: the node will appear under the same host in Netdata Cloud, and you will have no problems with data history. In short, it allows users to update and restart without worrying about any of this.
My thinking here is that there is a small possibility that, during your experimentation or maybe due to a bug on our side (which we will need to check), your configs are simply wrong (stale) and for some reason are not being updated when you deploy the helm chart again.
This theory can be checked in two ways (assuming, of course, that you did not change the default config):

  1. Force AWS to bring up a new node: scale up the cluster, and if the new node appears in the room in question without any problems, the theory above is true.
  2. You can alter the setting for where the config is stored on the node. The setting is baked into our helm chart as child.persistence.hostPath, and it defaults to /var/lib/netdata-k8s-child. So if you alter it to, for example, /var/lib/netdata-k8s-kid, there will be a blank directory on the node and the config will be generated from scratch. Again, a redeployment of netdata or a helm upgrade should give immediate results. It will generate new claiming IDs.
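As a sketch of option 2, the flag name comes from the chart as quoted above, and the alternate path is just an example:

```shell
# Point the child's persistent dir at a fresh location so the claiming
# config is regenerated from scratch (the new path is arbitrary).
# --reuse-values keeps the previously set chart values intact.
helm upgrade netdata netdata/netdata \
  --reuse-values \
  --set child.persistence.hostPath=/var/lib/netdata-k8s-kid
```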

Lastly: did you change anything between helm deployments? I'm thinking mostly about things like changes in permissions, service accounts, namespaces, or any other parameters in our values.yaml. I would like to replicate this issue on my own.

Hello Mateusz_Bularz,

You're right! I added a new node and it appeared in the desired room. So I checked the volume "persistencevarlibdir", which is defined as:

    - name: persistencevarlibdir
      hostPath:
        path: /var/lib/netdata-k8s-child/var/lib/netdata
        type: DirectoryOrCreate

So the issue is that the path was created when I installed the chart for the first time, without specifying the war room id. All further installations of the chart didn't overwrite the contents of that folder.
Maybe it's better to define persistencevarlibdir as an emptyDir, so the contents are erased once the netdata pod is deleted?
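For illustration, an emptyDir version of that volume would look like the fragment below. Note that this is a hypothetical change, not the chart's default, and it would also discard claiming state and history on every pod deletion, which is the trade-off discussed later in the thread:

```yaml
# Hypothetical alternative: ephemeral storage, wiped when the pod is deleted
- name: persistencevarlibdir
  emptyDir: {}
```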

I would like to replicate this issue on my own.

I think you can reproduce it by installing the chart without specifying a war room for the child nodes, and then upgrading the release with the argument --set child.claiming.rooms=XXX
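Under that assumption, a reproduction sketch might look like this (token and room values are placeholders carried over from the thread):

```shell
# 1) First install without child rooms: claiming state is written to the
#    hostPath dir on each node.
helm install netdata netdata/netdata \
  --set child.claiming.enabled="true" \
  --set child.claiming.token=XXX

# 2) Then add the room: the already-populated hostPath dir keeps the old
#    claim, so the children never join the new room.
helm upgrade netdata netdata/netdata \
  --set child.claiming.enabled="true" \
  --set child.claiming.token=XXX \
  --set child.claiming.rooms=XXX
```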

Anyway, the mystery is solved. You guys know better how to fix the issue.

Thanks to everyone who participated!

@alex.grinko that’s great news indeed!
As for why we store it (by default): otherwise, when you upgrade netdata to a newer version, for example, your nodes would be duplicated because they would have different claim IDs. It would look a bit odd: to Netdata they would be two different nodes, while for you it is still the same node with the same name.

If you want to test that experience on your own, you just need to disable persistent storage for the children; it is available as an option in the helm chart too!