Parent Pod: "Failed to connect to https://app.netdata.cloud, return code 6", Child Nodes Working

Problem/Question

Parent pod is crash looping with:

ls: /var/run/balena.sock: No such file or directory
ls: /var/run/docker.sock: No such file or directory
Unable to communicate with Netdata daemon, querying config from disk instead.
Unable to communicate with Netdata daemon, querying config from disk instead.
Token: ****************
Base URL: https://app.netdata.cloud
Id: [ID]
Rooms: [ROOMS]
Hostname: [PODNAME]
Proxy: 
Netdata user: netdata
Failed to connect to https://app.netdata.cloud, return code 6
Connection attempt 1 failed. Retry in 1s.
Failed to connect to https://app.netdata.cloud, return code 6
Connection attempt 2 failed. Retry in 2s.
Failed to connect to https://app.netdata.cloud, return code 6
Connection attempt 3 failed. Retry in 3s.
grep: /var/lib/netdata/cloud.d/tmpout.txt: No such file or directory
grep: /var/lib/netdata/cloud.d/tmpout.txt: No such file or directory
Failed to claim node with the following error message:"Unknown HTTP error message"

Netdata is running on a MicroK8s (1.22.6) cluster. It is provisioned via Flux using the Helm chart, with the following values:

spec:
  values:
    child:
      claiming:
        enabled: true
        rooms: [ROOMS]
      envFrom:
        - secretRef:
            name: netdata-secrets
    notifications:
      slackurl: [SLACKURL]
    parent:
      alarms:
        storageclass: nfs-hdd
        volumesize: 1Gi
      claiming:
        enabled: true
        rooms: [ROOMS]
      database:
        storageclass: nfs-ssd
        volumesize: 10Gi
      envFrom:
        - secretRef:
            name: netdata-secrets
      livenessProbe:
        failureThreshold: 10
        periodSeconds: 60
        timeoutSeconds: 10
      readinessProbe:
        failureThreshold: 10
        periodSeconds: 60
        timeoutSeconds: 10
    replicaCount: 1
  interval: 1m0s
  releaseName: netdata
  targetNamespace: netdata

The child pods all connect quickly, without any obvious issues. The parent pod is stuck crash looping.

Relevant docs you followed/actions you took to solve the issue

I’ve tried forcing the parent pod onto other nodes in the cluster, disabling database persistence, and extending the probe thresholds (as seen in the config above), but haven’t been able to get the parent pod healthy. The child pods use the same room value as the parent and pull the token from the same Kubernetes secret.

Environment/Browser/Agent’s version etc

Netdata Docker Image Version: v1.33.1

What I expected to happen

Parent pods to start and become healthy.

Hi @jotojoto1324! Welcome!

So the claim script fails to connect to the cloud. Return code 6 comes from curl (or from wget, whichever the script finds), and for curl it is CURLE_COULDNT_RESOLVE_HOST: the host name couldn’t be resolved. Is there an easy way to test that the network where the parent pod lives is OK?

There is a child pod running on the same node as the parent pod, and it doesn’t seem to have any issues. I did notice that the child pods use host networking by default and the parent doesn’t.

I will exec into the parent pod and see if I can resolve the host from there.
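For anyone following along, here is a quick sketch of such a resolution check. It uses `getent hosts`, which is present in most base images; `name.invalid` is used only as a guaranteed-unresolvable demo name (RFC 2606 reserves .invalid), standing in for what a broken lookup of app.netdata.cloud would look like:

```shell
# check() prints whether a hostname resolves; getent exits non-zero on lookup failure
check() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "resolves: $1"
  else
    echo "cannot resolve: $1"
  fi
}

check localhost      # sanity check against a name that should always resolve
check name.invalid   # .invalid never resolves, mimicking the broken-DNS symptom
```

Inside the parent pod, run `check app.netdata.cloud`; if it prints "cannot resolve", the pod’s DNS (typically CoreDNS, via the pod’s /etc/resolv.conf) is the problem rather than Netdata itself.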

I ended up discovering the issue with the help of the Kubernetes docs page “Debugging DNS Resolution”. I resolved it by implementing the fix proposed there:

This can be fixed manually by using kubelet’s --resolv-conf flag to point to the correct resolv.conf (with systemd-resolved, this is /run/systemd/resolve/resolv.conf).

Then I restarted kubelite, then CoreDNS, then the Netdata parent pod. Things look good now. Thanks for the assist (the “couldn’t resolve host” tip was very helpful).
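For reference, on MicroK8s the kubelet flags live in a snap args file, so the sequence above looks roughly like this. This is a sketch, assuming a systemd-resolved host: the paths and service names are MicroK8s defaults, and the parent pod name is a placeholder for this cluster.

```shell
# point kubelet at systemd-resolved's real resolv.conf (per the Kubernetes DNS debugging docs)
echo '--resolv-conf=/run/systemd/resolve/resolv.conf' | \
  sudo tee -a /var/snap/microk8s/current/args/kubelet

# restart kubelite (MicroK8s bundles kubelet into this single daemon)
sudo systemctl restart snap.microk8s.daemon-kubelite

# restart CoreDNS so it picks up the corrected upstream resolvers
microk8s kubectl -n kube-system rollout restart deployment/coredns

# finally, restart the crash-looping parent pod (pod name is a placeholder)
microk8s kubectl -n netdata delete pod netdata-parent-0
```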

Thank you for the follow up!! Glad you got it solved!