Suggested template:
Problem/Question
Parent pod is crash looping with:
ls: /var/run/balena.sock: No such file or directory
ls: /var/run/docker.sock: No such file or directory
Unable to communicate with Netdata daemon, querying config from disk instead.
Unable to communicate with Netdata daemon, querying config from disk instead.
Token: ****************
Base URL: https://app.netdata.cloud
Id: [ID]
Rooms: [ROOMS]
Hostname: [PODNAME]
Proxy:
Netdata user: netdata
Failed to connect to https://app.netdata.cloud, return code 6
Connection attempt 1 failed. Retry in 1s.
Failed to connect to https://app.netdata.cloud, return code 6
Connection attempt 2 failed. Retry in 2s.
Failed to connect to https://app.netdata.cloud, return code 6
Connection attempt 3 failed. Retry in 3s.
grep: /var/lib/netdata/cloud.d/tmpout.txt: No such file or directory
grep: /var/lib/netdata/cloud.d/tmpout.txt: No such file or directory
Failed to claim node with the following error message:"Unknown HTTP error message"
Netdata is running on a MicroK8s (1.22.6) cluster. It is deployed with Helm via Flux, using the following config:
spec:
  values:
    child:
      claiming:
        enabled: true
        rooms: [ROOMS]
      envFrom:
        - secretRef:
            name: netdata-secrets
    notifications:
      slackurl: [SLACKURL]
    parent:
      alarms:
        storageclass: nfs-hdd
        volumesize: 1Gi
      claiming:
        enabled: true
        rooms: [ROOMS]
      database:
        storageclass: nfs-ssd
        volumesize: 10Gi
      envFrom:
        - secretRef:
            name: netdata-secrets
      livenessProbe:
        failureThreshold: 10
        periodSeconds: 60
        timeoutSeconds: 10
      readinessProbe:
        failureThreshold: 10
        periodSeconds: 60
        timeoutSeconds: 10
    replicaCount: 1
  interval: 1m0s
  releaseName: netdata
  targetNamespace: netdata
The child pods all connect quickly, without any obvious issues. The parent pod is stuck in a crash loop.
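One possible lead from the log above: if the claiming script makes its request with curl (an assumption, I have not confirmed which client it uses), exit code 6 is curl's "could not resolve host" error, which would point at DNS inside the parent pod rather than at the token or room. Below is a minimal sketch for checking name resolution from inside a pod, assuming a Python 3 interpreter is available in the container or in a debug pod that shares the same DNS settings. The `resolve` helper is a hypothetical name for illustration and is not part of Netdata.

```python
import socket
import sys


def resolve(hostname, port=443):
    """Return the sorted IP addresses that hostname resolves to, or [] if the lookup fails."""
    try:
        # getaddrinfo uses the resolver configuration of the environment it runs in
        # (inside a pod, that is the pod's /etc/resolv.conf).
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # gaierror is raised when the name cannot be resolved.
        print(f"DNS lookup for {hostname} failed: {exc}", file=sys.stderr)
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("app.netdata.cloud")` in both the parent pod and a child pod and comparing the results should show whether only the parent fails to resolve the hostname. An empty list from the parent alone would suggest the two pod types are using different DNS settings.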
Relevant docs you followed/actions you took to solve the issue
I’ve tried forcing the parent pod onto other nodes in the cluster, disabling database.persistence, and extending the probe thresholds (as seen in the config above), but I haven’t been able to get the parent pod healthy. The child pods use the same room value as the parent and pull the token from the same Kubernetes secret.
Environment/Browser/Agent’s version etc
Netdata Docker Image Version: v1.33.1
What I expected to happen
The parent pod to start and become healthy.