Netdata on Kubernetes

Install Netdata on a Kubernetes cluster

Monitor a Kubernetes (k8s) cluster with Netdata

Recently updated by our doc team, two comprehensive guides on setting up Netdata on k8s!

Hey @Luis-Johnstone ,

To force the refresh of the Dashboard, you only need to append the update_always=true argument to the URL:
http://192.168.1.150:19999/#menu_system_submenu_cpu;theme=slate;help=true;update_always=true

We intend to offer proper support for Kubernetes, including better visualization optimized for the unique experience Kubernetes offers (e.g. ephemeral nodes). But this is not on the committed roadmap, so we can’t say in good conscience when it’s going to ship, or give more details about it.

If I understand you correctly, the streaming functionality is intended so that the child nodes replicate their database to the parent, so that the parent can not only offer the same metrics but also apply alarms on them. Depending on your use case, this setup might make sense for you, or you might prefer to have the data live on each child node and access it through Netdata Cloud, leveraging the extra functionality, such as custom dashboards or metric correlations.
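For reference, a minimal sketch of what that streaming setup boils down to in stream.conf (the Helm chart generates this for you through its ConfigMaps, e.g. netdata-conf-child for the children; the destination and API key below are just placeholders):

On each child:

[stream]
    enabled = yes
    destination = <parent-host>:19999
    api key = 11111111-2222-3333-4444-555555555555

On the parent:

[11111111-2222-3333-4444-555555555555]
    enabled = yes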

I hope that I helped!

Keep the feedback coming, we can’t get enough of it :muscle:

OK, that fixed it. I changed the listen port from 19999 to 19998 on the physical host in /etc/netdata/netdata.conf.
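For anyone following along, that is just the [web] section of the host’s /etc/netdata/netdata.conf, something like:

[web]
    default port = 19998

followed by a restart of the host agent (e.g. sudo systemctl restart netdata).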

Looks good so far!! :grinning:

So, I’m getting my head around how this works:

I’m guessing from my playtime so far that this makes the agent on the host itself redundant, since each child pod looks to be showing all the same information (plus more)… Is that the idea?
If so, what happens if I hook this up to send stats up to my tenant in Netdata Cloud and then re-deploy the Helm chart a few times: am I going to wind up with a consistent node identity, or will I end up with lots of orphaned nodes with the same name, or a bunch of nodes with the same name but incremented numbers attached to them, etc.?
Happy to try it ofc, but just curious as I’ve got my workspaces up there set up nicely now.

One curious thing though: I spun up another node and added it to the cluster (the child service came up fine with the modified host port), but I noticed the “k8s kubelet” and “k8s kubeproxy” menus on the right; those didn’t appear on the original node that was deployed to. Seems a bit odd given that the first node was and still is the only master…

Is there a way for me to specify certain settings in values.yaml for the Web UI? For example, I like having my charts always refresh rather than the default of “On Focus”. If I set it in the running UI, then as soon as I switch to a different child node and back, the setting is reverted. Ideally, could we get the config stored in a Persistent Volume or something?

Also, do you guys have changes planned for representing/navigating the sections on each child node dedicated to specific pods? I ask because I have only circa 8 containers per node and the UI is rather cluttered: I can imagine a whole lot of scrolling and stuttering of the browser on a production system. I’ve felt like that right-side pane needed a search box and maybe this is the requirement for one?

Luis keep us updated! @rybue thanks again for chiming in. You are helping a lot in this community :slight_smile:

Yeah, that seems to be an issue. I’m not sure how your Kubernetes is configured, but it looks like the netdata pod is conflicting with another process on the same port.
You can try to reconfigure your host netdata to run on a different port, to see if it solves the issue :slight_smile:

luis@pi-node1:~/k8s/netdata$ kubectl logs netdata-child-zq2vl -c sd
{"level":"info","component":"pipeline manager","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"k8s config provider","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"export manager","time":"2020-09-17 20:47:40","message":"registered: '[file exporter (/export/go.d.yml)]'"}
{"level":"info","component":"discovery manager","time":"2020-09-17 20:47:40","message":"registered: [k8s discovery manager]"}
{"level":"info","component":"pipeline manager","time":"2020-09-17 20:47:40","message":"received a new config, starting a new pipeline ('k8s/cmap/default/netdata-child-sd-config-map:config.yml')"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"export manager","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"discovery manager","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"file export","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"k8s discovery manager","time":"2020-09-17 20:47:40","message":"registered: [k8s pod discovery]"}
{"level":"info","component":"k8s pod discovery","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"build manager","time":"2020-09-17 20:47:45","message":"built 1 config(s) for target 'kube-system_coredns-7944c66d8d-4v9q6_coredns_tcp_9153'"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6': new/stale config(s) 1/0"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"file export","time":"2020-09-17 20:47:46","message":"wrote 1 config(s) to '/export/go.d.yml'"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:00","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:05","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:05","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"received '2' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:30","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:20","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:20","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:25","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:25","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:40","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:40","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:50:55","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:50:55","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:51:10","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:51:10","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:35","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:35","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:40","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:40","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:58:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:58:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:59:00","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:59:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:03:50","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:03:50","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:04:05","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:04:05","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:09:00","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:09:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:09:15","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:09:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:10","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:10","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:15","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:30","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:19:20","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:19:20","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:19:30","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:19:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:30","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:35","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:35","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:29:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:29:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:29:55","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:29:55","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
kubectl logs netdata-child-zq2vl -c netdata
Netdata entrypoint script starting
2020-09-17 21:29:42: netdata INFO  : MAIN : CONFIG: cannot load cloud config '/var/lib/netdata/cloud.d/cloud.conf'. Running with internal defaults.
2020-09-17 21:29:42: netdata INFO  : MAIN : Found 0 legacy dbengines, setting multidb diskspace to 256MB
2020-09-17 21:29:42: netdata INFO  : MAIN : Created file '/var/lib/netdata/dbengine_multihost_size' to store the computed value
2020-09-17 21:29:42: netdata INFO  : MAIN : Using host prefix directory '/host'
2020-09-17 21:29:42: netdata INFO  : MAIN : SIGNAL: Not enabling reaper
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Invalid listen port 0 given. Defaulting to 19999. (errno 22, Invalid argument)
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: IPv4 bind() on ip '0.0.0.0' port 19999, socktype 1 failed. (errno 98, Address in use)
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Cannot bind to ip '0.0.0.0', port 19999
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: IPv6 bind() on ip '::' port 19999, socktype 1 failed. (errno 98, Address in use)
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Cannot bind to ip '::', port 19999
2020-09-17 21:29:42: netdata FATAL : MAIN : LISTENER: Cannot listen on any API socket. Exiting... # : Invalid argument

2020-09-17 21:29:42: netdata INFO  : MAIN : EXIT: netdata prepares to exit with code 1...
2020-09-17 21:29:42: netdata INFO  : MAIN : EXIT: cleaning up the database...
2020-09-17 21:29:42: netdata INFO  : MAIN : Cleaning up database [0 hosts(s)]...
2020-09-17 21:29:42: netdata INFO  : MAIN : EXIT: all done - netdata is now exiting - bye bye...

Please note that I am running netdata on the k8s/k3s host node… :grin:

It is https://github.com/netdata/agent-service-discovery

Its purpose is to identify applications running inside the containers and create configuration files that are used by netdata plugins.
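If you want to see what it generated on a child, you can read the exported file straight out of the netdata container (the path matches the NETDATA_PLUGINS_GOD_WATCH_PATH variable and the sd-shared mount shown in the kubectl describe output in this thread):

kubectl exec netdata-child-zq2vl -c netdata -- cat /etc/netdata/go.d/sd/go.d.yml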

I see now it is the netdata container that is failing to start :smiley:

Also, the issue may be that the netdata child tries to connect to the parent, but the parent is not actually serving any connections, as we can see here: netdata-parent-cfb988d65-rkz5m 0/1 Running
Looks like the readiness probe is failing there.

You could also post events from the parent pod :slight_smile:
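Something like this, using the parent pod name from your kubectl get po output:

kubectl describe pod netdata-parent-cfb988d65-rkz5m
kubectl logs netdata-parent-cfb988d65-rkz5m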

Ok, it looks like the problem now is not with pulling the image (as it gets created successfully), but something goes wrong when the container is started.
Could you post logs from both containers in the netdata-child-zq2vl pod?

kubectl logs netdata-child-zq2vl -c sd
kubectl logs netdata-child-zq2vl -c netdata

@Rybue Thanks for the reply!
OK, so the main container is set to latest and the sd one is set to v0.2.1
Here’s the output:

luis@pi-node1:~/k8s/netdata$ kubectl get po
NAME                             READY   STATUS    RESTARTS   AGE
netdata-parent-cfb988d65-rkz5m   0/1     Running   0          14s
netdata-child-zq2vl              1/2     Error     1          15s
luis@pi-node1:~/k8s/netdata$
luis@pi-node1:~/k8s/netdata$ kubectl describe po netdata-child-zq2vl
Name:         netdata-child-zq2vl
Namespace:    default
Priority:     0
Node:         pi-node1/192.168.178.81
Start Time:   Thu, 17 Sep 2020 21:47:31 +0100
Labels:       app=netdata
              controller-revision-hash=65778dd95d
              pod-template-generation=1
              release=netdata
              role=child
Annotations:  checksum/config: dbf27785c04d58fa098895f1e45be1b72b4ea76b283ec2d0d373412977e44329
              container.apparmor.security.beta.kubernetes.io/netdata: unconfined
Status:       Running
IP:           192.168.178.81
IPs:
  IP:           192.168.178.81
Controlled By:  DaemonSet/netdata-child
Init Containers:
  init-nodeuid:
    Container ID:  containerd://6f5ff79976b7eb57caeb397cd4780746fc6f6d7074d4b69cc3f7c805197a8a66
    Image:         netdata/wget
    Image ID:      docker.io/netdata/wget@sha256:44e7a2be59451de7fda0bef7f35caeeb34a5e9c96949b17069ec7b62d7545af2
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
       TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token); URL="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${MY_NODE_NAME}"; HEADER="Authorization: Bearer ${TOKEN}";
      DATA=$(wget -q -T 5 --no-check-certificate --header "${HEADER}" -O - "${URL}"); [ -z "${DATA}" ] && exit 1;
      UID=$(echo "${DATA}" | grep -m 1 uid | grep -o ":.*" | tr -d ": \","); [ -z "${UID}" ] && exit 1;
      echo -n "${UID}" > /nodeuid/netdata.public.unique.id;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 17 Sep 2020 21:47:35 +0100
      Finished:     Thu, 17 Sep 2020 21:47:35 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      MY_NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /nodeuid from nodeuid (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
Containers:
  netdata:
    Container ID:   containerd://f75510bcd3b0c280208e144d1479d0a23e0128d10c0e16f18afdf8dd35b79504
    Image:          netdata/netdata:latest
    Image ID:       docker.io/netdata/netdata@sha256:06ca7394e515561613324e6700b49deb1bb92de787f9f78bc98b76bc5d2a7462
    Port:           19999/TCP
    Host Port:      19999/TCP
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 17 Sep 2020 21:47:43 +0100
      Finished:     Thu, 17 Sep 2020 21:47:44 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 17 Sep 2020 21:47:39 +0100
      Finished:     Thu, 17 Sep 2020 21:47:40 +0100
    Ready:          False
    Restart Count:  1
    Liveness:       http-get http://:http/api/v1/info delay=0s timeout=1s period=30s #success=1 #failure=3
    Readiness:      http-get http://:http/api/v1/info delay=0s timeout=1s period=30s #success=1 #failure=3
    Environment:
      MY_POD_NAME:                     netdata-child-zq2vl (v1:metadata.name)
      MY_NODE_NAME:                     (v1:spec.nodeName)
      MY_POD_NAMESPACE:                default (v1:metadata.namespace)
      NETDATA_PLUGINS_GOD_WATCH_PATH:  /etc/netdata/go.d/sd/go.d.yml
    Mounts:
      /etc/netdata/go.d.conf from config (rw,path="go.d")
      /etc/netdata/go.d/k8s_kubelet.conf from config (rw,path="kubelet")
      /etc/netdata/go.d/k8s_kubeproxy.conf from config (rw,path="kubeproxy")
      /etc/netdata/go.d/sd/ from sd-shared (rw)
      /etc/netdata/netdata.conf from config (rw,path="netdata")
      /etc/netdata/stream.conf from config (rw,path="stream")
      /host/proc from proc (ro)
      /host/sys from sys (rw)
      /var/lib/netdata/registry/ from nodeuid (rw)
      /var/run/docker.sock from run (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
  sd:
    Container ID:   containerd://5c652252424cd16b7d37b47b5559b3a00d7ca3c49e71b337ba20ed2a08b26426
    Image:          netdata/agent-sd:v0.2.1
    Image ID:       docker.io/netdata/agent-sd@sha256:31cdb9c2c6b4e87deb075e1c620f8cb03c4ae9627f0c21cfebdbb998f5a325fa
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 17 Sep 2020 21:47:40 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      NETDATA_SD_CONFIG_MAP:  netdata-child-sd-config-map:config.yml
      MY_POD_NAMESPACE:       default (v1:metadata.namespace)
      MY_NODE_NAME:            (v1:spec.nodeName)
    Mounts:
      /export/ from sd-shared (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      netdata-conf-child
    Optional:  false
  nodeuid:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  sd-shared:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  netdata-token-mbdkj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  netdata-token-mbdkj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned default/netdata-child-zq2vl to pi-node1
  Normal   Pulling    25s                kubelet, pi-node1  Pulling image "netdata/wget"
  Normal   Pulled     24s                kubelet, pi-node1  Successfully pulled image "netdata/wget"
  Normal   Created    24s                kubelet, pi-node1  Created container init-nodeuid
  Normal   Started    23s                kubelet, pi-node1  Started container init-nodeuid
  Normal   Pulling    19s                kubelet, pi-node1  Pulling image "netdata/agent-sd:v0.2.1"
  Normal   Started    18s                kubelet, pi-node1  Started container sd
  Normal   Pulled     18s                kubelet, pi-node1  Successfully pulled image "netdata/agent-sd:v0.2.1"
  Normal   Created    18s                kubelet, pi-node1  Created container sd
  Normal   Pulling    17s (x2 over 22s)  kubelet, pi-node1  Pulling image "netdata/netdata:latest"
  Normal   Pulled     16s (x2 over 21s)  kubelet, pi-node1  Successfully pulled image "netdata/netdata:latest"
  Normal   Created    16s (x2 over 20s)  kubelet, pi-node1  Created container netdata
  Normal   Started    15s (x2 over 19s)  kubelet, pi-node1  Started container netdata
  Warning  BackOff    13s                kubelet, pi-node1  Back-off restarting failed container

@ilyam8
What is the sd container used for? If I disable it, what won’t work?
It would be good to have a short description on the Docker Hub page :slight_smile:

Both latest and v0.2.1 have the linux/arm64 platform:

https://hub.docker.com/r/netdata/agent-sd/tags

agent-sd is optional; it can be disabled in values.yaml.
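For example, something like this in your values.yaml (matching the sd block of the chart’s default values), or the equivalent --set sd.child.enabled=false on the helm install command line:

sd:
  child:
    enabled: false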

Hi Luis!

Could you post events from the failing pod (kubectl describe) when you are using the v0.2.1 image tag?


I’m still getting the same error in three scenarios:

  1. Fresh pull from git and then helm install
  2. Fresh pull from git and then helm install with values.yaml with tag set to “latest”
  3. Fresh pull from git and then helm install with values.yaml with tag set to “v0.2.1” for the sd image

Should I be doing something different? :slight_smile:

Thanks @ilyam8 for this. Luis, if you manage to run this successfully, come back and tell us about it!

Warning Failed 44s kubelet, pi-node4 Failed to pull image “netdata/agent-sd:v0.1.0”: rpc error: code = NotFound desc = failed to pull and unpack image “docker.io/netdata/agent-sd:v0.1.0”: failed to unpack image on snapshotter overlayfs: no match for platform in manifest sha256:25bade1318e4238fdabd278f4590733a0843a0ae2de764e8a50d75d6a88f60e5: not found

failed to pull and unpack image “docker.io/netdata/agent-sd:v0.1.0”: failed to unpack image on snapshotter overlayfs: no match for platform

Indeed, our netdata/agent-sd image has no linux/arm64 platform support

I added it in https://github.com/netdata/agent-service-discovery/commit/26c687f (Update test_and_deploy.yml)

linux/arm64 is now in both latest and v0.2.1.
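If you want to double-check from a box with the docker CLI, docker manifest inspect should list the platforms in the manifest (it may need experimental CLI features enabled on older docker versions), e.g.:

docker manifest inspect netdata/agent-sd:v0.2.1 | grep architecture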


I’m not using Docker, I’m using k3s with the containerd backend.
I’ve tried deploying the Helm chart with the default values.yaml, but modifying the image tags to “latest”:

replicaCount: 1
deploymentStrategy:
  type: Recreate

image:
  repository: netdata/netdata
  tag: latest
  pullPolicy: Always

sd:
  repository: netdata/agent-sd
  tag: latest
  pullPolicy: Always
  child:
    enabled: true
    configmap:
      name: netdata-child-sd-config-map
      key: config.yml
      # if 'from' is {} the ConfigMap is not generated
      from:
        file: sdconfig/child.yml
        value: {}
    resources: {}
    # limits:
    #  cpu: 50m
    #  memory: 60Mi
    # requests:
    #  cpu: 50m
    #  memory: 60Mi

These are the commands I ran:

git clone https://github.com/netdata/helmchart netdata-helmchart
helm install netdata ./netdata-helmchart -f values.yaml

NAME: netdata
LAST DEPLOYED: Wed Sep 16 22:15:16 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. netdata will be available on http://netdata.k8s.local/, on the exposed port of your ingress controller

 You can get that port via `kubectl get services`. e.g. in the following example, the http exposed port is 31737, the https one is 30069.
 The hostname netdata.k8s.local will need to be added to /etc/hosts, so that it resolves to the exposed IP. That IP depends on how your cluster is set up:
        - When no load balancer is available (e.g. with minikube), you get the IP shown on `kubectl cluster-info`
        - In a production environment, the command `kubectl get services` will show the IP under the EXTERNAL-IP column

The port can be retrieved in both cases from `kubectl get services`

NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
exiled-tapir-nginx-ingress-controller        LoadBalancer   10.98.132.169    <pending>     80:31737/TCP,443:30069/TCP   11h





luis@pi-node1:~/k8s/netdata$ kubectl get po
NAME                              READY   STATUS             RESTARTS   AGE
netdata-parent-85b8ddd7f8-6m95r   1/1     Running            0          8m2s
netdata-child-sz6b4               0/2     CrashLoopBackOff   6          8m2s


luis@pi-node1:~/k8s/netdata$ kubectl logs netdata-child-sz6b4
error: a container name must be specified for pod netdata-child-sz6b4, choose one of: [netdata sd] or one of the init containers: [init-nodeuid]




luis@pi-node1:~/k8s/netdata$ uname -a
Linux pi-node1 5.4.0-1018-raspi #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

Running on aarch64 ubuntu 16.04

Tried docker first

user@ubuntu1604-aarch64:~/netdata$ docker run -d --name=netdata \
>   -p 19999:19999 \
>   -v netdatalib:/var/lib/netdata \
>   -v netdatacache:/var/cache/netdata \
>   -v /etc/passwd:/host/etc/passwd:ro \
>   -v /etc/group:/host/etc/group:ro \
>   -v /proc:/host/proc:ro \
>   -v /sys:/host/sys:ro \         ^C
user@ubuntu1604-aarch64:~/netdata$ sudo docker run -d --name=netdata \
>   -p 19999:19999 \
>   -v netdatalib:/var/lib/netdata \
>   -v netdatacache:/var/cache/netdata \
>   -v /etc/passwd:/host/etc/passwd:ro \
>   -v /etc/group:/host/etc/group:ro \
>   -v /proc:/host/proc:ro \
>   -v /sys:/host/sys:ro \
>   -v /etc/os-release:/host/etc/os-release:ro \
>   --restart unless-stopped \
>   --cap-add SYS_PTRACE \
>   --security-opt apparmor=unconfined \
>   netdata/netdata
Unable to find image 'netdata/netdata:latest' locally
latest: Pulling from netdata/netdata
4f861a20f507: Pull complete 
7bb4d159526d: Pull complete 
e2e87b7a7de9: Pull complete 
2c8445a10990: Pull complete 
adbf0b90c51f: Pull complete 
f7f8b8493280: Pull complete 
db8649e50b77: Pull complete 
37e23bb8abd9: Pull complete 
02279201ef13: Pull complete 
fabbb19aede9: Pull complete 
Digest: sha256:eb3b37414ecb87e7b64949826bc56e3f27d24fa0ef2e29e58de1dc4972a534e1
Status: Downloaded newer image for netdata/netdata:latest
be6767de06e7b973a8acb39648bf409916a8cf45f09aed4bb3887bd88e39d384

Which netdata images are you pulling and can you try with :latest?

@zack

This looks like it’ll be awesome.

Not sure if this is a bug or an RFC, but I just tried to deploy this to an ARM64 cluster and the netdata-child pods get stuck, as they can’t seem to pull their images for the architecture (output below).
My immediate thought was that you guys didn’t support ARM yet, but the netdata-parent pod starts up fine…

Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    <unknown>            default-scheduler  Successfully assigned default/netdata-child-7c4mc to pi-node4
  Warning  FailedMount  6m47s                kubelet, pi-node4  MountVolume.SetUp failed for volume "config" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulling      6m46s                kubelet, pi-node4  Pulling image "netdata/wget"
  Normal   Pulled       6m42s                kubelet, pi-node4  Successfully pulled image "netdata/wget"
  Normal   Created      6m42s                kubelet, pi-node4  Created container init-nodeuid
  Normal   Started      6m42s                kubelet, pi-node4  Started container init-nodeuid
  Warning  Failed       44s                  kubelet, pi-node4  Failed to pull image "netdata/agent-sd:v0.1.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/netdata/agent-sd:v0.1.0": failed to unpack image on snapshotter overlayfs: no match for platform in manifest sha256:25bade1318e4238fdabd278f4590733a0843a0ae2de764e8a50d75d6a88f60e5: not found
  Normal   Pulling      43s (x2 over 6m41s)  kubelet, pi-node4  Pulling image "netdata/netdata:v1.24.0"
  Normal   Created      42s (x2 over 46s)    kubelet, pi-node4  Created container netdata
  Normal   Pulled       42s (x2 over 49s)    kubelet, pi-node4  Successfully pulled image "netdata/netdata:v1.24.0"
  Normal   Started      41s (x2 over 46s)    kubelet, pi-node4  Started container netdata
  Normal   BackOff      40s (x3 over 41s)    kubelet, pi-node4  Back-off pulling image "netdata/agent-sd:v0.1.0"
  Warning  Failed       40s (x3 over 41s)    kubelet, pi-node4  Error: ImagePullBackOff
  Warning  BackOff      31s (x2 over 40s)    kubelet, pi-node4  Back-off restarting failed container
  Normal   Pulling      31s (x2 over 46s)    kubelet, pi-node4  Pulling image "netdata/agent-sd:v0.1.0"
  Warning  Failed       30s (x2 over 44s)    kubelet, pi-node4  Error: ErrImagePull