Recently updated by our doc team, two comprehensive guides on setting up Netdata on k8s!
Install Netdata on a Kubernetes cluster
Monitor a Kubernetes (k8s) cluster with Netdata
Hey @Luis-Johnstone ,
To force the dashboard to refresh, you only need to append the `update_always=true` argument to the URL:
http://192.168.1.150:19999/#menu_system_submenu_cpu;theme=slate;help=true;update_always=true
We intend to offer proper support for Kubernetes, including better visualization optimized for the unique experience Kubernetes offers (e.g. ephemeral nodes). But this is not on the committed roadmap, so we can't say in good conscience when it's going to be shipped, or give more details about it.
If I understand you correctly: the streaming functionality is intended so that the child nodes replicate their database to the master, so that the master can not only offer the same metrics but also apply alarms to them. Depending on your use case, this setup might make sense for you, or you might prefer to have the data live on each child node and access them through Netdata Cloud, leveraging the extra functionality, such as custom dashboards or metric correlations.
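For reference, a minimal sketch of that streaming setup in stream.conf (the API key below is a placeholder; generate your own, e.g. with uuidgen):

# stream.conf on each child node
[stream]
    enabled = yes
    destination = PARENT_IP:19999
    api key = 11111111-2222-3333-4444-555555555555

# stream.conf on the parent: one section per accepted API key
[11111111-2222-3333-4444-555555555555]
    enabled = yes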
I hope that I helped!
Keep the feedback coming; we can't get enough of it!
OK, that fixed it. I changed the listen port from 19999 to 19998 on the physical host in /etc/netdata/netdata.conf.
Looks good so far!!
So, I’m getting my head around how this works:
I'm guessing from my playtime so far that this makes the agent on the host itself redundant, since each child pod looks to be showing all the same information (plus more)… Is that the idea?
If so, what happens if I hook this up to send stats to my tenant in Netdata Cloud and then re-deploy the helm chart a few times: am I going to wind up with a consistent node identity, or will I end up with lots of orphaned nodes with the same name, or a bunch of nodes with the same name but incremented numbers attached to them, etc.?
Happy to try it, of course, but just curious, as I've got my workspaces up there set up nicely now.
One curious thing though: I spun up another node and added it to the cluster (the child service came up fine with the modified host port), but I noticed the "k8s kubelet" and "k8s kubeproxy" menus on the right; those didn't appear on the original node that was deployed to. Seems a bit odd given that the first node was, and still is, the only master…
Is there a way for me to specify certain settings in the values.yaml for the Web UI? For example, I like having my charts always refresh rather than the default of "On Focus". If I set it in the running UI, then as soon as I switch to a different child node and back, the setting is reverted. Ideally, could we get the config stored in a Persistent Volume or something?
Also, do you guys have changes planned for representing/navigating the sections on each child node dedicated to specific pods? I ask because I have only circa 8 containers per node and the UI is already rather cluttered: I can imagine a whole lot of scrolling and stuttering of the browser on a production system. I've felt like that right-side pane needed a search box; maybe this is the requirement for one?
Luis, keep us updated! @rybue, thanks again for chiming in; you are helping a lot in this community.
Yeah, that seems to be an issue. I'm not sure how your Kubernetes is configured, but it looks like the netdata pod is conflicting with another process on the same port.
You can try reconfiguring your host netdata to run on a different port, to see if that solves the issue.
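For reference, the listen port is set in the [web] section of the host's netdata.conf; a minimal sketch:

# /etc/netdata/netdata.conf on the host
[web]
    default port = 19998

Then restart the agent (e.g. sudo systemctl restart netdata) so the k8s child pods can keep 19999.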
luis@pi-node1:~/k8s/netdata$ kubectl logs netdata-child-zq2vl -c sd
{"level":"info","component":"pipeline manager","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"k8s config provider","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"export manager","time":"2020-09-17 20:47:40","message":"registered: '[file exporter (/export/go.d.yml)]'"}
{"level":"info","component":"discovery manager","time":"2020-09-17 20:47:40","message":"registered: [k8s discovery manager]"}
{"level":"info","component":"pipeline manager","time":"2020-09-17 20:47:40","message":"received a new config, starting a new pipeline ('k8s/cmap/default/netdata-child-sd-config-map:config.yml')"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"export manager","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"discovery manager","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"file export","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"k8s discovery manager","time":"2020-09-17 20:47:40","message":"registered: [k8s pod discovery]"}
{"level":"info","component":"k8s pod discovery","time":"2020-09-17 20:47:40","message":"instance is started"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"build manager","time":"2020-09-17 20:47:45","message":"built 1 config(s) for target 'kube-system_coredns-7944c66d8d-4v9q6_coredns_tcp_9153'"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6': new/stale config(s) 1/0"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"file export","time":"2020-09-17 20:47:46","message":"wrote 1 config(s) to '/export/go.d.yml'"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:00","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:05","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:05","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"received '2' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:30","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:48:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:20","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:20","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:25","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:25","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:40","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:49:40","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:50:55","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:50:55","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:51:10","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:51:10","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:35","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:35","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:40","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:40","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:53:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:58:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:58:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:59:00","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 20:59:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:03:50","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:03:50","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:04:05","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:04:05","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:09:00","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:09:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:09:15","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:09:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:10","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:10","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:15","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:30","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:14:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:19:20","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:19:20","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:19:30","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:19:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:30","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:35","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:35","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:24:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"received '8' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:29:45","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:29:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:29:55","message":"received '1' group(s)"}
{"level":"info","component":"pipeline","time":"2020-09-17 21:29:55","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
kubectl logs netdata-child-zq2vl -c netdata
Netdata entrypoint script starting
2020-09-17 21:29:42: netdata INFO : MAIN : CONFIG: cannot load cloud config '/var/lib/netdata/cloud.d/cloud.conf'. Running with internal defaults.
2020-09-17 21:29:42: netdata INFO : MAIN : Found 0 legacy dbengines, setting multidb diskspace to 256MB
2020-09-17 21:29:42: netdata INFO : MAIN : Created file '/var/lib/netdata/dbengine_multihost_size' to store the computed value
2020-09-17 21:29:42: netdata INFO : MAIN : Using host prefix directory '/host'
2020-09-17 21:29:42: netdata INFO : MAIN : SIGNAL: Not enabling reaper
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Invalid listen port 0 given. Defaulting to 19999. (errno 22, Invalid argument)
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: IPv4 bind() on ip '0.0.0.0' port 19999, socktype 1 failed. (errno 98, Address in use)
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Cannot bind to ip '0.0.0.0', port 19999
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: IPv6 bind() on ip '::' port 19999, socktype 1 failed. (errno 98, Address in use)
2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Cannot bind to ip '::', port 19999
2020-09-17 21:29:42: netdata FATAL : MAIN : LISTENER: Cannot listen on any API socket. Exiting... # : Invalid argument
2020-09-17 21:29:42: netdata INFO : MAIN : EXIT: netdata prepares to exit with code 1...
2020-09-17 21:29:42: netdata INFO : MAIN : EXIT: cleaning up the database...
2020-09-17 21:29:42: netdata INFO : MAIN : Cleaning up database [0 hosts(s)]...
2020-09-17 21:29:42: netdata INFO : MAIN : EXIT: all done - netdata is now exiting - bye bye...
Please note that I am running netdata on the k8s/k3s host node…
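A quick way to confirm what is holding port 19999 on the host (a sketch, assuming ss is available):

sudo ss -tlnp | grep 19999
# the host-level netdata will show up here if it is bound to 19999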
It is https://github.com/netdata/agent-service-discovery
Its purpose is to identify applications running inside the containers and create configuration files that are used by netdata plugins.
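As an illustration only (hypothetical pod IP and values, not taken from a real cluster), the exported file is roughly a plain go.d jobs list:

# /export/go.d.yml in the sd container; netdata sees it at /etc/netdata/go.d/sd/go.d.yml
coredns:
  - name: kube-system_coredns-7944c66d8d-4v9q6_coredns_tcp_9153
    url: http://10.42.0.5:9153/metrics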
I see now that netdata is the container that is failing to start.
Also, the issue may be that the netdata child tries to connect to the parent, but the parent is not actually serving any connections, as we can see from netdata-parent-cfb988d65-rkz5m 0/1 Running.
Looks like the readiness probe is failing there.
You could also post the events from the parent pod.
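E.g. something like:

kubectl describe pod netdata-parent-cfb988d65-rkz5m
kubectl get events --field-selector involvedObject.name=netdata-parent-cfb988d65-rkz5m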
OK, it looks like the problem now is not with pulling the image (as the container gets created successfully), but something goes wrong when the container is started.
Could you post logs from both containers in the netdata-child-zq2vl pod?
kubectl logs netdata-child-zq2vl -c sd
kubectl logs netdata-child-zq2vl -c netdata
@Rybue Thanks for the reply!
OK, so the main container is set to latest and the sd one is set to v0.2.1
Here’s the output:
luis@pi-node1:~/k8s/netdata$ kubectl get po
NAME READY STATUS RESTARTS AGE
netdata-parent-cfb988d65-rkz5m 0/1 Running 0 14s
netdata-child-zq2vl 1/2 Error 1 15s
luis@pi-node1:~/k8s/netdata$
luis@pi-node1:~/k8s/netdata$ kubectl describe po netdata-child-zq2vl
Name: netdata-child-zq2vl
Namespace: default
Priority: 0
Node: pi-node1/192.168.178.81
Start Time: Thu, 17 Sep 2020 21:47:31 +0100
Labels: app=netdata
controller-revision-hash=65778dd95d
pod-template-generation=1
release=netdata
role=child
Annotations: checksum/config: dbf27785c04d58fa098895f1e45be1b72b4ea76b283ec2d0d373412977e44329
container.apparmor.security.beta.kubernetes.io/netdata: unconfined
Status: Running
IP: 192.168.178.81
IPs:
IP: 192.168.178.81
Controlled By: DaemonSet/netdata-child
Init Containers:
init-nodeuid:
Container ID: containerd://6f5ff79976b7eb57caeb397cd4780746fc6f6d7074d4b69cc3f7c805197a8a66
Image: netdata/wget
Image ID: docker.io/netdata/wget@sha256:44e7a2be59451de7fda0bef7f35caeeb34a5e9c96949b17069ec7b62d7545af2
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token); URL="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${MY_NODE_NAME}"; HEADER="Authorization: Bearer ${TOKEN}";
DATA=$(wget -q -T 5 --no-check-certificate --header "${HEADER}" -O - "${URL}"); [ -z "${DATA}" ] && exit 1;
UID=$(echo "${DATA}" | grep -m 1 uid | grep -o ":.*" | tr -d ": \","); [ -z "${UID}" ] && exit 1;
echo -n "${UID}" > /nodeuid/netdata.public.unique.id;
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 17 Sep 2020 21:47:35 +0100
Finished: Thu, 17 Sep 2020 21:47:35 +0100
Ready: True
Restart Count: 0
Environment:
MY_NODE_NAME: (v1:spec.nodeName)
Mounts:
/nodeuid from nodeuid (rw)
/var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
Containers:
netdata:
Container ID: containerd://f75510bcd3b0c280208e144d1479d0a23e0128d10c0e16f18afdf8dd35b79504
Image: netdata/netdata:latest
Image ID: docker.io/netdata/netdata@sha256:06ca7394e515561613324e6700b49deb1bb92de787f9f78bc98b76bc5d2a7462
Port: 19999/TCP
Host Port: 19999/TCP
State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 17 Sep 2020 21:47:43 +0100
Finished: Thu, 17 Sep 2020 21:47:44 +0100
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 17 Sep 2020 21:47:39 +0100
Finished: Thu, 17 Sep 2020 21:47:40 +0100
Ready: False
Restart Count: 1
Liveness: http-get http://:http/api/v1/info delay=0s timeout=1s period=30s #success=1 #failure=3
Readiness: http-get http://:http/api/v1/info delay=0s timeout=1s period=30s #success=1 #failure=3
Environment:
MY_POD_NAME: netdata-child-zq2vl (v1:metadata.name)
MY_NODE_NAME: (v1:spec.nodeName)
MY_POD_NAMESPACE: default (v1:metadata.namespace)
NETDATA_PLUGINS_GOD_WATCH_PATH: /etc/netdata/go.d/sd/go.d.yml
Mounts:
/etc/netdata/go.d.conf from config (rw,path="go.d")
/etc/netdata/go.d/k8s_kubelet.conf from config (rw,path="kubelet")
/etc/netdata/go.d/k8s_kubeproxy.conf from config (rw,path="kubeproxy")
/etc/netdata/go.d/sd/ from sd-shared (rw)
/etc/netdata/netdata.conf from config (rw,path="netdata")
/etc/netdata/stream.conf from config (rw,path="stream")
/host/proc from proc (ro)
/host/sys from sys (rw)
/var/lib/netdata/registry/ from nodeuid (rw)
/var/run/docker.sock from run (rw)
/var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
sd:
Container ID: containerd://5c652252424cd16b7d37b47b5559b3a00d7ca3c49e71b337ba20ed2a08b26426
Image: netdata/agent-sd:v0.2.1
Image ID: docker.io/netdata/agent-sd@sha256:31cdb9c2c6b4e87deb075e1c620f8cb03c4ae9627f0c21cfebdbb998f5a325fa
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 17 Sep 2020 21:47:40 +0100
Ready: True
Restart Count: 0
Environment:
NETDATA_SD_CONFIG_MAP: netdata-child-sd-config-map:config.yml
MY_POD_NAMESPACE: default (v1:metadata.namespace)
MY_NODE_NAME: (v1:spec.nodeName)
Mounts:
/export/ from sd-shared (rw)
/var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
proc:
Type: HostPath (bare host directory volume)
Path: /proc
HostPathType:
run:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType:
sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: netdata-conf-child
Optional: false
nodeuid:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
sd-shared:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
netdata-token-mbdkj:
Type: Secret (a volume populated by a Secret)
SecretName: netdata-token-mbdkj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: :NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/netdata-child-zq2vl to pi-node1
Normal Pulling 25s kubelet, pi-node1 Pulling image "netdata/wget"
Normal Pulled 24s kubelet, pi-node1 Successfully pulled image "netdata/wget"
Normal Created 24s kubelet, pi-node1 Created container init-nodeuid
Normal Started 23s kubelet, pi-node1 Started container init-nodeuid
Normal Pulling 19s kubelet, pi-node1 Pulling image "netdata/agent-sd:v0.2.1"
Normal Started 18s kubelet, pi-node1 Started container sd
Normal Pulled 18s kubelet, pi-node1 Successfully pulled image "netdata/agent-sd:v0.2.1"
Normal Created 18s kubelet, pi-node1 Created container sd
Normal Pulling 17s (x2 over 22s) kubelet, pi-node1 Pulling image "netdata/netdata:latest"
Normal Pulled 16s (x2 over 21s) kubelet, pi-node1 Successfully pulled image "netdata/netdata:latest"
Normal Created 16s (x2 over 20s) kubelet, pi-node1 Created container netdata
Normal Started 15s (x2 over 19s) kubelet, pi-node1 Started container netdata
Warning BackOff 13s kubelet, pi-node1 Back-off restarting failed container
@ilyam8
What is the sd container used for? If I disable it, what won't work?
It would be good to have a short description on the Docker Hub page.
Both `latest` and `v0.2.1` have the `linux/arm64` platform: https://hub.docker.com/r/netdata/agent-sd/tags
`agent-sd` is optional; it can be disabled in values.yaml.
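If I read the chart's values correctly, disabling it should look something like this in values.yaml (a sketch, not tested here):

sd:
  child:
    enabled: false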
Hi Luis!
Could you post the events from the failing pod (kubectl describe) when you use the v0.2.1 image tag?
I’m still getting the same error in three scenarios:
Should I be doing something different?
Thanks @ilyam8 for this. Luis, if you manage to run this successfully, come back and tell us about it!
Warning Failed 44s kubelet, pi-node4 Failed to pull image "netdata/agent-sd:v0.1.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/netdata/agent-sd:v0.1.0": failed to unpack image on snapshotter overlayfs: no match for platform in manifest sha256:25bade1318e4238fdabd278f4590733a0843a0ae2de764e8a50d75d6a88f60e5: not found
failed to pull and unpack image "docker.io/netdata/agent-sd:v0.1.0": failed to unpack image on snapshotter overlayfs: no match for platform
Indeed, our `netdata/agent-sd` image had no `linux/arm64` platform support.
I added it in netdata/agent-service-discovery@26c687f (Update test_and_deploy.yml).
`linux/arm64` is now in `latest` and `v0.2.1`.
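If you want to verify the published platforms without pulling (a sketch, assuming skopeo or a Docker CLI with manifest support is installed):

skopeo inspect --raw docker://docker.io/netdata/agent-sd:v0.2.1 | grep -o arm64
# or
docker manifest inspect netdata/agent-sd:v0.2.1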
I'm not using Docker; I'm using k3s with the containerd backend.
I've tried deploying the helm chart with the default values.yaml, only modifying the image tags to `latest`:
replicaCount: 1
deploymentStrategy:
  type: Recreate
image:
  repository: netdata/netdata
  tag: latest
  pullPolicy: Always
sd:
  repository: netdata/agent-sd
  tag: latest
  pullPolicy: Always
  child:
    enabled: true
    configmap:
      name: netdata-child-sd-config-map
      key: config.yml
      # if 'from' is {} the ConfigMap is not generated
      from:
        file: sdconfig/child.yml
        value: {}
    resources: {}
    # limits:
    #   cpu: 50m
    #   memory: 60Mi
    # requests:
    #   cpu: 50m
    #   memory: 60Mi
These are the commands I ran:
git clone https://github.com/netdata/helmchart.git netdata-helmchart
helm install netdata ./netdata-helmchart -f values.yaml
NAME: netdata
LAST DEPLOYED: Wed Sep 16 22:15:16 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. netdata will be available on http://netdata.k8s.local/, on the exposed port of your ingress controller
You can get that port via `kubectl get services`. e.g. in the following example, the http exposed port is 31737, the https one is 30069.
The hostname netdata.k8s.local will need to be added to /etc/hosts, so that it resolves to the exposed IP. That IP depends on how your cluster is set up:
- When no load balancer is available (e.g. with minikube), you get the IP shown on `kubectl cluster-info`
- In a production environment, the command `kubectl get services` will show the IP under the EXTERNAL-IP column
The port can be retrieved in both cases from `kubectl get services`
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
exiled-tapir-nginx-ingress-controller LoadBalancer 10.98.132.169 <pending> 80:31737/TCP,443:30069/TCP 11h
luis@pi-node1:~/k8s/netdata$ kubectl get po
NAME READY STATUS RESTARTS AGE
netdata-parent-85b8ddd7f8-6m95r 1/1 Running 0 8m2s
netdata-child-sz6b4 0/2 CrashLoopBackOff 6 8m2s
luis@pi-node1:~/k8s/netdata$ kubectl logs netdata-child-sz6b4
error: a container name must be specified for pod netdata-child-sz6b4, choose one of: [netdata sd] or one of the init containers: [init-nodeuid]
luis@pi-node1:~/k8s/netdata$ uname -a
Linux pi-node1 5.4.0-1018-raspi #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
Running on aarch64 Ubuntu 16.04.
Tried Docker first:
user@ubuntu1604-aarch64:~/netdata$ docker run -d --name=netdata \
> -p 19999:19999 \
> -v netdatalib:/var/lib/netdata \
> -v netdatacache:/var/cache/netdata \
> -v /etc/passwd:/host/etc/passwd:ro \
> -v /etc/group:/host/etc/group:ro \
> -v /proc:/host/proc:ro \
> -v /sys:/host/sys:ro \ ^C
user@ubuntu1604-aarch64:~/netdata$ sudo docker run -d --name=netdata \
> -p 19999:19999 \
> -v netdatalib:/var/lib/netdata \
> -v netdatacache:/var/cache/netdata \
> -v /etc/passwd:/host/etc/passwd:ro \
> -v /etc/group:/host/etc/group:ro \
> -v /proc:/host/proc:ro \
> -v /sys:/host/sys:ro \
> -v /etc/os-release:/host/etc/os-release:ro \
> --restart unless-stopped \
> --cap-add SYS_PTRACE \
> --security-opt apparmor=unconfined \
> netdata/netdata
Unable to find image 'netdata/netdata:latest' locally
latest: Pulling from netdata/netdata
4f861a20f507: Pull complete
7bb4d159526d: Pull complete
e2e87b7a7de9: Pull complete
2c8445a10990: Pull complete
adbf0b90c51f: Pull complete
f7f8b8493280: Pull complete
db8649e50b77: Pull complete
37e23bb8abd9: Pull complete
02279201ef13: Pull complete
fabbb19aede9: Pull complete
Digest: sha256:eb3b37414ecb87e7b64949826bc56e3f27d24fa0ef2e29e58de1dc4972a534e1
Status: Downloaded newer image for netdata/netdata:latest
be6767de06e7b973a8acb39648bf409916a8cf45f09aed4bb3887bd88e39d384
Which netdata images are you pulling, and can you try with `:latest`?
This looks like it’ll be awesome.
Not sure if this is a bug or an RFC, but I just tried to deploy this to an ARM64 cluster and the netdata-child pods get stuck, as they can't seem to pull their images for the architecture (output below).
My immediate thought was that you guys didn't support ARM yet, but the netdata-parent pod starts up fine…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/netdata-child-7c4mc to pi-node4
Warning FailedMount 6m47s kubelet, pi-node4 MountVolume.SetUp failed for volume "config" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulling 6m46s kubelet, pi-node4 Pulling image "netdata/wget"
Normal Pulled 6m42s kubelet, pi-node4 Successfully pulled image "netdata/wget"
Normal Created 6m42s kubelet, pi-node4 Created container init-nodeuid
Normal Started 6m42s kubelet, pi-node4 Started container init-nodeuid
Warning Failed 44s kubelet, pi-node4 Failed to pull image "netdata/agent-sd:v0.1.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/netdata/agent-sd:v0.1.0": failed to unpack image on snapshotter overlayfs: no match for platform in manifest sha256:25bade1318e4238fdabd278f4590733a0843a0ae2de764e8a50d75d6a88f60e5: not found
Normal Pulling 43s (x2 over 6m41s) kubelet, pi-node4 Pulling image "netdata/netdata:v1.24.0"
Normal Created 42s (x2 over 46s) kubelet, pi-node4 Created container netdata
Normal Pulled 42s (x2 over 49s) kubelet, pi-node4 Successfully pulled image "netdata/netdata:v1.24.0"
Normal Started 41s (x2 over 46s) kubelet, pi-node4 Started container netdata
Normal BackOff 40s (x3 over 41s) kubelet, pi-node4 Back-off pulling image "netdata/agent-sd:v0.1.0"
Warning Failed 40s (x3 over 41s) kubelet, pi-node4 Error: ImagePullBackOff
Warning BackOff 31s (x2 over 40s) kubelet, pi-node4 Back-off restarting failed container
Normal Pulling 31s (x2 over 46s) kubelet, pi-node4 Pulling image "netdata/agent-sd:v0.1.0"
Warning Failed 30s (x2 over 44s) kubelet, pi-node4 Error: ErrImagePull