Is there a way to properly install netdata on a compute instance which gets wiped and reloaded from an image each time it reboots?
I got close to a solution. I got netdata installed in the base image, which gets started through systemd. I thought I could add these lines to /usr/lib/systemd/system/netdata.service; I put them after ExecStartPre=/bin/chown -R netdata /run/netdata and before PermissionsStartOnly=true:
ExecStartPost=/bin/sh -c 'wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh'
ExecStartPost=/bin/sh /tmp/netdata-kickstart.sh --claim-token secret --claim-url https://app.netdata.cloud
This works, except the node gets registered as a new node, while the previous instance of the node is dead and I have to delete it from the dashboard. I didn’t think this would happen since the secret claim-token remains static…
The behavior you’re seeing is because you’re getting a new install (and therefore a new node ID) each time you create the node. Avoiding that requires persisting /var/lib/netdata/registry and /var/lib/netdata/cloud.d across the restart. However, some features of the Cloud may not work as expected if you take this approach, because you’ll lose access to historical data for the node whenever it restarts.
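As a rough illustration of persisting those two directories, here is a minimal sketch. It assumes some storage that survives the reimage is mounted on the node; the path /mnt/persistent/netdata and the function names are hypothetical and not part of Netdata.

```shell
#!/bin/sh
# Sketch: keep Netdata's identity directories on storage that survives a reimage.
# PERSIST_DIR is a hypothetical mount point; adjust it to your environment.
NETDATA_LIB="${NETDATA_LIB:-/var/lib/netdata}"
PERSIST_DIR="${PERSIST_DIR:-/mnt/persistent/netdata}"

# Copy registry/ and cloud.d/ from $1 to $2, preserving modes
# (and ownership, when run as root). Missing directories are skipped.
copy_identity() {
    src="$1"; dst="$2"
    for d in registry cloud.d; do
        if [ -d "$src/$d" ]; then
            mkdir -p "$dst/$d"
            cp -a "$src/$d/." "$dst/$d/"
        fi
    done
}

# Run on a freshly imaged node before the agent starts.
restore_identity() { copy_identity "$PERSIST_DIR" "$NETDATA_LIB"; }

# Run once after the node has been claimed, so the next boot can restore it.
save_identity() { copy_identity "$NETDATA_LIB" "$PERSIST_DIR"; }
```

The restore step would have to run before the agent starts, for example from an extra ExecStartPre= line next to the existing one in the unit file, and the save step once after the first successful claim.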
Given this, a better approach is probably to set up a persistent node with Netdata, claim that to the Cloud, and then have your ephemeral node stream all of its metrics to that node. Making that work will still require persisting /var/lib/netdata/registry so that the node ID does not change, but it will let you continue to access historical metrics from that ephemeral node on the Cloud.
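The streaming side is configured in stream.conf on the ephemeral node. The sketch below writes a minimal child configuration; the function name, the /etc/netdata config directory, the head node address, and the API key shown in the usage line are all placeholders to replace with your own values (the key is a UUID you generate yourself, for example with uuidgen).

```shell
#!/bin/sh
# Sketch: write a child-side stream.conf that sends metrics to a parent node.
# Arguments: config directory, parent "host:port", API key (a UUID).
write_child_stream_conf() {
    conf_dir="$1"; parent="$2"; api_key="$3"
    mkdir -p "$conf_dir"
    cat > "$conf_dir/stream.conf" <<EOF
[stream]
    enabled = yes
    destination = $parent
    api key = $api_key
EOF
}

# Example call (placeholder values):
#   write_child_stream_conf /etc/netdata headnode:19999 11111111-2222-3333-4444-555555555555
```

The persistent node then needs a matching section in its own stream.conf, named after the same key (that is, [11111111-2222-3333-4444-555555555555] for the placeholder above) with enabled = yes, so that it accepts the incoming stream. Since the file is static, it could also simply be baked into the base image.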
Looking into revisiting this now since the built-in monitoring we have on Bright Cluster Manager has been unreliable. I’m not sure how to persist data like /var/lib/netdata/registry from compute nodes, but there must be a way, so I’ll check the manual. Our head node would be the persistent node and has been running Netdata normally for us. You’re saying I can set up netdata on the compute/ephemeral nodes like I had before, enable some option to stream data, and persist /var/lib/netdata/registry, and that should do it? Thanks!