How to install/run when system is wiped after each reboot?

Is there a way to properly install netdata on a compute instance which gets wiped and reloaded from an image each time it reboots?

I got close to a solution. I have netdata installed in the base image, where it is started through systemd. I thought I could add these lines to /usr/lib/systemd/system/netdata.service, after ExecStartPre=/bin/chown -R netdata /run/netdata and before PermissionsStartOnly=true:

ExecStartPost=/bin/sh -c 'wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh'
ExecStartPost=/bin/sh /tmp/netdata-kickstart.sh --claim-token secret --claim-url https://app.netdata.cloud

This works, except that after each reboot the node gets registered as a new node, while the previous instance of the node shows as dead and I have to delete it from the dashboard. I didn’t think this would happen since the secret claim-token remains static…

The behavior you’re seeing is because you’re getting a new install (and therefore a new node ID) each time you create the node. Avoiding that requires persisting /var/lib/netdata/registry and /var/lib/netdata/cloud.d across the restart. However, some features of the Cloud may not work as expected if you take this approach, because you’ll lose access to historical data for the node whenever it restarts.
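As a rough sketch of the persistence part: assuming each compute node has access to some storage that survives the re-image, the two directories could be copied out after claiming and copied back before the agent starts. The PERSIST_DIR default and the function names below are made up for illustration, and the location would need to be unique per node so that nodes do not share an identity.

```shell
#!/bin/sh
# Sketch only: save and restore the Netdata identity directories.
# PERSIST_DIR is a hypothetical path; point it at per-node storage
# that survives the wipe.

NETDATA_LIB="${NETDATA_LIB:-/var/lib/netdata}"
PERSIST_DIR="${PERSIST_DIR:-/persist/netdata}"

# Copy saved state back into place. Run this before the agent starts.
restore_netdata_state() {
    for d in registry cloud.d; do
        if [ -d "$PERSIST_DIR/$d" ]; then
            mkdir -p "$NETDATA_LIB/$d"
            cp -a "$PERSIST_DIR/$d/." "$NETDATA_LIB/$d/"
        fi
    done
}

# Copy current state out. Run this once the node has been claimed.
save_netdata_state() {
    mkdir -p "$PERSIST_DIR"
    for d in registry cloud.d; do
        if [ -d "$NETDATA_LIB/$d" ]; then
            mkdir -p "$PERSIST_DIR/$d"
            cp -a "$NETDATA_LIB/$d/." "$PERSIST_DIR/$d/"
        fi
    done
}
```

Something like restore_netdata_state could be called from an extra ExecStartPre= line in the unit, and the claim step skipped when a restored cloud.d is already present, but I have not tested that on a re-imaged node.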

Given this, a better approach is probably to set up a persistent node with Netdata, claim that to the Cloud, and then have your ephemeral node stream all of its metrics to that node. Making that work will still require persisting /var/lib/netdata/registry so that the node ID does not change, but it will let you continue to access historical metrics from that ephemeral node on the Cloud.
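For the streaming side, the ephemeral node needs a stream.conf that points at the persistent node. A minimal sketch of generating one from the base image is below; the write_stream_conf helper, the headnode hostname in the usage note, and the default /etc/netdata/stream.conf path are assumptions for illustration, so check the path your install actually uses.

```shell
#!/bin/sh
# Sketch only: write a minimal child-side stream.conf.
# Arguments: parent address (host:port), API key (a UUID you generate),
# and optionally the output path (the default is an assumed location).
write_stream_conf() {
    parent="$1"
    api_key="$2"
    conf="${3:-/etc/netdata/stream.conf}"
    cat > "$conf" <<EOF
[stream]
    enabled = yes
    destination = $parent
    api key = $api_key
EOF
}
```

Usage would look like write_stream_conf headnode:19999 "$(uuidgen)", reusing the same key on every boot. The persistent node has to accept that key in its own stream.conf, with a section named after the key (in square brackets) containing enabled = yes.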


I’m revisiting this now since the built-in monitoring we have on Bright Cluster Manager has been unreliable. I’m not sure how to persist data like /var/lib/netdata/registry from compute nodes, but there must be a way, so I’ll check the manual. Our head node would be the persistent node and has been running Netdata normally for us. You’re saying I can set up netdata on the compute/ephemeral nodes like I had before, enable some option to stream data, and persist /var/lib/netdata/registry, and that should do it? Thanks!