Backing up Netdata

Just a basic question here as the docs are strangely silent on the topic of backing up Netdata.

Obviously, backing up config files etc. is straightforward enough, so I’d like to clarify how to handle the data stored in the dbengine.
My assumption is that I can simply stop the Netdata service and make a copy of the dbengine folder:

/var/cache/netdata/dbengine/

and then restart the Netdata agent service.
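For reference, here is a minimal Python sketch of that stop, copy, restart sequence. It assumes a systemd host where the unit is called `netdata` (an assumption, adjust for your init system), and the function name `cold_backup` and its `run` parameter are made up for illustration. It is not an official Netdata procedure.

```python
import shutil
import subprocess
from pathlib import Path


def cold_backup(src, dest, service="netdata", run=subprocess.run):
    """Stop the service, copy src to dest, then start the service again.

    `service` is an assumed systemd unit name; `run` is injectable so the
    function can be exercised without touching a real service.
    """
    src, dest = Path(src), Path(dest)
    run(["systemctl", "stop", service], check=True)
    try:
        # copytree uses copy2 by default, so file modes and timestamps are
        # kept; ownership is NOT preserved and must be restored separately.
        # dest must not already exist.
        shutil.copytree(src, dest)
    finally:
        # Start the agent again even if the copy failed.
        run(["systemctl", "start", service], check=True)
    return dest
```

Calling `cold_backup("/var/cache/netdata/dbengine", some_backup_dir)` as root would then take the copy while the agent is down; the destination directory is whatever you choose.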

I’m also guessing that, if necessary, the following link has sufficient information about restoring the files in terms of permissions:

So specifically, we’re setting the owner and group to “netdata”, with mode 0750 for directories and 0660 for files.
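As a rough sketch of what re-applying those permissions after a restore could look like in Python: the function name `apply_permissions` and the injectable `chown` parameter are hypothetical, and the user, group and mode values are simply the ones quoted above. It would need to run as root for the ownership change to succeed.

```python
import os
import shutil


def apply_permissions(root, user="netdata", group="netdata",
                      dir_mode=0o750, file_mode=0o660, chown=shutil.chown):
    """Recursively set owner, group and modes under root (root included)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Every directory, starting with root itself, gets dir_mode.
        chown(dirpath, user=user, group=group)
        os.chmod(dirpath, dir_mode)
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Regular files get file_mode.
            chown(path, user=user, group=group)
            os.chmod(path, file_mode)
```

For example, `apply_permissions("/var/cache/netdata/dbengine")` after copying the files back, before starting the agent.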

Is it possible/advisable to take backups without stopping the agent? Is there a way to do so while ensuring the consistency of the backup?

I guess this ties in with the idea that one might wish to make frequent backups of the Netdata dbengine, especially if the node is a production parent with many child nodes and the data is valuable. And so I wonder whether taking frequent backups would have any serious impact on the Netdata agent on a busy parent node.
What is frequent? You guys tell me what is advisable given the above :)

hmm interesting question.

@Manolis_Vasilakis or @Austin_Hemmelgarn do you guys have any idea what might be a good way to do this?

Perhaps there could also be some way to use a parent node and then just back that up as needed; I’m sure that would not affect the child, if you really wanted clear separation. I’m no expert here though, and the two guys above would know best, I think.

Hi Luis.

Mostly yes, correct, but I would add to the backup the whole /var/cache/netdata/ directory, which also includes a couple of sqlite (*.db) databases.

Taking this backup from a running agent is more complicated, though. It is conceivable (although the chances are pretty low) that a backup could be made at a point in time where dbengine’s files are not 100% OK to restore from (e.g. a datafile has been written but not its journalfile), or sqlite is in the middle of updates/writes. However, even in these cases, the agent will try to restore functionality as best it can.
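For the sqlite part specifically, one generic way to avoid copying a database mid-write is SQLite's online backup API, which Python exposes as `sqlite3.Connection.backup`. The sketch below is only that, a sketch: the function name `snapshot_sqlite` is made up, whether this plays well with a running Netdata agent has not been verified here, and it does nothing for the dbengine datafile/journalfile pairs.

```python
import sqlite3


def snapshot_sqlite(src_path, dest_path):
    """Copy a SQLite database using SQLite's online backup API.

    Unlike a plain file copy, the backup API reads the source through
    SQLite itself, so the copy reflects a consistent database state.
    """
    # Open the source read-only so the snapshot cannot modify it.
    src = sqlite3.connect(f"file:{src_path}?mode=ro", uri=True)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

This would be run once per `*.db` file under /var/cache/netdata/, writing each snapshot into the backup location.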

I also don’t think there is a problem with taking frequent backups. Although a dbengine redesign is coming, do note that dbengine commits data to file approx. every 17 minutes, so backing up more frequently than this would not make much sense.
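To make the frequency point concrete, here is a tiny Python sketch of a scheduling check that never fires more often than that commit interval. The 17 minutes is the approximate figure quoted above, not a guaranteed constant, and `backup_due` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Approximate dbengine commit interval quoted above (an assumption,
# not a value read from Netdata itself).
COMMIT_INTERVAL = timedelta(minutes=17)


def backup_due(last_backup, now, requested=timedelta(minutes=5)):
    """Return True when enough time has passed to justify a new backup.

    The requested interval is raised to COMMIT_INTERVAL if it is shorter,
    since more frequent backups would mostly capture the same on-disk data.
    """
    interval = max(requested, COMMIT_INTERVAL)
    return now - last_backup >= interval
```

So even if someone asks for a backup every 5 minutes, the check only returns True once roughly 17 minutes have passed since the last one.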