Would it be possible to create a Parent-Child deployment with a number of systems that work offline? Let me explain. I would like to set up a Parent server, but instead of having a child connected continuously (online), I would like the child to store its metrics or streaming data in a file instead of sending them over TCP. Then, at a scheduled date/time, I would sync that data with rsync or UUCP and update the Parent with the child’s data. That is one scenario. Another would be to somehow sync the child’s dbengine with the parent through some DB mechanism. A third option would be an automated Netdata export/snapshot, maybe via cron, after which I could sync the file.
You could try to experiment with replication. Basically, set up the child/parent as you normally would and let them run with no connection between them. Then at some point, e.g. once a day, open the connection, and the parent should request past metrics from the child.
The default period to replicate is 86400 seconds, but you can adjust as needed. The settings for this are in stream.conf, i.e.
# Replication
# Enable replication for all hosts using this api key. Default: enabled
#enable replication = yes
# How many seconds to replicate from each child. Default: a day
#seconds to replicate = 86400
# The duration we want to replicate per each step.
#replication_step = 600
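As a rough illustration of the settings above, the section for one API key in the parent's stream.conf might look like the following. This is only a sketch: the API key is a made-up placeholder and the 172800 value (two days) is just an example, so adjust both to your own setup.

```ini
# stream.conf on the parent (sketch; the key below is a placeholder, not a real API key)
[11111111-2222-3333-4444-555555555555]
    # accept children that present this API key
    enabled = yes
    # replication settings quoted above, uncommented
    enable replication = yes
    # example value only: replicate up to two days of past data
    seconds to replicate = 172800
```

A longer window only helps if the child actually retains that much data locally.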
In theory you could also capture the traffic from the child to a file and then replay it on the parent, but that would be more involved.
Assume there is one connection in total per day, e.g. every day at 2:00. In which file is this data kept? How can I see its volume in MB? This would be useful for calculating the average daily data traffic. Maybe this information is already included in the Netdata dashboard under the Netdata tab, but I am not aware of it.
I am streaming from child to parent, and replication is enabled by default, but if I stop Netdata on the parent for an hour and then start it again, there is a gap of 1 hour. Is that the expected behavior?
Hmm, assuming you run a recent agent also on the child, replication should be working, i.e. when you start the parent again that 1 hour of metrics should be replicated from the child.
If you check the access.log file on the child, does it show any requests for REPLAY?
The size of that data could be calculated (not sure if we have a chart for this already), but keep in mind that we use compression during streaming and replication.
Indeed, yes… In the scenario I suggested, you would likely need dbengine as the memory mode on the child, to hold those metrics until it’s time to replicate them…
Oh OK, confirmed, now that makes sense. So the answer is that memory mode can be anything except none for replication to work. If I’d like to keep resource usage on my child low, what would be the next preferred option if dbengine is not an option?
You could use memory mode ram, but dbengine will in general use less RAM (with some disk usage, of course). Given that you’d be willing to store the streaming data in a file, dbengine seems closer to that…
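For reference, a minimal child-side sketch of that setup could look like this, assuming the `[db]` section syntax shown in the netdata.conf further down this thread. The destination address and the API key are placeholders, not values from this thread.

```ini
# netdata.conf on the child (sketch)
[db]
    # keep metrics on disk so they can be replicated to the parent later
    mode = dbengine

# stream.conf on the child (sketch; destination and key are placeholders)
[stream]
    enabled = yes
    destination = PARENT_IP:19999
    api key = 11111111-2222-3333-4444-555555555555
```

The API key here would have to match a section that is enabled in the parent's stream.conf.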
Great,
Q1: If I’d like to keep 1 day of metrics with dbengine, which options can I use in netdata.conf?
Q2: At the moment my Parent shows data starting from 12-Feb but nothing earlier. Below is my netdata.conf. What did I do wrong here?
[global]
run as user = netdata
# default storage size - increase for longer data retention
page cache size = 32
dbengine multihost disk space = 512
[db]
mode = dbengine
# per second data collection
update every = 1
# enable only Tier 0 and Tier 1
storage tiers = 3
# Tier 0, per second data for a week
#dbengine multihost disk space MB = 1100
dbengine multihost disk space MB = 3100
# Tier 1, per minute data for a month
#dbengine tier 1 multihost disk space MB = 330
dbengine tier 1 multihost disk space MB = 530
# Tier 2, per hour data for a year
#dbengine tier 2 multihost disk space MB = 67
dbengine tier 2 multihost disk space MB = 167
[registry]
enabled = yes
registry to announce = http://netdata:19999
That’s very helpful. One more question:
Does this configuration need to be applied only on the Parent, or on both the parent and each child?
This is the [db] section, so should [global] stay as I have it, or should I delete the page cache size?
Also, regarding average concurrent metrics: say 2500 is the count for 1 child. If I have 2 children, do I have to enter 5000? In other words, is the average metrics value the total of the metrics of all servers?