I have a setup with one parent Netdata node (dbengine) with multiple child nodes streaming data to it. It works well, but when one of the child nodes goes offline, it disappears from the list of nodes.
Maybe this is expected behavior, but I was hoping it would still show up (although not marked as “live”) and that historical data would still be visible, since it should be stored on the parent node. Have I misunderstood how this works, or could it be misconfigured somehow?
What you describe is precisely how it’s supposed to work. All the data is there, but we currently have an issue accessing it. Work is underway to fix this. @Stelios_Fragkakis or @copi, can you point to an existing issue or provide an ETA?
Thanks for the quick reply. From a UX perspective, it might be worth removing the “live” badge until the feature is implemented, since there’s currently no opposite state to contrast it with. Its presence implies that a node should stay visible in the list even when it isn’t live; if only live nodes are shown, the label is redundant and a bit confusing for end users.
Absolutely correct. Could you create a bug report in the netdata/netdata-cloud repository on GitHub (the public repository of Netdata Cloud, used as an issue tracker)?
If not, we will do it for you.
If you could do it, that would be great. It’s not a huge deal for me, just a bit confusing, and I couldn’t easily find any documentation on what the expected behavior was.
I just noticed that a node did get tagged as “Off” after being shut down, and it can still be inspected on the Netdata parent, even after refreshing the web UI.
I suspected this was a timing issue, i.e. that the node still shows if the parent is already up when the node goes down, but disappears if the node is already down when Netdata starts.
To test this, I restarted the Netdata service and watched the list of nodes slowly repopulate; unsurprisingly, the offline node never made it into the list.
So I assume the problem is that, on startup, the list on the left is populated based on which nodes are calling home, not on what data is available in the database.
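To repeat this check without watching the sidebar, a small script could ask the parent which hosts it currently lists and compare them with the children you expect to have history for. This is only a sketch: it assumes the parent listens on the default port 19999 and that its /api/v1/info response includes a mirrored_hosts list of hostnames (true for the Agent versions I have seen, but verify on yours). The helper names and child hostnames are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: default parent address; adjust host/port for your setup.
PARENT_URL = "http://localhost:19999"


def fetch_mirrored_hosts(parent_url: str = PARENT_URL) -> list[str]:
    """Return the hostnames the parent currently reports in /api/v1/info.

    Assumes the JSON response has a "mirrored_hosts" list; returns an
    empty list if the field is absent.
    """
    with urlopen(f"{parent_url}/api/v1/info", timeout=10) as resp:
        info = json.load(resp)
    return info.get("mirrored_hosts", [])


def missing_children(expected: list[str], listed: list[str]) -> list[str]:
    """Children we expect to see that the parent does not currently list."""
    listed_set = set(listed)
    return sorted(host for host in expected if host not in listed_set)
```

Running missing_children(["child-a", "child-node"], fetch_mirrored_hosts()) before and after restarting the parent would show whether an offline child drops out of the list only after the restart.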
It would be neat if Netdata actually cached what nodes it had seen previously, regardless of whether they are currently streaming or not, so that the list could include nodes for which there’s only historical data available.
Maybe one solution could be to save the list of child nodes to a config file on shutdown and read it back on startup (which would also let users remove systems from the list manually by editing the file).
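A rough sketch of that idea, purely to illustrate the proposal: Netdata does not read or write such a file as far as I know, and the file name, JSON layout, and function names below are all hypothetical. Previously seen children are loaded at startup, merged with the ones currently streaming, and any child that is only known from the cache is shown as offline.

```python
import json
from pathlib import Path

# Hypothetical cache file; not something Netdata actually uses.
KNOWN_NODES_FILE = Path("known_children.json")


def load_known(path: Path = KNOWN_NODES_FILE) -> set[str]:
    """Read previously seen children; a missing file means none seen yet."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_known(known: set[str], path: Path = KNOWN_NODES_FILE) -> None:
    """Persist the children seen so far (called on shutdown in this sketch)."""
    path.write_text(json.dumps(sorted(known), indent=2))


def node_list(known: set[str], streaming: set[str]) -> dict[str, str]:
    """Merge cached and live children; cached-only ones are marked offline."""
    return {
        host: ("live" if host in streaming else "offline")
        for host in sorted(known | streaming)
    }
```

On shutdown the agent would call save_known(load_known() | streaming), so every child ever seen stays listed; deleting a hostname from the JSON file removes it from the list, as suggested above.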
We are planning to add support for this (the work has already started), and it should make it into the Agent in the next release, in about 6 weeks (and into the nightlies before that).
You are correct: the data is still there.
For example, using the API, you can check data for an offline node by visiting localhost:19999/host/child-node/api/v1/data?chart=system.cpu
where child-node is the child that remains offline after the parent restarts.
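The same query can be scripted. The sketch below only builds the /host/&lt;child&gt;/api/v1/data URL and fetches it from the parent. The after=-600 default (last 10 minutes, using the API's convention that a negative after is relative to now) and format=json are standard /api/v1/data parameters as far as I know; the helper names, the default parent address, and child-node are placeholders.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def child_data_url(child: str, chart: str = "system.cpu",
                   parent: str = "http://localhost:19999",
                   after: int = -600) -> str:
    """Build a parent-side /host/<child>/api/v1/data URL for one chart."""
    query = urlencode({"chart": chart, "after": after, "format": "json"})
    return f"{parent}/host/{quote(child)}/api/v1/data?{query}"


def fetch_child_data(child: str, **kwargs) -> dict:
    """Query the parent's stored history for a (possibly offline) child."""
    with urlopen(child_data_url(child, **kwargs), timeout=10) as resp:
        return json.load(resp)
```

For example, fetch_child_data("child-node") would return the parsed JSON for the last 10 minutes of system.cpu as stored on the parent; the exact response layout depends on the format parameter and the Agent version.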