Child node appears as "off" even though I see live data

Hi,

I have set up streaming from clients to a parent node. Everything seemed OK until one client suddenly started showing as “off” in the node list. The odd thing is that it is still sending live metrics, and they show up on the parent dashboard as they should.

What can be wrong and how do I remedy this?

This is the log on the child node (srv08):

2022-08-18 14:13:39: netdata ERROR : PLUGIN[timex] : STREAM srv08 [send]: not ready - discarding collected metrics. (errno 22, Invalid argument)
2022-08-18 14:13:39: netdata INFO  : STREAM_SENDER[srv08] : thread created with task id 49128
2022-08-18 14:13:39: netdata INFO  : STREAM_SENDER[srv08] : set name of thread 49128 to STREAM_SENDER[d
2022-08-18 14:13:39: netdata INFO  : STREAM_SENDER[srv08] : STREAM srv08 [send]: thread created (task id 49128)
2022-08-18 14:13:39: netdata INFO  : STREAM_SENDER[srv08] : STREAM srv08 [send to srv04.example.com]: connecting...
2022-08-18 14:13:39: netdata INFO  : STREAM_SENDER[srv08] : STREAM srv08 [send to srv04.example.com]: initializing communication...
2022-08-18 14:13:39: netdata INFO  : STREAM_SENDER[srv08] : STREAM srv08 [send to srv04.example.com]: waiting response from remote netdata...
2022-08-18 14:13:39: netdata INFO  : STREAM_SENDER[srv08] : Stream is uncompressed! One of the agents (srv04.example.com <-> srv08) does not support compression OR compression is disabled.
2022-08-18 14:13:39: netdata INFO  : STREAM_SENDER[srv08] : STREAM srv08 [send to srv04.example.com]: established communication with a parent using protocol version 4 - ready to send metrics...
2022-08-18 14:13:39: netdata INFO  : PLUGIN[proc] : STREAM srv08 [send]: sending metrics...

Odd thing is this:

  • if I turn off srv08, then srv07 also shows as “off”
  • if I then also turn off srv07, then srv06 also shows as “off”
  • if I then also turn off srv06, then print also shows as “off”

Hi @Forza !

Quick question: does every child have its own unique API_KEY (Streaming and replication | Learn Netdata)?

No. They all use the same API key. I thought this was OK? I don't use the [MACHINE_GUID] option in stream.conf on the parent.
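For reference, the setup described here looks roughly like the sketch below. This is an illustration only, not the actual configuration from these nodes: the API key is a placeholder UUID, and port 19999 is assumed because it is Netdata's default. As far as I understand the streaming documentation, several children sharing one API key is a supported setup, and the parent tells them apart by machine GUID.

```ini
# Child (e.g. srv08): stream.conf
[stream]
    enabled = yes
    # Parent host from the log above; the port is an assumption (Netdata default).
    destination = srv04.example.com:19999
    # Placeholder key; every child in this setup uses the same value.
    api key = 11111111-2222-3333-4444-555555555555

# Parent (srv04): stream.conf
# The section name is the shared API key; no per-child [MACHINE_GUID] sections are defined.
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```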

I did double check that all nodes do have a unique machine guid. Perhaps it is a problem because the children had an older version than the parent?

I decided to deploy a new version of Netdata on these nodes and reset the dbengine database. Now it all seems OK.

Good to see it worked. I will still try to see whether there is anything weird there, perhaps related to the registry or the API keys.

Thanks for the follow up!
