Host shows "off" but graphs are drawn

Hi.

I have a Netdata server configured to collect data from other hosts using streaming. Recently, one of my hosts started showing as “off”, but when I open it, the graphs are drawn correctly. The agent on that host is running and has been restarted. After restarting both the agent and the server, the problem is still the same. Both the host and the server run Netdata version 1.36.1.

Thanks for the help

This looks like a bug. Can you restart Netdata on host Gieta again? It looks like the cloud wasn’t properly updated with its live state after the recent restart. If it’s a one-time thing, we can perhaps ignore it, but if the issue persists after another restart, we’ll need to investigate further.

Interesting problem. If I stop the agent on the Gieta host, the “off” badge in the panel appears on another host that has nothing to do with Gieta, and the graphs for Gieta stop being drawn. The other host, now shown as “off”, still has correctly drawn graphs.

If I start the agent on the Gieta host again, the other host switches back to “on” and the graphs for Gieta start being drawn again.

Even after restarting the agent on it, the Gieta host is shown as “off” all the time, although its graphs are drawn correctly.

Scratching my head a bit here, but:

  • Did you by any chance clone a VM or otherwise copy entire directories between the hosts at any point? Netdata keeps some identifying information on the filesystem, so things can get a bit mixed up this way, sometimes with only one of the two agents allowed to connect at a time. Specifically, under /var/lib/netdata you’ll find a cloud.d directory and a registry directory. The contents of these two should be different for each agent. If either of them is identical between the two hosts, report it here and we’ll let you know how to clean it up.
  • You can see the hostname each agent thinks it has by calling http://localhost:19999/api/v1/info. It’s also visible in http://localhost:19999/netdata.conf. I suggest calling one of these endpoints on both hosts and comparing what they report.
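To make the two checks above repeatable, here is a minimal Python sketch you could run on each host and then compare the output. It assumes the default /var/lib/netdata location and an agent listening on localhost:19999. The helper names (dir_fingerprint, summarize_info, fetch_info) are hypothetical, and the /api/v1/info keys it picks out (uid, mirrored_hosts, version) are assumptions that may be absent or named differently in your agent version. Reading those directories will probably require root or the netdata user.

```python
import hashlib
import json
import sys
import urllib.request
from pathlib import Path

# Assumed default locations; adjust if your install differs.
NETDATA_LIB = Path("/var/lib/netdata")
INFO_URL = "http://localhost:19999/api/v1/info"


def dir_fingerprint(directory: Path) -> str:
    """SHA-256 over the relative path and contents of every file under directory.

    Two hosts that report the same fingerprint for cloud.d or registry most
    likely share copied identity files. Returns "missing" if the directory
    does not exist.
    """
    if not directory.is_dir():
        return "missing"
    h = hashlib.sha256()
    for path in sorted(p for p in directory.rglob("*") if p.is_file()):
        rel = str(path.relative_to(directory)).encode()
        data = path.read_bytes()
        # Length-prefix each field so path/content boundaries are unambiguous.
        for chunk in (rel, data):
            h.update(len(chunk).to_bytes(8, "big"))
            h.update(chunk)
    return h.hexdigest()


def summarize_info(info: dict) -> dict:
    """Pick a few identity-related fields; the key names are assumptions."""
    return {key: info.get(key) for key in ("uid", "mirrored_hosts", "version")}


def fetch_info(url: str = INFO_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the agent's JSON info endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def main() -> None:
    for name in ("cloud.d", "registry"):
        try:
            print(f"{name}: {dir_fingerprint(NETDATA_LIB / name)}")
        except OSError as exc:  # e.g. PermissionError when not run as root
            print(f"{name}: cannot read ({exc})", file=sys.stderr)
    try:
        print(json.dumps(summarize_info(fetch_info()), indent=2))
    except OSError as exc:  # URLError and connection errors subclass OSError
        print(f"could not query {INFO_URL}: {exc}", file=sys.stderr)


if __name__ == "__main__":
    main()
```

Run it with sudo on both hosts and put the outputs side by side. Identical cloud.d or registry fingerprints suggest a copied agent identity. Hashing instead of diffing lets you compare hosts without moving files between them.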

Hi,

I have the same issue. Initially I thought it was due to corruption in the dbengine and I removed old files and restarted the parent and nodes, but the issue has come back.

This problem was brought up again on Discord and the bug was identified:

From @Jacek_Kolasa :

We’ve found the issue: it was a bug on the Dashboard UI side. We’ve been showing the wrong badges when the Agent had archived hosts. We’ll fix that and let you know when the fix is in the nightlies. Thanks a lot for submitting it here!


Great! Thank you for letting us know :slight_smile:

Hi.

Sorry for disappearing, unfortunately I haven’t had time :frowning: Do you need any additional information from me, or have you already debugged everything?

Hi @ServerUser , I think we have what we need, thanks!