Netdata cloud: docker-based agent often shown as offline

Problem/Question

I have a docker-based Netdata agent container running in Azure container instances based on a custom docker image that uses FROM netdata/netdata:latest, installs some dependencies and sets up the configuration files.
I also use it as a statsd server for custom metrics I’m sending to it. It’s connected to Netdata cloud and sometimes I can access the metrics on app.netdata.cloud. However, the node is often shown as “offline” and the metrics are inaccessible, even though I can connect to the instance via SSH and the logs of the Netdata agent container look normal (the agent seems to be online).

Relevant docs you followed/actions you took to solve the issue

  • tried restarting the agent (after which it is mostly online for a while, but eventually it goes back offline)
  • changed various settings in netdata.conf - the problem persisted for every combination I tried

Environment/Browser/Agent’s version etc

  • the docker container only exposes a UDP port for the statsd server - 19999 is not exposed, since we only use Netdata cloud
  • netdata agent version (netdata -v): netdata v1.36.0-333-nightly
  • netdatacli aclk-state (shows “online yes”, but the node is not shown as online in the cloud dashboard):
ACLK Available: Yes
ACLK Version: 2
Protocols Supported: Protobuf
Protocol Used: Protobuf
MQTT Version: 5
Claimed: Yes
Claimed Id: xxxx
Cloud URL: https://app.netdata.cloud
Online: Yes
Reconnect count: 4
Banned By Cloud: No
Last Connection Time: 2022-11-10 15:22:58
Last Connection Time + 3 PUBACKs received: 2022-11-10 15:22:58
Last Disconnect Time: 2022-11-10 15:22:56
Received Cloud MQTT Messages: 20
MQTT Messages Confirmed by Remote Broker (PUBACKs): 19

> Node Instance for mGUID: "xxxx" hostname "SandboxHost-xxxx"
	Claimed ID: xxxxx
	Node ID: xxxxx
	Streaming Hops: 0
	Relationship: self
	Alert Streaming Status:
		Updates: 1
		Batch ID: 1
		Last Acked Seq ID: 331
		Pending Min Seq ID: 0
		Pending Max Seq ID: 0
		Last Submitted Seq ID: 331
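
To compare this state over time with what the cloud dashboard shows, the top section of the output could be parsed with a small script. This is only a sketch: it assumes the "Key: Value" layout shown above, and parse_aclk_state and summarize are hypothetical helper names, not part of Netdata.

```python
from datetime import datetime

# Sample copied from the top section of the `netdatacli aclk-state` output above.
SAMPLE = """\
ACLK Available: Yes
Claimed: Yes
Cloud URL: https://app.netdata.cloud
Online: Yes
Reconnect count: 4
Banned By Cloud: No
Last Connection Time: 2022-11-10 15:22:58
Last Disconnect Time: 2022-11-10 15:22:56

> Node Instance for mGUID: "xxxx" hostname "SandboxHost-xxxx"
\tStreaming Hops: 0
"""


def parse_aclk_state(text):
    """Parse the top-level "Key: Value" lines (hypothetical helper).

    Parsing stops at the first per-node section, which starts with ">".
    """
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            break
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def summarize(state):
    """Reduce the parsed state to the fields relevant for this issue."""
    fmt = "%Y-%m-%d %H:%M:%S"
    connected = datetime.strptime(state["Last Connection Time"], fmt)
    disconnected = datetime.strptime(state["Last Disconnect Time"], fmt)
    return {
        "online": state.get("Online") == "Yes",
        "banned": state.get("Banned By Cloud") == "Yes",
        "reconnects": int(state["Reconnect count"]),
        # Time between the last disconnect and the following reconnect.
        "seconds_to_reconnect": (connected - disconnected).total_seconds(),
    }


if __name__ == "__main__":
    print(summarize(parse_aclk_state(SAMPLE)))
```

Running it periodically and logging the result would show whether the reconnect count rises around the times the node appears offline in the cloud.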

What I expected to happen

The node is shown as online and the data is accessible in Netdata cloud whenever it is actually online (i.e., continuously).

Hello @cgfloat,

First of all, thanks for using Netdata.
Could you please send me your claim_id in a DM so that we can take a look and understand what might be happening?

Thanks in advance :pray:

Also, your local Agent logs could be very handy to assist in investigating this.

You can get them with the docker logs ${netdata_container_id} command.
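
If it is easier to send a file, the logs could also be saved with a short script. This is a rough sketch, assuming the docker CLI is available where the container runs; docker_logs_command, save_logs and the output file name are hypothetical, and the container ID is a placeholder you would need to fill in.

```python
import subprocess


def docker_logs_command(container_id, since=None):
    """Build the `docker logs` command line (hypothetical helper)."""
    cmd = ["docker", "logs", "--timestamps"]
    if since:
        # --since accepts a timestamp or a relative duration such as "24h".
        cmd += ["--since", since]
    cmd.append(container_id)
    return cmd


def save_logs(container_id, path, since=None):
    """Write the container's stdout and stderr to a single file."""
    with open(path, "wb") as out:
        subprocess.run(
            docker_logs_command(container_id, since),
            stdout=out,
            stderr=subprocess.STDOUT,  # docker logs replays both streams
            check=True,
        )


if __name__ == "__main__":
    # Placeholder ID: replace with the value of ${netdata_container_id}.
    print(" ".join(docker_logs_command("<netdata_container_id>", since="24h")))
```

Calling save_logs("<netdata_container_id>", "netdata-agent.log", since="24h") would then produce a file that can be attached to a message.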

Hi @papazach, thanks for looking into this. Unfortunately, I’m unsure how I can send you a DM - I can’t find any option for it in this forum. Maybe I lack permissions? Otherwise, could you perhaps share your email address?
Thanks in advance!

Hi @cgfloat

You can send the logs to manolis@netdata.cloud and we’ll check with @papazach

Hey @Manolis_Vasilakis, we’re already talking in DMs with @cgfloat and analysing the extra logs provided :+1:
