I’m reporting the same problem as the original poster. I have Netdata installed across my fleet of 63 RHEL 6 and 7 servers. Sometime overnight between May 18 and May 20 (I can’t recall exactly which night), 55 of those servers became unreachable in Netdata Cloud. All 55 were installed with the one-line installation script, using the option to auto-update the agent to stable builds. The remaining 8 servers, which are still working, had Netdata installed via RPM and are not configured to auto-update.

One night I went to bed and everything was fine; the next morning, 55 servers were unreachable. That leads me to believe the agent auto-updated overnight and the new version introduced a bug.
A similar issue was reported recently at https://community.netdata.cloud/t/claimed-nodes-not-reporting-to-cloud/1282/5, but I’m not sure the solution suggested there is accurate. I also reported a similar issue last fall, when I was first getting onboarded with Netdata Cloud: https://github.com/netdata/netdata/issues/9624. Others suspected a problem with the CA root certificates, but there was never a definitive answer, and a subsequent agent update seemed to fix it. There are also a few older, similar reports: https://github.com/netdata/netdata/issues/9206 and https://github.com/netdata/netdata/issues/8966.
# /opt/netdata/bin/netdata -W buildinfo
Output of `/opt/netdata/bin/netdata -W buildinfo`:
Version: netdata v1.31.0
Configure options: '--prefix=/opt/netdata/usr' '--sysconfdir=/opt/netdata/etc' '--localstatedir=/opt/netdata/var' '--libexecdir=/opt/netdata/usr/libexec' '--libdir=/opt/netdata/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--enable-cloud' '--with-bundled-lws=externaldeps/libwebsockets' '--with-libJudy=externaldeps/libJudy' 'CFLAGS=-static -O3 -I/openssl-static/include' 'LDFLAGS=-static -L/openssl-static/lib' 'PKG_CONFIG_PATH=/openssl-static/lib/pkgconfig'
Features:
    dbengine: YES
    Native HTTPS: YES
    Netdata Cloud: YES
    Cloud Implementation: Legacy
    TLS Host Verification: YES
Libraries:
    jemalloc: NO
    JSON-C: YES
    libcap: NO
    libcrypto: YES
    libm: YES
    LWS: YES static v3.2.2
    mosquitto: YES
    tcalloc: NO
    zlib: YES
Plugins:
    apps: YES
    cgroup Network Tracking: YES
    CUPS: NO
    EBPF: NO
    IPMI: NO
    NFACCT: NO
    perf: YES
    slabinfo: YES
    Xen: NO
    Xen VBD Error Tracking: NO
Exporters:
    AWS Kinesis: NO
    GCP PubSub: NO
    MongoDB: NO
    Prometheus Remote Write: YES
Output of `error.log`, filtered for ACLK:
# error.log
2021-05-27 20:47:34: netdata INFO : ACLK_Main : Attempting to establish the agent cloud link
2021-05-27 20:47:34: netdata INFO : ACLK_Main : Retrieving challenge from cloud: app.netdata.cloud 443 /api/v1/auth/node/e8b2efa0-b9d7-11eb-a1df-b8ca3a6562a0/challenge
2021-05-27 20:47:34: netdata INFO : ACLK_Main : aclk_send_https_request GET
2021-05-27 20:47:34: netdata ERROR : ACLK_Main : Libwebsockets: Unable to open socket
(errno 97, Address family not supported by protocol)
2021-05-27 20:48:05: netdata ERROR : ACLK_Main : Servicing LWS took too long.
2021-05-27 20:48:05: netdata ERROR : ACLK_Main : Challenge failed: (errno 22, Invalid argument)
2021-05-27 20:48:05: netdata INFO : ACLK_Main : Retrying to establish the ACLK connection in 1024.000 seconds
2021-05-27 21:05:09: netdata INFO : ACLK_Main : Attempting to establish the agent cloud link
2021-05-27 21:05:09: netdata INFO : ACLK_Main : Retrieving challenge from cloud: app.netdata.cloud 443 /api/v1/auth/node/e8b2efa0-b9d7-11eb-a1df-b8ca3a6562a0/challenge
2021-05-27 21:05:09: netdata INFO : ACLK_Main : aclk_send_https_request GET
2021-05-27 21:05:09: netdata ERROR : ACLK_Main : Libwebsockets: Unable to open socket
(errno 97, Address family not supported by protocol)
2021-05-27 21:05:40: netdata ERROR : ACLK_Main : Servicing LWS took too long.
2021-05-27 21:05:40: netdata ERROR : ACLK_Main : Challenge failed: (errno 22, Invalid argument)
2021-05-27 21:05:40: netdata INFO : ACLK_Main : Retrying to establish the ACLK connection in 1024.000 seconds
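In case it helps narrow this down: on Linux, errno 97 is `EAFNOSUPPORT`, which `socket()` returns when the kernel does not support the requested address family. Below is a minimal Python sketch (my own diagnostic, not part of Netdata) that resolves `app.netdata.cloud` and tries to create a TCP socket for each address family the lookup returns, without connecting. The idea that an IPv6 address is the family failing on these hosts is an assumption on my part; the logs above don't confirm it. The function names `describe_errno` and `probe_families` are hypothetical.

```python
import errno
import os
import socket

# Host and port taken from the ACLK "Retrieving challenge" log lines above.
CLOUD_HOST = "app.netdata.cloud"
CLOUD_PORT = 443


def describe_errno(code):
    """Return (symbolic name, message) for an errno value, e.g. 97 on Linux."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def probe_families(host=CLOUD_HOST, port=CLOUD_PORT):
    """Resolve host and try to create a TCP socket for each address family.

    Assumption: the errno 97 in the log comes from opening a socket for an
    address family (possibly IPv6) that this host's kernel does not support.
    Returns a dict mapping family name (e.g. "AF_INET6") to None if socket()
    succeeded, or to the errno it raised. No connection is attempted.
    """
    results = {}
    for family, socktype, proto, _, _ in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        name = socket.AddressFamily(family).name
        if name in results:
            continue  # only probe each family once
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:
            results[name] = exc.errno
        else:
            sock.close()
            results[name] = None
    return results
```

Running `probe_families()` on an affected node and an unaffected node, then passing any non-`None` value to `describe_errno()`, should show whether one family (for example `AF_INET6`) fails only on the broken hosts. On Linux, `describe_errno(97)` gives `EAFNOSUPPORT` with the same "Address family not supported by protocol" text seen in the log.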
Hope this additional info can help.