Claimed nodes not reporting to Cloud

I know that this problem has come up before, but I don’t see any solutions.

I have many nodes, but 5 of them are not reporting. All 5 are on Ubuntu 18.04. I installed NetData via the kickstart script. I then claimed the nodes. They show up in the Cloud dashboard, but no stats are shown. When I hover my mouse over the node, it says:

“The agent on this node has an older version and an update is required. Please upgrade to agent version v1.26 or above.”

Well, all 5 of these nodes are above that:

# netdata -V
netdata v1.30.1-140-nightly

I tried re-installing netdata on one of these, but no luck. I also tried restarting the netdata service, but that didn’t work either.

I have other Ubuntu 18.04 nodes that report to the cloud just fine. Also, directly visiting the dashboard on the node works just fine.

Hello, we are investigating the issue and will get back to you.

@netdatauser can you take a look at:

  • What does the error log say if you filter it for the aclk keyword (grep -ai aclk error.log)?
  • What does netdata -W buildinfo say?

Thanks for the quick response!

What does the error log say if you filter it for the aclk keyword (grep -ai aclk error.log)?

Looks like this is repeating in the log:

2021-05-12 09:27:55: netdata ERROR : ACLK_Main : Challenge failed: 
2021-05-12 09:27:55: netdata INFO  : ACLK_Main : Retrying to establish the ACLK connection in 1024.000 seconds
2021-05-12 09:44:59: netdata INFO  : ACLK_Main : Attempting to establish the agent cloud link
2021-05-12 09:44:59: netdata INFO  : ACLK_Main : Retrieving challenge from cloud: app.netdata.cloud 443 /api/v1/auth/node/<removed>/challenge
2021-05-12 09:44:59: netdata INFO  : ACLK_Main : aclk_send_https_request GET
2021-05-12 09:44:59: netdata ERROR : ACLK_Main : Libwebsockets: SSL error: unable to get local issuer certificate (preverify_ok=0;err=20;depth=1)

What does netdata -W buildinfo say?

Here is the output:

Version: netdata v1.30.1-142-nightly
Configure options:  '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--with-bundled-lws=externaldeps/libwebsockets' 'CFLAGS=-O2' 'LDFLAGS='
Features:
    dbengine:                YES
    Native HTTPS:            YES
    Netdata Cloud:           YES 
    Cloud Implementation:    Legacy
    TLS Host Verification:   YES
Libraries:
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    LWS:                     YES static v3.2.2
    mosquitto:               YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    NO
    EBPF:                    YES
    IPMI:                    NO
    NFACCT:                  NO
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: NO

It seems like maybe “Libwebsockets: SSL error: unable to get local issuer certificate” is the problem? Not sure what that is looking for exactly though.
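For reference, the fields in parentheses at the end of that log line come from OpenSSL's certificate verification. Error 20 is X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY, and depth=1 points at the certificate one level above the server's own certificate, so the issuer of that certificate could not be found in the local trust store. A minimal sketch for pulling those fields out of a log line (the variable names are only illustrative):

```shell
# Sample line copied from the log output above.
line='2021-05-12 09:44:59: netdata ERROR : ACLK_Main : Libwebsockets: SSL error: unable to get local issuer certificate (preverify_ok=0;err=20;depth=1)'

# Extract the OpenSSL verify error code and the chain depth it applies to.
err=$(printf '%s\n' "$line" | sed -n 's/.*err=\([0-9]*\).*/\1/p')
depth=$(printf '%s\n' "$line" | sed -n 's/.*depth=\([0-9]*\).*/\1/p')

echo "err=$err depth=$depth"
```

The same check can usually be reproduced outside of Netdata with openssl s_client -connect app.netdata.cloud:443 -servername app.netdata.cloud, which prints a "Verify return code" line at the end of its output.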

Yes, it does look like certificate verification is the problem for some reason.

app.netdata.cloud uses a Let’s Encrypt certificate. I will install a VM with Ubuntu 18.04 tomorrow (probably) to see if I can reproduce the issue.

@Austin_Hemmelgarn I remember we had issues with certificate verification on some systems a long time ago, but I can’t quite recall what they were. Do you remember?

Yes, we had some issues at one point, but it was never on Ubuntu.

The only things that should cause this kind of error on Ubuntu are either not having the ca-certificates package installed or using a very atypical OpenSSL client configuration.
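One rough way to compare a failing host against a working one is to count the certificates in the CA bundle on each. This is only a sketch: count_certs is a hypothetical helper, and it assumes the bundle lives at /etc/ssl/certs/ca-certificates.crt, the usual location for the ca-certificates package on Ubuntu.

```shell
# Hypothetical helper: count the PEM certificates in a bundle file.
# Prints 0 (and returns non-zero) if the file contains none.
count_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Assumed default bundle path on Ubuntu; adjust if your system differs.
bundle=/etc/ssl/certs/ca-certificates.crt
if [ -r "$bundle" ]; then
    echo "certificates in $bundle: $(count_certs "$bundle")"
fi
```

A noticeably lower count on the failing host suggests the bundle was not generated completely. Whether the package itself is installed can be checked with dpkg -s ca-certificates.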

Something is odd about the ca-certificates on these specific hosts. I do have other Ubuntu 18.04 servers that work fine.

Running update-ca-certificates produces some errors about newlines, so some of the CA certs aren’t being linked. I’ll have to see if I can resolve that.

I found a script mistakenly named test in the $PATH that was related to some internal tools. Because update-ca-certificates picked up the wrong test, it did not complete properly. As a result, root certs were missing from the generated file(s).
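For anyone hitting something similar, a shadowed command like this can be spotted by listing every executable with that name in $PATH, in lookup order. The sketch below is an illustration only; find_in_path is a hypothetical helper written for POSIX sh or bash, not part of any tool mentioned in this thread.

```shell
# Hypothetical helper: print every executable file called "$1" found in
# the directories of $PATH, in the order the shell would search them.
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        candidate="${dir:-.}/$name"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    IFS=$old_ifs
}

find_in_path test
```

If the first line printed is not the system binary, something earlier in $PATH is shadowing it. In bash, type -a test gives a similar listing and also shows the shell builtin.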

Thanks for helping me find the errors in the log. That really got me going in the right direction.
