Upgrade from 1.38 to 1.40: node still not visible in Cloud

I have 4 nodes here that are on 1.39.
I have now also upgraded one agent with the help of
wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh --nightly-channel --static-only --claim-token xx-cJr23lkD8KfNYtTfWrG_zvxqefSx3lF8cCyZrfFmi08Z2gNeVjB3JTk606EyM9r8qr-xx --claim-rooms f8f41cf3-1d25-xx-8bb5-5f37af681d17 --claim-url https://app.netdata.cloud --claim-id "$(uuidgen)"
It reinstalled, and I guess that worked:
"Unable to communicate with Netdata daemon, querying config from disk instead.
Token: ****************
Base URL: https://app.netdata.cloud
Id: 04d31624-xx-xx-bfc1-ce8271b28c9c
Rooms: f8f41cf3-xx-xx-xx-5f37af681d17
Hostname: xx01
Proxy:
Netdata User: netdata
Connection attempt 1 successful
uv_pipe_connect(): no such file or directory
Make sure that the netdata service is running.
The request was successful, but the agent could not be notified (0) - it needs a restart to connect to the cloud.
OK

— Successfully claimed node —
Official documentation can be found online at https://learn.netdata.cloud/docs/."
Restarting netdata does not help. How do I get my node back into the Cloud?

My system is an old Ubuntu 18.04.6 LTS running as a VirtualBox guest.

Hi @Bernd

According to the above, it appears the Netdata agent is not running. Can you check, either with systemctl status netdata or with ps aux | grep netdata?

Thank you

$ /etc/init.d/netdata status
● netdata.service - Real time performance monitoring
   Loaded: loaded (/lib/systemd/system/netdata.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2023-06-19 13:46:22 CEST; 21min ago
  Process: 29974 ExecStartPre=/bin/chown -R netdata /run/netdata (code=exited, status=0/SUCCESS)
  Process: 29971 ExecStartPre=/bin/mkdir -p /run/netdata (code=exited, status=0/SUCCESS)
  Process: 29961 ExecStartPre=/bin/chown -R netdata /opt/netdata/var/cache/netdata (code=exited, status=0/SUCCESS)
  Process: 29950 ExecStartPre=/bin/mkdir -p /opt/netdata/var/cache/netdata (code=exited, status=0/SUCCESS)
 Main PID: 29975 (netdata)
    Tasks: 106 (limit: 4915)
   CGroup: /system.slice/netdata.service
           ├─29975 /opt/netdata/bin/srv/netdata -P /run/netdata/netdata.pid -D
           ├─30003 /opt/netdata/bin/srv/netdata --special-spawn-server
           ├─30225 bash /opt/netdata/usr/libexec/netdata/plugins.d/tc-qos-helper.sh 1
           ├─30232 /opt/netdata/usr/libexec/netdata/plugins.d/apps.plugin 1
           ├─30234 /opt/netdata/usr/libexec/netdata/plugins.d/debugfs.plugin 1
           ├─30248 /opt/netdata/usr/libexec/netdata/plugins.d/ebpf.plugin 1
           ├─30260 /opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin 1
           └─30277 /opt/netdata/usr/libexec/netdata/plugins.d/go.d.plugin 1
ps aux |grep netdata
servera+ 12289  0.0  0.0  14784  1120 pts/0    S+   14:09   0:00 grep --color=auto netdata
netdata  29975  4.2  0.9 459048 93652 ?        SNsl 13:46   0:58 /opt/netdata/bin/srv/netdata -P /run/netdata/netdata.pid -D
netdata  30003  0.0  0.0  20672  4156 ?        SNl  13:46   0:00 /opt/netdata/bin/srv/netdata --special-spawn-server
netdata  30225  0.1  0.0   1928  1484 ?        SN   13:46   0:02 bash /opt/netdata/usr/libexec/netdata/plugins.d/tc-qos-helper.sh 1
root     30232  4.5  0.0  10804  6152 ?        SNl  13:46   1:02 /opt/netdata/usr/libexec/netdata/plugins.d/apps.plugin 1
root     30234  0.0  0.0   5128     4 ?        SN   13:46   0:00 /opt/netdata/usr/libexec/netdata/plugins.d/debugfs.plugin 1
root     30248  0.1  0.0  12660  3344 ?        SNl  13:46   0:01 /opt/netdata/usr/libexec/netdata/plugins.d/ebpf.plugin 1
root     30260  0.0  0.0   5116     4 ?        SN   13:46   0:00 /opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin 1
netdata  30277  1.8  0.5 774660 54592 ?        SNl  13:46   0:24 /opt/netdata/usr/libexec/netdata/plugins.d/go.d.plugin 1

Seems OK.

Port 19999 looks OK too, with plenty of charts.

Could you please grep your error.log (should be under /opt/netdata/var/log or similar) for ACLK messages? ACLK stands for Agent Cloud Link and should shed some light as to what is going on.

Sure

# grep ACLK /opt/netdata/var/log/netdata/error.log 
2023-06-19 13:21:27: netdata INFO  : MAIN : ACLK sync initialization completed
2023-06-19 13:21:27: netdata INFO  : ACLKSYNC : Starting ACLK synchronization thread
2023-06-19 13:21:27: netdata INFO  : ACLK_MAIN : thread created with task id 10475
2023-06-19 13:21:27: netdata INFO  : ACLK_MAIN : set name of thread 10475 to ACLK_MAIN
2023-06-19 13:21:27: netdata INFO  : ACLK_MAIN : Waiting for Cloud to be enabled
2023-06-19 13:34:26: netdata INFO  : ACLKSYNC : ACLK SYNC: Shutting down ACLK synchronization event loop
2023-06-19 13:34:26: netdata INFO  : ACLK_MAIN : thread with task id 10475 finished
2023-06-19 13:36:40: netdata INFO  : MAIN : ACLK sync initialization completed
2023-06-19 13:36:40: netdata INFO  : ACLKSYNC : Starting ACLK synchronization thread
2023-06-19 13:36:40: netdata INFO  : ACLK_MAIN : thread created with task id 20522
2023-06-19 13:36:40: netdata INFO  : ACLK_MAIN : set name of thread 20522 to ACLK_MAIN
2023-06-19 13:36:40: netdata INFO  : ACLK_MAIN : Waiting for Cloud to be enabled
2023-06-19 13:37:34: netdata INFO  : ACLKSYNC : ACLK SYNC: Shutting down ACLK synchronization event loop
2023-06-19 13:37:35: netdata INFO  : ACLK_MAIN : thread with task id 20522 finished
2023-06-19 13:42:04: netdata INFO  : MAIN : ACLK sync initialization completed
2023-06-19 13:42:04: netdata INFO  : ACLKSYNC : Starting ACLK synchronization thread
2023-06-19 13:42:04: netdata INFO  : ACLK_MAIN : thread created with task id 26638
2023-06-19 13:42:04: netdata INFO  : ACLK_MAIN : set name of thread 26638 to ACLK_MAIN
2023-06-19 13:42:04: netdata INFO  : ACLK_MAIN : Waiting for Cloud to be enabled
2023-06-19 13:46:20: netdata INFO  : ACLKSYNC : ACLK SYNC: Shutting down ACLK synchronization event loop
2023-06-19 13:46:20: netdata INFO  : ACLK_MAIN : thread with task id 26638 finished
2023-06-19 13:46:24: netdata INFO  : MAIN : ACLK sync initialization completed
2023-06-19 13:46:24: netdata INFO  : ACLKSYNC : Starting ACLK synchronization thread
2023-06-19 13:46:24: netdata INFO  : ACLK_MAIN : thread created with task id 30202
2023-06-19 13:46:24: netdata INFO  : ACLK_MAIN : set name of thread 30202 to ACLK_MAIN
2023-06-19 13:46:24: netdata INFO  : ACLK_MAIN : Waiting for Cloud to be enabled
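
The log repeats the same start/stop cycle for every restart, so it can help to look only at the most recent ACLK message. A minimal sketch, assuming the log path from the post above; the function name last_aclk_line is made up for illustration:

```shell
# Print the most recent line mentioning ACLK in the given log file.
# last_aclk_line is a hypothetical helper, not part of Netdata.
last_aclk_line() {
    grep 'ACLK' "$1" | tail -n 1
}

# Usage (path taken from the thread; adjust for your install):
#   last_aclk_line /opt/netdata/var/log/netdata/error.log
```

For the log shown above this would be the 13:46:24 "Waiting for Cloud to be enabled" line, which suggests the running agent had not picked up the claim at that point.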

Thanks!

Can you check if there’s anything under /opt/netdata/var/lib/netdata/cloud.d/ ?

Does this help?

# netdatacli aclk-state
ACLK Available: Yes
ACLK Version: 2
Protocols Supported: Protobuf
Protocol Used: Protobuf
MQTT Version: 5
Claimed: No
Online: No
Reconnect count: 0
Banned By Cloud: No

No, there is nothing there. The file lies around here instead:
cat /var/lib/netdata/cloud.d/cloud.conf
[global]
enabled = yes
cloud base url = https://app.netdata.cloud

How about this?

# diff -y /var/lib/netdata/netdata.api.key /opt/netdata/var/lib/netdata/netdata.api.key 
80291d61-b106-11ed-9455-0800277acf9a			      |	6ec9f8a6-0e93-11ee-b9b1-0800277acf9a

They differ.

So it appears that the script somehow failed.

Static builds are installed under /opt, and that's where the claim script (run from kickstart) should have placed the claim information. Instead it ended up under /var, which is the location used when a Netdata package is installed.
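
To see quickly which of the two locations actually holds claim data, something like the following could be used. It is only a sketch: the two lib directories are the ones mentioned in this thread, and claim_dirs is a hypothetical helper name.

```shell
# Print every candidate lib directory whose cloud.d exists and is not empty.
# claim_dirs is a hypothetical helper, not part of Netdata.
claim_dirs() {
    for dir in "$@"; do
        if [ -d "$dir/cloud.d" ] && [ -n "$(ls -A "$dir/cloud.d" 2>/dev/null)" ]; then
            printf '%s\n' "$dir/cloud.d"
        fi
    done
}

# Usage (paths from this thread):
#   claim_dirs /var/lib/netdata /opt/netdata/var/lib/netdata
```

If only the /var path is printed on a static install, the claim data is in the place the static agent does not read.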

Have you upgraded from a system package to a static install?

We will try to replicate this, could be a bug in the kickstart script.

I could provide the history of the commands I ran, by email or similar.

Solved it myself: I copied the /var/lib/netdata stuff to /opt/netdata.
It seems the new node was already there but not visible. In the Cloud menu at the top, clicking on "All nodes" reveals both the old and the new one :slight_smile:
There is still a long-standing wish: hide or delete the old nodes that are no longer members.
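
For anyone landing here with the same problem, the copy described above could look roughly like this. The thread does not say exactly which files were copied, so this sketch assumes that only the cloud.d directory matters; copy_claim_data is a hypothetical helper name, and the ownership and restart steps are left as comments because they need root.

```shell
# Copy a cloud.d directory to a new location, keeping any existing
# destination as a .bak copy. copy_claim_data is a hypothetical helper.
copy_claim_data() {
    src=$1   # e.g. /var/lib/netdata/cloud.d
    dst=$2   # e.g. /opt/netdata/var/lib/netdata/cloud.d
    [ -d "$src" ] || { echo "no claim data in $src" >&2; return 1; }
    if [ -e "$dst" ]; then
        # Refuse to overwrite an earlier backup.
        [ ! -e "$dst.bak" ] || { echo "backup $dst.bak already exists" >&2; return 1; }
        mv "$dst" "$dst.bak" || return 1
    fi
    mkdir -p "$(dirname "$dst")" && cp -a "$src" "$dst"
}

# Usage, as root (paths from this thread, untested beyond that):
#   copy_claim_data /var/lib/netdata/cloud.d /opt/netdata/var/lib/netdata/cloud.d
#   chown -R netdata /opt/netdata/var/lib/netdata/cloud.d
#   systemctl restart netdata
```

Afterwards, netdatacli aclk-state should show whether the agent now considers itself claimed.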