Cloud complains that agents need update

Problem/Question

Hi,

From time to time, Cloud complains that my agents need to be updated to a version newer than v1.26, even though I have already installed v1.31.

The workaround is to restart the “netdata” daemon, but should this be happening in the first place?

Environment/Browser

I should note that I am on the stable release, not nightly.
I am also using the NextGeneration (NG) ACLK.

Here is the output from Netdata-Updater

Updater

# ./netdata-updater
Checking if a newer version of the updater script is available.
Downloading newest version of updater script.
Thu Sep 16 09:14:35 EEST 2021 : INFO: Running on a terminal - (this script also supports running headless from crontab)
Thu Sep 16 09:14:35 EEST 2021 : INFO: Current Version: 00103100000000
Thu Sep 16 09:14:35 EEST 2021 : INFO: Latest Version: 00103100000000
Thu Sep 16 09:14:35 EEST 2021 : INFO: Newest version (current=00103100000000 >= latest=00103100000000) is already installed

and here is the buildinfo:

BuildInfo

# netdata -W buildinfo
Version: netdata v1.31.0
Configure options: '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--with-aclk-ng' 'CFLAGS=-O2' 'LDFLAGS='
Features:
dbengine: YES
Native HTTPS: YES
Netdata Cloud: YES
Cloud Implementation: Next Generation
TLS Host Verification: YES
Libraries:
jemalloc: NO
JSON-C: YES
libcap: NO
libcrypto: YES
libm: YES
tcalloc: NO
zlib: YES
Plugins:
apps: YES
cgroup Network Tracking: YES
CUPS: NO
EBPF: YES
IPMI: NO
NFACCT: NO
perf: YES
slabinfo: YES
Xen: NO
Xen VBD Error Tracking: NO
Exporters:
AWS Kinesis: NO
GCP PubSub: NO
MongoDB: NO
Prometheus Remote Write: NO
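
As a quick cross-check of the two outputs above, the 14-digit number printed by the updater can be compared with the version string from buildinfo. The following is a minimal sketch that assumes the number is laid out as three 3-digit fields (major, minor, patch) followed by a 5-digit build field. That layout is inferred from the value shown here, and the function names are made up for illustration.

```python
import re
from typing import Tuple


def decode_updater_version(encoded: str) -> Tuple[int, int, int, int]:
    """Split a 14-digit updater version into (major, minor, patch, build).

    Assumed layout (inferred, not confirmed in this thread): 3 + 3 + 3 + 5 digits.
    """
    if not re.fullmatch(r"\d{14}", encoded):
        raise ValueError(f"expected 14 digits, got {encoded!r}")
    return (
        int(encoded[0:3]),
        int(encoded[3:6]),
        int(encoded[6:9]),
        int(encoded[9:14]),
    )


def parse_buildinfo_version(line: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a line like 'Version: netdata v1.31.0'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", line)
    if match is None:
        raise ValueError(f"no version found in {line!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


updater = decode_updater_version("00103100000000")
buildinfo = parse_buildinfo_version("Version: netdata v1.31.0")
# The first three fields of the updater value should match buildinfo.
print(updater[:3] == buildinfo)
```

Under that assumption both outputs describe the same v1.31.0 install, so the agent itself is reporting the expected version locally.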

What I expected to happen

I was not expecting to see an error/warning like this, and I expected that the “netdata” daemon would not need to be restarted every now and then.

We saw last week that occasionally the cloud doesn’t receive the node “info” upon connection and I suspect this may be one of the effects.

The following is from the list of the nodes we have in the space that monitors our own infra. It’s ok that they’re both unreachable, they are old instances. But the question is why one says needs update and the other one doesn’t.

[image: screenshot of the node list]

As I said, I suspected node info, so I ran the following in BQ:

SELECT * from `netdata-analytics-bi.cloud_prod.node_nodes_info_latest` where node_id in (
  SELECT id FROM `netdata-analytics-bi.cloud_prod.node_nodes_latest` 
  WHERE name in ("gke-production-main-ce0f778a-mj08","gke-production-main-ce0f778a-mb0c")
)

But I got proper node info results for both of them from the DB. We should check cockroachDB as well, to be sure.

I then suspected that we may be caching something in the browser. I wiped out everything, logged in again and got the same thing. The BE responses to the nodes call are big and I don’t know enough to check them, so someone will look further into what’s happening.

Great! Thanks for the feedback!

Will wait for an update. Let me know if you need anything from my side

Can you provide the first and last parts of the IDs of a couple of your nodes? The ID can be seen in the URL of the single node screen. (I really doubt the full ID can be used for anything sinister, but better safe than sorry.)

So in this case, I’d need 83d60d5a-%-590e21cf2fc5
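
For anyone who wants to share an ID in this shortened form, here is a small sketch that keeps only the first and last groups of a node ID and replaces the middle with `%`. The function name is hypothetical, and the full ID in the example is made up (the middle groups are zeros); only the first and last groups come from the example above.

```python
import uuid


def redact_node_id(node_id: str) -> str:
    """Return the first and last groups of a UUID-style node ID joined by '-%-'."""
    # uuid.UUID validates the input and normalizes it to lowercase hyphenated form.
    normalized = str(uuid.UUID(node_id))
    groups = normalized.split("-")
    return f"{groups[0]}-%-{groups[-1]}"


# Made-up full ID for illustration only.
print(redact_node_id("83d60d5a-0000-0000-0000-590e21cf2fc5"))
```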

Hi @Christopher_Akritid1,

here are two of them:

d6096841-%-c24f0106383b
038d7006-%-f2a9b2f4b74c

which were the ones that had issues last time.

I should add that the nodes have been OK since the last time I restarted the service, but I do not know when they will fail again.

We believe we identified the root cause: it lies in how we handle missing version information, which can happen in some cases. We’ll get it fixed very soon.
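
To illustrate the kind of problem described, here is a hypothetical sketch of a version check that treats a missing version as "unknown" instead of "outdated". This is not Netdata Cloud's actual code, and the thread does not say how the fix was implemented. The names, the status values, and the v1.26.0 threshold (taken from the warning text) are all assumptions.

```python
import re
from enum import Enum
from typing import Optional, Tuple

Version = Tuple[int, int, int]


class UpdateStatus(Enum):
    UP_TO_DATE = "up_to_date"
    NEEDS_UPDATE = "needs_update"
    UNKNOWN = "unknown"


def parse_version(raw: Optional[str]) -> Optional[Version]:
    """Parse 'v1.31.0' or '1.31.0' into a tuple; return None if missing or malformed."""
    if not raw:
        return None
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", raw.strip())
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def update_status(raw: Optional[str], minimum: Version = (1, 26, 0)) -> UpdateStatus:
    """Classify an agent; the threshold is a hypothetical value for illustration."""
    version = parse_version(raw)
    if version is None:
        # Missing info is not evidence of an old agent, so do not flag it.
        return UpdateStatus.UNKNOWN
    if version > minimum:
        return UpdateStatus.UP_TO_DATE
    return UpdateStatus.NEEDS_UPDATE


for reported in ("v1.31.0", "v1.25.0", None):
    print(reported, update_status(reported).value)
```

The point of the sketch is only the separate "unknown" branch: if a missing version were compared as if it were the lowest possible version, an up-to-date agent would be flagged as needing an update.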

Excellent!

Thank you!!!

Just let us know when it’s actually fixed, so we can keep an eye out in case it appears again in the future.

Hello, just deployed a hotfix for this. Thank you for reporting it.

Thanks for the update @novykh .

Enjoy your weekend!!