Nodes unreachable (errno 99, Cannot assign requested address)

Simply running the latest kickstart-static64.sh with the --reinstall option should do it. The default is to use the nightlies, and passing --reinstall will cause it to skip reusing any installation options from the existing install. Because the static builds are pre-built, nothing extra should be needed to get ACLK-NG support (if the nightly builds lack ACLK-NG support without explicitly enabling it, that's a bug I am not looking forward to trying to debug…).
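As a sketch of the step described above (the installer URL and flag are assumptions based on this thread; check the current Netdata docs before running it for real), a dry run could look like:

```shell
#!/bin/sh
# Dry-run sketch only: builds and prints the reinstall command rather than
# executing it. The URL and flag are assumptions taken from this thread.
set -eu

installer=/tmp/kickstart-static64.sh
reinstall_cmd="sh $installer --reinstall"

# For a real reinstall you would first fetch the script:
#   curl -fsSL https://my-netdata.io/kickstart-static64.sh -o "$installer"
# and then execute $reinstall_cmd.
echo "would run: $reinstall_cmd"
```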

@Austin_Hemmelgarn Thanks, my initial testing shows that reinstalling using the latest nightly and enabling aclk-ng in netdata.conf has now restored connectivity to Netdata Cloud.

In general, I prefer to run only stable releases, but I do want to get my nodes back online immediately. If I were to reinstall using nightlies across my fleet to get them all reporting again, could I then just edit the .environment file and change back to RELEASE_CHANNEL="stable", thereby staying on the current nightly until the next stable version is released?
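For illustration, that edit could be sketched like this; the block works on a mock copy of the file, since on a real static install the file typically lives under the install prefix (e.g. /opt/netdata — an assumption, so check your own install):

```shell
#!/bin/sh
# Sketch: flip RELEASE_CHANNEL back to "stable" in the updater's
# .environment file. A mock copy under /tmp is used here so the edit can
# be tried safely; point env_file at your real install to apply it.
set -eu

env_file=/tmp/mock.environment
# Simulate the file as written by a nightly install:
printf 'RELEASE_CHANNEL="nightly"\n' > "$env_file"

# Switch the channel back to stable:
sed -i 's/^RELEASE_CHANNEL=.*/RELEASE_CHANNEL="stable"/' "$env_file"

grep '^RELEASE_CHANNEL' "$env_file"
```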

I actually don't know whether that would wait until the next stable release to upgrade, or just immediately revert to the previous stable release. I think it should behave as you expect, but it's not a usage we ever intended, so I'm just not sure.

My suggestion would be to test it on one system and see how it behaves. You can manually run /opt/netdata/usr/libexec/netdata/netdata-updater.sh --not-running-from-cron to explicitly run the updater code early so you don’t have to wait for the cronjob to run to see if it works or not. If it decides to wait for the next stable, you should see comparatively little output, but if it decides to roll things back, the output will be similar to what you saw when you reinstalled with the nightly version.

@Austin_Hemmelgarn circling back to this. When manually editing the .environment file to switch back to stable builds, it actually does immediately revert to the previous stable release, thereby breaking the Netdata Cloud connectivity again. This means I can't switch back to stable builds until ACLK-NG is available in a stable release.

I know I can check up on releases here, but is there a way I can be notified when updates are released, e.g. a newsletter?

Use GitHub's Watch options: choose Custom and select Releases.

Cloud users also receive the newsletters that point back to the release notes.
We used to have a newsletter-only sign-up, but I can't find it any more; I'll ask about it.


Hello. I can reproduce this issue even with the aclk-ng installation argument. 3 out of the 8 servers we have are showing as unreachable. Two of them run Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-47-generic x86_64) and one is on 18.04 LTS. If you need logs or more information, do ask.

Nikos


Pinging @underhood, who is leading the efforts on aclk-ng 🙂

@eramsorgr We cannot be sure this is the same issue at this moment.

Can you provide error.log filtered for aclk? e.g. grep -ai aclk /var/log/netdata/error.log?

@underhood there seem to be no error logs on two of the three nodes (the ones that don't show up), even after restarting the Netdata agent and manually running the update cron job (in case something was broken in the installations). The local dashboard works fine, but they show as 'unreachable' on Cloud.

However, the third node facing the issue does have an error log, and it's quite big. It mostly repeats itself as the agent (speculating here) keeps trying to connect to the Cloud. Here's the log file from my CDN

Cheers,
Nikos


@eramsorgr would you be willing to share the /api/v1/charts and /api/v1/info output from the local dashboard of the agent you sent me the error.log from? You can send them to me at timotej on netdata.cloud in case posting publicly is an issue.

@underhood I have sent both logs to your Discord account, since I found you there and I believe it's easier to share those files that way. Let me know if you have received them.

Nikos

Thanks, I got them. I think I already know the problem.
It is indeed a different problem from the one mentioned above.
I will have to take some time to fix this properly. Would you mind using ACLK Legacy until I do?

No problem, sure thing. I will try reverting to ACLK Legacy. Just to confirm: I should simply remove it from the config, right?

@eramsorgr you can check which ACLK implementations are available in your agent (depending on how you built/installed it) by running netdata -W buildinfo. If both are available, switching to Legacy is a simple configuration change in netdata.conf.
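For reference, a sketch of that configuration change is below. The section and key names are my assumption based on agents of that era, so verify them against the output of netdata -W buildinfo and the documentation for your agent version:

```
# netdata.conf — assumed section/key for switching ACLK implementation;
# verify against your agent version's documentation
[cloud]
    aclk implementation = legacy
```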

Alrighty, @underhood, it shows as reachable after setting the ACLK mode to legacy. So far so good. Going to try this on the other two nodes and edit this post.

EDIT: Seems like the other two nodes, the ones that also showed no errors in the error logs (Netdata works perfectly fine locally), are still unreachable.

@eramsorgr it seems our assumption that all 3 nodes are experiencing the same issue is wrong… Do they have Legacy available as well? If not, the config setting will be ignored.

If they do, can we take a peek at the error.log of those nodes as well?

Yeah, they have Legacy available, @underhood. However, viewing error.log with grep or even nano outputs nothing; it's squeaky clean for some reason! No clues.

Are they native installs or dockerized? Could the install prefix be something like /opt, in which case the path would be /opt/netdata/var/log/netdata/error.log?

Oops, noticed the wrong mention :P

Netdata was installed on all nodes via the one-line installer, not dockerized. No results on either node under /opt/… either.

Seems like most nodes are up now, but one that was working is now showing as unreachable.
ACLK error logs

I am unsure whether it's using ACLK-NG or Legacy; the config value is commented out.