Netdata Community

Cannot move node

I moved a VM to another host and copied the /var/lib/netdata folder so that it would retain the cloud configuration. The agent is running, but it can’t connect to the cloud, and in the error log I see:

2021-06-21 07:00:46: netdata INFO  : ACLK_Main : Attempting connection now
2021-06-21 07:00:47: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-06-21 07:00:47: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2021-06-21 07:00:47: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-06-21 07:00:47: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2021-06-21 07:00:47: netdata ERROR : ACLK_Main : Decryption of the challenge failed: error:04099079:rsa routines:RSA_padding_check_PKCS1_OAEP_mgf1:oaep decoding error
2021-06-21 07:00:47: netdata ERROR : ACLK_Main : Output buffer for encoding size=512 is not large enough for 18446744073709551615-bytes input
2021-06-21 07:00:47: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 403
2021-06-21 07:00:47: netdata ERROR : ACLK_Main : ACLK_OTP Password HTTP code not 201 Created (got 403)
2021-06-21 07:00:47: netdata ERROR : ACLK_Main : Cloud returned EC="TODO trace-id", Msg-Key:"ErrIncorrectResponse", Msg:"incorrect challenge response", BlockRetry:false, Backoff:0s (-1 unset by cloud)
2021-06-21 07:00:47: netdata ERROR : ACLK_Main : Error passing Challenge/Response to get OTP
2021-06-21 07:00:47: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds

2021-06-21 07:00:47: netdata INFO  : ACLK_Main : Attempting connection now
2021-06-21 07:00:48: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2021-06-21 07:00:48: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2021-06-21 07:00:48: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 409
2021-06-21 07:00:48: netdata ERROR : ACLK_Main : ACLK_OTP Challenge HTTP code not 200 OK (got 409)
2021-06-21 07:00:48: netdata ERROR : ACLK_Main : Cloud returned EC="TODO trace-id", Msg-Key:"ErrDuplicatedChallenge", Msg:"delay retry 1m0s: duplicated challenge", BlockRetry:false, Backoff:60s (-1 unset by cloud)
2021-06-21 07:00:48: netdata ERROR : ACLK_Main : Error passing Challenge/Response to get OTP
2021-06-21 07:00:48: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 60.000 seconds

The line "ACLK_Main : Output buffer for encoding size=512 is not large enough for 18446744073709551615-bytes input" seems especially concerning.
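A side note on that number: 18446744073709551615 is exactly 2^64 - 1, which is what -1 looks like when it is read as an unsigned 64-bit length. One possible reading, which is a guess and not confirmed against the Netdata source, is that the failed decryption returned -1 and that value was then passed along as an input size. A minimal Python sketch of the arithmetic (the function name is made up for illustration):

```python
def as_unsigned_64(value: int) -> int:
    """Reinterpret a signed integer as an unsigned 64-bit value (two's complement)."""
    return value & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # -1 wraps around to the largest 64-bit unsigned value.
    print(as_unsigned_64(-1))
```

If that reading is right, the "buffer not large enough" line would be a follow-on symptom of the decryption error rather than a separate problem.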

I also don’t have the old metrics any more. Can I copy those from somewhere? I still have the old image available.
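Since the old image is still around, one way to rule out a bad copy is to compare the files on the old image with the ones on the new host. The sketch below is only an illustration: it assumes the old image is mounted somewhere readable, and the helper names are hypothetical. It hashes every file under the old directory and reports whether the new directory has an identical copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_trees(old_root: Path, new_root: Path) -> dict:
    """Map each relative file path under old_root to 'same', 'differs', or 'missing'."""
    report = {}
    for old_file in sorted(p for p in old_root.rglob("*") if p.is_file()):
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        if not new_file.is_file():
            report[str(rel)] = "missing"
        elif sha256_of(old_file) == sha256_of(new_file):
            report[str(rel)] = "same"
        else:
            report[str(rel)] = "differs"
    return report
```

For example, with the old image mounted at a hypothetical /mnt/old-image, calling compare_trees(Path("/mnt/old-image/var/lib/netdata"), Path("/var/lib/netdata")) would list which files changed or went missing in the move. Files that the running agent rewrites will naturally show up as "differs".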

@wmertens thanks for the report. Could you paste the output of netdata -W buildinfo from the offending agent?
I will try to investigate the issue ASAP.

@underhood

Version: netdata v1.31.0
Configure options:  '--disable-dependency-tracking' '--prefix=/nix/store/mhms2gjllhpqmq66qvv7a4fwcw8z6xcn-netdata-1.31.0' '--localstatedir=/var' '--sysconfdir=/etc' '--enable-cloud' '--with-aclk-ng' 'CC=gcc' 'CXX=g++' 'PKG_CONFIG=pkg-config' 'PKG_CONFIG_PATH=/nix/store/iajslqx0fpnvfvgdxxipx3lr5yg59k1m-curl-7.76.1-dev/lib/pkgconfig:/nix/store/5q2d222hp22aq05pnp4yfszd3kyk3y42-nghttp2-1.43.0-dev/lib/pkgconfig:/nix/store/ilfrz428psdz4gzs4p68xna76gv585pq-libidn-1.36-dev/lib/pkgconfig:/nix/store/7whk47dml5f1dpvy6cgvaf80ll1h7pkd-zlib-1.2.11-dev/lib/pkgconfig:/nix/store/az4514nlmahn2vna0vjbir7c5gd3lsir-libkrb5-1.18-dev/lib/pkgconfig:/nix/store/7mdjirqjkx4g66bna1c7p4dglzp5j3yr-openssl-1.1.1k-dev/lib/pkgconfig:/nix/store/pl4dm5p8rcxldvn5jwydi2isr13lw52w-libssh2-1.9.0-dev/lib/pkgconfig:/nix/store/lyb99rmgggp7fdfmbpqa9nqmwvv25ldc-brotli-1.0.9-dev/lib/pkgconfig:/nix/store/ky4crjw22rm0fas3ar0074ciwybbdg10-libcap-2.48-dev/lib/pkgconfig:/nix/store/sj22y1anmk91csb5ccia48010p3f1a91-attr-2.4.48-dev/lib/pkgconfig:/nix/store/lp97ksl2l0d0z2mgslv6jcivajj1yywr-util-linux-2.36.2-dev/lib/pkgconfig:/nix/store/rcbinhm2wic6a12mpfs4v56x5kchzjv9-libuv-1.41.0/lib/pkgconfig:/nix/store/dav96yaasmxgpwajbpnyd90zdigjjb2c-lz4-1.9.3-dev/lib/pkgconfig:/nix/store/dvqbn4hikq1yd74bz15hhzkmxx3xj3x2-freeipmi-1.6.8/lib/pkgconfig:/nix/store/jk1dqr21vhcndq8s3jmy38ah9v66dya8-libmnl-1.0.4/lib/pkgconfig:/nix/store/4q27irlas45yknk3f1b4x2wprbil445m-libnetfilter_acct-1.0.3/lib/pkgconfig:/nix/store/w1nk8cbv3a8mg6jyywd51pfx55bh7zvb-json-c-0.15-dev/lib/pkgconfig'
Features:
    dbengine:                YES
    Native HTTPS:            YES
    Netdata Cloud:           YES 
    Cloud Implementation:    Next Generation
    TLS Host Verification:   YES
Libraries:
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  YES
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    NO
    EBPF:                    NO
    IPMI:                    YES
    NFACCT:                  YES
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: NO

(incidentally, should we try building with jemalloc and/or tcalloc for some great benefit?)

If you want to try the binaries yourself, install Nix on any Linux and then run nix-shell -p netdata. This will make it ephemerally available.

@wmertens I will try the Nix agent. Is the agent you moved from the same version? At first sight, the error log you posted does not seem directly related to the transfer of the credentials.

I seem to have issues using netdata from nix-shell, ranging from claiming to netdata not starting due to missing files, etc. I guess I will have to study the thing a bit first.

Oh right, the configuration actually comes from the “module” which is NixOS-only, not nixpkgs (which you are using now). So the binary is the same but it’s missing the config file and directory configuration.

This is the systemd configuration it makes: nixpkgs/netdata.nix at 3be5e9248e23f5dbf0863c9fc6a0f2cbe1ef3484 · NixOS/nixpkgs · GitHub

and the plugins are actually wrapped in a separate package before passing it to netdata: nixpkgs/netdata.nix at 3be5e9248e23f5dbf0863c9fc6a0f2cbe1ef3484 · NixOS/nixpkgs · GitHub

@wmertens what is the right way to start using netdata in a NixOS environment? Should I use NixOS instead of nixpkgs?

Yes, installing NixOS would be best. Then you have to put this in the system configuration:

services.netdata.enable = true;

nix-env -iA nixos.netdata seems to install netdata without cloud support for me on a NixOS VM. I guess I have to make it work with nixpkgs somehow.
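For anyone wanting to check this on their own install: the buildinfo output shown earlier in the thread has a "Netdata Cloud: YES" line, so cloud support can be checked from that text. A rough Python sketch, assuming the output keeps the "Name: value" layout seen above (the function names are made up for illustration):

```python
def parse_buildinfo(text: str) -> dict:
    """Collect 'Name: value' lines from buildinfo output into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip section headers such as 'Features:' that carry no value.
        if sep and value.strip():
            result[key.strip()] = value.strip()
    return result


def has_cloud_support(text: str) -> bool:
    """True when the 'Netdata Cloud' entry starts with YES."""
    return parse_buildinfo(text).get("Netdata Cloud", "").upper().startswith("YES")
```

Pass it the text printed by netdata -W buildinfo. A False result here would match the "without cloud support" behaviour described above.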

Ah yes, that’s being worked on in master. Edit your /root/.nix-channels file to point to nixos-unstable, and run nixos-rebuild switch --upgrade

@underhood yesterday a patch was merged into nixos-unstable that fixes communication for netdatacli. So with the following configuration:

  services.netdata.enable = true;

  environment.systemPackages = with pkgs; [ netdata ];

you can just run the netdata commands

Thanks, I will try that again ASAP. I’m currently in crunch mode for some new features to be released, but I haven’t forgotten about this.