After some struggles to update netdata agent, finally it got updated, but now the service fails to start.

Hi guys,

I have a Debian 10 machine that was running, I think, 1.34.0 until now; I believe the install type was legacy-build.
But I got notified that I should update it as soon as possible.

So I tried using the kickstart script to update it, but it failed. Then I tried reinstalling it, still didn’t update it. I think there was some kind of error related to "changed its 'Suite' value from 'stable' to 'oldstable'" but sadly I wasn’t paying attention.

Somehow, after lots of messing around, I managed to uninstall it, and then install it with the kickstart script.

Now it says that the install type is binpkg-deb and finally it is 1.36.0.

But now the service can’t be started. It’s stuck in activating phase because it exits with FAILURE all the time. I tried solutions on Google but they didn’t help. :\

netdata.service - Real time performance monitoring
   Loaded: loaded (/lib/systemd/system/netdata.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Thu 2022-08-25 13:28:27 CEST; 4s ago
  Process: 4757 ExecStart=/usr/sbin/netdata -D $EXTRA_OPTS (code=exited, status=1/FAILURE)
 Main PID: 4757 (code=exited, status=1/FAILURE)

Some of my other instances were easily updated with the kickstart script, their install type is (well, at least now for sure) binpkg-deb.

Could you help me please? Thanks in advance!

Hi @Tudvari

Can you check /var/log/netdata/error.log ? There should be some indication on why it fails to start there. If you’d like you can send it to manolis@netdata.cloud to have a look as well.
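A minimal sketch of that check, assuming a Bash-like shell. `show_log_tail` is a hypothetical helper name, not part of Netdata; it prints the last lines of a log file, or says so when the file is missing or empty.

```shell
# Hypothetical helper: print the last N lines of a log file,
# or explain why there is nothing to show.
show_log_tail() {
    log_file="$1"
    lines="${2:-50}"   # default to the last 50 lines
    if [ ! -e "$log_file" ]; then
        echo "missing: $log_file"
    elif [ ! -s "$log_file" ]; then
        echo "empty: $log_file"
    else
        tail -n "$lines" "$log_file"
    fi
}
```

From a root shell it could be called as `show_log_tail /var/log/netdata/error.log 100`.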

Thanks!

I wanted a fresh log, so I started/restarted the service… but nothing new gets printed in the log, even though the service says it was restarted 3 seconds ago.

I’m saying this because the last log entry was about 15 minutes ago and related to some of our Docker containers, while netdata shouldn’t be related to Docker at all. And the last time I tried starting netdata was about 1.5 hours ago.

But maybe these could be helpful, I found them in a 20 minutes old section of the log:

2022-08-25 13:29:35: charts.d: INFO: main: Configuration file ‘/usr/lib/netdata/conf.d/charts.d.conf’ loaded.
2022-08-25 13:29:35: charts.d: WARNING: main: Configuration file ‘/etc/netdata/charts.d.conf’ not found.
libbpf: prog ‘netdata_syscall_sync’: BPF program load failed: Invalid argument
libbpf: failed to load program ‘netdata_syscall_sync’
libbpf: failed to load object ‘/usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_msync.5.4.o’
2022-08-25 13:29:35: ebpf.plugin ERROR : EBPF SYNC : ERROR: loading BPF object file failed /usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_msync.5.4.o
libbpf: prog ‘netdata_syscall_sync’: BPF program load failed: Invalid argument
libbpf: failed to load program ‘netdata_syscall_sync’
libbpf: failed to load object ‘/usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_fsync.5.4.o’
2022-08-25 13:29:35: ebpf.plugin ERROR : EBPF SYNC : ERROR: loading BPF object file failed /usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_fsync.5.4.o
libbpf: prog ‘netdata_syscall_shmget’: BPF program load failed: Invalid argument
libbpf: failed to load program ‘netdata_syscall_shmget’
libbpf: failed to load object ‘/usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_shm.5.4.o’
2022-08-25 13:29:35: ebpf.plugin ERROR : EBPF SHM : ERROR: loading BPF object file failed /usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_shm.5.4.o
libbpf: prog ‘netdata_syscall_sync’: BPF program load failed: Invalid argument
libbpf: failed to load program ‘netdata_syscall_sync’
libbpf: failed to load object ‘/usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_fdatasync.5.4.o’
2022-08-25 13:29:35: ebpf.plugin ERROR : EBPF SYNC : ERROR: loading BPF object file failed /usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_fdatasync.5.4.o
libbpf: prog ‘netdata_syscall_sync’: BPF program load failed: Invalid argument
libbpf: failed to load program ‘netdata_syscall_sync’
libbpf: failed to load object ‘/usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_sync_file_range.5.4.o’
2022-08-25 13:29:35: ebpf.plugin ERROR : EBPF SYNC : ERROR: loading BPF object file failed /usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_sync_file_range.5.4.o
libbpf: prog ‘netdata_sys_open’: BPF program load failed: Invalid argument
libbpf: failed to load program ‘netdata_sys_open’
libbpf: failed to load object ‘/usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_fd.5.4.o’
2022-08-25 13:29:35: ebpf.plugin ERROR : EBPF FD : ERROR: loading BPF object file failed /usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_fd.5.4.o
libbpf: prog ‘netdata_release_task’: BPF program load failed: Invalid argument
libbpf: failed to load program ‘netdata_release_task’
libbpf: failed to load object ‘/usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_process.5.4.o’
2022-08-25 13:29:35: ebpf.plugin ERROR : EBPF PROCESS : ERROR: loading BPF object file failed /usr/libexec/netdata/plugins.d/ebpf.d/rnetdata_ebpf_process.5.4.o

Suddenly the dashboard started to display the node as live, while last time I checked 2 hours ago, it was offline, due to the obvious reason that the service wasn’t running.

And now it’s being monitored while the service is down:

Ok, that is weird…

Can you please paste the output of netdata -W buildinfo ?
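Since the install type matters in this thread, here is a small sketch for pulling just that field out of the buildinfo text. `install_type` is a hypothetical helper name; it assumes the output contains a line starting with `Install type:`.

```shell
# Hypothetical helper: read buildinfo text on stdin
# and print only the value of the "Install type:" line.
install_type() {
    sed -n 's/^Install type:[[:space:]]*//p'
}
```

It would be used as `netdata -W buildinfo | install_type`.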

Sure:

Version: netdata v1.36.1
Configure options:  '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--with-user=netdata' '--with-math' '--with-zlib' '--with-webdir=/var/lib/netdata/www' '--disable-dependency-tracking' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security'
Install type: binpkg-deb
    Binary architecture: x86_64
    Packaging distro:
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES
    ACLK Next Generation:       YES
    ACLK-NG New Cloud Protocol: YES
    ACLK Legacy:                NO
    TLS Host Verification:      YES
    Machine Learning:           YES
    Stream Compression:         NO
Libraries:
    protobuf:                YES (system)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    YES
    EBPF:                    YES
    IPMI:                    YES
    NFACCT:                  YES
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: YES

Something must have gone wrong during installation…

Could you try to completely remove netdata using the package manager, and then try to install again ? You could use the kickstart command.

i.e. using sudo dpkg --remove netdata & sudo apt-get purge netdata ?

After removing it I made sure that the service was non-existent, and yes, it was.

Then I installed it with the kickstart script: “Successfully installed the Netdata Agent.”

But service status says:

● netdata.service - Real time performance monitoring
   Loaded: loaded (/lib/systemd/system/netdata.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2022-08-30 11:07:02 CEST; 7s ago
  Process: 17707 ExecStart=/usr/sbin/netdata -D $EXTRA_OPTS (code=exited, status=1/FAILURE)
 Main PID: 17707 (code=exited, status=1/FAILURE)

Aug 30 11:07:02 backup-hu netdata[17707]: EXIT: all done - netdata is now exiting - bye bye...

I tried restarting it, stopping and starting it, but all that changed is that the "now exiting - bye bye" line disappeared. :\

Can you share please the error.log to manolis@netdata.cloud ? Thanks.

Sadly I still have this issue.

Few more hints: The PID of the netdata service increases every 20-30 seconds, I guess it tries to start again, getting a new PID.

I can’t even see any error logs, because there is nothing in /var/log/netdata. At first there were only old logs and an empty current log there, so I deleted them to see what happens… and still no log is being written.

Sometimes I see things like this in the systemctl status netdata command:

Sep 12 17:17:41 backup-hu netdata[14143]: EXIT: netdata prepares to exit with code 1…
Sep 12 17:17:41 backup-hu systemd[1]: netdata.service: Failed with result ‘exit-code’.
Sep 12 17:17:41 backup-hu netdata[14143]: EXIT: cleaning up the database…
Sep 12 17:17:41 backup-hu netdata[14143]: Cleaning up database [0 hosts(s)]…
Sep 12 17:17:41 backup-hu netdata[14143]: EXIT: removing netdata PID file ‘/var/run/netdata/netdata.pid’…
Sep 12 17:17:41 backup-hu netdata[14143]: EXIT: cannot unlink pidfile ‘/var/run/netdata/netdata.pid’.
Sep 12 17:17:41 backup-hu netdata[14143]: EXIT: all done - netdata is now exiting - bye bye…

Let me know if I can get some clues. I tried removing and purging netdata multiple times and then using the install script, but it always results in the same. (Always did it as root)

Current buildinfo:

Version: netdata v1.36.1
Configure options:  '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--with-user=netdata' '--with-math' '--with-zlib' '--with-webdir=/var/lib/netdata/www' '--disable-dependency-tracking' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security'
Install type: binpkg-deb
    Binary architecture: x86_64
    Packaging distro:
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES
    ACLK Next Generation:       YES
    ACLK-NG New Cloud Protocol: YES
    ACLK Legacy:                NO
    TLS Host Verification:      YES
    Machine Learning:           YES
    Stream Compression:         NO
Libraries:
    protobuf:                YES (system)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    YES
    EBPF:                    YES
    IPMI:                    YES
    NFACCT:                  YES
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: YES

Bump :( I still can’t figure out how to repair this. Maybe I have some dangling netdata files somewhere that corrupt the new install. Any idea how to find and delete all the old files?

@Tudvari sorry to hear that you are still experiencing this. Just to confirm:

@Austin_Hemmelgarn maybe you can help here?

Hi, I did it as Manolis suggested earlier:

i.e. using sudo dpkg --remove netdata & sudo apt-get purge netdata

Is the uninstall page you’ve suggested better for my case?

Hi @tudvari, sorry for the long wait to reply…

Let’s start with the error.log. If netdata runs as the netdata user, can you check please that the directory /var/log/netdata has enough permissions for it to write? Can you also please try a touch /var/log/netdata/error.log ?

It seems in general that it goes into a crash loop…
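A minimal sketch of the permission check, assuming GNU `stat` (the default on Debian). `describe_path` is a hypothetical helper name; it prints the owner, group and octal mode of a path so they can be compared with what the netdata user needs.

```shell
# Hypothetical helper: print "owner:group mode path" for a file or directory.
# Relies on GNU stat's -c format option.
describe_path() {
    stat -c '%U:%G %a %n' "$1"
}
```

For example, `sudo bash -c 'stat -c "%U:%G %a %n" /var/log/netdata'` does the same in one line, and `sudo -u netdata test -w /var/log/netdata && echo writable` checks write access directly as the netdata user.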

Hi,

This is the netdata folder’s permissions:
drwxr-s--- 2 netdata netdata 4096 Aug 15 15:56 netdata

Hmm, I can’t even enter the folder, so I could only touch an error.log file there as root.
And by the way, I can’t switch to the netdata user, but maybe that’s intended.

Ok, we need to uninstall netdata and also delete any old directories, so that the re-install re-creates everything it needs from scratch.

So after the sudo dpkg --remove netdata & sudo apt-get purge netdata please make sure that the following directories are also removed. If not, please do so manually:

sudo rm -rf /var/log/netdata/
sudo rm -rf /var/lib/netdata/
sudo rm -rf /var/cache/netdata/
sudo rm -rf /etc/netdata/

Then try to re-install using the kickstart script, and we’ll try from there.
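Before re-installing, it may help to confirm that the directories are really gone. The sketch below is read-only and removes nothing; `leftover_dirs` is a hypothetical helper name.

```shell
# Hypothetical helper: print each of the given directories that still exists.
# Empty output means none of them were found.
leftover_dirs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "$dir"
        fi
    done
}
```

Called as `leftover_dirs /var/log/netdata /var/lib/netdata /var/cache/netdata /etc/netdata`, it should print nothing once all four have been removed.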

Thanks!

Hi, thanks for helping!

It might help to know that when I simply copy-pasted your remove and purge command, this was the response:

[1] 16113
E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?

And then Removing netdata... appeared, but it never finished, so I interrupted the process and entered the commands separately.
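The `[1] 16113` line is the shell reporting a background job. A single `&` starts the first command in the background and runs the second one immediately, so both commands reach for the dpkg lock at the same time, which would explain the lock error. Chaining with `&&` runs them one after the other instead. A minimal sketch with harmless stand-ins (`remove_pkg` and `purge_pkg` are hypothetical placeholders, not real package commands):

```shell
# Stand-ins for the two package commands, so the sketch is safe to run anywhere.
remove_pkg() { echo "remove"; }
purge_pkg()  { echo "purge"; }

# '&&' starts the second command only after the first has finished successfully,
# so the two never run at the same time.
chained_output=$(remove_pkg && purge_pkg)
printf '%s\n' "$chained_output"

# On the real system the equivalent would be:
#   sudo dpkg --remove netdata && sudo apt-get purge netdata
```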

Then I ran all the other commands, and then ran the kickstart script. It was over rather quickly but it seemed like there were no errors.

--2022-09-22 17:15:17-- https://my-netdata.io/kickstart.sh
Resolving my-netdata.io (my-netdata.io)… 188.114.96.9, 188.114.97.9, 2a06:98c1:3120::9, …
Connecting to my-netdata.io (my-netdata.io)|188.114.96.9|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘/tmp/netdata-kickstart.sh’

/tmp/netdata-kickstart.sh [ <=> ] 75.12K --.-KB/s in 0.02s

2022-09-22 17:15:18 (4.29 MB/s) - ‘/tmp/netdata-kickstart.sh’ saved [76921]

--- Using /tmp/netdata-kickstart-MYVpYrAPCl as a temporary directory. ---
--- Checking for existing installations of Netdata... ---
--- No existing installations of netdata found, assuming this is a fresh install. ---
--- Attempting to install using native packages... ---
--- Repository configuration is already present, attempting to install netdata. ---
[/tmp/netdata-kickstart-MYVpYrAPCl]$ sudo env apt-get install netdata
Reading package lists… Done
Building dependency tree
Reading state information… Done
The following packages were automatically installed and are no longer required:
fonts-font-awesome fonts-glyphicons-halflings freeipmi-common libbrotli1 libc-ares2 libfreeipmi17 libipmimonitoring6
libjs-bootstrap libnode64 nodejs nodejs-doc python3-yaml
Use ‘sudo apt autoremove’ to remove them.
The following NEW packages will be installed:
netdata
0 upgraded, 1 newly installed, 0 to remove and 134 not upgraded.
Need to get 0 B/23.3 MB of archives.
After this operation, 117 MB of additional disk space will be used.
Selecting previously unselected package netdata.
(Reading database … 80186 files and directories currently installed.)
Preparing to unpack …/netdata_1.36.1_amd64.deb …
Unpacking netdata (1.36.1) …
Setting up netdata (1.36.1) …
Created symlink /etc/systemd/system/multi-user.target.wants/netdata.service → /lib/systemd/system/netdata.service.
Processing triggers for systemd (241-7~deb10u7) …
OK

Thu 22 Sep 2022 05:15:23 PM CEST : INFO: netdata-updater.sh: Auto-updating has been ENABLED through cron, updater script linked to /etc/cron.daily/netdata-updater

Thu 22 Sep 2022 05:15:23 PM CEST : INFO: netdata-updater.sh: If the update process fails and you have email notifications set up correctly for cron on this system, you should receive an email notification of the failure.
Thu 22 Sep 2022 05:15:23 PM CEST : INFO: netdata-updater.sh: Successful updates will not send an email.
Successfully installed the Netdata Agent.

Official documentation can be found online at Getting started with Netdata | Learn Netdata.

Looking to monitor all of your infrastructure with Netdata? Check out Netdata Cloud at https://app.netdata.cloud.

Join our community and connect with us on:

After that I checked the service status and it still fails.

admin@backup-hu:~$ sudo systemctl status netdata
● netdata.service - Real time performance monitoring
Loaded: loaded (/lib/systemd/system/netdata.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Thu 2022-09-22 17:15:23 CEST; 23s ago
Process: 29335 ExecStart=/usr/sbin/netdata -D $EXTRA_OPTS (code=exited, status=1/FAILURE)
Main PID: 29335 (code=exited, status=1/FAILURE)

Do you now have anything in /var/log/netdata/error.log ? I think the system is still in a somewhat strange state (perhaps rebooting could clear some temporary locks from dpkg)…

If you still don’t have an error.log, one thing to try is to run netdata manually and see if it complains about something… Stop the service (sudo systemctl stop netdata), then switch to the netdata user (sudo su -s /bin/bash netdata), then run the netdata binary (/usr/sbin/netdata -D) and see if there is any output…
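When the manual run prints many lines, the fatal one is usually the interesting part. The sketch below runs any command, merges its error stream into its output, and prints only the first line containing `FATAL`. `first_fatal` is a hypothetical helper name, and matching on the word `FATAL` is an assumption about how the agent labels its fatal messages.

```shell
# Hypothetical helper: run a command, merge stderr into stdout,
# and print only the first line that contains FATAL.
first_fatal() {
    "$@" 2>&1 | grep -m 1 'FATAL'
}
```

After stopping the service and switching to the netdata user, it could be called as `first_fatal /usr/sbin/netdata -D`. If the agent starts normally, the command keeps running until it is interrupted with Ctrl+C.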

No files in the netdata folder. I tried rebooting between removing and installing netdata, but I’ll check it again.

I manually started netdata, and the output is:

2022-09-23 11:07:30: netdata INFO : MAIN : CONFIG: cannot load cloud config ‘/var/lib/netdata/cloud.d/cloud.conf’. Running with internal defaults.
2022-09-23 11:07:30: netdata FATAL : netdata : Cannot change directory to ‘/opt/netdata’ # : No such file or directory
2022-09-23 11:07:30: netdata INFO : MAIN : EXIT: netdata prepares to exit with code 1…
2022-09-23 11:07:30: netdata INFO : MAIN : EXIT: cleaning up the database…
2022-09-23 11:07:30: netdata INFO : MAIN : Cleaning up database [0 hosts(s)]…
2022-09-23 11:07:30: netdata INFO : MAIN : EXIT: all done - netdata is now exiting - bye bye…

/opt/netdata → This means that, at some point at least, a static package was installed on the system (that is, not a package from the package manager). So there might be some leftovers from it.
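One way to hunt for such leftovers is to search for files that still mention the old path. This is only a sketch: `files_mentioning` is a hypothetical helper name, and which locations are worth searching is an assumption.

```shell
# Hypothetical helper: list files under the given paths that mention a string.
# grep -r recurses into directories, -l prints only file names,
# -F treats the pattern as a fixed string, -s hides errors for unreadable files.
files_mentioning() {
    needle="$1"
    shift
    grep -rlsF -- "$needle" "$@"
}
```

From a root shell, something like `files_mentioning /opt/netdata /etc/netdata /etc/passwd /etc/systemd/system` would show whether a config file, the netdata user entry, or a unit override still points at the old location.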

You did delete /etc/netdata, right? Can you please send me the netdata.conf from there to manolis@netdata.cloud?
