CPU traps / segfault

A few days ago, our central Netdata metrics collector started causing CPU traps. It runs as a Docker container.

The host syslog error message is:

Jun 9 06:29:59 netdata01 kernel: [156866.969743] traps: WEB_SERVER[stat[24373] general protection ip:7fe2b2ff22d9 sp:7fe2b03c8878 error:0 in ld-musl-x86_64.so.1[7fe2b2feb000+47000]
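One way to start narrowing this down is to turn the faulting ip into an offset inside the object the kernel names (ld-musl-x86_64.so.1 here). The sketch below is a hypothetical Python helper, not a Netdata or kernel tool. It assumes the kernel's "in <object>[<start>+<size>]" suffix gives the start and length of the memory mapping that contained the ip. The resulting offset is relative to that mapping's start. It may need adjusting for the segment's file offset before you look it up with tools like addr2line or objdump against the exact same musl build.

```python
import re

# The "traps:" line from the host syslog above (kernel timestamp prefix dropped).
LINE = ("traps: WEB_SERVER[stat[24373] general protection ip:7fe2b2ff22d9 "
        "sp:7fe2b03c8878 error:0 in ld-musl-x86_64.so.1[7fe2b2feb000+47000]")

# Capture the instruction pointer and the "<object>[<start>+<size>]" mapping info.
PATTERN = re.compile(
    r"ip:(?P<ip>[0-9a-f]+).*? in (?P<obj>\S+?)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def fault_offset(line):
    """Return (object name, offset of ip from the start of the reported mapping)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("no ip / mapping information found in line")
    ip = int(m["ip"], 16)
    start = int(m["start"], 16)
    size = int(m["size"], 16)
    offset = ip - start
    # Sanity check: the ip should fall inside the reported mapping.
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return m["obj"], offset

obj, off = fault_offset(LINE)
print(f"{obj} + {off:#x}")  # prints "ld-musl-x86_64.so.1 + 0x72d9"
```

If the same offset shows up in every crash, the fault is deterministic at one code location. Varying offsets would point more towards memory corruption.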

The error occurs roughly every 7 minutes, although the interval between crashes varies from as little as 1 minute to as long as 12 minutes.

I have tried versions 1.26.0, 1.30.1 and 1.31.0, and each produces exactly the same error.

When Netdata runs natively as a systemd service directly on the Ubuntu host, the error message in syslog is:

Jun 10 08:53:54 netdata01 kernel: [62290.509176] WEB_SERVER[stat[23533]: segfault at 55570589ba34 ip 000055570589ba34 sp 00007f9ca5486290 error 15
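The "error 15" field in this segfault line is the x86 page-fault error code, which the kernel prints in hexadecimal (0x15). The hypothetical helper below decodes it using the bit meanings of the X86_PF_* flags in the Linux kernel's x86 page-fault code. It is a sketch for reading the log line, not part of any Netdata tooling.

```python
# Bit positions of the x86 page-fault error code (Linux X86_PF_* flags).
PF_BITS = {
    0: "PROT: page was present (protection violation, not a missing page)",
    1: "WRITE: the access was a write",
    2: "USER: the fault happened in user mode",
    3: "RSVD: a reserved bit was set in a page-table entry",
    4: "INSTR: the fault was an instruction fetch",
    5: "PK: protection-key violation",
}

def decode_pf_error(code_hex):
    """Decode the 'error NN' field of a kernel segfault line (printed in hex)."""
    code = int(code_hex, 16)
    return [desc for bit, desc in PF_BITS.items() if code & (1 << bit)]

for desc in decode_pf_error("15"):
    print(desc)
```

For 0x15, this yields PROT, USER and INSTR. Together with "segfault at" being equal to "ip", that is consistent with the thread jumping to an address that is mapped but not executable. That pattern often points to a corrupted function pointer or return address rather than a simple bad data access.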

In both cases, apport has no crash dump information; it only logs messages like these:

ERROR: apport (pid 26217) Wed Jun 9 06:29:59 2021: host pid 24140 crashed in a container without apport support
ERROR: apport (pid 24822) Thu Jun 10 08:53:54 2021: host pid 23366 crashed in a separate mount namespace, ignoring

The Ubuntu host is a VM running on VMware. I have tried migrating the VM to different physical hardware, and the errors persist regardless of the underlying hardware.

What can I do to identify the cause?
Any suggestions for a workaround?


Hi @sdo and welcome back!

I have contacted our engineers, and they will look into this shortly. It does sound disturbing, and we are sorry you are experiencing this.

We will get it resolved :muscle:

Hi @sdo, thanks for reporting this!

I have a couple of questions, mostly to better understand what your setup looks like:

  • Which installation method did you use to install Netdata?
  • What is the output of cat /etc/lsb-release /etc/os-release and netdata -W buildinfo?
  • Could you provide the contents of Netdata’s error.log?

Finally, a similar issue was opened on GitHub a while ago. It might be worthwhile to follow the steps mentioned in that issue to rule out that this one is a duplicate.


Hi @sdo,

I’m curious whether there’s any update on this issue. Is the Netdata agent still crashing, or did you manage to fix the problem in some other way?