A few days ago, our central netdata metrics collector started causing CPU traps. It runs as a Docker container.
The host syslog error message is:
Jun 9 06:29:59 netdata01 kernel: [156866.969743] traps: WEB_SERVER[stat general protection ip:7fe2b2ff22d9 sp:7fe2b03c8878 error:0 in ld-musl-x86_64.so.1[7fe2b2feb000+47000]
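For reference, the trap line gives both the faulting instruction pointer and the start and size of the ld-musl mapping it falls in, so the offset inside that mapping can be worked out. This is a minimal sketch; the function name `library_offset` is my own, and the offset is relative to the mapping start the kernel reports, which may not equal the file offset if the library is mapped in several segments.

```python
import re

# Trap line copied verbatim from the host syslog above.
TRAP_LINE = (
    "traps: WEB_SERVER[stat general protection ip:7fe2b2ff22d9 "
    "sp:7fe2b03c8878 error:0 in ld-musl-x86_64.so.1[7fe2b2feb000+47000]"
)


def library_offset(line):
    """Return (mapping name, offset of ip inside that mapping)."""
    m = re.search(
        r"ip:([0-9a-f]+) .* in ([^\s\[]+)\[([0-9a-f]+)\+([0-9a-f]+)\]", line
    )
    if m is None:
        raise ValueError("not a recognised traps line")
    ip = int(m.group(1), 16)
    name = m.group(2)
    base = int(m.group(3), 16)   # start of the mapping containing ip
    size = int(m.group(4), 16)   # size of that mapping
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return name, offset


name, offset = library_offset(TRAP_LINE)
print(f"{name} + {offset:#x}")  # prints "ld-musl-x86_64.so.1 + 0x72d9"
```

If the offset is the same on every crash, it could be looked up in the ld-musl binary shipped in the container image to see which libc routine is faulting.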
The error happens roughly every 7 minutes on average, but the interval between crashes varies from as little as 1 minute to as long as 12 minutes.
I have tried versions 1.26.0, 1.30.1 and 1.31.0, and each produces exactly the same error.
When netdata runs as a native systemd service directly on the Ubuntu host instead, the error message in syslog is:
Jun 10 08:53:54 netdata01 kernel: [62290.509176] WEB_SERVER[stat: segfault at 55570589ba34 ip 000055570589ba34 sp 00007f9ca5486290 error 15
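To make the segfault line easier to read, here is a small sketch that decodes the `error` field. It assumes the kernel prints that field in hexadecimal (as current x86 kernels do) and uses the x86 page-fault error code bits from the kernel's `X86_PF_*` definitions; the helper name `decode_pf_error` is my own.

```python
# x86 page-fault error code bits (Linux X86_PF_* flags, low five bits only).
PF_FLAGS = [
    (0x01, "PROT: protection violation (page was present)"),
    (0x02, "WRITE: access was a write"),
    (0x04, "USER: access came from user mode"),
    (0x08, "RSVD: reserved bit set in a page-table entry"),
    (0x10, "INSTR: access was an instruction fetch"),
]


def decode_pf_error(code_hex):
    """Return descriptions of the flags set in a hex error code string."""
    code = int(code_hex, 16)
    return [desc for mask, desc in PF_FLAGS if code & mask]


# Values copied from the segfault line above.
SEGFAULT_AT = 0x55570589ba34
SEGFAULT_IP = 0x000055570589ba34

for desc in decode_pf_error("15"):
    print(desc)
print("fault address equals ip:", SEGFAULT_AT == SEGFAULT_IP)
```

Read this way, 0x15 is PROT, USER and INSTR, and the fault address is the same as the instruction pointer. That would suggest the thread tried to execute code from a page that is mapped but not executable, for example after jumping through a corrupted pointer, although I cannot confirm that from the log alone.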
In both cases, apport does not capture a crash dump. Its log only shows these messages:
ERROR: apport (pid 26217) Wed Jun 9 06:29:59 2021: host pid 24140 crashed in a container without apport support
ERROR: apport (pid 24822) Thu Jun 10 08:53:54 2021: host pid 23366 crashed in a separate mount namespace, ignoring
The Ubuntu host is a VM running on VMware. I have migrated the VM to different physical hardware, and the errors persist regardless of the underlying hardware.
What can I do to identify the cause?
Any suggestions for a workaround?