Hi all,
I run several CentOS 8 Stream KVM servers hosting a variety of production VMs, with netdata running on the KVM nodes themselves. Earlier this afternoon, the netdata instances on all of my KVM nodes began raising critical “10min_dbengine_global_fs_errors” alerts.
I checked the netdata error.log on each KVM node and found that each one is flooded with this error message:
2023-06-16 18:55:33: netdata ERROR : LIBUV_WORKER : DBENGINE: error while reading extent from datafile 4037 of tier 0, at offset 2809856 (43348 bytes) to extract page (PD) from 1686945705 (2023-06-16 13:01:45) to 1686946728 (2023-06-16 13:18:48) of metric 8dbcd724-ac64-48f5-a556-de6b5542388a: header is INVALID (errno 22, Invalid argument)
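In case it helps anyone taking a look, this is how I was planning to inspect the datafile the error refers to. I’m assuming a stock install here, where the tier 0 dbengine files sit under /var/cache/netdata/dbengine:

# default dbengine cache dir on a stock install; look for the datafile/journalfile numbered 4037
ls -lh /var/cache/netdata/dbengine/ | grep 4037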
Restarting netdata didn’t help, and neither did upgrading it to the latest version.
I wondered whether the errors might be caused by hitting some kind of max-open-files limit, but that doesn’t seem to be the case either:
[root@sea4 ~]# cat /proc/sys/fs/file-nr
2960 0 6514653
[root@sea4 ~]#
I’m not an expert on these limits, though, so it’s possible I’m not using the right command to check them…
Could anyone recommend some next steps to troubleshoot and identify the root cause? It’s a little concerning that this started happening on all of my production servers at once, without any upgrades, deployments, or other changes on my end.
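In case it’s useful, this is the per-process check I was going to try next, since file-nr only reports the system-wide counters (allocated / unused / max). It assumes pidof -s netdata resolves to the main netdata daemon rather than one of its plugin children:

# per-process soft/hard limit on open files for the netdata daemon
grep "Max open files" /proc/$(pidof -s netdata)/limits
# number of file descriptors the daemon currently has open
ls /proc/$(pidof -s netdata)/fd | wc -l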
Thank you!