10min_dbengine_global_fs_errors


Netdata | DB engine

The Database Engine works like a traditional database. It dedicates a certain amount of RAM to data caching and indexing, while the rest of the data resides compressed on disk. Unlike other memory modes, the amount of historical metrics stored is based on the amount of disk space you allocate and the effective compression ratio, not a fixed number of metrics collected.

By using both RAM and disk space, the database engine allows for long-term storage of per-second metrics inside of the agent itself.

The Netdata Agent monitors the number of filesystem errors the database engine encountered in the last 10 minutes. When this alert fires, the DB engine is experiencing filesystem errors (too many open files, wrong permissions, etc.).

This alert is triggered in warning state when the number of filesystem errors is greater than 0.
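To see the value the alert is evaluating, one option is to ask the agent's own API for the 10-minute sum. The sketch below only builds the query URL. It assumes the alert reads a dimension called `fs_errors` on a chart called `netdata.dbengine_global_errors`, and that the agent listens on the default port 19999. Neither the chart nor the dimension name appears in the text above, so verify both on your agent before relying on them. `fs_errors_url` is a hypothetical helper name.

```shell
#!/bin/sh
# Build a Netdata API v1 query URL for the sum of dbengine filesystem
# errors over the last 600 seconds (10 minutes).
# ASSUMPTIONS (not confirmed by the alert text): the chart is named
# "netdata.dbengine_global_errors" and the dimension "fs_errors".
fs_errors_url() {
    # First argument: host:port of the agent (defaults to the local agent).
    host="${1:-localhost:19999}"
    echo "http://$host/api/v1/data?chart=netdata.dbengine_global_errors&dimensions=fs_errors&after=-600&points=1&group=sum&format=csv"
}
```

With a running agent, `curl -s "$(fs_errors_url)"` should return a single row. A value above 0 would correspond to the warning condition described here.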


I had two nodes with this alert activated. I checked the /var/log/netdata/error.log on the first node and saw lines similar to this at the end (these are from the second node):

2023-01-15 01:20:43: netdata ERROR : MAIN : Failed to open file "/var/cache/netdata/dbengine/datafile-1-0000000510.ndf". (errno 24, Too many open files)
2023-01-15 01:20:44: netdata INFO  : MAIN : Creating new data and journal files in path /var/cache/netdata/dbengine-tier1
2023-01-15 01:20:44: netdata LOG FLOOD PROTECTION too many logs (201 logs in 36 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process 'netdata' for 1164 seconds.

The first two lines were repeated many times. These nodes are currently idle, though, so I don’t understand what the problem is. When I checked the number of open files, this was the result:

# cat /proc/sys/fs/file-nr
0	0	4194304
# 

Zero open files? That doesn’t make sense.
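A note on the check above: errno 24 is EMFILE, which means a single process hit its own descriptor limit, while /proc/sys/fs/file-nr reports system-wide file handles. A per-process look may therefore be more telling. The following is a minimal sketch for Linux; `fd_usage` is a hypothetical helper name, and reading another user's /proc/<pid>/fd normally requires root.

```shell
#!/bin/sh
# Report how many file descriptors a process holds and its soft limit.
# Linux only: relies on /proc/<pid>/fd and /proc/<pid>/limits.
fd_usage() {
    pid="$1"
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open_fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # In the "Max open files" row, field 4 is the soft limit.
    soft_limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
    echo "pid=$pid open_fds=$open_fds soft_limit=$soft_limit"
}
```

For example, `fd_usage "$(pgrep -o -x netdata)"` run as root prints the count and limit for the oldest process named netdata. If open_fds sits close to soft_limit, that would be consistent with the "Too many open files" lines in error.log.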
I thought I’d just restart the netdata service, but shortly after that, the node became unresponsive! There isn’t even any response on the console, so I guess I’ll have to power cycle it.

The other node is still responsive but it also returns the same 0 number of open files. Also here are the latest files in /var/log/netdata:

-rw-r--r-- 1 netdata netdata 3.9K Jan 13 03:05 health.log-20230113.gz
-rw-r--r-- 1 netdata netdata 4.8K Jan 13 03:05 access.log-20230113.gz
-rw-r--r-- 1 netdata netdata 138K Jan 13 03:24 error.log-20230113.gz
-rw-r--r-- 1 netdata netdata    0 Jan 14 03:19 error.log
-rw-r--r-- 1 netdata netdata    0 Jan 14 03:19 access.log
-rw-r--r-- 1 netdata netdata    0 Jan 14 03:19 health.log
-rw-r--r-- 1 netdata netdata 3.2M Jan 15 01:20 error.log-20230114
-rw-r--r-- 1 netdata netdata 401K Jan 15 01:29 health.log-20230114
-rw-r--r-- 1 netdata netdata 207K Jan 15 01:35 access.log-20230114

Isn’t it strange that yesterday’s log files are all 0 bytes and have no date appended to the filename, while the latest log files contain data but do have the date appended? Usually the current logs don’t carry a date suffix… so what the heck is going on here?
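Since the error lines repeat many times and flood protection hides the rest, a quick tally by errno can show whether "Too many open files" is the only failure in a log. This is a rough sketch that assumes every error line carries an "(errno N, message)" fragment like the excerpt above; `errno_tally` is a hypothetical helper name.

```shell
#!/bin/sh
# Tally "(errno N, message)" fragments from log lines read on stdin,
# most frequent first. Lines without such a fragment are ignored.
errno_tally() {
    grep -o '(errno [0-9][0-9]*, [^)]*)' | sort | uniq -c | sort -rn
}
```

Usage would look like `errno_tally < /var/log/netdata/error.log-20230114`, using the 3.2M rotated file from the listing above.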

Hello @AGI-chandler, could you please provide me with some information? Please share the output of netdata -W buildinfo for the nodes in question and the OS of your system, plus the deployment option (for instance, whether it’s a Netdata container).

Gosh dude I can’t remember, let me see if I can dig that up… For some reason, the last emails I have regarding 10min_dbengine_global_fs_errors are from 12/22… way before this… but if it became unresponsive, then that leaves just 1. Well there may have been something bigger going on, because I posted about another problem I had later as well. Since the reboot, they both have been fine… Yes it’s coming back to me, so it must be these 2 nodes, currently with 8d20h and 9d20h uptimes: both running CentOS Linux release 8.4.2105 and both deployed using the kickstart.sh script. Thanks

$ netdata -W buildinfo
Version: netdata v1.37.1
Configure options:  '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--datadir=/usr/share' '--includedir=/usr/include' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--disable-dependency-tracking' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CXXFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'PKG_CONFIG_PATH=:/usr/lib/pkgconfig:/usr/share/pkgconfig'
Install type: binpkg-rpm
    Binary architecture: x86_64
    Packaging distro:  
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES 
    ACLK:                       YES
    TLS Host Verification:      YES
    Machine Learning:           YES
    Stream Compression:         NO
Libraries:
    protobuf:                YES (system)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    YES
    EBPF:                    YES
    IPMI:                    YES
    NFACCT:                  NO
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: YES
Debug/Developer Features:
    Trace Allocations:       NO
$ 
$ netdata -W buildinfo
Version: netdata v1.37.1
Configure options:  '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--datadir=/usr/share' '--includedir=/usr/include' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--disable-dependency-tracking' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CXXFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'PKG_CONFIG_PATH=:/usr/lib/pkgconfig:/usr/share/pkgconfig'
Install type: binpkg-rpm
    Binary architecture: x86_64
    Packaging distro:  
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES 
    ACLK:                       YES
    TLS Host Verification:      YES
    Machine Learning:           YES
    Stream Compression:         NO
Libraries:
    protobuf:                YES (system)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    YES
    EBPF:                    YES
    IPMI:                    YES
    NFACCT:                  NO
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: YES
Debug/Developer Features:
    Trace Allocations:       NO
$ 

I’m seeing a similar issue on a Proxmox host running Netdata. I get these alerts routinely, all day; they raise and clear rapidly, with a peak count of 3. I’m trying to get a better understanding of how to manage them, and whether I should raise the threshold to reduce the warnings, since I’m not experiencing any other adverse system issues.

netdata -W buildinfo

Version: netdata v1.38.0-81-g3c4676c9b
Configure options: '--prefix=/opt/netdata/usr' '--sysconfdir=/opt/netdata/etc' '--localstatedir=/opt/netdata/var' '--libexecdir=/opt/netdata/usr/libexec' '--libdir=/opt/netdata/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--enable-cloud' '--without-bundled-protobuf' '--disable-dependency-tracking' 'CFLAGS=-static -O2 -I/openssl-static/include -pipe' 'LDFLAGS=-static -L/openssl-static/lib' 'PKG_CONFIG_PATH=/openssl-static/lib/pkgconfig'
Install type: kickstart-static
Binary architecture: x86_64
Features:
dbengine: YES
Native HTTPS: YES
Netdata Cloud: YES
ACLK: YES
TLS Host Verification: YES
Machine Learning: YES
Stream Compression: YES
Libraries:
protobuf: YES (system)
jemalloc: NO
JSON-C: YES
libcap: NO
libcrypto: YES
libm: YES
tcalloc: NO
zlib: YES
Plugins:
apps: YES
cgroup Network Tracking: YES
CUPS: NO
EBPF: YES
IPMI: NO
NFACCT: NO
perf: YES
slabinfo: YES
Xen: NO
Xen VBD Error Tracking: NO
Exporters:
AWS Kinesis: NO
GCP PubSub: NO
MongoDB: NO
Prometheus Remote Write: YES
Debug/Developer Features:
Trace Allocations: NO