Problem/Question
I have a similar issue to this. Our netdata instances do not seem to respect the settings in netdata.conf. We run netdata v1.36.1 on tens of CentOS Linux release 7.9.2009 (Core) nodes.
[root@host netdata]# du -h | sort -hr | head
1.5G .
1.2G ./netdata
886M ./netdata/dbengine
After a restart of the service, disk usage drops but climbs back up later.
[root@host netdata]# du -h
248M ./dbengine
502M .
The netdata-meta.db file stays under 250M, while the dbengine directory grows to fill the /var partition.
Our /etc/netdata/netdata.conf:
[global]
...
dbengine disk space = 256
[db]
dbengine multihost disk space MB = 256
[root@host netdata]# curl -s localhost:19999/netdata.conf
...
[global]
run as user = netdata
process scheduling policy = idle
OOM score = 1000
# glibc malloc arena max for plugins = 1
# glibc malloc arena max for netdata = 1
# libuv worker threads = 16
# hostname = host
# host access prefix =
# enable metric correlations = yes
# metric correlations method = ks2
# timezone = Europe/Amsterdam
# pthread stack size = 8388608
[db]
dbengine disk space MB = 256
# option 'retention' is not used.
retention = 3600
dbengine multihost disk space MB = 256
# update every = 1
# mode = dbengine
# dbengine page cache with malloc = no
# dbengine page cache size MB = 32
# memory deduplication (ksm) = yes
# cleanup obsolete charts after secs = 3600
# gap when lost iterations above = 1
# storage tiers = 1
# dbengine page fetch timeout secs = 3
# dbengine page fetch retries = 3
# dbengine page descriptors in file mapped memory = no
# cleanup orphan hosts after secs = 3600
# delete obsolete charts files = yes
# delete orphan hosts files = yes
# dbengine pages per extent = 64
# enable zero metrics = no
...
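To double-check which options the running agent treats as explicitly set, the output above can be filtered down to the uncommented lines of one section. This is a minimal sketch, assuming that commented-out lines in that output are defaults that were not set in the config file; `explicit_options` and `SAMPLE` are hypothetical names of mine, not part of netdata.

```python
import re


def explicit_options(conf_text, section):
    """Return {option: value} for the uncommented lines of [section]."""
    current = None
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks and commented lines (assumed to be unset defaults).
        if not line or line.startswith("#"):
            continue
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = header.group(1).strip()
            continue
        if current == section and "=" in line:
            key, value = line.split("=", 1)
            found[key.strip()] = value.strip()
    return found


# Excerpt of the output shown above, shortened for illustration.
SAMPLE = """\
[global]
run as user = netdata
# hostname = host
[db]
dbengine disk space MB = 256
# option 'retention' is not used.
retention = 3600
dbengine multihost disk space MB = 256
# update every = 1
# mode = dbengine
"""

if __name__ == "__main__":
    for key, value in explicit_options(SAMPLE, "db").items():
        print(f"{key} = {value}")
```

In practice the text would come from the same endpoint as the curl call above (for example via `urllib.request.urlopen`) rather than from a pasted sample.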
According to /var/log/netdata/error.log, housekeeping is running:
2023-01-03 10:03:15: netdata INFO : MAIN : Deleting data file "/var/cache/netdata/dbengine/datafile-1-0000000501.ndf".
2023-01-03 10:03:15: netdata INFO : MAIN : Deleting data and journal file pair.
2023-01-03 10:03:15: netdata INFO : MAIN : Deleted journal file "/var/cache/netdata/dbengine/journalfile-1-0000000501.njf".
2023-01-03 10:03:15: netdata INFO : MAIN : Deleted data file "/var/cache/netdata/dbengine/datafile-1-0000000501.ndf".
2023-01-03 10:03:15: netdata INFO : MAIN : Reclaimed 14479360 bytes of disk space.
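To see how much space housekeeping actually frees over time, the log lines above can be totalled. This is a rough sketch that assumes the message wording stays exactly as in the excerpt above; `summarize_housekeeping` is a hypothetical helper of mine, not a netdata tool.

```python
import re

# Patterns copied from the wording of the log lines shown above.
RECLAIMED = re.compile(r"Reclaimed (\d+) bytes of disk space")
DELETED = re.compile(r'Deleted data file "([^"]+)"')


def summarize_housekeeping(lines):
    """Return (total_reclaimed_bytes, deleted_data_files) for the given log lines."""
    total = 0
    files = []
    for line in lines:
        reclaimed = RECLAIMED.search(line)
        if reclaimed:
            total += int(reclaimed.group(1))
        deleted = DELETED.search(line)
        if deleted:
            files.append(deleted.group(1))
    return total, files


LOG_SAMPLE = [
    '2023-01-03 10:03:15: netdata INFO : MAIN : Deleting data file "/var/cache/netdata/dbengine/datafile-1-0000000501.ndf".',
    "2023-01-03 10:03:15: netdata INFO : MAIN : Deleting data and journal file pair.",
    '2023-01-03 10:03:15: netdata INFO : MAIN : Deleted journal file "/var/cache/netdata/dbengine/journalfile-1-0000000501.njf".',
    '2023-01-03 10:03:15: netdata INFO : MAIN : Deleted data file "/var/cache/netdata/dbengine/datafile-1-0000000501.ndf".',
    "2023-01-03 10:03:15: netdata INFO : MAIN : Reclaimed 14479360 bytes of disk space.",
]

if __name__ == "__main__":
    total, files = summarize_housekeeping(LOG_SAMPLE)
    print(f"reclaimed {total} bytes from {len(files)} data file(s)")
```

Running it against the whole error.log (one line per list element) would show whether the reclaimed volume keeps pace with the growth of the dbengine directory.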
The pattern seems to be that after a restart of the netdata service, the size of the dbengine directory remains the same for about a day, and then keeps growing. I found no cron jobs that interfere with netdata housekeeping.
Are we setting the limit settings in netdata.conf correctly?
Relevant docs you followed/actions you took to solve the issue
I tried the following combinations of settings, all with the same result, based on:
- netdata/README.md at master · netdata/netdata · GitHub
- Change how long Netdata stores metrics | Learn Netdata
- netdata/README.md at master · netdata/netdata · GitHub
1: [global] dbengine disk space = 256; [db] empty
2: [global] empty; [db] dbengine multihost disk space MB = 256
3: [global] dbengine disk space = 256; [db] dbengine multihost disk space MB = 256
Environment/Browser/Agent's version etc
[root@host netdata]# netdata -V
netdata v1.36.1
[root@host netdata]# which netdatacli
/sbin/netdatacli
[root@host netdata]# netdatacli aclk-state
ACLK Available: Yes
ACLK Version: 2
Protocols Supported: Protobuf
Protocol Used: Protobuf
MQTT Version: 3
Claimed: No
Online: No
Reconnect count: 0
Banned By Cloud: No
[root@host netdata]# netdata -v
netdata v1.36.1
[root@host netdata]# netdata -W buildinfo
Version: netdata v1.36.1
Configure options: '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-plugin-freeipmi' '--with-bundled-protobuf' '--with-zlib' '--with-math' '--with-user=netdata' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
Install type: custom
Features:
dbengine: YES
Native HTTPS: YES
Netdata Cloud: YES
ACLK Next Generation: YES
ACLK-NG New Cloud Protocol: YES
ACLK Legacy: NO
TLS Host Verification: YES
Machine Learning: YES
Stream Compression: NO
Libraries:
protobuf: YES (bundled)
jemalloc: NO
JSON-C: YES
libcap: YES
libcrypto: YES
libm: YES
tcalloc: NO
zlib: YES
Plugins:
apps: YES
cgroup Network Tracking: YES
CUPS: NO
EBPF: NO
IPMI: YES
NFACCT: NO
perf: YES
slabinfo: YES
Xen: NO
Xen VBD Error Tracking: NO
Exporters:
AWS Kinesis: NO
GCP PubSub: NO
MongoDB: NO
Prometheus Remote Write: YES
[root@host netdata]# uname -a
Linux host 3.10.0-1160.76.1.el7.x86_64 #1 SMP Wed Aug 10 16:21:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
What I expected to happen
Housekeeping keeps the /var/cache/netdata/dbengine directory at or below the configured limit of 256 MB.
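
To make that expectation checkable on each node, the directory size can be compared with the configured limit. This is a minimal sketch with assumptions of my own: the limit is treated as MiB (1024 * 1024 bytes), file sizes are apparent sizes (which can differ slightly from what `du` reports), and `dir_size_bytes` and `over_limit` are hypothetical helper names.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent size of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # File was removed (e.g. by housekeeping) while we were walking.
                pass
    return total


def over_limit(path, limit_mb):
    """True if the directory is larger than limit_mb (assumed to mean MiB)."""
    return dir_size_bytes(path) > limit_mb * 1024 * 1024


if __name__ == "__main__":
    path = "/var/cache/netdata/dbengine"  # directory from the logs above
    limit_mb = 256                        # value set in our netdata.conf
    size_mb = dir_size_bytes(path) / (1024 * 1024)
    print(f"{path}: {size_mb:.1f} MiB, over limit: {over_limit(path, limit_mb)}")
```

Run periodically (from cron, for example), this would show when after a restart the directory first exceeds the limit.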