dbengine directory growing despite configured disk usage limits

Problem/Question

I have an issue similar to this. Our Netdata instances do not seem to respect the settings in netdata.conf. We run Netdata v1.36.1 on tens of CentOS Linux release 7.9.2009 (Core) nodes.

[root@host netdata]# du -h | sort -hr | head
1.5G .
1.2G ./netdata
886M ./netdata/dbengine

After a restart of the service, disk usage drops but climbs back up later.

[root@host netdata]# du -h
248M ./dbengine
502M .

The netdata-meta.db file stays under 250M, while the dbengine directory grows to fill the /var partition.
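To track this growth over time without reading du output by eye, a small script can total the files under the dbengine directory and compare the result with the configured limit. This is a minimal sketch: the path is the default cache location seen in this thread, the 256 limit is the value from our config, treating MB as MiB is an assumption, and the helper names are hypothetical. It sums apparent file sizes rather than allocated blocks, so it can differ slightly from du -h.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of regular files under path (not block usage like du)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # unreadable dirs are silently skipped
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def check_limit(path, limit_mb):
    """Return (size in MiB, True if it exceeds limit_mb). MB is assumed to mean MiB."""
    size_mib = dir_size_bytes(path) / (1024 * 1024)
    return size_mib, size_mib > limit_mb


if __name__ == "__main__":
    # Assumption: default cache path of a packaged install, as in the output above.
    size, over = check_limit("/var/cache/netdata/dbengine", 256)
    print(f"dbengine: {size:.1f} MiB, over limit: {over}")
```

Run it from cron or a loop to log the size periodically; a path that does not exist simply reports 0.0 MiB.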

Our /etc/netdata/netdata.conf:

[global]
    ...
    dbengine disk space = 256

[db]
    dbengine multihost disk space MB = 256

[root@host netdata]# curl -s localhost:19999/netdata.conf
...
[global]
        run as user = netdata
        process scheduling policy = idle
        OOM score = 1000
        # glibc malloc arena max for plugins = 1
        # glibc malloc arena max for netdata = 1
        # libuv worker threads = 16
        # hostname = host
        # host access prefix =
        # enable metric correlations = yes
        # metric correlations method = ks2
        # timezone = Europe/Amsterdam
        # pthread stack size = 8388608

[db]
        dbengine disk space MB = 256

        # option 'retention' is not used.
        retention = 3600
        dbengine multihost disk space MB = 256
        # update every = 1
        # mode = dbengine
        # dbengine page cache with malloc = no
        # dbengine page cache size MB = 32
        # memory deduplication (ksm) = yes
        # cleanup obsolete charts after secs = 3600
        # gap when lost iterations above = 1
        # storage tiers = 1
        # dbengine page fetch timeout secs = 3
        # dbengine page fetch retries = 3
        # dbengine page descriptors in file mapped memory = no
        # cleanup orphan hosts after secs = 3600
        # delete obsolete charts files = yes
        # delete orphan hosts files = yes
        # dbengine pages per extent = 64
        # enable zero metrics = no
...
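When reading a dump like the one above, it helps to separate the options that were set explicitly from those shown at their defaults. As far as we understand the running-config output, an option prefixed with # is at its default and an uncommented one was set in netdata.conf. The sketch below relies on that convention (an assumption, not a documented parser); parse_netdata_conf is a hypothetical name.

```python
import re

SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")


def parse_netdata_conf(text):
    """Map section -> {option: (value, explicitly_set)}.

    Assumes the running-config convention: '# key = value' means the default
    is in use, 'key = value' means the option was set explicitly.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            current = sections.setdefault(match.group("name"), {})
            continue
        if current is None:
            continue  # anything before the first [section], e.g. "..."
        explicit = not line.startswith("#")
        body = line.lstrip("#").strip()
        if "=" not in body:
            continue  # plain comment such as "option 'retention' is not used."
        key, _, value = body.partition("=")
        current[key.strip()] = (value.strip(), explicit)
    return sections
```

Feeding it the output of curl -s localhost:19999/netdata.conf and filtering for explicit options in [db] shows at a glance which dbengine limits the agent picked up from the file.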

The log in /var/log/netdata/error.log shows that housekeeping is taking place:

2023-01-03 10:03:15: netdata INFO  : MAIN : Deleting data file "/var/cache/netdata/dbengine/datafile-1-0000000501.ndf".
2023-01-03 10:03:15: netdata INFO  : MAIN : Deleting data and journal file pair.
2023-01-03 10:03:15: netdata INFO  : MAIN : Deleted journal file "/var/cache/netdata/dbengine/journalfile-1-0000000501.njf".
2023-01-03 10:03:15: netdata INFO  : MAIN : Deleted data file "/var/cache/netdata/dbengine/datafile-1-0000000501.ndf".
2023-01-03 10:03:15: netdata INFO  : MAIN : Reclaimed 14479360 bytes of disk space.
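To check how much space housekeeping actually frees over a longer window, the "Reclaimed N bytes of disk space" lines can be totalled. This sketch assumes the message format shown in the excerpt above and the log path /var/log/netdata/error.log; summarize_housekeeping and summarize_log are hypothetical helper names.

```python
import re

RECLAIM_RE = re.compile(r"Reclaimed (\d+) bytes of disk space")
# Matches only the completed "Deleted data file" message, not "Deleting data file".
DELETED_DATAFILE_RE = re.compile(r'Deleted data file "([^"]+)"')


def summarize_housekeeping(lines):
    """Return (total bytes reclaimed, list of deleted datafile paths)."""
    total = 0
    deleted = []
    for line in lines:
        match = RECLAIM_RE.search(line)
        if match:
            total += int(match.group(1))
        match = DELETED_DATAFILE_RE.search(line)
        if match:
            deleted.append(match.group(1))
    return total, deleted


def summarize_log(path="/var/log/netdata/error.log"):
    """Apply summarize_housekeeping to a log file (path is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return summarize_housekeeping(fh)
```

For the excerpt above, 14479360 bytes is about 13.8 MiB, roughly one 13M datafile plus its journal, so comparing the reclaimed total with the growth of the directory over the same period shows whether deletion is keeping up.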

The pattern seems to be that after a restart of the netdata service, the size of the dbengine directory remains the same for about a day, and then keeps growing. I found no cron jobs that interfere with netdata housekeeping.

Are we setting the limit settings in netdata.conf correctly?

Relevant docs you followed/actions you took to solve the issue

I tried the following combinations of settings, all with the same result, based on the relevant docs:

1: [global] dbengine disk space = 256 [db] empty
2: [global] empty [db] dbengine multihost disk space MB = 256
3: [global] dbengine disk space = 256 [db] dbengine multihost disk space MB = 256

Environment/Browser/Agent’s version etc

[root@host netdata]# netdata -V
netdata v1.36.1
[root@host netdata]# which netdatacli
/sbin/netdatacli
[root@host netdata]# netdatacli aclk-state
ACLK Available: Yes
ACLK Version: 2
Protocols Supported: Protobuf
Protocol Used: Protobuf
MQTT Version: 3
Claimed: No
Online: No
Reconnect count: 0
Banned By Cloud: No

[root@host netdata]# netdata -v
netdata v1.36.1
[root@host netdata]# netdata -W buildinfo
Version: netdata v1.36.1
Configure options:  '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-plugin-freeipmi' '--with-bundled-protobuf' '--with-zlib' '--with-math' '--with-user=netdata' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
Install type: custom
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES
    ACLK Next Generation:       YES
    ACLK-NG New Cloud Protocol: YES
    ACLK Legacy:                NO
    TLS Host Verification:      YES
    Machine Learning:           YES
    Stream Compression:         NO
Libraries:
    protobuf:                YES (bundled)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  YES
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    NO
    EBPF:                    NO
    IPMI:                    YES
    NFACCT:                  NO
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: YES

[root@host netdata]# uname -a
Linux host 3.10.0-1160.76.1.el7.x86_64 #1 SMP Wed Aug 10 16:21:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

What I expected to happen

Housekeeping keeps the /var/cache/netdata/dbengine directory at or below the configured limit of 256 MB.

Hi @svdb, thanks for the report.

Indeed, the limit can be exceeded when a lot of metrics are being collected…

Can you please post a screenshot of the Netdata dbengine page cache statistics chart (netdata.page_cache_stats)? I'm especially interested in the used_by_collectors dimension.

Can you also perhaps update one of those agents to 1.37.1 and observe its behaviour?

Thanks
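One way to pull the used_by_collectors numbers as text instead of a screenshot is the agent's HTTP API. This sketch assumes the /api/v1/data endpoint with the chart, dimension, after and format=json query parameters, and a response holding a data array of [timestamp, value] rows; verify these against your agent version before relying on it. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def page_cache_url(host="localhost", port=19999, seconds=3600):
    """Build a query for the last `seconds` of used_by_collectors (API shape assumed)."""
    query = urllib.parse.urlencode({
        "chart": "netdata.page_cache_stats",
        "dimension": "used_by_collectors",
        "after": -seconds,  # negative = relative to now
        "format": "json",
    })
    return f"http://{host}:{port}/api/v1/data?{query}"


def max_value(payload):
    """Largest value in a {"labels": [...], "data": [[t, v], ...]} response, or None."""
    rows = payload.get("data", [])
    values = [row[1] for row in rows if len(row) > 1 and row[1] is not None]
    return max(values) if values else None


def fetch_max_used_by_collectors(host="localhost", port=19999, seconds=3600):
    """Query a running agent and return the peak value over the window."""
    with urllib.request.urlopen(page_cache_url(host, port, seconds), timeout=5) as resp:
        return max_value(json.load(resp))
```

Calling fetch_max_used_by_collectors() on each node gives a single comparable number per host, which is easier to share than a set of screenshots.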

Hi Manolis, thanks for the reply.

I’ve attached screenshots of the metrics you requested, plus the Zabbix disk space graph for the /var partition. These nodes unfortunately only have about a day’s worth of metrics. I’m upgrading one of the nodes to 1.37.1 and will report the findings over the weekend.



]# tree -hsugpDQ /var/cache/netdata/
"/var/cache/netdata/"
├── [-rw-r----- netdata  netdata  4.4M Jan  2  9:19]  "anomaly-detection.db"
├── [-rw-r----- netdata  netdata   16K Nov 25 13:40]  "context-meta.db"
├── [-rw-r----- netdata  netdata   32K Dec 28 20:11]  "context-meta.db-shm"
├── [-rw-r----- netdata  netdata   72K Dec 28 20:11]  "context-meta.db-wal"
├── [drwxrwx--- netdata  netdata   12K Jan  6  9:50]  "dbengine"
│   ├── [-rw------- netdata  netdata   13M Jan  2 15:06]  "datafile-1-0000000414.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  2 17:06]  "datafile-1-0000000415.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  2 19:39]  "datafile-1-0000000416.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  2 21:39]  "datafile-1-0000000417.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3  0:12]  "datafile-1-0000000418.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3  2:46]  "datafile-1-0000000419.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3  4:28]  "datafile-1-0000000420.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3  6:45]  "datafile-1-0000000421.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3  9:18]  "datafile-1-0000000422.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3 11:35]  "datafile-1-0000000423.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3 13:17]  "datafile-1-0000000424.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3 15:17]  "datafile-1-0000000425.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3 16:59]  "datafile-1-0000000426.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3 18:59]  "datafile-1-0000000427.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3 21:15]  "datafile-1-0000000428.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  3 23:32]  "datafile-1-0000000429.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4  1:48]  "datafile-1-0000000430.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4  4:05]  "datafile-1-0000000431.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4  5:48]  "datafile-1-0000000432.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4  7:47]  "datafile-1-0000000433.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4  9:46]  "datafile-1-0000000434.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4 11:28]  "datafile-1-0000000435.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4 13:45]  "datafile-1-0000000436.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4 15:44]  "datafile-1-0000000437.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4 17:44]  "datafile-1-0000000438.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4 19:43]  "datafile-1-0000000439.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4 22:00]  "datafile-1-0000000440.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  4 23:59]  "datafile-1-0000000441.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5  2:33]  "datafile-1-0000000442.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5  4:32]  "datafile-1-0000000443.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5  6:32]  "datafile-1-0000000444.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5  8:14]  "datafile-1-0000000445.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5  9:57]  "datafile-1-0000000446.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5 11:56]  "datafile-1-0000000447.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5 14:13]  "datafile-1-0000000448.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5 16:12]  "datafile-1-0000000449.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5 18:12]  "datafile-1-0000000450.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5 20:11]  "datafile-1-0000000451.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  5 22:28]  "datafile-1-0000000452.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  6  0:28]  "datafile-1-0000000453.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  6  2:44]  "datafile-1-0000000454.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  6  4:43]  "datafile-1-0000000455.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  6  6:43]  "datafile-1-0000000456.ndf"
│   ├── [-rw------- netdata  netdata   13M Jan  6  8:59]  "datafile-1-0000000457.ndf"
│   ├── [-rw------- netdata  netdata   12M Jan  6 10:49]  "datafile-1-0000000458.ndf"
│   ├── [-rw------- netdata  netdata  1.0M Jan  2 15:06]  "journalfile-1-0000000414.njf"
│   ├── [-rw------- netdata  netdata  1.0M Jan  2 17:06]  "journalfile-1-0000000415.njf"
│   ├── [-rw------- netdata  netdata  1.1M Jan  2 19:39]  "journalfile-1-0000000416.njf"
│   ├── [-rw------- netdata  netdata  2.5M Jan  2 21:39]  "journalfile-1-0000000417.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  3  0:12]  "journalfile-1-0000000418.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  3  2:46]  "journalfile-1-0000000419.njf"
│   ├── [-rw------- netdata  netdata  2.4M Jan  3  4:28]  "journalfile-1-0000000420.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  3  6:45]  "journalfile-1-0000000421.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  3  9:18]  "journalfile-1-0000000422.njf"
│   ├── [-rw------- netdata  netdata  1.1M Jan  3 11:35]  "journalfile-1-0000000423.njf"
│   ├── [-rw------- netdata  netdata  2.6M Jan  3 13:17]  "journalfile-1-0000000424.njf"
│   ├── [-rw------- netdata  netdata  960K Jan  3 15:17]  "journalfile-1-0000000425.njf"
│   ├── [-rw------- netdata  netdata  896K Jan  3 16:59]  "journalfile-1-0000000426.njf"
│   ├── [-rw------- netdata  netdata  1.0M Jan  3 18:59]  "journalfile-1-0000000427.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  3 21:15]  "journalfile-1-0000000428.njf"
│   ├── [-rw------- netdata  netdata  2.9M Jan  3 23:32]  "journalfile-1-0000000429.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  4  1:48]  "journalfile-1-0000000430.njf"
│   ├── [-rw------- netdata  netdata  1.1M Jan  4  4:05]  "journalfile-1-0000000431.njf"
│   ├── [-rw------- netdata  netdata  1.0M Jan  4  5:48]  "journalfile-1-0000000432.njf"
│   ├── [-rw------- netdata  netdata  2.9M Jan  4  7:47]  "journalfile-1-0000000433.njf"
│   ├── [-rw------- netdata  netdata  1.0M Jan  4  9:46]  "journalfile-1-0000000434.njf"
│   ├── [-rw------- netdata  netdata  896K Jan  4 11:28]  "journalfile-1-0000000435.njf"
│   ├── [-rw------- netdata  netdata  1.1M Jan  4 13:45]  "journalfile-1-0000000436.njf"
│   ├── [-rw------- netdata  netdata  3.2M Jan  4 15:44]  "journalfile-1-0000000437.njf"
│   ├── [-rw------- netdata  netdata  960K Jan  4 17:44]  "journalfile-1-0000000438.njf"
│   ├── [-rw------- netdata  netdata  1.0M Jan  4 19:43]  "journalfile-1-0000000439.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  4 22:00]  "journalfile-1-0000000440.njf"
│   ├── [-rw------- netdata  netdata  3.3M Jan  4 23:59]  "journalfile-1-0000000441.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  5  2:33]  "journalfile-1-0000000442.njf"
│   ├── [-rw------- netdata  netdata  1.0M Jan  5  4:32]  "journalfile-1-0000000443.njf"
│   ├── [-rw------- netdata  netdata  1020K Jan  5  6:32]  "journalfile-1-0000000444.njf"
│   ├── [-rw------- netdata  netdata  3.3M Jan  5  8:14]  "journalfile-1-0000000445.njf"
│   ├── [-rw------- netdata  netdata  908K Jan  5  9:57]  "journalfile-1-0000000446.njf"
│   ├── [-rw------- netdata  netdata  976K Jan  5 11:56]  "journalfile-1-0000000447.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  5 14:13]  "journalfile-1-0000000448.njf"
│   ├── [-rw------- netdata  netdata  3.6M Jan  5 16:12]  "journalfile-1-0000000449.njf"
│   ├── [-rw------- netdata  netdata  992K Jan  5 18:12]  "journalfile-1-0000000450.njf"
│   ├── [-rw------- netdata  netdata  1.0M Jan  5 20:11]  "journalfile-1-0000000451.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  5 22:28]  "journalfile-1-0000000452.njf"
│   ├── [-rw------- netdata  netdata  2.7M Jan  6  0:28]  "journalfile-1-0000000453.njf"
│   ├── [-rw------- netdata  netdata  2.1M Jan  6  2:44]  "journalfile-1-0000000454.njf"
│   ├── [-rw------- netdata  netdata  972K Jan  6  4:43]  "journalfile-1-0000000455.njf"
│   ├── [-rw------- netdata  netdata  1016K Jan  6  6:43]  "journalfile-1-0000000456.njf"
│   ├── [-rw------- netdata  netdata  1.2M Jan  6  8:59]  "journalfile-1-0000000457.njf"
│   └── [-rw------- netdata  netdata  3.7M Jan  6 10:46]  "journalfile-1-0000000458.njf"
├── [-rw-r----- netdata  netdata  134M Jan  6  5:22]  "netdata-meta.db"
├── [-rw-r----- netdata  netdata   32K Jan  6  8:19]  "netdata-meta.db-shm"
└── [-rw-r----- netdata  netdata   15M Jan  6  7:40]  "netdata-meta.db-wal"

1 directory, 97 files
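Totalling a listing like this gives a rough idea of how far over budget a node is. The sketch below parses tree -h style entries (K, M and G treated as powers of 1024, as tree -h prints them) and sums the .ndf datafiles and .njf journal files separately; dbengine_bytes is a hypothetical name, and because tree rounds sizes the result is only approximate.

```python
import re

# One entry of `tree -hsugpDQ` output: [perms user group SIZE date] "name"
ENTRY_RE = re.compile(
    r'\[\S+\s+\S+\s+\S+\s+(?P<size>[\d.]+)(?P<unit>[KMG]?)\s[^\]]*\]\s+"(?P<name>[^"]+)"'
)
UNIT = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def dbengine_bytes(listing):
    """Approximate bytes used by datafiles (.ndf) and journal files (.njf)."""
    datafiles = journals = 0.0
    for line in listing.splitlines():
        match = ENTRY_RE.search(line)
        if not match:
            continue  # root path, summary line, blank lines
        size = float(match.group("size")) * UNIT[match.group("unit")]
        name = match.group("name")
        if name.endswith(".ndf"):
            datafiles += size
        elif name.endswith(".njf"):
            journals += size
    return datafiles, journals
```

Fed the listing above, it comes to roughly 650 MiB (about 584 MiB of datafiles and 70 MiB of journals), well above the 256 MB limit.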

Here’s the node we upgraded:

[root@host ~]# cd /var/cache/netdata/
[root@host netdata]# netdata -v
netdata v1.37.1
[root@host netdata]# du -h
81M     ./dbengine-tier1
542M    ./dbengine
24K     ./dbengine-tier2
974M    .
[root@host netdata]# curl -s localhost:19999/netdata.conf | grep dbengine
        dbengine disk space MB = 256
        # mode = dbengine
        # dbengine page cache with malloc = yes
        # dbengine page cache size MB = 32
        # dbengine multihost disk space MB = 256
        # dbengine page fetch timeout secs = 3
        # dbengine page fetch retries = 3
        # dbengine page descriptors in file mapped memory = no
        # dbengine tier 1 page cache size MB = 16
        # dbengine tier 1 multihost disk space MB = 128
        # dbengine tier 1 update every iterations = 60
        # dbengine tier 1 backfill = new
        # dbengine tier 2 page cache size MB = 8
        # dbengine tier 2 multihost disk space MB = 64
        # dbengine tier 2 update every iterations = 60
        # dbengine tier 2 backfill = new
        # dbengine pages per extent = 64
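On 1.37.1 each tier has its own budget, so it is worth comparing each directory against its matching limit rather than looking at the total. This sketch takes the limits from the defaults printed above and assumes tier 0 (the dbengine directory) is bounded by dbengine multihost disk space MB; the directory-to-option mapping and the helper names are our assumptions.

```python
# Assumption: directory -> budget (MiB) mapping, using the values printed above.
TIER_LIMITS_MB = {
    "dbengine": 256,        # tier 0, assumed bounded by "dbengine multihost disk space MB"
    "dbengine-tier1": 128,  # "dbengine tier 1 multihost disk space MB"
    "dbengine-tier2": 64,   # "dbengine tier 2 multihost disk space MB"
}

# du -h suffixes converted to MiB (du -h uses powers of 1024).
UNITS_TO_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 ** 2}


def parse_du(text):
    """Parse du -h lines such as "542M    ./dbengine" into {name: MiB}."""
    usage = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0][-1] not in UNITS_TO_MIB:
            continue  # skip prompts, blank lines and sizes without a suffix
        size, path = parts
        name = path[2:] if path.startswith("./") else path
        usage[name] = float(size[:-1]) * UNITS_TO_MIB[size[-1]]
    return usage


def over_budget(usage):
    """Return {name: (used MiB, limit MiB)} for tiers above their budget."""
    return {
        name: (usage[name], limit)
        for name, limit in TIER_LIMITS_MB.items()
        if name in usage and usage[name] > limit
    }
```

With the du output above, only the tier 0 directory (542M against 256) is over budget; tiers 1 and 2 are well within theirs.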

Since the upgrade to 1.37.1, disk usage of the /var partition keeps increasing like this:


@Manolis_Vasilakis, is there any other information you need?

@ilyam8, could you possibly take a peek at this?

Hi @svdb. There have been a great many changes in dbengine since v1.37.1 (you could say it was rewritten). I suggest you either try the latest nightly or wait for v1.38.0, which will be released early next week.

Thanks for the reply @ilyam8! I’ll wait for the release of v1.38.0 and report back with results.

We found the problem; it is fixed in "DBENGINE v2 - bug fixes" by ktsaou (netdata/netdata Pull Request #14413 on GitHub).


That’s great! I’ll still report back to confirm whether this solves the issue.