Just wanted to check in on one last thing. I’m using the following settings on the parent:
[db]
dbengine multihost disk space MB = 500000
dbengine tier 1 multihost disk space MB = 100000
dbengine tier 2 multihost disk space MB = 125000
replication threads = 5
The load seems quite high, and I have 48 children sending data to the parent. I read online that a parent should be able to handle a much higher child-to-parent ratio than this, so I’m just trying to understand how the performance scales.
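For scale, here is a rough back-of-the-envelope sketch of what those disk settings add up to. It assumes the three values are simply additive, that the unlabeled `dbengine multihost disk space MB` key is tier 0, and that space ends up shared evenly across the 48 children; those are my assumptions, and the function name is just something I made up for the calculation.

```python
# Disk budgets copied from the [db] section above, in MB.
# Assumption: the key without a tier number is tier 0.
tier_budgets_mb = {"tier 0": 500000, "tier 1": 100000, "tier 2": 125000}
children = 48  # number of children streaming to this parent


def disk_budget(tiers_mb, child_count):
    """Return (total budget in MB, even share per child in MB)."""
    total = sum(tiers_mb.values())
    return total, total / child_count


total_mb, per_child_mb = disk_budget(tier_budgets_mb, children)
print(f"total: {total_mb} MB, per child: {per_child_mb:.0f} MB")
```

An even split is only an approximation, since children with more metrics will presumably use more of the space.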
11:51:53 up 19 days, 3:30, 2 users, load average: 11.97, 6.74, 4.34
root@netdata:/etc/netdata# top cd1
top - 11:51:57 up 19 days, 3:30, 2 users, load average: 12.05, 6.84, 4.38
Tasks: 246 total, 1 running, 245 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 3.5 sy, 17.0 ni, 79.1 id, 0.1 wa, 0.0 hi, 0.2 si, 0.0 st
Linux 5.15.0-91-generic (netdata) 01/19/24 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.01 5.82 1.31 0.03 0.00 92.85
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
dm-0 83.54 2226.13 0.00 0.00 0.11 26.65 33.29 357.97 0.00 0.00 0.12 10.75 0.01 564.26 0.00 0.00 0.40 55278.05 0.00 0.00 0.01 3.95
loop0 0.00 0.00 0.00 0.00 0.07 9.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop1 0.00 0.00 0.00 0.00 0.24 17.59 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop2 0.00 0.01 0.00 0.00 0.08 37.46 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop3 0.00 0.01 0.00 0.00 0.05 34.24 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop4 0.00 0.00 0.00 0.00 0.00 6.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop5 0.00 0.00 0.00 0.00 0.00 1.27 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme0n1 83.54 2226.17 0.00 0.00 0.10 26.65 31.68 357.97 1.71 5.14 0.16 11.30 0.01 565.20 0.00 0.00 0.40 55360.06 0.00 0.00 0.01 3.95
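To show how I’m reading those numbers, here is a small sketch that divides the load averages by the CPU count reported by iostat. The values are copied from the output above; the helper name is hypothetical, and treating load per CPU as a rough saturation indicator is my own rule of thumb, not something from the Netdata docs.

```python
def load_per_cpu(load_averages, cpu_count):
    """Divide each load average by the CPU count, rounded to 2 decimals."""
    return [round(load / cpu_count, 2) for load in load_averages]


# 1, 5 and 15 minute load averages from the uptime output above.
load_averages = [11.97, 6.74, 4.34]
cpu_count = 16  # "(16 CPU)" in the iostat header

print(load_per_cpu(load_averages, cpu_count))
```

So the 1-minute figure works out to roughly three quarters of a CPU's worth of load per core, while the 15-minute figure is much lower, which makes me think the load comes in bursts.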
One thing I noticed is that we’re alerting on basically everything, even alerts we haven’t defined ourselves in health.d. Is there a way to reverse that, so nothing alerts unless we explicitly enable it? Maybe that would also help with resource consumption.
Basically, we only want to alert on what we choose to alert on.
Thanks!