load1 > 40. No charts can explain it

Problem/Question

Server load sometimes reaches an insane level, with no CPU, RAM, or other bottleneck, at least none that I could find via Netdata. Should I ignore load completely?

Hi, @ben. How many CPU cores does your system have? The load may not be as insane as you say.

Load average measures the number of threads that are currently running plus those waiting to run (on CPU, disk, or uninterruptible locks). A high value doesn’t always indicate a problem, and on its own it doesn’t tell you whether the cause is CPU-bound or I/O-bound.
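To put a number like 40 in context, it helps to divide load by the number of logical CPUs. Here is a minimal Python sketch, assuming a Linux host where /proc/loadavg is readable. The helper names `load_per_cpu` and `current_load_per_cpu` are made up for illustration.

```python
import os


def load_per_cpu(loadavg_text: str, cpus: int) -> float:
    """Return the 1-minute load average divided by the number of logical CPUs.

    loadavg_text is the content of /proc/loadavg, e.g.
    "40.00 38.50 30.10 3/1200 12345" (illustrative values).
    """
    load1 = float(loadavg_text.split()[0])  # first field is load1
    return load1 / cpus


def current_load_per_cpu() -> float:
    """Read the live value on a Linux host (assumes /proc is mounted)."""
    with open("/proc/loadavg") as f:
        return load_per_cpu(f.read(), os.cpu_count() or 1)
```

With illustrative numbers, a load1 of 40 on 32 logical CPUs works out to 1.25 tasks per CPU. That is elevated, but very different from 40 tasks competing for a single core.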

Do you have CPU/memory/disk pressure charts? If you want to check whether your server is stalled on CPU, memory, or I/O, always prefer the pressure charts.

Thank you for the reply.
AMD Ryzen 9 5950X 16-Core Processor

Attaching new screenshots from today:
Notice how load doesn’t correlate with any of the pressure graphs. Or am I wrong about that?






System pressure shows the percentage of time that some processes (some_pressure) or all processes (full_pressure) have been waiting due to CPU, memory, or I/O (storage) congestion, and the total amount of time spent waiting (*stall_time charts).
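If you want to compare the charts with the raw numbers, the kernel exposes pressure stall information in /proc/pressure/cpu, /proc/pressure/memory and /proc/pressure/io on kernels built with PSI support. Below is a rough Python sketch of a parser, assuming the usual `some`/`full` line format. The function name `parse_psi` and the sample values in the docstring are illustrative, not taken from this server.

```python
def parse_psi(text: str) -> dict:
    """Parse the content of a /proc/pressure/<resource> file.

    Each line is expected to look like (illustrative values):
        some avg10=1.50 avg60=0.75 avg300=0.10 total=123456
    avg10/avg60/avg300 are percentages over 10/60/300 second windows,
    total is the cumulative stall time in microseconds.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=") for field in fields)
        }
    return result


def read_psi(resource: str) -> dict:
    """Read live values; resource is "cpu", "memory" or "io" (Linux only)."""
    with open(f"/proc/pressure/{resource}") as f:
        return parse_psi(f.read())
```

Calling `read_psi("io")["some"]["avg60"]` would then give the share of the last 60 seconds in which at least one task was stalled on I/O. Depending on the kernel version, the cpu file may not contain a `full` line.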

Click on “Information” to see the description

If you see high congestion due to CPU - check CPU% charts and Applications cpu.
If you see high congestion due to memory - check ram% charts and Applications mem.

Yes, I understand this. But none of the pressure graphs match load. In my first screenshot, load is > 40 for the entire selected period, while the pressures rise only during specific parts of that period.
Basically, I can’t find any graph that rises at the same time and for the same duration as load.
In other words, if this load were related to any of the pressures, load should go down once the pressure does. But it stays the same.

Speaking from experience, because of how the load average is calculated, you can sometimes see behavior like this when the system is seeing lots of very short-lived processes created one after the other. In general, such a workload will also show a similar spike in the number of new processes (system.forks) and context switches (system.ctxt), though depending on what those short-lived processes are doing you may not see any associated spike in CPU usage (or at least, not one anywhere near as pronounced as the spike in load average) or the PSI metrics (because there may just be no actual resource contention involved).
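One way to check for this outside of the charts is to sample the fork and context-switch counters directly. The sketch below assumes a Linux /proc/stat, where the `processes` line counts forks since boot and the `ctxt` line counts context switches since boot. The function names and the sampling approach are illustrative, not how Netdata itself collects these metrics.

```python
import time


def parse_counters(stat_text: str) -> dict:
    """Extract the cumulative 'processes' (forks) and 'ctxt' counters."""
    wanted = {"processes", "ctxt"}
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters


def rates(before: str, after: str, interval_s: float) -> dict:
    """Per-second rates between two /proc/stat snapshots."""
    a, b = parse_counters(before), parse_counters(after)
    return {name: (b[name] - a[name]) / interval_s for name in a}


def sample_rates(interval_s: float = 5.0) -> dict:
    """Take two live snapshots interval_s apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = f.read()
    return rates(before, after, interval_s)
```

If `sample_rates()` reports a fork rate that jumps while load is high, short-lived processes are a plausible explanation. If the rate stays flat, they probably are not.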

@Austin_Hemmelgarn Thank you for the insight. I had a similar experience before, but I don’t think it applies here:



I’m also including zoomed-out charts, just for reference.



@ben, as I said:

Load average measures the number of threads that are currently running plus those waiting to run (on CPU, disk, or uninterruptible locks). A high value doesn’t always indicate a problem, and on its own it doesn’t tell you whether the cause is CPU-bound or I/O-bound.

  • Look at the number of processes chart: it explains the load ("No charts can explain it").
  • Keep in mind that the number of processes is the current value, while load average and pressure are trends over N-second windows.
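To illustrate the difference between a current value and a trend: on Linux the load average is an exponentially damped average of the number of runnable and uninterruptible tasks, sampled roughly every 5 seconds. The sketch below is a floating-point approximation of that calculation (the kernel uses fixed-point arithmetic), and the task counts fed into it are invented for the example.

```python
import math

SAMPLE_INTERVAL_S = 5  # the kernel samples about every 5 seconds


def simulate_load(active_counts, window_s=60, start=0.0):
    """Return the damped average after each sample.

    active_counts: instantaneous number of running + uninterruptible
    tasks at each sample (made-up input for illustration).
    window_s: 60 for load1, 300 for load5, 900 for load15.
    """
    decay = math.exp(-SAMPLE_INTERVAL_S / window_s)
    load = start
    history = []
    for n in active_counts:
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history


# 40 active tasks for one minute, then none for one minute.
history = simulate_load([40] * 12 + [0] * 12)
for step, value in enumerate(history, start=1):
    print(f"t={step * SAMPLE_INTERVAL_S:3d}s load1={value:6.2f}")
```

In this simulation load1 is still climbing when the burst ends, and it stays well above zero for a while after the active count has dropped to 0. That is why a chart of the instantaneous process count and a chart of load can look out of step.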