Why are temperatures being summed?

I receive frequent alerts about very high CPU and NVMe temperatures (120-130 degrees Celsius) from lm-sensors.

But I don’t see such high temperatures in the graphs, and I think the Netdata alert is summing the individual temperature values (e.g. the AMD CPU and the NVMe each have two or even three of them) instead of using their maximum. Why? What is the physical meaning of summing temperatures?
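To illustrate the suspicion, here is a minimal Python sketch with made-up readings for one device that exposes three temperature sensors. The sensor names and values are hypothetical and were not taken from the affected hosts; the point is only that three ordinary readings add up to a figure in the 120-130 range, while their maximum and average stay near 42.

```python
# Hypothetical readings in degrees Celsius for one NVMe drive with three
# sensors. These values are illustrative, not measured on the affected host.
readings = {
    "Composite": 41.5,
    "Sensor 1": 41.0,
    "Sensor 2": 42.0,
}


def aggregate(values):
    """Return the sum, maximum, and mean of a collection of temperatures."""
    values = list(values)
    return {
        "sum": sum(values),
        "max": max(values),
        "average": sum(values) / len(values),
    }


result = aggregate(readings.values())
for name, value in result.items():
    print(f"{name}: {value}")
```

If the alert value matches the "sum" figure rather than "max" or "average", that would support the idea that the dimensions are being added together somewhere between collection and alerting.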

I would prefer to see all individual temperature values in the graph by default. Also, I can’t get Netdata to alert on the maximum temperature, even after editing the alert lookup condition to use max instead of average. Maybe there is a bug.

Please help me fix it, or explain why this very useful thing is not working.

Thanks!

Hi @Ansy ,

Thanks for raising this to our attention.

Indeed, summing temperatures shouldn’t be the way to represent such metrics.

Could you help by opening a bug report that includes this and the alert that gets triggered?
You can use this template.

Thanks,
Hugo

@Ansy Do you also have a Kubernetes deployment of Netdata hosted on those nodes?

No, I only have Proxmox hosts monitored by Netdata, both Intel- and AMD-based, with LXC containers and QEMU VMs, some of them with windows_exporter installed.

@Ansy
There are two easy checks to make.
Firstly, double-check the stats coming out of lm-sensors (assuming you have that installed) to verify whether this is a sensors issue or a Netdata issue.
Secondly, assuming that you have access to Netdata Cloud, create a new room and move only one of the affected nodes into that room. Then have a look at the metrics from within that room and note whether you see the same problematic behaviour.
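
For the first check, a rough Python sketch like the one below can list each temperature reading separately, so it can be compared with the value in the alert. It assumes an lm-sensors version whose `sensors` command supports JSON output via `-j`. The helper names `temperature_inputs` and `read_host_sensors`, and the sample data, are made up for illustration.

```python
import json
import subprocess


def temperature_inputs(sensors_json):
    """Yield (chip, label, value) for every temp*_input field in the JSON text."""
    data = json.loads(sensors_json)
    for chip, features in data.items():
        for label, fields in features.items():
            if not isinstance(fields, dict):
                continue  # skip plain string entries such as "Adapter"
            for name, value in fields.items():
                if name.startswith("temp") and name.endswith("_input"):
                    yield chip, label, value


def read_host_sensors():
    """Run `sensors -j` locally (assumes lm-sensors with JSON support)."""
    out = subprocess.run(
        ["sensors", "-j"], capture_output=True, text=True, check=True
    )
    return list(temperature_inputs(out.stdout))


# Hypothetical sample in the general shape of `sensors -j` output.
# Chip names and values are invented, not copied from a real host.
SAMPLE = """
{
  "nvme-pci-0100": {
    "Adapter": "PCI adapter",
    "Composite": {"temp1_input": 41.5, "temp1_max": 84.0},
    "Sensor 1": {"temp2_input": 41.0}
  },
  "k10temp-pci-00c3": {
    "Adapter": "PCI adapter",
    "Tctl": {"temp1_input": 45.0},
    "Tccd1": {"temp3_input": 40.5}
  }
}
"""

rows = list(temperature_inputs(SAMPLE))
for chip, label, value in rows:
    print(f"{chip} / {label}: {value}")
print("highest reading:", max(value for _, _, value in rows))
```

On a real host, calling `read_host_sensors()` instead of parsing `SAMPLE` would return the live readings. If none of them is anywhere near 120-130, the inflated number is more likely produced on the Netdata side than by lm-sensors.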