I receive frequent alerts from lm-sensors about very high CPU and NVMe temperatures (120-130 °C).
But I don't see such high temperatures in the graphs… and I think the Netdata alert is summing the individual temperature values (e.g. an AMD CPU and an NVMe drive each report two or even three of them) instead of taking their maximum. Why? What is the physical meaning of summing temperatures?
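To illustrate the arithmetic behind this suspicion, here is a minimal Python sketch with made-up readings. The sensor groupings and values are hypothetical and not taken from Netdata. Adding two or three ordinary readings lands right in the reported 120-130 °C range, while the maximum stays at a realistic value.

```python
# Hypothetical per-sensor readings in degrees Celsius (illustrative values only).
readings = {
    "cpu": [62.5, 58.0],          # e.g. two CPU sensors
    "nvme": [43.0, 42.0, 45.0],   # e.g. three NVMe sensors
}


def summed(values):
    """Add the temperatures together: the suspected alert behaviour."""
    return sum(values)


def hottest(values):
    """Take the maximum: what an over-temperature alert usually wants."""
    return max(values)


for device, values in readings.items():
    print(f"{device}: sum={summed(values):.1f} max={hottest(values):.1f}")
```

With these sample values the sums come out at 120.5 and 130.0, while the hottest sensors read only 62.5 and 45.0.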
I would prefer to see all the individual temperature values in the graph by default. Also, I can't get Netdata to alert on the maximum temperature, even after changing the alert's lookup from average to max. Maybe there is a bug there.
Please help me fix it, or explain why this very useful feature is not working.
@Ansy
There are two easy checks to make.
Firstly, double-check the stats coming out of lm-sensors (assuming you have it installed) to verify whether this is a sensors issue or a Netdata issue.
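For that first check, a small Python sketch like the one below can print the hottest reading per chip, so you can compare it with the value in the alert. It assumes your lm-sensors version supports JSON output via `sensors -j` (newer releases do); the chip names and sample values in `SAMPLE` are hypothetical.

```python
import json

# Hypothetical `sensors -j` output; on a real machine you could use
# subprocess.run(["sensors", "-j"], capture_output=True, text=True, check=True).stdout
SAMPLE = """{
  "k10temp-pci-00c3": {"Adapter": "PCI adapter",
                       "Tctl": {"temp1_input": 62.5},
                       "Tccd1": {"temp3_input": 58.0}},
  "nvme-pci-0100": {"Adapter": "PCI adapter",
                    "Composite": {"temp1_input": 43.0, "temp1_max": 81.85},
                    "Sensor 1": {"temp2_input": 45.0}}
}"""


def max_temps(sensors_json: str) -> dict:
    """Return the highest temp*_input reading per chip from `sensors -j` output."""
    data = json.loads(sensors_json)
    result = {}
    for chip, features in data.items():
        temps = [
            value
            for feature in features.values()
            if isinstance(feature, dict)  # skip plain strings such as "Adapter"
            for key, value in feature.items()
            # only current readings; ignore limits like temp1_max
            if key.startswith("temp") and key.endswith("_input")
        ]
        if temps:
            result[chip] = max(temps)
    return result


if __name__ == "__main__":
    for chip, temp in max_temps(SAMPLE).items():
        print(f"{chip}: {temp:.1f} °C")
```

If the per-chip maximum from lm-sensors looks sane but the alert value is roughly the sum of the readings, that points at how the alert aggregates values rather than at the sensors themselves.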
Secondly, assuming you have access to Netdata Cloud, create a new room and move only one of the affected nodes into it. Then look at the metrics from within that room and note whether you see the same problematic behaviour.