I'm receiving false positives for IPMI Sensors State

Hi,

I have encountered something peculiar. For the past few weeks, I have received Critical and Warning notifications about IPMI_SENSORS_STATE, saying that my fans, PSU, and power unit are in a critical state. Yet when I contacted my server provider and checked via iRMC, everything was clear and there was no problem at all.

Is there any way I can refresh these notifications/metrics, or maybe solve this somehow?

Thank you,

I have attached some screenshots of the server.

By the way, below are my server specifications:

Bare Metal/Dedicated Server
Intel Xeon E-2236
32 GB RAM
2x500 GB NVMe SSD
1 Gbps Port Uplink
AlmaLinux 8
ISPManager 6 Control Panel
Netdata version : netdata v1.40.0-116-nightly

Hi, @Infinix_Media. Netdata collects the state (nominal, warning, or critical) of each sensor.

If you go to the “IPMI Sensor State” chart, you will see the total number of sensors in each state.

To find which sensors are in the critical state, change “Group by” to “sensor” and select “critical” in “dimensions”.

You can double-check the result by running ipmimonitoring on the host system, e.g.

# ipmimonitoring
ID | Name           | Type                   | State    | Reading    | Units | Event
4  | Watchdog       | Watchdog 2             | Nominal  | N/A        | N/A   | 'OK'
5  | SEL            | Event Logging Disabled | Nominal  | N/A        | N/A   | 'Log Area Reset/Cleared'
16 | CPU0_Status    | Processor              | Nominal  | N/A        | N/A   | 'Processor Presence detected'
17 | CPU1_Status    | Processor              | Nominal  | N/A        | N/A   | 'Processor Presence detected'
18 | CPU0_TEMP      | Temperature            | Nominal  | 62.00      | C     | 'OK'
19 | CPU1_TEMP      | Temperature            | Nominal  | 56.00      | C     | 'OK'
...
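
If the sensor list is long, a small script can pick out the rows that are not nominal. This is only a rough sketch, assuming the pipe-separated layout shown above and that the State column contains "Warning" or "Critical" for the sensors of interest. The function names and the SAMPLE text are made up for illustration and are not part of Netdata or FreeIPMI.

```python
# Sketch: list sensors that ipmimonitoring reports as Warning or Critical.
# Assumes the pipe-separated table layout shown in the output above.

def parse_ipmimonitoring(text):
    """Turn ipmimonitoring's table output into a list of dicts keyed by column name."""
    header = None
    sensors = []
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip the shell prompt line, "..." and blank lines
        if header is None:
            header = [cell.strip() for cell in line.split("|")]
            continue
        # Limit the split so a stray "|" inside the last column cannot shift cells.
        cells = [cell.strip() for cell in line.split("|", len(header) - 1)]
        if len(cells) == len(header):
            sensors.append(dict(zip(header, cells)))
    return sensors


def flagged(sensors):
    """Return only the sensors whose State is Warning or Critical."""
    return [s for s in sensors if s.get("State") in ("Warning", "Critical")]


# Two rows copied from the output above; both are Nominal.
SAMPLE = """\
ID | Name           | Type                   | State    | Reading    | Units | Event
4  | Watchdog       | Watchdog 2             | Nominal  | N/A        | N/A   | 'OK'
18 | CPU0_TEMP      | Temperature            | Nominal  | 62.00      | C     | 'OK'
"""

print(flagged(parse_ipmimonitoring(SAMPLE)))  # every sample row is Nominal, so this prints []
```

On a real host you would pass in the actual command output instead of SAMPLE, for example the stdout of subprocess.run(["ipmimonitoring"], capture_output=True, text=True), which usually needs root. The names it prints can then be compared with the sensors shown as critical in the Netdata chart.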

Thanks for the response.

Yes, I have checked via ipmitools. There are some critical. But when I check manually via iRMC, the server’s web console and monitoring, everything is A-OK and there is no problem at all. The technician and the engineer also said the server is nominal.

Where do you think the problem is: ipmimonitoring, Netdata, or the server itself?

I checked the code and found a bug - we set the sensor state value the first time the sensor appears and never update it after that. I made a PR that fixes the problem.
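
To illustrate the kind of bug I mean, here is a simplified sketch. It is not the actual Netdata code, and the class, method, and sensor names are hypothetical. It only shows the pattern of storing a state on first sight versus updating it on every collection.

```python
# Illustration only: hypothetical names, not taken from the Netdata source.

class SensorStates:
    def __init__(self):
        self.states = {}  # sensor name -> last stored state

    def update_buggy(self, name, state):
        # Bug pattern: the state is stored only the first time the sensor appears.
        if name not in self.states:
            self.states[name] = state

    def update_fixed(self, name, state):
        # Fixed pattern: the stored state is refreshed on every collection.
        self.states[name] = state


buggy, fixed = SensorStates(), SensorStates()
for reading in ("critical", "nominal"):  # sensor recovers on the second poll
    buggy.update_buggy("FAN1", reading)
    fixed.update_fixed("FAN1", reading)

print(buggy.states["FAN1"])  # prints critical (stale first value)
print(fixed.states["FAN1"])  # prints nominal
```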

However, that is not what is happening in your case, because:

I have checked via ipmitools. there are some critical.

Netdata uses libipmimonitoring to gather data and reports whatever the library returns. So it is not a Netdata bug if ipmitools also says that some sensors are critical.

there are some critical

These should be exactly the same sensors as the ones reported by Netdata.

Thanks!

Ah, glad this thread helped find a bug.

I’ll ask my engineer to look at the server physically again, since the iRMC is not showing any errors or abnormalities.

I’ll post an update later if we find something.