Netdata Community



OS: Linux

Error correction code memory (ECC memory) is a type of computer data storage that uses an error correction code (ECC) to detect and correct n-bit data corruption that occurs in memory. ECC memory is used in most computers where data corruption cannot be tolerated under any circumstances, such as industrial control applications, critical databases, and infrastructural memory caches. 1

“Correctable errors are generally single-bit errors that the system or the built-in ECC mechanism can correct. These errors do not cause system downtime or data corruption.” 2

The Netdata Agent monitors the number of ECC correctable errors over the last 10 minutes.
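
To look at the raw counters yourself, one option on Linux is the kernel's EDAC sysfs interface, which exposes a cumulative correctable error count per memory controller. The following is a minimal sketch, assuming the EDAC driver for your platform is loaded and the counters are under /sys/devices/system/edac/mc. The function name `read_ce_counts` and its `base` parameter are illustrative and not part of Netdata.

```python
from pathlib import Path


def read_ce_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller_name: correctable_error_count} from EDAC sysfs.

    `base` is an assumed default location; pass another directory to test.
    """
    counts = {}
    root = Path(base)
    if not root.is_dir():
        return counts  # EDAC not available (driver not loaded or unsupported)
    for ce_file in sorted(root.glob("mc*/ce_count")):
        try:
            counts[ce_file.parent.name] = int(ce_file.read_text().strip())
        except (OSError, ValueError):
            continue  # skip unreadable or malformed counters
    return counts
```

Calling `read_ce_counts()` twice, a few minutes apart, and comparing the results shows whether the counters are still increasing. An empty dictionary means no EDAC memory controllers were found.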

References and sources:
  1. ECC memory on Wikipedia
  2. RAM types and ECC technologies
  3. memtester homepage

Troubleshooting section:

Verify a bad memory module

Correctable errors do not necessarily indicate hardware failures, but should generally still be investigated.

  1. Run memtester, a userspace utility that tests the memory subsystem for faults. It is portable and
    should compile and work on any 32-bit or 64-bit Unix-like system. For hardware developers, memtester
    can be told to test memory starting at a particular physical address (memtester v4.1.0+). 3
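
A memtester run can be scripted as in the sketch below. It assumes memtester is installed and on the PATH, and relies on its basic command form, `memtester <size>M <loops>`, with exit status 0 meaning that no faults were found. The helper names `memtester_command` and `run_memtester` are hypothetical.

```python
import shutil
import subprocess


def memtester_command(size_mb, loops=1):
    """Build the argument list for: memtester <size_mb>M <loops>."""
    if size_mb <= 0 or loops <= 0:
        raise ValueError("size_mb and loops must be positive")
    return ["memtester", f"{size_mb}M", str(loops)]


def run_memtester(size_mb, loops=1):
    """Run memtester; return True when it exits with status 0."""
    if shutil.which("memtester") is None:
        raise FileNotFoundError("memtester is not installed or not on PATH")
    # memtester prints its own progress; only the exit status is checked here.
    result = subprocess.run(memtester_command(size_mb, loops))
    return result.returncode == 0
```

For example, `run_memtester(1024, 1)` tests 1024 MB once. Run it as root so that memtester can lock the memory it tests, and choose a size smaller than the free memory on the machine.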

Errors of this kind can also be caused by incorrect seating or poor contact between the socket and
the RAM module. Check both before you consider replacing the RAM module.

Check for BIOS updates

You should check for critical BIOS updates on your hardware vendor's support page.
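
To compare against the vendor's latest release, you first need the currently installed BIOS version. The sketch below reads it from the DMI entries that Linux exposes in sysfs, assuming they are available under /sys/class/dmi/id on your system. The function name `read_bios_info` and its `base` parameter are illustrative.

```python
from pathlib import Path


def read_bios_info(base="/sys/class/dmi/id"):
    """Return BIOS vendor, version, and release date as reported via DMI sysfs."""
    info = {}
    for field in ("bios_vendor", "bios_version", "bios_date"):
        try:
            info[field] = (Path(base) / field).read_text().strip()
        except OSError:
            info[field] = None  # field not exposed on this system
    return info
```

A value of `None` for a field means the corresponding file could not be read, which can happen on virtual machines or on platforms without DMI support.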