All posts by Quentin Santos

On-Die ECC

This article will be pretty short.

When I built my new desktop computer, I considered ECC memory. So, I looked around for DDR5 ECC memory.

Surprisingly, DDR5 memory sticks that mentioned ECC were not significantly more expensive than other DDR5 memory sticks. Sometimes, they were even cheaper! At some point, I read that all DDR5 memory sticks support ECC. But, also, not really.

It turns out that “ECC memory” has now been overloaded to mean two subtly different concepts:

  1. Side-band ECC: The memory stick stores and sends some additional bits of information along with the data. These bits travel along additional, dedicated wires to the CPU, which uses them to detect and correct any errors in the data.
  2. On-die ECC: The memory stick stores some additional bits of information, and uses them itself to detect and correct any errors in the data. The corrected data is sent on the usual wires to the CPU; no additional wires are needed.

The point of ECC is to protect data from random bit flips while it is stored in RAM, travels along the bus, or is handled by the memory controller[1]. In the case of on-die ECC, errors on the bus or in the memory controller are neither detected nor corrected.
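To make the difference concrete, here is a toy sketch in Python. It uses a Hamming(7,4) code (4 data bits, 3 check bits), which is much smaller than anything real DRAM uses, and it ignores the fact that a real side-band ECC stick can also have on-die ECC. The function names (`encode`, `decode`, `read_on_die`, `read_side_band`) are made up for this illustration.

```python
def encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # checks positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # checks positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # checks positions 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def decode(code):
    """Fix at most one flipped bit; return (nibble, was_an_error_corrected)."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos  # a single flip leaves its own position here
    if syndrome:
        c[syndrome - 1] ^= 1
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, syndrome != 0


def flip(code, index):
    """Return a copy of the codeword with one bit flipped (0-based index)."""
    out = list(code)
    out[index] ^= 1
    return out


def read_on_die(nibble, cell_flip=None, bus_flip=None):
    """On-die ECC: the chip corrects the stored word, then sends bare data."""
    stored = encode(nibble)
    if cell_flip is not None:
        stored = flip(stored, cell_flip)
    data, _ = decode(stored)      # correction happens inside the chip
    if bus_flip is not None:
        data ^= 1 << bus_flip     # nothing protects the data on the bus
    return data


def read_side_band(nibble, cell_flip=None, bus_flip=None):
    """Side-band ECC: the whole codeword reaches the CPU, which decodes it."""
    word = encode(nibble)
    for index in (cell_flip, bus_flip):
        if index is not None:
            word = flip(word, index)
    return decode(word)           # the CPU also learns that an error occurred


value = 0b1011
print(read_on_die(value, cell_flip=4))    # flip in the cell: corrected
print(read_on_die(value, bus_flip=2))     # flip on the bus: silently wrong
print(read_side_band(value, bus_flip=2))  # flip on the bus: corrected, reported
```

In this toy model, the second call returns a wrong value without any warning, while the third returns the right value together with a flag saying that a correction took place.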

A secondary consequence of on-die ECC is that you get no report of detected or corrected errors. With side-band ECC, these reports can tell you when a memory stick is becoming less reliable.
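As a rough sketch of what reading such reports can look like, the script below assumes a Linux system with side-band ECC where the kernel's EDAC driver is loaded and exposes its counters under `/sys/devices/system/edac/mc`. On other systems, or without a matching driver, that directory will simply be missing. The function name `read_edac_counts` is made up for this example.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC counters."""
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC support exposed on this machine
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text())
            uncorrected = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers we cannot read or parse
        counts[mc.name] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controller found")
    for name, (corrected, uncorrected) in counts.items():
        print(f"{name}: {corrected} corrected, {uncorrected} uncorrected")
```

A corrected-error count that keeps climbing over time is the kind of signal that suggests a stick is going bad.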

In short, you have no choice about on-die ECC, so you don’t have to think about it. Side-band ECC, on the other hand, still lets you detect bad sticks, so it can be worth it depending on the additional cost.

  1. CPU caches have their own ECC.