AMD CPUs, CCDs and CCXs

When researching for my article about Linux pipes and for other upcoming articles, I had to get more familiar with how my CPU works. I thought I knew quite a bit, with L1/L2/L3 caches, ISA extensions, branch predictors, register renaming and so much more. But, it turns out, even as a mere user, there is so much more to know¹.

In particular, modern CPUs are exploring several strategies to increase the number of cores while keeping single-thread performance high. I was vaguely aware of P-cores and E-cores² on Intel CPUs. For AMD CPUs, I encountered the concept of CCDs and CCXs. It can get confusing fast, especially when you end up reading articles that are specific to one Zen generation without explicitly saying so.

This article contains the information I was able to clarify. This is certainly not hardcore low-level content, and is probably obvious to anyone working at this level, but I am sure I am not the only one who has been confused by this.

Taken from AnandTech.

AMD CPUs

First things first, you might hear about “Ryzen”, “Zen”, “Zen 5”, “7800X3D” and so on. As with all marketing names, it gets confusing fast, so let’s clarify how AMD does naming.

First, “Ryzen” is a brand of AMD CPUs. A brand is a commercial name that is used to signal that the CPU is designed for a particular market or niche. It is distinct from the generation of CPU and reflects a set of trade-offs that were chosen when that CPU was released. Every generation will offer CPU models for each of these brands (until marketing changes the names or reassigns their meaning).

The brands of AMD CPUs and corresponding target markets are:

  • Athlon, for budget computers. Confusingly, this used to be the high-end brand in the early 2000s. The equivalent Intel CPU brand is now Intel Processor, formerly Intel Pentium (which also used to be high-end) and Intel Celeron. These CPUs focus on being cheap.
  • Ryzen, for consumers. This is the one you will probably hear the most about in places like /r/buildapc/ or /r/pcmasterrace/. Intel’s equivalent is Intel Core. These CPUs focus on having (relatively) few high performance cores, which is usually good for gaming or compiling. Ryzen is further split into Ryzen 3, Ryzen 5, Ryzen 7 and Ryzen 9, to mimic Intel’s i3, i5, i7 and i9. They correspond respectively to the market segments of entry-level, mid-range, high-end and enthusiast.
  • Epyc, for servers. They’re not for gaming, but you might have heard about them on Hacker News. Intel’s equivalent is Intel Xeon. These CPUs focus on a large number of cores with decent performance, which makes them good for Web servers.
  • Ryzen Threadripper, or just Threadripper, for workstations. The full name can make it easy to confuse with Ryzen. Intel’s equivalent is Intel Xeon W (for Workstation). They’re not for gaming either, but you might have heard about them on Hacker News as well. These CPUs have many powerful cores, which are useful for video editing, physics simulation, scientific computing, etc.
  • Ryzen Embedded and EPYC Embedded, for embedded and low-power devices. Intel’s equivalent is Intel Atom. They focus on consuming little power.

“Zen 5” is a CPU core design, or microarchitecture. So are “Zen”, “Zen+”, “Zen 2”, “Zen 3”, “Zen 3+”, “Zen 4” and “Zen 4c”. Every one or two years, AMD iterates and improves the design of their CPU cores.

For instance, AMD offers various Zen 4 CPU models by putting a different number of such cores in the CPU, or different amounts of L1/L2/L3 cache, or by varying other “uncore” features. However, all the Zen 4 CPUs use the exact same cores.

Confusingly, “Zen” can also refer to the family of microarchitectures, grouping all the microarchitectures listed two paragraphs above. Before Zen, AMD had the Bulldozer microarchitecture family. And before that, the K-series.

Finally, the model names are purely marketing. They don’t cleanly map to a specific microarchitecture, but the first digit in the model is used to hint at how recent it is. Thus, the 4300G (4000 series) is older than the 9900X (9000 series)³. Within a series, the second digit hints at how powerful the CPU is. Thus, the 7900X should be more powerful than the 7600. Comparing between series is tricky, since an older, but more powerful CPU might still outclass a recent budget CPU.
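
The digit heuristic described above can be sketched in a few lines of Python. The helper name `parse_model` and the regular expression are hypothetical, made up for illustration; this is not an official AMD scheme, and the result is only a hint, since model names do not map cleanly to a microarchitecture or release date.

```python
import re

# Hypothetical helper (not an AMD scheme): split a 4-digit model number such
# as "7900X" into (series, tier digit, suffix) following the heuristic above.
_MODEL_RE = re.compile(r"^(\d)(\d)\d{2}([A-Z0-9]*)$")


def parse_model(model: str) -> tuple:
    """Return (series, tier, suffix), e.g. "7900X" -> (7000, 9, "X")."""
    match = _MODEL_RE.match(model.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized model name: {model!r}")
    first, second, suffix = match.groups()
    # First digit: series (a hint at recency). Second digit: tier in the series.
    return int(first) * 1000, int(second), suffix


if __name__ == "__main__":
    for name in ("4300G", "7600", "7900X", "9900X"):
        print(name, parse_model(name))
```

Comparing the first element orders models by series, and comparing the second element only makes sense between two models of the same series.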

CCDs

Modern CPUs can contain multiple chips (or dies) on a single module (the thing you put in the socket on the motherboard). This is done to improve yields⁴. CCD (Core Complex Die) is AMD’s name for the dies that hold the cores in a multi-chip CPU. But first, let’s see why this is helpful.

A Zen 2 Epyc CPU. Each of the 8 smaller dies hosts 8 cores. Taken from Tom’s Hardware.

When producing a die, there is some chance that some part of the die will be defective. The probability goes up with smaller features and more complex dies. To work around this, it is common for manufacturers to disable parts of the die. This then allows them to bin the produced dies to minimize the waste.

Let’s imagine we are manufacturing some dies with 4 cores.

  • On some dies, all the cores are functional, so we package them and sell them as 4-core CPUs.
  • On many dies, two cores are defective, so we disable these cores, and package the dies and sell them as 2-core CPUs.
  • On many dies, one core is defective; to avoid having too many marketing products, we disable that core as well as a second one, and sell them as the same 2-core CPUs as above.
  • On some dies, three or four cores are defective, which is hard to sell, so we just discard them.

This approach increases the number of sellable dies from any given wafer, and thus how much money it makes. That is, the yield.
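
To make the binning rules above concrete, here is a minimal Python sketch. It assumes that each core is defective independently of the others and uses an arbitrary 90% chance that a core works; both are simplifications chosen for illustration, and the names `bin_die` and `bin_shares` are hypothetical.

```python
from math import comb
from typing import Optional

CORES_PER_DIE = 4  # the imaginary 4-core die from the list above


def bin_die(working_cores: int) -> Optional[str]:
    """Apply the binning rules above; None means the die is discarded."""
    if working_cores == 4:
        return "4-core CPU"
    if working_cores in (2, 3):
        # With 3 working cores, one more is disabled to reuse the 2-core product.
        return "2-core CPU"
    return None


def bin_shares(p_core_ok: float) -> dict:
    """Expected share of dies in each bin, assuming independent core defects."""
    shares = {"4-core CPU": 0.0, "2-core CPU": 0.0, "discarded": 0.0}
    for k in range(CORES_PER_DIE + 1):
        # Binomial probability of exactly k working cores on a die.
        prob = comb(CORES_PER_DIE, k) * p_core_ok**k * (1 - p_core_ok) ** (CORES_PER_DIE - k)
        shares[bin_die(k) or "discarded"] += prob
    return shares


if __name__ == "__main__":
    # 0.90 is an illustrative value, not a real manufacturing figure.
    for product, share in bin_shares(0.90).items():
        print(f"{product}: {share:.1%}")
```

Without binning, only the dies in the "4-core CPU" bin would be sellable; everything in the "2-core CPU" bin is yield recovered by disabling cores.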

For instance, I used to own a Phenom II X2 550. But it turned out that it contained the same die as some Phenom II X4, but with two cores disabled. With some tweaking, I was able to enable them (it then appeared as a “Phenom II X4 B50”). One of the two initially-disabled cores did produce faults from time to time, but rarely enough that it could be usable, and I could just disable it at the software level.

Coming back to multi-chip modules, as the number of cores in CPUs increased, so did the size and complexity of the dies. Even with chip binning, this limits the yield you can attain. So a new strategy was devised: split the CPU into several smaller chips.

To get a 16 core CPU, you need 2 relatively simple dies with 8 working cores, instead of a single die with 16 working cores. Let’s say the probability of producing a working core is 95%.

  • The probability of producing a fully-working 16-core die is 44%. Let’s say that you can fit 1,000 such dies on a wafer. This means that a wafer should let you produce 440 16-core CPUs.
  • The probability of producing a fully-working 8-core die is 66%. You need 2, but they are also half the size, so you can produce 2,000 such dies on the same wafer⁵. So, now your wafer is giving you 660 16-core CPUs, a 50% improvement!
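
The arithmetic above can be reproduced with a short Python sketch. It assumes, like the example, that cores fail independently, that only fully-working dies are kept (no binning), and that a wafer fits 1,000 large dies or 2,000 small ones. The function names are hypothetical.

```python
P_CORE_OK = 0.95  # probability that a single core works, from the example above


def p_die_ok(cores: int, p_core_ok: float = P_CORE_OK) -> float:
    """Probability that every core on a die works (independent defects)."""
    return p_core_ok ** cores


def cpus_per_wafer(cores_per_die: int, dies_per_cpu: int, dies_per_wafer: int) -> float:
    """Expected number of 16-core CPUs built from fully-working dies only."""
    good_dies = dies_per_wafer * p_die_ok(cores_per_die)
    return good_dies / dies_per_cpu


monolithic = cpus_per_wafer(cores_per_die=16, dies_per_cpu=1, dies_per_wafer=1000)
chiplets = cpus_per_wafer(cores_per_die=8, dies_per_cpu=2, dies_per_wafer=2000)

if __name__ == "__main__":
    print(f"one 16-core die per CPU: {monolithic:.0f} CPUs per wafer")
    print(f"two 8-core dies per CPU: {chiplets:.0f} CPUs per wafer")
    print(f"improvement: {chiplets / monolithic - 1:.0%}")
```

Unrounded, this gives roughly 440 and 663 CPUs per wafer, so the gain is about 51%; the 660 and 50% figures above come from rounding 0.95⁸ down to 66%.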

CCXs

Within each CCD (die), AMD groups cores in “CCXs” (Core Complexes⁶).

  • From Zen to Zen 2, each CCD contains 2 CCXs, and each CCX contains 4 cores.
  • From Zen 3 to Zen 5, each CCD contains 1 CCX, and each CCX contains 8 cores.
  • With Zen 4c (compact), each CCD contains 2 CCXs, and each CCX contains 8 cores⁷.

Thus, from Zen to Zen 5, except for Zen 4c, a CCD always contains 8 cores⁸. But this could change with Zen 5c and Zen 6.
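
The layouts listed above can be written down as a small lookup table. This Python sketch only transcribes the list (it leaves out Zen+ and Zen 3+, which the list does not name explicitly); the names `CcdLayout`, `LAYOUTS` and `min_ccds` are hypothetical.

```python
from math import ceil
from typing import NamedTuple


class CcdLayout(NamedTuple):
    ccx_per_ccd: int
    cores_per_ccx: int

    @property
    def cores_per_ccd(self) -> int:
        return self.ccx_per_ccd * self.cores_per_ccx


# Transcribed from the list above; not an authoritative AMD table.
LAYOUTS = {
    "Zen": CcdLayout(ccx_per_ccd=2, cores_per_ccx=4),
    "Zen 2": CcdLayout(ccx_per_ccd=2, cores_per_ccx=4),
    "Zen 3": CcdLayout(ccx_per_ccd=1, cores_per_ccx=8),
    "Zen 4": CcdLayout(ccx_per_ccd=1, cores_per_ccx=8),
    "Zen 4c": CcdLayout(ccx_per_ccd=2, cores_per_ccx=8),
    "Zen 5": CcdLayout(ccx_per_ccd=1, cores_per_ccx=8),
}


def min_ccds(arch: str, total_cores: int) -> int:
    """Smallest number of CCDs that can provide total_cores for this layout."""
    return ceil(total_cores / LAYOUTS[arch].cores_per_ccd)


if __name__ == "__main__":
    for arch, layout in LAYOUTS.items():
        print(f"{arch}: {layout.cores_per_ccd} cores per CCD")
    print("CCDs needed for a 16-core Zen 4 CPU:", min_ccds("Zen 4", 16))
```

The table makes the pattern visible: the number of cores per CCD stays at 8 even though the split into CCXs changes, with Zen 4c as the exception at 16.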

Architecture of a Zen (first generation) CPU, with 2 CCDs, each containing 2 CCXs, each containing 4 cores.
On the left, a Zen 2 CCD, containing two CCXs (upper and lower halves). Each CCX has 4 cores and its own L3 cache. They communicate with Infinity Fabric. On the right, a Zen 3 CCD, containing a single CCX, with 8 cores that share the same L3 cache. Here, Infinity Fabric is used to communicate with other CCDs. Taken from Wikipedia.
On the left, a Zen 4 CCD, containing a single CCX, with 8 cores that share the same L3 cache. On the right, a Zen 4c CCD, containing two CCXs (left and right halves). Each CCX has 8 cores and its own L3 cache. Taken from Chips and Cheese.

My CPU is a 7950X3D. AMD lists three dies on the module, with a CCD die being 71mm² and an I/O die being 122mm². As we might expect, when delidding the CPU, we can observe two smaller dies and a larger one. From this, we can confirm that there are two CCDs on the 7950X3D.

Two smaller dies on the top (the CCDs), and a larger one below (the I/O die). Taken from /u/YouOnly-LiveOnce on Reddit.

For each CCD, Zen 4 CPUs have 1 CCX with 8 cores. This matches the fact that the 7950X3D is a 16-core CPU. We can also deduce from this that all cores of the CCDs are enabled, and thus need to be functional, making it more expensive to produce than, e.g. a 12-core 7900X, which only needs to have 6 functional cores on each CCD.
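
Since each CCX has its own L3 cache, one way to check this on a running system without delidding anything is to group logical CPUs by the L3 cache they share. The following Python sketch assumes a Linux machine that exposes cache topology under `/sys/devices/system/cpu/cpuN/cache/indexM/` with `level` and `shared_cpu_list` files; the function names are hypothetical, and on systems without that layout it simply returns an empty list.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")  # assumed Linux sysfs location


def parse_cpu_list(text: str) -> list:
    """Parse a Linux CPU list such as "0-7,16-23" into a sorted list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def l3_groups(sysfs_root: Path = SYSFS_CPU) -> list:
    """Return the distinct sets of logical CPUs that share an L3 cache."""
    groups = set()
    for cache in sysfs_root.glob("cpu[0-9]*/cache/index*"):
        try:
            if (cache / "level").read_text().strip() != "3":
                continue  # not an L3 cache entry
            shared = (cache / "shared_cpu_list").read_text()
        except OSError:
            continue  # entry missing or unreadable, skip it
        groups.add(tuple(parse_cpu_list(shared)))
    return sorted(groups)


if __name__ == "__main__":
    for index, cpus in enumerate(l3_groups()):
        print(f"L3 group {index}: {len(cpus)} logical CPUs -> {list(cpus)}")
```

Note that the groups contain logical CPUs, so with SMT enabled each group holds twice as many entries as the CCX has physical cores.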

Conclusion

I think that the main takeaway from this article is that, if you want to look up information about a particular CPU model, you might get more from looking up its microarchitecture than the specific model. For instance, if I want to better understand how the 7950X3D works, I can look up general information about Zen 4 cores, since that’s not actually specific to the CPU model. That includes the ISA, core features, instruction latencies and throughput, caches, and so on.

  1. I could not find a relevant iceberg meme, and I am way out of my depth (ah!) to fill in anything but the shallow waters. ↩︎
  2. There is surprisingly little mainstream documentation on this, even though the concept is widely known. ↩︎
  3. But, for instance, the 5800XT was released almost two years after the 7900X. ↩︎
  4. In addition, the core dies (CCDs) are produced with 5 nm technology, while the I/O die uses 6 nm. ↩︎
  5. Probably slightly more thanks to better rectangle packing. ↩︎
  6. Hence a “Core Complex Die” is a die that contains “Core Complexes”, each of which being made of several cores. ↩︎
  7. Wikipedia lists some Zen 4c CPUs with up to 16 cores per CCX, but there is no source. According to AMD, Zen 4c CCDs are made of two CCXs, each with up to 8 cores. So I think the number of CCXs listed in the Wikipedia tables is wrong, and that, for instance, the 8534P should be listed with a core config of 8 × 8, not 4 × 16. ↩︎
  8. CPUs that feature fewer than 8 cores just use a single CCD with some defective cores, and are binned as 4-core or 6-core CPUs. ↩︎