author | Dan McDonald <danmcd@joyent.com> | 2021-08-24 17:20:33 -0400 |
---|---|---|
committer | Dan McDonald <danmcd@joyent.com> | 2021-08-24 17:20:33 -0400 |
commit | cf209a1bd5cc79461b0a23ce43ded7ae0387f3ff | |
tree | 2f9b3650da4ff43d9b0b8afe0238ba40d04cd1e7 | |
parent | a2147f7a3c06d97137a8fe90bd5a977834ce127e | |
parent | 0c21e245660b46e01c379729e58268754021523c | |
[illumos-gate merge] release-20210826
commit 0c21e245660b46e01c379729e58268754021523c
14029 Add Zen 2/3 discussion to cpuid.c theory statement
commit 6881023fc43e5c6ac49e78f526f6cce9b68d69a3
14030 udf poorly handles its usage
-rw-r--r-- | usr/src/cmd/amdzen/udf.c | 3 |
-rw-r--r-- | usr/src/uts/i86pc/os/cpuid.c | 128 |
2 files changed, 128 insertions, 3 deletions
diff --git a/usr/src/cmd/amdzen/udf.c b/usr/src/cmd/amdzen/udf.c
index 604e0b4802..8c21b02b74 100644
--- a/usr/src/cmd/amdzen/udf.c
+++ b/usr/src/cmd/amdzen/udf.c
@@ -10,7 +10,7 @@
  */
 
 /*
- * Copyright 2020 Oxide Computer Company
+ * Copyright 2021 Oxide Computer Company
  */
 
 /*
@@ -87,6 +87,7 @@ main(int argc, char *argv[])
 		warnx("missing required arguments");
 		(void) fprintf(stderr, "Usage: udf [-l] -d device -f func -i "
 		    "inst -r reg\n");
+		exit(2);
 	}
 
 	errno = 0;
diff --git a/usr/src/uts/i86pc/os/cpuid.c b/usr/src/uts/i86pc/os/cpuid.c
index 4b1399355c..d513b5598a 100644
--- a/usr/src/uts/i86pc/os/cpuid.c
+++ b/usr/src/uts/i86pc/os/cpuid.c
@@ -510,8 +510,9 @@
  * generations of topology. There's the basic topology that has been used in
  * family 0xf+ (Opteron, Athlon64), there's the topology that was introduced
  * with family 0x15 (Bulldozer), and there's the topology that was introduced
- * with family 0x17 (Zen). AMD also has some additional terminology that's worth
- * talking about.
+ * with family 0x17 (Zen), evolved more dramatically in Zen 2 (still family
+ * 0x17), and tweaked slightly in Zen 3 (family 19h). AMD also has some
+ * additional terminology that's worth talking about.
  *
  * Until the introduction of family 0x17 (Zen), AMD did not implement something
  * that they considered SMT. Whether or not the AMD processors have SMT
@@ -648,6 +649,129 @@
  *    die is made up of two core complexes, we have multiple different NUMA
  *    domains that we care about for these systems.
  *
+ * ZEN 2
+ *
+ *    Zen 2 changes things in a dramatic way from Zen 1. Whereas in Zen 1
+ *    each Zeppelin Die had its own I/O die, that has been moved out of the
+ *    core complex in Zen 2. The actual core complex looks pretty similar, but
+ *    now the die actually looks much simpler:
+ *
+ *      +--------------------------------------------------------+
+ *      | Zen 2 Core Complex Die    HH                           |
+ *      |                           HH                           |
+ *      |          +-----------+    HH    +-----------+          |
+ *      |          |           |    HH    |           |          |
+ *      |          |   Core    |==========|   Core    |          |
+ *      |          |  Complex  |==========|  Complex  |          |
+ *      |          |           |    HH    |           |          |
+ *      |          +-----------+    HH    +-----------+          |
+ *      |                           HH                           |
+ *      |                           HH                           |
+ *      +--------------------------------------------------------+
+ *
+ *    From here, when we add the central I/O die, this changes things a bit.
+ *    Each die is connected to the I/O die, rather than trying to interconnect
+ *    them directly. The following image takes the same Zen 1 image that we
+ *    had earlier and shows what it looks like with the I/O die instead:
+ *
+ *                             PP    PP
+ *                             PP    PP
+ *       +---------------------PP----PP---------------------+
+ *       |                     PP    PP                     |
+ *       |  +-----------+      PP    PP      +-----------+  |
+ *       |  |           |      PP    PP      |           |  |
+ *       |  |   Zen 2   |    +-PP----PP-+    |   Zen 2   |  |
+ *       |  |    Die   _|    | PP    PP |    |_   Die    |  |
+ *       |  |         |o|oooo|          |oooo|o|         |  |
+ *       |  +-----------+    |          |    +-----------+  |
+ *       |                   |   I/O    |                   |
+ *  MMMMMMMMMMMMMMMMMMMMMMMMMM   Die    MMMMMMMMMMMMMMMMMMMMMMMMMM
+ *  MMMMMMMMMMMMMMMMMMMMMMMMMM          MMMMMMMMMMMMMMMMMMMMMMMMMM
+ *       |                   |          |                   |
+ *  MMMMMMMMMMMMMMMMMMMMMMMMMM          MMMMMMMMMMMMMMMMMMMMMMMMMM
+ *  MMMMMMMMMMMMMMMMMMMMMMMMMM          MMMMMMMMMMMMMMMMMMMMMMMMMM
+ *       |                   |          |                   |
+ *       |  +-----------+    |          |    +-----------+  |
+ *       |  |         |o|oooo| PP    PP |oooo|o|         |  |
+ *       |  |   Zen 2  -|    +-PP----PP-+    |-  Zen 2   |  |
+ *       |  |    Die    |      PP    PP      |    Die    |  |
+ *       |  |           |      PP    PP      |           |  |
+ *       |  +-----------+      PP    PP      +-----------+  |
+ *       |                     PP    PP                     |
+ *       +---------------------PP----PP---------------------+
+ *                             PP    PP
+ *                             PP    PP
+ *
+ *    The above has four core complex dies installed, though the Zen 2 EPYC
+ *    and ThreadRipper parts allow for up to eight, while the Ryzen parts
+ *    generally only have one to two. The more notable difference here is how
+ *    everything communicates. Note that memory and PCIe come out of the
+ *    central die. This changes the way that one die accesses a resource. It
+ *    basically always has to go to the I/O die, where as in Zen 1 it may have
+ *    satisfied it locally. In general, this ends up being a better strategy
+ *    for most things, though it is possible to still treat everything in four
+ *    distinct NUMA domains with each Zen 2 die slightly closer to some memory
+ *    and PCIe than otherwise. This also impacts the 'amdzen' nexus driver as
+ *    now there is only one 'node' present.
+ *
+ * ZEN 3
+ *
+ *    From an architectural perspective, Zen 3 is a much smaller change from
+ *    Zen 2 than Zen 2 was from Zen 1, though it makes up for most of that in
+ *    its microarchitectural changes. The biggest thing for us is how the die
+ *    changes. In Zen 1 and Zen 2, each core complex still had its own L3
+ *    cache. However, in Zen 3, the L3 is now shared between the entire core
+ *    complex die and is no longer partitioned between each core complex. This
+ *    means that all cores on the die can share the same L3 cache. Otherwise,
+ *    the general layout of the overall package with various core complexes
+ *    and an I/O die stays the same. Here's what the Core Complex Die looks
+ *    like in a bit more detail:
+ *
+ *      +-------------------------------------------------+
+ *      | Zen 3 Core Complex Die                          |
+ *      | +-------------------+    +-------------------+  |
+ *      | | Core       +----+ |    | Core       +----+ |  |
+ *      | | +--------+ | L2 | |    | +--------+ | L2 | |  |
+ *      | | | Thread | +----+ |    | | Thread | +----+ |  |
+ *      | | +--------+-+ +--+ |    | +--------+-+ +--+ |  |
+ *      | |   | Thread | |L1| |    |   | Thread | |L1| |  |
+ *      | |   +--------+ +--+ |    |   +--------+ +--+ |  |
+ *      | +-------------------+    +-------------------+  |
+ *      | +-------------------+    +-------------------+  |
+ *      | | Core       +----+ |    | Core       +----+ |  |
+ *      | | +--------+ | L2 | |    | +--------+ | L2 | |  |
+ *      | | | Thread | +----+ |    | | Thread | +----+ |  |
+ *      | | +--------+-+ +--+ |    | +--------+-+ +--+ |  |
+ *      | |   | Thread | |L1| |    |   | Thread | |L1| |  |
+ *      | |   +--------+ +--+ |    |   +--------+ +--+ |  |
+ *      | +-------------------+    +-------------------+  |
+ *      |                                                 |
+ *      | +--------------------------------------------+  |
+ *      | |                  L3 Cache                  |  |
+ *      | +--------------------------------------------+  |
+ *      |                                                 |
+ *      | +-------------------+    +-------------------+  |
+ *      | | Core       +----+ |    | Core       +----+ |  |
+ *      | | +--------+ | L2 | |    | +--------+ | L2 | |  |
+ *      | | | Thread | +----+ |    | | Thread | +----+ |  |
+ *      | | +--------+-+ +--+ |    | +--------+-+ +--+ |  |
+ *      | |   | Thread | |L1| |    |   | Thread | |L1| |  |
+ *      | |   +--------+ +--+ |    |   +--------+ +--+ |  |
+ *      | +-------------------+    +-------------------+  |
+ *      | +-------------------+    +-------------------+  |
+ *      | | Core       +----+ |    | Core       +----+ |  |
+ *      | | +--------+ | L2 | |    | +--------+ | L2 | |  |
+ *      | | | Thread | +----+ |    | | Thread | +----+ |  |
+ *      | | +--------+-+ +--+ |    | +--------+-+ +--+ |  |
+ *      | |   | Thread | |L1| |    |   | Thread | |L1| |  |
+ *      | |   +--------+ +--+ |    |   +--------+ +--+ |  |
+ *      | +-------------------+    +-------------------+  |
+ *      +-------------------------------------------------+
+ *
+ *    While it is not pictured, there are connections from the die to the
+ *    broader data fabric and additional functional blocks to support that
+ *    communication and coherency.
+ *
  * CPUID LEAVES
  *
  * There are a few different CPUID leaves that we can use to try and understand
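Note on the family numbers in the first cpuid.c hunk: the comment places Zen and Zen 2 in family 0x17 and Zen 3 in family 19h. The sketch below shows how such a family value is conventionally derived from the EAX output of CPUID leaf 1 on AMD parts. It is a minimal illustration, not code from cpuid.c: the function names are hypothetical, and the sample EAX values used afterwards are illustrative inputs chosen only because they decode to families 0x17 and 0x19.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; they are not part of
 * cpuid.c. The argument is the EAX value returned by CPUID leaf 1.
 */
static uint32_t
sketch_cpuid_family(uint32_t eax)
{
	uint32_t base = (eax >> 8) & 0xf;	/* base family, bits 11:8 */
	uint32_t ext = (eax >> 20) & 0xff;	/* extended family, bits 27:20 */

	/* The extended family is only added when the base family is 0xf. */
	return (base == 0xf ? base + ext : base);
}

static uint32_t
sketch_cpuid_model(uint32_t eax)
{
	uint32_t base_fam = (eax >> 8) & 0xf;
	uint32_t base = (eax >> 4) & 0xf;	/* base model, bits 7:4 */
	uint32_t ext = (eax >> 16) & 0xf;	/* extended model, bits 19:16 */

	/* AMD convention: the extended model applies when base family is 0xf. */
	return (base_fam == 0xf ? (ext << 4) | base : base);
}
```

With these helpers, an EAX value of 0x00830F10 decodes to family 0x17, model 0x31, and 0x00A00F11 decodes to family 0x19, model 0x01. Because Zen 1 and Zen 2 share family 0x17, the family alone cannot tell the two topologies apart, so the model would presumably also need to be consulted.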