All hopes at AMD are pinned on the new Zen CPU architecture performing well against Intel’s currently superior lineup. While we have seen some leaked block diagrams and claims of a 40% IPC improvement, other details have been scant. We now have a patch that details pretty much what Zen will look like, at least at the block level. Zen is also expected to bring instruction set parity between the two x86 CPU players.
Overall, Zen may have a total of 10 execution ports. These are meant to feed the integer side, with 4 ALUs and 2 AGUs, and the FPU, consisting of 2 128bit FMACs, each with 2 128bit add and 2 128bit mul. There is a chance that AMD will have fewer execution ports, as the ALUs may share ports with the FPU. In order to get the most out of the architecture, though, having more execution ports is crucial. With such a wide core, AMD can really let its SMT (the counterpart to Intel’s Hyper-Threading) stretch its legs with mixed workloads.
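To make the port arithmetic concrete, here is a minimal sketch of how the figure of 10 could add up. It assumes each unit gets its own port and reads the FPU description as four pipes in total (2 add plus 2 mul); the names `INTEGER_UNITS`, `FP_UNITS` and `total_ports` are made up for illustration and are not taken from the patch.

```python
# Hypothetical tally of the rumored Zen execution ports.
# Unit counts come from the article; one port per unit is an assumption.
INTEGER_UNITS = {"ALU": 4, "AGU": 2}
FP_UNITS = {"FADD": 2, "FMUL": 2}  # assumed reading of the FPU description


def total_ports(shared_fp_ports: int = 0) -> int:
    """Return the port count when `shared_fp_ports` FP pipes reuse integer ports."""
    fp_count = sum(FP_UNITS.values())
    if not 0 <= shared_fp_ports <= fp_count:
        raise ValueError("shared_fp_ports must be between 0 and the FP pipe count")
    return sum(INTEGER_UNITS.values()) + fp_count - shared_fp_ports


if __name__ == "__main__":
    # Walk from fully dedicated ports to every FP pipe sharing an integer port.
    for shared in range(sum(FP_UNITS.values()) + 1):
        print(f"{shared} shared FP pipes -> {total_ports(shared)} ports")
```

With nothing shared the tally matches the 10 ports mentioned above, and every FP pipe that shares an integer port lowers the count by one.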
Compared to Bulldozer, the ALU count remains the same, but having all of them in a single core should give a massive boost to single-threaded performance. Zen does lose 2 AGUs in the process, but that shouldn’t hurt it too much given our experience with K10. The fact that the units should be AVX2 compatible should also mean improved throughput, at least if the software uses the latest extensions. For the FPU, Zen basically doubles the throughput, which goes a long way towards boosting IPC in floating point heavy workloads. It’s interesting that Zen goes with 128bit units, but we’re expecting them to combine to execute AVX instructions, which should provide parity with Haswell/Skylake. Furthermore, by allowing the FPU to split into 2 128bit units, older instructions may actually run better than on Intel, which can still only process 1 128bit instruction despite its execution units’ 256bit width.
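The FPU doubling can be checked with some back-of-the-envelope arithmetic. The sketch below assumes each 128bit FMAC retires one fused multiply-add per lane per cycle (counted as 2 floating point operations), and that Bulldozer’s two 128bit FMACs are shared by both cores of a module. The function `peak_flops_per_cycle` and its parameters are hypothetical, and the result is a theoretical peak, not a measurement.

```python
def peak_flops_per_cycle(fma_units, width_bits, element_bits=32, cores_sharing=1):
    """Theoretical peak FLOPs per cycle per core for a set of FMA units.

    Assumes one fused multiply-add (2 FLOPs) per lane per unit per cycle.
    """
    if width_bits % element_bits != 0:
        raise ValueError("unit width must be a multiple of the element size")
    lanes = width_bits // element_bits
    return fma_units * lanes * 2 / cores_sharing


# Rumored Zen core: 2 x 128bit FMAC, private to one core.
zen = peak_flops_per_cycle(fma_units=2, width_bits=128)

# Bulldozer module: 2 x 128bit FMAC shared by 2 cores (assumed even split).
bulldozer = peak_flops_per_cycle(fma_units=2, width_bits=128, cores_sharing=2)

print(f"Zen per core:       {zen} single-precision FLOPs/cycle")
print(f"Bulldozer per core: {bulldozer} single-precision FLOPs/cycle")
print(f"Ratio:              {zen / bulldozer}x")
```

Under these assumptions the per-core peak comes out at twice Bulldozer’s when both cores in a module are busy, which is the doubling described above.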
On the instruction decode side of things, Zen cuts things down from Steamroller/Excavator, with only 4 instructions per clock compared to 8. Zen’s decoders won’t need to feed 2 cores, however, as they do in the Bulldozer design, meaning the real decode rate per core is the same provided you are running more than 1 core at a time. 4 instructions per clock is also where Intel is currently sitting. For the cache, it looks like we will be seeing a return to the Cat (Jaguar) and K10 design, with 512KB of L2 per core and 32KB of L1 data, with 32KB of L1 instruction cache likely as well. While it is a drop, Zen won’t have to feed as many cores, and with less cache thrashing it should actually perform better.
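The decode comparison is simple division, sketched here under the assumption that a Steamroller/Excavator module’s 8 instructions per clock are split evenly between its 2 cores when both are active. The helper `decode_per_core` is a made-up name for illustration.

```python
def decode_per_core(decode_width, active_cores):
    """Instructions decoded per clock available to each active core."""
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    return decode_width / active_cores


# Steamroller/Excavator module: 8 per clock feeding 2 cores (assumed even split).
module_rate = decode_per_core(decode_width=8, active_cores=2)

# Rumored Zen core: 4 per clock feeding a single core.
zen_rate = decode_per_core(decode_width=4, active_cores=1)

print(f"Steamroller/Excavator, both cores busy: {module_rate} per core")
print(f"Zen: {zen_rate} per core")
```

Both cases work out to the same per-core figure, which is why the halved headline number is not necessarily a regression.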
With Zen, it really looks like AMD has taken a lot of lessons from K10, Jaguar, Bulldozer and even Intel to create what appears, on paper, to be a really strong CPU architecture. By combining the strong traits of previous and current CPUs, AMD may finally give Intel a run for its money. It’s just too bad we’ll have to wait a year before Zen arrives. Given Intel’s pace, though, Zen should still be plenty competitive in a year’s time.
Thank you, dresdenboy, for providing us with this information.