AMD Patch Reveals Highly Competitive Zen Architecture Details

All hopes at AMD are pinned on their new Zen CPU architecture performing well against Intel’s currently superior lineup. While we have seen some leaked block diagrams and claims of a 40% IPC improvement, other details have been scant. We now have a patch that details pretty much what Zen will look like, at least at the block level. Zen is also expected to bring instruction set parity between the two x86 CPU players.

Overall, Zen may have a total of 10 execution ports. These are meant to feed the integer side’s 4 ALUs and 2 AGUs, and the FPU consisting of 2 128-bit FMACs, each with 2 128-bit add and 2 128-bit mul. There is a chance that AMD will have fewer execution ports, as the ALUs may share ports with the FPU. To get the most out of the architecture, though, having more execution ports is crucial. With such a wide core, AMD will really let their SMT (Hyper-Threading) stretch its legs with mixed workloads.

Compared to Bulldozer, the ALU count remains the same, but having them all in the same core should give a massive boost to single-threaded performance. Zen does lose 2 AGUs in the process, but that shouldn’t hurt it too much given our experience with K10. The fact that the units should be AVX2 compatible should also mean improvements in throughput, at least if the software uses the latest extensions. For the FPU, Zen basically doubles the throughput, which goes a long way towards boosting IPC in floating-point-heavy workloads. It’s interesting that Zen goes with 128-bit units, but we’re expecting them to combine to execute AVX instructions, which should provide parity with Haswell/Skylake. Furthermore, by allowing the FPU to split into 2 128-bit units, older instructions may actually run better than on Intel, which can still only process 1 128-bit instruction despite the execution units’ 256-bit width.
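To picture what "combining" two 128-bit units for one AVX instruction could mean, here is a minimal Python sketch. It is purely illustrative: the function name `add_256_as_two_128` and the eight 32-bit lane layout are our own assumptions, and it does not describe how Zen actually schedules its micro-ops.

```python
def add_256_as_two_128(a, b):
    """Add two 8-lane vectors by processing two 4-lane (128-bit) halves.

    Hypothetical model: each half stands in for one 128-bit FPU unit
    working on four 32-bit lanes of a 256-bit AVX operand.
    """
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expected 8 lanes (256 bits of 32-bit values)")
    low_half = [x + y for x, y in zip(a[:4], b[:4])]    # first 128-bit unit
    high_half = [x + y for x, y in zip(a[4:], b[4:])]   # second 128-bit unit
    return low_half + high_half


result = add_256_as_two_128([0, 1, 2, 3, 4, 5, 6, 7], [1] * 8)
```

In this model a 256-bit operation occupies both halves at once, while two independent 128-bit operations could each take one half, which is the scenario where older code might benefit.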

On the instruction decode side of things, Zen cuts things down from Steamroller/Excavator, with only 4 instructions per clock compared to 8. Zen’s decoders won’t need to feed 2 cores, however, as in the Bulldozer design, meaning the real decode rate is the same provided you are running more than 1 core at a time. 4 instructions per clock is also where Intel is currently sitting. For the cache, it looks like we will be seeing a return to the Cat (Jaguar) and K10 design, with 512KB of L2 per core and 32KB of L1 data, with 32KB of L1 instruction likely as well. While it is a drop, Zen won’t have to feed as many cores, and with less cache thrashing it should actually perform better.
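The decode comparison is simple arithmetic, sketched below in Python. The helper name `decode_per_core` is ours, and the model assumes the decode width is split evenly between the active cores sharing a front end, which is a simplification.

```python
def decode_per_core(decode_width, active_cores_sharing):
    """Instructions decoded per clock available to each active core."""
    return decode_width / active_cores_sharing


# Figures from the article: 8 per clock shared by a 2-core module vs. 4 for one Zen core.
module_both_cores_busy = decode_per_core(8, 2)
module_one_core_busy = decode_per_core(8, 1)
zen_core = decode_per_core(4, 1)

print(module_both_cores_busy, module_one_core_busy, zen_core)
```

With both cores of a module busy, each gets the same 4 per clock as a Zen core. Only a lone active core in a module sees the full 8.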

With Zen, it really looks like AMD has taken a lot of lessons from K10, Jaguar, Bulldozer and even Intel to create what appears to be a really strong CPU architecture on paper. By combining all of the strong traits from previous and current CPUs, AMD may finally give Intel a run for their money. It’s just too bad we’ll have to wait a year before Zen arrives. Given Intel’s pace, though, Zen should still be plenty competitive in a year’s time.

Thank you dresdenboy for providing us with this information

Rumors Suggest AMD Zen Instruction Set Parity with Haswell/Broadwell

While AMD has released some details about Zen at their Financial Analyst Day earlier this year, details have still been a bit scant. What we already know is that Zen will have a 40% IPC increase compared to Excavator, bringing AMD’s IPC much closer to Intel’s in one jump. Zen will also support a version of Simultaneous Multithreading (SMT) to support 2 logical processors per core. This will all be bundled on the AM4 platform with DDR4 support and use a FinFET process. Most critically, the CMT or cluster-based threading will be gone and each core will have 2 256bit FPUs and a good number of Integer ALUs.

Today, though, we have a rumour that suggests that Zen will bring AMD to instruction set parity with Intel’s Haswell/Broadwell CPUs. With Excavator, which launched earlier this year, AMD already caught up partially with AVX2, which brings 256-bit support to integer work, BMI2, and RDRAND for pseudo-random number generation. If Zen is to catch up to Haswell, it will probably add hardware acceleration support for CRC, SHA-256 and RSA algorithms, and RDSEED for more pseudo-random number generation. Interestingly, there are also suggestions that AMD’s SMT implementation will be compatible with Intel’s, meaning OSes may not need to be patched, as they did with Bulldozer, to fully support the extra logical processor.
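For readers who want to check which of these extensions their own CPU reports, here is a small Python sketch that parses the `flags` line of Linux’s `/proc/cpuinfo`. The flag names in `WANTED` are the ones the Linux kernel uses; which of them Zen will actually expose is not confirmed by the rumour, and the helper names are our own.

```python
# Linux /proc/cpuinfo flag names for some extensions mentioned above.
WANTED = ("avx2", "bmi2", "rdrand", "rdseed", "sha_ni")


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def report(cpuinfo_text, wanted=WANTED):
    """Map each wanted flag name to True/False depending on its presence."""
    flags = parse_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not Linux, or the file is unavailable
    for name, present in report(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

On non-Linux systems the file will not exist and every flag is reported as absent, so treat the output as a convenience check rather than a definitive capability test.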

AMD may also support some of the new Skylake instructions like AVX-512, though we will have to wait and see. Part of this is due to the fact that Intel will not fully reveal what Skylake supports until IDF later this month. With Intel slipping in a refresh with Kaby Lake in 2016, AMD really has a good chance at a comeback if Zen performs well.

Thank you Fudzilla for providing us with the information

Intel Reveals Details Regarding Intel’s “Knights Landing” Xeon Phi Coprocessor

Intel announced the ‘Knights Landing’ Xeon Phi Coprocessor late last year, releasing very few details about the lineup back then. As time passes, details are bound to be revealed, and Intel is said to start shipping the series next year. This is apparently why Intel has decided to reveal some more details regarding the ‘Knights Landing’ Xeon Phi Coprocessor.

The announcement from last year points to Knights Landing making the jump from Intel’s enhanced Pentium 1 (P54C) x86 cores to the more modern Silvermont x86 cores, significantly increasing single-threaded performance. Furthermore, the cores are said to incorporate AVX units, allowing AVX-512F operations and providing the bulk of Knights Landing’s compute power.

Intel is said to offer 72 cores in Knights Landing CPUs, with double-precision (FP64) performance expected to reach 3 TFLOPS and the CPUs built on 14nm technology. While this is somewhat old news, Intel revealed some more insights at ISC 2014.
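As a rough sanity check on the 3 TFLOPS figure, the Python sketch below works out what clock speed 72 cores would need. It assumes two 512-bit vector units per core with fused multiply-add, a detail not stated above, so treat the result as a back-of-the-envelope estimate only.

```python
def peak_dp_flops_per_cycle(cores, vector_units_per_core, vector_bits=512, fma=True):
    """Peak double-precision FLOPs per clock cycle for the whole chip."""
    lanes = vector_bits // 64            # 64-bit doubles per vector
    flops_per_lane = 2 if fma else 1     # an FMA counts as multiply + add
    return cores * vector_units_per_core * lanes * flops_per_lane


def clock_needed_ghz(target_tflops, flops_per_cycle):
    """Clock speed in GHz required to hit a target peak TFLOPS."""
    return target_tflops * 1e12 / flops_per_cycle / 1e9


# 72 cores from the article; 2 vector units per core is our assumption.
per_cycle = peak_dp_flops_per_cycle(cores=72, vector_units_per_core=2)
clock = clock_needed_ghz(3.0, per_cycle)
print(per_cycle, round(clock, 2))
```

Under those assumptions the chip would need to run at roughly 1.3 GHz to reach its quoted peak.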

During the conference, Intel stated that the company needed to replace the 512-bit GDDR5 memory present in the current Knights Corner series. This is apparently why Intel and Micron have struck a deal to work on a more advanced variant of Hybrid Memory Cube (HMC) memory with increased bandwidth.

Also, Intel and Micron are said to be working on Multi-Channel DRAM (MCDRAM) specially designed for Intel’s processors, with a custom interface best suited for Knights Landing. This is said to help scale its memory support up to 16 GB of RAM while offering up to 500 GB/s of memory bandwidth, a 50% increase compared to Knights Corner’s GDDR5.

The second change made to Knights Landing is said to include replacing the True Scale Fabric with Omni Scale Fabric in order to offer better performance compared to the current fabric solution. Though Intel is currently keeping this information on the down-low, traditional Xeon processors are said to benefit from this fabric change in the future as well.

Lastly, compared to Intel’s Knights Corner series, Knights Landing will be available in both PCIe and socketed form factors, mainly thanks to the MCDRAM technology. This is said to allow the CPU to be installed alongside Xeon processors on specific motherboards. The company has also emphasised that the socketed Knights Landing version will be able to communicate directly with other CPUs with the help of QuickPath Interconnect, rather than the current PCIe interface.

In addition, having Knights Landing socketed would also allow it to benefit from the Xeon’s NUMA capabilities, being able to share memory and memory spaces with the Xeon CPUs. Also, Knights Landing is said to be binary compatible with Haswell CPUs, with the company envisioning programs being written once and run across both types of processors.

Intel is expected to start shipping the Knights Landing Xeon Phi Coprocessor somewhere around Q2 2015, and the company is already lining up its first Knights Landing supercomputer deals, including one with the National Energy Research Scientific Computing Center for around 9300 Knights Landing nodes.

Thank you Anandtech for providing us with this information
Image courtesy of Anandtech

Intel Reveals Haswell-E Engineering Sample With 8 Cores and 3 GHz Clock Speed

According to a post from Chinese portal VR-Zone, a picture of the first Haswell-E engineering sample, which will feature 8 cores and a clock speed of 3 GHz, has been revealed. Based on the 22nm Haswell architecture, Intel’s Haswell-E processor in the X or Extreme series would be the chip giant’s first chip to feature 8 native cores with 16 threads, which will put it in line with AMD’s 8-core processors that have been available since the arrival of Bulldozer in 2011.

Intel would be shipping two unlocked processors at launch: an X series Extreme Edition chip and a K series Unlocked Edition chip. Intel should brand the Haswell-E processors as the Core i7-5xxx series, in which case the X series part would be known as the Core i7-5960X and the K series part would be called the Core i7-5930K. These aren’t confirmed names, but Intel has kept this style of series branding for a while and we suspect they will continue the trend with their Haswell-E and Broadwell generation of processors up until 2015.

The details from Intel reveal 6 to 8 cores for their Haswell-E processors, which would be equipped with a massive 20 MB of L3 smart cache and, just like Haswell, would feature an integrated voltage regulator. The flagship part would ship with a TDP of around 140W, which is impressive since that’s 10W under what we get on the Core i7-3970X, which has 6 cores compared to the Haswell-E beast that would feature 8 cores and 20 MB of L3 cache. Intel is aiming for a 55% IPC improvement over quad cores with their flagship Haswell-E processors.
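To put the TDP comparison in per-core terms, here is a quick Python calculation using only the figures above (140W across 8 cores versus 150W across 6). TDP is a thermal design target rather than measured power draw, so this is a rough comparison at best, and the helper name is our own.

```python
def tdp_per_core(tdp_watts, cores):
    """Thermal design power budget divided evenly across cores."""
    return tdp_watts / cores


haswell_e = tdp_per_core(140, 8)       # rumoured flagship Haswell-E
i7_3970x = tdp_per_core(140 + 10, 6)   # 10W above Haswell-E, per the article

print(haswell_e, i7_3970x)
```

That works out to 17.5W per core for the Haswell-E part against 25W per core for the Core i7-3970X.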

Haswell-E would also keep the great overclocking features that ship with the “K” series and “Extreme Edition” processors. Both the memory and processor can be overclocked thanks to unlocked turbo limits, unlocked core ratios in 80/100 increments, programmable iVR voltage, support for XMP mode, unlocked memory controller and voltage limits, native support for memory up to 2667 MHz, unlocked PCH and PLL voltage controls and more.

One Haswell-E processor is said to support two x16 and three x8 PCIe v3.x slots with 40 lanes, and would be directly connected to the DDR4 memory controller and the Wellsburg X99 chipset. The feature set would remain the same, with technologies such as SSE4, AVX, VT and AES-NI under its belt. Unlike the Haswell processors, which come with a 4th gen HD graphics core, the Haswell-E platform wouldn’t feature built-in graphics, but someone buying such a costly processor will definitely go for a discrete GPU for graphics.
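Two x16 plus three x8 links would add up to 56 lanes, more than the 40 quoted, so the slot counts are presumably maximums rather than a single simultaneous layout. Under that reading (our assumption, not something the leak states), this Python sketch lists the combinations that use exactly 40 lanes.

```python
def full_lane_configs(total_lanes=40, max_x16=2, max_x8=3):
    """List (x16 count, x8 count) pairs that use exactly total_lanes lanes."""
    return [
        (n16, n8)
        for n16 in range(max_x16 + 1)
        for n8 in range(max_x8 + 1)
        if 16 * n16 + 8 * n8 == total_lanes
    ]


configs = full_lane_configs()
print(configs)
```

Only two layouts fit: one x16 with three x8 links, or two x16 with a single x8.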

Intel’s Haswell-E is officially the first HEDT platform to feature support for DDR4 memory, which is great news for enthusiasts who want to upgrade from DDR3 memory that has reached its maximum overclock speeds. The new DDR4 memory modules operate at only 1.2V compared to the 1.65/1.5V standard with DDR3. They can feature up to 16 banks of memory and require 288-pin DIMM connectors, which would be available on the new X99 chipset motherboards. The DDR4 memory controller offers quad-channel memory support.
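For a sense of what quad-channel DDR4 could deliver, here is a Python sketch of the usual peak-bandwidth formula: channels times bus width times transfer rate. It assumes a standard 64-bit (8-byte) channel and treats the 2667 MHz figure quoted earlier as 2667 MT/s. Both are our assumptions, and real-world throughput will be lower than this theoretical peak.

```python
def peak_bandwidth_gbs(channels, megatransfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * bus_bytes * megatransfers_per_s * 1e6 / 1e9


quad_channel = peak_bandwidth_gbs(channels=4, megatransfers_per_s=2667)
dual_channel = peak_bandwidth_gbs(channels=2, megatransfers_per_s=2667)

print(round(quad_channel, 1), round(dual_channel, 1))
```

At that speed, quad channel comes to roughly 85 GB/s of peak bandwidth, double what a dual-channel setup at the same transfer rate would offer.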

Haswell-E is expected to be released in Q4 2014; however, it could very well be pushed to early 2015.

Thank you WCCF for providing us with this information
Images courtesy of WCCF