When AMD and Nvidia release all of those TFLOPs numbers, it’s important to realize that those are theoretical maximums. For a chip to get close to that number, its architecture has to be extremely efficient and easy for software to fully utilize. When Pascal was first unveiled, Nvidia released some details about what it would look like. With the release of the white paper for the architecture, there are a few additional highlights worth noting.
First off, we know that Pascal has cut the SM (Streaming Multiprocessor) down from 128 FP32 cores to 64. This allows for better distribution of processing power across tasks, and because each SM keeps the same amount of register file and other support hardware, overall throughput increases. Nvidia has also tweaked the SM so datapaths are more streamlined and sharing information within the SM takes less power and hardware. The scheduler has also seen some improvements and updates to ensure the SM is constantly being fed. The L2 cache has grown from 3MB to 4MB, and each SM now has a dedicated 64KB shared memory space. This is lower than the 96KB per SM in Maxwell, but a Pascal SM has only half as many cores, so relative to a same-size Maxwell chip with half the SM count it works out to 64KB per 64 cores versus Maxwell’s 48KB, an effective increase of 16KB per SM. Finally, Nvidia detailed the P100 interposer layout for HBM2, something for us to look forward to when HBM2 finally arrives.
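The shared memory comparison is easier to follow once both SM designs are normalised to the same number of CUDA cores. Here is a minimal Python sketch that uses only the per-SM figures quoted above. The dictionary layout and the `shared_kb_per_64_cores` helper are purely illustrative.

```python
# Per-SM figures quoted in the article (Maxwell vs. Pascal GP100).
MAXWELL = {"cores_per_sm": 128, "shared_kb_per_sm": 96}
PASCAL = {"cores_per_sm": 64, "shared_kb_per_sm": 64}


def shared_kb_per_64_cores(arch: dict) -> float:
    """Scale an SM's shared memory to a 64-core slice so both designs compare fairly."""
    return arch["shared_kb_per_sm"] * 64 / arch["cores_per_sm"]


maxwell = shared_kb_per_64_cores(MAXWELL)
pascal = shared_kb_per_64_cores(PASCAL)
print(f"Maxwell: {maxwell:.0f} KB, Pascal: {pascal:.0f} KB per 64 cores")
print(f"Effective gain: {pascal - maxwell:.0f} KB")
```

The nominal figure drops from 96KB to 64KB. Pascal halves the cores per SM while keeping 64KB of shared memory, so the per-core figure actually goes up.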
For other details, be sure to check out our earlier write-up on the Pascal architecture.
With the reveal of the Tesla P100, Nvidia has taken the wraps off their new Pascal architecture. Pascal was originally set to debut last year, but delays with 16nm kept it from becoming a reality, leaving Nvidia on 28nm with Maxwell. Now that Pascal is finally here, we are getting an architecture that combines the gaming abilities of Maxwell with much improved compute performance. The new Unified Memory and Compute Preemption features are the main highlights.
First off, Pascal changes the SM (Streaming Multiprocessor) configuration yet again. Kepler featured 192 CUDA cores per SM, Maxwell had 128 and Pascal now has 64. Reducing the number of CUDA cores per SM gives finer-grained control over compute tasks and helps ensure higher efficiency. Interestingly, 64 is also the number of cores GCN has in each CU, AMD’s equivalent to the SM. The TMU to CUDA core ratio remains the same as Maxwell, with 4 TMUs per SM instead of 8, in line with the halving of cores per SM.
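The Python sketch below is a quick sanity check on these ratios. The 3584-core total is only an example figure chosen for illustration. It shows that splitting the same core budget into 64-core SMs gives twice as many SMs, each with its own schedulers, while the number of cores per TMU stays the same.

```python
# Per-SM configurations quoted in the article.
SM_CONFIGS = {
    "Maxwell": {"cuda_cores": 128, "tmus": 8},
    "Pascal": {"cuda_cores": 64, "tmus": 4},
}


def cores_per_tmu(config: dict) -> float:
    """CUDA cores served by each texture mapping unit."""
    return config["cuda_cores"] / config["tmus"]


def sms_needed(total_cores: int, config: dict) -> int:
    """SMs required to reach a given core count (assumes it divides evenly)."""
    return total_cores // config["cuda_cores"]


for name, config in SM_CONFIGS.items():
    print(f"{name}: {cores_per_tmu(config):.0f} cores per TMU, "
          f"{sms_needed(3584, config)} SMs for 3584 cores")
```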
For compute, the gains mostly come from increasing the number of FP64, or Double Precision (DP), CUDA cores. DP is important for scientific and compute workloads, though games rarely make use of it. Kepler started cutting out some FP64 units and Maxwell went even further, with virtually no FP64 even in the Teslas. This was one reason why Maxwell cards were so efficient, and Nvidia only managed to hold onto their compute leadership thanks to CUDA and their Single Precision performance.
With Pascal, the ratio of SP to DP units goes to 2:1, a far larger share of DP than Maxwell’s 32:1 or Kepler’s 3:1. GP100 in particular has about 50% of its die space dedicated to FP32, about 25% to DP and the last 25% split between LD/ST units and SFUs. This suggests that Pascal won’t be changing much in terms of gaming performance. The only gains will come from a slight increase in efficiency due to the smaller SMs and the die shrink to 16nmFF+. GeForce variants of Pascal may have their FP64 units trimmed to cram in more FP32 resources, but again, most of the gains will be due to increased density.
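Assume every FP32 and FP64 unit completes one operation per clock at the same clock speed. Under that assumption, the unit ratio translates directly into peak double precision throughput relative to single precision. Here is a minimal Python sketch of that simplification, using the ratios quoted above.

```python
from fractions import Fraction

# FP32:FP64 unit ratios quoted in the article, stored as (FP32 units, FP64 units).
SP_DP_RATIOS = {"Kepler": (3, 1), "Maxwell": (32, 1), "Pascal": (2, 1)}


def dp_fraction_of_sp(sp_units: int, dp_units: int) -> Fraction:
    # Simplifying assumption: each unit retires one FMA per clock at the same
    # clock speed, so peak FP64 / peak FP32 equals the unit ratio.
    return Fraction(dp_units, sp_units)


for arch, (sp, dp) in SP_DP_RATIOS.items():
    frac = dp_fraction_of_sp(sp, dp)
    print(f"{arch}: FP64 peak is {frac} of FP32 peak ({float(frac):.1%})")
```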
Lastly, Pascal brings unified memory forward to allow threads to better share information. This comes along with a larger L2 cache and more than double the total register file capacity. P100, the first Pascal chip, also uses HBM2, with 16GB of VRAM over a 4096-bit bus for a peak bandwidth of 720 GB/s. For CUDA compute tasks, the new Unified Memory model allows Pascal GPUs to utilize the entire system memory pool with global coherency. This is one way to counter AMD’s advances with HSA and GCN, as well as Intel’s Xeon Phi.
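Peak memory bandwidth is the bus width times the per-pin data rate, divided by 8 to convert bits to bytes. The Python sketch below works backwards from the quoted P100 figures to find the per-pin rate they imply. That rate is derived arithmetic, not a published memory clock.

```python
BUS_WIDTH_BITS = 4096        # P100 memory bus width quoted above
PEAK_BANDWIDTH_GB_S = 720    # P100 peak bandwidth quoted above


def per_pin_rate_gbit(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) implied by a peak bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


def bandwidth_gb_s(bus_width_bits: int, pin_rate_gbit: float) -> float:
    """Peak bandwidth (GB/s) from bus width and per-pin data rate."""
    return bus_width_bits * pin_rate_gbit / 8


rate = per_pin_rate_gbit(PEAK_BANDWIDTH_GB_S, BUS_WIDTH_BITS)
print(f"Implied per-pin rate: {rate:.3f} Gbit/s")
print(f"Round trip: {bandwidth_gb_s(BUS_WIDTH_BITS, rate):.0f} GB/s")
```

This wide-but-slow bus is the defining trait of HBM. The implied per-pin rate is well below GDDR5’s, and the sheer bus width makes up the difference.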
Overall, Pascal looks to be an evolutionary update for Nvidia. Perhaps Nvidia has reached the same point as Intel, making only incremental progress. In other ways, though, the reduction in SM size has great potential and provides a more flexible framework to build GPUs on. Now all we are waiting for is for the chips to finally drop.
After revealing their next flagship Tesla earlier, Nvidia has let loose with a few more details and specifications. Based on the new Pascal architecture, the P100 will be utilizing TSMC’s latest 16nmFF+ process. As we know from the keynote, the chip will feature 15.3 billion transistors and the latest HBM2 memory. The P100 also features what Nvidia is calling the “5 miracles”.
First off, the P100 will run at an impressive 1328 MHz base clock and 1480 MHz boost. This is high for a professional Tesla card, though well in line with GeForce clocks. The card won’t be using the full GP100 die with 60 SMs and 3840 CUDA cores; rather, it will use a cut-down version with 56 SMs and 3584 cores. This mirrors Kepler’s launch, where the cut-down Titan came before the Titan Black. In addition to the usual FP32 CUDA cores, there are also 1792 FP64 CUDA cores for double precision work. This gives a 2:1 SP:DP ratio, a higher proportion of DP than anything from Kepler or Maxwell. The P100 also has 224 TMUs and massive amounts of cache and register files.
Next, we have the massive 610 mm² die on 16nmFF+. About 50% of that is FP32 CUDA cores, 25% is FP64 and the rest goes to other parts. This means that despite the massive die size, the P100 and GP100 derivatives won’t be great gamers, as games generally only use FP32 CUDA cores. There may, though, be a GP100 variant that swaps out the FP64 cores for FP32 ones. Even saddled with compute hardware, GP100 should still beat the Titan X by a good margin. TDP is a relatively tame 300W, as expected from the use of 16nm and 16GB of HBM2.
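The headline TFLOPs figures come from a simple formula: cores × 2 FLOPs per clock (one fused multiply-add) × clock speed. The Python sketch below applies it to the P100 numbers above, using the 1480 MHz boost clock and the 300W TDP. Real workloads will land below these theoretical peaks.

```python
def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Theoretical peak: each core retires one fused multiply-add (2 FLOPs) per clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


BOOST_MHZ = 1480
TDP_W = 300

fp32 = peak_tflops(3584, BOOST_MHZ)  # FP32 CUDA cores on P100
fp64 = peak_tflops(1792, BOOST_MHZ)  # FP64 CUDA cores on P100

print(f"FP32: {fp32:.1f} TFLOPS, FP64: {fp64:.1f} TFLOPS")
print(f"FP32 efficiency at TDP: {fp32 * 1000 / TDP_W:.1f} GFLOPS/W")
```

These results match the roughly 10.6 TFLOPS FP32 and 5.3 TFLOPS FP64 that Nvidia quotes for the P100.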
Finally, most marketing statements are hyperbole and the “5 miracles” are no exception. They are the Pascal Architecture, 16nm FinFET, CoWoS with HBM2, NVLink, and New AI Algorithms. Honestly, none of these are really that amazing on their own, and all of them were expected. Combining all of them in one go on such a massive chip, though, is pretty amazing. While the P100 will be shipping soon, don’t expect many until Q1 2017.
With both the Pascal announcement and GeForce launch coming in the next 2 months, more information is being leaked about the upcoming Nvidia cards. According to the latest rumour, the first GeForce Pascal card to launch, the GTX 1080, will not be as impressive as many had hoped. As expected, Nvidia is keeping with their tradition of launching the mainstream GP104 die first in order to maximise yields and profits.
Utilizing GP104 on TSMC’s 16nmFF+ process, the GTX 1080 may well be the fastest Nvidia card on the market until the bigger GP100-based GeForce cards launch later. Despite the boost in performance, it appears that Nvidia will be sticking to 8GB of conventional GDDR5X rather than the HBM2 some had suggested. While GDDR5X does have some disadvantages, it is a decent upgrade over GDDR5 and allows for an earlier launch than HBM2 would, as production of those chips is still ramping up.
Furthermore, the leak specifies the display outputs as DisplayPort x2, HDMI x1 and DVI x1, along with only a single 8-pin PCIe power connector. This limits power to 225W, but with the new architecture and 16nmFF+, this may still allow the card to trade blows with the 980Ti. The launch date is reported as May 27th, just before Computex. Big Pascal GP100 is set to launch before that date, though, so stay tuned!
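The 225W ceiling follows from the PCIe power limits. The x16 slot supplies up to 75W, a 6-pin connector up to 75W and an 8-pin connector up to 150W. Here is a small Python sketch; the connector table and helper name are just for illustration.

```python
from typing import Sequence

SLOT_W = 75  # PCIe x16 slot
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}  # auxiliary PCIe power connectors


def board_power_limit(connectors: Sequence[str]) -> int:
    """In-spec board power: slot power plus each auxiliary connector."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


print(board_power_limit(["8-pin"]))           # the rumoured GTX 1080 layout
print(board_power_limit(["8-pin", "6-pin"]))  # reference 980Ti layout, for comparison
```

This figure is the most the card can draw while staying in spec, not its actual power consumption.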
Last month we received word that the first Pascal chips would be launching a bit sooner than expected. At that time, GP100 was expected to drop in April and GP104 in June. According to the latest rumors, it looks like that timetable was accurate, with the GP100 based Tesla chip coming in April around GTC. What’s more, we’re getting more details about when the rest of Nvidia’s Pascal lineup will launch.
As with the previous report, GP104 will arrive in June, and it looks like the GTX lineup will be based on it, with both the GTX 1080 and 1070 using GP104. Near the end of the year, in Q4, we can expect GP106 and GP107. These will be lower-end chips and will likely power the GTX 1060 and 1050. Finally, we have the Titan, which will use GP100, along with GP108 in early 2017. This follows Nvidia’s new trend of releasing a GTX x80 first, followed by the Titan, then finally a GTX x80Ti. While it’s good for Nvidia’s sales, it pushes high-end users onto a quicker upgrade schedule than if all the cards launched at the same time.
The Tesla launching first makes a whole lot of sense, as enterprise users can pay the high premiums for early HBM2 and 16nm. A June launch for GP104 may point to it using HBM2, as the timeline is a bit tight for GDDR5X, though still doable. The biggest question is how well Pascal will perform, as it is a stopgap architecture between Maxwell and Volta, something like a Maxwell v2, though the die shrink to 16nm should bring some great gains in and of itself.
For those waiting on Nvidia’s next-generation GPUs, the wait may not be as long as expected. Last week, we found out that Pascal would be arriving a bit sooner than expected, early in 2H 2016 rather than late in 2016. Now, it looks like Nvidia may be moving even faster than those rumours suggested, with GP100 arriving in April and GP104 following 2 months later in June. What’s more, the GTX 1080 will also debut in June and will reportedly be based on GP104; perhaps we’ll see it at Computex 2016?
While GP100, or Big Pascal, will launch first in April, that is only for the Titan and various enterprise models. This is in line with what Nvidia has done in the past, launching higher-margin models first in order to reduce their risk and grab as much of the early adopter crowd’s cash as possible. Later on, the more mainstream GP104 will follow with gaming-oriented GeForce models with the compute hardware cut out. The biggest change is that the Titan will be launching before the GeForce this time.
If the GTX 1080 is based on GP104 as rumoured, this would suggest that a GTX 1080Ti based on GP100 would arrive later on, just like what happened with the 9xx and 6xx series. For those looking to get the very best gaming card of the next generation, waiting may be a smart move. AMD is also set to launch their own Polaris GPUs around the same time, though it looks like Nvidia may beat them to the punch with GP100.
Even as this generation’s GPUs continue to fly off the shelves, Nvidia is already gearing up for their Pascal launch. Despite being quieter than AMD, it looks like Nvidia will launch their Pascal cards around the same time as AMD’s Polaris, in 2H 2016. What’s more, 2H 2016 will see Nvidia’s flagship Pascal GPU, based on TSMC’s 16nmFF+ process and utilizing HBM2. This is still a rumour right now, but it does fit the time frame, since 1H 2016 would be too soon and 2017 too late.
The biggest question is what “flagship” means exactly. Ever since the GTX 680 launched, Nvidia has been playing around with the word flagship. Traditionally, big dies like GF100 would launch first, with the smaller mainstream GF104 launching after. Kepler and Maxwell saw that switch, with GK104 and GM204 launching ahead of GK110 and GM200 respectively. This suggests that the so-called “flagship” may only be GP104 and not GP100. Even if it is GP100, it may well be a cut-down version, similar to how the GTX 780 was a cut-down variant that preceded the GTX 780Ti. This strategy does maximize sales for Nvidia but isn’t that great for consumers.
Whatever the card is, be it GP104 or GP100, it is going to use HBM2, giving it at least 512GB/s with 8GB of VRAM, but potentially much more at 16-32GB with 1TB/s+ of bandwidth. With AMD set to launch Polaris around the same time, Q3 2016 should make for exciting times as a slew of new GPUs arrive.
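These ranges line up with how HBM2 scales with the number of stacks. The Python sketch below assumes the JEDEC HBM2 figures of a 1024-bit interface per stack running at up to 2 Gbit/s per pin. It also assumes either 4GB or 8GB per stack. Actual products may clock lower.

```python
BITS_PER_STACK = 1024   # HBM2 interface width per stack
PIN_RATE_GBIT = 2.0     # assumed per-pin data rate, Gbit/s


def hbm2_bandwidth_gb_s(stacks: int, pin_rate_gbit: float = PIN_RATE_GBIT) -> float:
    """Peak bandwidth in GB/s for a given number of HBM2 stacks."""
    return stacks * BITS_PER_STACK * pin_rate_gbit / 8


for stacks in (2, 4):
    capacities = [stacks * gb for gb in (4, 8)]  # assumed 4GB or 8GB stacks
    print(f"{stacks} stacks: {hbm2_bandwidth_gb_s(stacks):.0f} GB/s, "
          f"{capacities[0]}-{capacities[1]}GB")
```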
It’s no secret that Nvidia is working on their next-generation graphics hardware. Even with their flagship 980Ti and Titan series of cards doing well, there’s always going to be something better on the horizon. The next Nvidia hardware will be powered by the new GP100 silicon, which has reportedly entered its testing phase, where it is being put through its paces to judge its viability for future consumer hardware.
This means that Nvidia now has a few completed chips which may one day make it into their next-gen graphics cards. GP100 is based on the Pascal architecture and is set to feature more than 17 billion transistors, made possible by TSMC’s 16nm FinFET+ process. Of course, the excitement doesn’t end there, as the new hardware features an HBM2 memory interface, which leaves room for up to 32GB of extremely fast memory, although it’s more likely this generation will not go above 16GB.
Early predictions, which are often fairly accurate, expect that Pascal GP100 hardware with HBM2 memory could outperform the current Titan X by 60-90%, and I honestly don’t doubt it.
With AMD and Nvidia cooking up HBM2 cards, the next generation of the GPU wars is going to be a lot of fun!
Thank you TechPowerUp for providing us with this information.
While there had been some rumours that Nvidia would turn to Samsung’s 14nm process for GPUs, it appears those were wrong. For the longest time, Nvidia has relied on TSMC to manufacture their chips and it appears this relationship is continuing. Set to launch next year, Nvidia’s Pascal architecture will reportedly use TSMC’s latest 16nm process. This will be the same process used for AMD’s upcoming Greenland GPUs.
As with AMD’s Greenland, Pascal will be a new architecture with new features and other improvements. Most notably, Pascal will be paired with HBM2, allowing for up to 16GB of VRAM and 1TB/s of memory bandwidth. Other additions include support for NVLink, Nvidia’s GPU interconnect, and mixed precision support. With Kepler and later Maxwell, Nvidia had been stripping out compute power, leading to better power efficiency but at the cost of compute performance. Pascal is set to fix this and bring Nvidia’s compute power back on par with AMD’s, though likely at the cost of efficiency.
Even though Samsung lost out this time, the simple fact that they were in competition with TSMC speaks volumes. TSMC has been falling slightly behind in process technology while also trying to meet Apple’s insatiable demand. In some ways, using Samsung would have made sense, as Samsung is also set to be a major HBM2 supplier, which would have simplified production for Nvidia. In the end though, it seems that TSMC’s long experience with Nvidia and GPUs won out.
After being stuck for what seems like forever on 28nm, we’re finally getting a glimpse of the monsters set to arrive with TSMC’s 16nm process. Code-named Pascal, Nvidia’s top-end 16nm GPU is reportedly pushing 17 billion transistors and is set to replace the current GM200.
To put that number in context, the current Titan X only clocks in at about 8 billion transistors, making the “GP100” Pascal more than twice as complex and likely twice as dense. Even AMD’s monster Fury X only packs 8.9 billion transistors, which is still far behind Pascal. Combined with a reported 32GB of HBM2 VRAM on the highest SKU, Pascal may show a massive jump in performance compared to our current chips.
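The comparison is simple division. Here is a Python sketch using the transistor counts quoted above; note that the 17 billion GP100 figure is still only a report.

```python
# Transistor counts in billions, as quoted in the article.
TRANSISTORS_BN = {
    "GP100 (reported)": 17.0,
    "Titan X (GM200)": 8.0,
    "Fury X (Fiji)": 8.9,
}

pascal = TRANSISTORS_BN["GP100 (reported)"]
for chip, count in TRANSISTORS_BN.items():
    if chip != "GP100 (reported)":
        print(f"GP100 vs {chip}: {pascal / count:.2f}x the transistors")
```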
These gains are only possible with the new 16nm FinFET process from TSMC. Being nearly twice as dense as 28nm, 16nm would allow Nvidia and AMD to double transistor counts with only a slightly larger die. Combined with the better power efficiency of a smaller process, FinFETs and HBM, efficiency should also improve despite the higher transistor count. Despite being called 16nm, TSMC’s process is closer to Intel’s 22nm or Samsung’s 20nm designs, so there is certainly even more room to shrink in the future.
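Node names no longer map neatly onto physical dimensions, but an idealised calculation shows why “nearly twice as dense” fits the point above. The sketch assumes density scales with the inverse square of the node name, which is a big simplification. On that basis, a true 16nm shrink from 28nm would give roughly 3x the density, while a 20nm-class shrink gives roughly 2x.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealised density gain: transistor area scales with feature size squared."""
    return (old_nm / new_nm) ** 2


for label, new_nm in (("true 16nm", 16), ("20nm-class", 20)):
    print(f"28nm -> {label}: ~{ideal_density_gain(28, new_nm):.2f}x density")
```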
While CPUs have not benefitted as much from increased transistor counts, GPUs, built from many relatively simple cores working in parallel, find it easier to make full use of the extra transistors. With DX12 and Vulkan on the way, as well as the new architectures from Nvidia and AMD, these new technologies should create a perfect storm to push GPU performance and gaming forward.
Thank you Fudzilla for providing us with this information.