When AMD and Nvidia release all of those TFLOPs numbers, it’s important to realize that those are theoretical maximums. For a chip to get anywhere near that number in practice, its architecture has to be efficient enough to keep its execution units busy. When Pascal first launched, Nvidia released some details about what the architecture would look like. With the release of the architecture white paper, there are a few additional highlights worth noting.
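To make "theoretical maximum" concrete, here is a minimal sketch of how a peak FP32 figure is usually derived: core count, times operations per core per clock, times clock speed. It assumes each core retires one fused multiply-add (counted as 2 FLOPs) per clock. The function name and the sample inputs (3584 cores at a 1480 MHz boost clock, the commonly quoted Tesla P100 figures) are illustrative assumptions, not numbers from this article.

```python
def peak_tflops(fp32_cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPs.

    Assumes every core issues one fused multiply-add (2 FLOPs) on every
    clock cycle, which real workloads rarely sustain.
    """
    flops_per_second = fp32_cores * flops_per_core_per_clock * clock_mhz * 1e6
    return flops_per_second / 1e12


# Illustrative inputs (assumed, not taken from the article).
print(f"{peak_tflops(3584, 1480):.1f} TFLOPs")  # prints "10.6 TFLOPs"
```

Nothing in this calculation accounts for stalls, memory bandwidth, or scheduling, which is why the architectural changes matter for how close real workloads get to that ceiling.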
First off, we know that Pascal has cut the SM (Streaming Multiprocessor) down from 128 FP32 cores to 64. This allows for finer-grained distribution of work, and because each SM keeps the same register file size and other support hardware, each core has more resources to draw on and overall throughput increases. Nvidia has also tweaked the SM so datapaths are more streamlined and sharing information within the SM takes less power and hardware. The scheduler has also been updated to keep the SM constantly fed. The L2 cache has been increased from 3MB to 4MB, and each SM gets a dedicated shared memory space of 64KB. This is lower than the 96KB per SM in Maxwell, but a Pascal SM has half as many cores. Given the doubled SM count relative to a same-size Maxwell chip, 64 cores that had 48KB of shared memory in Maxwell now get 64KB, which is effectively an increase of 16KB per SM. Finally, Nvidia detailed the P100 interposer layout for HBM2, something for us to look forward to when HBM2 finally arrives.
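The shared memory comparison is easier to follow when both architectures are normalized to the same number of cores. The sketch below does that arithmetic using only the per-SM figures quoted above; the dictionary layout and the helper name are hypothetical, chosen just for illustration.

```python
# Per-SM figures as quoted in the article.
MAXWELL_SM = {"fp32_cores": 128, "shared_kb": 96}
PASCAL_SM = {"fp32_cores": 64, "shared_kb": 64}


def shared_kb_for_cores(sm: dict, cores: int) -> float:
    """Shared memory (KB) available to a given number of FP32 cores."""
    return sm["shared_kb"] * cores / sm["fp32_cores"]


# Normalize both architectures to one Pascal-sized SM (64 cores).
maxwell_kb = shared_kb_for_cores(MAXWELL_SM, 64)  # 48.0
pascal_kb = shared_kb_for_cores(PASCAL_SM, 64)    # 64.0
print(f"Maxwell: {maxwell_kb:.0f}KB, Pascal: {pascal_kb:.0f}KB, "
      f"gain: {pascal_kb - maxwell_kb:.0f}KB per 64 cores")
```

Normalized to 128 cores instead, the same comparison comes out to 96KB versus 128KB, so the relative gain is about a third either way.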
For other details, be sure to check out our earlier write-up on the Pascal architecture.