Ashes of the Singularity DirectX 12 Graphics Performance Analysis

Introduction


Ashes of the Singularity is a futuristic real-time strategy game offering frenetic contests on a large scale. Huge numbers of units scattered across varied environments create an enthralling experience built around complex strategic decisions. Throughout the game, you explore unique planets and engage in massive battles. This bitter war between the human race and a masterful artificial intelligence revolves around an invaluable resource known as Turinium. If you’re into the RTS genre, Ashes of the Singularity should provide hours of entertainment. While the game itself is worthy of widespread media attention, the engine’s support for DirectX 12 and asynchronous compute has made it a hot topic among hardware enthusiasts.

DirectX 12 is a low-level API with reduced CPU overhead, and it has the potential to revolutionise the way games are optimised for numerous hardware configurations. By contrast, DirectX 11 is far less efficient, and many mainstream titles scaled poorly, failing to properly utilise the potential of current graphics hardware. On another note, DirectX 12 allows users to pair GPUs from competing vendors and run multi-GPU setups without relying on driver profiles. In theory, developers can achieve broader optimisation and extract extra performance with the latest version of the API.

Of course, Vulkan is another alternative, one which works across operating systems and adopts an open-source ideology. However, the focus will likely remain on DirectX 12 for the foreseeable future unless users suddenly become reluctant to upgrade to Windows 10. Even though the adoption rate is impressive, a large number of PC gamers are still on Windows 7, 8 and 8.1. Therefore, it seems prudent for developers to continue with DirectX 11 and offer a DirectX 12 renderer as an optional extra. Arguably, the real gains from DirectX 12 will only arrive once its predecessor is abandoned completely. That will probably take a considerable amount of time, which suggests the first DirectX 12 games may show smaller performance benefits than later titles.

Asynchronous compute allows graphics cards to execute multiple workloads simultaneously and extract extra performance. AMD’s GCN architecture has extensive support for this technology. By contrast, there is a heated debate over whether NVIDIA products can utilise asynchronous compute effectively at all. Technically, AMD GCN graphics cards contain 2-8 asynchronous compute engines with 8 queues each, depending on the model, and can switch contexts with single-cycle latency. Maxwell revolves around two pipelines: one designed for high-priority workloads and another with 31 queues. Most importantly, NVIDIA cards can only “switch contexts at draw call boundaries”. This makes the switching process slower and gives AMD a major advantage. NVIDIA dismissed the early performance numbers from Ashes of the Singularity because the game was still in development. Now that the game has exited beta, we can examine the performance numbers with optimisations complete.
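The trade-off described above can be sketched with a toy frame-time model. The millisecond figures below are illustrative assumptions, not measurements from either vendor; the point is simply that overlapping workloads hides the shorter one, and that slow context switching eats into the gain:

```python
# Toy model of per-frame time with and without asynchronous compute.
# All millisecond values are illustrative assumptions, not benchmarks.

def frame_time_serial(graphics_ms, compute_ms):
    """Graphics and compute run back to back on a single queue."""
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms, compute_ms, switch_overhead_ms):
    """Compute overlaps graphics: only the longer workload plus any
    context-switch overhead determines the frame time."""
    return max(graphics_ms, compute_ms) + switch_overhead_ms

graphics, compute = 12.0, 4.0  # hypothetical per-frame workloads

serial = frame_time_serial(graphics, compute)            # back-to-back
fine_grained = frame_time_async(graphics, compute, 0.1)  # cheap switching
coarse = frame_time_async(graphics, compute, 2.0)        # draw-call-boundary switching

print(f"serial: {serial:.1f} ms, fine-grained: {fine_grained:.1f} ms, "
      f"coarse: {coarse:.1f} ms")
```

With cheap switching the 4 ms of compute effectively disappears behind the graphics work; with expensive switching much of that saving is lost, which is the crux of the GCN-versus-Maxwell debate.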

Take a Look at the New Nvidia Pascal Architecture

With the reveal of the Tesla P100, Nvidia has taken the wraps off its new Pascal architecture. Originally set to debut last year, delays with the 16nm process kept Pascal from becoming a reality, leading to Maxwell on 28nm instead. Now that Pascal is finally here, we are getting an architecture that combines the gaming abilities of Maxwell with much-improved compute performance. The new Unified Memory model and Compute Preemption are the main highlights.

First off, Pascal changes the SM (Streaming Multiprocessor) configuration yet again. Kepler featured 192 CUDA cores per SM, Maxwell had 128 and Pascal now has 64. By reducing the number of CUDA cores per SM, Nvidia gains finer-grained control over compute tasks and ensures higher efficiency. Interestingly, 64 is also the number of cores GCN has in each CU, AMD’s equivalent of the SM. The TMU-to-CUDA-core ratio remains the same as Maxwell’s, with 4 TMUs per SM instead of 8, in line with the drop in cores per SM.
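A quick way to see how the per-SM count translates into whole chips is to multiply it out. The 56-SM figure below is the published Tesla P100 configuration, used here as an assumption for illustration:

```python
# CUDA cores per SM across generations (figures from the article).
cores_per_sm = {"Kepler": 192, "Maxwell": 128, "Pascal": 64}

def total_cuda_cores(arch, sm_count):
    """Total CUDA cores for a chip with the given number of SMs."""
    return cores_per_sm[arch] * sm_count

# Tesla P100 ships with 56 SMs enabled (assumed spec for illustration).
print(total_cuda_cores("Pascal", 56))   # 3584
# Maxwell's GM200 used 24 of its larger SMs.
print(total_cuda_cores("Maxwell", 24))  # 3072
```

Roughly similar core counts, but spread over more than twice as many SMs on Pascal, which is where the finer scheduling granularity comes from.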

For compute, the gains mostly come from increasing the number of FP64, or double-precision, CUDA cores. DP is important for scientific and compute workloads, though games rarely make use of it. Kepler started cutting back FP64 units and Maxwell went even further, with virtually no FP64 even in the Teslas. This was one reason Maxwell cards were so efficient, and Nvidia only managed to hold onto its leadership in compute thanks to CUDA and its single-precision performance.

With Pascal, the ratio of SP to DP units goes to 2:1, a far richer FP64 allocation than Maxwell’s 32:1 and Kepler’s 3:1. GP100 in particular has about 50% of its die space dedicated to FP32, about 25% to DP and the last 25% split between LD/ST units and SFUs. This suggests that Pascal won’t change much in terms of gaming performance; the only gains will come from a slight increase in efficiency due to the smaller SMs and the shrink to TSMC’s 16nm FF+ process. GeForce variants of Pascal may have their FP64 units trimmed to cram in more FP32 resources, but again, most of the gains will come from increased density.
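As a rough sanity check on the 2:1 ratio, peak throughput can be estimated from core count and clock speed. The GP100-class figures below (3584 FP32 cores, ~1.48 GHz boost) are assumptions drawn from the Tesla P100 reveal, not numbers stated in this article:

```python
def peak_flops(cores, clock_ghz):
    """Peak throughput assuming one FMA (2 FLOPs) per core per cycle."""
    return cores * 2 * clock_ghz * 1e9

# Assumed GP100-class figures: 3584 FP32 cores, ~1.48 GHz boost clock.
fp32_cores = 3584
fp64_cores = fp32_cores // 2  # the 2:1 SP:DP ratio described above

fp32 = peak_flops(fp32_cores, 1.48)
fp64 = peak_flops(fp64_cores, 1.48)
print(f"FP32: {fp32 / 1e12:.1f} TFLOPS, FP64: {fp64 / 1e12:.1f} TFLOPS")
```

Under those assumptions the chip lands at roughly 10.6 TFLOPS single precision and 5.3 TFLOPS double precision, versus the 1/32 rate a Maxwell part of similar size would manage.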

Lastly, Pascal improves how threads share information, with larger L2 caches and a more than doubled register file. P100, the first Pascal chip, also uses HBM2, with 16GB of VRAM on a 4096-bit bus for a peak bandwidth of 720 GB/s. For CUDA compute tasks, the new Unified Memory model allows Pascal GPUs to address the entire system memory pool with global coherency. This is one way to counter AMD’s advances with HSA and GCN, and Intel’s Xeon Phi.
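The quoted bandwidth figure can be sanity-checked against the bus width: a 4096-bit bus moves 512 bytes per transfer, so 720 GB/s implies an effective data rate of about 1.4 GT/s, in line with early HBM2 stacks.

```python
# Back out the effective HBM2 data rate from the quoted P100 figures.
bus_width_bits = 4096
bytes_per_transfer = bus_width_bits // 8  # 512 bytes moved per transfer

quoted_bandwidth_gb_s = 720.0
data_rate_gt_s = quoted_bandwidth_gb_s / bytes_per_transfer

print(f"effective data rate: ~{data_rate_gt_s:.2f} GT/s")  # ~1.41 GT/s
```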

Overall, Pascal looks to be an evolutionary update for Nvidia. Perhaps Nvidia has reached the point Intel has, making incremental progress. In other ways, though, the reduction in SM size has great potential and provides a more flexible framework for building GPUs. Now all that remains is for the chips to finally arrive.

NVIDIA Pascal Rumoured to Struggle with Asynchronous Compute

NVIDIA’s new Pascal GPU micro-architecture – billed as 10x faster than the previous Maxwell iteration, and set for release in retail graphics cards later this year – is rumoured to be having problems when dealing with Asynchronous Compute code in video games.

“Broadly speaking, Pascal will be an improved version of Maxwell, especially about FP64 performances, but not about Asynchronous Compute performances,” according to Bits and Chips. “NVIDIA will bet on raw power, instead of Asynchronous Compute abilities.”

“This means that Pascal cards will be highly dependent on driver optimizations and games developers kindness,” Bits and Chips adds. “So, GamesWorks optimizations will play a fundamental role in company strategy. Is it for this reason that NVIDIA has made publicly available some GamesWorks codes?”

This report has not been independently verified, but if it is true, it could spell bad news for NVIDIA, especially since, despite fears to the contrary, its Maxwell architecture was capable of processing async compute, and AMD’s Radeon graphics cards are currently leading all DirectX 12 benchmarks.

We shouldn’t have to wait too long before we find out, though, with WCCFTech reporting that NVIDIA’s flagship Pascal graphics card, the “GTX 1080”, will be unveiled at GTC 2016, which takes place in Silicon Valley, California, from 4th-7th April.