Take a Look at the New Nvidia Pascal Architecture

With the reveal of the Tesla P100, Nvidia has taken the wraps off their new Pascal architecture. Originally set to debut last year, Pascal was held back by delays with 16nm, leading Nvidia to release Maxwell on 28nm instead. Now that Pascal is finally here, we are getting an architecture that combines the gaming abilities of Maxwell with much improved compute performance. The new Unified Memory and Compute Preemption features are the main highlights.

First off, Pascal changes the SM (Streaming Multiprocessor) configuration yet again. Kepler featured 192 CUDA cores per SM, Maxwell had 128 and Pascal will now have 64. By reducing the number of CUDA cores per SM, Nvidia gains finer-grained control over compute tasks and ensures higher efficiency. Interestingly, 64 is also the same number of cores GCN has in each CU, AMD’s equivalent of the SM. The TMU to CUDA core ratio remains the same as Maxwell’s, with 4 TMUs per SM instead of 8, in line with the drop in cores per SM.
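To make the granularity argument concrete, here is a quick sketch. The 2048-core budget is a hypothetical figure of ours, not a real chip; the cores-per-SM counts are the ones above:

```python
# Toy illustration: how many independently schedulable SMs a fixed
# CUDA-core budget yields under each generation's SM size.
CORES_PER_SM = {"Kepler": 192, "Maxwell": 128, "Pascal": 64}

BUDGET = 2048  # hypothetical total CUDA-core budget, for illustration only

for arch, per_sm in CORES_PER_SM.items():
    sms = BUDGET // per_sm
    print(f"{arch}: {per_sm} cores/SM -> {sms} SMs")
# Pascal carves the same budget into twice as many SMs as Maxwell,
# giving the scheduler finer-grained units to dispatch work to.
```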

For compute, the gains mostly come from increasing the number of FP64, or Double Precision, CUDA cores. DP is important for scientific and compute workloads, though games rarely make use of it. Kepler started cutting out some FP64 units and Maxwell went even further, with virtually no FP64 even in the Teslas. This was one reason why Maxwell cards were so efficient, and Nvidia only managed to hold onto their leadership in compute due to CUDA and their Single Precision performance.

With Pascal, the ratio of SP to DP units goes to 2:1, a far larger share of DP than Maxwell’s 32:1 and Kepler’s 3:1. GP100 in particular has about 50% of its die space dedicated to FP32, about 25% to DP and the last 25% split between LD/ST units and SFUs. This suggests that Pascal won’t be changing much in terms of gaming performance. The only gains will be from a slight increase in efficiency due to the smaller SMs and from the die shrink to 16nm FF+. GeForce variants of Pascal may have their FP64 units trimmed to cram in more FP32 resources but again, most of the gains will be due to increased density.
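As a rough, back-of-the-envelope illustration of what that 2:1 ratio means for throughput, consider the sketch below. The 3584 enabled cores and 1480MHz boost clock are Nvidia’s published Tesla P100 figures rather than numbers from this piece, so treat the result as an estimate:

```python
# Estimate P100 FP32/FP64 throughput from core count and clock.
# Assumes each CUDA core retires 2 FLOPs per cycle via fused
# multiply-add (FMA), the usual convention for peak figures.
fp32_cores = 3584        # enabled FP32 CUDA cores on Tesla P100 (assumed)
boost_clock_ghz = 1.480  # boost clock (assumed)
flops_per_cycle = 2      # one FMA = one multiply + one add

fp32_tflops = fp32_cores * flops_per_cycle * boost_clock_ghz / 1000
fp64_tflops = fp32_tflops / 2  # the 2:1 SP:DP ratio described above

print(f"FP32: ~{fp32_tflops:.1f} TFLOPS")  # ~10.6 TFLOPS
print(f"FP64: ~{fp64_tflops:.1f} TFLOPS")  # ~5.3 TFLOPS
```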

Lastly, Pascal brings forward unified memory to allow threads to better share information. This comes along with larger L2 caches and register files that have more than doubled in size. P100, the first Pascal chip, also uses HBM2, with 16GB of VRAM over a 4096-bit bus for a peak bandwidth of 720 GB/s. For CUDA compute tasks, a new Unified Memory model allows Pascal GPUs to utilize the entire system memory pool with global coherency. This is one way to tackle AMD’s advancement with HSA and GCN, as well as Intel’s Xeon Phi.
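As a sanity check on that 720 GB/s figure, peak memory bandwidth is just the bus width (in bytes) multiplied by the per-pin data rate. The ~1.4Gb/s HBM2 pin speed below is inferred from the quoted numbers rather than stated anywhere, so it is an assumption:

```python
# Peak bandwidth = (bus width in bytes) x (data rate per pin).
BUS_WIDTH_BITS = 4096  # P100's HBM2 bus width
PIN_RATE_GBPS = 1.4    # assumed effective data rate per pin (Gb/s)

bandwidth_gbs = (BUS_WIDTH_BITS / 8) * PIN_RATE_GBPS
print(f"Peak bandwidth: ~{bandwidth_gbs:.0f} GB/s")  # ~717 GB/s, i.e. ~720 GB/s
```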

Overall, Pascal looks to be an evolutionary update for Nvidia. Perhaps Nvidia has reached the point that Intel has, making incremental progress. In other ways though, the reduction in SM size has great potential and provides a more flexible framework for building GPUs. Now all we are waiting for is for the chips to finally drop.

AMD Loses Corporate Fellow Phil Rogers to Nvidia

AMD has been making corporate news lately, and much of it has not been great, to say the least. Today, AMD lost a member of their corporate team, Phil Rogers. Rogers has spent 21 years working with AMD/ATi, so his departure is quite surprising. What must be more galling to AMD and their fans is that he left to join Nvidia as their Chief Software Architect – Compute Server.

Rogers was one of the main driving forces behind AMD’s compute efforts, as was evident from his role heading System Architecture and Performance at AMD and managing the HSA initiative. He also served as a vice-president of the HSA Foundation, an effort AMD has been pushing hard with their APUs. Given his departure to Nvidia, it looks like the green team may be after his compute experience in general rather than HSA specifically, as Nvidia doesn’t have a CPU-heavy focus.

AMD has been bleeding quite a lot of staff lately, with some other notable departures, among them Jim Keller. In line with the 5% cut in staff, AMD may simply have cut salaries too deep, and Nvidia may have offered more. With Zen and Greenland largely complete, AMD will need to execute on their upcoming products in order to survive. Only then will the firm likely have the resources to bring back the talent it has lost.

Thank you to ComputerBase for providing us with this information.

Intel Windows HDMI Compute Stick Review

Introduction


There has been a portable “USB-powered, HDMI-displayed, AIO device” war going on since the arrival of low-cost Android boxes, which in turn brought us Windows boxes of the same form factor. The next generation of these devices arrived and shrank the form factor even more, bringing the full-fat Windows experience to a System on a Chip (SoC) device while maintaining the size of a large memory stick.

Amongst the many rebranded Chinese Android sticks to make it to Europe has been an official offering from Intel in the form of their “Compute Stick”, which serves as proof that they are aiming to claim a slice of this emerging market in both Linux and Windows flavours. Intel were kind enough to send us a Windows Compute Stick to put through its paces, and no sooner had they confirmed we would be getting one than it arrived.

Anyway, enough of the backstory; let’s get the shrink wrap off and get this fired up!

Specifications
  • Name: Intel Compute Stick (Windows Variant)
  • CPU: Intel Atom Z3735F (Quad Core 1.33GHz with 1.83GHz burst)
  • RAM: 2GB 1333MHz DDR3
  • Storage: 32GB eMMC
  • GPU: Intel Integrated Graphics (64MB)
  • LAN: None
  • WLAN: 802.11a/b/g/n with built-in Bluetooth 4.0
  • I/O: 1x USB 2.0, 1x HDMI 1.4
  • OS: Windows 8.1 32-bit
  • Dimensions: 103 x 12 x 37 mm (WxHxD)
  • Warranty: 2 Year
  • Price: £119.98

The box is very nicely packed, no bigger than you would get with a phablet-sized phone. There is a small black tab which, when pulled, slides the inner box out from the cover.

With the box slid out, the first thing you are presented with is the Compute Stick itself.

Removing the lining reveals multiple accessories underneath.

CPU-Z

GPU-Z

AMD R9 Nano Performance Indirectly Revealed

The AMD R9 Nano is a dinky little card that still packs a massive punch. It is said to pack 2x the performance per watt and 2x the performance density of AMD’s previous flagship card, the result being a 175-watt, six-inch-long beast of a graphics card.

We’re referring to teraflops here when describing the card’s “compute power” and how many of them it achieves. Teraflops don’t always translate directly into gaming performance, but they will give you a good idea.

How do we know the performance? Wccftech managed to apply some maths to the small amount of data that AMD revealed to us. They used the card’s TDP and performance per watt to work it out:

Since we know that performance per watt is FP32/TDP, we can go ahead and calculate the power efficiency of the R9 290X.
The R9 290X’s peak FP32 = 5.6 TFLOPs, in other words 5600 GFLOPs, and its TDP is 250W.
Perf/W = 5600 GFLOPs / 250W = 22.4 GFLOPs/W

We also know that the R9 Nano has 2x the perf/watt of the R9 290X, which means it’s:
2 x (5600 GFLOPs / 250W)
= 2 x 22.4 GFLOPs/W
= 44.8 GFLOPs/W
Thus the perf/watt rating of the R9 Nano is 44.8 GFLOPs/W.

Incidentally, we also have the TDP for the Nano, and that’s the last missing piece in the puzzle.

R9 Nano
Perf/watt = FP32 in GFLOPs (unknown) / TDP (175)
44.8 = FP32 (unknown) / 175
44.8 x 175 = FP32 (unknown)
44.8 x 175 = 7840 GFLOPs or 7.84 TFLOPs.
In summary, this means that the R9 Nano could pack a crazy 7.84 teraflops of compute power. That’s more than a Titan X can give. If bitcoin mining were still financially feasible with graphics cards, this one would by far be the most popular. It delivers nearly twice as many gigaflops per watt as the Titan X.
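For anyone who wants to re-run the numbers, here is Wccftech’s arithmetic as a short Python sketch; all of the inputs are the figures quoted above:

```python
# Derive the R9 Nano's peak FP32 throughput from the R9 290X's
# efficiency and AMD's claimed 2x performance-per-watt improvement.
R9_290X_FP32_GFLOPS = 5600  # 5.6 TFLOPs peak FP32
R9_290X_TDP_W = 250
R9_NANO_TDP_W = 175

perf_per_watt_290x = R9_290X_FP32_GFLOPS / R9_290X_TDP_W  # 22.4 GFLOPs/W
perf_per_watt_nano = 2 * perf_per_watt_290x               # 44.8 GFLOPs/W

nano_fp32_gflops = perf_per_watt_nano * R9_NANO_TDP_W
print(f"R9 Nano estimated FP32: {nano_fp32_gflops:.0f} GFLOPs "
      f"({nano_fp32_gflops / 1000:.2f} TFLOPs)")  # 7840 GFLOPs, or 7.84 TFLOPs
```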

For a definite verdict on whether this card will be good at gaming, we will have to wait for it to be released to the general population. AMD’s power and performance claims will need testing by an independent organisation, and that won’t happen until near the launch date.

Thank you to Wccftech for providing us with this information.

The Tiny Intel Compute Stick Is Now Available

The tiny, pocket-sized Intel Compute Stick, based on an Intel Atom quad-core processor, is available now. The little computer plugs directly into your HDMI port, and the current version runs Windows 8.1 with Bing, although Linux versions are planned as well. The pocket computer can now be bought from authorized dealers in most of the world, and best of all, it doesn’t even cost that much.

The Intel Compute Stick can transform any HDMI display into an entire computer capable of working with productivity apps such as Office or doing simple image editing, but it’s also perfectly suited to light gaming and streaming content, driving basic digital signage or enabling thin clients. It connects wirelessly, and you can hook it up to your existing 802.11b/g/n wireless network.

You’ll also get 2GB of memory and 32GB of onboard flash storage; the Compute Stick’s storage can be extended with microSD cards, and it also comes with a USB port. Wireless devices can also be connected to the stick through Bluetooth 4.0.

The Windows version is available now at e-tailers such as Amazon and Newegg, and Intel is also going to release an Ubuntu version later that comes with only 1GB of memory and 8GB of onboard storage. The Windows version will cost you $149, while the smaller Linux model is expected to cost just $110.

The Nvidia Titan Goes Pro With The Quadro K6000

Nvidia have revealed their most powerful weapon (I mean graphics card). The incredibly powerful Quadro K6000 has been shown to the world, and it’s set to shake up the world of professional graphics in a big way. The card is based around the GK110 GPU, albeit with every feature turned up to 11; the card packs a mighty 2880 stream processors and a whopping 12GB of memory, and it’s designed to be the best tool available for high-performance graphics and simulation work.

“The Nvidia Quadro K6000 GPU is the highest performance, most capable GPU ever created for the professional graphics market. It will significantly change the game for animators, digital designers and engineers, enabling them to make the impossible possible,” said Ed Ellett, senior vice president of the Professional Solutions Group at Nvidia.

Specifications:

  • GK110 graphics processing unit
  • 2880 stream processors
  • 12GB GDDR5 memory
  • 4x 4K display outputs

Most consumer solutions struggle to push 4K in a real-world environment, yet the K6000 will burn through four displays running at 3840 x 2160, making it an incredible resource for creative work, so long as you can afford the setup. Of course, if you have to ask how much something like this will cost, it’s likely because you’re in a job that doesn’t really need it; it will be incredibly expensive, though.

Naturally, the most powerful graphics card in the world will be very expensive, and rightly so; it’s the best there is! It blows the GTX Titan away with its brute-force performance, packing nearly 200 extra stream processors, and Nvidia say that the GPU delivers 5x higher compute performance and nearly double the graphics capability of its predecessor, the Quadro 6000, partly thanks to the 12GB of GDDR5, the world’s largest and fastest graphics memory.

The card will be available later this year from many major workstation providers and system integrators.

Thank you Xbitlabs for providing us with this information.

Image courtesy of Xbitlabs.