Take a Look at the New Nvidia Pascal Architecture

With the reveal of the Tesla P100, Nvidia has taken the wraps off of their new Pascal architecture. Pascal was originally set to debut last year, but delays with the 16nm process kept it from being a reality, leaving Maxwell to launch on 28nm instead. Now that Pascal is finally here, we are getting an architecture that combines the gaming abilities of Maxwell with much improved compute performance. The new Unified Memory model and compute preemption are the main highlights.

First off, Pascal changes the SM (Streaming Multiprocessor) configuration yet again. Kepler featured 192 CUDA cores per SM, Maxwell had 128 and Pascal will now have 64. By reducing the number of CUDA cores per SM, Nvidia gains finer-grained control over compute tasks and ensures higher efficiency. Interestingly, 64 is also the number of cores GCN has in each CU, AMD's equivalent of the SM. The TMU-to-CUDA-core ratio remains the same as Maxwell's: each SM now carries 4 TMUs instead of 8, in line with the halved core count.
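To make the generational shuffle concrete, here is a minimal CUDA sketch of how the total core count is typically recovered at runtime. The runtime only reports the SM count, so the cores-per-SM mapping is a hand-maintained lookup taken from the architecture whitepapers; coresPerSM below is our own illustrative helper, not a CUDA API.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Cores per SM by compute capability. The CUDA runtime does not expose
// this directly, so this table is maintained by hand (illustrative only).
static int coresPerSM(int major, int minor) {
    if (major == 3) return 192;               // Kepler
    if (major == 5) return 128;               // Maxwell
    if (major == 6 && minor == 0) return 64;  // Pascal GP100
    return -1;                                // unknown architecture
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int cores = coresPerSM(prop.major, prop.minor);
    if (cores > 0)
        printf("%s: %d SMs x %d cores/SM = %d CUDA cores\n",
               prop.name, prop.multiProcessorCount, cores,
               prop.multiProcessorCount * cores);
    else
        printf("%s: unknown architecture (%d.%d)\n",
               prop.name, prop.major, prop.minor);
    return 0;
}
```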

For compute, the gains mostly come from increasing the number of FP64, or double precision, CUDA cores. DP is important for scientific and compute workloads, though games rarely make use of it. Kepler started cutting out some FP64 units and Maxwell went even further, with virtually no FP64 even in the Teslas. This was one reason why Maxwell cards were so efficient, and Nvidia only managed to hold onto its leadership in compute thanks to CUDA and its single precision performance.

With Pascal, the ratio of SP to DP units falls to 2:1, a far richer FP64 allocation than Maxwell's 32:1 or Kepler's 3:1. GP100 in particular has about 50% of its die space dedicated to FP32, about 25% to DP and the last 25% split between LD/ST units and SFUs. This suggests that Pascal won't be changing much in terms of gaming performance. The only gains will come from a slight increase in efficiency due to the smaller SMs and from the shrink to the 16nm FinFET+ process. GeForce variants of Pascal may have their FP64 units trimmed to cram in more FP32 resources but, again, most of the gains will be down to increased density.
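As a quick sanity check on what that 2:1 split buys, the back-of-envelope sketch below works from the figures Nvidia quoted at the P100 reveal (3584 FP32 cores, 1792 FP64 cores and roughly a 1480 MHz boost clock; those are announced numbers we are assuming here, not anything confirmed for GeForce parts):

```cpp
#include <cstdio>

int main() {
    // Assumed GP100 reveal figures: 3584 FP32 cores, 1792 FP64 cores
    // (the 2:1 ratio) at ~1480 MHz boost. Each core retires one fused
    // multiply-add per clock, i.e. 2 floating point operations.
    const double clock_hz = 1.480e9;
    double fp32_tflops = 3584 * 2 * clock_hz / 1e12;
    double fp64_tflops = 1792 * 2 * clock_hz / 1e12;
    printf("FP32: %.1f TFLOPS, FP64: %.1f TFLOPS\n",
           fp32_tflops, fp64_tflops);  // ~10.6 and ~5.3
    return 0;
}
```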

Lastly, Pascal brings forward unified memory to allow threads to better share information. This comes alongside larger L2 caches and register files more than double the previous size. P100, the first Pascal chip, also uses HBM2, with 16GB of VRAM on a 4096-bit bus for a peak bandwidth of 720 GB/s. For CUDA compute tasks, a new Unified Memory model allows Pascal GPUs to utilize the entire system memory pool with global coherency. This is one way to tackle AMD's advances with HSA and GCN, as well as Intel's Xeon Phi.
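The programming interface is the existing cudaMallocManaged call; what Pascal adds underneath is hardware page faulting, so the managed pool can span all of system memory rather than being capped at the GPU's own VRAM. A minimal sketch of the usage pattern:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data;
    // A single allocation visible to both CPU and GPU. On Pascal the
    // driver migrates pages on demand, so the working set can exceed
    // the GPU's local memory.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // touch from the CPU
    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();  // wait before reading the result on the CPU
    printf("data[0] = %f\n", data[0]);           // prints 2.000000
    cudaFree(data);
    return 0;
}
```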

Overall, Pascal looks to be an evolutionary update for Nvidia. Perhaps Nvidia has reached the point Intel has, making incremental progress. In other ways though, the reduction in SM size has great potential and provides a more flexible framework on which to build GPUs. Now all we are waiting for is for the chips to finally drop.

AMD Snags Nvidia Supercomputer Client With FirePro & GPUOpen

When AMD revealed their GPUOpen and Boltzmann Initiative, it seemed like a last-ditch effort to win back some GPU computing and enterprise market share. Nvidia struck first with CUDA and, because of its success, most GPU computing platforms have been based on CUDA, with OpenCL lagging behind. With the Boltzmann Initiative, AMD hoped to steal some of Nvidia's CUDA customers, and they've just snagged a big one with their FirePro GPUs.

In partnership with AMD, CGG, a geoscience company working on oil, gas and resource mining simulations, switched from a fully CUDA-based platform to one built entirely on OpenCL and compatible with AMD. What's more, they dropped all of their existing Nvidia GPUs and upgraded to FirePro S9150s based on Hawaii (the same silicon as the 290X/390X). In recent architecture generations AMD has led in raw compute, with GCN especially shining in this regard.

With one major conversion under their belt, AMD now has a great example to show off to other potential customers. With HIP, up to 90% of CUDA code can be ported without any extra developer input, and it looks like the benefits of moving to AMD's platform are worth the remaining 10% of work. Hopefully AMD will continue to score enterprise wins, a market they haven't done well in for a while and one that is incredibly lucrative.

Nvidia May Launch Confusing GT 930 in Early 2016

For the longest time, both AMD and Nvidia have taken to rebranding their low-end cards in order to present something “new” at low cost. While rebranding has become the norm, Nvidia's GT 930 may be setting a new standard. Set to launch in Q1 2016, the GeForce GT 930 will reportedly come in three widely different flavours spanning three GPU generations and six years.

From what we know right now, the 930 will use either Fermi, Kepler or Maxwell based chips. These will also be paired with either GDDR5 or DDR3 VRAM, accessed over either a 64-bit or 128-bit interface, meaning a lot of variation in performance. Due to the different chips used, the features offered and power consumption characteristics will vary widely as well.

The oldest chip is the Fermi one, the GF108, released back in 2010 with 96 CUDA cores. Slightly newer is the Kepler-based GK208, which was released in 2013 and features 384 CUDA cores. Finally, there is the newest chip, the Maxwell-based GM108, also featuring 384 CUDA cores but offering the most features and performance. With such great variation, it won't be surprising if consumers end up confused and won't be sure which GT 930 they are getting till they start gaming.

AMD Boltzmann Initiative Brings C++ & CUDA Compiler/Compatibility

While AMD has long held decent market share in the consumer space, the high performance compute segment has long eluded the graphics firm. With the announcement of the so-called Boltzmann Initiative today though, AMD is planning to turn over a new leaf and plow ahead with a better than ever graphics compute toolkit.

First off is a new set of HSA extensions dubbed HSA+. While not part of the main HSA initiatives, the extensions will allow better use of discrete graphics in an HSA environment. Combined with a new headless Linux driver, this should help spur performance with a unified memory space, reduced latency and better system controls.

The next addition is the new Heterogeneous Compute Compiler, or HCC. AMD has long supported the open standard OpenCL while Nvidia runs its own CUDA, and HCC marks a change in direction away from OpenCL. While AMD remains invested in and working on OpenCL, HCC will offer features more competitive with CUDA until, one day, OpenCL's support and features catch up. HCC is also C++ based, which will allow for an easy transition from CUDA.

Last of all is the Heterogeneous-compute Interface for Portability, or HIP. Even with HCC being familiar and similar to CUDA, AMD recognizes that getting developers to switch is no easy task. With HIP, AMD is providing a compatibility layer for 90% of CUDA code, meaning developers will only need to do fresh work on a small portion of the code in order to run on AMD hardware. One can only hope that there will be some trickle-down effect on the consumer software market as well, where CUDA has also been dominant.
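To give a sense of how mechanical a port can be, below is a hedged sketch of a trivial SAXPY after conversion, assuming HIP's hipMalloc/hipMemcpy/hipLaunchKernelGGL entry points. The kernel body itself is untouched CUDA C++, which is exactly the sort of code that falls inside AMD's 90% figure:

```cpp
#include <cstdio>
#include <hip/hip_runtime.h>

// A SAXPY kernel: the device code is unchanged from its CUDA original.
__global__ void saxpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1024;
    float hx[n], hy[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    hipMalloc(&dx, n * sizeof(float));  // was cudaMalloc
    hipMalloc(&dy, n * sizeof(float));
    hipMemcpy(dx, hx, n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy, n * sizeof(float), hipMemcpyHostToDevice);

    // was: saxpy<<<n / 256, 256>>>(2.0f, dx, dy, n);
    hipLaunchKernelGGL(saxpy, dim3(n / 256), dim3(256), 0, 0,
                       2.0f, dx, dy, n);

    hipMemcpy(hy, dy, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("hy[0] = %f\n", hy[0]);      // prints 4.000000
    hipFree(dx);
    hipFree(dy);
    return 0;
}
```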

In a market where CUDA has long dominated, AMD is finally making the moves that will make them competitive. If AMD is able to keep their Boltzmann Initiative up to date and provide support, there is probably a lot of room to grow in the HPC segment given their poor market share today. With a more lucrative HPC business in their back pocket, AMD will also be in a better position to invest in their consumer lineup.

Colorful Announces Their iGame GeForce GTX 960 KUDAN Mini-ITX Graphics Card

Colorful has revealed their iGame GeForce GTX 960 KUDAN Mini-ITX graphics solution, featuring a custom PCB design and factory overclocked frequencies.

The GeForce GTX 960 KUDAN is said to be one of the very few Mini-ITX designs with a GM206 chip, which should make it very popular in the $200 market segment. The graphics solution is said to have low power draw and provide great 1080p performance.

In terms of specs, the graphics card comes with 1024 CUDA cores, 64 texture mapping units, 32 raster operation units and 2 GB of GDDR5 VRAM on a 128-bit memory interface. While the memory is clocked at 7010 MHz (effective), the GPU clock comes in two profiles.

The factory overclocked specs the card ships with are set at 1127 MHz base and 1178 MHz boost, but a second OC BIOS is said to be available that can ramp the card up to 1152 MHz and 1216 MHz respectively at the push of a button.
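For context, the memory clock and bus width quoted above pin down the card's peak memory bandwidth; a quick back-of-envelope check (plain arithmetic on the listed specs, not a vendor figure):

```cpp
#include <cstdio>

int main() {
    // 7010 MHz effective GDDR5 on a 128-bit bus: transfers per second
    // times bytes moved per transfer gives the peak memory bandwidth.
    const double transfers_per_sec = 7010e6;
    const double bytes_per_transfer = 128.0 / 8;   // 16 bytes
    printf("Peak bandwidth: %.1f GB/s\n",
           transfers_per_sec * bytes_per_transfer / 1e9);  // ~112.2
    return 0;
}
```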

Looking at the aesthetics of the card, we see that it features the same design as the iGame GeForce GTX 980 KUDAN, only a bit smaller. It boasts only one fan, but given the card's low power design, one fan is enough to keep the Mini-ITX card cool.

Power is said to be provided via a single 6-pin connector, with the card rated at a 120W TDP. There is also a single SLI connector present on the card, and display outputs comprise one DVI, one HDMI and three DisplayPort connectors.

Colorful has already listed the card on its website, but details regarding its availability and pricing have yet to be made available.

Thank you WCCF for providing us with this information.

Impressive Tegra K1 Demo Puts Last-Gen Consoles to Shame

Mobile gaming has been coming on in a big way recently and while I still find it a royal pain in the butt to try and play most games via touch screen, especially games like Grand Theft Auto, there have been many improvements. One of the biggest improvements for me is with devices such as the Nvidia Shield, designed to play Android games but without the drawbacks of a touch screen, and there are lots more devices like this cropping up each year.

On the technical side of things, the graphics hardware being installed into our smartphones, tablets and other mobile devices is becoming incredibly powerful, allowing for some very pretty 3D graphics, high frame rates, UHD resolutions and a whole lot more. Nvidia are certainly making some big advances and, as you can see in the video below, their Tegra K1 graphics chip is able to push graphics that put most console games to shame.

The demo, dubbed “Rivalry”, uses the same rendering pipeline as the full desktop edition of the engine, which will be deployed on PC, Xbox One and PS4. Built with Epic's Unreal Engine 4 and some assets from the famous Samaritan demo, it is a stunning render that runs on the K1's 192 CUDA cores clocked at around 950 MHz.

[youtube width="800" height="450"]http://youtu.be/X-tAZtbDZ8E[/youtube]

Thank you Eurogamer for providing us with this information.

Nvidia Grid – Is It The Future Of High Performance Computing?

When the eTeknix team visited CES 2014, there was one word that we heard more than any other, one big focus that it seems many big tech companies – especially gaming tech companies – are trying to promote: “cloud”. Now I'm sure many of you will agree that saying something is in the cloud is just a bit of clever marketing. By any definition, the whole internet is in the cloud and we've been using its features for a very long time. The concept of remotely accessing powerful computers via a virtual data centre from say… your home, is nothing strange either, and it's the very basis of the technology that is used to run everything from Facebook to YouTube. On the professional side, we have industries such as science, oil & gas and construction, to name a few, that require high performance computing to drastically reduce the time needed to perform large complex calculations or to generate 3D computational models.

Gaming could offer one of the biggest changes in the cloud computing industry, at least as far as your average consumer is concerned. But could cloud computing give us all access to superior gaming by offloading the processing to the cloud, as well as offloading rendering and other graphics-intensive applications for businesses? Nvidia certainly thinks so.

“Streaming video and music to TVs, PCs and tablets using cloud services like Netflix, YouTube, Pandora and Spotify has become the predominant way to enjoy content for connected devices. The convenience of large cloud-managed libraries of content with stream-anywhere capability is impossible to resist. Now with revolutionary NVIDIA GRID cloud gaming technology, you’ll soon be able to stream video games from the web just like any other streaming media. GRID renders 3D games in cloud servers, encodes each frame instantly and streams the result to any device with a wired or wireless broadband connection.” – says Nvidia on their GRID website.

Streaming video to various devices is pretty commonplace these days, so why aren't we doing the same with games? Services such as Netflix and LoveFilm are household names, offering subscription-based models that allow you to stream a virtually unlimited amount of video content directly to your TV, notebook, tablet, smartphone or desktop PC, over a cellular or WiFi network, from pretty much anywhere in the world. Yet when it comes to high-end PC gaming, or any format of gaming for that matter, processing is done on the hardware we want to play on, often requiring powerful graphics processors. With GRID, NVIDIA is hoping to build what is known as on-demand Gaming as a Service, also known as GaaS.

GaaS aims to offload almost all of the processing to the cloud, allowing any-device gaming. That means high-quality, low-latency, multi-device gaming for any PC, Mac, tablet, smartphone, TV or similar smart device: pretty much anything with a screen, some form of input and an internet connection. It wants to make picking a game and playing it no different than searching for a video on YouTube and clicking play. That means the end user will typically already have the hardware required to play, given that most of us at least have a PC, notebook or smartphone. It gets rid of any difficult setup issues: you don't need a disc, you don't need to wait for the game to download, you just play.

The concept sounds too good to be true, and in reality it already exists and has done for some time. Unfortunately, the solutions so far haven't been perfect: companies like Gaikai and OnLive took their shot at cloud-based gaming around two years ago. OnLive fell into obscurity and Gaikai got snapped up by Sony, which is currently planning to relaunch the service to stream PlayStation titles. One of the biggest factors for this kind of service is latency, and as any serious gamer will tell you, latency is a big deal. The delay from your button press to the action happening on screen can have a big impact on how well a game plays and your overall experience, something that is especially true in competitive online multiplayer titles, where every millisecond can mean the difference between life and death.

It is this issue with latency that NVIDIA has addressed more than anything else. As a company, it's no stranger to powerful graphics technology, so that part is relatively easy. Yet getting those graphics streamed to the end user in the blink of an eye is far from simple.

Testing for input lag – the time it takes for a signal from the controller to cause an on-screen response – shows games become unplayable at around 200 ms, or at least the delay becomes noticeable to some extent. Most games, especially on consoles such as the Xbox and PlayStation, exhibit around 133 ms average response time, but faster-paced shooters and rhythm games are often optimised to respond in 60-70 ms, as it's vital to land that headshot or hit that music note as close as possible to when you see the visual cue on screen, and that is before you factor in any lag introduced by your individual display or by your network when playing online.

As you can see from the graph above, NVIDIA isn't mucking about with its latency times. The joy of using scalable hardware at the server side is that you can reap the benefits of a significantly more powerful system for rendering the graphics. Combined with powerful video streaming and compression hardware, it means that when the end user pushes a button on their controller, there is less delay in getting the visual output back to the user's system. Of course, NVIDIA isn't leasing full servers to each gamer; instead it'll be providing its GRID Software Development Kit (SDK), which will give service and middleware providers a framework to push content out via the cloud.

Each server can now host up to 48 HD-quality game streams thanks to NVIDIA's new on-chip video encoder technology, the same technology used to cut latency by doing the encoding within the graphics hardware rather than offloading it to a separate engine. It does this by instantly capturing the output of multiple rendered games, or the entire operating system desktop, using the NVIDIA Fast Capture API. This API sends the images directly to the GPU's built-in H.264 encoder, cutting latency over a standard graphics card by a staggering 30 ms.

Of course, all this hardware is useless without software, and NVIDIA says it's been working closely with developers for over a decade to ensure that hundreds of titles will be (and are) ready for its GRID service. Unlike the launch of a new console format, GRID is effectively a collection of very powerful PC components, so it's capable of running just about any software you wish to throw at it. Anyone who is familiar with remote desktop access will know this. For those who don't fully understand the concept yet: you plug your keyboard, mouse or controller into your computer as normal and when you hit a button, the signal transmits to NVIDIA's GRID servers, where it is processed, and the output is then streamed back to you as video – effectively like having your computer a few hundred miles away from your input devices and monitor.

The underlying hardware is the stuff of supercomputers – custom-built hardware that renders games at a fidelity that would make most consumer graphics cards blush, and does it faster and more efficiently than virtually anything else on the market. Just take a look at the specifications below of some of the server hardware that NVIDIA has to offer through both middleware partners (companies who work with NVIDIA to provide services) and OEM hardware partners (companies who will sell you the hardware).

It certainly sounds like a promising deal and it's one that's already been put into action by a few companies, with more signing up or at least working on a project of their own using the hardware. Playcast and Bouygues Telecom already offer cloud gaming in an “à la carte” rental service. You can stream games like Assassin's Creed Revelations, Mafia II, Street Fighter X Tekken and much more directly to your TV. They charge around €12.90 a month for their premium package, which is obviously a lot cheaper than buying a single gaming title at retail. The only downside is that you don't really own the games – you are effectively renting and streaming just like you do on Netflix. Stop paying the subscription and the games are no longer available to play, not to mention there is obviously no option to play offline with this kind of service.

What I do like to see is that some streaming services, such as Bouygues', allow you to rent games individually. For a few euros you can unlock a game for streaming for a period of 48 hours, so there's no need to go to your nearest rental store to pick up the title and no need to use a mail-out service; you just unlock and play immediately, reaping the rewards of maxed-out PC graphics and low latency.

There is just one major and pretty obvious issue with the whole thing: cloud gaming requires a rock-solid internet connection, and it needs to be on at all times while gaming or else your game stream will simply be gone. Of course there will be error-correcting systems, as well as safeguards in place that pause your game should you drop a connection and resume when you're back, but it's one problem we don't have to deal with when gaming on a local system.

Here in the UK we are finally getting on board with fibre broadband speeds, though much of the country is still lagging behind, literally. With NVIDIA already running betas for its gaming service, users are advised that they'll need a minimum of 10 Mbps download speed and a ping time of no more than 40 ms for the optimum experience. Anything less and your 1080p stream may not look so great, and the input lag could become a problem for fast-paced games. The first beta users to test this service are doing so using the NVIDIA SHIELD handheld gaming device in Northern California, as NVIDIA is currently testing the service on its San Jose servers. These users also need to have a GameStream-ready 5GHz WiFi router, which will obviously help with lag and stream quality to the device. When you're gaming on the cloud, every millisecond matters.

It's also worth pointing out that remote gaming is no stranger to NVIDIA SHIELD owners, as the hardware already allows you to use NVIDIA GameStream to render your favourite game on a GeForce-powered PC and stream it to the handheld device – much in the same way that GRID remotely renders its content.

The whole thing sounds pretty robust and it's clear that NVIDIA has invested heavily in the product. The game streaming aspect is a nice sideline for NVIDIA given that it's a graphics company first and foremost, but its GRID products go way beyond streaming the latest gaming titles to home users and their smart devices. Smaller installations, like the GRID VCA, can be purchased for businesses; these run Quadro K5000-class graphics and allow streaming to up to eight users. This can be used for anything from film production to graphic design – anything that typically relies on GPU rendering, while also reducing the cost of installing and maintaining eight individual high-end workstations capable of doing the same job.

So is GRID the future of computing? Absolutely. However, it remains to be seen how successful it will be with consumers, and many parts of the world still need to catch up in terms of internet speeds to really reap the rewards. We are confident that the business world will love the streaming technology for rendering and other tasks, but there is still a lot of work for NVIDIA to do before its gaming services are available on a larger scale.

GeForce GTX 780 Ti Specifications Have Been Leaked

NVIDIA introduced a new card called the GeForce GTX 780 Ti during last week's announcements, but at the time we were left with little more than a mid-November release time frame. It seems that now we may have a peek inside the new card.

Some forum members at Chiphell have leaked an image that may show specs for the 780 Ti. If accurate, they're really impressive: according to the slide, the 780 Ti is equipped with a GK110 Kepler GPU with 2,496 CUDA cores, 208 texture units, 48 ROPs, and 3GB of GDDR5 memory clocked at 6008MHz. The base core clock ticks in at around 902MHz, rising to 954MHz when boosted.

Given the above specs, the card should deliver more frames per second, better shader performance and bigger explosions than a lower-end card. In terms of price, there are also unconfirmed rumors that the GTX 780 Ti will end up selling in the $650 / £401 / AU$641 price range. This puts the GeForce 780 Ti above the rumored price of AMD's Radeon R9 290X, which is said to sit around the $549 / £339 / AU$541 mark.

Thank you Tech Radar for providing us with this information.
Images courtesy of Tech Radar

Gigabyte GeForce GTX 770 OC WindForce 3x 2GB Graphics Card Review

With the release of the GTX 770 and the recent launch of the GTX 780, NVIDIA made one fundamental change to their cards compared with the GTX 690 and Titan. This change may seem simple, but it plays a major role in the graphics market and for each of NVIDIA's partners: partners are now granted the freedom to change the PCB layout and, most importantly, the cooling on the cards. When the GTX Titan was released, NVIDIA put a halt on any non-reference designs, and in effect the only alteration partners could make was to put a sticker with their name on the card.

This is not the case with the 700 series, however. Whilst the reference GTX 770 and GTX 780 use the exact same cooler as seen on Titan, manufacturers are now allowed to deviate from the reference design and put their own mark on their cards to set them apart from the competition. In Gigabyte's case, the cooler of choice is WindForce, and for a number of years now this has been at the forefront of their marketing campaigns.

With the release of the 770, Gigabyte are keen to show off the latest revision of their well-known cooler, which now features a metal housing rather than the older plastic design. On top of this, Gigabyte have given the GK104 core the overclock treatment to take the 770 to the next level and let it stretch its legs a bit more.

When it comes to looking at this card, there is little more than the card in a box to see, as this is a review sample, meaning that Gigabyte have omitted the usual gubbins and accessories that we would normally find in a graphics card bundle.

NVIDIA GTX 770 2GB Graphics Card Review

Last week we saw the release of NVIDIA's latest graphics range – namely the 700 series and its top model, the GTX 780. In many respects the GTX 780 brings a whole new level of performance to a greater audience and, as I showed, there is only a small difference between the 780 and Titan on a single screen.

Working through the new 700 series line-up, NVIDIA are now lifting the lid on their next card, the GTX 770. Like the GTX 780, the GTX 770 has had many rumours surrounding its release, all relating to its specifications, its performance and, most of all, the GK104 core it shares with the GTX 680. As with the GTX 780, I first of all want to put one of these rumours to rest and state the reason why. The one that I am referring to is the speculation that GTX 680 owners would be able to turn their card into a GTX 770 through a BIOS update. Simply put, this CANNOT be done. Whilst both cards share the same GK104 GPU core, there are a number of factors that make this impossible. As with the 780 to Titan comparison, the GTX 770 has a slightly different revision of the GK104 core with varying numbers of CUDA cores and texture units; however, the most significant factor behind the inability to ‘convert’ a GTX 680 lies with the on-board memory.

One of NVIDIA's major talking points with the GTX 770 is the inclusion of memory that runs at a whopping 7Gbps at stock. These are no overclocked ICs either; they are entirely new. So unless you have the ability to desolder the ICs from a GTX 680, solder new ones in their place and change the PCB layout slightly, there is no possibility of changing your card from one to the other.

Nvidia GeForce GTX 780 3GB Graphics Card Review

It's that time of year again when NVIDIA have a new series of cards in the pipeline, and as we have seen in the run-up to today, the number of rumours and leaks flying about is as great as ever. For some this leads to pure confusion as to what is real and what is complete rubbish, and for people like myself it leads to pure frustration, as I know all the true facts and figures, meaning that when I see the rumours and false claims floating around I can do nothing but sit and wait until the NDA lifts to put a number of these claims to rest with the real specifications and performance figures behind the new cards.

So here we have it, the GTX 780 – the first in the new line of Kepler-based 700 series cards. Before we get too far into the nitty gritty of what's new in the 700 series, I want to make the following fact clear and true: the GTX 780 CANNOT be flashed in any way to effectively turn it into Titan. There are a number of reasons for this. First off, whilst both cards share the same GK110 core, the 780 has far fewer CUDA cores, is a different revision of the core chip and has fewer texture units on-board. On top of this, it has half the amount of video memory, and a number of components in the power delivery region of the PCB are missing, as the 780 does not require them the way Titan does.

With that point out of the way, NVIDIA's new 700 series cards are here to replace the ever popular 600 series, although they are not a re-hash and re-brand of 6xx cards as some may presume. Whilst the same Kepler cores may be featured across both 600 and 700 series cards, they have subtle variances to them, mainly in terms of CUDA core count, texture units and so forth.

So what is the 780 in relation to the 600 series cards? Whilst it may look like Titan, it is a slightly lower performing card. Titan is more geared towards users with multiple high resolution displays, hence the 6GB of GDDR5 memory that it carries. The 780, whilst still home to 3GB of GDDR5, is more aimed at users who are going to be gaming on a single screen at high resolutions with all the settings turned up to 11. Over its predecessor, the GTX 680, the 780 has 50% more CUDA cores with a count of 2,304, 50% more memory, up from 2GB to 3GB, and overall a 34% increase in performance. Interestingly enough, GTX 580 users who upgrade to a 780 will see a whopping 70% gain in performance between the two cards, and a 25-30% gain can also be found over AMD's 7970.