Graphics chip and PC and server processor wannabe Nvidia is lifting the skirt a bit on its next-generation “Kepler” graphics processing units today as it starts talking about the feeds and speeds of its new GeForce graphics cards for desktop and notebook PCs.
As Nvidia co-founder and CEO Jen-Hsun Huang explained when he outed the roadmap for the Kepler GPUs (originally slated for late 2011) and the “Maxwell” follow-ons due in 2013, Nvidia is focused like a laser on performance per watt, not just performance, for its GPU chips. This is because heat, more than any other factor, is the gating issue deciding where GPUs can be adopted and where they cannot.
The promise that Huang made back in September 2010 was that by shifting to a new design and moving to a 28 nanometer wafer-baking process at foundry partner Taiwan Semiconductor Manufacturing Corp, Nvidia could deliver somewhere on the order of three to four times the double-precision floating point operations per watt of the current “Fermi” GPUs, which are used in GeForce graphics cards for PCs, Quadro GPUs for workstations, and Tesla server coprocessors alike. And the shift to Maxwell in 2013 is supposed to deliver 16 times the double-precision flops per watt of the Fermis.
That’s a pretty tall order, and one that Nvidia has not had an easy time filling, with TSMC’s ramp on 28 nanometer processes being steeper than expected. But today, with the unveiling of the GeForce GTX 680 for PCs and the GT 640M for mobile PCs, Nvidia is trying to prove to potential OEM customers that build PCs and notebooks – as well as the end users who will buy them – that it has the speed they crave.
The data is a bit thin as El Reg goes to press, but here’s what Sumit Gupta, senior product manager of the Tesla line at Nvidia, told us ahead of the skirt-raising today. As is the case with CPU manufacturers, Nvidia is scaling back the clock speed on the cores in the Kepler GPU while jacking up the number of cores to get more performance and even more performance per watt. Performance scales more or less linearly (okay, less) with the number of cores on a CPU or GPU, but power consumption and heat dissipation rise far faster than linearly with clock speed, since higher clocks generally call for higher voltages as well. So a small reduction in clock speed can mean a lot, and then you can use a process shrink, like Nvidia’s move from 40 nanometer to 28 nanometer processes, to cram more cores onto the die and thereby boost the performance per watt and the raw performance, too.
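To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. It assumes the textbook approximation that dynamic power goes as capacitance times voltage squared times frequency, plus the further simplification that voltage has to track clock speed, so power per core grows roughly as the cube of the clock (steep, if not literally exponential). The function names and the sample ratios are illustrative assumptions, not figures from Nvidia, and the model ignores leakage and the savings from the process shrink itself.

```python
def relative_performance(clock_ratio: float, core_ratio: float) -> float:
    """Throughput relative to a baseline chip, assuming ideal linear scaling
    with both core count and clock speed (real chips do a bit worse)."""
    return core_ratio * clock_ratio


def relative_power(clock_ratio: float, core_ratio: float) -> float:
    """Power relative to the baseline, assuming P ~ cores * V^2 * f with
    voltage tracking frequency, which gives P ~ cores * f^3."""
    return core_ratio * clock_ratio ** 3


def relative_perf_per_watt(clock_ratio: float, core_ratio: float) -> float:
    """Performance-per-watt gain over the baseline under the assumptions above."""
    return (relative_performance(clock_ratio, core_ratio)
            / relative_power(clock_ratio, core_ratio))


# Hypothetical example: drop the clock by 10 per cent and double the cores.
clock_ratio, core_ratio = 0.9, 2.0
print(f"performance: {relative_performance(clock_ratio, core_ratio):.2f}x")
print(f"power:       {relative_power(clock_ratio, core_ratio):.2f}x")
print(f"perf/watt:   {relative_perf_per_watt(clock_ratio, core_ratio):.2f}x")
```

Under this simple model the core count cancels out of the performance-per-watt ratio, so the efficiency gain comes entirely from the lower clock, while the extra cores are what recover (and then exceed) the raw performance.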
To keep things straight between the PCs and the servers, El Reg had Gupta dub the one used in GeForce PC GPUs “Kepler1” because it will have a different design from the one used in Tesla server coprocessors at the heart of a number of very large and powerful supercomputers later this year. We’ll call that one “Kepler2”, which will have a heavy dose of double-precision floating point processing as well as more memory, ECC scrubbing on the memory, different packaging aimed at servers, and a higher price tag.
The Kepler1 GPU used in the GeForce GTX 680 graphics card will have 1,536 CUDA cores, which will run at 1006MHz and will have a turbo boost speed of 1058MHz. This card has 2GB of GDDR5 graphics memory with a 256-bit path to memory running at 6Gb/sec. The card will have two 6-pin power connectors, two DVI ports, and one HDMI port and, most significantly, will slide into PCI-Express 3.0 peripheral slots coming with the “Ivy Bridge” family of Core processors from Intel.
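For a feel of what those feeds and speeds add up to, here is a small sketch of the arithmetic. It assumes the 6Gb/sec figure is the data rate per pin of the 256-bit memory interface, which is the usual way GDDR5 speeds are quoted; the function names are made up for illustration.

```python
def memory_bandwidth_gbytes_per_sec(bus_width_bits: int,
                                    gbits_per_sec_per_pin: float) -> float:
    """Peak memory bandwidth in GB/sec: one data pin per bus bit, 8 bits per byte."""
    return bus_width_bits * gbits_per_sec_per_pin / 8


def boost_uplift_percent(base_mhz: float, boost_mhz: float) -> float:
    """How much faster the turbo boost clock is than the base clock, in per cent."""
    return (boost_mhz - base_mhz) / base_mhz * 100


# Figures quoted above for the GeForce GTX 680.
bandwidth = memory_bandwidth_gbytes_per_sec(256, 6.0)
uplift = boost_uplift_percent(1006, 1058)
print(f"peak memory bandwidth: {bandwidth:.0f} GB/sec")
print(f"turbo boost uplift:    {uplift:.1f} per cent")
```

These are theoretical peaks derived from the quoted specs, not measured throughput.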
With the Fermi designs, the GPU had 512 cores, organized into groups of 32 cores known as streaming multiprocessors, or SMs for short. Each SM got 64KB of configurable L1 cache and shared memory, a first for the CUDA cores, and a 768KB L2 cache was shared across all of the SMs. The Fermi had 16 of these SMs and either 3GB or 6GB of GDDR5 memory. The initial Fermis only shipped with 448 cores activated in the top-end models, due to the typical yield issues that all chip makers face. The Fermis weighed in at between 225 watts and 250 watts in a discrete graphics card and Tesla coprocessor, and originally ran at 1.15GHz and were boosted to 1.3GHz.
The new Kepler GPU puts 192 cores into a “streaming multiprocessor extreme” with a slightly modified CUDA core, according to Gupta. Eight of these SMX units are on the GPU for a total of 1,536 cores. For whatever reason, Nvidia is not releasing any single-precision or double-precision floating point performance figures yet on the Kepler GPUs, but says that the new SMX module offers twice the performance per watt of the prior Fermi SM unit, and because a card only burns 195 watts, it offers much better performance per watt.
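The core counts for the two generations fall straight out of the multiprocessor layout. The sketch below just encodes the numbers quoted in this article; the class and field names are hypothetical, and “Kepler1” is El Reg’s nickname rather than an Nvidia product name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Core layout of a GPU: a number of multiprocessors, each with a fixed core count."""
    name: str
    multiprocessors: int
    cores_per_multiprocessor: int

    @property
    def total_cores(self) -> int:
        return self.multiprocessors * self.cores_per_multiprocessor


FERMI = GpuLayout("Fermi", multiprocessors=16, cores_per_multiprocessor=32)
KEPLER1 = GpuLayout("Kepler1", multiprocessors=8, cores_per_multiprocessor=192)

for gpu in (FERMI, KEPLER1):
    print(f"{gpu.name}: {gpu.multiprocessors} x {gpu.cores_per_multiprocessor} "
          f"= {gpu.total_cores} cores")

# The early 448-core Fermis correspond to this many active 32-core SMs.
active_sms_in_early_fermi = 448 // FERMI.cores_per_multiprocessor
print(f"early Fermi: {active_sms_in_early_fermi} of {FERMI.multiprocessors} SMs active")
print(f"core ratio, Kepler1 to Fermi: {KEPLER1.total_cores / FERMI.total_cores:.0f}x")
```

So Kepler1 halves the number of multiprocessors but makes each one six times as wide, for three times the cores overall.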
For the gamers out there, it might take three of the GeForce GTX 580 graphics cards, which together burned 732 watts, to run the Samaritan game demo. But now, Nvidia is claiming that you can get the same performance with only one GeForce GTX 680 video card, and this will only burn 195 watts. No word on what the pricing will be, but the GTX 680 will almost certainly cost more than a single GTX 580 – particularly with the 28 nanometer wafers coming out of TSMC being in short supply.
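Taking Nvidia’s claim at face value, the implied efficiency gain is easy to work out. The sketch below assumes the single GTX 680 really does match the three-card setup exactly (a performance ratio of 1.0); the function name and that default are illustrative assumptions.

```python
def perf_per_watt_gain(old_watts: float, new_watts: float,
                       perf_ratio: float = 1.0) -> float:
    """Performance-per-watt improvement of a new setup over an old one.

    perf_ratio is new performance divided by old performance; 1.0 means
    the new setup exactly matches the old one.
    """
    return perf_ratio * old_watts / new_watts


three_gtx_580_watts = 732   # three cards together, as quoted above
one_gtx_680_watts = 195     # a single card, as quoted above

gain = perf_per_watt_gain(three_gtx_580_watts, one_gtx_680_watts)
print(f"watts per GTX 580:        {three_gtx_580_watts / 3:.0f}")
print(f"implied perf/watt gain:   {gain:.2f}x")
```

That is a vendor-supplied comparison on one workload, so treat the ratio as a best case rather than a general rule.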
The main thing as far as Nvidia is concerned is that the GTX 680 offers anywhere from 1.2 to 1.6 times the performance of rival Advanced Micro Devices’ HD7970 graphics card.
On the notebook front, Nvidia is talking a little bit about the GeForce GT 640M mobile GPU, and is bashing Intel’s integrated HD 3000 graphics because it can’t do better than 20 frames per second playing all the popular high-res games out there – making ultrabooks not so ultra. But you can get more than 30 frames per second with the GT 640M, says Gupta, which is twice as power-efficient as the GT 580M it replaces.
The upshot is that if you hold the performance of the notebook steady on a composite of commercial and game benchmark tests, a notebook from early 2010 with a GeForce GTX 285M card weighed in at 12 pounds, was 60mm thick, and had two hours of battery life. By this time last year, you could get a notebook with a GTX 460M that weighed 9 pounds, was 50mm thick, and had three hours of battery life running the benchmarks. This year’s ultrabooks – in this case, an Acer Timeline Ultra M3 – weigh in at 5 pounds, are 20mm thick, and get eight hours of battery life running the composite benchmarks.
And yes, the Kepler1 GPUs can play Crysis 2…
So that leaves us with the Kepler2 GPUs. Gupta says that these are still on track to ship in Tesla GPU coprocessors to Oak Ridge National Laboratory for its “Titan” supercomputer and to the University of Illinois for its “Blue Waters” big bad box in the third quarter. Volume shipments of the server coprocessors bearing the Kepler2 GPUs will start in the fourth quarter of this year. These Kepler2 GPUs will have three times the performance per watt of the top-end Fermi coprocessors today.
“With Tesla, everything is larger and more,” says Gupta. But he declined to give any specific details. ®