Nvidia's GM200 is Down on FP64 Performance

Nvidia's GM200 will reportedly feature only fractional double precision compute performance and will be single precision optimized like GM204, its younger sibling residing in the GeForce GTX 980 and 970 gaming cards. About a month ago a similar report surfaced, which claimed that Nvidia would not be using GM200 for HPC purposes, mainly due to its limited FP64 capability. Instead it will focus on pursuing single precision compute performance improvements.

This report comes via 3DCenter.com, which claims to have confirmed this particular tidbit of information. According to its sources, GM200 lacks specific chip-level FP64 hardware that's necessary for maintaining adequate double precision compute throughput. As a result, they claim, GM200 will be significantly down on FP64 performance compared to what we are used to seeing from Nvidia's enthusiast class GPUs.

A reduced emphasis on double precision compute (FP64) performance in a compute class card marks an anomaly in Nvidia's strategy, historically speaking. This will perhaps be the first time ever that Nvidia introduces a 500mm² GPU flagship that lacks proper FP64 compute capability.

What Nvidia's GM200 Weak Double Precision Performance Could Mean For Pro Graphics

Before going into the ramifications of this potential decision by Nvidia we must remind you again that we couldn't verify this report by 3DCenter ourselves and thus will treat it as a rumor for the time being. Now that we've got that out of the way, to fully understand what this development means for consumers we must understand how FP64 compute has traditionally been added and why it's important.

You can deduce the difference between double precision floating point (FP64) and single precision floating point (FP32) from the name. FP64 results are significantly more precise than FP32. This added precision is crucial for scientific research, professional applications and servers, and less so for video games. Even though FP64 is used in games in a very limited subset of functions, the bulk of video game and graphics code relies on FP32. The added precision in turn requires more capable hardware, which raises costs by increasing the size of the chip while simultaneously increasing power consumption.
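
To make the precision gap concrete, here is a minimal Python sketch that adds 0.1 one hundred thousand times, once in FP64 and once in emulated FP32. The helper names `to_fp32` and `accumulate` are made up for this illustration, and FP32 is emulated on the CPU by rounding through the standard `struct` module rather than measured on any GPU.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add `step` to a running total `count` times.

    With single=True every intermediate result is rounded to FP32,
    which emulates doing the whole sum in single precision.
    """
    if single:
        step = to_fp32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_fp32(total)
    return total


exact = 10_000.0  # 0.1 * 100_000 in exact arithmetic
fp64_error = abs(accumulate(0.1, 100_000, single=False) - exact)
fp32_error = abs(accumulate(0.1, 100_000, single=True) - exact)

print(f"FP64 accumulated error: {fp64_error:.3e}")
print(f"FP32 accumulated error: {fp32_error:.3e}")
```

The FP32 total drifts visibly away from 10,000 while the FP64 total stays very close to it. Rounding error that builds up over many operations like this is why long-running scientific and professional workloads lean on FP64, while a single frame of a game rarely notices the difference.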

Because of that hardware cost, double precision (FP64) compute performance has always been lower than single precision (FP32) in GPUs. Normally there's a fixed ratio between the peak single and double precision floating point capability of a given GPU. This ratio varies between different GPU architectures, and between different GPUs within the same architecture as well. In the latest enthusiast class chip from Nvidia the ratio between FP32 and FP64 peak performance sits at 3:1. This is true for Nvidia's GK110 GPU, which powers the Quadro K6000, Titan and GTX 780/780 Ti graphics cards among others, although the ratio has been artificially restricted to 16:1 on the 780 and 780 Ti cards.

For AMD the ratio is a more aggressive 2:1 in its latest enthusiast class GPU, Hawaii, which powers the company's flagship FirePro W9100 and Radeon R9 290 series products, although the ratio is artificially restricted to 8:1 in the 290 series.

So, since the GTX Titan Black has a peak of 5.1 TFLOPS of single precision floating point performance, a 3:1 ratio means that double precision compute goes down to 1.7 TFLOPS. And with AMD's Hawaii XT, which has a peak of 5.6 TFLOPS of FP32 compute performance, a 2:1 ratio means that it will go down to a more respectable 2.8 TFLOPS of FP64 compute performance. This advantage in FP64 compute is why AMD succeeded in capturing the top spot in the Green500 list of the world's most power efficient supercomputers with its Hawaii XT powered FirePro S9150 server graphics cards.
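
The arithmetic above is a single division, sketched here in Python. The function name `fp64_peak_tflops` is hypothetical. The inputs are the peak figures and ratios quoted in this article, and the results are theoretical peaks, not measured throughput.

```python
def fp64_peak_tflops(fp32_peak_tflops: float, ratio: float) -> float:
    """Peak FP64 throughput implied by an FP32 peak and an FP32:FP64 ratio.

    `ratio` is the N in "N:1", for example 3 for GK110 at full rate.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return fp32_peak_tflops / ratio


# Figures as quoted in the article (theoretical peaks).
titan_black_fp64 = fp64_peak_tflops(5.1, 3)  # GTX Titan Black, 3:1
hawaii_xt_fp64 = fp64_peak_tflops(5.6, 2)    # Hawaii XT, 2:1

print(f"GTX Titan Black: {titan_black_fp64:.1f} TFLOPS FP64")
print(f"Hawaii XT:       {hawaii_xt_fp64:.1f} TFLOPS FP64")
```

The same function shows how quickly FP64 throughput shrinks as the ratio grows, since doubling the ratio halves the FP64 peak for a given FP32 figure.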

The FP32 to FP64 ratio in Nvidia's GM204 and GM206 Maxwell GPUs, which power the GTX 980, 970 and 960, is 32:1. This means the GPU's peak throughput is 32 times lower in FP64 intensive operations than in FP32. As we've discussed above, this is mostly OK for video games but downright unacceptable for professional applications.

If Nvidia's GM200 does end up with a similarly weak double precision compute capability, the card will have very limited uses in the professional market. In theory, however, the reduction of FP64 hardware resources on the chip should make it more power efficient in games and FP32 compute work. I'm not entirely convinced that it's a worthwhile trade-off, especially for a chip that is poised to go into the next generation Quadro flagship compute cards.

All will become crystal clear in due time. Earlier reports suggested that Nvidia will showcase its new flagship GM200 and next generation GTX Titan X / Titan II between March 17-20. However, a new announcement by the company could mean that we'll get to see the new chip in action much sooner.