Nvidia Disputes Intel’s Machine Learning Performance Claims

“Few fields are moving faster right now than deep learning,” writes Buck. “Today’s neural networks are 6x deeper and more powerful than just a few years ago. There are new techniques in multi-GPU scaling that offer even faster training performance. In addition, our architecture and software have improved neural network training time by over 10x in a year by moving from Kepler to Maxwell to today’s latest Pascal-based systems, like the DGX-1 with eight Tesla P100 GPUs. So it’s understandable that newcomers to the field may not be aware of all the developments that have been taking place in both hardware and software.”

Bitfusion Labs Opens for Boosting Application Performance

Today Bitfusion announced Bitfusion Labs, a collaborative proving ground for delivering performance improvements for hardware-accelerated applications.

DDN Sets World Record STAC Performance

Today DDN announced record performance on the Securities Technology Analysis Center (STAC) benchmark. Using the company’s EXAScaler storage solution, DDN set new public records for multiple workload types and sizes, including large and small workloads as well as I/O-intensive and compute-intensive workloads.

Comparing Haswell Processors for HPC Applications

Over at the Dell Blog, Garima Kochhar has posted a performance evaluation of four Haswell processor models (Intel Xeon E5-2600 v3 Product Family) comparing them for performance and energy efficiency on HPC applications.

Benchmarking Intel Haswell vs. Xeon Phi on the Libor Finance Code

Over at the Xcelerit Blog, Jörg Lotze benchmarks Intel’s new Haswell (Xeon E5 v3 series) against the company’s flagship Xeon Phi coprocessor using a popular computational finance code. As the test application, he uses a Monte Carlo simulation that prices a portfolio of LIBOR swaptions. “The Xeon Phi accelerator wins the race clearly for double precision, reaching around 1.8x speedup vs. the Haswell CPU. However, this drops to 1.2x in single precision. The main reason is that the single precision version requires only half the memory and hence makes better use of the cache.”

SGI Benchmarks World Record Performance with Latest Intel Xeon

Today SGI announced performance world records using Intel’s newest processor, the E5-2600 v3 (Haswell).

Using Surrogate Benchmarks to Project HPC Application Performance

“Computer science and engineering performance projections of HPC applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid hardware vendors in the design of future systems and help HPC users with system procurement. This lecture presents a method for projecting the performance of HPC applications using surrogate benchmarks and the application performance profile obtained on one base system.”
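The excerpt does not spell out the lecture's method, so the following is only a rough sketch of one way such a projection could work, not the approach presented in the lecture. It assumes the application's runtime on the base system can be split into fractions per component, and that each component scales by the ratio of a matching surrogate benchmark's time on the target and base systems. The function name, the component names, and all numbers are hypothetical.

```python
def project_runtime(base_runtime, profile, surrogate_base, surrogate_target):
    """Project application runtime on a target system (illustrative assumption).

    base_runtime     -- measured application runtime on the base system (seconds)
    profile          -- fraction of base runtime attributed to each component
    surrogate_base   -- surrogate benchmark time per component on the base system
    surrogate_target -- surrogate benchmark time per component on the target system
    """
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("profile fractions must sum to 1")

    projected = 0.0
    for component, fraction in profile.items():
        # Assume each component speeds up or slows down like its surrogate.
        scale = surrogate_target[component] / surrogate_base[component]
        projected += base_runtime * fraction * scale
    return projected


if __name__ == "__main__":
    # Hypothetical numbers: half the time is compute-bound, half memory-bound.
    estimate = project_runtime(
        base_runtime=100.0,
        profile={"compute": 0.5, "memory": 0.5},
        surrogate_base={"compute": 10.0, "memory": 20.0},
        surrogate_target={"compute": 5.0, "memory": 20.0},
    )
    print(estimate)
```

In this made-up case the target system runs the compute surrogate twice as fast and the memory surrogate at the same speed, so only the compute half of the runtime shrinks.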

Why Compilers Love Messing with Benchmarks

Over at Brendan Gregg’s Blog, the senior performance architect at Netflix writes that if you want accurate and trustworthy benchmarks, you need to perform active benchmarking, as everything, including compilers, can mess with your benchmark. “If you want to compare different servers using benchmarks that you compile, you need the compilers to match, or you need to take that into consideration. This should be something you unearth by following an active benchmarking approach, where you study and understand what the benchmark really does.”
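One small habit in the spirit of this advice is to record the toolchain next to every timing, so that results built with different compilers are not compared blindly. The sketch below is an illustration only and is not taken from Gregg's post; the helper names `toolchain_fingerprint`, `run_benchmark`, and `comparable` are hypothetical, and it relies on Python's standard `platform` and `timeit` modules.

```python
import platform
import timeit


def toolchain_fingerprint():
    """Collect details that can change benchmark results between machines."""
    return {
        "python_version": platform.python_version(),
        # Compiler that was used to build this Python interpreter.
        "python_compiler": platform.python_compiler(),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),
    }


def run_benchmark(stmt, number=10000, repeat=5):
    """Time `stmt` and return the best run together with toolchain details."""
    timings = timeit.repeat(stmt, number=number, repeat=repeat)
    return {
        "stmt": stmt,
        "best_seconds": min(timings),
        "toolchain": toolchain_fingerprint(),
    }


def comparable(result_a, result_b):
    """Treat two results as directly comparable only if the compilers match."""
    return (
        result_a["toolchain"]["python_compiler"]
        == result_b["toolchain"]["python_compiler"]
    )


if __name__ == "__main__":
    print(run_benchmark("sum(range(100))"))
```

Storing the fingerprint with each result makes a compiler mismatch visible at comparison time. It does not replace studying what the benchmark actually does.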