

Why Modernize Code?

For many years, microprocessor performance increased thanks to rising transistor density and new architectures, and the expected performance of applications rose with it. Users only had to wait for the next chip release from a favorite vendor to be pleased. Higher clock rates and larger caches would take care of any performance issue with a favorite application. Unfortunately, power requirements increased at the same time, as did the heat generated by the CPU.

In early 1999, an Intel Pentium II Xeon “Drake” had a frequency of 400 to 450 MHz and only 1 core. The Level 2 cache was in the range of 512 KB. Over the next few years, the frequency increased to over 3 GHz, with Level 2 caches in the 2 MB range. By the mid-2000s, clock frequencies stopped their march upward and peaked at about 3.67 GHz in 2005. Application performance would have scaled somewhat with clock frequency, but perhaps not linearly, as other factors (memory access, bus speed, etc.) affect the performance of even a single thread.

When dual-core processors first appeared in the Intel Xeon product line in late 2005, almost twice the total computing power became available in a single socket. The early dual-core chips had 2 cores running at about 3 GHz. Over the next few years, dual-core CPUs continued to show performance improvement, but through faster connections to memory, larger caches, lower power requirements and lower prices rather than through higher clock rates. Thus, single-threaded applications might continue to show some performance improvement, but any significant gains would have to come from modernizing the application to take advantage of the increased number of cores on the chip.

Quad-core CPUs began to appear in 2007; however, clock rates were in the 2.5 GHz range, with less L2 cache than on dual-core sockets. With increases in transistor density, by mid-2015 the latest Intel Xeon processors contain up to 18 cores, running at up to 3.5 GHz (in Turbo mode), with about 50 MB of cache across the various levels. To take advantage of this many cores per CPU, and of up to 8 CPUs on a single board, applications must be modified to use the increase in raw hardware performance. A simple comparison shows that since 1999, total cores × frequency (in MHz) has gone from 450 to 45,000. Cache sizes have grown from the range of 2 MB to about 50 MB. All of this does not even begin to touch the increased efficiency of the CPUs and the enhanced instruction sets.

To speed up applications, a developer must learn to take advantage of the multiple threads, cores and sockets found on a single server or on a cluster. Simply hoping for a faster CPU no longer cuts it.
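
As a rough illustration of what using multiple cores can look like, here is a minimal C++ sketch that splits a sum over an array across several threads. It is not taken from any Intel tool or from the original article; the function name `parallel_sum` and its parameters are hypothetical, and the sketch assumes a standard C++11 (or later) compiler with thread support.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sum `data` using up to `num_threads` worker threads.
// Each thread sums one contiguous chunk into its own slot of `partial`,
// so the workers never write to the same element and no locking is needed.
double parallel_sum(const std::vector<double>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t n = data.size();

    // Never start more threads than there are elements (but at least one).
    num_threads = static_cast<unsigned>(
        std::min<std::size_t>(num_threads, std::max<std::size_t>(n, 1)));

    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    // Chunk size rounded up so that all elements are covered.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }

    // Wait for every worker, then combine the per-thread results serially.
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

A caller might pass `std::thread::hardware_concurrency()` as `num_threads`; that call may return 0 when the core count is unknown, which this sketch treats as a single thread. Real applications often rely on a threading library or on compiler directives instead of raw threads, and a memory-bound loop like this one may not scale linearly with the number of cores.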

Transform Your Code

Deliver top application performance and reliability with Intel Parallel Studio XE: a C++ and Fortran tool suite that simplifies the development, debugging, and tuning of code. Compatible with leading compilers.
