“Parallel software and parallel hardware, used together will give the best results for an application. If the application is serial in nature, and the processor is serial, then there will obviously not be a great gain in performance. When the application is parallelized, but the processor is serial, again, no great gain. A third combination is when the application is serial and the processing is parallel. Since the application cannot take advantage of the increased power of the hardware, there will not be a great performance boost. The best and really only solution is to modify the application to run in parallel, using high performing parallel hardware.”
“As clock speeds for CPU’s have not been increasing as compared to a decade ago, chip designers have been enhancing the performance of both CPUs, such as the Intel Xeon and the Intel Xeon Phi coprocessor by adding more cores. New designs allow for applications to perform more work in parallel, reducing the overall time to perform a simulation, for example. However, to get this increase in performance, applications must be designed or re-worked to take advantage of these new designs which can include hundreds to thousands of cores in a single computer system.”
“NWPerf is software that can measure and collect a wide range of performance data about an application or set of applications that run on a cluster. With minimal impact on performance, NWPerf can gather historical information that then can be used in a visualization package. The data collected includes the power consumption using the Intelligent Platform Management Interface (IPMI) for the Intel Xeon processor and the libmicmgmt API for the Intel Xeon Phi coprocessor. Once the data is collected, and using some data extraction mechanisms, it is possible to examine the power used across the cluster, while the application is running.”
A research team at the Ohio Supercomputer Center (OSC) is beginning the task of modernizing a computer software package that leverages large-scale, 3-D modeling to research fatigue and fracture analyses, primarily in metals. “The research is a result of OSC being selected as an Intel Parallel Computing Center. The Intel PCC program provides funding to universities, institutions and research labs to modernize key community codes used across a wide range of disciplines to run on current state-of-the-art parallel architectures. The primary focus is to modernize applications to increase parallelism and scalability through optimizations that leverage cores, caches, threads and vector capabilities of microprocessors and coprocessors.”
“While new technology will be developed that reduces the power per operation needed, in today’s environments it is important to understand how an application affects power usage. For modern applications that have been optimized to take advantage of both the Intel Xeon CPU and the Intel Xeon Phi coprocessor, the hardware mentioned does include various power states, which can minimize the power consumption when idle.”
Today Cray announced a contract to upgrade the supercomputers at Germany’s National Meteorological Service – the Deutscher Wetterdienst (DWD). Located in Offenbach, Germany, DWD is one of the world’s premier numerical weather prediction centers. “Supercomputers are absolutely vital to our mission of providing important meteorological services for the protection of life and property,” said. Dr. Jochen Dibbern, Member of the Executive Board at DWD. “Our Cray supercomputers are critical tools for our researchers and scientists, and it’s imperative that we equip our users with highly advanced supercomputing technologies.”
Funded by the European Commission in 2011, the DEEP project was the brainchild of scientists and researchers at the Jülich Supercomputing Centre (JSC) in Germany. The basic idea is to overcome the limitations of standard HPC systems by building a new type of heterogeneous architecture. One that could dynamically divide less parallel and highly parallel parts of a workload between a general-purpose Cluster and a Booster—an autonomous cluster with Intel® Xeon Phi™ processors designed to dramatically improve performance of highly parallel code.
“It is important to be able to express algorithms and then the coding in an architecture independent manner to gain maximum portability. Vectorization, using the available CPUs and coprocessors such as the Intel Xeon Phi coprocessor, are critical for HPC applications where performance is of the highest importance. However, since architectures change over time and become more powerful, using libraries that can adjust to the new architectures is quite important.”
Threading plus vectorization together can increase the performance of an application more than one technique or the other. Threading and vectorizing an application are two techniques that are known to increase the performance of an application using modern CPUs and coprocessors. However, a deep understanding of the application is needed in order to make the decisions needed and to rewrite portions of the application to take advantage of these techniques. In cases where the developer might not be familiar with the code an automated tools such as the Intel Vectorization Advisor can assist the developer.
“Vector instruction sets have progressed over time, and it important to use the most appropriate vector instruction set when running on specific hardware. The OpenMP SIMD directive allows the developer to explicitly tell the compiler to vectorize a loop. In this case, human intervention will override the compilers sense of dependencies, but that is OK if the developer knows their application well.”