For decades, Intel has been enabling insight and discovery through its technologies and contributions to parallel computing and High Performance Computing (HPC). Central to the company’s most recent work in HPC is a new design philosophy for clusters and supercomputers called Intel® Scalable System Framework (Intel® SSF), an approach designed to enable sustained, balanced performance as the community pushes towards the Exascale era.
Intel® Cilk™ Plus is an extension to C and C++ that offers a quick and easy way to harness the power of both multicore and vector processing. The three Intel Cilk Plus keywords provide a simple yet surprisingly powerful model for parallel programming, while runtime and template libraries offer a well-tuned environment for building parallel applications.
“Tasks keep the CPUs busy. When a core is working, rather than waiting for work to be sent to it, the application progresses towards its conclusion. A caveat to all of this is to remember that tasks and threads remain on the system they were created on. Tasks that use a shared memory space only work within the shared memory segment that the processing cores can get to. Shared memory on the CPU side of the system is separate from the shared memory on the coprocessor. The threads created will remain on the part of the system where they started.”
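The point about tasks sharing one memory space can be illustrated with a minimal sketch in standard C++. It uses `std::async` rather than any Intel-specific tasking model, and the function name `parallel_sum` and its chunking scheme are assumptions made for illustration, not something taken from the quoted source. Every task reads the same vector directly, which is only possible because all tasks run on the host where the vector lives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sum `data` by splitting it into up to `num_tasks`
// contiguous chunks and launching one asynchronous task per chunk.
// The tasks share the caller's address space, so they read `data` in place.
double parallel_sum(const std::vector<double>& data, std::size_t num_tasks) {
    assert(num_tasks > 0);
    // Round up so that every element falls into some chunk.
    const std::size_t chunk = (data.size() + num_tasks - 1) / num_tasks;

    std::vector<std::future<double>> partials;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        partials.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0.0);
        }));
    }

    // Wait for every task and combine the partial sums.
    double total = 0.0;
    for (auto& p : partials) {
        total += p.get();
    }
    return total;
}
```

Calling `parallel_sum(values, 4)` keeps up to four cores busy on the host. Moving the same work to a coprocessor would need an explicit data transfer, because the lambda's reference to `data` has no meaning in a separate memory space.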
Sandia National Laboratories has already seen the benefits from a major Asetek liquid cooled HPC system that has been in use for over twelve months. The 600 teraflop Sky Bridge Supercomputer with 1,848 nodes was installed using Asetek D2C in a Cray CS300-LC supercomputer cluster. With RackCDU D2C, air heat-load was cut by more than 70%, making mechanical upgrade of data center cooling unnecessary and allowing more investment in compute.
The HPC industry faces a growing number of challenges, chief among them a significant increase in cooling requirements. Liquid cooling looks like the obvious way to meet those requirements, but there is an alternative cooling solution that works without a pump and without water.
For universities and colleges that have a traditional infrastructure, adding new programs and applications is a huge endeavor. The IT staff needs to determine whether all of the hardware meets the installation requirements and how to deploy these new programs on different models of desktops and notebooks. With a VDI environment that utilizes simple boot-up devices that connect to virtual desktops on the school’s server, the IT staff doesn’t have to worry about the age and capability of each individual PC when installing new software.
New HPC products and technologies. Compelling demos. Insights from top Intel HPC architects. More than 60 presentations from Intel and industry experts. Additional details about Intel® Scalable System Framework. Intel will have something for everyone at this year’s International Supercomputing Conference in Frankfurt, Germany.
For maximum performance, data needs to flow into and out of the vectorization units without stalls. Four aspects of how the data is arranged and accessed matter most: data layout, alignment, prefetching, and store operations. “Prefetching is also extremely important in HPC applications that use coprocessors. If the vectors are aligned, then the data can be streamed to the math units very efficiently, with data being prefetched, rather than the system having to load registers from various memory storage.”
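As a rough sketch of what layout and alignment mean in practice, the following standard C++ stores each coordinate in its own contiguous, 64-byte-aligned array (a structure-of-arrays layout). The names `Points`, `shift_x`, and `is_aligned`, the 64-byte alignment, and the element count are all assumptions chosen for illustration. A loop over one field then walks memory with unit stride, which is the access pattern that prefetching and aligned vector loads generally favor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAlignment = 64;  // assumed alignment in bytes
constexpr std::size_t kCount = 1024;    // hypothetical number of points

// Each array must span a whole number of aligned blocks so that the
// arrays following it in the struct also start on an aligned address.
static_assert((sizeof(float) * kCount) % kAlignment == 0,
              "array size must be a multiple of the alignment");

// Structure of arrays: all x values are contiguous, then all y, then all z.
struct alignas(kAlignment) Points {
    float x[kCount];
    float y[kCount];
    float z[kCount];
};

// True when `p` sits on a kAlignment-byte boundary.
bool is_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}

// Add `dx` to every x coordinate. The loop touches only the x array,
// sequentially, and never loads the unrelated y and z values.
void shift_x(Points& pts, float dx) {
    assert(is_aligned(pts.x));
    for (std::size_t i = 0; i < kCount; ++i) {
        pts.x[i] += dx;
    }
}
```

With an array-of-structures layout (`struct { float x, y, z; }` repeated `kCount` times), the same loop would stride over `y` and `z` on every iteration, so only a third of each loaded cache line would be useful.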
Vectorizing application code is very important and can result in major performance improvements when coupled with vector hardware. In many cases, incremental work can mean a large payoff in terms of performance. “Applications that have successfully been implemented on supercomputers or have made use of SIMD instructions such as SSE or AVX are excellent candidates for a methodology to take advantage of modern vector capabilities in servers today.”
The use of vector instructions can speed up applications tremendously when used correctly. The benefit is that a single instruction operates on several data elements at once, so much more work is done per clock cycle than when the operation is performed on one element at a time. The Intel Xeon Phi coprocessor was designed with strong support for vector level parallelism. “When these techniques are used either individually or in combination in different areas of the application, the performance will surely be increased, in many cases without a lot of effort.”
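To make the "more work per clock cycle" idea concrete, here is a minimal sketch of a loop that compilers can typically vectorize automatically. The function name `saxpy` and its signature are illustrative assumptions. Each iteration is independent of the others, so on hardware with 512-bit vector registers, such as the Xeon Phi coprocessor, one instruction could in principle process 16 single-precision values instead of one. Whether a given compiler actually vectorizes the loop depends on its flags and its ability to rule out overlap between the two arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compute y[i] = a * x[i] + y[i] for every element.
// No iteration reads a value written by another iteration, which is the
// property a vectorizing compiler needs to process several elements at once.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

A loop such as `y[i] = a * y[i - 1]` would not qualify, because each iteration depends on the result of the previous one and the elements cannot be processed side by side.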