Even though it’s a new generation fabric, Intel OPA is still backwards compatible with the many applications in the HPC community that were written using the OpenFabrics Alliance* software stack for InfiniBand. So, existing InfiniBand users will be able to run their codes that are based on the OpenFabrics Enterprise Distribution (OFED) software on Intel OPA. Additionally, Intel has open sourced the key software elements of their fabric to allow integration of Intel OPA into the OFED stack, which several Linux* distributions include in their packages.
Threading plus vectorization together can increase the performance of an application more than one technique or the other. Threading and vectorizing an application are two techniques that are known to increase the performance of an application using modern CPUs and coprocessors. However, a deep understanding of the application is needed in order to make the decisions needed and to rewrite portions of the application to take advantage of these techniques. In cases where the developer might not be familiar with the code an automated tools such as the Intel Vectorization Advisor can assist the developer.
“Vector instruction sets have progressed over time, and it important to use the most appropriate vector instruction set when running on specific hardware. The OpenMP SIMD directive allows the developer to explicitly tell the compiler to vectorize a loop. In this case, human intervention will override the compilers sense of dependencies, but that is OK if the developer knows their application well.”
In this special guest feature, John Kirkley writes that Argonne is already building code for their future Theta and Aurora supercomputers based on Intel Knights Landing. “One of the ALCF’s primary tasks is to help prepare key applications for two advanced supercomputers. One is the 8.5-petaflops Theta system based on the upcoming Intel® Xeon Phi™ processor, code-named Knights Landing (KNL) and due for deployment this year. The other is a larger 180-petaflops Aurora supercomputer scheduled for 2018 using Intel Xeon Phi processors, code-named Knights Hill. A key goal is to solidify libraries and other essential elements, such as compilers and debuggers that support the systems’ current and future production applications.”
“An interesting aspect to prefetching is the distance ahead of the data that is being used to prefetch more data. This is a critical parameter for success and can be defined as how many iterations ahead to issue a prefetch instruction, and can be referred to as the distance. A compiler will automatically determine the distance to prefetch, and can be determined by looking at the compiler optimization reports.”
“In this talk, Intersect360 Research returns with an annual deep dive into the trends, technologies and usage models that will be propelling the HPC community through 2016 and beyond. Emerging areas of focus and opportunities to expand will be explored along with insightful observations needed to support measurably positive decision making within your operations.”
“Prefetching on a coprocessor such as the Intel Xeon Phi coprocessor can be more important than on a main CPU such as the Intel Xeon CPUs. Since the cores on the Intel Xeon Phi coprocessor are in-order, they cannot hide memory latency as compared to an out-of-order CPU. In addition, since a coprocessor does not have an L3 cache, L2 misses must then access the slower memory subsystem.”
Registration is now open for the Advanced Hands-On OpenCL Tutorial at the IWOCL 2016 conferernce. The tutorial focuses on advanced OpenCL concepts and is an extension of the highly successful “Hands on OpenCL” course which has received over 6,500 downloads from GitHub. Simon McIntosh-Smith, Associate Professor in High Performance Computing at the University of Bristol and one of the tutorial authors will lead the sessions.
In this special guest feature from Scientific Computing World, Andrew Jones from NAG looks ahead at what 2016 has in store for HPC and finds people, not technology, to be the most important issue. “A disconcertingly large proportion of the software used in computational science and engineering today was written for friendlier and less complex technology. An explosion of attention is needed to drag software into a state where it can effectively deliver science using future HPC platforms.”
“In a heterogeneous system that combines both the Intel Xeon CPU and the Intel Xeon Phi coprocessor, there are various options available to optimize applications. Whether one has an advantage over another is somewhat dependent on the application that is being run. Comparisons can be made comparing the two methods, as long as the algorithm lends itself to run and take advantage of either OpenMP or OpenCL.”