Today Univa announced the general availability of its Grid Engine 8.4.0 product. Enterprises can now automatically dispatch and run jobs in Docker containers, from a user-specified Docker image, on a Univa Grid Engine cluster. This significant update simplifies running complex applications in a Grid Engine cluster and reduces configuration and OS issues. Grid Engine 8.4.0 isolates each user application in its own container, which avoids conflicts with other jobs on the system, and lets legacy applications in Docker containers run alongside non-container applications in the same cluster.
Disruptive Opportunities and a Path to Exascale: A Conversation with HPC Visionary Alan Gara of Intel
“We want to encourage and support that collaborative behavior in whatever way we can, because there are a multitude of problems in government agencies and commercial entities that seem to have high performance computing solutions. Think of bringing together the tremendous computational expertise you find from the DOE labs with the problems that someone like the National Institutes of Health is trying to solve. You couple those two together and you really can create something amazing that will affect all our lives. We want to broaden their exposure to the possibilities of HPC and help that along. It’s important, and it will allow all of us in HPC to more broadly impact the world with the large systems as well as the more moderate-scale systems.”
Vector instructions can speed up applications tremendously when used correctly. A vector instruction applies the same operation to several data elements at once, so much more work is done per clock cycle than when the operations are performed one at a time. The Intel Xeon Phi coprocessor was designed with strong support for vector-level parallelism. “When these techniques are used either individually or in combination in different areas of the application, the performance will surely be increased, in many cases without a lot of effort.”
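As a rough illustration of the idea, the sketch below shows the same loop written twice in C: once as a plain scalar loop, and once with hints that invite the compiler to emit vector instructions. The function names `saxpy_scalar` and `saxpy_simd` are hypothetical and chosen only for this example. The `#pragma omp simd` directive is part of OpenMP 4.0 and later; whether the loop is actually vectorized depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: conceptually one multiply-add per loop iteration. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Vector-friendly variant (hypothetical name). The restrict qualifiers
 * promise that x and y do not overlap, and the simd pragma asks the
 * compiler to process several elements per instruction. Without OpenMP
 * support enabled, the pragma is simply ignored and the loop still
 * computes the same result. */
void saxpy_simd(size_t n, float a, const float *restrict x,
                float *restrict y) {
    assert(x != NULL && y != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Both functions produce identical results; only the generated machine code may differ. Most compilers offer a vectorization report that confirms whether the loop was in fact vectorized.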
Pat McGarry from Ryft presented this talk at the HPC User Forum in Tucson. “Years in the making, the Ryft ONE combines two proven innovations in hardware and software to optimize compute, storage and I/O performance: the Ryft Hybrid FPGA/x86 Compute Platform, which leverages a massively parallel bitwise computing architecture and the Ryft Algorithmic Primitives (RAP) Library.”
In this podcast, the Radio Free HPC team looks at the news highlights for the week leading up to Friday the 13th of May, 2016. Highlights include a 25 Petaflop Fujitsu supercomputer coming to Japan, an OpenPOWER Summit coming to Europe, and fighting the Zombie Apocalypse with HPC.
“The next step is to look at using OpenMP directives to create multiple threads to distribute the work over many threads and cores. A key OpenMP directive, #pragma omp for collapse, will collapse the inner two loops into one. The developer can then set the number of threads and cores to use and rerun the application to determine the performance. In one test case, three threads per physical core shows the best performance, by quite a lot compared to just using one or two threads per core.”
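A minimal sketch of what such a directive might look like in C follows. It is not taken from the application discussed in the quote: the function name `scale_grid`, its parameters, and the loop body are hypothetical. The outer loop over steps stays serial, while `collapse(2)` merges the two inner loops into a single iteration space that OpenMP divides among the threads.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every cell of a rows x cols grid by factor, once per step.
 * Hypothetical example: only the two inner loops are collapsed and
 * shared among threads; each cell is written by exactly one thread. */
void scale_grid(int steps, int rows, int cols, double factor, double *grid) {
    assert(grid != NULL);
    for (int s = 0; s < steps; ++s) {
        /* collapse(2) requires the two loops to be perfectly nested. */
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < rows; ++i) {
            for (int j = 0; j < cols; ++j) {
                grid[(size_t)i * (size_t)cols + (size_t)j] *= factor;
            }
        }
    }
}
```

When built with OpenMP enabled, the thread count can be set through the standard `OMP_NUM_THREADS` environment variable, which makes it straightforward to rerun the same binary with one, two, or three threads per core and compare timings. Built without OpenMP, the pragma is ignored and the loops run serially with the same result.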
Today Fujitsu announced an order for a 25 Petaflop supercomputer system from the University of Tokyo and the University of Tsukuba. Powered by Intel Knights Landing processors, the “T2K Open Supercomputer” will be deployed at the Joint Center for Advanced High-Performance Computing (JCAHPC), which the two universities jointly operate. “The new supercomputer will be an x86 cluster system consisting of 8,208 of the latest FUJITSU Server PRIMERGY x86 servers running next-generation Intel Xeon Phi processors. Due to be completely operational in December 2016, the system is expected to be Japan’s highest-performance supercomputer.”
“The Intel Xeon Phi coprocessor is an example of a many-core system that can greatly increase the performance of an application when used correctly. Simply taking a serial application and expecting tremendous performance gains will not work. Rewriting parts of the application will be necessary to take advantage of the architecture of the Intel Xeon Phi coprocessor.”
“Parallel software and parallel hardware, used together, will give the best results for an application. If the application is serial in nature, and the processor is serial, then there will obviously not be a great gain in performance. When the application is parallelized, but the processor is serial, again, no great gain. A third combination is when the application is serial and the processing is parallel. Since the application cannot take advantage of the increased power of the hardware, there will not be a great performance boost. The best and really only solution is to modify the application to run in parallel, using high performing parallel hardware.”
“As clock speeds for CPUs have not been increasing as they did a decade ago, chip designers have been enhancing the performance of both CPUs, such as the Intel Xeon, and the Intel Xeon Phi coprocessor by adding more cores. New designs allow for applications to perform more work in parallel, reducing the overall time to perform a simulation, for example. However, to get this increase in performance, applications must be designed or re-worked to take advantage of these new designs, which can include hundreds to thousands of cores in a single computer system.”