Today ASRock Rack announced plans to showcase its 2U and 3U systems for the HPC market at ISC 2016. “First of all, ASRock Rack is showing its new product 3U16N, which is by far the highest-density among all the microservers features with Intel Xeon D processors. With multiple computing nodes, this microserver can easily handle intensive critical tasks under low power consumption.”
Vectorizing code for the Intel Xeon Phi coprocessor is similar in many ways to vectorizing code for the main CPU. The performance improvement from coding smartly and using the available tools can be tremendous, since the coprocessor's extra-wide vector processing units can deliver very large gains. “Although it is time consuming to look at each and every loop in a large application, by doing so, and both telling the compiler what to do, and letting the compiler do its work, performance increases can be quite large, leading to shorter run times and/or more complete results.”
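As a rough illustration of "telling the compiler what to do" while still letting it do the work, the sketch below marks two simple loops for vectorization. It is not taken from the article: the function names `saxpy` and `dot` are hypothetical, and it assumes a C99 compiler with OpenMP SIMD support enabled (for example `-fopenmp-simd` on GCC or Clang, `-qopenmp-simd` on the Intel compiler). Without that flag the pragmas are ignored and the loops still run correctly as scalar code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
 * `restrict` tells the compiler that x and y never overlap, which removes
 * one of the most common obstacles to auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert((const float *)y != x);  /* the restrict promise must really hold */

    /* Ask the compiler to vectorize this loop. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical example: dot product.
 * The reduction clause tells the compiler that partial sums may be kept
 * per vector lane and combined at the end. */
float dot(size_t n, const float *x, const float *y)
{
    float sum = 0.0f;

    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Whether a given loop was actually vectorized is best confirmed from the compiler's own optimization report rather than assumed from the presence of the pragma.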
With the release of a Developer Access Program for the Intel Xeon Phi Processor codenamed Knights Landing, Intel and its partner Colfax are widening early levels of access, support and training for the widely anticipated next-generation Intel Xeon Phi processor release. The Developer Access Program gives developers the opportunity to begin leveraging key new capabilities in the processor before they are generally available. That means developers will have time to work to parallelize and vectorize their code and look for opportunities to exploit the massive performance capabilities that KNL offers so workloads are ready for prime time when customers deploy their next-generation systems.
Today the Numerical Algorithms Group (NAG) has announced the NAG Software Modernization Service. The new service solves the porting and performance challenges faced by customers wishing to use the capabilities of modern computing systems, such as multi-core CPUs, GPUs and Xeon Phi. NAG HPC software engineering experts modernize the code to enable portability to appropriate architectures, optimize for performance and assure robustness.
Intel is offering a 4-part summer series of developer training workshops at Stanford University to introduce high performance computing tools.
“The Pittsburgh Supercomputing Center recently added Bridges to its lineup of world-class supercomputers. Bridges is designed for uniquely flexible, interoperating capabilities to empower research communities that previously have not used HPC and enable new data-driven insights. It also provides exceptional performance to traditional HPC users. It converges the best of High Performance Computing (HPC), High Performance Data Analytics (HPDA), machine learning, visualization, Web services, and community gateways in a single architecture.”
New HPC products and technologies. Compelling demos. Insights from top Intel HPC architects. More than 60 presentations from Intel and industry experts. Additional details about Intel® Scalable System Framework. Intel will have something for everyone at this year’s International Supercomputing Conference in Frankfurt, Germany.
The MVAPICH User Group (MUG) meeting has issued its Call for Presentations. The event takes place August 15-17 in Columbus, Ohio.
Over at the Dell HPC Community, Jim Ganthier writes that TACC is planning to deploy its 18-petaflop Stampede 2 supercomputer based on Dell servers running Intel Knights Landing processors. “Stampede 2 will do more than just meet growing demand from those who run data-intensive research. Imagine the discoveries that will be made as a result of this award and the new system. Now more than ever is an exciting time to be in HPC.”
For maximum performance, data needs to flow efficiently into and out of the vector units. A few aspects of how the data is laid out matter most: data layout, alignment, prefetching, and store operations. “Prefetching is also extremely important in HPC applications that use coprocessors. If the vectors are aligned, then the data can be streamed to the math units very efficiently, with data being prefetched, rather than the system having to load registers from various memory storage.”
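The layout and alignment points can be sketched in a few lines of C. This is an illustrative assumption rather than code from the article: the `points_soa` type and its helper functions are hypothetical names, the 64-byte alignment is assumed to match a 512-bit vector register, and the code relies on C11 `aligned_alloc` plus OpenMP SIMD support (the pragma is ignored if that support is not enabled). Each coordinate lives in its own contiguous, aligned array (structure of arrays), so consecutive loop iterations touch consecutive memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_ALIGN 64  /* assumption: 64 bytes = one 512-bit vector register */

/* Structure of arrays: each field is a separate contiguous array. */
typedef struct {
    float *x, *y, *z;
    size_t n;
} points_soa;

/* Allocate n floats on a VEC_ALIGN boundary. aligned_alloc expects the
 * size to be a multiple of the alignment, so round the byte count up. */
static float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    if (bytes == 0) {
        bytes = VEC_ALIGN;
    }
    return aligned_alloc(VEC_ALIGN, bytes);
}

void points_free(points_soa *p)
{
    free(p->x);
    free(p->y);
    free(p->z);
    p->x = p->y = p->z = NULL;
    p->n = 0;
}

/* Returns 0 on success, -1 if any allocation failed. */
int points_init(points_soa *p, size_t n)
{
    p->x = alloc_aligned_floats(n);
    p->y = alloc_aligned_floats(n);
    p->z = alloc_aligned_floats(n);
    p->n = n;
    if (!p->x || !p->y || !p->z) {
        points_free(p);
        return -1;
    }
    return 0;
}

/* Scale every coordinate by s. The aligned clause passes the alignment
 * guarantee on to the compiler so it can use aligned loads and stores. */
void points_scale(points_soa *p, float s)
{
    float *x = p->x, *y = p->y, *z = p->z;
    const size_t n = p->n;

    assert((uintptr_t)x % VEC_ALIGN == 0);
    assert((uintptr_t)y % VEC_ALIGN == 0);
    assert((uintptr_t)z % VEC_ALIGN == 0);

    #pragma omp simd aligned(x, y, z : VEC_ALIGN)
    for (size_t i = 0; i < n; ++i) {
        x[i] *= s;
        y[i] *= s;
        z[i] *= s;
    }
}
```

With unit-stride access like this, hardware and compiler-generated prefetching usually have an easy job. Explicit prefetch hints are compiler-specific and are left out of the sketch.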