New AMD Radeon Instinct Rolls Out to Accelerate Machine Intelligence

“New Radeon Instinct accelerators will offer organizations powerful GPU-based solutions for deep learning inference and training. Along with the new hardware offerings, AMD announced MIOpen, a free, open-source library for GPU accelerators intended to enable high-performance machine intelligence implementations, and new, optimized deep learning frameworks on AMD’s ROCm software to build the foundation of the next evolution of machine intelligence workloads.”

Optimizing Your Code for Big Data

Libraries that are tuned to the underlying hardware architecture can increase performance tremendously. Higher-level libraries such as the Intel Data Analytics Acceleration Library (Intel DAAL) can assist the developer with highly tuned algorithms for data analysis as well as machine learning. Intel DAAL functions can be called within other, more comprehensive frameworks that deal with the various types of data and storage, increasing the performance and lowering the development time of a wide range of applications.

Intel Xeon Phi with Software Defined Visualization at SC16

“Software Defined Visualization (SDVis) is an open source initiative from Intel and industry collaborators to improve the visual fidelity, performance and efficiency of prominent visualization solutions – with a particular emphasis on supporting the rapidly growing “Big Data” usage on workstations through HPC supercomputing clusters without the memory limitations and cost of GPU based solutions. Existing applications can be enhanced using the high performing parallel software rendering libraries OpenSWR, Embree, and OSPRay. At the Intel HPC Developer Conference, Amstutz provided an introduction to this initiative, its benefits, a brief description of accomplishments in the past year, and a discussion of the changes made to Intel-provided libraries in the past year.”

HIP and CAFFE Porting and Profiling with AMD’s ROCm

In this video from SC16, Ben Sander from AMD presents: HIP and CAFFE Porting and Profiling with AMD’s ROCm. “We are excited to present ROCm, the first open-source HPC/Hyperscale-class platform for GPU computing that’s also programming-language independent. We are bringing the UNIX philosophy of choice, minimalism and modular software development to GPU computing. The new ROCm foundation lets you choose or even develop tools and a language run time for your application. ROCm is built for scale; it supports multi-GPU computing in and out of server-node communication through RDMA.”

For HPC, Red Hat Offers Much More than just Linux

“The HPC Community demands performance, transparency, and value: exactly what Red Hat and open source offer. Red Hat is the standard choice for Linux in HPC clusters worldwide. But it doesn’t stop there–our cloud, virtualization, storage, platform and service-oriented solutions bring real freedom and collaboration to federal, state, local, and academic programs. And Red Hat’s worldwide support, training and consulting services bring the power of open source to your agency. We are a part of a larger community working together to drive innovation.”

Simplify Cluster Deployment with Intel HPC Orchestrator

“Intel HPC Orchestrator simplifies the installation, management, and ongoing maintenance of an HPC system by reducing the amount of integration and validation effort required to run an HPC system software stack. With Intel HPC Orchestrator, based on the OpenHPC system software stack, you can take advantage of the innovation driven by the open source community – while also getting peace of mind from Intel® support across the HPC system software stack.”

Scaling Machine Learning Software with Allinea Tools

“The majority of deep learning frameworks provide good out-of-the-box performance on a single workstation, but scaling across multiple nodes is still a wild, untamed borderland. This discussion follows the story of one researcher trying to make use of a significant compute resource to accelerate learning over a large number of CPUs. Along the way we note how to find good multiple-CPU performance with Theano* and TensorFlow*, how to extend a single-machine model with MPI and optimize its performance as we scale out and up on both Intel Xeon and Intel Xeon Phi architectures.”

Manage Your Lustre HPC Storage with the new Dashboard from RAID Inc.

In this video from SC16, Yugendra Guvvala, VP of Technology at RAID Inc. describes the company’s new Dashboard software. The Dashboard provides a single pane of glass to manage your high performance Lustre storage pools. “Scaling to tens of petabytes and thousands of clients – considered a best filesystem for storage by many – Lustre is a high performance storage architecture for clusters. The central component of this architecture is the Lustre shared file system, which is currently available for Linux, providing a POSIX-compliant UNIX file system interface. RAID, Inc. offers custom Lustre solutions with installation & 24/7 support.”

Five Reasons Why You Want to Try OpenACC – Starting with “It’s Free!”

OpenACC is a directive-based programming model that gives C/C++ and Fortran programmers the ability to write parallel programs simply by augmenting their code with pragmas. Pragmas are advisory messages that expose optimization, parallelization, and accelerator offload opportunities to the compiler so it can generate efficient parallel code for a variety of target architectures, including AMD and NVIDIA GPUs plus ARM, x86, Intel Xeon Phi, and IBM POWER processors.
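To make the idea of augmenting code with pragmas concrete, here is a minimal sketch in C. The function name `saxpy` and the choice of data clauses are illustrative assumptions, not taken from the article. Because the directive is advisory, a compiler without OpenACC support simply ignores it and runs the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): computes y[i] = a * x[i] + y[i]
   for every i in [0, n). */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n <= 0 || (x != NULL && y != NULL));

    /* The directive asks an OpenACC-capable compiler to parallelize the loop.
       copyin(x[0:n]) copies x to the accelerator; copy(y[0:n]) copies y to
       the accelerator and back. Other compilers ignore the pragma and the
       loop runs serially with the same result. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The serial loop is left untouched; only the single pragma line is added, which is what lets the same source target different architectures.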

BSC Releases COMPSs Version 2.0 at SC16

This version of COMPSs, available from today, brings together the team’s work in recent years on a set of tools that help developers program and execute their applications efficiently on distributed computational infrastructures such as clusters, grids, and clouds. COMPSs is a task-based programming model known for notably improving the performance of large-scale applications by automatically parallelizing their execution.