10 Things Not to Miss at ISC 2017 in Frankfurt

In this special guest feature, Kim McMahon checks in from Frankfurt to give us a preview of ISC 2017. There is much in store this week, so be sure not to miss a beat!

How InfiniBand is Powering New Capabilities for Machine Learning with RDMA

In this video from GTC 2017, Scot Schultz from Mellanox describes how high-performance InfiniBand is powering new capabilities for machine learning with RDMA. “Mellanox solutions accelerate many of the world’s leading artificial intelligence and machine learning platforms. Mellanox solutions enable companies and organizations such as Baidu, Facebook, NVIDIA, PayPal, Tencent, Yahoo and many more to leverage machine learning platforms to enhance their competitive advantage.”

RoCE Initiative Launches Online Product Directory

Today the RoCE Initiative at the InfiniBand Trade Association announced the availability of the RoCE Product Directory. The new online resource is intended to inform CIOs and enterprise data center architects about their options for deploying RDMA over Converged Ethernet (RoCE) technology within their Ethernet infrastructure.

Rock Stars of HPC: DK Panda

As our newest Rock Star of HPC, DK Panda sat down with us to discuss his passion for teaching High Performance Computing. “During the last several years, HPC systems have been going through rapid changes to incorporate accelerators. The main software challenges for such systems have been to provide efficient support for programming models with high performance and high productivity. For NVIDIA GPU-based systems, my team introduced the novel “CUDA-aware MPI” concept seven years ago. This paradigm frees application developers from having to use explicit CUDA calls to perform data movement.”

Managing Node Configuration with 1000s of Nodes

Ira Weiny from Intel presented this talk at the OpenFabrics Workshop. “Individual node configuration when managing thousands or tens of thousands of nodes in a cluster can be a daunting challenge. Two key daemons, now part of the rdma-core package, aid the management of individual nodes in a large fabric: IBACM and rdma-ndd.”
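
As a rough sketch of what deploying these two daemons might look like, on a distribution that packages rdma-core with systemd units (the exact service names below are assumed from the rdma-core packaging and may vary by distribution):

```shell
# Enable the address resolution/caching daemon (IBACM) and the
# node description daemon (rdma-ndd) shipped with rdma-core.
# Service names may differ across distributions -- verify locally.
systemctl enable --now ibacm.service
systemctl enable --now rdma-ndd.service

# Confirm both daemons are active on the node
systemctl status ibacm.service rdma-ndd.service
```

In a large fabric this would typically be pushed out through the site's configuration management rather than run by hand on each node.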

Building Efficient HPC Clouds with MVAPICH2 and RDMA-Hadoop over SR-IOV IB Clusters

Xiaoyi Lu from Ohio State University presented this talk at the OpenFabrics Workshop. “Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high performance interconnects such as InfiniBand. SR-IOV can deliver near-native performance but lacks locality-aware communication support. This talk presents an efficient approach to building HPC clouds based on MVAPICH2 and RDMA-Hadoop with SR-IOV.”

Experiences with NVMe over Fabrics

“Using RDMA, NVMe over Fabrics (NVMe-oF) provides the high-bandwidth and low-latency characteristics of NVMe to remote devices. Moreover, these performance traits are delivered with negligible CPU overhead, as the bulk of the data transfer is conducted by RDMA. In this session, we present an overview of NVMe-oF and its implementation in Linux. We point out the main design choices and evaluate NVMe-oF performance for both InfiniBand and RoCE fabrics.”
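
As an illustration of what the Linux implementation looks like from the host side, a client can attach remote namespaces with the nvme-cli tool, assuming a target is already exported over an RDMA fabric (the address, port, and NQN below are placeholders, not values from the talk):

```shell
# Load the RDMA transport for the NVMe host driver
modprobe nvme-rdma

# Discover subsystems exported by the target (placeholder address/port)
nvme discover -t rdma -a 192.168.1.10 -s 4420

# Connect to a discovered subsystem by its NQN (placeholder NQN)
nvme connect -t rdma -a 192.168.1.10 -s 4420 \
    -n nqn.2017-06.io.example:nvme-target

# The remote namespace now appears as a local NVMe block device
nvme list
```

Once connected, the remote device is addressed like a local NVMe drive, which is what lets the RDMA data path deliver NVMe-like latency with little host CPU involvement.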

Video: RDMA on ARM

Pavel Shamis from ARM Research presented this talk at the OpenFabrics Workshop. “With the emerging availability of server platforms based on the ARM CPU architecture, it is important to understand how ARM integrates with the RDMA hardware and software ecosystem. In this talk, we will give an overview of the ARM architecture and system software stack. We will discuss how the ARM CPU interacts with network devices and accelerators. In addition, we will share our experience in enabling the RDMA software stack (OFED/MOFED Verbs) and one-sided communication libraries (Open UCX, OpenSHMEM/SHMEM) on ARM, and share preliminary evaluation results.”

Designing HPC & Deep Learning Middleware for Exascale Systems

DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference. “This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented.”

GIGABYTE Selects Cavium QLogic FastLinQ Ethernet Solutions

“GIGABYTE servers – across standard, Open Compute Platform (OCP) and rack scale form factors – deliver exceptional value, performance and scalability for multi-tenant cloud and virtualized enterprise datacenters,” said Etay Lee, GM of GIGABYTE Technology’s Server Division. “The addition of QLogic 10GbE and 25GbE FastLinQ Ethernet NICs in OCP and Standard form factors will enable delivery on all of the tenets of open standards, while enabling key virtualization technologies like SR-IOV and full offloads for overlay networks using VxLAN, NVGRE and GENEVE.”