Heterogeneous hardware is now present in virtually all clusters. Make sure you can monitor all hardware on all installed clusters in a consistent fashion. With extra work and expertise, some open source tools can be customized for this task. Few versatile, robust tools offer a single comprehensive GUI or CLI that can consistently manage all popular HPC hardware and software. Whatever monitoring solution you choose, it should not interfere with HPC workloads.
In this slidecast, Fritz Ferstl from Univa presents: Rev Up Your HPC Engine. The presentation explores the challenges for Workload Management systems in today’s datacenters with ever-increasing core counts.
“Jointly defined by a group of major computer hardware and software vendors, the OpenMP API is a portable, scalable model that gives shared memory parallel programmers a simple and flexible interface for developing parallel applications on platforms ranging from embedded systems and accelerator devices to multicore systems and shared memory systems.”
Smaller clusters often overload a single server with multiple services, such as file serving, resource scheduling, and monitoring/management. While this approach may work for systems with fewer than 100 nodes, these services can overload the cluster network or the single server as the cluster grows. This insideHPC Guide shows a plan for scalable HPC cluster growth.
“Our core product is the Xcelerit SDK, a Software Development Kit that makes it easy for domain specialists (i.e. mathematicians in banks or geophysicists in energy exploration firms) to convert their existing code to take advantage of multi-core, GPU and other hardware accelerators.”
“We need to educate new legions of students in high-performance computing,” said James Lin, vice director of the Center for HPC at SJTU. “Teaching HPC skills in a competition like this can be more effective than in a classroom. In fact, as a result of his experience, one of our team members decided to focus on an HPC PhD at Virginia Tech.”
“This talk will focus on challenges in designing software libraries and middleware for upcoming exascale systems with millions of processors and accelerators. Two kinds of application domains – Scientific Computing and Big data will be considered. For scientific computing domain, we will discuss about challenges in designing runtime environments for MPI and PGAS (UPC and OpenSHMEM) programming models by taking into account support for multi-core, high-performance networks, GPGPUs and Intel MIC.”
HPC is reaching out of its traditional setting in large compute clusters and into embedded systems for modern signal processing applications in defense.
Today Cray announced that the Center for Computational Sciences (CCS) at the University of Tsukuba in Japan has installed its second petascale Cray CS300 supercomputer.
The basic HPC cluster consists of at least one management/login node connected to a network of many worker nodes. Depending on the size of the cluster, there may be multiple management nodes used to run cluster-wide services, such as monitoring, workflow, and storage services. This insideHPC article series looks at the Five Essential Strategies for Managing HPC Clusters.