Understanding Your HPC Application Needs


Many HPC applications began as single-processor (single-core) programs. If these applications take too long on a single core or need more memory than is available, they must be modified so they can run on scalable systems. Fortunately, many of the most important (and most used) HPC applications are already available for scalable systems. Some applications do not require large numbers of cores for effective performance, while others are highly scalable.

As mentioned, many HPC applications can use large amounts of system memory. In addition to core counts, the memory requirements of an application will often grow as the problem size increases. In a scale-up situation, adding memory is a simple process; there is no need to alter the application (other than to programmatically increase memory usage). In a scale-out cluster environment, application memory is distributed across many separate servers, each with its own memory domain. Expanding memory capacity in this situation requires much more attention to detail and usually requires additional programming changes.
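To see what those programming changes look like in practice, the following is a minimal, purely illustrative sketch (not taken from the guide) of the same summation handled two ways: within a scale-up, coherent-memory node, threads simply share one array, while across a scale-out cluster the data must be explicitly partitioned among MPI ranks and the partial results combined. The problem size and values are invented for the example.

```c
/* Illustrative sketch only: contrasts shared-memory and distributed-memory
 * handling of the same reduction.  Compile with, e.g., mpicc -fopenmp. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 100000000;              /* global problem size (made up) */

    /* Scale-out view: each rank holds only its own slice of the data, so
     * growing the problem means revisiting the decomposition. */
    long local_n = n / size;
    double *local = malloc(local_n * sizeof(double));
    for (long i = 0; i < local_n; i++)
        local[i] = 1.0;

    /* Scale-up view: within one coherent memory space the reduction is
     * trivial, because every thread sees the same data. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < local_n; i++)
        local_sum += local[i];

    /* Across nodes, the partial sums must be combined explicitly. */
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %.0f\n", global_sum);

    free(local);
    MPI_Finalize();
    return 0;
}
```

On a large CSM system the entire array can live in a single address space, so growing the problem only means allocating more memory; on a cluster, the partitioning and communication shown above are unavoidable.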

Download the insideHPC Guide to Successful Technical Computing – Click Here.

This is the third article in a series on insideHPC’s Guide to Successful Technical Computing.

Another important application requirement is data input and output (IO). The amount of storage needed can vary widely between applications. Some applications rely on intermediate or checkpoint files so that, should the application be interrupted, it can be restarted from the last checkpoint. Other applications may read or write large amounts of data, while still other computational applications may have minimal IO requirements.
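As a concrete (and purely hypothetical) illustration of the checkpoint pattern described above, the sketch below periodically writes the application state to an intermediate file and, on restart, resumes from the last saved step. The file name, state layout, and checkpoint interval are invented for the example.

```c
/* Hypothetical checkpoint/restart sketch; file name, state layout, and
 * checkpoint interval are illustrative, not taken from any real application. */
#include <stdio.h>

#define TOTAL_STEPS 1000
#define CKPT_EVERY  100
#define CKPT_FILE   "state.ckpt"

int main(void)
{
    long step = 0;
    double state = 0.0;

    /* On restart, resume from the last checkpoint if one exists. */
    FILE *in = fopen(CKPT_FILE, "rb");
    if (in) {
        if (fread(&step, sizeof step, 1, in) != 1 ||
            fread(&state, sizeof state, 1, in) != 1) {
            step = 0;               /* unreadable checkpoint: start over */
            state = 0.0;
        }
        fclose(in);
    }

    for (; step < TOTAL_STEPS; step++) {
        state += 0.5;               /* stand-in for the real computation */

        /* Periodically write an intermediate file so an interrupted run
         * can pick up from here instead of from the beginning. */
        if ((step + 1) % CKPT_EVERY == 0) {
            FILE *out = fopen(CKPT_FILE, "wb");
            if (out) {
                long next_step = step + 1;
                fwrite(&next_step, sizeof next_step, 1, out);
                fwrite(&state, sizeof state, 1, out);
                fclose(out);
            }
        }
    }

    printf("final state = %f\n", state);
    return 0;
}
```

How often an application checkpoints, and how large each checkpoint file is, drives much of its IO requirement.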

In summary, there are important characteristics of all HPC applications that need to be understood before committing to a specific architecture or design. One of the best ways to understand application behavior is to benchmark on both scale-up and scale-out systems. Indeed, if the applications are used throughout a specific market sector, there should be published data on application performance across a variety of hardware.

Real Applications: Scale-Up Success

In some cases, problems are only solvable using scale-up CSM (Coherent Shared Memory) systems. The following examples show how scale-up designs provide a distinct advantage over scale-out cluster systems.

Advancing Bioinformatics

The data sets used in bioinformatics are large and growing. Tools for analyzing genomic data, such as BLAST, FASTA, ClustalW, and HMMER, all run best using a scale-up CSM architecture. In addition, genome assembly codes such as Velvet show huge performance gains from using scale-up solutions, such as the SGI UV systems.

In genomics, when searching for a match of an unknown query sequence against a database, the fastest results are achieved when the entire reference database can be loaded into a coherent shared memory (CSM) space. In a similar fashion, genome assemblies run on a large CSM system can help turn what was an intractable computational problem into a usable result.
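The principle is easy to see in a deliberately toy sketch: if the whole reference set fits in one coherent memory space, every query is answered by scanning RAM, with no disk access or cross-node communication. Real tools such as BLAST use far more sophisticated indexing and scoring; the sequences and query below are invented.

```c
/* Toy in-memory "reference database" search, for illustration only. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Stand-in for a multi-gigabyte reference held entirely in RAM. */
    const char *reference[] = {
        "ATGCGTACGTTAGC",
        "GGCATTACGGATCC",
        "TTGACCGTAAGCTA",
    };
    const size_t n_ref = sizeof reference / sizeof reference[0];

    const char *query = "ACGGAT";            /* unknown query sequence */

    for (size_t i = 0; i < n_ref; i++) {
        if (strstr(reference[i], query))     /* exact-match scan, in memory */
            printf("query found in reference sequence %zu\n", i);
    }
    return 0;
}
```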

For example, TGAC, based in Norwich, UK, specializes in genomics and bioinformatics with a focus on analysis and interpretation of plant, animal, and microbial genomes. They were challenged with memory constraints in assembling large and complex genomes, and with bottlenecks in the workflow from too many applications and too much data spread across standard x86 cluster hardware.

Dr. Mario Caccamo, Director of TGAC, states, “the main benefit of using such a system is the ability to assemble and analyze large and complex genome sequences in memory. We were experiencing memory limitations in standard cluster x86 hardware and thus had difficulty in assembling large and complex genomes.”

Enabling Academic Research

Academic researchers are using HPC more than ever. Colleges and universities are therefore tasked with providing top-level HPC facilities for staff and students. When considering various types of HPC systems, features such as ease of programming and administration, total cost of ownership (TCO), utilization, and time to solution are big contributors to success. Providing academic researchers with a simple-to-use and simple-to-manage solution guarantees the resource will get used right away, making everyone more productive, and offers the ability to easily scale as problems increase in size.

Recently, CUNY, the third largest university system in the United States, serving more than 269,000 students on 24 campuses across all five boroughs of New York City, decided to upgrade its infrastructure to the SGI UV 300 Coherent Shared Memory (CSM) system. The leading-edge system provides 384 Intel® Xeon® cores, 12 terabytes (TB) of shared memory, and eight GPU accelerators. As a reference point, the largest standard Intel Xeon servers can only provide 1.5 TB of memory; the SGI UV provides eight times as much.

Paul Muzio, Director, CUNY HPC Center stated, “The SGI UV 300, with its large shared memory, provides a unique capability for researchers to develop new methodologies and algorithms for interdisciplinary research needed to support social sciences research.”

Industrial: Oil and Gas and Electronic Design

Scale-up solutions have enabled success and new capabilities in many industries. To remain competitive, a leading oil and gas exploration company must continuously deliver larger and larger seismic datasets for interpretation. As the datasets become larger and more precise, the geoscientists require larger memory systems in order to utilize all the data and provide better insights. In some cases, geoscientists need to visualize and interact with more than half a terabyte of seismic data in real time.

A leading RF solutions firm significantly reduced simulation times and simulation failure rates for complex laminate designs. For the analysis, the company used ANSYS® HFSS™, the industry standard for simulating 3-D full-wave EM fields, on an SGI UV system. The SGI UV line was a good fit because most phases of their simulations typically consume a large amount of memory and require significant computing power and processing time. Measurable results included the elimination of crashed simulations due to improper memory sizing on a traditional HPC cluster, improved throughput from a simplified run-time environment that supported novice users, and a reduction of large-memory simulation processing times by 50% or more. Simulations that took 8 to 12 hours to complete on a traditional HPC cluster were finished in as little as four hours.

The Advantages of Scale-Up Solutions

As the application success stories indicate, the scale-up approach offers many advantages and capabilities that are not available in scale-out cluster systems. The following is a partial list of the major advantages of scale-up systems:

  • All existing user software will run without modification. Applications are simply recompiled and there is no “porting” required.
  • All scale-out applications (i.e., MPI-based applications) will run on scale-up systems.
  • An application requiring large amounts of memory can be run right away and memory can be easily expanded.
  • Applications can be incrementally improved by adding more cores, IO, or memory.
  • The administration overhead is much lower than that of scale-out systems. Administering a scale-up system is similar to administering a large workstation.
  • Additional resources can be added (more processors, memory, storage, applications) with minimal administration overhead.

In summary, scale-up computing provides a simplified gateway into HPC. The next article in this series will address the question of local or cloud HPC. If you prefer, you can download the complete insideHPC Guide to Successful Technical Computing, courtesy of SGI and Intel – Click Here.