Taking Altair Structural & CFD Analysis to New Heights with the 3rd Gen AMD EPYC™ Processors

Sponsored Post

By: Eric Lequiniou, VP of Radioss Development and Altair Solver HPC, Altair

In CAE design centers, computing power is always at a premium. Manufacturers across all industries rely on HPC simulation for applications ranging from structural analysis to fluids and thermal to multiphysics.

At the recent ISC 2021 virtual event in June, I sat down with Kevin Mayo of AMD and InsideHPC’s Doug Black. We discussed how the latest AMD EPYC™ processors are helping accelerate Altair applications for the most demanding engineering workloads. AMD and Altair have been busily collaborating to deliver better performance for resource-intensive FEA and CFD applications. We share some benchmarks resulting from these efforts below.

The Need for Higher Throughput

As designs get more complex, engineers rely on automation and optimization techniques to find the best answer that meets design goals while ensuring manufacturability and minimizing costs. Dynamic structural and CFD simulations are among the most computationally expensive workloads in the CAE design cycle, so throughput is critical. These workloads have the potential to impact schedules and budgets substantially. As a result, even modest performance gains can have an outsized impact on design quality and timeliness. License-use efficiency is another consideration. Minimizing feature checkout time for commercial solver licenses with faster processors helps software investments go further, providing additional productivity and efficiency gains.

Altair Tools

Altair offers a variety of software tools ranging from structural analysis to multiphysics to electronic system design. In characterizing tool performance on AMD EPYC 7003 series processors, we focused on demanding real-world simulations involving Altair® Radioss® and Altair® AcuSolve®.

Altair Radioss is a versatile, explicit finite element (FE) solver able to evaluate and optimize product performance for highly nonlinear problems. Radioss is the “go-to” tool for predicting dynamic, transient loading effects to improve design safety, survivability, and durability. Applications include automotive crash testing and use cases across a range of manufactured products, including shock and impact, drop testing, and high-velocity impacts.

Altair AcuSolve is a general-purpose CFD solver that is part of Altair’s CFD portfolio. It uses an implicit Navier-Stokes-based solver that brings multiple advantages to engineers. These include the capacity to take larger time steps while maintaining accuracy (thus reducing simulation time) and quickly solving coupled velocity and pressure systems. Simulation accuracy is also less sensitive to mesh quality than competing CFD solvers, helping engineers more easily run more accurate simulations quickly.

AMD EPYC Processors

Two standard benchmarks were used to compare Radioss single-node performance. The first, known as the NEON FE benchmark, simulates 80 milliseconds of a 1996 Chrysler Neon model, composed of roughly 1M elements, colliding with a rigid wall. The second is a larger Ford Taurus model composed of approximately 10M elements. For the CFD benchmark, we used an Altair-supplied “Impinging Nozzle” model, a realistic CFD workload composed of 7.7M elements that simulates liquid jets used for convective cooling applications.

AMD EPYC 7003 series processors delivered compelling performance gains for each benchmark run on dual-socket hosts compared to a competitive reference processor. The 24-core EPYC 74F3 processor delivered an average overall 1.45x performance gain running the two Radioss models, while the 32-core EPYC 75F3 processor delivered an average overall 1.73x improvement running the same benchmarks. AcuSolve simulations also benefited from the latest EPYC processors. The CFD model ran 1.76x faster on the EPYC 74F3 and 1.79x faster on the EPYC 75F3 [1]. Considering that engineers frequently spend considerable time tuning systems to gain just a few percentage points, these results are dramatic.

The Taurus T10M FEA model was also run using the Radioss distributed parallel solver across 1, 2, and 4 nodes to validate scalability in multi-node environments. While distributed FEA solvers often exhibit poor scalability, the combination of Radioss and AMD EPYC processors delivered near-linear scaling efficiency up to four nodes [2].
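As a quick check on the scaling claim, parallel efficiency can be computed as measured speedup divided by node count. The sketch below uses the 3.86x four-node figure from footnote [2]; the function name is illustrative and not part of any Altair or AMD tool.

```python
def scaling_efficiency(speedup: float, nodes: int) -> float:
    """Parallel efficiency: measured speedup divided by the ideal (linear) speedup."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup / nodes


# 3.86x on 4 nodes is the figure reported in footnote [2].
efficiency = scaling_efficiency(3.86, 4)
print(f"Scaling efficiency on 4 nodes: {efficiency:.1%}")
```

An efficiency close to 1.0 (100%) indicates that adding nodes costs little in communication overhead for this model.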

The “EPYC” Advantage

These results build on earlier benchmarks conducted on 2nd Gen AMD EPYC processors with Altair solvers. The performance advantage comes from a variety of EPYC design features, including:

  • Clock speeds of up to 4.0 GHz with max. boost enabled [3]
  • 8 x DDR4 memory channels clocked @ 3200 MT/sec
  • Up to 256 MB of L3 cache per socket
  • 128 PCIe® Gen4 lanes per socket (supporting high-performance InfiniBand adapters)
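To put the memory figures in context, theoretical peak bandwidth per socket can be estimated from the channel count and transfer rate listed above. This is a rough sketch that assumes a standard 64-bit (8-byte) DDR4 data path per channel, which is not stated in the article; sustained bandwidth in real workloads will be lower.

```python
def peak_memory_bandwidth_gbs(channels: int,
                              mega_transfers_per_sec: int,
                              bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    bytes_per_transfer=8 assumes a 64-bit DDR4 channel (an assumption,
    not a figure taken from the article).
    """
    return channels * mega_transfers_per_sec * bytes_per_transfer / 1000


# 8 channels at 3200 MT/sec, as listed in the bullets above.
per_socket_gbs = peak_memory_bandwidth_gbs(channels=8, mega_transfers_per_sec=3200)
per_node_gbs = 2 * per_socket_gbs  # dual-socket host

print(f"Per socket: {per_socket_gbs:.1f} GB/s, per dual-socket node: {per_node_gbs:.1f} GB/s")
```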

The latest AMD EPYC 7003 series processors introduced in March of 2021 provide additional features including:

  • A unified 8-core cache complex sharing a single 32 MB L3 cache per Core Complex Die (CCD), providing up to twice the amount of directly accessible L3 cache per core with low latency. [4]
  • Up to a 19% improvement in instructions per cycle (IPC). [5]
  • A faster Infinity Fabric™ clocked at 1600 MHz enabling synchronous transfers with the 3200 MT/sec DDR memory
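The synchronous-transfer point follows from how DDR memory is rated: it moves data twice per clock cycle, so 3200 MT/sec corresponds to a 1600 MHz memory clock, matching the fabric clock one-to-one. A minimal sketch of that arithmetic, with illustrative names:

```python
def memory_clock_mhz(transfer_rate_mts: int, transfers_per_clock: int = 2) -> float:
    """Convert a DDR transfer rate (MT/sec) to its underlying clock (MHz).

    Double data rate memory performs two transfers per clock cycle.
    """
    return transfer_rate_mts / transfers_per_clock


fabric_clock_mhz = 1600              # Infinity Fabric clock from the bullet above
mem_clock_mhz = memory_clock_mhz(3200)  # DDR4-3200 from the bullet above

# A 1:1 ratio means the fabric and memory clocks run in lockstep.
clock_ratio = fabric_clock_mhz / mem_clock_mhz
is_synchronous = clock_ratio == 1.0

print(f"Memory clock: {mem_clock_mhz:.0f} MHz, fabric:memory ratio = {clock_ratio:.0f}:1")
```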

What are the Takeaways?

For both structural and CFD simulations, Altair tools on AMD EPYC 7003 series processors deliver impressive performance that can help boost engineering productivity. Both workloads benefit from a combination of high clock speeds, large amounts of memory bandwidth, and large amounts of L3 cache per core. Not surprisingly, the smaller Neon model delivered better relative performance than the Taurus simulation because more of the model could fit in the L3 cache.

There is no one-size-fits-all CPU for HPC workloads. The optimal choice of processor will depend on the application. Dynamic structural and CFD simulations generally benefit from CPUs with high frequencies and mid-core counts. AMD offers 3rd Gen EPYC Processors with up to 64 cores per socket. However, the high-frequency 32- and 24-core SKUs will generally deliver better CAE throughput by providing more cache and memory bandwidth per core.
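The per-core argument can be illustrated with simple division. The sketch below assumes each SKU carries the full 256 MB of L3 per socket (the article gives this only as an "up to" figure) and uses a theoretical peak bandwidth derived from 8 channels at 3200 MT/sec with an assumed 8-byte channel width; actual per-core resources depend on the specific part and workload.

```python
L3_PER_SOCKET_MB = 256  # "up to" figure from the article; assumed here for every SKU

# 8 channels x 3200 MT/sec x 8 bytes per transfer (64-bit channel is an assumption)
BANDWIDTH_PER_SOCKET_GBS = 8 * 3200 * 8 / 1000


def per_core_resources(cores_per_socket: int) -> dict:
    """Split per-socket L3 cache and peak memory bandwidth evenly across cores."""
    return {
        "l3_mb": L3_PER_SOCKET_MB / cores_per_socket,
        "bandwidth_gbs": BANDWIDTH_PER_SOCKET_GBS / cores_per_socket,
    }


for cores in (24, 32, 64):
    r = per_core_resources(cores)
    print(f"{cores} cores: {r['l3_mb']:.1f} MB L3 per core, "
          f"{r['bandwidth_gbs']:.1f} GB/s per core")
```

Under these assumptions, a 24-core part has more than twice the cache and bandwidth per core of a 64-core part, which is the trade-off described above.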

For design teams that want high simulation throughput per node, the 32-core EPYC 75F3 delivers the best overall performance. For sites that prefer to optimize per-core performance to maximize commercial software investments, systems using the 24-core EPYC 74F3 part may offer better value and per-core throughput.
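One rough way to compare the two options is to normalize the reported Radioss gains by core count. The sketch below uses the 1.45x and 1.73x averages quoted earlier on dual-socket hosts; it assumes, hypothetically, that license cost scales with the number of cores in use, which may not match a given site's licensing terms.

```python
# (average Radioss speedup vs. the reference processor, cores per socket),
# taken from the benchmark figures quoted earlier in the article.
REPORTED_RADIOSS_RESULTS = {
    "EPYC 74F3": (1.45, 24),
    "EPYC 75F3": (1.73, 32),
}


def speedup_per_core(speedup: float, cores_per_socket: int, sockets: int = 2) -> float:
    """Reported node-level speedup divided by the total cores in the node."""
    return speedup / (cores_per_socket * sockets)


per_core = {
    name: speedup_per_core(speedup, cores)
    for name, (speedup, cores) in REPORTED_RADIOSS_RESULTS.items()
}

# Ratio above 1.0 means the 24-core part delivers more of the gain per core.
per_core_advantage = per_core["EPYC 74F3"] / per_core["EPYC 75F3"]
print(f"74F3 per-core advantage over 75F3: {per_core_advantage:.2f}x")
```

By this measure the 24-core part returns roughly 12% more of the reported gain per core, while the 32-core part still finishes a given job sooner per node.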

Learning more

To learn more about Altair structural and CFD tools, visit altair.com. To learn more about AMD EPYC 7003 series processors, please visit amd.com/en/processors/epyc-7003-series.

The benchmarks referenced above are available at the links below:

[1] These are per-node results across all cores on the dual-socket servers tested.

[2] Running across 4 MPI connected hosts yielded a 3.86x performance gain versus a single node – ~97% of the theoretical 4x maximum performance. Details are available in the AMD document Altair Radioss Performance with AMD EPYC 7003 Series Processors.

[3] EPYC-18: Max. boost for AMD EPYC processors is the maximum frequency achievable by any single core on the processor under normal operating conditions for server systems.

[4] See section 2.2 Core Complex (CCX) and Core Complex Die (CCD) – https://www.amd.com/system/files/documents/high-performance-computing-tuning-guide-amd-epyc7003-series-processors.pdf

[5] https://www.amd.com/en/press-releases/2021-03-15-amd-epyc-7003-series-cpus-set-new-standard-highest-performance-server