Ansys RedHawk-SC™ on Azure: Hold on to Your Socks


<SPONSORED CONTENT> By: Marc Swinnen, Dir. Product Marketing, Semiconductors, Ansys and Andy Chan, Director, Azure Global Solutions, Semiconductor/EDA/CAE

Abstract:

This article describes the extensive evaluation testing carried out to determine the optimal operational configuration for running the Ansys RedHawk-SC Electronic Design Automation (EDA) tool on Microsoft Azure. The testing produced two categories of results: the best choices from Azure’s service portfolio for these workloads, and the optimal number of CPUs to minimize their overall cloud total cost of ownership (TCO).

The main takeaway is that the total cost of running a RedHawk-SC job on Azure actually decreases as you increase the core count up to the optimum threshold. Read on to better understand the details!

What is Ansys RedHawk-SC?

Modern semiconductor integrated circuits (ICs) can contain a staggering 50 billion transistors or more. They would be impossible to design without the software tools, grouped under the Electronic Design Automation (EDA) category, that support, automate, and verify every step of the chip design process.

RedHawk-SC is an EDA tool developed by Ansys and the market leader for power integrity and reliability sign-off, a vital step in the design process for every semiconductor chip. Sign-off algorithms are extremely resource-intensive, requiring hundreds of CPU cores running over many hours, which makes them an ideal application for cloud computing.

Designed for the Cloud

RedHawk-SC was architected on a cloud-friendly analysis platform called Ansys SeaScape™. Its SeaScape database is fully distributed and thrives on distributed disk access across a network. The computational workload is spread across many CPUs, or “workers”, each with low memory requirements – less than 32 GB per worker. This elastic compute architecture allows an instant start as soon as just a few workers become available.

The distribution of the computational workload is extremely memory-efficient, allowing optimal utilization of over 2,500 CPUs. There is also no need for a heavy master node: the distribution is orchestrated by an ultra-light master scheduler that uses less than 2 GB of memory even for the largest chips. The same is true for loading, viewing, or debugging results.
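Conceptually, this is a classic light-master/many-workers pattern. The sketch below is purely illustrative and does not use SeaScape’s actual API; the function and task names are hypothetical stand-ins for per-partition analysis work.

```python
# Illustrative sketch only -- not the SeaScape API. It mimics the idea of a
# lightweight master handing independent, low-memory tasks to many workers.
from concurrent.futures import ProcessPoolExecutor, as_completed


def analyze_partition(partition_id: int) -> dict:
    """Hypothetical stand-in for a per-partition power-integrity task.
    Each task is designed to stay well under the ~32 GB per-worker budget."""
    # ... real work would read this partition's slice of the distributed DB ...
    return {"partition": partition_id, "status": "done"}


def run_distributed(num_partitions: int, num_workers: int) -> list[dict]:
    """Ultra-light 'master': it only tracks task handles, not design data,
    so its own memory footprint stays small (the article cites < 2 GB)."""
    results = []
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(analyze_partition, p) for p in range(num_partitions)]
        for fut in as_completed(futures):  # work starts as soon as any worker is free
            results.append(fut.result())
    return results


if __name__ == "__main__":
    print(len(run_distributed(num_partitions=256, num_workers=32)), "partitions analyzed")
```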

RedHawk-SC Workloads on Azure

EDA applications like RedHawk-SC have specific requirements for optimal cloud deployment. These considerations can be summarized in the following points (a simple pre-flight check along these lines is sketched after the list):

  • Sign-off generates very large workloads requiring thousands of CPUs
  • Large design sizes necessitate persistent or distributed storage for data and results in the cloud
  • Worker communication calls for a high-bandwidth network (10Gbps or more)
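As a rough illustration, these requirements can be captured in a pre-flight check run before a job is submitted. The helper below is hypothetical rather than part of RedHawk-SC or Azure; the thresholds simply mirror the bullet points above.

```python
# Hypothetical pre-flight check -- not a RedHawk-SC or Azure API.
# Thresholds reflect the deployment requirements listed above.
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    total_cpus: int        # cores available across all compute VMs
    shared_storage: bool   # persistent/distributed storage (e.g., NFS) mounted?
    network_gbps: float    # interconnect bandwidth between workers


def check_signoff_ready(spec: ClusterSpec, required_cpus: int) -> list[str]:
    """Return a list of problems; an empty list means the cluster looks suitable."""
    issues = []
    if spec.total_cpus < required_cpus:
        issues.append(f"only {spec.total_cpus} CPUs available, {required_cpus} requested")
    if not spec.shared_storage:
        issues.append("no persistent/distributed storage for design data and results")
    if spec.network_gbps < 10.0:
        issues.append(f"worker network is {spec.network_gbps} Gbps; 10 Gbps or more recommended")
    return issues


if __name__ == "__main__":
    spec = ClusterSpec(total_cpus=2400, shared_storage=True, network_gbps=100.0)
    problems = check_signoff_ready(spec, required_cpus=2000)
    print("ready" if not problems else problems)
```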

Ansys and Microsoft have worked together to evaluate the performance of realistic RedHawk-SC workloads on the Azure cloud and how to optimally configure the hardware setup.

Table-1: RedHawk-SC resource requirements for representative small “Block” workloads, medium “Cluster/Partition” workloads, and large “Full Chip” workloads

Table-1 lists the resources required to run RedHawk-SC on a variety of workload sizes.

Cloud Compute Models for EDA

Microsoft worked closely with Ansys to develop finely tuned solutions for RedHawk-SC running on Azure’s high-performance computing (HPC) infrastructure. These targeted reference architectures help ease the transition to Azure and allow design teams to run faster at a much lower cost.

IC design companies may choose to contract with cloud providers like Azure under an “all-in” model where the entire design project is conducted in the cloud or may look for a “hybrid” use model where cloud resources complement their existing in-house capacity (Figure-1).

Figure-1. Hybrid versus all-in model with both the head and compute nodes in the cloud.

Ansys and Microsoft Azure have verified that RedHawk-SC successfully accommodates both “all-in” and “hybrid” use models and licensing.

Azure Infrastructure Optimized for EDA

To achieve the fastest possible runtimes, companies typically start by investing in processors with the highest available clock speeds. The cloud adds further efficiency considerations, such as datacenter efficiency and workflow architecture. Benchmarks show that cloud storage is a high-impact architectural component, as are scaling technologies. Through extensive testing with realistic workloads, Microsoft and Ansys arrived at a recommended hardware configuration for running RedHawk-SC on Azure, shown in Figure-2 (below). The Azure Silicon team selected the following infrastructure to power this test:

  • AMD EPYC-powered HBv2 VM family
  • Intel Cascade Lake-powered FX VM family
  • Azure NetApp Files
  • Azure CycleCloud operations orchestration

Azure NetApp Files is a high-performance, metered NFS file storage service that enables RedHawk-SC to run without any code changes. Azure CycleCloud was used to orchestrate dynamic VM deployment and scaling for RedHawk-SC.
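To give a feel for how CycleCloud-style dynamic scaling fits this architecture, here is a minimal sizing sketch that converts a worker count and per-worker memory budget into a node request. It is not the Azure CycleCloud API, and the HBv2 figures used (120 cores and roughly 456 GB of RAM per Standard_HB120rs_v2 node) are assumptions about that SKU rather than values quoted in this article.

```python
# Illustrative node-sizing helper -- not the Azure CycleCloud API.
# HBv2 figures (120 cores, ~456 GB RAM per Standard_HB120rs_v2 node) are
# assumptions about that SKU, not values quoted in this article.
import math

HBV2_CORES_PER_NODE = 120
HBV2_RAM_GB_PER_NODE = 456
MAX_WORKER_MEM_GB = 32        # per-worker budget cited for RedHawk-SC workers


def nodes_for_workers(num_workers: int, mem_per_worker_gb: float = 8.0) -> int:
    """How many HBv2-class nodes a dynamic-scaling layer might request so that
    both the core count and the node memory can accommodate the workers."""
    if mem_per_worker_gb > MAX_WORKER_MEM_GB:
        raise ValueError("RedHawk-SC workers are expected to need < 32 GB each")
    workers_by_cores = HBV2_CORES_PER_NODE
    workers_by_memory = HBV2_RAM_GB_PER_NODE // mem_per_worker_gb
    workers_per_node = int(min(workers_by_cores, workers_by_memory))
    return math.ceil(num_workers / workers_per_node)


if __name__ == "__main__":
    # e.g., a large full-chip run distributed over 2,500 workers
    print(nodes_for_workers(2500, mem_per_worker_gb=8.0), "HBv2 nodes requested")
```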

Figure-2: Reference architecture for running Ansys RedHawk-SC on an Azure hybrid cloud

RedHawk-SC shows near-linear runtime scaling as the number of CPUs is increased. This is shown for the three different workloads in Graph-1 (below). The favorable scaling reflects the efficient distribution technology underlying RedHawk-SC’s SeaScape architecture.

Graph-1: Runtime required to run various RedHawk-SC workloads on Microsoft Azure
as a function of the number of CPUs

A surprising finding from Graph-1 is that the total cost of running a RedHawk-SC job on Azure actually decreases as the number of workers is increased (up to the optimum threshold). This contradicts the commonly held assumption that total cost rises as more CPUs are enlisted (Graph-2). The reason is the very high CPU utilization RedHawk-SC can achieve. The optimal number of CPUs equals the number of power partitions automatically calculated by RedHawk-SC.

Graph-2: This plot illustrates the non-intuitive decrease in total Azure costs for RedHawk-SC runs as the number of CPUs is increased to an optimal value – the number of power partitions in RedHawk-SC

This result is not intuitively obvious: it means customers should not try to reduce the CPU count to save money. In fact, increasing the CPU count to the optimal value delivers both lower cost and faster runtime.
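A back-of-the-envelope model makes this cost behavior concrete. Assuming near-linear runtime scaling up to the number of power partitions, a fixed hourly charge for the head node and storage, and placeholder prices (none of these numbers are Azure pricing or measured RedHawk-SC data), total cost falls as workers are added because the fixed hourly costs are paid for less wall-clock time, and it only starts climbing once extra CPUs no longer shorten the run.

```python
# Illustrative cost model only -- placeholder numbers, not Azure pricing or
# measured RedHawk-SC data. It shows why total cost can fall as workers are
# added: fixed hourly costs (head node, storage) are paid for less wall-clock
# time, while the distributable work costs roughly the same in core-hours.

PARALLEL_CORE_HOURS = 2000.0   # distributable analysis work (placeholder)
SERIAL_HOURS = 0.5             # non-distributable setup/merge time (placeholder)
POWER_PARTITIONS = 500         # beyond this, extra CPUs cannot help
CORE_RATE = 0.05               # $/core-hour (placeholder)
FIXED_RATE = 20.0              # $/hour for head node + storage (placeholder)


def runtime_hours(num_cpus: int) -> float:
    """Near-linear scaling up to the number of power partitions, flat after."""
    effective = min(num_cpus, POWER_PARTITIONS)
    return SERIAL_HOURS + PARALLEL_CORE_HOURS / effective


def total_cost(num_cpus: int) -> float:
    return runtime_hours(num_cpus) * (num_cpus * CORE_RATE + FIXED_RATE)


if __name__ == "__main__":
    for n in (100, 250, 500, 1000):
        print(f"{n:5d} CPUs: {runtime_hours(n):5.1f} h, ${total_cost(n):7.2f}")
    # Cost reaches its minimum near n == POWER_PARTITIONS, mirroring Graph-2.
```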

Conclusion

Extensive testing of RedHawk-SC on Azure has allowed Microsoft to identify an optimized VM configuration for cloud-based EDA work. This configuration has demonstrated excellent scalability to over 2,500 CPUs on a range of realistic EDA workloads of enormous size. The testing also identified the optimal number of CPUs to minimize the total cost of running RedHawk-SC on Azure. As a result, customers can easily set up their power integrity sign-off analysis jobs on Azure with configurations optimized for both throughput and cost.

For further information contact your local sales representative or visit www.ansys.com

Authors

Marc Swinnen, Dir. Product Marketing, Semiconductors, Ansys

Andy Chan, Director, Azure Global Solutions, Semiconductor/EDA/CAE