Authors: Andy Chan and Raymond Tsai
Pure Storage has partnered with Microsoft to deliver a solution for electronic design automation (EDA) that enables engineers to keep data on-prem while leveraging Azure for dynamic compute allocation.
Demand for smart devices is growing rapidly, but tape-out schedules are not getting any longer. Customers are looking for increased functionality at lower power consumption, which is driving the move to next-generation technology nodes, with designers already targeting 3nm and below. There is huge pressure to get designs “right” on first silicon in order to meet manufacturing and market windows.
Azure and Pure Storage have validated EDA tool acceleration for the front-end and back-end processes using Azure VMs connected over ExpressRoute to FlashBlade in Equinix. These findings also show that:
- Business owners can move IP and design data on demand to FlashBlade in Equinix locations close to their local design engineering centers. Automation tools like Ansible reduce the management overhead of moving data between the local data center and the Equinix colocation (see the sketch following this list).
- Connectivity to Azure allows chip design organizations to burst into the cloud seamlessly to scale the number of VMs (cores) independent of the storage on the FlashBlade device during simulation and modeling phases.
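As a minimal sketch of that data mobility, the script below stages a design dataset from an on-prem NFS export to the FlashBlade export in Equinix. The hostnames, export names, and paths are hypothetical, and in practice this logic would typically be wrapped in an Ansible playbook or handled by FlashBlade replication rather than run by hand.

```bash
#!/usr/bin/env bash
# Illustrative data-staging sketch (hypothetical hosts and paths): copy a design
# dataset from the on-prem NFS export to the FlashBlade export in Equinix.
set -euo pipefail

ONPREM_EXPORT="onprem-flashblade:/design_ip"     # hypothetical on-prem data VIP and export
EQUINIX_EXPORT="equinix-flashblade:/design_ip"   # hypothetical Equinix data VIP and export

sudo mkdir -p /mnt/onprem /mnt/equinix
sudo mount -t nfs -o vers=3,hard,proto=tcp "${ONPREM_EXPORT}"  /mnt/onprem
sudo mount -t nfs -o vers=3,hard,proto=tcp "${EQUINIX_EXPORT}" /mnt/equinix

# Incremental copy so only changed design files move over the WAN link.
rsync -aH --delete --info=progress2 /mnt/onprem/proj_a/ /mnt/equinix/proj_a/

sudo umount /mnt/onprem /mnt/equinix
```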
Higher consumer demand during the pandemic has created the need for more functionality and improvements in electronic gadgets powered by systems on chip (SoCs). This drives design engineers to find better and faster ways to accelerate design flows, methods, tool usage, and integration practices. Meeting that demand requires infrastructure growth, and extending infrastructure into the cloud raises familiar concerns for sensitive workloads like these: IP security, data sovereignty, cost efficiency, cloud lock-in, and more.
Electronic design automation (EDA) and manufacturing processes require numerous high performance computing (HPC) workloads that span simulation, physical design, and verification through tape-out workflows. Chip design workloads range from heavy metadata IO operations (IOPs) to high bandwidth, all in a high-file-count environment. The EDA tools used during the design and tape-out phases generate concurrent read and write operations to one or many files in parallel from shared storage.
The practically endless availability of central processing unit (CPU) and graphics processing unit (GPU) resources in the public cloud allows designers to run large numbers of EDA jobs in parallel, without being limited to on-premises data center resources. This enables them to achieve faster time to results with added business value.
With the advanced, modern chip design process, elastic compute and storage requirements have crossed data center boundaries and extended to the cloud, where virtual machines (VMs) can be provisioned on demand to accelerate time to market. However, organizations face the following challenges:
- During the chip design process, the compute cores scale independently of the underlying storage. Typically, on-premises data centers don’t have the flexibility to disaggregate the scaling of a high number of cores from the underlying storage on demand.
- Moving intellectual property and design data to the cloud introduces security and legal challenges. Business owners lose direct control and ownership of the data in the cloud as the data could be stored in various data centers located in any region or country.
Many semiconductor companies use EDA tools from vendors like Synopsys, Cadence, and Mentor Graphics. Pure Storage® FlashBlade® is the preferred data platform for design and tape-out workflows at many EDA/semiconductor organizations. FlashBlade provides an efficient way of accessing data over Network File System (NFS) v3/v4.1 and object store protocols, and of securing it with encryption.
To mitigate the on-demand scalability and data security risks, EDA tools can be configured on Azure VMs that connect over ExpressRoute, via Equinix Fabric™, to a connected-cloud data center on Platform Equinix® in the vicinity of the relevant Azure region. FlashBlade can be set up and configured in a customer’s colocation environment in an Equinix data center or through a managed Pure Storage on Equinix Metal™ solution integrated with Equinix Fabric. This infrastructure design allows more control over data ownership and the elasticity to burst into the Azure cloud for compute on demand.
Figure 1: Scaling Azure compute with FlashBlade in Equinix.
In a true production environment, data can be replicated between FlashBlade devices located on-premises and in Equinix data centers in a controlled and secure manner. Once the data is staged in Equinix, EDA tools running on Azure VMs can scale linearly and independently of the FlashBlade device’s capacity and performance.
We conducted a scalability test with the primary objective of measuring the IOPs and latency from the FlashBlade device as the load increases. Reading and writing a large number of small files or large files from deep directory structures generates massive amounts of metadata and bandwidth. FlashBlade is designed to handle the high metadata and throughput-based workloads that are common in chip development environments.
Azure CycleCloud
To dynamically allocate compute resources and integrate with job schedulers, Azure CycleCloud was used as part of this run. Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing complex EDA environments on Azure. With CycleCloud, you can provision infrastructure for EDA workflows, deploy familiar EDA schedulers, and automatically scale the infrastructure to run jobs efficiently at any scale. Through CycleCloud, you can create different types of filesystems and mount them to the compute cluster nodes to support your customized flows.
Cluster creation, management, operation, and optimization all happen in parallel and are fully integrated out of the box with commonly used schedulers such as Slurm, PBS Pro, LSF, Grid Engine, and HTCondor.
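As an illustration of that workflow, the commands below sketch how a scheduler cluster might be stood up and torn down with the CycleCloud CLI. The cluster name and template file are hypothetical, and exact flags may differ between CycleCloud CLI versions.

```bash
# Illustrative CycleCloud CLI flow (cluster and template names are hypothetical;
# verify flags against your CycleCloud CLI version).
cyclecloud import_cluster eda-slurm -f eda-slurm-template.txt   # load a Slurm cluster template
cyclecloud start_cluster eda-slurm                              # provision the head node; execute nodes autoscale with job load
cyclecloud show_cluster eda-slurm                               # check cluster and node states
cyclecloud terminate_cluster eda-slurm                          # tear down when the burst is finished
```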
FlashBlade in Equinix Architecture
As shown in Figure 2 below, the test was performed on a single Azure E96as_v4 VM with a 2.35GHz AMD EPYC 7452 processor, 96 vCPUs, 672GiB of memory, and 32,000Mbps of network bandwidth. The VM ran CentOS 7.9 with the latest patches and NFS utilities installed.
Figure 2: Performance validation with SPECstorage2020 on Azure VM and FlashBlade.
Four working directories were specified in the CLIENT_MOUNTPOINTS parameter of the SPECstorage2020 configuration file to distribute the EDA_BLENDED workload definition, which includes a frontend and a backend module. The frontend module simulates the high metadata operation rates normally generated during logic verification in build cycles. The backend definition includes the metadata and high-bandwidth workload typical of the simulation phases of the chip design process.
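For reference, a SPECstorage2020 run control file with four mountpoints on a single client might look like the fragment below. The hostname, mount paths, and load values are illustrative and not the values used in this validation.

```bash
# Illustrative SPECstorage2020 configuration fragment (hostname, paths, and
# load values are examples only).
cat > sfs_eda_blended.rc <<'EOF'
BENCHMARK=EDA_BLENDED
LOAD=10
INCR_LOAD=10
NUM_RUNS=10
# Four working directories on the Azure VM, each on a separate FlashBlade mount.
CLIENT_MOUNTPOINTS=azvm1:/mnt/eda1 azvm1:/mnt/eda2 azvm1:/mnt/eda3 azvm1:/mnt/eda4
EOF
```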
FlashBlade in the Equinix data center was observed to have a round-trip latency of about 2ms from the Azure VMs. The network connection from FlashBlade to Equinix Fabric and the ExpressRoute circuit to the Azure VMs were configured at 10Gbps. The Azure VMs and the Equinix data center were both located in the US West region.
The SPECstorage2020 performance tests were run over both the NFSv3 and NFSv4.1 protocols.
The following NFS mount options were used on the Azure VM for each of the four mounts when running the SPECstorage2020 EDA_BLENDED definition.
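The specific options from the validated configuration are not reproduced in this excerpt. As a point of reference only, a typical FlashBlade NFS mount for an EDA client might look like the following, with hostnames, export names, and option values being illustrative.

```bash
# Illustrative mount commands only; not the exact options from the validated setup.
sudo mount -t nfs -o rw,hard,proto=tcp,vers=3,rsize=524288,wsize=524288 \
    flashblade-data:/eda1 /mnt/eda1

# For the NFSv4.1 runs, the same mount with vers=4.1:
sudo mount -t nfs -o rw,hard,proto=tcp,vers=4.1,rsize=524288,wsize=524288 \
    flashblade-data:/eda1 /mnt/eda1
```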
The following kernel parameters were used to tune the Linux kernel for optimizing the TCP buffers.*
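The exact values are not reproduced in this excerpt; the sketch below shows the kind of TCP buffer tuning typically applied on an NFS client, with example values only.

```bash
# Typical TCP buffer tuning of the kind referenced above (example values only).
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
sudo sysctl -w net.ipv4.tcp_window_scaling=1
```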
Observations and Results
The SPECstorage2020 test was performed with a single Azure VM using an ultra-performance gateway over NFSv3. This configuration consistently generated 45,000 operations per second at around 2ms average latency.
The operations per second can scale linearly as more Azure VMs are added to the cluster, and the knee of the latency-versus-operations-per-second curve moves further to the right until the network bandwidth is saturated. The test shows only a moderate increase in latency as operations per second scale, up to the point where the network bandwidth saturates, as shown in Figure 3 below.
Figure 3: IOPs/sec vs. latency over NFS.
Figure 4 below shows that FlashBlade had enough headroom to accommodate more workloads as the Azure compute scaled independently until the network limitations were met.
Figure 4: Metadata IOPs vs. Bandwidth vs. Latency on FlashBlade.
The test results validate the performance of FlashBlade in Equinix over the ExpressRoute link. For the EDA workloads, the E96as_v4 Azure VM with an ultra-performance gateway and a 10Gbps ExpressRoute connection to FlashBlade scales IOPs linearly while consistently staying under 2ms of latency.
Figure 5: SPEC SFS 2020 (EDA Mixed) Latency NFSv3, NFSv4.1 with AMD instances.
A similar SPECstorage2020 EDA_BLENDED test was performed using an Azure E96as_v4 VM and an ultra-performance gateway to FlashBlade over NFSv4.1. Figure 5 above shows the NFSv4.1 operations per second and latency compared with NFSv3.
The test results indicated that NFSv4.1 was able to generate about 30,000 operations per second with around 2ms latency. The NFSv4.1 operations per second and the latency curve aren’t at parity with NFSv3. However, the knee of the curve for NFSv4.1 isn’t too far behind when compared with NFSv3.
Conclusion
The recent chip shortage has disrupted many industries, such as automotive, gaming, and consumer appliances, that depend heavily on semiconductors in their products and services. Larger silicon chips are used in more advanced devices and are very expensive, while the smaller silicon chipsets used in automobiles, gaming consoles, consumer household products, handheld devices, and so on are in high demand. The pandemic and climate change have further slowed the global supply chain, raising the cost overhead for the industries that depend on semiconductors.
To wrap up, extending to the Azure cloud to consume additional compute resources on demand, along with the data continuity and sovereignty provided by FlashBlade in a hybrid cloud environment, accelerates design flows and integration practices in the front-end and back-end design processes for faster time to market.
With advancing technology nodes and exploding design complexity, the capability of using secure compute in the cloud is becoming increasingly critical for customers to meet project and market requirements. The connected-cloud architecture outlined above solves many customer challenges for silicon design in the cloud and accelerates product and service innovation more efficiently than ever before.
Pure Storage has published a series of blogs around this solution, starting with Connected Cloud with FlashBlade and Microsoft Azure HPC for EDA Workloads.
*Note:
- Some of the kernel tuning parameters may not apply to newer Linux kernels (4.12 and later).
- “nconnect” was not used for this performance validation with SPECstorage2020 over NFSv3 and NFSv4.1.