Annual Roundup of AI Infrastructure Breakthroughs for 2023
Published Mar 27 2024
Microsoft

What a difference a year makes! Last year I said 2022 was a banner year for AI developments… if 2022 was a banner year, then how should we describe 2023? How about… supercalifragilisticexpialidocious? The introduction of ChatGPT in late 2022 revolutionized how the world thinks and talks about AI. Immediately after its release, it went viral and took the world by storm. ChatGPT took less than two months to reach 100 million users – an unprecedented feat. For comparison, it took TikTok about nine months after its global launch to reach 100 million users, and Instagram more than two years. Personally, I want to thank OpenAI and ChatGPT for making me cool with my kids!

 

Even though it seems like 2023 was all about OpenAI and ChatGPT, there was plenty of other notable AI news from Microsoft last year – and yes, it was indeed supercalifragilisticexpialidocious! Here are some of the most noteworthy AI product announcements and achievements from the past year.
 

Azure previews ND MI300Xv5 optimized for demanding AI and HPC workloads

Azure ND MI300X v5-series virtual machines are designed to accelerate AI workloads for high-end AI model training and generative inferencing. Powered by AMD Instinct MI300X accelerators and connected by NVIDIA Quantum-2 CX7 InfiniBand, each VM provides a staggering 1.5 TB of high-bandwidth memory (HBM), 5.2 TB/s of memory bandwidth, and 3.2 Tb/s of scale-out bandwidth. With ND MI300X v5-series virtual machines you can scale up to thousands of VMs and tens of thousands of GPUs to train and run the latest large language models (LLMs) faster. Learn more
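As a back-of-the-envelope check, the per-VM totals above are consistent with eight MI300X accelerators per VM; the per-GPU figures below come from AMD's published MI300X and NVIDIA ConnectX-7 specs, not from the announcement itself:

```python
# Sanity-checking the ND MI300X v5 per-VM figures quoted above.
# Assumptions (public component specs, not stated in the announcement):
GPUS_PER_VM = 8          # MI300X accelerators per ND MI300X v5 VM
HBM_PER_GPU_GB = 192     # HBM3 capacity per MI300X
IB_PER_GPU_GBPS = 400    # one 400 Gb/s ConnectX-7 InfiniBand link per GPU

hbm_per_vm_tb = GPUS_PER_VM * HBM_PER_GPU_GB / 1000    # 1.536, i.e. ~1.5 TB
scale_out_tbps = GPUS_PER_VM * IB_PER_GPU_GBPS / 1000  # 3.2 Tb/s

print(f"HBM per VM: {hbm_per_vm_tb:.2f} TB, scale-out: {scale_out_tbps:.1f} Tb/s")
```

Both derived numbers line up with the 1.5 TB of HBM and 3.2 Tb/s of scale-out bandwidth quoted in the announcement.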
 

Introducing Azure NC H100 v5 VMs for mid-range AI and HPC workloads

The new NC H100 v5 Virtual Machine (VM) series is built on the NVL variant of the NVIDIA Hopper H100 GPU, which offers greater memory per GPU. The new VM series provides customers with greater performance, reliability, and efficiency for mid-range AI training and generative AI inferencing. It is powered by NVIDIA H100 NVL 94 GB PCIe Tensor Core GPUs and 4th Gen AMD EPYC™ Genoa processors. The NC H100 v5-series provides up to 2X GPU compute performance, 2X host-to-GPU interconnect bandwidth per GPU, and 2X front-end network bandwidth per VM over the previous NC-series. Learn more
 

Azure confidential VMs with NVIDIA H100 Tensor Core GPUs in preview

Microsoft announced the preview of Azure confidential VMs with NVIDIA H100 Tensor Core GPUs, bringing a secure computing stack from the virtual machine to the GPU architecture itself. Users can build and deploy AI applications with confidential computing on Microsoft Azure, knowing their data and AI models are protected end to end. Learn more

 

Introducing Azure ND H100 v5 VMs for massively scalable AI training workloads

The new ND H100 v5 Virtual Machine (VM) series, now in production, is built on the latest generation of NVIDIA GPUs, Hopper (H100), which offers significantly more compute power and faster memory per GPU. The new VM series provides customers with greater performance, reliability, and scalability for extreme AI training and computationally demanding HPC applications. It is powered by NVIDIA H100 SXM Tensor Core GPUs and the latest Intel Xeon Sapphire Rapids processors. The ND H100 v5-series provides up to 2.5X GPU compute performance, more than 2X host-to-GPU interconnect bandwidth per GPU, and 2X scale-out network bandwidth per VM over the previous ND-series. Learn more

 

Custom-built silicon for AI and enterprise workloads in the Microsoft Cloud

At Ignite 2023, Microsoft announced new custom silicon that complements Microsoft’s offerings with industry partners. The two new chips, Microsoft Azure Maia and Microsoft Azure Cobalt, were built with a holistic view of hardware and software systems to optimize performance and price. Microsoft Azure Maia is an AI accelerator chip designed to run cloud-based training and inferencing for AI workloads such as OpenAI models, Bing, GitHub Copilot, and ChatGPT. Microsoft Azure Cobalt is a cloud-native chip based on the Arm architecture, optimized for performance, power efficiency, and cost-effectiveness for general-purpose workloads. Learn more

 

Azure has the most powerful Supercomputer for AI & HPC

At Supercomputing 2023 in November, Microsoft Azure’s “Eagle” supercomputer made its debut on the TOP500 list as the #3 most powerful supercomputer in the world. The High-Performance Computing community has long sought to democratize supercomputing capability for the masses, and it is now here: the same supercomputing capability that powers this leading cloud supercomputer is available to anyone with a credit card on Microsoft Azure. With 14,400 NVIDIA H100 GPUs and Intel Xeon Sapphire Rapids host processors, all connected via a high-performance, low-latency NVIDIA Quantum-2 CX7 InfiniBand network, Microsoft Azure’s Eagle represents a paradigm shift for cloud supercomputers.
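To put Eagle's scale in perspective, here is a rough sketch assuming the standard eight-GPU ND H100 v5 VM layout and 80 GB HBM per H100 SXM GPU (both are assumptions drawn from the VM series described above, not from the TOP500 entry):

```python
# Rough scale of Azure's "Eagle" supercomputer.
TOTAL_GPUS = 14_400       # GPU count from the TOP500 debut cited above
GPUS_PER_VM = 8           # assumption: standard ND H100 v5 VM size
HBM_PER_GPU_GB = 80       # assumption: H100 SXM with 80 GB of HBM

vms = TOTAL_GPUS // GPUS_PER_VM                     # 1,800 VMs
total_hbm_tb = TOTAL_GPUS * HBM_PER_GPU_GB / 1000   # 1,152 TB of GPU memory

print(f"~{vms} VMs with ~{total_hbm_tb:.0f} TB of aggregate GPU memory")
```

Under those assumptions, Eagle amounts to roughly 1,800 VMs with over a petabyte of aggregate GPU memory on a single InfiniBand fabric.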

 

In addition to turning in a powerful TOP500 ranking, Microsoft Azure has delivered industry-leading results for AI inference workloads among cloud service providers in the most recent MLPerf Inference results published by MLCommons. The Azure results were achieved using the new NC H100 v5 Virtual Machines (VMs). Noteworthy among these achievements is a 46% performance gain over competing products equipped with 80 GB GPUs, driven largely by the 17.5% increase in memory size (94 GB) of the NC H100 v5-series, which allows large models to fit into fewer GPUs. Learn more
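The memory arithmetic behind that claim is easy to reproduce. A minimal sketch (the model size below is chosen purely for illustration, not taken from the MLPerf submission):

```python
# Why 94 GB per GPU can fit a model into fewer GPUs than 80 GB per GPU.
import math

mem_94, mem_80 = 94, 80
increase = (mem_94 - mem_80) / mem_80   # 0.175 -> the 17.5% quoted above

# Illustrative only: a model needing ~560 GB of GPU memory in total.
model_gb = 560
gpus_94 = math.ceil(model_gb / mem_94)  # 6 GPUs at 94 GB each
gpus_80 = math.ceil(model_gb / mem_80)  # 7 GPUs at 80 GB each

print(f"memory increase: {increase:.1%}, GPUs needed: {gpus_94} vs {gpus_80}")
```

Needing one fewer GPU per model replica frees hardware for additional replicas, which is how a 17.5% memory bump can translate into a much larger throughput gain.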

 

AI lessons from healthcare

Healthcare AI is notoriously challenging to scale, from the complexity of the applications to the intricacies of the licensing and regulatory environment in an industry where failure can mean life or death. For much of the last decade, Elekta, a global health tech innovator, has been developing and commercializing ML-powered systems for radiology and radiation therapy used in the treatment of cancer and brain disorders. It was also crucial for Elekta to set up a development and operational environment for machine learning and AI activities that could easily scale, so it partnered with Microsoft Azure to deliver the needed performance, scalability, and security. Learn more

 

Wrapping up the roundup

#AIInfraMarketPulse  #AzureAI  #AI

Version history
Last update:
‎Mar 29 2024 01:37 PM