AI Infrastructure
Azure announces new AI optimized VM series featuring AMD’s flagship MI300X GPU
In our relentless pursuit of pushing the boundaries of artificial intelligence, we understand that cutting-edge infrastructure and expertise are needed to harness the full potential of advanced AI. At Microsoft, we've amassed a decade of experience in supercomputing and have consistently supported the most demanding AI training and generative inferencing workloads. Today, we're excited to announce the latest milestone in our journey: a virtual machine (VM) with an unprecedented 1.5 TB of high-bandwidth memory (HBM) that leverages the power of AMD’s flagship MI300X GPU. Azure VMs powered by the MI300X GPU give customers even more choice in AI-optimized VMs.

Performance considerations for large-scale deep learning training on Azure NDv4 (A100) series
Modern deep learning training jobs require large clusters of multi-GPU machines with high floating-point performance, connected by high-bandwidth, low-latency networks. The Azure NDv4 VM series is designed specifically for these types of workloads. We focus on HPC+AI clusters built with the ND96asr_v4 virtual machine type and provide specific optimization recommendations to get the best performance.

Introducing Azure NC H100 v5 VMs for mid-range AI and HPC workloads
Today at Ignite, Microsoft is announcing the public preview of the NC H100 v5 Virtual Machine Series, the latest addition to our portfolio of purpose-built infrastructure for High Performance Computing (HPC) and Artificial Intelligence (AI) workloads.

Running GPU accelerated workloads with NVIDIA GPU Operator on AKS
The focus of this article is on getting NVIDIA GPUs managed and configured optimally on Azure Kubernetes Service using the NVIDIA GPU Operator, for HPC/AI workloads that require a high degree of customization and granular control over the compute-resource configuration.

Exploring CPU vs GPU Speed in AI Training: A Demonstration with TensorFlow
In the ever-evolving landscape of artificial intelligence, the speed of model training is a crucial factor that can significantly impact the development and deployment of AI applications. Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are two types of processors commonly used for this purpose. In this blog post, we will delve into a practical demonstration using TensorFlow to showcase the speed differences between CPU and GPU when training a deep learning model.
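The kind of comparison this post describes can be sketched with a few lines of TensorFlow (this is a minimal illustration, not the post's own benchmark): it times repeated large matrix multiplications, the core operation in deep-learning training, on the CPU and, when one is available, on the GPU. The matrix size and iteration count are arbitrary choices for illustration.

```python
import time
import tensorflow as tf

def time_matmul(device, size=1024, iters=5):
    """Time `iters` matrix multiplications of size x size on the given device."""
    with tf.device(device):
        a = tf.random.uniform((size, size))
        b = tf.random.uniform((size, size))
        tf.matmul(a, b)  # warm-up run so one-time kernel setup isn't counted
        start = time.perf_counter()
        for _ in range(iters):
            c = tf.matmul(a, b)
        _ = c.numpy()  # pull the result back to host to force execution to finish
        return time.perf_counter() - start

cpu_time = time_matmul("/CPU:0")
print(f"CPU: {cpu_time:.3f}s")

# Only attempt the GPU run if TensorFlow actually sees a GPU.
if tf.config.list_physical_devices("GPU"):
    gpu_time = time_matmul("/GPU:0")
    print(f"GPU: {gpu_time:.3f}s (speedup ~{cpu_time / gpu_time:.1f}x)")
```

On a machine with a supported GPU, the second timing is typically an order of magnitude or more faster; the exact ratio depends heavily on the hardware and matrix size.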