Azure Machine Learning
- The Future of AI: Building Weird, Warm, and Wildly Effective AI Agents
Discover how humor and heart can transform AI experiences. From the playful Emotional Support Goose to the productivity-driven Penultimate Penguin, this post explores why designing with personality matters, and how Azure AI Foundry empowers creators to build tools that are not just efficient, but engaging.

- The Future of AI: The paradigm shifts in Generative AI Operations
Dive into the transformative world of Generative AI Operations (GenAIOps) with Microsoft Azure. Discover how businesses are overcoming the challenges of deploying and scaling generative AI applications. Learn about the innovative tools and services Azure AI offers, and how they empower developers to create high-quality, scalable AI solutions. Explore the paradigm shift from MLOps to GenAIOps and see how continuous improvement practices ensure your AI applications remain cutting-edge. Join us on this journey to harness the full potential of generative AI and drive operational excellence.

- The Future of AI: Harnessing AI for E-commerce - personalized shopping agents
Explore the development of personalized shopping agents that enhance user experience by providing tailored product recommendations based on uploaded images. Leveraging Azure AI Foundry, these agents analyze images for apparel recognition and generate intelligent product recommendations, creating a seamless and intuitive shopping experience for retail customers.

- Fine-tuning gpt-oss-20b Now Available on Managed Compute
Earlier this month, we made available OpenAI's open-source model gpt-oss on Azure AI Foundry and Windows AI Foundry. Today, you can fine-tune gpt-oss-20b using Managed Compute on Azure, available in preview and accessible via notebook.

- Azure Machine Learning now supports Large-Scale AI Training and Inference with ND H200 v5 VMs

TL;DR: Azure Machine Learning now offers ND H200 v5 VMs accelerated by NVIDIA H200 Tensor Core GPUs, purpose-built to train and serve modern generative AI more efficiently at cloud scale. With massive on-GPU memory and high intra-node bandwidth, you can fit larger models and batches, keep tensors local, and cut cross-GPU transfers, doing more with fewer nodes. Start with a single VM or scale out to hundreds in a managed cluster to capture cloud economics, while Azure's AI-optimized infrastructure delivers consistent performance across training and inference.

Why this matters
The AI stack is evolving: bigger parameter counts, longer context windows, multimodal pipelines, and production-scale inference. ND H200 v5 on Azure ML is designed to address these needs with a memory-first, network-optimized, and workflow-friendly approach, enabling data science and MLOps teams to move from experiment to production efficiently.

Memory, the real superpower
At the heart of each ND H200 v5 VM are eight NVIDIA H200 GPUs, each packing 141 GB of HBM3e memory, a 76% increase in HBM capacity over the H100. That means you can now process more per GPU: larger models, more tokens, better performance. Aggregate that across all eight GPUs and you get a massive 1,128 GB of GPU memory per VM.
- HBM3e throughput: 4.8 TB/s per GPU ensures continuous data flow, preventing compute starvation.
- Larger models with fewer compromises: Accommodate wider context windows, larger batch sizes, deeper expert mixtures, or higher-resolution vision tokens without needing aggressive sharding or offloading techniques.
- Improved scaling: Increased on-GPU memory reduces cross-device communication and enhances step-time stability.

Built to scale, within a VM and across the cluster
When training across multiple GPUs, communication speed is crucial.
- Inside the VM: Eight NVIDIA H200 GPUs are linked via NVIDIA NVLink, delivering 900 GB/s of bidirectional bandwidth per GPU for ultra-fast all-reduce and model-parallel operations with minimal synchronization overhead.
- Across VMs: Each instance comes with eight 400 Gb/s NVIDIA ConnectX-7 InfiniBand adapters connecting to NVIDIA Quantum-2 InfiniBand switches, totaling 3.2 Tb/s of interconnect per VM.
- GPUDirect RDMA: Enables data to move GPU-to-GPU across nodes with lower latency and lower CPU overhead, which is essential for distributed data/model/sequence parallelism.
The result is near-linear scaling characteristics for many large-model training and fine-tuning workloads.

Built into Azure ML workflows (no friction)
Azure Machine Learning integrates ND H200 v5 with the tools your teams already use:
- Frameworks: PyTorch, TensorFlow, JAX, and more
- Containers: Optimized Docker images available via Azure Container Registry
- Distributed training: NVIDIA NCCL fully supported to maximize performance of NVLink and InfiniBand
Bring your existing training scripts, launch distributed runs, and integrate into pipelines, registries, managed endpoints, and MLOps with minimal change.

Real-world gains
Early benchmarks show up to 35% throughput improvements for large language model inference compared to the previous generation, particularly on models like Llama 3.1 405B. The increased HBM capacity allows for larger inference batches, improving utilization and cost efficiency.

For training, the combination of additional memory and higher bandwidth supports larger models or more data per step, often reducing overall training time. Your mileage will vary by model architecture, precision, parallelism strategy, and data loader efficiency, but the headroom is real.

Quick spec snapshot
- GPUs: 8× NVIDIA H200 Tensor Core GPUs
- HBM3e: 141 GB per GPU (1,128 GB per VM)
- HBM bandwidth: 4.8 TB/s per GPU
- Inter-GPU: NVIDIA NVLink 900 GB/s (intra-VM)
- Host: 96 vCPUs (Intel Xeon Sapphire Rapids), 1,850 GiB RAM
- Local storage: 28 TB NVMe SSD
- Networking: 8× 400 Gb/s NVIDIA ConnectX-7 InfiniBand adapters (3.2 Tb/s total) with GPUDirect RDMA

Getting started (it's just a CLI away)
Create an auto-scaling compute cluster in Azure ML:

```bash
az ml compute create \
  --name h200-training-cluster \
  --size Standard_ND96isr_H200_v5 \
  --min-instances 0 \
  --max-instances 8 \
  --type amlcompute
```

Auto-scaling means you only pay for what you use, which is perfect for research bursts, scheduled training, and production inference with variable demand.

What you can do now
- Train foundation models with larger batch sizes and longer sequences
- Fine-tune LLMs with fewer memory workarounds, reducing the need for offloading and resharding
- Deploy high-throughput inference for chat, RAG, MoE, and multimodal use cases
- Accelerate scientific and simulation workloads that require high bandwidth and memory

Pro tips to unlock performance
- Optimize HBM usage: Increase batch size/sequence length until you approach the HBM bandwidth limit of approximately 4.8 TB/s per GPU.
- Utilize parallelism effectively: Combine tensor/model parallelism (NVLink-aware) within a node with data parallelism across nodes (InfiniBand + GPUDirect RDMA).
- Optimize your input pipeline: Parallelize tokenization/augmentation, and store frequently accessed data on local NVMe to prevent GPU stalls.
- Leverage NCCL: Configure your communication backend to take advantage of the topology, using NVLink intra-node and InfiniBand inter-node.

The bottom line
This is more than a hardware bump; it's a platform designed for the next wave of AI. With ND H200 v5 on Azure ML, you gain the memory capacity, network throughput, and operational simplicity needed to transform ambitious models into production-grade systems.

For comprehensive technical specifications and deployment guidance, visit the official ND H200 v5 documentation and explore our detailed announcement blog for additional insights and use cases.

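To make the "bring your existing training scripts" workflow concrete, here is a minimal sketch of an Azure ML CLI v2 command job that targets a cluster like the one created above. The compute name reuses h200-training-cluster from the CLI snippet; the training script, code folder, environment reference, and process/instance counts are illustrative assumptions to adapt to your own workspace.

```yaml
# job.yaml - illustrative distributed PyTorch command job for an ND H200 v5 cluster
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --epochs 1                          # hypothetical training script in ./src
code: ./src
environment: azureml:<your-pytorch-gpu-environment>@latest   # placeholder: a PyTorch + CUDA environment registered in your workspace
compute: azureml:h200-training-cluster                       # cluster created with 'az ml compute create' above
distribution:
  type: pytorch
  process_count_per_instance: 8                              # one worker per H200 GPU in each VM
resources:
  instance_count: 2                                          # scale out across VMs over InfiniBand
experiment_name: h200-distributed-training
```

Saved as job.yaml, this can be submitted with az ml job create --file job.yaml --resource-group <rg> --workspace-name <ws>; from there NCCL can use NVLink intra-node and InfiniBand inter-node, as described in the pro tips above.
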
- Connecting Azure Kubernetes Service Cluster to Azure Machine Learning for Multi-Node GPU Training

TLDR: Create an Azure Kubernetes Service cluster with GPU nodes and connect it to Azure Machine Learning to run distributed ML training workloads. This integration provides a managed data science platform while maintaining Kubernetes flexibility under the hood, enables multi-node training that spans multiple GPUs, and bridges the gap between infrastructure and ML teams. The solution works for both new and existing clusters, supporting specialized GPU hardware and hybrid scenarios.

Why Should You Care?
Integrating Azure Kubernetes Service (AKS) clusters with GPUs into Azure Machine Learning (AML) offers several key benefits:
- Utilize existing infrastructure: Leverage your existing AKS clusters with GPUs via a managed data science platform like AML
- Flexible resource sharing: Allow both AKS workloads and AML jobs to access the same GPU resources
- Organizational alignment: Bridge the gap between infrastructure teams (who prefer AKS) and ML teams (who prefer AML)
- Hybrid scenarios: Connect on-premises GPUs to AML using Azure Arc in a similar way to this tutorial
We focus on multi-node training because most larger training jobs need it; if you only need a single GPU or a single VM, the same approach covers that as well.

Prerequisites
Before you begin, ensure you have:
- An Azure subscription with privileges to create and manage AKS clusters and add compute targets in AML. We recommend that the AKS and AML resources be in the same region.
- Sufficient quota for GPU compute resources. See How to Increase Quota for Specific Types of Azure Virtual Machines for how to request quota. We are using two Standard_NC8as_T4_v3, so 4 T4s in total. You can also opt for other GPU-enabled compute.
- Azure CLI version 2.24.0 or higher (az upgrade)
- Azure CLI k8s-extension extension version 1.2.3 or higher (az extension update --name k8s-extension)
- kubectl installed and updated

Step 1: Create an AKS Cluster with GPU Nodes
For Windows users, it's recommended to use WSL (Ubuntu 22.04 or similar).

```bash
# Login to Azure
az login

# Create resource group
az group create -n ResourceGroup -l francecentral

# Create AKS cluster with a system node pool
az aks create -g ResourceGroup -n MyCluster \
  --node-vm-size Standard_D16s_v5 \
  --node-count 2 \
  --enable-addons monitoring

# Get cluster credentials
az aks get-credentials -g ResourceGroup -n MyCluster

# Add GPU node pool (Spot instances are not recommended)
az aks nodepool add \
  --resource-group ResourceGroup \
  --cluster-name MyCluster \
  --name gpupool \
  --node-count 2 \
  --vm-size Standard_NC8as_T4_v3

# Verify cluster configuration
kubectl get namespaces
kubectl get nodes
```

Step 2: Install NVIDIA Device Plugin
Next, we need to make sure that our GPUs work as expected. The NVIDIA device plugin is a Kubernetes plugin that enables the use of NVIDIA GPUs in containers running on Kubernetes clusters. It acts as a bridge between Kubernetes and the physical GPU hardware.

Apply the NVIDIA device plugin to enable GPU access within AKS:

```bash
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
```

To confirm that the GPUs are working as expected, follow the steps in Use GPUs on Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn and run a test workload.

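As a quick smoke test before moving on (a minimal sketch of our own, not the exact workload from the linked article), you can schedule a pod that requests one GPU and runs nvidia-smi; the CUDA image tag is an assumption, so swap in any GPU-capable image you already use.

```yaml
# gpu-smoke-test.yaml - illustrative pod to confirm the device plugin exposes GPUs
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  nodeSelector:
    agentpool: gpupool                              # schedule onto the GPU node pool created above
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04    # assumed public CUDA base image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1                         # request a single GPU from the device plugin
```

Apply it with kubectl apply -f gpu-smoke-test.yaml and check kubectl logs gpu-smoke-test; if the nvidia-smi table lists a T4, the device plugin is working.
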
Step 3: Register the KubernetesConfiguration Provider
The KubernetesConfiguration resource provider enables Azure to deploy and manage extensions on Kubernetes clusters, including the Azure Machine Learning extension. Before installing extensions, ensure the required resource provider is registered:

```bash
# Install the k8s-extension Azure CLI extension
az extension add --name k8s-extension

# Check if the provider is already registered
az provider list --query "[?contains(namespace,'Microsoft.KubernetesConfiguration')]" -o table

# If not registered, register it
az provider register --namespace Microsoft.KubernetesConfiguration

az account set --subscription <YOUR-AZURE-SUBSCRIPTION-ID>
az feature registration create --namespace Microsoft.KubernetesConfiguration --name ExtensionTypes

# Check the status after a few minutes and wait until it shows Registered
az feature show --namespace Microsoft.KubernetesConfiguration --name ExtensionTypes

# Install the Dapr extension
az k8s-extension create --cluster-type managedClusters \
  --cluster-name MyCluster \
  --resource-group ResourceGroup \
  --name dapr \
  --extension-type Microsoft.Dapr \
  --auto-upgrade-minor-version false
```

You can also check out the "Before you begin" section in Install the Dapr extension for Azure Kubernetes Service (AKS) and Arc-enabled Kubernetes - Azure Kubernetes Service | Microsoft Learn.

Step 4: Deploy the Azure Machine Learning Extension
Install the AML extension on your AKS cluster for training:

```bash
az k8s-extension create \
  --name azureml-extension \
  --extension-type Microsoft.AzureML.Kubernetes \
  --config enableTraining=True enableInference=False \
  --cluster-type managedClusters \
  --cluster-name MyCluster \
  --resource-group ResourceGroup \
  --scope cluster
```

The available options for the extension installation are listed in Deploy Azure Machine Learning extension on Kubernetes cluster - Azure Machine Learning | Microsoft Learn.

Verify Extension Deployment

```bash
# Check the provisioning state of the extension
az k8s-extension show \
  --name azureml-extension \
  --cluster-type managedClusters \
  --cluster-name MyCluster \
  --resource-group ResourceGroup

# Check the pods created by the extension
kubectl get pods -n azureml
```

The extension is successfully deployed when the provisioning state shows "Succeeded" and all pods in the "azureml" namespace are in the "Running" state.

Step 5: Create a GPU-Enabled Instance Type
By default, AML only has access to a default instance type that doesn't include GPU resources. Create a custom instance type to utilize your GPUs:

```bash
# Create a custom instance type definition
cat > t4-full-node.yaml << EOF
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
  name: t4-full-node
spec:
  nodeSelector:
    agentpool: gpupool
    kubernetes.azure.com/accelerator: nvidia
  resources:
    limits:
      cpu: "6"
      nvidia.com/gpu: 2  # Integer value equal to the number of GPUs
      memory: "55Gi"
    requests:
      cpu: "6"
      memory: "55Gi"
EOF

# Apply the instance type
kubectl apply -f t4-full-node.yaml
```

This configuration creates an instance type that allocates two T4 GPUs per instance, making it ideal for ML training jobs.

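As an optional sanity check (our own suggestion, assuming the AML extension registers the InstanceType custom resource under the amlarc.azureml.com API group with the conventional plural name), you can confirm the instance type exists before attaching the cluster:

```bash
# Confirm the InstanceType CRD exists and that the custom instance type was created (illustrative check)
kubectl get crd | grep -i instancetype
kubectl get instancetypes.amlarc.azureml.com
kubectl describe instancetypes.amlarc.azureml.com t4-full-node
```
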
Step 6: Attach the Cluster to Azure Machine Learning
Once your instance type is created, you can attach the AKS cluster to your AML workspace:
1. In the Azure Machine Learning studio, navigate to Compute > Kubernetes clusters
2. Click New and select your AKS cluster
3. Specify your custom instance type ("t4-full-node") when configuring the compute target
4. Complete the attachment process following the UI workflow
Alternatively, you can use the Azure CLI or Python SDK to attach the cluster programmatically; see Attach a Kubernetes cluster to Azure Machine Learning workspace - Azure Machine Learning | Microsoft Learn.

Step 7: Test Distributed Training
With your GPU-enabled AKS cluster now attached to AML, you can:
- Create an AML experiment that uses distributed training
- Specify your custom instance type in the training configuration
- Submit the job to take advantage of multi-node GPU capabilities
You can now run advanced ML workloads like distributed deep learning, which requires multiple GPUs across nodes, all managed through the AML platform. To submit such a job, you simply need to specify the compute name, the registered instance_type, and the number of instances.

As an example, clone yuvmaz/aml_labs: Labs to showcase the capabilities of Azure ML and switch to Lab 4 - Foundations of Distributed Deep Learning. Lab 4 introduces how distributed training works in general and in AML. In the Jupyter Notebook that guides you through the tutorial, you will find that the first job definition is in simple_environment.yaml. Open this file and make the following adjustments to use the AKS compute target:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: env | sort | grep -e 'WORLD' -e 'RANK' -e 'MASTER' -e 'NODE'
environment:
  image: library/python:latest
distribution:
  type: pytorch
  process_count_per_instance: 2  # we use 2 GPUs per node (across GPUs)
compute: azureml:<Kubernetes-compute_target_name>
resources:
  instance_count: 2  # we want 2 VMs/instances in total (across nodes)
  instance_type: <instance-type-name>
display_name: simple-env-vars-display
experiment_name: distributed-training-foundations
```

You can proceed in the same way for all other distributed training jobs.

Conclusion
By integrating AKS clusters with GPUs into Azure Machine Learning, you get the best of both worlds: the container orchestration and infrastructure capabilities of Kubernetes together with the ML workflow management features of AML. This setup is particularly valuable for organizations that want to:
- Maximize GPU utilization across both operational and ML workloads
- Provide data scientists with self-service access to GPU resources
- Establish a consistent ML platform that spans both cloud and on-premises resources
For production deployments, consider implementing additional security measures, networking configurations, and monitoring solutions appropriate for your organization's requirements.

Thanks a lot to Yuval Mazor and Alan Weaver for their collaboration on this blog post.

- Distributed Databases: Adaptive Optimization with Graph Neural Networks and Causal Inference
This blog post introduces a new adaptive framework for distributed databases that leverages Graph Neural Networks (GNNs) and causal inference to overcome the classic limitations imposed by the CAP theorem. Traditional distributed systems often rely on static policies for consistency, availability, and partitioning, which struggle to keep up with rapidly changing workloads and data relationships. The proposed GNN-based approach models the complex, interconnected nature of distributed databases, enabling predictive consistency management, intelligent load balancing for availability, and dynamic, graph-aware partitioning. By integrating temporal modeling and reinforcement learning, the framework adapts in real time, delivering significant improvements in latency, load balancing, and partition efficiency across real-world and synthetic benchmarks. This marks a major step toward intelligent, self-optimizing database systems that can meet the demands of modern applications.