Azure Machine Learning
176 Topics

Beyond the Model: Empower your AI with Data Grounding and Model Training
Discover how Microsoft Foundry goes beyond foundational models to deliver enterprise-grade AI solutions. Learn how data grounding, model tuning, and agentic orchestration unlock faster time-to-value, improved accuracy, and scalable workflows across industries.

The Evolution of GenAI Application Deployment Strategy: Building Custom Copilot (PoC)
The article discusses the use of Azure OpenAI in developing a custom Copilot, a tool that can assist with a wide range of activities. It presents four different approaches to building a GenAI application proof of concept (PoC).

The Future of AI: Building Weird, Warm, and Wildly Effective AI Agents
Discover how humor and heart can transform AI experiences. From the playful Emotional Support Goose to the productivity-driven Penultimate Penguin, this post explores why designing with personality matters, and how Azure AI Foundry empowers creators to build tools that are not just efficient, but engaging.

The Future of AI: The paradigm shifts in Generative AI Operations
Dive into the transformative world of Generative AI Operations (GenAIOps) with Microsoft Azure. Discover how businesses are overcoming the challenges of deploying and scaling generative AI applications. Learn about the innovative tools and services Azure AI offers, and how they empower developers to create high-quality, scalable AI solutions. Explore the paradigm shift from MLOps to GenAIOps and see how continuous improvement practices ensure your AI applications remain cutting-edge. Join us on this journey to harness the full potential of generative AI and drive operational excellence.

The Future of AI: Harnessing AI for E-commerce - personalized shopping agents
Explore the development of personalized shopping agents that enhance user experience by providing tailored product recommendations based on uploaded images. Leveraging Azure AI Foundry, these agents analyze images for apparel recognition and generate intelligent product recommendations, creating a seamless and intuitive shopping experience for retail customers.

Fine-tuning gpt-oss-20b Now Available on Managed Compute
Earlier this month, we made available OpenAI's open-source model gpt-oss on Azure AI Foundry and Windows AI Foundry. Today, you can fine-tune gpt-oss-20b using Managed Compute on Azure, available in preview and accessible via notebook.

Azure Machine Learning now supports Large-Scale AI Training and Inference with ND H200 v5 VMs
TL;DR: Azure Machine Learning now offers ND H200 v5 VMs accelerated by NVIDIA H200 Tensor Core GPUs, purpose-built to train and serve modern generative AI more efficiently at cloud scale. With massive on-GPU memory and high intra-node bandwidth, you can fit larger models and batches, keep tensors local, and cut cross-GPU transfers - doing more with fewer nodes. Start with a single VM or scale out to hundreds in a managed cluster to capture cloud economics, while Azure's AI-optimized infrastructure delivers consistent performance across training and inference.

Why this matters

The AI stack is evolving: bigger parameter counts, longer context windows, multimodal pipelines, and production-scale inference. ND H200 v5 on Azure ML is designed to address these needs with a memory-first, network-optimized, and workflow-friendly approach, enabling data science and MLOps teams to move from experiment to production efficiently.

Memory, the real superpower

At the heart of each ND H200 v5 VM are eight NVIDIA H200 GPUs, each packing 141 GB of HBM3e memory - a 76% increase in HBM capacity over H100. That means you can process more per GPU: larger models, more tokens, better performance. Aggregated across all eight GPUs, that is 1,128 GB of GPU memory per VM.

- HBM3e throughput: 4.8 TB/s per GPU keeps data flowing and prevents compute starvation.
- Larger models with fewer compromises: Accommodate wider context windows, larger batch sizes, deeper expert mixtures, or higher-resolution vision tokens without aggressive sharding or offloading techniques.
- Improved scaling: More on-GPU memory reduces cross-device communication and improves step-time stability.

Built to scale - within a VM and across the cluster

When training across multiple GPUs, communication speed is crucial.

- Inside the VM: Eight NVIDIA H200 GPUs are linked via NVIDIA NVLink, delivering 900 GB/s of bidirectional bandwidth per GPU for ultra-fast all-reduce and model-parallel operations with minimal synchronization overhead.
- Across VMs: Each instance comes with eight 400 Gb/s NVIDIA ConnectX-7 InfiniBand adapters connecting to NVIDIA Quantum-2 InfiniBand switches, totaling 3.2 Tb/s of interconnect per VM.
- GPUDirect RDMA: Moves data GPU-to-GPU across nodes with lower latency and lower CPU overhead, which is essential for distributed data/model/sequence parallelism.

The result is near-linear scaling characteristics for many large-model training and fine-tuning workloads.

Built into Azure ML workflows (no friction)

Azure Machine Learning integrates ND H200 v5 with the tools your teams already use:

- Frameworks: PyTorch, TensorFlow, JAX, and more
- Containers: Optimized Docker images available via Azure Container Registry
- Distributed training: NVIDIA NCCL fully supported to maximize performance of NVLink and InfiniBand

Bring your existing training scripts, launch distributed runs, and integrate into pipelines, registries, managed endpoints, and MLOps with minimal change.

Real-world gains

Early benchmarks show up to 35% throughput improvements for large language model inference compared to the previous generation, particularly on models like Llama 3.1 405B. The increased HBM capacity allows larger inference batches, improving utilization and cost efficiency. For training, the combination of additional memory and higher bandwidth supports larger models or more data per step, often reducing overall training time. Your mileage will vary by model architecture, precision, parallelism strategy, and data loader efficiency, but the headroom is real.

Quick spec snapshot

- GPUs: 8× NVIDIA H200 Tensor Core GPUs
- HBM3e capacity: 141 GB per GPU (1,128 GB per VM)
- HBM bandwidth: 4.8 TB/s per GPU
- Inter-GPU: NVIDIA NVLink, 900 GB/s (intra-VM)
- Host: 96 vCPUs (Intel Xeon Sapphire Rapids), 1,850 GiB RAM
- Local storage: 28 TB NVMe SSD
- Networking: 8× 400 Gb/s NVIDIA ConnectX-7 InfiniBand adapters (3.2 Tb/s total) with GPUDirect RDMA

Getting started (it's just a CLI away)

Create an auto-scaling compute cluster in Azure ML:

```bash
az ml compute create \
  --name h200-training-cluster \
  --size Standard_ND96isr_H200_v5 \
  --min-instances 0 \
  --max-instances 8 \
  --type amlcompute
```

Auto-scaling means you only pay for what you use - perfect for research bursts, scheduled training, and production inference with variable demand.
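Once the cluster exists, you can submit distributed training runs against it from the Azure ML Python SDK (azure-ai-ml). The sketch below is illustrative rather than official sample code: the subscription and workspace identifiers, the ./src folder with train.py, and the environment reference are placeholders you would swap for your own assets.

```python
# Minimal sketch (assumptions noted below): submit a distributed PyTorch job to the
# "h200-training-cluster" created above, using the azure-ai-ml (SDK v2) package.
# The workspace IDs, ./src/train.py, and the environment name are placeholders.
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

job = command(
    code="./src",                              # folder containing your training script
    command="python train.py --epochs 3",      # your existing entry point
    environment="<your-pytorch-environment>",  # curated or custom environment with CUDA + NCCL
    compute="h200-training-cluster",           # the cluster from the CLI example above
    instance_count=2,                          # two ND H200 v5 nodes
    distribution={
        "type": "pytorch",
        "process_count_per_instance": 8,       # one worker process per H200 GPU
    },
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)  # follow progress in Azure ML studio
```

With process_count_per_instance set to eight, each GPU gets its own worker process, so NCCL can keep intra-node collectives on NVLink and route cross-node traffic over InfiniBand with GPUDirect RDMA.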
What you can do now

- Train foundation models with larger batch sizes and longer sequences
- Fine-tune LLMs with fewer memory workarounds, reducing the need for offloading and resharding
- Deploy high-throughput inference for chat, RAG, MoE, and multimodal use cases
- Accelerate scientific and simulation workloads that require high bandwidth and memory

Pro tips to unlock performance

- Optimize HBM usage: Increase batch size/sequence length until you approach the HBM bandwidth limit (approximately 4.8 TB/s per GPU).
- Utilize parallelism effectively: Combine tensor/model parallelism (NVLink-aware) within a node with data parallelism across nodes (InfiniBand + GPUDirect RDMA).
- Optimize your input pipeline: Parallelize tokenization/augmentation, and store frequently accessed data on local NVMe to prevent GPU stalls.
- Leverage NCCL: Configure your communication backend to take advantage of the topology - NVLink intra-node and InfiniBand inter-node. A minimal training-script sketch appears at the end of this post.

The bottom line

This is more than a hardware bump - it's a platform designed for the next wave of AI. With ND H200 v5 on Azure ML, you gain the memory capacity, network throughput, and operational simplicity needed to transform ambitious models into production-grade systems.

For comprehensive technical specifications and deployment guidance, visit the official ND H200 v5 documentation and explore our detailed announcement blog for additional insights and use cases.
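To make the input-pipeline and NCCL tips above concrete, here is a minimal, hedged sketch of a training-script skeleton. It assumes the script is launched by Azure ML's PyTorch distribution (or torchrun), which sets RANK, LOCAL_RANK, WORLD_SIZE, and the rendezvous variables NCCL needs; the dataset class and the NVMe staging path are placeholders, not prescribed locations.

```python
# Minimal sketch (placeholder code, not from the announcement): a training-script
# skeleton following the "input pipeline" and "Leverage NCCL" tips. Assumes launch
# via torchrun or Azure ML's PyTorch distribution, which provide the distributed
# environment variables.
import os

import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, Dataset


class NvmeBackedDataset(Dataset):
    """Placeholder dataset reading pre-tokenized shards staged on local NVMe."""

    def __init__(self, root="/mnt/nvme/dataset"):  # assumed staging path, not a fixed Azure mount
        self.root = root

    def __len__(self):
        return 100_000

    def __getitem__(self, idx):
        # A real implementation would read a tokenized sample from self.root here.
        return torch.randint(0, 50_000, (4096,))


def main():
    # NCCL selects NVLink for intra-node and InfiniBand for inter-node transports
    # based on the detected topology; no manual wiring is required here.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    loader = DataLoader(
        NvmeBackedDataset(),
        batch_size=8,
        num_workers=8,            # parallel tokenization/augmentation workers
        pin_memory=True,          # faster host-to-HBM copies
        persistent_workers=True,  # avoid re-spawning workers every epoch
    )

    for batch in loader:
        batch = batch.cuda(non_blocking=True)
        # ... forward/backward/optimizer step for your model goes here ...
        break

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```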