kubernetes
169 TopicsAzure Availability Zone Mapping and VM Resilience Analysis Guidance using SRE.AZURE.COM Agent
Overview This guidance, supported and tested using SRE.Azure.com, helps Azure platform engineers understand how Availability Zones are mapped within their subscription and how virtual machines (VMs) are distributed across those zones. SRE.Azure.com enables discovery and analysis of zone mappings, VM placement, and infrastructure resilience. Why This Matters Azure uses logical zones (1, 2, 3), but these map differently to physical datacenter zones (az1, az2, az3) in each subscription. This means workloads in the same logical zone across subscriptions may not be physically co-located. Understanding this is critical for high availability, disaster recovery, compliance, and resilience planning. Example sub-prod-eastus-01 -> Zone 1 → az3 sub-prod-eastus-01 -> Zone 2 → az1 sub-prod-eastus-01 -> Zone 3 → az2 sub-prod-weu-01 -> Zone 1 → az1 sub-prod-weu-01 -> Zone 2 → az2 sub-prod-weu-01 -> Zone 3 → az3 Key takeaway: Logical zone numbers do not guarantee physical separation across subscriptions. What SRE.Azure.com agent Enables - Discover logical-to-physical zone mappings - Analyze VM distribution across zones - Identify resilience gaps - Generate presentation-ready reports Suggested Prompt “Act as an Azure platform engineer and generate a clean, presentation-ready analysis for availability zone design. For Azure subscription <subscription-id>, produce two outputs inline in chat. Output 1 — Zone Mapping Summary - Query Azure directly for region availability zone mappings - Show how logical zones map to physical zones - Include a takeaway and tables Output 2 — VM Resilience Distribution - List VMs with zone, physical mapping, and protection level Formatting: - Use markdown tables - No raw JSON - Screenshot-friendly layout - End with 3 observations” Example output: And so on …… Next Steps: Get Started | Azure SRE Agent What is SRE Agent? | Azure SRE Agent171Views1like0CommentsPreview of multiparty analytics with Azure Confidential Clean Rooms
Today, we are excited to announce the preview of multiparty analytics feature of Azure Confidential Clean Rooms, a fully managed service that allows customers and their partners to securely analyze privacy-sensitive datasets from multiple parties. It uses confidential compute enabled Apache Spark-based big-data analytics (Spark SQL) which helps protect their raw data from other collaborators and from the Azure operator by performing computations in a Trusted Execution Environment (TEE). Privacy-sensitive datasets include personally identifiable information (PII), protected health information (PHI) and cryptographic secrets. Organizations across industries are increasingly looking to supplement their data with data from business partners, to build a complete view of their business. For example, brands, publishers, and their partners need to collaborate using datasets containing Intellectual Property (IP) to improve the relevance of their campaigns. Confidential data clean rooms help solve this challenge by enabling organizations to share and analyze granular datasets in a secure environment that helps prevent raw data exfiltration—protecting intellectual property, preserving customer privacy, and addressing concerns around regulatory compliance. You can sign up for the preview here Key Features Fully Managed: Azure takes care of the infrastructure provisioning and scaling with no user intervention. This significantly reduces your onboarding effort allowing you to focus on the queries and insights, not on infra management. Confidential Spark SQL: Spark SQL allows you to query large datasets and run complex queries in a distributed computing environment. In the confidential computing enabled version, the Spark driver and executors are fully attested policy-governed enclaves running as virtual nodes on confidential Azure Container Instances (ACI) which helps prevent exfiltration of collaborators’ data during query execution. Governance: Helps manage membership to cleanrooms, enables and verifies approval for queries from relevant collaborators before executing them and verifies consent to access sensitive collaborator data. It also helps generate tamper-resistant audit trails containing salient clean room events. This is made possible with the help of an implementation of the Confidential Consortium Framework (CCF). Telemetry: Throughout every clean-room run, detailed logs are streamed out in real time to monitor performance, troubleshoot issues, and keep the analytics healthy — all without ever exposing the collaborators’ data at any time. Verifiable trust: Cryptographic remote attestation viz. full attestation based on confidential hardware reports allows independent verification of the TEE along with along with all components that are part of it, without just trusting the cloud provider, before sensitive data and decryption keys are made available to the TEE Open-source containers: All Microsoft provided cleanroom containers and sidecars are open-sourced here and can be verified for provenance and integrity guarantees using GitHub artifact attestation Use Cases Multi-party confidential big-data analytics unlocks value in scenarios where data sensitivity, regulatory pressure, or competitive concerns previously blocked collaboration. These are some early scenarios that can benefit from this. Media & Advertising Collaboration of advertiser CRM data with publisher data for audience targeting and segment activation. Collaboration of audience data with measurement partners for measurement and attribution. Banking & Finance Collaboration between banks and insurance firms to upsell relevant products to existing bank customers without sharing raw data from either side Collaboration with retailers to generate customized offers for bank customers, without exposing either party’s underlying data. Government & Public Sector Secure collaboration of data across government departments to deliver better citizen welfare outcomes. Secure collaboration between government and private enterprises on shared-interest workloads such as traffic monitoring and weather systems. Healthcare Enable healthcare firms — including biopharma organizations — to combine their data with third-party institutions to accelerate clinical development, like identifying eligible participants for a clinical trial, without exposing underlying patient data. Combine patient datasets across hospitals to study disease patterns or outcomes without exposing sensitive protected health information. "A higher standard for protecting user privacy and trust, the phase-out of third-party cookies, and global regulations demand more sophisticated data collaboration tools to support advertising marketplaces. Azure Confidential Cleanrooms (ACCR) provides a secure, feature-rich, and flexible foundation to implement privacy-preserving functions and enable insights without sharing privacy-sensitive data outside of organization boundaries. Built on the Azure Confidential Compute (ACC) platform and offering cohesion with Azure's diverse set of services, ACCR offers the attestation, audit, fine-grained access control, and verifiable trust tools required for secure and privacy-safe data collaboration in today's world." — Andrei Mackenzie, Engineering Manager, Microsoft AI "Azure Confidential Clean Rooms enabled our team to evaluate how clean room capabilities can support secure, governed data collaboration at scale. Through the Proof-of-Concept (PoC), we explored how privacy-preserving workflows, trusted access controls, and scalable compute can create a stronger foundation for responsibly leveraging first-party data. This helps reduce operational friction while supporting business growth, improving customer engagement, and enabling more relevant customer experiences." — Nic Dregne, Director, Microsoft AdTech Engineering Beyond Spark SQL Realizing other multi-party scenarios like custom analytics, ML training and inferencing on Azure Confidential Clean Rooms is in our roadmap. If you have such a scenario to be realized, you can fill in and submit the preview signup form with the details of your scenario and we’ll get back to you. Learn More · Signup for the preview of Azure Confidential Clean Rooms for Analytics · Confidential Consortium Framework (CCF) · Virtual Nodes on Azure Container InstancesBuild, deploy, and govern sovereign AI with Foundry Local on Azure Local
Not every AI workload can run in the cloud. For many of our customers, data needs to stay within defined boundaries, connectivity may be limited or absent, and latency, governance, and auditability are non-negotiable. With Foundry Local on Azure Local, you can use the same model catalog, developer workflows, and governance capabilities you know from Azure, while running AI entirely within your own environment where your data resides. Foundry Local provides the model catalog and developer experience. Azure Local provides the customer-managed infrastructure. Azure Arc provides unified policy, governance, and lifecycle management across cloud and local environments. This gives developers a consistent way to build, deploy, and operate AI. The same az commands, the same model catalog, the same Arc policies, all running on hardware you control. Expansion of Foundry Local on Azure Local We're expanding the Foundry Local model offering on Azure Local, with support for multi-node deployments and new agents and tools that run locally, in preview. Deploy and run AI models locally. Run models with Foundry Local in customer-managed environments on Azure Local, across sovereign, private, and edge scenarios, including fully disconnected operation. Choose from a flexible, high-performance model catalog. Access proprietary and community models through Foundry Local, now expanded with vLLM-optimized models alongside ONNX-based offerings. You explore and deploy through the same catalog API experience, then operate locally on Azure Local. Build for production realities. Bring governance, identity, and auditability into your applications while keeping execution inside your controlled boundary. See what’s new in Foundry Local on Azure Local in the Tech Community blog. From intelligence to action: agents and tools inside the enterprise boundary Most production AI use cases need two things: grounded answers and the ability to act on them, without sending data outside the environment. Here's how we're enabling that locally. Preview: Agentic retrieval with Foundry Local: Ground agents in enterprise data using retrieval-augmented generation across local Microsoft 365 services, including Exchange and SharePoint. Read the Tech Community blog to learn more. Preview: Agents and tools with Foundry Local: Build AI systems that reason, retrieve information, and take action within customer-controlled environments. Learn more. Preview: Developer acceleration templates: Jump-start local AI application development with new Foundry solution templates, including local chat experiences and video agents, powered by Azure AI Video Indexer. Read the Tech Community to learn more. GitHub Enterprise Local: Now available in public preview Sovereign AI is also about how systems are built and secured, not just where they run. With GitHub Enterprise Local on Azure Local, you can bring your full software development lifecycle on-premises: Source control and repositories CI/CD pipelines Security and DevSecOps workflows GitHub Enterprise Local deploys entirely within customer-owned infrastructure, so teams get the developer tools they expect without compromising on data residency or operational control. This extends modern DevSecOps practice into sovereign environments and pairs naturally with the AI development workflows above: build, secure, and ship your AI applications within the same boundary where they run. Read the tech community blog to learn more about GitHub Enterprise Local and how to join the preview. Accelerating High-performance AI at the Edge with NVIDIA We are expanding our collaboration with NVIDIA to deliver high-performance AI capabilities directly at the edge. At Build, we are bringing: Azure Local and Foundry Local on NVIDIA-powered GPUs, including NVIDIA RTX PRO 6000 Blackwell Server Edition, with expanded GPU support coming soon Integration with Nemotron models, optimized for enterprise performance A scalable foundation for data-intensive, low-latency workloads This partnership ensures that organizations can run advanced AI workloads where data is generated - without dependency on centralized cloud infrastructure. Hardware options: AI factory configurations are available now in the catalog Alongside our hardware partners, we’re bringing integrated solutions to customers building AI within sovereign environments. The Azure Local hardware catalog now includes AI factory configurations from our OEM partners, including NVIDIA-certified 8xH100 systems, with options from DataON, Dell, HPE, and Lenovo. These configurations are sized for the performance that model serving and agentic workloads require on customer-managed infrastructure. Together with Microsoft, we are advancing sovereign AI by bringing the open NVIDIA Nemotron model family to Microsoft Foundry Local on Azure Local. This collaboration gives organizations a production-ready AI platform that enables them to deploy AI where their data resides while maintaining the governance, control, and performance needed to scale AI across the enterprise.” Kari Briski, VP Generative AI Software Products, NVIDIA ”Sovereign AI is becoming increasingly important for governments, regulated industries, and enterprises that want to use AI while maintaining control of their data, location, and operations. Lenovo’s ThinkAgile MX Series delivers trusted, enterprise-grade infrastructure with global deployment expertise to help customers run AI wherever their data resides. Co-engineered with Foundry Local and Azure Local, this solution provides an optimized platform to deploy, run, and scale AI locally with greater simplicity, consistency, and control, while helping meet strict data residency, security, and compliance requirements." Scott Patti - VP Infrastructure Solutions Group (ISG), Lenovo From AI models to trusted, mission-critical systems: what this unlocks for developers and operators AI is evolving from systems that answer questions to systems that plan, reason, and take action across workloads. These capabilities move AI from a cloud-only assumption to something you can deploy where sensitive work actually happens, with governance and operational controls intact. For our customers, this means you can now: Keep data, identities, and audit trails inside your sovereign boundary. Run AI inference and agentic workloads in connected, intermittently connected, or fully disconnected modes. Apply consistent policy and governance across cloud and local environments through Azure Arc. Use the same Foundry catalog and developer experience you already know, on infrastructure you own. Build, secure, and ship your AI applications with GitHub Enterprise Local, keeping source control, CI/CD, and DevSecOps workflows inside the same sovereign boundary. Resources Join us at Build OD837 Shipping physical AI to the edge with Azure Local and Foundry Local https://github.com/microsoft/build26-OD837 OD839 Foundry Local: AI solutions for industrial and sovereign needs https://github.com/microsoft/build26-OD839 LTG425 Expanding horizons: Foundry Local for devices and on-prem https://build.microsoft.com/en-US/sessions/LTG425 Request to join the Foundry Local on Azure Local preview Hands-on walkthrough: Your first model deployment on Foundry Local on Azure Local: from catalog to inference in 10 minutes | Microsoft Community Hub Read our Tech Community blogs: Foundry Local announcing multi-node and vLLM support Agentic Retrival with Foundry Local blog: https://aka.ms/AgentsAndToolsBuildBlog2026 Code sample / model catalog blog: https://aka.ms/foundry-local-model-catalog-blog For more details on the expanded capabilities of Foundry Local for highly secure environments, contact your Microsoft account team Discover Microsoft Sovereign Cloud Explore product documentation at: Foundry Local models on Azure Local: https://aka.ms/FoundryLocalonAzureLocal_documentation Local Agentic retrieval with Foundry Local: https://aka.ms/edge-agentic-retrieval-docs783Views0likes0CommentsUnlock On-Prem Productivity with Agentic Retrieval in Foundry Local
In today’s connected world, customers expect instant, context-rich interactions, even in environments where cloud connectivity isn’t guaranteed. That’s where Retrieval-Augmented Generation at the edge comes in. Since we launched into public preview, we’ve watched teams across regulated, disconnected, and mission-critical environments push this technology into places cloud GenAI simply couldn’t reach. What we heard back shaped everything in this release: customers don’t just want retrieval. They want reasoning, they want agency, and they want an end-user experience that feels as natural as the one they already use in the cloud. Today at Build 2026, we're excited to introduce Agentic Retrieval, the next evolution of our on-prem RAG platform, enabled by Azure Arc and powered by Foundry language models. Agentic Retrieval is part of Microsoft's Adaptive Cloud approach, which extends Azure capabilities to wherever customer data and workloads actually live, with Edge AI focused on bringing reasoning and grounding to on-prem, distributed, and disconnected environments. Together with Foundry Local, Agentic Retrieval continues to shape Microsoft's Foundry Anywhere commitment: flexibility, resilience, and intelligence wherever customers operate. What’s new at Build 2026 This release introduces three major pillars that work independently or together: Agentic Retrieval engine: a first-party orchestration runtime for planning, reasoning, conversation state, and tool calls over your local data Knowledge: a dedicated layer for organizing, curating, and governing your grounding data, exposed via MCP and connectable to any agentic retrieval layer Chat UI: a production-ready, polished conversational experience that ships as the default UX for Agentic Retrieval and can also be deployed standalone Alongside, we’re delivering the platform upgrades customers asked for: flexible deployment modes (Agentic-only, Knowledge-only, or Combined), BYOM with pluggable backends, Foundry Local model catalog integration, Entra ID support, disconnected-ready, and hybrid search combined with agentic retrieval. Agentic Retrieval: From Answering to Reasoning Classic RAG retrieves, then generates. Agentic Retrieval plans, reasons, and acts, running multi-step retrieval and tool invocation under a first-party orchestration runtime, entirely on your infrastructure. Under the hood it manages query planning, iterative multi-hop retrieval, tool calls via MCP, conversation state, and mandatory grounding with citations and audit logging built in. What customers can achieve: Compliance, policy, and permit workflows for public sector, regulators, and defense operations, with data never leaving sovereign infrastructure Multi-document synthesis across standards, technical manuals, contracts, and field procedures for industrial operators An agentic chat experience for regulated and operational teams (engineers, inspectors, analysts) that reasons like a subject-matter expert Auditable AI for sovereign and mission-critical environments, with every answer traceable to its source Knowledge: A First-Class, Governed Data Layer Great answers start with great knowledge. Knowledge is now a standalone component customers can deploy on its own or alongside Agentic Retrieval, exposed through an MCP wrapper so it can connect to any agentic retrieval layer, ours or yours. This release brings Collections (segmented groups of indexed knowledge with granular access permissions), multi-source ingestion across documents, tables, images, and SharePoint (indexed source moving to public preview), high-fidelity parsing for complex enterprise content, Bring Your Own MCP to connect customer-owned data sources directly into Agentic Retrieval and the chat experience, and governance enforced at the data layer itself. ent view - collections, sources, and permission scopes What customers can achieve: Scope knowledge access to different slices of the same corpus, by plant, site, classification, or jurisdiction Enforce data sovereignty, residency, and regulatory compliance at the knowledge layer itself Ground both first-party Agentic Retrieval and BYO orchestration through a single governed source of truth across distributed sites Keep classified, proprietary, and operational data fully on-prem while delivering premium chat experiences Chat UI: Production-Ready Conversational Experience Agentic Retrieval now ships with a polished, production-ready Chat UI as its default experience, and the same component can be deployed standalone for customers building their own stack on Foundry Local. Highlights include Entra ID authentication (MSAL login, Bearer tokens, user identity display), pluggable backends across AI Foundry, BYOM, or mock mode with zero code changes, Chain-of-Thought visibility and inline citations that make grounding transparent to end users, standalone frontend deployment via Helm chart and container image, and disconnected-ready operation for air-gapped environments. What customers can achieve: Deliver a polished end-user experience to operators, inspectors, and analysts without building UI from scratch Build trust in regulated and industrial workflows through transparent, inspectable reasoning and grounding Run the same UI across air-gapped facilities, sovereign clouds, and connected industrial sites Accelerate rollout across public sector, defense, manufacturing, and other mission-critical environments Why This Release Matters Every update to our on-prem RAG platform has moved us toward a simple conviction: GenAI should be useful wherever customers operate, whether regulated or open, connected or disconnected, centralized or distributed. With Agentic Retrieval, Knowledge, and Chat UI coming together, backed by Foundry on Arc, BYOM, and fully disconnected support, this is no longer “cloud RAG, but local.” It’s an agentic knowledge platform purpose-built for the realities of enterprise data: on-prem, governed, and increasingly autonomous. Learn More Explore Agentic retrieval documentation Read Foundry Local on Azure Local model inferencing blog post For more information reach out to the team at FoundryLocalOnAzure@microsoft.com486Views0likes0CommentsScale On-Prem AI with Foundry Local on Azure Local: Multi-Node Inference and vLLM Support
Since announcing the public preview of Foundry Local on Azure Local for single-node, we’ve seen strong adoption in regulated industries and consistent customer demand to expand the platform for scalable deployments. Today, we’re expanding Foundry Local model offering on Azure Local (preview) with three additions that broaden where and how you can use it: Multi-node scheduling - distribute inference workloads across the GPU capacity in your Azure Local cluster, not just a single node vLLM runtime support - a high-throughput serving engine purpose-built for large language models and concurrent workloads An expanded model catalog - new models available in vLLM optimized format alongside the existing ONNX offerings Together, these additions let you scale to higher concurrency, serve more users from a single endpoint, and run larger models on-premises. They round out Foundry Local on Azure Local into a more complete, production-grade on-premises inference platform - covering a wider range of model sizes, concurrency profiles, and hardware footprints, while preserving the same Kubernetes-native, OpenAI-compatible patterns you're already using. Runs disconnected - no cloud round-trip required Foundry Local on Azure Local is designed to run fully on-premises, including in disconnected and intermittently-connected environments. Model weights, prompts, and inference traffic stay entirely inside your Arc-enabled cluster - there is no per-request call to Azure, no data exfiltration to the cloud, and no dependency on a live WAN to serve inference. Models are cached locally on Persistent Volumes after the first pull. Once cached, the inference endpoint keeps serving even when the WAN is down - across reboots, network outages, and extended disconnected operation. API-key authentication continues working uninterrupted during disconnected periods. Microsoft Entra ID auth resumes seamlessly when connectivity returns. The control plane is local to the cluster. The Foundry Local operator, the model catalog, and the inference runtimes all live inside Azure Local - Arc is used for fleet management and updates, not for the inference data path. For factory floors, offshore platforms, sovereign data centers, classified sites, and remote branch offices where cloud connectivity is unreliable, restricted, or prohibited, this is what makes on-premises AI inference actually viable in production. Multi-node scheduling: more scenarios, more capacity Foundry Local on Azure Local now expands to support multiple nodes in your cluster. The inference operator schedules and manages deployments across the GPU capacity available cluster-wide, so you can: GPU capacity from any node in the cluster, not just a single node’s resources Place inference workloads where the hardware lives, with the operator managing deployments across nodes The same Model Deployment custom resource you already use defines the workload, and it is served through the standard OpenAI-compatible endpoint (POST /v1/chat/completions). The API used to interact with conversational AI models by sending structured messages and receiving model-generated responses. Existing applications work against multi-node deployments with zero code changes. vLLM runtime: high-throughput serving for production workloads Alongside ONNX-GenAI, Foundry Local now offers vLLM as a first-class inference runtime. vLLM is an open-source, high-throughput serving engine that has become the standard for production LLM inference in the cloud. Bringing it to Foundry Local on Azure Local means the same performance characteristics are available on your factory floor, in your sovereign data center, or at your remote site. Why vLLM matters for edge and on-premises inference Capability ONNX-GenAI vLLM Hardware CPU and GPU GPU only Throughput Optimized for single-user, low-latency Optimized for high-throughput, multi-user concurrency Memory management Standard allocation PagedAttention - efficient KV-cache management reduces VRAM waste Continuous batching Not supported Supported - incoming requests are batched dynamically for higher GPU utilization FP8 KV cache Not supported Supported on compatible models and GPUs - roughly doubles token capacity Best for Compact models, CPU-only nodes, single-client scenarios Larger models, multi-user workloads, GPU-equipped clusters Automatic GPU inference tuning with the vLLM planner One of the operational challenges with vLLM is configuration tuning - setting GPU memory utilization, context length, batch sizes, and other parameters for a given model on a given hardware profile. Get it wrong and the pod either OOMs (runs out of memory) on startup or wastes GPU capacity. Foundry Local addresses this with the vLLM planner, an automatic tuning component that inspects the available GPU resources, analyzes the target model's footprint, and generates a memory-safe, high-performance configuration before the model server starts. You declare what model you want to run; the planner figures out how to run it optimally on your hardware. Full configuration reference is in the vLLM planner docs. Identity-based access for multi-user workloads Serving more concurrent users isn't only a throughput problem - it's also an access-control problem. Foundry Local supports two authentication modes side by side on the same endpoint: API keys - primary and secondary keys per deployment, with zero-downtime rotation. Ideal for service-to-service traffic and automated pipelines. Microsoft Entra ID with Azure RBAC - per-identity access using the Cognitive Services OpenAI User role (or any role granting the equivalent data-plane action). JWT validation runs inside the inference pod; authorization is enforced through the cluster's Arc-managed identity. Enable both, and clients can present either credential type in the same Authorization: Bearer header - the platform detects which one was sent and routes to the right validation path. API-key callers also keep working uninterrupted if external connectivity is briefly lost, giving you a natural degradation story for edge and disconnected sites. For a multi-user AI assistant on the factory floor or in a sovereign data center, this is the difference between a shared service account and a per-user audit trail. Expanded model catalog: ONNX and vLLM side by side The Foundry Local model catalog now includes models in both ONNX and vLLM formats. The same model can appear multiple times in the catalog - once per runtime/compute target - so you can pick the build that matches your hardware without leaving the platform. The operator selects the right container image automatically based on the entry you reference. Broader open-model support Beyond the Phi and GPTOSS families, the catalog now includes additional models across multiple open-source lineups that customers have requested for on-prem and sovereign deployments, including Mistral and NVIDIA Nemotron. Both are available as catalog entries, served by the vLLM runtime on GPU, and accessible through the same OpenAI-compatible endpoint you already use. In collaboration with NVIDIA, Foundry Local now supports the latest Nemotron models, optimized for enterprise performance on NVIDIA powered Azure Local hardware including NVIDIA RTX Pro 6000. Nemotron models are tuned for reasoning, instruction-following, and agentic workflows, and run on the vLLM runtime with PagedAttention, continuous batching, and FP8 KV cache on compatible GPUs. The vLLM planner handles GPU memory utilization and context-length sizing automatically. you declare the catalog entry, the platform sizes the deployment to your hardware. Models available in vLLM format (see the model catalog docs for the full, regularly updated list) Model ONNX vLLM Notes Phi-4 ✓ ✓ Microsoft's flagship SLM Phi-4-mini ✓ ✓ Compact, fast inference Phi-4-mini-reasoning ✓ ✓ Chain-of-thought reasoning Phi-4-reasoning — ✓ vLLM-only, reasoning-focused gpt-oss-20b ✓ ✓ Mid-range generative gpt-oss-120b — ✓ Large generative, vLLM-only Mistral-7B-v0.2 ✓ ✓ Popular open-source LLM DeepSeek-R1 (7b/14b) ✓ — Reasoning-focused Qwen2.5 (0.5b–14b) ✓ — Multilingual, coder variants Qwen3 (0.6b–14b) ✓ — Latest generation Whisper (multiple sizes) ✓ — Speech-to-text Nemotron ✓ (CPU) ✓ The catalog now includes a growing list of models across both runtimes. Models in vLLM format are served using the vLLM engine with all its performance benefits - PagedAttention, continuous batching, FP8 KV cache - while ONNX models continue to serve on CPU or GPU through the ONNX-GenAI runtime. Bring-your-own model (BYOM) When you need a model that isn’t in the catalog, bring-your-own model still works the same way: package your model as an OCI artifact in any ORAS-compatible registry (Azure Container Registry, GitHub Container Registry, Docker Hub) and reference it from your ModelDeployment. The operator caches it locally and reuses the cached copy on subsequent deployments. Choosing the right runtime ONNX-GenAI when you're running on CPU-only hardware, serving a single application with a compact model, or need the broadest model compatibility including speech and predictive workloads. vLLM when you have GPU hardware, need to serve concurrent users, want to run larger models, or need production-grade throughput from your inference endpoint. Both runtimes expose the same OpenAI-compatible REST API - the choice is transparent to application code. vLLM ModelDeployment is as simple as this: Everything else - memory utilization, context length, batch sizing - is handled by the vLLM planner. See the model catalog docs for the BYO pattern and full configuration options. What hasn't changed Everything from the public preview remains fully supported: Two installation paths - Azure Arc extension (recommended for fleet management) and Helm chart (for platform engineers who need full control) OpenAI-compatible REST endpoints - POST /v1/chat/completions and standard patterns API key and Microsoft Entra ID authentication - secured with bearer tokens, with the per-identity RBAC model described above TLS-enabled ingress - encrypted traffic in transit Disconnected operation - models cached on local PersistentVolumes continue serving when WAN connectivity drops Bring-your-own predictive models - deploy custom ONNX models from OCI registries Multi-model orchestration - agent-style patterns coordinating multiple local models Your existing ModelDeployment manifests continue to work. Applications targeting the ONNX-GenAI runtime don't need any changes. The new capabilities are additive. Real-world scenarios, now at scale Over the past few months, we’ve partnered with customers in early preview to build and validate real-world scenarios. A consistent theme across these engagements is the need to run AI where data resides—on-premises—while maintaining the governance and consistency enabled by Azure Arc. "In energy operations, AI needs to run where the work happens – at remote facilities, offshore platforms, and field locations where connectivity is often limited, and safety is paramount. Foundry Local gives us a path to bring AI-driven decision-making closer to our operational data, with the governance our industry demands. The ability to deploy and run AI workloads consistently across edge and field environments, even when disconnected, is critical as we advance Chevron's vision for autonomous and intelligent operations." (Chevron) Ed Moore - OT Strategist and Distinguished Engineer With multi-node and vLLM, the scenarios from our initial preview scale to meet production demands: Manufacturing: multi-user quality inspection A quality-control system on a production line previously ran Phi-4-mini for single-station anomaly explanation. With vLLM's continuous batching, the same Foundry Local endpoint now serves 10+ inspection stations concurrently - each sending defect images and sensor telemetry for real-time root-cause analysis - without response-time degradation. Sovereign: identity-scoped document processing A government agency processing sensitive casework needs production-grade throughput and a strict audit trail. Foundry Local serves the workload on-premises across multiple GPU nodes, with per-analyst access enforced through Entra ID and Azure RBAC, so every inference call is tied to a real identity - and no data leaves the cluster. Energy: disconnected multi-user operations An offshore platform runs Foundry Local on a multi-node Azure Local cluster. When WAN connectivity drops, the vLLM-powered endpoint continues serving safety procedure lookups, maintenance guidance, and operational queries to multiple crew members simultaneously - each accessing the inference endpoint from their local application. API-key auth keeps working through the outage; Entra ID resumes seamlessly when the WAN comes back. Getting started If you're already running Foundry Local on Azure Local in the public preview: Once installed the Foundry Local extension is automatically kept up to date, with multi-node and vLLM support included. Browse the updated catalog to discover models available in vLLM format Deploy a vLLM model by setting runtime: vllm in your ModelDeployment manifest Let the vLLM planner optimize - override only the preferences you care about and let the planner handle the rest If you're new to Foundry Local on Azure Local: Follow the get-started code-sample blog to see the end-to-end flow Request preview deployment access to get started Read the documentation for architecture overview and deployment guide What's next Multi-node and vLLM are just the beginning. We're continuing to invest in: Distributed LLM serving with LLM-D - KV-cache-aware routing and disaggregated serving for large models that span multiple nodes Autoscaling for inference workloads - dynamic capacity that follows demand Broader model catalog expansion - more model families, more sizes, more task types Enhanced monitoring and observability for inference workloads Performance optimization for specific Azure Local hardware profiles Expanded GPU hardware validation across the Azure Local catalog We're building Foundry Local to be the production AI inference platform for edge and sovereign environments. Your feedback is shaping every release - keep it coming. Learn more: Foundry Local Model and inferencing on multi node demo Foundry Local for devices (GA) For more information reach out to the team at FoundryLocalOnAzure@microsoft.com458Views0likes0CommentsAzure Arc Server April 2026 Forum
Please find the recording for the monthly Azure Arc Server Forum on YouTube! During the April 2026 Azure Arc Server Forum, we discussed: Public Preview of Essential Machine Management, learn more at aka.ms/EMM-blog and sign up at aka.ms/EMM-feedback Engage with product group on exploration of AI on bring your own Kubernetes by signing up at aka.ms/arc-ai-survey Product group is investing in extending the Multi-cloud Connector provide customers the ability to connect their MECM environments to Azure for inventory, monitoring, and management To sign up for the Azure Arc Server Forum and newsletter, please register with contact details at https://aka.ms/arcserverforumsignup/. For the latest agent release notes, check out What's new with Azure Connected Machine agent - Azure Arc | Microsoft Learn. Our May 2026 forum will be held on Thursday, May 21 at 9:30 AM PST / 12:30 PM EST. We look forward to you joining us, thank you!263Views1like0CommentsIntroducing cert-manager for Azure Arc-enabled Kubernetes: now in Public Preview
Today we’re releasing a public preview of cert-manager for Azure Arc-enabled Kubernetes. It’s an Arc extension that automates TLS certificate and trust bundle management for edge Kubernetes clusters. If you’re running Kubernetes at the edge: in factories, retail stores, remote sites, you’ve probably hit the certificate problem already. Certificates expire. Each cluster has its own tooling. Nobody owns the renewal process until something breaks. We routinely hear from customers that certificate issues are a common source of unplanned outages and last-minute firefighting, especially as workload counts grow. This extension packages the open-source cert-manager and trust-manager into a managed Arc extension with Microsoft support. You get automated lifecycle management and trust distribution without having to run and maintain these tools yourself. What it does The extension bundles two CNCF-graduated projects: cert-manager and trust-manager, into a single Arc-K8s extension that you install once per cluster. From there: 1. You can issue, renew, and rotate certificates automatically. You do not need to manage them manually. 2. You can distribute trusted CA certificates consistently across namespaces. No more per-workload trust configuration. 3. You choose the CA issuer: built-in self-signed for dev/test, or your enterprise PKI for production. 4. The extension ships with enterprise support, regular security patches, and proactive maintenance from Microsoft team. Why we built it We built Microsoft cert-manager for Azure Arc-enabled Kubernetes to address three recurring problems we saw in real hybrid and edge environments. Problem 1: Manual certificate issuance. Many organisations still issue, install, and renew certificates through manual steps across clusters and namespaces. That creates operational overhead, slows teams down, and increases the risk of outages when certificates expire or are configured incorrectly. The answer is automation. With cert-manager running as an Arc-enabled extension, teams can automate certificate issuance, renewal, and rotation through Kubernetes-native workflows instead of relying on tickets, scripts, and manual intervention. Problem 2: Fragmented approaches to automation. Even when teams try to automate, they often end up with a mix of scripts, custom controllers, product-specific setups, and one-off operational patterns. That fragmentation makes certificate management harder to scale, harder to standardise, and harder to operate consistently across environments. The answer is to standardise on cert-manager. It provides a common, Kubernetes-native approach to certificate lifecycle management, helping teams reduce tool sprawl, align on a consistent operating model, and simplify how certificates are managed across clusters. Problem 3: Maintenance and upgrade burden for open-source cert-manager. cert-manager is a powerful open-source project, but many organisations do not want the ongoing burden of packaging, validating, patching, upgrading, and supporting it themselves as a production dependency. That can create operational risk, delay updates, and make long-term ownership unclear. The answer is a Microsoft-supported Arc-enabled extension. Microsoft cert-manager for Azure Arc-enabled Kubernetes gives customers a supported way to use cert-manager, with Microsoft handling packaging, delivery, and ongoing maintenance so teams can adopt the capability without taking on the full operational burden of managing the OSS component themselves. What’s in the public preview Here’s what you get: Certificate lifecycle automation with cert-manager: issuance, renewal, rotation, all handled for you. Trust bundle distribution with trust-manager: push trusted CA certs to every namespace that needs them. Self-signed or external CA. Start with the built-in CA, swap in your enterprise PKI when you’re ready. Secure by default. We turned on the security settings you’d want enabled anyway: TLS enforcement, least-privilege RBAC, restricted pod security. Tested at the edge. Validated on AKS Edge Essentials, AKS on Azure Local, and several third-party Kubernetes distros. Works offline. Fits into your Arc stack If you’re already running Azure IoT Operations or Azure Monitor on Arc-enabled clusters, the extension handles TLS between those services with minimal setup. No custom certificate plumbing required: install the extension and the other Arc components pick it up. Get started The extension is available now in public preview. 👉 Documentation and quickstart376Views0likes0CommentsAnnouncing Public Preview of Argo CD extension on AKS and Azure Arc enabled Kubernetes clusters
We are excited to announce public preview of the Argo CD extension for Azure Kubernetes Service (AKS) and Azure Arc-enabled Kubernetes clusters. As GitOps becomes the standard for deploying and operating applications at scale, enterprises need a way to implement GitOps while staying compliant with best practices for security and identity management. Argo CD extension delivers on this need across 3 pillars - Trusted Identity and Secure Access The Argo CD extension integrates with Microsoft Entra ID to provide a secure, enterprise-ready experience for: Secure authentication using Workload Identity federation to Azure Container Registry (ACR) and Azure DevOps. This removes the need for long-lived credentials or hard-coded secrets in Git Repos, moving your CD pipelines closer to a true zero-trust architecture. Single Sign-On (SSO) using existing Azure identities. Enterprise-Grade Hardening and Security This preview introduces several enhancements to improve your security posture: To minimize the attack surface, the extension’s images are built on Azure Linux, specifically engineered for reduced CVEs and improved baseline security. Opt-in to automatic patch releases to stay current on security fixes while maintaining full control over your change management processes. Parity with upstream Argo CD Argo CD extension is designed to remain fully aligned with the upstream Argo CD open‑source project, so teams can use Argo CD as they do today with support for Configuring Argo CD extension with High availability (HA) for production‑grade deployments of critical workloads. Using hub‑and‑spoke architecture for multi‑cluster GitOps scenarios. Application and ApplicationSet, enabling automated and scalable application delivery across large fleets of clusters. Getting Started We invite you to explore the Argo CD extension and provide feedback as we continue to evolve GitOps capabilities for Kubernetes. To get started today, you can enable the extension on your clusters using the Azure CLI. Argo CD extension management via the Azure Portal will be available in a few weeks.1.6KViews1like1CommentAzure IoT Operations 2603 is now available: Powering the next era of Physical AI
Industrial AI is entering a new phase. For years, AI innovation has largely lived in dashboards, analytics, and digital decision support. Today, that intelligence is moving into the real world, onto factory floors, oil fields, and production lines, where AI systems don’t just analyze data, but sense, reason, and act in physical environments. This shift is increasingly described as Physical AI: intelligence that operates reliably where safety, latency, and real‑world constraints matter most. With the Azure IoT Operations 2603 (v1.3.38) release, Microsoft is delivering one of its most significant updates to date, strengthening the platform foundation required to build, deploy, and operate Physical AI systems at industrial scale. Why Physical AI needs a new kind of platform Physical AI systems are fundamentally different from digital‑only AI. They require: Real‑time, low‑latency decision‑making at the edge Tight integration across devices, assets, and OT systems End‑to‑end observability, health, and lifecycle management Secure cloud‑to‑edge control planes with governance built in Industry leaders and researchers increasingly agree that success in Physical AI depends less on isolated models, and more on software platforms that orchestrate data, assets, actions, and AI workloads across the physical world. Azure IoT Operations was built for exactly this challenge. What’s new in Azure IoT Operations 2603 The 2603 release delivers major advancements across data pipelines, connectivity, reliability, and operational control, enabling customers to move faster from experimentation to production‑grade Physical AI. Cloud‑to‑edge management actions Cloud‑to‑edge management actions enable teams to securely execute control and configuration operations on on‑premises assets, such as invoking methods, writing values, or adjusting settings, using Azure Resource Manager and Event Grid–based MQTT messaging. This capability extends the Azure control plane beyond the cloud, allowing intent, policy, and actions to be delivered reliably to physical systems while remaining decoupled from protocol and device specifics. For Physical AI, this closes the loop between perception and action: insights and decisions derived from models can be translated into governed, auditable changes in the physical world, even when assets operate in distributed or intermittently connected environments. Built‑in RBAC, managed identity, and activity logs ensure every action is authorized, traceable, and compliant, preserving safety, accountability, and human oversight as intelligence increasingly moves from observation to autonomous execution at the edge. No‑code dataflow graphs Azure IoT Operations makes it easier to build real‑time data pipelines at the edge without writing custom code. No‑code data flow graphs let teams design visual processing pipelines using built‑in transforms, with improved reliability, validation, and observability. Visual Editor – Build multi-stage data processing systems in the Operations Experience canvas. Drag and connect sources, transforms, and destinations visually. Configure map rules, filter conditions, and window durations inline. Deploy directly from the browser or define in Bicep/YAML for GitOps. Composable Transforms, Any Order – Chain map, filter, branch, concatenate, and window transforms in any sequence. Branch splits messages down parallel paths based on conditions. Concatenate merges them back. Route messages to different MQTT topics based on content. No fixed pipeline shape. Expressions, Enrichment, and Aggregation – Unit conversions, math, string operations, regex, conditionals, and last-known-value lookups, all built into the expression language. Enrich messages with external data from a state store. Aggregate high-frequency sensor data over tumbling time windows to compute averages, min/max, and counts. Open and Extensible – Connect to MQTT, Kafka, and OpenTelemetry (OTel) endpoints with built-in security through Azure Key Vault and managed identities. Need logic beyond what no-code covers? Drop a custom Wasm module (even embed and run ONNX AI ML models) into the middle of any graph alongside built-in transforms. You're never locked into declarative configuration. Together, these capabilities allow teams to move from raw telemetry to actionable signals directly at the edge without custom code or fragile glue logic. Expanded, production‑ready connectivity The MQTT connector enables customers to onboard MQTT devices as assets and route data to downstream workloads using familiar MQTT topics, with the flexibility to support unified namespace (UNS) patterns when desired. By leveraging MQTT’s lightweight publish/subscribe model, teams can simplify connectivity and share data across consumers without tight coupling between producers and applications. This is especially important for Physical AI, where intelligent systems must continuously sense state changes in the physical world and react quickly based on a consistent, authoritative operational context rather than fragmented data pipelines. Alongside MQTT, Azure IoT Operations continues to deliver broad, industrial‑grade connectivity across OPC UA, ONVIF, Media, REST/HTTP, and other connectors, with improved asset discovery, payload transformation, and lifecycle stability, providing the dependable connectivity layer Physical AI systems rely on to understand and respond to real‑world conditions. Unified health and observability Physical AI systems must be trustworthy. Azure IoT Operations 2603 introduces unified health status reporting across brokers, dataflows, assets, connectors, and endpoints, using consistent states and surfaced through both Kubernetes and Azure Resource Manager. This enables operators to see—not guess—when systems are ready to act in the physical world. Optional OPC UA connector deployment Azure IoT Operations 2603 introduces optional OPC UA connector deployment, reinforcing a design goal to keep deployments as streamlined as possible for scenarios that don’t require OPC UA from day one. The OPC UA connector is a discrete, native component of Azure IoT Operations that can be included during initial instance creation or added later as needs evolve, allowing teams to avoid unnecessary footprint and complexity in MQTT‑only or non‑OPC deployments. This reflects the broader architectural principle behind Azure IoT Operations: a platform built for composability and decomposability, where capabilities are assembled based on scenario requirements rather than assumed defaults, supporting faster onboarding, lower resource consumption, and cleaner production rollouts without limiting future expansion. Broker reliability and platform hardening The 2603 release significantly improves broker reliability through graceful upgrades, idempotent replication, persistence correctness, and backpressure isolation—capabilities essential for always‑on Physical AI systems operating in production environments. Physical AI in action: What customers are achieving today Azure IoT Operations is already powering real‑world Physical AI across industries, helping customers move beyond pilots to repeatable, scalable execution. Procter & Gamble Consumer goods leader P&G continually looks for ways to drive manufacturing efficiency and improve overall equipment effectiveness—a KPI encompassing availability, performance, and quality that’s tracked in P&G facilities around the world. P&G deployed Azure IoT Operations, enabled by Azure Arc, to capture real-time data from equipment at the edge, analyze it in the cloud, and deploy predictive models that enhance manufacturing efficiency and reduce unplanned downtime. Using Azure IoT Operations and Azure Arc, P&G is extrapolating insights and correlating them across plants to improve efficiency, reduce loss, and continue to drive global manufacturing technology forward. More info. Husqvarna Husqvarna Group faced increasing pressure to modernize its fragmented global infrastructure, gain real-time operational insights, and improve efficiency across its supply chain to stay competitive in a rapidly evolving digital and manufacturing landscape. Husqvarna Group implemented a suite of Microsoft Azure solutions—including Azure Arc, Azure IoT Operations, and Azure OpenAI—to unify cloud and on-premises systems, enable real-time data insights, and drive innovation across global manufacturing operations. With Azure, Husqvarna Group achieved 98% faster data deployment and 50% lower infrastructure imaging costs, while improving productivity, reducing downtime, and enabling real-time insights across a growing network of smart, connected factories. More info. Chevron With its Facilities and Operations of the Future initiative, Chevron is reimagining the monitoring of its physical operations to support remote and autonomous operations through enhanced capabilities and real-time access to data. Chevron adopted Microsoft Azure IoT Operations, enabled by Azure Arc, to manage and analyze data locally at remote facilities at the edge, while still maintaining a centralized, cloud-based management plane. Real-time insights enhance worker safety while lowering operational costs, empowering staff to focus on complex, higher-value tasks rather than routine inspections. More info. A platform purpose‑built for Physical AI Across manufacturing, energy, and infrastructure, the message is clear: the next wave of AI value will be created where digital intelligence meets the physical world. Azure IoT Operations 2603 strengthens Microsoft’s commitment to that future—providing the secure, observable, cloud‑connected edge platform required to build Physical AI systems that are not only intelligent, but dependable. Get started To explore the full Azure IoT Operations 2603 release, review the public documentation and release notes, and start building Physical AI solutions that operate and scale confidently in the real world.820Views3likes0CommentsAnnouncing the General Availability of the Azure Arc Gateway for Arc-enabled Kubernetes!
We’re excited to announce the General Availability of Arc gateway for Arc‑enabled Kubernetes. Arc gateway dramatically simplifies the network configuration required to use Azure Arc by consolidating outbound connectivity through a small, predictable set of endpoints. For customers operating behind enterprise proxies or firewalls, this means faster onboarding, fewer change requests, and a smoother path to value with Azure Arc. What’s new: To Arc‑enable a Kubernetes Cluster, customers previously had to allow 18 distinct endpoints. With Arc gateway GA, you can do the same with just 9, a 50% reduction that removes friction for security and networking teams. Why This Matters Organizations with strict outbound controls often spend days, or weeks, coordinating approvals for multiple URLs before they can onboard resources to Azure Arc. By consolidating traffic to a smaller set of destinations, Arc gateway: Accelerates onboarding for Arc‑enabled Kubernetes by cutting down the proxy/firewall approvals needed to get started. Simplifies operations with a consistent, repeatable pattern for routing Arc agent and extension traffic to Azure. How Arc gateway works Arc gateway introduces two components that work together to streamline connectivity: Arc gateway (Azure resource): A single, unique endpoint in your Azure tenant that receives incoming traffic from on‑premises Arc workloads and forwards it to the right Azure services. You configure your enterprise environment to allow this endpoint. Azure Arc Proxy (on every Arc‑enabled Kubernetes Cluster): A component of the Arc K8s agent that routes agent and extension traffic to Azure via the Arc gateway endpoint. It’s part of the core Arc agent; no separate install is required. At a high level, traffic flows: Arc-enabled Kubernetes agent → Arc Proxy → Enterprise Proxy → Arc gateway → Target Azure service. Scenario Coverage As part of this GA release, Arc-enabled Kubernetes Onboarding and other common Arc‑enabled Kubernetes scenarios are supported through Arc gateway, including: Arc-enabled Kubernetes Cluster Connect Arc-enabled Kubernetes Resource View Custom Location Azure Policy's Extension for Azure Arc For other scenarios, including Microsoft Defender for Containers, Azure Key Vault, Container Insights in Azure Monitor, etc., some customer‑specific data plane destinations (e.g., your Log Analytics workspaces, Storage Accounts, or Key Vault URLs) still need to be allow‑listed per your environment. Please consult the Arc gateway documentation for the current scenario‑by‑scenario coverage and any remaining per‑service URLs. Get started Create an Arc gateway resource using the Azure portal, Azure CLI, or PowerShell. Allow the Arc gateway endpoint (and the small set of core endpoints) in your enterprise proxy/firewall. Onboard or update clusters to use your Arc gateway resource. For step‑by‑step guidance, see the Arc gateway documentation on Microsoft Learn. FAQs Does Arc gateway require new software on my clusters? No additional installation - Arc Proxy is part of the standard Arc-enabled Kubernetes Agent. Will every Arc scenario route through the gateway today? Arc-enablement, and other scenarios are covered at GA; some customer‑specific data plane endpoints (for example, Log Analytics workspace FQDNs) may still need to be allowed. Check the docs for the latest coverage details. What is the status of Arc gateway for other infrastructure types? Arc gateway is already GA for Arc-enabled Servers, and Azure Local. Tell us what you think We’d love your feedback on Arc gateway GA for Kubernetes - what worked well, what could be improved, and which scenarios you want next. Use the Arc gateway feedback form to share your input with the product team.1.1KViews3likes0Comments