azure arc
295 Topics

Unlocking the Human Telemetry Layer for Safer Industrial Operations
What if we could track human health and safety conditions as precisely as we track machines, and take immediate action to protect our greatest asset, our people? Many industrial organizations still lack visibility into real-time human conditions, even as worker safety and operational risk remain major investment priorities. One of the most important operational signals has largely remained outside the industrial data estate: human telemetry. VOORMI and Microsoft have joined forces to fill this gap in understanding real human conditions. Through the Mij™ platform, VOORMI brings human telemetry into Azure IoT, enabling enterprises to integrate worker conditions such as heat stress and fatigue into the same operational architecture already used for machines and industrial systems. VOORMI, SWNR’s performance apparel brand, is among the first to bring this technology into garments designed for real industrial field conditions. This integration brings their proprietary wearable technology directly into high-impact worker safety and field operations scenarios. The partnership helps establish a new telemetry layer for industrial operations, allowing human, machine, and environmental signals to converge and drive safer operations, real-time awareness, and adaptive AI workflows.

Bringing the Human Signal into Industrial AI with Azure

Industrial organizations increasingly recognize that many safety, productivity, and operational challenges occur at the intersection of people and machines. Workers operate in high-heat environments, hazardous conditions, remote sites, and physically demanding field scenarios where situational awareness matters in real time. Historically, worker telemetry has remained fragmented across proprietary wearable platforms and disconnected safety systems, creating governance and operational challenges for enterprise IT and OT teams.
Mij™ is designed differently, integrating directly into customer-controlled Azure environments through Azure IoT Operations running at the edge or Azure IoT Hub in the cloud rather than introducing another isolated platform. Running intelligence at the edge enables virtual safety agents and operational workflows to execute closer to the worker, supporting low-latency responses, local interaction with OT systems, and operational resilience even in disconnected or bandwidth-constrained environments. This gives enterprises flexibility to support real-time worker safety responses at the edge while also enabling long-term analytics, reporting, and operational intelligence through Microsoft Fabric. Telemetry from garment-integrated sensors flows through edge gateways into Azure services including Azure IoT Operations, Azure Data Explorer, Azure Managed Grafana, and Microsoft Fabric. The result is a unified operational environment where worker telemetry can live beside machine, site, and environmental data under the customer’s existing identity, security, governance, and analytics model. The vision is simple and transformative: make human telemetry a trusted, first-class industrial data source.

Azure Digital Operations as the Intelligence Layer

The reference architecture demonstrates how Azure IoT Operations can serve as a scalable operational intelligence layer for worker safety and connected operations scenarios across manufacturing, energy, and field environments. Mij™-enabled garments broadcast Bluetooth Low Energy (BLE) telemetry that can be processed locally through edge gateways and routed into Azure IoT Operations using MQTT and dataflows. Data is then operationalized through Azure Data Explorer and visualized using Azure Managed Grafana dashboards for field operations, worker safety, fleet health, gateway monitoring, and operational readiness scenarios.
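To make the gateway step above more concrete, here is a minimal sketch of how an edge gateway might turn one decoded BLE reading into an MQTT topic and JSON payload. It is an illustration only: the field names, the topic layout, and the `WorkerReading` type are assumptions made for this example, since the post does not describe the actual Mij™ payload schema or the topics used by the reference architecture.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WorkerReading:
    # Hypothetical fields; the real garment telemetry schema is not given in the post.
    worker_id: str
    skin_temp_c: float
    heart_rate_bpm: int
    timestamp: str  # ISO 8601 string, assumed to be stamped by the gateway


def to_mqtt_message(reading: WorkerReading, site: str) -> tuple[str, bytes]:
    """Return a (topic, payload) pair that a gateway could publish to an MQTT broker."""
    # Hypothetical topic layout: one telemetry topic per worker under each site.
    topic = f"sites/{site}/workers/{reading.worker_id}/telemetry"
    payload = json.dumps(asdict(reading), separators=(",", ":")).encode("utf-8")
    return topic, payload


reading = WorkerReading("w-001", 37.9, 112, "2026-01-01T12:00:00Z")
topic, payload = to_mqtt_message(reading, site="plant-a")
print(topic)
print(payload.decode("utf-8"))
```

The sketch stops at building the message. Publishing it, and mapping the topic into a dataflow, depends on the MQTT client and broker configuration in a given deployment.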
Telemetry can also be made available to Foundry Local-hosted GenAI agents to support real-time, context-aware safety guidance, such as prompting workers operating in high-heat conditions to hydrate or seek cooler environments. While Mij™-enabled garments are the initial implementation, the edge device-to-cloud architecture creates a broader onboarding point for additional wearable, sensor, and field telemetry scenarios over time. This allows enterprises to bring more human and operational signals into a unified Azure-native operational environment. The architecture also supports flexible ingestion patterns for environments where dedicated edge gateways are not practical. Using Microsoft Entra External ID, Azure Container Apps, and Azure IoT Hub, telemetry can securely flow into Azure services without exposing operational infrastructure credentials to client devices. This pattern aligns with the broader Azure adaptive cloud approach: enabling customers to run distributed edge-native services on Arc-enabled Kubernetes infrastructure while maintaining centralized security, governance, and analytics capabilities across the enterprise. Depending on customer architecture preferences, telemetry can be processed through Azure IoT Operations at the edge or ingested directly through Azure IoT Hub for cloud-first analytics and downstream processing in services such as Microsoft Fabric. Edge processing also enables real-time sensor fusion across worker telemetry, ambient environmental conditions, machine parameters, and site-level operational signals, supporting faster safety interventions and more context-aware operational decisions. This gives enterprises flexibility in how they balance edge processing, operational responsiveness, governance, and privacy requirements.

Enabling the Next Generation of Industrial Workflows

The long-term opportunity extends well beyond visualization dashboards.
As worker telemetry becomes part of the operational fabric, enterprises can begin building more adaptive and intelligent workflows across worker safety, field readiness, incident response, compliance, environmental monitoring, and industrial AI systems. Human telemetry can provide critical real-time context that complements machine and environmental signals, enabling more responsive operations and eventually more autonomous decision-support experiences. By bringing human telemetry into enterprise AI and analytics workflows, organizations can build more adaptive operational systems that improve worker safety, situational awareness, and real-time decision making at scale. This partnership reflects a broader industry shift: industrial transformation is no longer only about connected machines. It is about connected operations where people, equipment, environments, and AI systems participate in a shared operational intelligence layer. With SWNR’s Mij™ platform and Azure IoT Operations, Microsoft and VOORMI are helping unlock that future.

Learn more:
Mij™ product page: https://swnrtechnologies.com/pages/mij
Learn more about Azure IoT Operations: Documentation & Getting Started
See what’s new with Azure IoT Hub: Preview Documentation
To get started with a pilot, contact: pilots@swnrtechnologies.com

Ansible + Azure Arc: Use Ansible modules to deploy and manage Azure Arc machine extensions at scale
We are making Azure Arc extensible and increasing the flexibility of the tooling you can use to operate your machines using Azure’s control plane. We are excited to announce new modules in Ansible Galaxy that make it easier to manage Azure Arc machine extensions at scale. With the latest updates to the azure.azcollection on Ansible Galaxy, you no longer need to switch between existing tools. You can now deploy and manage Azure Arc extensions using familiar, declarative Ansible workflows. These new modules include:

Azure Arc machine extensions module
Azure Arc extensions info module

Together, they enable infrastructure and platform teams to automate extension lifecycle management across their hybrid estate, bringing consistency, security, and efficiency to Azure Arc-enabled servers.

Why this matters

Azure Arc machine extensions power critical scenarios such as security, monitoring, update management, configuration, and compliance. Until now, managing these Azure Arc extensions across hybrid estates often required Azure CLI scripts, ARM templates, or manual operations. With these new Ansible modules, you can:

Integrate Azure Arc extension management into existing Ansible playbooks
Enforce consistent configuration across hybrid servers
Reduce operational overhead through declarative automation
Align extension deployment with broader configuration management workflows

What’s included

azure_rm_arcmachineextensions

This module allows you to manage the full lifecycle of Azure Arc machine extensions, including:

Creating and deploying extensions
Updating extension settings
Removing extensions when no longer needed

You can define extension state declaratively, ensuring consistent enforcement across your Azure Arc-enabled servers.
azure_rm_arcmachineextensions_info

This module provides visibility into extension state by retrieving:

Installed extensions on Azure Arc-enabled machines
Provisioning status and configuration details
Extension metadata for reporting and validation

This is useful for compliance validation, auditing, and conditional automation in playbooks.

Scenario: Enforcing identity-based SSH access across a hybrid fleet

Consider a regulated enterprise that must ensure all Linux servers, whether on-premises or in a multicloud environment, use Microsoft Entra ID for SSH access. The organization wants to:

Eliminate local SSH credentials
Enforce centralized identity and access controls
Audit access consistently across all environments

By combining Azure Arc with Ansible, the organization can deploy the Microsoft Entra SSH for Linux extension across all Azure Arc-enabled servers as part of a standardized playbook, ensuring compliance and reducing operational overhead.

Example: Deploy Microsoft Entra SSH for Linux extension

Below is an example of using Ansible to deploy the Microsoft Entra SSH extension to an Azure Arc-enabled server:

    - name: Deploy Entra SSH extension to Arc server
      hosts: localhost
      connection: local
      tasks:
        - name: Install Entra SSH extension for Linux
          azure_rm_arcmachineextensions:
            resource_group: myResourceGroup
            machine_name: myArcServer
            name: AADSSHLoginForLinux
            publisher: Microsoft.Azure.ActiveDirectory
            type: AADSSHLoginForLinux
            type_handler_version: "1.0"
            settings: {}
            state: present

Example: Retrieve extension information

Below is an example of using Ansible to retrieve details about your Azure Arc extensions:

    - name: Get Arc machine extension details
      hosts: localhost
      connection: local
      tasks:
        - name: Fetch extensions
          azure_rm_arcmachineextensions_info:
            resource_group: myResourceGroup
            machine_name: myArcServer

Integrating with existing Ansible workflows

If you’re already using Ansible for:

OS configuration
Patch and update management
Application deployment

You can
now extend those workflows to include Azure Arc extension management without introducing new tools or processes. This allows you to manage on-premises servers, edge infrastructure, and multicloud environments through a unified automation approach powered by Azure Arc and Ansible. Read more at Enable VM Extensions Using Red Hat Ansible - Azure Arc | Microsoft Learn

What’s next

These modules are part of our continued investment in making Azure Arc a first-class platform for managing Windows and Linux machines in hybrid and multicloud infrastructure. By bringing extension lifecycle management into Ansible, we’re enabling teams to enforce security, compliance, and operational consistency at scale, using the tools they already trust.

Stay connected

Join the Azure Arc Monthly Forum here: aka.ms/ArcServerForumSignup
Let us know what you’d like to see next in the comments!

Resource Guide: Making Physical AI Practical for Real-World Industrial Operations
Microsoft’s adaptive cloud approach enables organizations to turn operational technology (OT) data into intelligent actions autonomously, without requiring everything to live in the cloud, by unifying the cloud-to-edge management plane, data plane, and intelligence platform. At the center of this approach are key foundational technologies:

Key Purpose: Offering
Direct-to-cloud device management + telemetry ingestion: Azure IoT Hub
Industrial connectivity + edge data plane: Azure IoT Operations
Unified analytics + real-time intelligence: Microsoft Fabric
On-device AI inferencing runtime: Foundry Local

Microsoft Azure IoT Gartner winner: Microsoft named a Leader in the 2025 Gartner® Magic Quadrant™ for Global Industrial IoT Platforms

See it all come together

Before diving into each component, watch this end-to-end demo showing how Azure IoT Operations, Azure IoT Hub, Microsoft Fabric, and Foundry Local work as one stack across the edge-to-cloud lifecycle: Making industrial AI practical for real-world operations with adaptive cloud.

How these components work together

Azure IoT Operations and Azure IoT Hub collect real-time data from operational assets and send semantically-ready, modeled data to Microsoft Fabric, where it's contextualized with enterprise data for downstream analytics. Microsoft Foundry extends to the edge through Foundry Local, so the same tooling used to deploy and manage AI models in the cloud applies to edge use cases. All of it integrates into Azure Resource Manager, bringing OT devices, assets, and edge AI models into the same management and security paradigm as every other Azure-managed resource.

This blog walks through where to get started with each product capability:

1.
Manage Cloud-Connected Devices and Telemetry with Azure IoT Hub Azure IoT Hub is a fully managed cloud service that enables secure bidirectional communication, device-to-cloud telemetry ingestion, cloud-to-device command execution, per-device authentication, remote management and more. Telemetry from IoT Hub can also be routed downstream into analytics platforms like Microsoft Fabric for visualization or AI modeling. Recommended Usage: Devices that utilize IoT Hub are distributed, stand-alone devices with fixed-functions. These devices typically do not require cloud-managed containerized workloads or cloud-managed proximal industrial protocol connectivity. Examples of appropriate device-to-cloud IoT Hub endpoint devices include water monitoring stations, vehicle telematics, distributed fluid level sensors, etc. Resources Current in-market services overview: IoT Hub: What is Azure IoT Hub? - Azure IoT Hub DPS: Overview of Azure IoT Hub Device Provisioning Service - Azure IoT Hub Device Provisioning Service ADU: Introduction to Device Update for Azure IoT Hub Building scalable solutions with Azure IoT platform: Best practices for large-scale IoT deployments - Azure IoT Hub Device Provisioning Service Scale Out an Azure IoT Hub-based Solution to Support Millions of Devices - Azure Architecture Center Azure IoT Hub scaling Try out our preview of new IoT Hub capabilities (integration with Azure Device Registry and Certificate Management) Learn more about these capabilities on our blog post: Azure IoT Hub + Azure Device Registry (Preview Refresh): Device Trust and Management at Fleet Scale… Integration with Azure Device Registry (preview): Integration with Azure Device Registry (preview) - Azure IoT Hub Microsoft-backed X.509 certificate management (preview): What is Microsoft-backed X.509 Certificate Management (Preview)? - Azure IoT Hub How to start with the preview: Deploy IoT Hub with ADR integration and certificate management (Preview) - Azure IoT Hub 2. 
Connect Industrial Assets with Azure IoT Operations

Azure IoT Operations provides a unified data plane for the edge that runs on Azure Arc-enabled Kubernetes clusters and supports open industrial standards. It allows organizations to connect and capture equipment telemetry, normalize OT data locally, route hot-path signals to real-time analytics, securely manage layered industrial networks, and more. Edge-processed data can then be sent upstream to Microsoft Fabric for AI-driven analysis.

Recommended Usage: Azure IoT Operations is intended to be the data plane for an adaptive cloud deployment, extending the management, data, and AI capabilities of the Microsoft cloud to an on-premises device. This device binds to these cloud planes, providing a platform for local data processing and intermittent connectivity. Target devices range from a small gateway-style PC to a full data center. Azure IoT Operations endpoints enable cloud-managed containerized workloads and cloud-managed proximal industrial protocol connectivity. Examples of appropriate adaptive cloud and Azure IoT Operations endpoints include on-robot computers, industrial machine controllers, retail store sensor/vision processing, and top-of-factory site infrastructure for line-of-business applications.
Resources Azure IoT Operations Overview Azure IoT Operations Documentation Hub Quickstart: explore-iot-operations/quickstart at main · Azure-Samples/explore-iot-operations Open-source framework for scaling robotics from simulation to production on Azure + NVIDIA: microsoft/physical-ai-toolchain Demo video showcasing this in action: Making industrial AI practical for real-world operations with adaptive cloud How we built the demo: explore-iot-operations/quickstart at main · Azure-Samples/explore-iot-operations Edge-AI: microsoft/edge-ai: Production-ready Infrastructure as Code, applications, pluggable components, and… Latest Announcements & Blogs Making Physical AI Practical for Real-World Industrial Operations: Part 1 | Microsoft Community Hub Making Physical AI Practical for Real-World Industrial Operations: Part 2 | Microsoft Community Hub Unlock Industrial Intelligence | Microsoft Hannover Messe 2026 From pilots to production: How Microsoft and partners are accelerating intelligent operations 3. Advanced Analytics with Microsoft Fabric Microsoft Fabric delivers a unified, end‑to‑end analytics platform that transforms streaming OT telemetry into real‑time insights and live dashboards. Fabric Operations Agents monitor industrial signals to recommend targeted actions, while Fabric IQ provides a shared semantic foundation that enables AI agents to reason over enterprise data with business context. Together, Fabric turns live industrial data into AI‑powered operational intelligence. 
Resources
Get Started with Microsoft Fabric Learning Path
Fabric Real-Time Intelligence documentation - Microsoft Fabric | Microsoft Learn
Create and Configure Operations Agents - Microsoft Fabric | Microsoft Learn
Fabric IQ documentation - Microsoft Fabric | Microsoft Learn

4. Run AI Models On-Device with Foundry Local

Foundry Local extends on-device AI to Arc-enabled Kubernetes edge clusters, providing a Microsoft-validated inferencing layer for running AI models in industrial, disconnected, or sovereign environments.

Resources
Foundry Local on Azure Local Documentation
Participate in Foundry Local on Azure Local preview form
Foundry Local on Azure Local: HELM deployment Demo

Customer Stories
Chevron: Chevron plans facilities of the future with Azure IoT Operations
Husqvarna: Husqvarna Group Boosts Operational Efficiency with Azure Adaptive Cloud
Ecopetrol: Azure IoT Operations and Azure IoT for energy help Ecopetrol optimize energy distribution while lowering operational costs
P&G: Procter & Gamble cuts model deployment time up to 90% with Azure IoT Operations
Toyota: Toyota Industries innovates its paint shop processes with Azure industrial AI and Azure IoT Hub

Your first model deployment on Foundry Local on Azure Local: from catalog to inference in 10 minutes
Foundry Local on Azure Local lets you run open-source models directly on your own Azure Local cluster, behind an OpenAI-compatible API. It's the same experience you've gotten used to in the cloud, but the inference runs on hardware you own. Foundry Local on Azure Local is in public preview at the time of this writing.

You've installed Foundry Local on your Azure Local cluster. The operator's pods are running, the CRDs are registered, you've checked it twice with kubectl get pods. Now what? This blog covers the part that comes right before all of that: the lifecycle pattern you'll use to deploy any model on Foundry Local on Azure Local. Our recent announcement covers the bigger picture: multi-node inference, vLLM as a first-class runtime alongside ONNX-GenAI, and an expanded catalog. We'll keep this walkthrough single-node for clarity, but the same ModelDeployment pattern scales without changes to your client code or workflow.

By the end of this walkthrough, you'll have gone from an empty kubectl prompt to a working, OpenAI-compatible inference endpoint serving Phi-4, all in about ten minutes, using nothing but kubectl, Python, and a small sample script. We'll also show you how to switch that same flow to the new vLLM runtime by changing roughly five lines of YAML. All the code lives in Azure-Samples/foundry-local-model-catalog. Clone it and follow along.

What you'll build

The sample walks through five steps, each driven by the same Python script with different flags:

Query the model catalog - read the ConfigMap the operator syncs from the Microsoft Foundry catalog API.
Deploy a model - create a ModelDeployment custom resource pointing at one catalog entry.
Wait for ready - the operator pulls the model image, schedules pods, and reports state.
Run inference - call the OpenAI-compatible /v1/chat/completions endpoint with an API key the operator generates for you.
Clean up - delete the deployment.
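The "Wait for ready" step can be pictured as a small polling loop. The sketch below is not the sample's actual code. It assumes the resource is available as a parsed JSON dict (for example from `kubectl get modeldeployment <name> -o json`) and checks the two status fields this walkthrough names later, `status.state` and `status.deploymentReady`. The `fetch` callable, the timeout, and the interval are placeholders introduced for illustration.

```python
import time
from typing import Callable


def is_ready(resource: dict) -> bool:
    """True when the resource reports state Running and deploymentReady true."""
    status = resource.get("status", {})
    return status.get("state") == "Running" and status.get("deploymentReady") is True


def wait_for_ready(fetch: Callable[[], dict],
                   timeout_s: float = 600.0,
                   interval_s: float = 5.0) -> bool:
    """Poll fetch() until the deployment is ready or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready(fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Simulated status progression standing in for real cluster reads.
states = iter([
    {"status": {"state": "Pending"}},
    {"status": {"state": "Creating"}},
    {"status": {"state": "Running", "deploymentReady": True}},
])
print(wait_for_ready(lambda: next(states), timeout_s=5, interval_s=0))  # True
```

Injecting `fetch` keeps the loop testable without a cluster; in practice it would wrap whichever Kubernetes client or `kubectl` call you already use.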
The same five steps apply whether you're serving an ONNX model on a CPU node or a vLLM model on a GPU node. We'll start with the simpler path, Phi-4 on CPU on the ONNX runtime, then show you the vLLM variant at the end.

Before you start

You'll need:

An Azure Local cluster (or any Arc-enabled Kubernetes cluster) with the Foundry Local extension installed. If you haven't set this up yet, the Foundry Local on Azure Local install guide walks through the cluster, extension, and resource requirements.
kubectl configured against that cluster, with permissions to read ConfigMaps and Secrets and to create ModelDeployment resources in the foundry-local-operator namespace.
Python 3.9 or later.

Verify the operator is alive before you go further:

    kubectl get pods -n foundry-local-operator
    kubectl get crd | grep foundry

You should see operator pods in Running state and at least one CRD named modeldeployments.foundrylocal.azure.com. If you don't, the install docs are the right place to back up to. Then clone and install:

    git clone https://github.com/Azure-Samples/foundry-local-model-catalog.git && cd foundry-local-model-catalog && python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt

Step 1 - See what's in your catalog

Start with the lowest-risk command in the sample; it talks to the cluster but changes nothing:

    python catalog_sample.py --catalog-only

A few things to notice in that table:

Most models appear more than once. The same model gets packaged for different runtime/hardware combinations, and the operator picks the right container image based on which entry you reference.
There's a RUNTIME column. We'll come back to vLLM at the end of this post; for now, the default Phi-4-generic-cpu ONNX entry is what we'll deploy.
The catalog itself is just a ConfigMap. No magic, no hidden registry.
The operator syncs it from the Microsoft Foundry catalog API on a CronJob, and the sample reads it the same way you would:

    kubectl get configmap foundry-local-catalog -n foundry-local-operator -o yaml

If you ever wonder what's actually available on your cluster, that's the source of truth.

Step 2 - Deploy a model

Now the side-effecting part. We're going to ask the operator to deploy Phi-4 on CPU:

    python catalog_sample.py --deploy-only

Under the hood, the sample builds and applies a ModelDeployment manifest that looks like this:

    apiVersion: foundrylocal.azure.com/v1
    kind: ModelDeployment
    metadata:
      name: phi-4-generic-cpu
      namespace: foundry-local-operator
    spec:
      model:
        catalog:
          name: Phi-4-generic-cpu
      workloadType: generative
      compute: cpu
      replicas: 1
      port: 5000

The operator takes it from there. It pulls the model container image, schedules a pod, generates an API-key Secret named phi-4-generic-cpu-api-keys, and walks the deployment through Pending → Creating → Running states. The sample polls until both status.state == Running and status.deploymentReady == true.

Step 3 - Run inference

The endpoint is up. Time to actually use it. If you're running this script from inside the cluster (say, from a debug pod), the endpoint lives at the in-cluster service DNS and the sample picks that up automatically. Most readers will be running from a laptop, though, so we'll cover that path explicitly. In one terminal, port-forward the deployment's service:

    kubectl port-forward svc/phi-4-generic-cpu 5000:5000 -n foundry-local-operator

In another, run the sample's inference-only mode against the forwarded endpoint:

    python catalog_sample.py --infer-only --endpoint https://localhost:5000 --insecure

The sample reads the API key from the auto-generated Secret and sends it as Authorization: Bearer <key>, the same pattern as cloud OpenAI.
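As a rough illustration of that authentication step, the sketch below decodes a key from a Secret object and builds the HTTP request with the standard library. Kubernetes stores Secret `data` values base64-encoded, so the decode step is general. The field name `api-key` inside the Secret is an assumption, since the post only gives the Secret's name, and this is not the sample script's own code.

```python
import base64
import json
import urllib.request


def api_key_from_secret(secret: dict, field: str = "api-key") -> str:
    """Decode one base64-encoded value from a Secret's data map.

    The default field name is hypothetical; check the real Secret for its keys.
    """
    return base64.b64decode(secret["data"][field]).decode("utf-8").strip()


def build_chat_request(endpoint: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Stand-in for the output of `kubectl get secret <name> -o json`.
fake_secret = {"data": {"api-key": base64.b64encode(b"example-key").decode("ascii")}}
key = api_key_from_secret(fake_secret)
req = build_chat_request(
    "https://localhost:5000",
    key,
    {"model": "Phi-4-generic-cpu:1.0.0",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 32},
)
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen` is left out on purpose: a port-forwarded endpoint with a self-signed certificate needs an explicit TLS decision, which the sample exposes through its `--insecure` flag.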
The request body is the standard OpenAI chat-completions shape:

    {
      "model": "Phi-4-generic-cpu:1.0.0",
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France? Reply in one sentence."}
      ],
      "max_tokens": 256
    }

That response just traveled from your terminal, through kubectl port-forward, into a model serving inside your Azure Local cluster, and back.

Step 4 - Clean up

The sample's default flow deletes the deployment on its way out, so if you ran the full python catalog_sample.py (no flags) you're already clean. If you used --deploy-only or --skip-cleanup, drop it explicitly:

    kubectl delete modeldeployment phi-4-generic-cpu -n foundry-local-operator

The operator garbage-collects the pod, service, and API-key Secret. The cached model image stays on its PersistentVolume, so your next deploy of the same model skips the image pull.

Going beyond CPU: vLLM in 5 lines

Now for the variant we've been pointing at. The same ModelDeployment CR, the same OpenAI-compatible endpoint, switched to a runtime built for concurrent users, demonstrated on a popular open-source model. The diff against the manifest from Step 2:

      spec:
    -   compute: cpu
    +   compute: gpu
    +   runtime: vllm
        model:
          catalog:
    -       name: Phi-4-generic-cpu
    +       name: Mistral-7B-v0.2

The same sample script handles it:

    python catalog_sample.py \
      --model Mistral-7B-v0.2 \
      --compute gpu \
      --runtime vllm

What you get back is the same OpenAI-compatible endpoint your client code already knows, but now backed by vLLM's PagedAttention, continuous batching, and automatic planner-tuned configuration. Your application code doesn't need to know any of that; it's still POST /v1/chat/completions with a Bearer token. That's the point of the lifecycle pattern: ONNX or vLLM, CPU or GPU, the platform engineer's deployment loop looks the same. For the architectural "why" behind multi-node, vLLM, and the expanded catalog, read the announcement.
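One way to see why the CPU and vLLM paths share a deployment loop is to generate both manifests from a single function. The sketch below is a hypothetical manifest builder, not the one in the sample repository. The field names come from the manifest and diff shown in this post, but their exact nesting under `spec` and the lowercase metadata name for the Mistral deployment are assumptions.

```python
import json
from typing import Optional


def build_model_deployment(name: str,
                           catalog_name: str,
                           compute: str = "cpu",
                           runtime: Optional[str] = None,
                           namespace: str = "foundry-local-operator") -> dict:
    """Return a ModelDeployment manifest as a dict (nesting under spec is assumed)."""
    spec = {
        "model": {"catalog": {"name": catalog_name}},
        "workloadType": "generative",
        "compute": compute,
        "replicas": 1,
        "port": 5000,
    }
    if runtime is not None:
        # Only set when a non-default runtime such as vllm is requested.
        spec["runtime"] = runtime
    return {
        "apiVersion": "foundrylocal.azure.com/v1",
        "kind": "ModelDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


cpu = build_model_deployment("phi-4-generic-cpu", "Phi-4-generic-cpu")
# "mistral-7b-v0-2" is a made-up resource name used only for this example.
gpu = build_model_deployment("mistral-7b-v0-2", "Mistral-7B-v0.2",
                             compute="gpu", runtime="vllm")

# kubectl accepts JSON manifests as well as YAML, so this output can be saved and applied.
print(json.dumps(gpu, indent=2))
```

Keeping the manifest as plain data makes the CPU-to-GPU switch a matter of changing arguments, which mirrors the small diff shown above.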
What you've built

Ten minutes ago you had an installed-but-empty Foundry Local cluster. You now have:

A working OpenAI-compatible chat endpoint serving Phi-4.
A clear sense of what the operator manages for you (ModelDeployment CR, model image cache, API-key Secret) and what stays in your hands (which model, which runtime, which compute target).
A small, modular script you can wire into three things:
  A smoke test for every new cluster you stand up (--catalog-only → --deploy-only → --infer-only is a one-line CI step).
  An internal demo when you need to show a team that Foundry Local on Azure Local is real and reachable.
  The foundation for your team's own deployment automation - copy the manifest builder, drop the CLI, and wire it into your existing GitOps or platform tooling.

From endpoint to chat surface

The model deployment above exposes a standard OpenAI-compatible API - enough for any existing chat client to point at. If you'd like to see exactly that, Azure-Samples/local-chat-with-foundry-local picks up where this walkthrough ends: it wires the running endpoint into the Sovereign Chat Experience starter UI. About ten more minutes from working endpoint to working chat in the browser.

Where to go from here

Try the rest of the sample: --catalog-only, --deploy-only, --infer-only, and --skip-cleanup compose into whichever workflow you're testing.
Read the announcement for the architectural why behind multi-node, vLLM, and the expanded catalog.
Read the docs for the full operator and CRD reference.

Feel free to share your feedback with us at FoundryLocalOnAzure@microsoft.com. The product is in public preview, so your feedback shapes what ships next.

Build, deploy, and govern sovereign AI with Foundry Local on Azure Local
Not every AI workload can run in the cloud. For many of our customers, data needs to stay within defined boundaries, connectivity may be limited or absent, and latency, governance, and auditability are non-negotiable. With Foundry Local on Azure Local, you can use the same model catalog, developer workflows, and governance capabilities you know from Azure, while running AI entirely within your own environment where your data resides. Foundry Local provides the model catalog and developer experience. Azure Local provides the customer-managed infrastructure. Azure Arc provides unified policy, governance, and lifecycle management across cloud and local environments. This gives developers a consistent way to build, deploy, and operate AI. The same az commands, the same model catalog, the same Arc policies, all running on hardware you control.

Expansion of Foundry Local on Azure Local

We're expanding the Foundry Local model offering on Azure Local, with support for multi-node deployments and new agents and tools that run locally, in preview.

Deploy and run AI models locally. Run models with Foundry Local in customer-managed environments on Azure Local, across sovereign, private, and edge scenarios, including fully disconnected operation.
Choose from a flexible, high-performance model catalog. Access proprietary and community models through Foundry Local, now expanded with vLLM-optimized models alongside ONNX-based offerings. You explore and deploy through the same catalog API experience, then operate locally on Azure Local.
Build for production realities. Bring governance, identity, and auditability into your applications while keeping execution inside your controlled boundary.

See what’s new in Foundry Local on Azure Local in the Tech Community blog.

From intelligence to action: agents and tools inside the enterprise boundary

Most production AI use cases need two things: grounded answers and the ability to act on them, without sending data outside the environment.
Here's how we're enabling that locally.

Preview: Agentic retrieval with Foundry Local: Ground agents in enterprise data using retrieval-augmented generation across local Microsoft 365 services, including Exchange and SharePoint. Read the Tech Community blog to learn more.
Preview: Agents and tools with Foundry Local: Build AI systems that reason, retrieve information, and take action within customer-controlled environments. Learn more.
Preview: Developer acceleration templates: Jump-start local AI application development with new Foundry solution templates, including local chat experiences and video agents, powered by Azure AI Video Indexer. Read the Tech Community blog to learn more.

GitHub Enterprise Local: Now available in public preview

Sovereign AI is also about how systems are built and secured, not just where they run. With GitHub Enterprise Local on Azure Local, you can bring your full software development lifecycle on-premises:

Source control and repositories
CI/CD pipelines
Security and DevSecOps workflows

GitHub Enterprise Local deploys entirely within customer-owned infrastructure, so teams get the developer tools they expect without compromising on data residency or operational control. This extends modern DevSecOps practice into sovereign environments and pairs naturally with the AI development workflows above: build, secure, and ship your AI applications within the same boundary where they run. Read the Tech Community blog to learn more about GitHub Enterprise Local and how to join the preview.

Accelerating High-performance AI at the Edge with NVIDIA

We are expanding our collaboration with NVIDIA to deliver high-performance AI capabilities directly at the edge.
At Build, we are bringing:
- Azure Local and Foundry Local on NVIDIA-powered GPUs, including NVIDIA RTX PRO 6000 Blackwell Server Edition, with expanded GPU support coming soon
- Integration with Nemotron models, optimized for enterprise performance
- A scalable foundation for data-intensive, low-latency workloads

This partnership ensures that organizations can run advanced AI workloads where data is generated - without dependency on centralized cloud infrastructure.

Hardware options: AI factory configurations are available now in the catalog
Alongside our hardware partners, we’re bringing integrated solutions to customers building AI within sovereign environments. The Azure Local hardware catalog now includes AI factory configurations from our OEM partners, including NVIDIA-certified 8xH100 systems, with options from DataON, Dell, HPE, and Lenovo. These configurations are sized for the performance that model serving and agentic workloads require on customer-managed infrastructure.

“Together with Microsoft, we are advancing sovereign AI by bringing the open NVIDIA Nemotron model family to Microsoft Foundry Local on Azure Local. This collaboration gives organizations a production-ready AI platform that enables them to deploy AI where their data resides while maintaining the governance, control, and performance needed to scale AI across the enterprise.” Kari Briski, VP Generative AI Software Products, NVIDIA

“Sovereign AI is becoming increasingly important for governments, regulated industries, and enterprises that want to use AI while maintaining control of their data, location, and operations. Lenovo’s ThinkAgile MX Series delivers trusted, enterprise-grade infrastructure with global deployment expertise to help customers run AI wherever their data resides.
Co-engineered with Foundry Local and Azure Local, this solution provides an optimized platform to deploy, run, and scale AI locally with greater simplicity, consistency, and control, while helping meet strict data residency, security, and compliance requirements." Scott Patti - VP Infrastructure Solutions Group (ISG), Lenovo From AI models to trusted, mission-critical systems: what this unlocks for developers and operators AI is evolving from systems that answer questions to systems that plan, reason, and take action across workloads. These capabilities move AI from a cloud-only assumption to something you can deploy where sensitive work actually happens, with governance and operational controls intact. For our customers, this means you can now: Keep data, identities, and audit trails inside your sovereign boundary. Run AI inference and agentic workloads in connected, intermittently connected, or fully disconnected modes. Apply consistent policy and governance across cloud and local environments through Azure Arc. Use the same Foundry catalog and developer experience you already know, on infrastructure you own. Build, secure, and ship your AI applications with GitHub Enterprise Local, keeping source control, CI/CD, and DevSecOps workflows inside the same sovereign boundary. 
Resources
Join us at Build:
- OD837 Shipping physical AI to the edge with Azure Local and Foundry Local https://github.com/microsoft/build26-OD837
- OD839 Foundry Local: AI solutions for industrial and sovereign needs https://github.com/microsoft/build26-OD839
- LTG425 Expanding horizons: Foundry Local for devices and on-prem https://build.microsoft.com/en-US/sessions/LTG425
Request to join the Foundry Local on Azure Local preview
Hands-on walkthrough: Your first model deployment on Foundry Local on Azure Local: from catalog to inference in 10 minutes | Microsoft Community Hub
Read our Tech Community blogs:
- Foundry Local announcing multi-node and vLLM support
- Agentic Retrieval with Foundry Local blog: https://aka.ms/AgentsAndToolsBuildBlog2026
- Code sample / model catalog blog: https://aka.ms/foundry-local-model-catalog-blog
For more details on the expanded capabilities of Foundry Local for highly secure environments, contact your Microsoft account team.
Discover Microsoft Sovereign Cloud
Explore product documentation at:
- Foundry Local models on Azure Local: https://aka.ms/FoundryLocalonAzureLocal_documentation
- Local Agentic retrieval with Foundry Local: https://aka.ms/edge-agentic-retrieval-docs

Unlock On-Prem Productivity with Agentic Retrieval in Foundry Local
In today’s connected world, customers expect instant, context-rich interactions, even in environments where cloud connectivity isn’t guaranteed. That’s where Retrieval-Augmented Generation at the edge comes in. Since we launched into public preview, we’ve watched teams across regulated, disconnected, and mission-critical environments push this technology into places cloud GenAI simply couldn’t reach. What we heard back shaped everything in this release: customers don’t just want retrieval. They want reasoning, they want agency, and they want an end-user experience that feels as natural as the one they already use in the cloud. Today at Build 2026, we're excited to introduce Agentic Retrieval, the next evolution of our on-prem RAG platform, enabled by Azure Arc and powered by Foundry language models. Agentic Retrieval is part of Microsoft's Adaptive Cloud approach, which extends Azure capabilities to wherever customer data and workloads actually live, with Edge AI focused on bringing reasoning and grounding to on-prem, distributed, and disconnected environments. Together with Foundry Local, Agentic Retrieval continues to shape Microsoft's Foundry Anywhere commitment: flexibility, resilience, and intelligence wherever customers operate. 
What’s new at Build 2026 This release introduces three major pillars that work independently or together: Agentic Retrieval engine: a first-party orchestration runtime for planning, reasoning, conversation state, and tool calls over your local data Knowledge: a dedicated layer for organizing, curating, and governing your grounding data, exposed via MCP and connectable to any agentic retrieval layer Chat UI: a production-ready, polished conversational experience that ships as the default UX for Agentic Retrieval and can also be deployed standalone Alongside, we’re delivering the platform upgrades customers asked for: flexible deployment modes (Agentic-only, Knowledge-only, or Combined), BYOM with pluggable backends, Foundry Local model catalog integration, Entra ID support, disconnected-ready, and hybrid search combined with agentic retrieval. Agentic Retrieval: From Answering to Reasoning Classic RAG retrieves, then generates. Agentic Retrieval plans, reasons, and acts, running multi-step retrieval and tool invocation under a first-party orchestration runtime, entirely on your infrastructure. Under the hood it manages query planning, iterative multi-hop retrieval, tool calls via MCP, conversation state, and mandatory grounding with citations and audit logging built in. What customers can achieve: Compliance, policy, and permit workflows for public sector, regulators, and defense operations, with data never leaving sovereign infrastructure Multi-document synthesis across standards, technical manuals, contracts, and field procedures for industrial operators An agentic chat experience for regulated and operational teams (engineers, inspectors, analysts) that reasons like a subject-matter expert Auditable AI for sovereign and mission-critical environments, with every answer traceable to its source Knowledge: A First-Class, Governed Data Layer Great answers start with great knowledge. 
Knowledge is now a standalone component customers can deploy on its own or alongside Agentic Retrieval, exposed through an MCP wrapper so it can connect to any agentic retrieval layer, ours or yours. This release brings:
- Collections (segmented groups of indexed knowledge with granular access permissions)
- Multi-source ingestion across documents, tables, images, and SharePoint (indexed source moving to public preview)
- High-fidelity parsing for complex enterprise content
- Bring Your Own MCP to connect customer-owned data sources directly into Agentic Retrieval and the chat experience
- Governance enforced at the data layer itself

[Figure: collections, sources, and permission scopes]

What customers can achieve:
- Scope knowledge access to different slices of the same corpus, by plant, site, classification, or jurisdiction
- Enforce data sovereignty, residency, and regulatory compliance at the knowledge layer itself
- Ground both first-party Agentic Retrieval and BYO orchestration through a single governed source of truth across distributed sites
- Keep classified, proprietary, and operational data fully on-prem while delivering premium chat experiences

Chat UI: Production-Ready Conversational Experience
Agentic Retrieval now ships with a polished, production-ready Chat UI as its default experience, and the same component can be deployed standalone for customers building their own stack on Foundry Local. Highlights include:
- Entra ID authentication (MSAL login, Bearer tokens, user identity display)
- Pluggable backends across AI Foundry, BYOM, or mock mode with zero code changes
- Chain-of-Thought visibility and inline citations that make grounding transparent to end users
- Standalone frontend deployment via Helm chart and container image
- Disconnected-ready operation for air-gapped environments
What customers can achieve:
- Deliver a polished end-user experience to operators, inspectors, and analysts without building UI from scratch
- Build trust in regulated and industrial workflows through transparent, inspectable reasoning and grounding
- Run the same UI across air-gapped facilities, sovereign clouds, and connected industrial sites
- Accelerate rollout across public sector, defense, manufacturing, and other mission-critical environments

Why This Release Matters
Every update to our on-prem RAG platform has moved us toward a simple conviction: GenAI should be useful wherever customers operate, whether regulated or open, connected or disconnected, centralized or distributed. With Agentic Retrieval, Knowledge, and Chat UI coming together, backed by Foundry on Arc, BYOM, and fully disconnected support, this is no longer “cloud RAG, but local.” It’s an agentic knowledge platform purpose-built for the realities of enterprise data: on-prem, governed, and increasingly autonomous.

Learn More
- Explore Agentic retrieval documentation
- Read Foundry Local on Azure Local model inferencing blog post
For more information reach out to the team at FoundryLocalOnAzure@microsoft.com

Scale On-Prem AI with Foundry Local on Azure Local: Multi-Node Inference and vLLM Support
Since announcing the public preview of Foundry Local on Azure Local for single-node, we’ve seen strong adoption in regulated industries and consistent customer demand to expand the platform for scalable deployments. Today, we’re expanding Foundry Local model offering on Azure Local (preview) with three additions that broaden where and how you can use it: Multi-node scheduling - distribute inference workloads across the GPU capacity in your Azure Local cluster, not just a single node vLLM runtime support - a high-throughput serving engine purpose-built for large language models and concurrent workloads An expanded model catalog - new models available in vLLM optimized format alongside the existing ONNX offerings Together, these additions let you scale to higher concurrency, serve more users from a single endpoint, and run larger models on-premises. They round out Foundry Local on Azure Local into a more complete, production-grade on-premises inference platform - covering a wider range of model sizes, concurrency profiles, and hardware footprints, while preserving the same Kubernetes-native, OpenAI-compatible patterns you're already using. Runs disconnected - no cloud round-trip required Foundry Local on Azure Local is designed to run fully on-premises, including in disconnected and intermittently-connected environments. Model weights, prompts, and inference traffic stay entirely inside your Arc-enabled cluster - there is no per-request call to Azure, no data exfiltration to the cloud, and no dependency on a live WAN to serve inference. Models are cached locally on Persistent Volumes after the first pull. Once cached, the inference endpoint keeps serving even when the WAN is down - across reboots, network outages, and extended disconnected operation. API-key authentication continues working uninterrupted during disconnected periods. Microsoft Entra ID auth resumes seamlessly when connectivity returns. The control plane is local to the cluster. 
The Foundry Local operator, the model catalog, and the inference runtimes all live inside Azure Local - Arc is used for fleet management and updates, not for the inference data path. For factory floors, offshore platforms, sovereign data centers, classified sites, and remote branch offices where cloud connectivity is unreliable, restricted, or prohibited, this is what makes on-premises AI inference viable in production.

Multi-node scheduling: more scenarios, more capacity
Foundry Local on Azure Local now supports multiple nodes in your cluster. The inference operator schedules and manages deployments across the GPU capacity available cluster-wide, so you can:
- Use GPU capacity from any node in the cluster, not just a single node’s resources
- Place inference workloads where the hardware lives, with the operator managing deployments across nodes

The same ModelDeployment custom resource you already use defines the workload, and it is served through the standard OpenAI-compatible endpoint (POST /v1/chat/completions). This is the API used to interact with conversational AI models: clients send structured messages and receive model-generated responses. Existing applications work against multi-node deployments with zero code changes.

vLLM runtime: high-throughput serving for production workloads
Alongside ONNX-GenAI, Foundry Local now offers vLLM as a first-class inference runtime. vLLM is an open-source, high-throughput serving engine that has become the standard for production LLM inference in the cloud. Bringing it to Foundry Local on Azure Local means the same performance characteristics are available on your factory floor, in your sovereign data center, or at your remote site.
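To make the OpenAI-compatible endpoint described above more concrete, here is a minimal Python sketch of how a client might build a POST /v1/chat/completions request using only the standard library. The base URL, API key, and model name are placeholders invented for illustration, not values from the product documentation, and the request body follows the general OpenAI chat-completions shape rather than anything specific to Foundry Local.

```python
import json
import urllib.request


def build_chat_request(base_url: str, credential: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request.

    `credential` is sent in the Authorization: Bearer header, which the
    post says carries either an API key or an Entra ID token.
    """
    payload = {
        "model": model,  # hypothetical deployment/model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {credential}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All three values below are assumptions; substitute your own endpoint,
    # credential, and deployed model name.
    req = build_chat_request(
        "https://foundry.example.internal",
        "YOUR_API_KEY",
        "phi-4-mini",
        "Summarize the lockout procedure for line 3.",
    )
    print(req.get_method(), req.full_url)
    # To actually send it against a reachable endpoint:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape does not depend on whether the deployment runs on one node or several, the same client code would be expected to keep working when a deployment moves to multi-node, which is the "zero code changes" point made above.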
Why vLLM matters for edge and on-premises inference

| Capability | ONNX-GenAI | vLLM |
| --- | --- | --- |
| Hardware | CPU and GPU | GPU only |
| Throughput | Optimized for single-user, low-latency | Optimized for high-throughput, multi-user concurrency |
| Memory management | Standard allocation | PagedAttention - efficient KV-cache management reduces VRAM waste |
| Continuous batching | Not supported | Supported - incoming requests are batched dynamically for higher GPU utilization |
| FP8 KV cache | Not supported | Supported on compatible models and GPUs - roughly doubles token capacity |
| Best for | Compact models, CPU-only nodes, single-client scenarios | Larger models, multi-user workloads, GPU-equipped clusters |

Automatic GPU inference tuning with the vLLM planner
One of the operational challenges with vLLM is configuration tuning - setting GPU memory utilization, context length, batch sizes, and other parameters for a given model on a given hardware profile. Get it wrong and the pod either OOMs (runs out of memory) on startup or wastes GPU capacity. Foundry Local addresses this with the vLLM planner, an automatic tuning component that inspects the available GPU resources, analyzes the target model's footprint, and generates a memory-safe, high-performance configuration before the model server starts. You declare what model you want to run; the planner figures out how to run it optimally on your hardware. Full configuration reference is in the vLLM planner docs.

Identity-based access for multi-user workloads
Serving more concurrent users isn't only a throughput problem - it's also an access-control problem. Foundry Local supports two authentication modes side by side on the same endpoint:
- API keys - primary and secondary keys per deployment, with zero-downtime rotation. Ideal for service-to-service traffic and automated pipelines.
- Microsoft Entra ID with Azure RBAC - per-identity access using the Cognitive Services OpenAI User role (or any role granting the equivalent data-plane action).
JWT validation runs inside the inference pod; authorization is enforced through the cluster's Arc-managed identity. Enable both, and clients can present either credential type in the same Authorization: Bearer header - the platform detects which one was sent and routes to the right validation path. API-key callers also keep working uninterrupted if external connectivity is briefly lost, giving you a natural degradation story for edge and disconnected sites. For a multi-user AI assistant on the factory floor or in a sovereign data center, this is the difference between a shared service account and a per-user audit trail.

Expanded model catalog: ONNX and vLLM side by side
The Foundry Local model catalog now includes models in both ONNX and vLLM formats. The same model can appear multiple times in the catalog - once per runtime/compute target - so you can pick the build that matches your hardware without leaving the platform. The operator selects the right container image automatically based on the entry you reference.

Broader open-model support
Beyond the Phi and gpt-oss families, the catalog now includes additional models across multiple open-source lineups that customers have requested for on-prem and sovereign deployments, including Mistral and NVIDIA Nemotron. Both are available as catalog entries, served by the vLLM runtime on GPU, and accessible through the same OpenAI-compatible endpoint you already use.

In collaboration with NVIDIA, Foundry Local now supports the latest Nemotron models, optimized for enterprise performance on NVIDIA-powered Azure Local hardware, including the NVIDIA RTX PRO 6000. Nemotron models are tuned for reasoning, instruction-following, and agentic workflows, and run on the vLLM runtime with PagedAttention, continuous batching, and FP8 KV cache on compatible GPUs. The vLLM planner handles GPU memory utilization and context-length sizing automatically: you declare the catalog entry, and the platform sizes the deployment to your hardware.
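The claim that an FP8 KV cache "roughly doubles token capacity" follows from simple arithmetic, sketched below in Python. This uses the generic transformer KV-cache size formula (one key and one value vector per layer per token); it is not a description of how the vLLM planner actually computes its configuration, and the model shape and memory budget are made-up numbers chosen only to keep the arithmetic easy to follow. Real deployments also carry overheads (quantization scales, block allocation, activations) that this ignores, hence "roughly".

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, bytes_per_element: int) -> int:
    """KV-cache bytes needed to hold one token.

    The factor 2 accounts for storing both a key and a value vector
    in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element


def tokens_that_fit(kv_budget_bytes: int, per_token_bytes: int) -> int:
    """How many tokens of KV cache fit in a given memory budget."""
    return kv_budget_bytes // per_token_bytes


if __name__ == "__main__":
    # Hypothetical model shape and KV-cache budget, for illustration only.
    layers, kv_heads, head_dim = 32, 8, 128
    budget = 8 * 1024**3  # 8 GiB left for the KV cache after weights

    fp16 = kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_element=2)
    fp8 = kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_element=1)

    print("FP16 tokens:", tokens_that_fit(budget, fp16))  # 65536
    print("FP8 tokens: ", tokens_that_fit(budget, fp8))   # 131072
```

Halving the bytes per element halves the per-token footprint, so the same budget holds twice as many tokens, which can be spent on longer contexts or more concurrent requests.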
Models available in vLLM format (see the model catalog docs for the full, regularly updated list)

| Model | ONNX | vLLM | Notes |
| --- | --- | --- | --- |
| Phi-4 | ✓ | ✓ | Microsoft's flagship SLM |
| Phi-4-mini | ✓ | ✓ | Compact, fast inference |
| Phi-4-mini-reasoning | ✓ | ✓ | Chain-of-thought reasoning |
| Phi-4-reasoning | - | ✓ | vLLM-only, reasoning-focused |
| gpt-oss-20b | ✓ | ✓ | Mid-range generative |
| gpt-oss-120b | - | ✓ | Large generative, vLLM-only |
| Mistral-7B-v0.2 | ✓ | ✓ | Popular open-source LLM |
| DeepSeek-R1 (7b/14b) | ✓ | - | Reasoning-focused |
| Qwen2.5 (0.5b–14b) | ✓ | - | Multilingual, coder variants |
| Qwen3 (0.6b–14b) | ✓ | - | Latest generation |
| Whisper (multiple sizes) | ✓ | - | Speech-to-text |
| Nemotron | ✓ (CPU) | ✓ | |

The catalog now includes a growing list of models across both runtimes. Models in vLLM format are served using the vLLM engine with all its performance benefits - PagedAttention, continuous batching, FP8 KV cache - while ONNX models continue to serve on CPU or GPU through the ONNX-GenAI runtime.

Bring-your-own model (BYOM)
When you need a model that isn’t in the catalog, bring-your-own model still works the same way: package your model as an OCI artifact in any ORAS-compatible registry (Azure Container Registry, GitHub Container Registry, Docker Hub) and reference it from your ModelDeployment. The operator caches it locally and reuses the cached copy on subsequent deployments.

Choosing the right runtime
- Choose ONNX-GenAI when you're running on CPU-only hardware, serving a single application with a compact model, or need the broadest model compatibility including speech and predictive workloads.
- Choose vLLM when you have GPU hardware, need to serve concurrent users, want to run larger models, or need production-grade throughput from your inference endpoint.
Both runtimes expose the same OpenAI-compatible REST API - the choice is transparent to application code.

A vLLM ModelDeployment is deliberately minimal: you reference the catalog entry and set runtime: vllm. Everything else - memory utilization, context length, batch sizing - is handled by the vLLM planner.
See the model catalog docs for the BYO pattern and full configuration options. What hasn't changed Everything from the public preview remains fully supported: Two installation paths - Azure Arc extension (recommended for fleet management) and Helm chart (for platform engineers who need full control) OpenAI-compatible REST endpoints - POST /v1/chat/completions and standard patterns API key and Microsoft Entra ID authentication - secured with bearer tokens, with the per-identity RBAC model described above TLS-enabled ingress - encrypted traffic in transit Disconnected operation - models cached on local PersistentVolumes continue serving when WAN connectivity drops Bring-your-own predictive models - deploy custom ONNX models from OCI registries Multi-model orchestration - agent-style patterns coordinating multiple local models Your existing ModelDeployment manifests continue to work. Applications targeting the ONNX-GenAI runtime don't need any changes. The new capabilities are additive. Real-world scenarios, now at scale Over the past few months, we’ve partnered with customers in early preview to build and validate real-world scenarios. A consistent theme across these engagements is the need to run AI where data resides—on-premises—while maintaining the governance and consistency enabled by Azure Arc. "In energy operations, AI needs to run where the work happens – at remote facilities, offshore platforms, and field locations where connectivity is often limited, and safety is paramount. Foundry Local gives us a path to bring AI-driven decision-making closer to our operational data, with the governance our industry demands. The ability to deploy and run AI workloads consistently across edge and field environments, even when disconnected, is critical as we advance Chevron's vision for autonomous and intelligent operations." 
(Chevron) Ed Moore - OT Strategist and Distinguished Engineer With multi-node and vLLM, the scenarios from our initial preview scale to meet production demands: Manufacturing: multi-user quality inspection A quality-control system on a production line previously ran Phi-4-mini for single-station anomaly explanation. With vLLM's continuous batching, the same Foundry Local endpoint now serves 10+ inspection stations concurrently - each sending defect images and sensor telemetry for real-time root-cause analysis - without response-time degradation. Sovereign: identity-scoped document processing A government agency processing sensitive casework needs production-grade throughput and a strict audit trail. Foundry Local serves the workload on-premises across multiple GPU nodes, with per-analyst access enforced through Entra ID and Azure RBAC, so every inference call is tied to a real identity - and no data leaves the cluster. Energy: disconnected multi-user operations An offshore platform runs Foundry Local on a multi-node Azure Local cluster. When WAN connectivity drops, the vLLM-powered endpoint continues serving safety procedure lookups, maintenance guidance, and operational queries to multiple crew members simultaneously - each accessing the inference endpoint from their local application. API-key auth keeps working through the outage; Entra ID resumes seamlessly when the WAN comes back. Getting started If you're already running Foundry Local on Azure Local in the public preview: Once installed the Foundry Local extension is automatically kept up to date, with multi-node and vLLM support included. 
- Browse the updated catalog to discover models available in vLLM format
- Deploy a vLLM model by setting runtime: vllm in your ModelDeployment manifest
- Let the vLLM planner optimize - override only the preferences you care about and let the planner handle the rest

If you're new to Foundry Local on Azure Local:
- Follow the get-started code-sample blog to see the end-to-end flow
- Request preview deployment access to get started
- Read the documentation for architecture overview and deployment guide

What's next
Multi-node and vLLM are just the beginning. We're continuing to invest in:
- Distributed LLM serving with LLM-D - KV-cache-aware routing and disaggregated serving for large models that span multiple nodes
- Autoscaling for inference workloads - dynamic capacity that follows demand
- Broader model catalog expansion - more model families, more sizes, more task types
- Enhanced monitoring and observability for inference workloads
- Performance optimization for specific Azure Local hardware profiles
- Expanded GPU hardware validation across the Azure Local catalog

We're building Foundry Local to be the production AI inference platform for edge and sovereign environments. Your feedback is shaping every release - keep it coming.

Learn more:
- Foundry Local Model and inferencing on multi node demo
- Foundry Local for devices (GA)
For more information reach out to the team at FoundryLocalOnAzure@microsoft.com

Introducing GitHub Enterprise Local (Preview): DevOps for Sovereign and Private Cloud Environments
Across the world, many organizations, particularly in government, defense, financial services, and critical infrastructure, must operate within strict sovereign boundaries, often due to regulatory, security, or disconnected environment requirements. Microsoft’s Sovereign Private Cloud is a customer operated cloud model designed for scenarios where sovereignty, operational control, and resiliency are non negotiable. It enables organizations to operate securely and at scale, even in restricted or disconnected environments, while maintaining governance aligned with regulatory and national obligations. Azure Local is the foundation that makes this possible. With Azure Local, organizations can run critical workloads—including virtual machines, Kubernetes, virtual desktop infrastructure, and AI workloads—on infrastructure they own and control, while still benefiting from Azure consistent management, governance, and lifecycle operations. We’re continuing to expand the set of workloads and capabilities supported on Azure Local to meet the needs of organizations operating in sovereign and highly regulated environments. With Microsoft 365 Local, Azure Local now extends beyond infrastructure to support communication and collaboration workloads, enabling productivity and resiliency even in disconnected or restricted conditions. And with Foundry Local, we are supporting modern AI workloads on Azure Local, bringing advanced AI capabilities to infrastructure customers own and operate. We are excited to announce the public preview of GitHub Enterprise Local, which brings GitHub’s enterprise developer platform into sovereign and private cloud environments. GitHub Enterprise Local is fully hosted on customer owned infrastructure, enabling organizations to modernize application development while keeping source code, build pipelines, and development artifacts entirely within their own operational boundaries. What Is GitHub Enterprise Local? 
GitHub Enterprise Local enables organizations to deploy GitHub Enterprise Server (GHES) entirely within customer‑owned infrastructure using Azure Local as the underlying private cloud platform. The solution is delivered as a prebuilt virtual machine image that runs on Azure Local and operates fully within the customer’s security and network perimeter. All repositories, metadata, CI/CD workflows, and artifacts remain on‑premises. GitHub Enterprise Local is designed to run without internet connectivity by default, making it suitable for both connected and fully disconnected or air‑gapped environments. At the same time, it preserves a GitHub‑consistent experience for developers, allowing teams to continue using familiar workflows for source control, collaboration, and automation. Developer and Platform Capabilities GitHub Enterprise Local provides a comprehensive set of enterprise developer platform capabilities. Teams can host private repositories, manage organizations, and collaborate through pull requests, branch protection rules, and structured code reviews. Issues, wikis, and project collaboration features are also available, enabling end‑to‑end development workflows within the same platform. GitHub Enterprise Local can run on either a single-node or multi-node Azure Local instance depending on customer needs. Single‑node Azure Local runs GHES as a standalone VM, ideal for preview, PoC, and low‑risk scenarios focused on simplicity and cost efficiency. For production-oriented deployments, the same single GHES VM can run on a multi‑node Azure Local cluster, where Azure Local provides VM‑level high availability and failover. For automation and delivery, GitHub Enterprise Local supports GitHub Actions using self‑hosted runners. This allows organizations to build and run CI/CD pipelines entirely within their own environments, with full control over execution context, dependencies, and network access. 
GitHub Packages can be used for artifact management, supporting common ecosystems such as npm, NuGet, Maven, and container images. GitHub Enterprise Local extends modern development workflows with AI assisted experiences while keeping sensitive data within customer-controlled environments. Developers can use GitHub Copilot in several ways, including as a standalone experience, through Copilot CLI, and in VS Code. They can choose GitHub-managed models by connecting to GitHub.com, or connecting directly to model providers from Copilot CLI, allowing source code to avoid passing through GitHub Cloud. Foundry Local provides an on-premises inference layer that keeps prompts, code context, and model execution inside organizational boundaries. Together, these capabilities create a clear integration path across code automation and AI application development, enabling organizations to modernize the developer experience while preserving operational control, compliance, and auditability. Developer AI Workflow Architecture This architecture demonstrates how GitHub Enterprise Local serves as the secure, customer-managed foundation for source control, collaboration, and workflow orchestration, enabling developers to layer AI-assisted capabilities through GitHub Copilot, GitHub CLI, and Foundry Local—while ensuring that code, data, and AI execution remain fully within organizational boundaries. Architecture Overview GitHub Enterprise Local follows a layered architecture model. Infrastructure Layer Azure Local forms the foundation, deployed on Azure Local–certified hardware. It provides: The virtualization platform for running GitHub Enterprise Local Infrastructure availability and update management Customer‑controlled networking, identity, and security policies Azure Arc‑enabled management for infrastructure lifecycle operations GitHub Enterprise Local Appliance Layer GitHub Enterprise Server (GHES) is deployed as a prebuilt virtual machine image on Azure Local. 
This VM includes:
- The GHES application stack
- Persistent data disks for repositories and metadata
- Support for replica‑based failover configurations, depending on customer requirements
All application data remains within customer infrastructure boundaries.

Operations Layer
Operational responsibilities are clearly separated:
- Azure Local administrators manage the Azure Local infrastructure through Azure
- GitHub administrators manage GHES configuration, upgrades, user access, and ongoing maintenance through the GitHub Management Console and site admin dashboard
This separation aligns with common enterprise operational models.

Connectivity Modes and Deployment Scenarios
GHES is designed to operate fully offline, making it suitable for air‑gapped and restricted environments. Azure Local complements this capability by supporting both connected and fully disconnected operational modes. In connected environments, customers can take advantage of centralized management and monitoring of the GHES appliance. In disconnected environments, the entire solution can operate in complete isolation, ensuring compliance with strict sovereignty or security mandates. This flexibility allows organizations to adopt a deployment model that aligns with their regulatory, operational, and security requirements.

Hardware and Capacity Planning
GitHub Enterprise Local virtual machine sizing depends on customer use cases, including:
- Number of developers
- Repository size and growth
- CI/CD pipeline frequency
- Artifact storage requirements
Azure Local supports running GitHub Enterprise Local on both Integrated and Premier hardware solutions, provided sufficient capacity is available. Customers should plan compute, memory, storage, and network resources accordingly.

Minimum recommended requirements

Billing Overview
GitHub Enterprise Local combines user-based application licensing, Azure Local infrastructure-based billing, and separate pricing for AI services such as Copilot and Foundry.
GitHub Enterprise Local is billed per user seat. (GitHub Enterprise license) Azure Local is billed per physical CPU core. (Azure Local Billing) Copilot and Foundry have separate service-based pricing. (GitHub Copilot Plans & pricing) Public Preview Access GitHub Enterprise Local on Azure Local is available today in public preview. Customers can request access by completing the public preview registration form. Submissions are reviewed as part of the preview onboarding process. Participate in public preview: GitHub Enterprise Local Preview Sign-Up Learn More GitHub Enterprise Local documentation
Simplified access to Hotpatching enabled by Azure Arc for Windows Server 2025
With Windows Server 2025, we introduced hotpatch enabled by Azure Arc, delivering security updates to Windows Server across hybrid and multicloud environments – minimizing downtime (no reboot), accelerating protection, and unifying patch management. We know that keeping your servers updated with the latest patches is one of the critical tasks that IT teams perform day-to-day. We want to make it simpler to install the latest operating system (OS) updates without rebooting machines after every installation. The resounding feedback we have received from you underscored the criticality of this feature in the lifecycle management and security of your infrastructure. We are now taking it one step further to reduce the friction to deploying these critical updates: hotpatch enabled by Azure Arc is now available at no additional cost for Windows Server 2025. Which machines are eligible for this offer? To use hotpatch for Windows Servers running on-premises or in multicloud environments, you must be using Windows Server 2025 Standard or Datacenter, and your server must be connected to Azure Arc. With this announcement, enabling and usage of the hotpatching service is available at no additional charge. Please take note that there are no charges for customers running on Azure IaaS, or Azure Local, wherein hotpatching is available as part of the functionality of Windows Server Datacenter: Azure Edition. This feature is already included both with Windows Server 2022 Datacenter: Azure Edition and Windows Server 2025 Datacenter: Azure Edition. How do I manage hotpatches enabled by Azure Arc for Windows Server 2025? If your Windows Server 2025 machines aren't already connected to Azure Arc, install the Azure Connected Machine agent — it takes just a few minutes per server and supports at-scale rollout via Group Policy, service principal, or Terraform. 
Once connected, enable Hotpatch from the Azure portal, Azure PowerShell, Azure CLI, or the REST API; just confirm that Virtualization-based security (VBS) is enabled first. From there, use Azure Update Manager to schedule and monitor rollouts at scale. For instructions on how to enable hotpatch for Azure Arc-enabled machines using group policy or scripts, see https://aka.ms/ws-hotpatch. For patch orchestration at scale, you can use Azure Update Manager to deliver hotpatches enabled by Azure Arc to Windows Server 2025 machines. This enables greater uptime with fewer reboots and faster deployment of updates with easy patch orchestration. Alternatively, you can use APIs or other management tools to manage hotpatches. Centralized management of hotpatch updates across hybrid and multicloud environments enabled by Azure Arc Once your machines are connected to Azure Arc, you can also use the cloud-native services from Azure to manage your Windows machines running on-prem. Azure Arc enables you to standardize security and governance across a wide range of resources so you can easily organize, govern and secure Windows, Linux, SQL servers, and Kubernetes clusters running across data centers, edge, and multi-cloud environments – using Azure services such as Azure Policy, Azure Monitor, Microsoft Defender and more. At no additional cost for machines attached to Azure Arc Basic inventory across on-prem and multi-cloud Tag your resources, organize them into resource groups, subscriptions, and management groups, and query at scale with Azure Resource Graph to unify your environments. Infra as Code (Bicep, Terraform) Infra as code for provisioning and management of resources. VM Self Service Perform lifecycle management (create, resize, update, and delete) and power cycle operations (start, stop, and restart) on VMware vCenter and System Center Virtual Machine Manager virtual machines. 
Hotpatch for Windows Server 2025 NEW Windows Server hotpatching enables you to apply security updates without rebooting, keeping systems secure while maintaining continuous uptime. VM Management Administer your servers anywhere using SSH for Azure Arc, Run Command, and Custom Script Extension. Management services included at no additional cost with Windows Server Software Assurance or Extended Security Updates Azure Update Manager Provides a unified, centralized service to monitor, orchestrate, and automate patching across Azure, on‑prem, and multi‑cloud environments, ensuring security, compliance, and minimal downtime at scale. Azure Machine Configuration (Policy) Policy‑driven auditing and enforcement of OS and application settings as code across Azure and hybrid machines, ensuring consistent, compliant state at scale. Including compliance policies like CIS Benchmark and WinRE Change Tracking & Inventory Real‑time visibility into configuration changes and system state across your fleet, enabling faster troubleshooting, improved security, and continuous compliance at scale. VM insights from Azure Monitor Delivers a unified, pre‑built observability experience that provides real‑time performance, health, and dependency visibility across VMs, enabling faster troubleshooting, optimization, and capacity planning at scale. Windows Admin Center Unified, browser‑based management plane to securely manage Windows servers, VMs, and hybrid infrastructure from anywhere, simplifying operations and improving efficiency at scale. Best Practices Assessment Continuously evaluates your server configurations against Microsoft-recommended standards to proactively identify risks and provide actionable remediation guidance, improving security, performance, and operational health at scale. Frequently Asked Questions What are hotpatch updates? Hotpatch updates are monthly security updates that take effect without requiring you to restart the device. 
They contain a full set of security updates equivalent to the standard updates released the same day. What is the hotpatch update cycle? All eligible Windows Server 2025 machines enrolled in hotpatch are offered up to 8 monthly hotpatch updates in a calendar year in a quarterly cycle: Baseline month: In January, April, July, and October, devices install the monthly cumulative security update and must restart for the update to take effect. This update includes the latest security fixes, cumulative new features, and enhancements since the last baseline. Subsequent two months: Devices receive hotpatch updates, which only include security updates and don't require a restart for the update to take effect. These devices will catch up on features and enhancements with the next cumulative baseline month (quarterly). Will billing be stopped for existing enrolled machines? Yes, as of 15th May 2026 all billing for hotpatch has been stopped for all existing machines enrolled in hotpatch. What action do we need to take if we have machines enrolled in hotpatch already? There is no additional action needed for machines that are currently enrolled in hotpatch. These machines will remain enrolled in hotpatch and receive hotpatch updates when available. I want all my Windows Server 2025 machines to get hotpatches. How do I do it? If you have Windows Server 2025 machines on-premises or in a cloud other than Azure, you can enable hotpatch on them. To do so, ensure these machines have Virtualization-based security enabled and are connected to Azure Arc; you can then use the Azure Arc portal, Azure Update Manager, or APIs to enable hotpatch. Learn more: https://aka.ms/ws-hotpatch Is anything changing for Hotpatching on Azure? Hotpatch continues to be available on Azure for your Windows Server 2022 and Windows Server 2025 VMs when using Azure Edition. There is no fee associated with Hotpatching on Azure. Learn more here. Is there a community forum for Arc? 
Yes, you can join the Azure Arc Monthly Forum here: aka.ms/ArcServerForumSignup
[Now Generally Available] Customizable Security Baseline Policies in Machine Configuration!
Background: Azure Machine Configuration remains committed to enabling greater security and simplicity in at-scale server management for all Azure customers. Machine Configuration (previously known as Azure Policy Guest Configuration) enables both built-in and custom configuration as code allowing you to audit and configure OS, app, and workload level settings at scale, both for machines running in Azure and hybrid Azure Arc-enabled servers. We're excited to announce the General Availability of Customizable Security Baselines in Azure Policy and Machine Configuration. What began as a Public Preview is now a mature, production-grade capability that empowers you to tailor industry security benchmarks to your organization's unique compliance standards across both Azure and Arc-connected machines, at scale. This release moves the experience from "useful" to "everyday default." Standards coverage has expanded, the customization and assignment flow is faster, full lifecycle management is now possible directly from the Azure Portal, and a new Overview page gives you a single pane of glass into which parts of your estate are unprotected. What is Baseline Customization? The core experience remains: tailor security standards through the Modify Settings wizard under Policy > Machine Configuration. You can enable, exclude, or adjust rules from existing benchmarks, apply organization-specific parameters, and export your custom configuration as a downloadable JSON file. Each baseline JSON file serves as a reusable, declarative artifact, ideal for policy-as-code workflows, version control, and CI/CD integration. What's New? GA brings four substantive shifts to the customizable baselines experience: broader standards coverage, a faster path from customization to deployment, lifecycle management directly in the portal, and a new Overview page that surfaces compliance gaps at the subscription level. 
Together, these changes reflect what we heard from early customers during Preview: that custom baselines need to live alongside the rest of their governance workflows, not in a one-time wizard. This cloud-native approach continues to embody Microsoft's Secure by Design and Secure by Default principles, with a sharper focus on the operational reality of running compliance at scale. Built-in Policy Standards Coverage GA expands what you can customize and where it's supported. Standard Status Notes CIS Benchmarks for Linux Generally Available Expanded distribution coverage since Public Preview. See the full list of supported distros in the official documentation. [NEW!] CIS Benchmarks for Windows Public Preview Initial release covers L1 settings for WS2025 Domain Controller and Member Server roles. Azure Compute Security Baseline for Windows Generally Available Now supports customization for Windows Server 2016 and 2019, in addition to 2022 and 2025. Azure Compute Security Baseline for Linux Generally Available Aligned with Azure Compute recommendations across supported Linux distributions. Key Scenarios Faster Time to Deployment The customization-to-assignment path is now a single continuous flow. You can: Skip the JSON download step entirely. Baseline settings are auto-populated into the Azure Policy assignment flow, so you no longer have to download a JSON file, browse for it, and upload it back. The settings ride with you from Modify Settings straight into Assign Policy. Use the improved settings editor. Role-specific values (Domain Controller, Member Server) and formatted inputs render cleanly in the UX, with validation that prevents malformed parameters from reaching the policy assignment. Still export when you need to. The JSON download remains available for teams that want to commit baselines to source control, share with reviewers, or pipe through CI/CD. 
The net result: what used to take a multi-step download-and-reupload sequence is now a few clicks inside one blade. Lifecycle Management in the Portal Compliance baselines are not write-once artifacts. They evolve as benchmarks update, as your controls tighten, and as your estate changes. GA introduces two capabilities that treat baselines as living configuration: Import and Modify. From the Definitions tab under Machine Configuration, you can now import an existing baseline JSON and iterate on it directly in the portal. This closes the loop between policy-as-code workflows and ad-hoc edits, so you no longer have to choose between version-controlled artifacts and in-portal convenience. Edit Settings on existing Assignments. The Assignments tab now supports updating an active baseline assignment in place. You can refine rules, adjust role-specific values, or exclude controls without tearing down and re-creating the assignment. Select the policy assignment, and the "Edit Settings" button becomes available. Together, these turn baselines into something you maintain, not something you set and forget. New Overview Page: See Where You're Unprotected A new Overview page on Policy > Machine Configuration gives you subscription-level visibility into where Machine Configuration is enabled and where it isn't. For each subscription it surfaces status (At Risk, Not Enabled, Enabled), machines missing prerequisites, machines with prerequisites in place, and total eligible machines. From the same view you can enable Machine Configuration on selected subscriptions to onboard eligible VMs and activate baseline auditing in a single action. This shifts the first question from "is this one machine compliant?" to "which corners of my estate aren't even being assessed yet?", which is usually the more consequential gap. Integration and Automation Security baselines continue to integrate into your DevOps pipelines and configuration management workflows. 
Each baseline produces a declarative settings catalog (JSON) that can be versioned and deployed using Azure CLI, ARM templates, Bicep, and CI/CD automation, ensuring reproducible, traceable compliance configurations across environments. Availability Customizable security baselines are now generally available in all public Azure regions, Azure Government, and Sovereign Clouds. Getting Started Prerequisites Before you begin: Deploy the Azure Machine Configuration prerequisite policy initiative. (This installs the required Guest Configuration extension on supported VMs.) You can also do this in a single action from the new Overview page. Ensure your Azure subscription or management group includes supported Windows or Linux VMs. Have sufficient permissions (Owner or Resource Policy Contributor) to create and assign custom policy definitions. Step-by-Step Guidance Check your coverage on the Overview page to see which subscriptions are unprotected and onboard them with one click. Select a baseline from the Definitions tab in Machine Configuration or use Import and Modify to iterate on an existing baseline JSON. Modify settings to enable, exclude, or parameterize rules to match your internal policies. Assign the policy directly from the wizard. Settings are auto-populated into the assignment flow; no JSON upload is required. Iterate when needed. Use Edit Settings on the Assignments tab to refine active baselines in place. Review compliance results to track outcomes in Azure Policy, Azure Resource Graph, or the Guest Assignments page. Learn More Azure Machine Configuration security baselines official documentation CIS Benchmark for Windows Server (Preview) documentation CIS Benchmark for Linux documentation Azure Windows Baseline and Azure Linux Baseline documentation Please note that the use of Azure Machine Configuration on Azure Arc-enabled servers will incur a charge.
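
For teams that commit exported baseline JSON files to source control, a review or CI step often needs to show what changed between two versions of a baseline. The following is a minimal Python sketch of such a check, not an official tool. The schema of the exported file is not described here, so the sketch makes no assumptions about it: it flattens any JSON document into path/value pairs and compares them. The function names `flatten` and `diff_baselines`, and the sample keys in the usage example (`rules`, `minPasswordLength`, and so on), are hypothetical and chosen only for illustration.

```python
import json
from typing import Any


def flatten(node: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON data into {path: leaf_value} pairs."""
    if isinstance(node, dict) and node:
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in node.items()]
    elif isinstance(node, list) and node:
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(node)]
    else:
        # Scalars and empty containers are treated as leaves.
        return {prefix: node}
    flat: dict[str, Any] = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def diff_baselines(old: Any, new: Any) -> dict[str, dict[str, Any]]:
    """Report paths that were added, removed, or changed between two documents."""
    before, after = flatten(old), flatten(new)
    return {
        "added": {p: after[p] for p in after.keys() - before.keys()},
        "removed": {p: before[p] for p in before.keys() - after.keys()},
        "changed": {
            p: (before[p], after[p])
            for p in before.keys() & after.keys()
            if before[p] != after[p]
        },
    }


if __name__ == "__main__":
    # Hypothetical sample documents; a real pipeline would load two exported files.
    old = json.loads('{"rules": {"minPasswordLength": 8, "auditLogon": true}}')
    new = json.loads('{"rules": {"minPasswordLength": 14, "lockoutThreshold": 5}}')
    print(json.dumps(diff_baselines(old, new), indent=2, sort_keys=True))
```

In a pipeline, the two inputs would typically be the committed baseline and the proposed one, and a non-empty `changed` or `removed` section could be used to require an explicit reviewer sign-off before the assignment is updated.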