
Azure Arc Blog

Bringing AI to the Factory Floor with Foundry Local - Now in Public Preview on Azure Local

liranlyabock_microsoft
Apr 16, 2026

Today, we’re expanding Foundry Local to support single-node Azure Local deployments in public preview, delivered as a Kubernetes-native service and as an Azure Arc-enabled extension. It’s designed for industrial and manufacturing settings where inference must run at the machine level—on a server on the factory floor, inside an electrical cabinet, or at a remote plant—without relying on cloud connectivity or a multi-node cluster. This builds on Microsoft’s edge AI strategy, complementing the cloud by enabling AI to run in secure datacenters, disconnected locations, highly regulated industries, and edge sites worldwide. With this release, organizations can deploy, manage, and run advanced AI models on local infrastructure, unlocking generative and predictive experiences with the governance, security, and operational consistency of Azure Local.

 

Key capabilities in this preview

Foundry Local exposes standard REST and OpenAI‑compatible APIs, enabling IT and AI teams to deploy and operate local AI workloads using familiar, cloud‑aligned patterns across edge and on‑prem environments.

In this public preview, we deliver the following capabilities:  

  • Azure Arc extension for Foundry Local
    Deploy and manage Foundry Local via an Azure Arc extension, enabling consistent install, configure, update, and governance workflows across Arc‑enabled Kubernetes clusters, in addition to Helm‑based installation.
  • Built‑in generative models from the Foundry Local catalog
    Deploy pre‑built generative models directly from the Foundry Local model catalog using a simple control‑plane API request.
  • Bring‑your‑own predictive models (ONNX) from OCI registries
    Deploy custom predictive models (such as ONNX models) securely pulled from customer‑managed OCI registries and run locally.
  • REST and OpenAI‑compatible inference endpoints
    Consume both generative and predictive models through standard HTTP endpoints.
  • Multi‑model orchestration for agent‑style applications
    Enable applications that coordinate multiple local models—for example, generative models guiding calls to predictive models—within a single Kubernetes cluster.
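Because the inference endpoints are OpenAI-compatible, applications can talk to a local deployment with ordinary HTTP clients. The following sketch builds a chat-completions request with the Python standard library; the in-cluster hostname and port are hypothetical placeholders, and only the payload shape follows the OpenAI chat-completions schema.

```python
# Minimal sketch of a request to Foundry Local's OpenAI-compatible endpoint.
# The service hostname and port below are hypothetical; substitute the address
# of your in-cluster Service or ingress.
import json
import urllib.request


def build_chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build a chat-completions request against an OpenAI-compatible endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://foundry-local.foundry.svc.cluster.local:8080",  # hypothetical address
    "phi-4-mini",
    [{"role": "user", "content": "Summarize the last shift's downtime events."}],
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
print(req.full_url)
```

The same request-building code works for any model deployed behind the endpoint, generative or predictive, which is what makes multi-model orchestration from a single application practical.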

Running Foundry Local on Azure Local single-node gives you:

  • A validated, supported hardware foundation for running AI inference at the edge, from compact 1U nodes on the factory floor to rugged form factors in remote sites, using hardware from the Azure Local catalog
  • AKS on Azure Local as the deployment target, so Foundry Local runs as a containerized workload managed by Kubernetes - the same operational model you use for any other workload on the cluster
  • GPU access through the NVIDIA device plugin on AKS, giving Foundry Local's ONNX Runtime direct access to the node's discrete GPU without requiring Windows or host-OS-level configuration
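As a sketch of the GPU path: the NVIDIA device plugin advertises the GPU as an extended Kubernetes resource, which the inference container requests like any other resource. The names and image reference below are illustrative, not the actual Foundry Local manifest.

```yaml
# Illustrative pod spec fragment: the NVIDIA device plugin exposes the node's
# discrete GPU as the extended resource nvidia.com/gpu.
apiVersion: v1
kind: Pod
metadata:
  name: foundry-local-inference    # illustrative name
spec:
  containers:
    - name: inference
      image: <foundry-local-image> # substitute the actual image reference
      resources:
        limits:
          nvidia.com/gpu: 1        # one discrete GPU via the device plugin
```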

Two installation options for single-node deployment

The preview includes the Foundry Local Azure Arc extension, providing a consistent installation, deployment, and lifecycle-management experience through Azure Arc, while also supporting Helm-based installation.

Choose one of two installation paths:

Option 1 - Arc-enabled Kubernetes Extension

Recommended when: your organization manages multiple Azure Local instances and wants Microsoft to handle the deployment lifecycle — version updates, configuration drift detection, health monitoring — through the Azure portal without the team needing to manage Helm releases manually.

Arc-enabled Kubernetes extensions deploy and manage workloads on AKS clusters registered with Azure. The extension operator runs in the cluster and reconciles the desired state declared in Azure, which means you don't need direct kubectl or helm access to the node to push updates. This is the lower-operational-overhead path for OT teams who are not Kubernetes specialists.

 

 

Once installed, the extension appears in the Azure portal under your AKS cluster's Extensions blade. Model updates and configuration changes are pushed by modifying the extension configuration in Azure — no shell access to the node required. For disconnected or intermittently connected deployments, the extension operator caches its desired state and continues operating; it reconciles with Azure when connectivity resumes.

 

Option 2 - Helm Chart

Recommended when: your team manages AKS workloads with Helm or GitOps (Flux), and you need precise control over GPU resource allocation, node affinity, model pre-loading, or persistent volume configuration.

The Helm chart gives you full control over the deployment manifest. You decide exactly how much GPU memory is requested per pod, which node the inference pod is pinned to, and what StorageClass backs the model cache. This matters on a single-node Azure Local deployment where you're sharing one physical GPU between the inference workload and potentially other AKS workloads.

 

 

With Helm you can also integrate with Flux for GitOps-managed deployment — useful when you manage multiple Azure Local single-node instances across plant sites and want to push model or configuration updates from a central Git repository.      

 

Example of a model deployment YAML file
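The fragment below is an illustrative values sketch only; every key name and default here is an assumption, and the note that follows applies: confirm the real chart parameters in the official documentation.

```yaml
# Illustrative values.yaml sketch for a Helm-based Foundry Local deployment.
# Keys and defaults are assumptions -- check the official chart before use.
model:
  name: phi-4-mini            # model to pull from the Foundry Local catalog
resources:
  limits:
    nvidia.com/gpu: 1         # pin the inference pod to the node's GPU
nodeSelector:
  kubernetes.io/hostname: azlocal-node-01   # single node: pin explicitly
persistence:
  storageClass: local-path    # StorageClass backing the model cache
  size: 100Gi
```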

 

Note: Verify the chart repository URL, chart name, and exact values.yaml parameters from the official Foundry Local documentation before deploying to production.

 

Choosing Between the Two

|  | Helm Chart | Arc Extension |
|---|---|---|
| Authentication | API key | Microsoft Entra ID |
| Version upgrades | Manual helm upgrade or Flux | Automatic, managed by Microsoft |
| GitOps compatible | Yes (Flux HelmRelease) | Yes (via Azure Policy / desired state) |
| Requires cluster access | Yes | No (after initial registration) |
| Best for | Platform engineers, custom configs | OT-managed sites, multi-site fleets |
| Disconnected operation | Works after initial deploy | Works; reconciles on reconnect |
| Control plane | Kubernetes-native management (kubectl) | Kubernetes-native management + REST API control plane |
 

Early Customer Validation and Key Scenarios

Early customer validation is shaping the preview, helping ensure Foundry Local meets real-world requirements for latency, data control, and operation in constrained or disconnected environments across industries such as energy, manufacturing, government, financial services, and retail.

Based on this early feedback, customers are prioritizing scenarios such as:

  • Sovereign and regulated
      ◦ On-site inference with data, models, and processing under customer control
      ◦ Decision support in disconnected or restricted-network environments
      ◦ In-jurisdiction processing for sensitive records and casework
      ◦ Real-time detection and situational awareness within secure facilities
  • Industrial and critical infrastructure
      ◦ Edge operations assistants combining sensor telemetry with conversational AI
      ◦ Low-latency quality inspection and process verification on factory floors
      ◦ Predictive maintenance for remote or intermittently connected equipment
      ◦ Local safety monitoring and operational oversight close to systems

This input is guiding improvements across deployment flows, the model catalog experience, hardware coverage, telemetry visibility, and documentation, so teams can evaluate and adopt Foundry Local more quickly and confidently in the environments above.

Examples:

CNC Anomaly Explanation: A machine vision system on a CNC line classifies a surface defect and passes the classification JSON to the Foundry Local endpoint. Phi-4-mini generates a plain-language root-cause hypothesis for the operator, referencing the specific machining parameters.
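The handoff above can be sketched in a few lines: the vision system's classification JSON is wrapped into a prompt and posted to the local endpoint. The field names in the classification dictionary are illustrative, not a real schema.

```python
# Hypothetical sketch: turning a vision system's defect classification into a
# prompt for Phi-4-mini served by the local OpenAI-compatible endpoint.
import json


def explain_defect(classification: dict, model: str = "phi-4-mini") -> dict:
    """Build a chat payload asking for a plain-language root-cause hypothesis."""
    prompt = (
        "A surface defect was classified on the CNC line. Given this "
        "classification and the machining parameters, propose a root-cause "
        "hypothesis in plain language for the operator:\n"
        + json.dumps(classification, indent=2)
    )
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


payload = explain_defect({
    "defect": "chatter_marks",       # illustrative fields, not a real schema
    "confidence": 0.93,
    "spindle_speed_rpm": 9500,
    "feed_rate_mm_min": 1200,
})
# POST this payload to the Foundry Local /v1/chat/completions endpoint.
```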

Disconnected Safety Procedure Lookup: An offshore platform or remote mine site loses WAN connectivity. The Foundry Local pods continue serving requests from the AKS cluster on the Azure Local node - Kubernetes keeps the pods running, the model is already on the local PersistentVolume, and no external dependency is required. Workers query safety procedures (LOTO sequences, chemical handling) from an intranet application backed by the same inference endpoint. Qwen2.5-7B fits within 8–12 GB VRAM and supports a 32K token context window, making it viable for inline procedure retrieval without a separate vector database - useful when plant-floor infrastructure is minimal.

 

Foundry Local for Devices and Foundry Local on Azure Local: What's Different

Foundry Local for devices reached general availability for developer devices: Windows 10/11, macOS (Apple Silicon), and Android. That release targets a specific scenario: a developer or end user running AI inference on their own machine, with the model executing locally on their CPU, GPU, or NPU. The install is a single command (winget or brew), the service runs directly on the host OS, and no Azure subscription or infrastructure is required. It is a developer tool and an application-embedded runtime.

General overview of Foundry Local is available here: What is Foundry Local? - Foundry Local | Microsoft Learn

The public preview for Azure Local single node is a different deployment target built for a different operational context. The runtime is the same - ONNX Runtime, the same model catalog, the same OpenAI-compatible API - but where it runs, how it is deployed, and how it is managed are entirely different.

 

|  | Foundry Local for Devices (GA) | Foundry Local on Azure Local Single Node (Preview) |
|---|---|---|
| Target | Developer machines, end-user devices | Enterprise edge servers on the factory floor or remote site |
| OS | Windows 10/11, macOS, Android | Linux container on AKS on Azure Local |
| Hardware | Laptops, workstations, NPU-equipped devices | Validated server hardware from the Azure Local catalog |
| GPU access | Direct host GPU (CUDA, DirectML, Apple Neural Engine) | NVIDIA device plugin on Kubernetes |
| Installation | winget install or brew install | Arc-enabled Kubernetes extension or Helm chart |
| Lifecycle management | Manual update via winget upgrade | Managed via Helm/Flux or Arc extension operator |
| Intended consumers | One developer or one application on one machine | Multiple applications sharing one inference endpoint on the plant network |
| Disconnected operation | Supported after model download; primarily online | Designed for persistent disconnected operation with NVMe-cached models |
| Model persistence | Local device cache | Kubernetes PersistentVolume on local storage |
| Operational model | Developer installs and manages it | Platform team deploys it; applications consume it as a service |
 

The short version: the GA device release is for building and running AI-enabled applications on a single machine. The Azure Local single-node preview is for deploying Foundry Local as a shared, production inference service that runs continuously on validated industrial hardware, survives WAN outages, and is consumed by multiple workloads running on the same edge cluster.

If you are prototyping an application on your laptop using the GA release, the same application code - specifically the OpenAI-compatible API calls - runs unchanged against the Azure Local deployment. You change only the base_url, from localhost to the Kubernetes Service address.
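A minimal sketch of that portability, assuming hypothetical hostnames and ports: the endpoint path, and therefore the application code, is identical in both environments.

```python
# Only the base URL differs between the device and cluster deployments.
# Hostnames and ports below are illustrative examples.
def chat_endpoint(base_url: str) -> str:
    """Return the OpenAI-compatible chat-completions URL for a deployment."""
    return f"{base_url.rstrip('/')}/v1/chat/completions"


dev_url = chat_endpoint("http://localhost:8080")   # Foundry Local on a dev machine
edge_url = chat_endpoint(
    "http://foundry-local.foundry.svc.cluster.local:8080"  # in-cluster Service
)

# Same path on both; swapping base_url is the only change the application needs.
print(dev_url)
print(edge_url)
```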

Built for Secure Industrial and Sovereign Operations

Foundry Local supports Microsoft’s sovereign cloud principles—allowing AI workloads to operate fully locally, with customer‑controlled data boundaries and governance.

Foundry Local on Arc: high-level service diagram

 


Integration with Azure Arc provides unified management, configuration, and monitoring across hybrid and disconnected landscapes, enabling organizations to meet stringent compliance and operational requirements while adopting advanced AI capabilities.

 

Learn more about Foundry Local on Azure Local

  • RECOMMENDED: Participate in the Foundry Local on Azure Local preview (form link)
  • Foundry Local on Azure Local documentation (link)
  • Reach out to the team with support requests, feedback, or suggestions: FoundryLocal_Support@microsoft.com
  • Foundry Local on Azure Local: Helm deployment demo (link)
  • Foundry Local is now generally available (link)

Updated Apr 17, 2026
Version 2.0