Introduction to AI on Windows Server
As organizations embrace AI, new opportunities exist for Windows Server customers who want to leverage on-premises AI. While Azure remains the best place for cutting-edge models and AI inference hardware accelerators, certain industries, such as healthcare, finance, manufacturing, and retail, require on-premises AI to improve and accelerate existing business workflows.
Microsoft Foundry on Windows helps harness the power of AI on existing server deployments. It includes Foundry Local and Windows ML, which enable server customers to build local AI experiences with real-time inferencing.
Leveraging AI on your own infrastructure gives control over data residency, compliance, and latency.
This blog details how Microsoft Foundry on Windows brings local AI capabilities to Windows Server deployments. It explores why Foundry Local and Windows ML are a strong fit for on-premises AI, highlights key technical considerations, and shows how customers can easily build generative AI applications with models from the Foundry Local catalog, or with proprietary models of any type via Windows ML.
Windows Server as a local AI platform
Windows Server 2025 reached general availability last year and introduced significant enhancements, including advanced storage capabilities, GPU partitioning (GPU-P) and Discrete Device Assignment (DDA) for assigning GPU resources to virtual machines, and massive Hyper-V scalability with support for up to 2,048 vCPUs per Gen 2 VM. Together, these capabilities make Windows Server 2025 ideal for AI-intensive workloads. Built to power mission-critical environments where compliance and continuity are non-negotiable, Windows Server offers robust, enterprise-grade infrastructure that enables AI inferencing on premises, without data leaving your datacenter.
Scenarios for On-Premises AI
Although many organizations are investing in AI on Azure to leverage the latest innovations, we understand there are several situations where on-premises AI capabilities are required. Below are a few examples of such scenarios.
Healthcare
Meet regulatory requirements. Keep Protected Health Information (PHI) and clinical records within your on-premises perimeter to satisfy compliance requirements while enabling AI-powered insights locally.
Finance
Act on insights instantly. Process financial reports and transaction logs near the source to reduce latency and avoid round trips to external endpoints, ensuring speed and confidentiality.
Manufacturing
Operate in disconnected environments. Run AI workflows in air-gapped or intermittently connected plants to support predictive maintenance and quality control without relying on cloud connectivity.
Retail
Operate in latency-sensitive environments. Run AI models for basic inferencing to improve point-of-sale efficiency and deliver personalized experiences.
Technical Snapshot
Microsoft Foundry on Windows supports a two-pronged approach to making the Windows Server platform AI-ready:
Windows ML enables application and service owners to introduce AI workflows and inferencing into existing server applications. It automatically identifies the available processors (CPU or GPU) on the server hardware, downloads the optimal execution providers (EPs), and allows the application to run AI models locally. Windows ML builds on ONNX Runtime under the hood, ensuring compatibility with popular frameworks and optimized execution providers.
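Windows ML performs this hardware and EP discovery automatically; for intuition, here is a minimal Python sketch of the equivalent flow using ONNX Runtime directly, assuming an illustrative `model.onnx` file on disk:

```python
import onnxruntime as ort

# Execution providers (EPs) available in this ONNX Runtime build,
# e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"].
available = ort.get_available_providers()
print("Available EPs:", available)

# Prefer a GPU EP when present, falling back to CPU otherwise.
preferred = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider")
             if p in available]

# "model.onnx" is an illustrative path to a local ONNX model.
session = ort.InferenceSession("model.onnx", providers=preferred)
print("Active EPs:", session.get_providers())
```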
Foundry Local enables seamless discovery, download, and orchestration of AI models directly on Windows Server, including support for hardware acceleration on servers with GPUs. It also streamlines deployment of foundation models on virtual machines that use GPU partitioning (GPU-P), ensuring hardware isolation and optimized resource sharing for compliance-sensitive environments.
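As a sketch of what discovery and download look like in code, the Foundry Local Python SDK can resolve a catalog model by alias, download it if needed, and hand back an OpenAI-compatible endpoint; the model alias below is illustrative (run `foundry model list` to see what's available):

```python
# pip install foundry-local-sdk openai
from foundry_local import FoundryLocalManager
from openai import OpenAI

alias = "phi-3.5-mini"  # illustrative catalog alias

# Starts the Foundry Local service if needed, then downloads and loads
# the variant of the model best suited to this server's hardware.
manager = FoundryLocalManager(alias)

# The service exposes an OpenAI-compatible endpoint, so the standard
# OpenAI client works against it unchanged.
client = OpenAI(base_url=manager.endpoint, api_key=manager.api_key)
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "Summarize this shift report."}],
)
print(response.choices[0].message.content)
```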
The Foundry model catalog will continue to evolve with more models and APIs, such as support for embedding models.
Simple steps to get started!
- Onboard Foundry Local on your existing server infrastructure: Install Foundry Local on Windows Server 2025
- Identify a practical use case for AI inferencing: Start with a simple scenario, such as summarizing reports or translating content into a local language.
- Pilot with prebuilt models: Use existing models from the catalog for rapid results, and validate performance and compatibility with your hardware.
- Integrate with existing workflows: Connect inference endpoints to your current applications or automation pipelines, keeping data local while enhancing processes with AI insights. Foundry Local provides an SDK, a command-line interface (CLI), and a REST API for easy integration into existing workflows and applications.
- Measure performance: Track latency, throughput, and resource utilization to optimize your deployment, and use these insights to fine-tune and iterate. The sketch after this list shows a simple latency measurement against the REST API.
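As an illustration that ties the last two steps together, the snippet below posts a chat request to Foundry Local's OpenAI-compatible REST endpoint and times the round trip. The endpoint address and model ID are assumptions here; substitute the values reported by `foundry service status` and `foundry model list`:

```python
import time
import requests

# Assumed local endpoint; confirm with `foundry service status`.
URL = "http://localhost:5273/v1/chat/completions"

payload = {
    # Illustrative model ID; pick one from `foundry model list`.
    "model": "phi-3.5-mini",
    "messages": [
        {"role": "user", "content": "Summarize the attached incident report."}
    ],
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
elapsed = time.perf_counter() - start

print(resp.json()["choices"][0]["message"]["content"])
print(f"End-to-end latency: {elapsed:.2f} s")
```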
Deep dive: Unlock the power of BYOM + Windows ML on Windows Server
Bring Your Own Model (BYOM) gives organizations the freedom to choose custom AI models tailored to their domain and business needs. For instance, a manufacturing company might bring a predictive maintenance model trained on its own sensor data to anticipate equipment failures and reduce downtime.
Windows ML allows proprietary models to run seamlessly on Windows Server. It automatically discovers, downloads, and registers the latest version of all compatible execution providers (EPs). Tools like the AI Toolkit extension for VS Code can be used to optimize and quantize models for efficient local execution.
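To make BYOM concrete, here is a minimal sketch of serving a proprietary model with ONNX Runtime, which Windows ML builds on. The model file, input shape, and output are hypothetical, following the predictive-maintenance example above:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical quantized model, e.g. prepared with the AI Toolkit
# extension for VS Code.
session = ort.InferenceSession(
    "predictive_maintenance.int8.onnx",
    providers=ort.get_available_providers(),
)

# Hypothetical input: one window of 64 readings from 8 sensors.
input_name = session.get_inputs()[0].name
window = np.random.rand(1, 8, 64).astype(np.float32)

outputs = session.run(None, {input_name: window})
print("Predicted failure probability:", outputs[0])
```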
In summary, with BYOM and Windows ML on Windows Server, customers can deploy custom AI models that deliver local inferencing to existing business workloads.
Resources:
- For questions or feedback, reach out to foundrylocal-server@microsoft.com