adaptive cloud
44 TopicsCloud infrastructure for disconnected environments enabled by Azure Arc
Organizations in highly regulated industries such as government, defense, financial services, healthcare, and energy often operate under strict security and compliance requirements and across distributed locations, some with limited or no connectivity to public cloud. Leveraging advanced capabilities, including AI, in the face of this complexity can be time-consuming and resource intensive. Azure Local, enabled by Azure Arc, offers simplicity. Azure Local’s distributed infrastructure extends cloud services and security across distributed locations, including customer-owned on-premises environments. Through Azure Arc, customers benefit from a single management experience and full operational control that is consistent from cloud to edge. Available in preview to pre-qualified customers, Azure Local with disconnected operations extends these capabilities even further – enabling organizations to deploy, manage, and operate cloud-native infrastructure and services in completely disconnected or air-gapped networks. What is disconnected operations? Disconnected operations is an add-on capability of Azure Local, delivered as a virtual appliance, that enables the deployment and lifecycle management of your Azure Local infrastructure and Arc-enabled services, without any dependency on a continuous cloud connection. Key Benefits Consistent Azure Experience: You can operate your disconnected environment using the same tools you already know - Azure Portal, Azure CLI and ARM Templates extended through a local control plane. Built-in Azure Services: Through Azure Arc, you can deploy, update, and manage Azure services such as Azure Local VMs, Azure Kubernetes Service (AKS), etc. Data Residency and Control: You can govern and keep data within your organization's physical and legal jurisdiction to meet data residency, operational autonomy, and technological isolation requirements. Key Use Cases Azure Local with disconnected operations unlocks a range of impactful use cases for regulated industries: Government and Defense: Running sensitive government workloads and classified data more securely in air-gapped and tactical environments with familiar Azure management and operations. Manufacturing: Deploying and managing mission-critical applications like industrial process automation and control systems for real-time optimizations in more highly secure environments with zero connectivity. Financial Services: Enhanced protection of sensitive financial data with real time data analytics and decision making, while ensuring compliance with strict regulations in isolated networks. Healthcare: Running critical workloads with a need for real-time processing, storing and managing sensitive patient data with the increased levels of privacy and security in disconnected environments Energy: Operating critical infrastructure in isolated environments, such as electrical production and distribution facilities, oil rigs, or remote pipelines. Here is an example of how disconnected operations for Azure Local can provide mission critical emergency response and recovery efforts by providing essential services when critical infrastructure and networks are unavailable. Core Features and capabilities Simplified Deployment and Management Download and deploy the disconnected operations virtual appliance on Azure Local Premier Solutions through a streamlined user interface. Create and manage Azure Local instances using the local control plane, with the same tooling experience as Azure. Offline Updates The monthly update package includes all the essential components: the appliance, Azure Local software, AKS, and Arc-enabled service agents. You can update and manage the entire Azure Local instance using the local control plane without an internet connection. Monitoring Integration You can monitor your Azure Local instances and VMs using external monitoring solutions like SCOM by installing custom management packs and monitor AKS Clusters through 3 rd party open-source solutions like Prometheus and Grafana. Run Mission-Critical Workloads – Anytime, Anywhere Azure Local VMs You can run VMs with flexible sizing, support for custom VM images, and high availability through storage replication and automatic failover – all managed through the local Azure interface. AI & Containers with AKS You can use disconnected AI containers with Azure Kubernetes Service (AKS) on Azure Local to deploy and manage AI applications in disconnected scenarios where data residency and operational autonomy is required. AKS enables the deployment and management of containerized applications such as AI agents and models, deep learning frameworks, and related tools, which can be leveraged for inferencing, fine-tuning, and training in isolated networks. AKS also automates resource scaling, allowing for the dynamic addition and removal of container instances to more efficiently utilize hardware resources, including GPUs, which are critical for AI workloads. This provides consistent Azure experience in managing Kubernetes clusters and AI workloads with the same tooling and processes in connected environments. Get Started: Resources and Next Steps Microsoft is excited to announce the upcoming preview of Disconnected Operations for Azure Local in Q3 ‘CY25 for both Commercial and Government Cloud customers. To Learn more, please visit Disconnected operations for Azure Local overview (preview) - Azure Local Ready to participate? Get Qualified! or contact your Microsoft account team. Please also check out this session at Microsoft Build https://build.microsoft.com/en-US/sessions/BRK195 by Mark Russinovich, one of the most influential minds in cloud computing. His insights into the latest Azure innovations, the future of cloud architecture and computing, is a must-watch event!1.8KViews7likes3CommentsUpgrade Azure Local operating system to new version
Today, we’re sharing more details about the end of support for Azure Local, with OS version 25398.xxxx (23H2) on October 31, 2025. After this date, monthly security and quality updates stop, and Microsoft Support remains available only for upgrade assistance. Your billing continues, and your systems keep working, including registration and repair. There are several options to upgrade to Azure Local, with OS version 26100.xxxx (24H2) depending on which scenario applies to you. Scenario #1: You are on Azure Local solution, with OS version 25398.xxxx If you're already running the Azure Local solution, with OS version 25398.xxxx, there is no action required. You will automatically receive the upgrade to OS version 26100.xxxx via a solution update to 2509. Azure Local, version 23H2 and 24H2 release information - Azure Local | Microsoft Learn for the latest version of the diagram. If you are interested in upgrading to OS version 26100.xxxx before the 2509 release, there will be an opt-in process available in the future with production support. Scenario #2: You are on Azure Stack HCI and haven’t performed the solution upgrade yet Scenario #2a: You are still on Azure Stack HCI, version 22H2 With the 2505 release, a direct upgrade path from version 22H2 OS (20349.xxxx) to 24H2 OS (26100.xxxx) has been made available. To ensure a validated, consistent experience, we have reduced the process to using the downloadable media and PowerShell to install the upgrade. If you’re running Azure Stack HCI, version 22H2 OS, we recommend taking this direct upgrade path to the version 24H2 OS. Skipping the upgrade to the version 23H2 OS will be one less upgrade hop and will help reduce reboots and maintenance planning prior to the solution upgrade. After then, perform post-OS upgrade tasks and validate the solution upgrade readiness. Consult with your hardware vendor to determine if version 24H2 OS is supported before performing the direct upgrade path. The solution upgrade for systems on the 24H2 OS is not yet supported but will be available soon. Scenario #2b: You are on Azure Stack HCI, version 23H2 OS If you performed the upgrade from Azure Stack HCI, version 22H2 OS to version 23H2 OS (25398.xxxx), but haven’t applied the solution upgrade, then we recommend that you perform post-OS upgrade tasks, validate the solution upgrade readiness, and apply the solution upgrade. Diagram of Upgrade Paths Conclusion We invite you to identify which scenarios apply to you and take action to upgrade your systems. On behalf of the Azure Local team, we thank you for your continuous trust and feedback! Learn more To learn more, refer to the upgrade documentation. For known issues and remediation guidance, see the Azure Local Supportability GitHub repository.1.7KViews4likes7CommentsIntroducing Azure Local: cloud infrastructure for distributed locations enabled by Azure Arc
Today at Microsoft Ignite 2024 we're introducing Azure Local, cloud-connected infrastructure that can be deployed at your physical locations and under your operational control. With Azure Local, you can run the foundational Azure compute, networking, storage, and application services locally on hardware from your preferred vendor, providing flexibility to meet your requirements and budget.83KViews24likes26CommentsAnnouncing general availability of workload orchestration: simplifying edge deployments at scale
We’re excited to announce the General Availability of workload orchestration, a new Azure Arc capability that simplifies how enterprises deploy and manage Kubernetes-based applications across distributed edge environments. Organizations across industries, such as manufacturing, retail, healthcare, face challenges in managing varied site-specific configurations. Traditional methods often require duplicating app variants—an error-prone, costly, and hard-to-scale approach. Workload orchestration solves this with a centralized, template-driven model: define configurations once, deploy them across all sites, and allow local teams to adjust within guardrails. This ensures consistency, improves speed, reduces errors, and scales with your CI/CD workflows—whether you’re supporting 200+ factories, offline retail clusters, or regionally-compliant hospital apps. Fig 1.0: Workload orchestration – Key features Key benefits of workload orchestration include: Solution Configuration & Template Reuse Define solutions, environments, and multiple hierarchy levels using reusable templates. Key-value stores and schema-driven inputs allow flexible configurations, validations with role-based access to maintain control. Context-Aware Deployments Automatically generate deployable artifacts based on selected environments (Dev, QA, Prod) and push changes safely through a git ops flow — enabling controlled rollouts and staged testing across multiple environments. Deploying at Scale in Constrained Environments Deploy workloads across edge and cloud environments with built-in dependency management and preloading of container images (a.k.a Staging) to minimize downtime during narrow maintenance windows. Bulk Deployment and Git Ops-Based Rollouts Execute large-scale deployments — including shared or dependent applications — across multiple sites using Git-based CI/CD pipelines that validate configurations and enforce policy compliance before rollout. End to End Observability K8 diagnostics in workload orchestration provide full-stack observability by capturing container logs, Kubernetes events, system logs, and deployment errors—integrated with Azure Monitor and Open Telemetry pipelines for proactive troubleshooting across edge and cloud environments. Who Is It For? Workload orchestration supports two primary user personas: IT Admins and DevOps Engineers: Responsible for initial setup and application configuration via CLI. OT Operators: Use the portal for day-to-day activities like monitoring deployments and adjusting configurations. Resources for You to Get Started You can start using workload orchestration by visiting the Azure Arc portal and following the documentation. We encourage you to try it with a small application deployed to a few edge sites. Create a template, define parameters like site name or configuration toggles, and run a deployment. As you grow more comfortable, expand to more sites or complex applications.644Views3likes0CommentsEmpowering the Physical World with AI
Unlocking AI at the Edge with Azure Arc The integration of AI into the physical environment is revolutionizing ways we interact with and navigate the world around us. By embedding intelligence into edge devices, AI is not just processing data—it is defining how machines perceive, reason, and act autonomously in real-world scenarios. AI at the edge is transforming how we interact with our environment, driven by critical factors such as data sensitivity, local regulations, compliance, low latency requirements, limited network connectivity, and cost considerations. Added to this, the emergence of new, powerful agentic AI capabilities enables autonomous and adaptive real-time operations, making AI an indispensable tool in reshaping the physical world. Customers’ Use Cases By embedding AI into edge operations, industries are unlocking transformative efficiencies and innovations. In manufacturing, edge-powered AI enables real-time quality control and predictive maintenance, minimizing downtime and maximizing productivity. In retail, AI enhances customer experiences with personalized recommendations and streamlined inventory management. Similarly, finance leverages AI's capabilities for robust fraud detection and advanced risk management. Moreover, sectors like government and defense are increasingly adopting edge AI for safety-critical applications, enabling autonomous, real-time surveillance and response solutions that are both efficient and resilient. These advancements are paving the way for scalable, adaptive solutions that meet the unique demands of diverse operational environments. Azure’s Adaptive Cloud Approach enabling AI from cloud to edge Building on the promise to unify cloud and edge, Azure’s adaptive cloud approach is empowering teams to develop and scale AI workloads seamlessly across diverse environments. By enabling a unified suite of services tailored for modern AI applications, whether deployed in public clouds or distributed locations, Azure Arc enables streamlined operations with enhanced security and resilience. Central to extending AI services to the edge is our commitment to adaptive, scalable, and efficient solutions tailored to diverse operational needs. Azure Arc plays a key role in this vision by facilitating seamless deployment and management of AI workloads across various environments. This week, we’re excited to share that a subset of Microsoft Azure AI Foundry models, such as Phi and Mistral have been rigorously validated to run on Azure Local enabled by Azure Arc. Our investments are reflected in two primary areas: Foundational tools for MLOps and developer frameworks, which empower teams to build robust AI applications Intuitive, end-to-end low-code experiences designed for data analysts and solution developers. These low-code tools prioritize user-friendly interfaces and rapid deployment, enabling the creation of solutions with just a few clicks. This dual focus ensures enterprises can fully harness the potential of edge AI while maintaining flexibility and operational efficiency. Image 1: This high-level diagram illustrates our vision for the cloud to edge AI workloads, enabled by Azure Arc. Some components (agents and integration with AI Foundry and Foundry Local) are still under development, while others are more advanced and have been released to the market. Build 2025: New Capabilities and Releases This strategic vision is now being realized through a wave of new capabilities unveiled at Build 2025. These innovations are designed to accelerate edge AI adoption and simplify the developer experience—making it easier than ever to build, deploy, and manage intelligent applications across hybrid environments. Announcements related to developer Building blocks: Kubernetes AI Toolchain Orchestrator (KAITO), enabled by Azure Arc (public preview) Foundry Local (public preview) for Windows apps to be deployed on any client device read more here. Workload orchestration (public preview) Application development tools for Kubernetes enabled by Arc (public preview) Refer to this blog to read more: https://aka.ms/AdaptiveCloudBuild2025 Announcements related to End-to-end experiences: Edge RAG, enabled by Azure Arc is now available in public preview. Azure AI Video Indexer for recorded files, enabled by Arc is generally available since April 2025. Azure AI Video Indexer for live video analysis, enabled by Arc is available in private preview, for limited set of customers Customer scenarios: enabling search and retrieval for on-premises data on Azure Local Edge RAG targets customers who have data that needs to stay on premises due to data gravity, security and compliance, or latency requirements. We have observed significant and consistent interest from highly regulated sectors. These entities are exploring the use of RAG capabilities in disconnected environments through Azure Local. DataON is a hybrid cloud computing company for enterprises of all sizes, with a focus on educational institutions and local government agencies. Recently, they have worked with the their customers to successfully deploy our RAG solution on CPU and GPU clusters and begin testing with sample end-customer data. “DataON has been actively exploring how Edge RAG can enhance our Microsoft Azure Local solutions by providing more efficient data retrieval and decision-making capabilities. It’s exciting to be part of the private preview program and see firsthand how Edge RAG is shaping the future of data-driven insights.” Howard Lo | VP, Sales & Marketing | DataON This capability brings generative AI and RAG to on-premises data. Edge RAG was validated on AKS running on Azure Local. Based on DataON and other customer feedback, we have expanded the version to include new features: Model Updates: Ability to use any model compatible with OpenAI Inferencing standard APIs Multi-lingual support: 100+ common languages for document ingestion and question-answer sessions Multi-modal support: Support for image ingestion & retrieval during question-answer sessions Search Types: Support for Text, Vector, Hybrid Text & Hybrid Text+Image searches Ingestion Scale-out: Integration with KEDA for fully parallelized, high-throughput ingestion pipeline Evaluation Workflow with RAG Metrics: Integrated workflow with built-in or customer-provided sample dataset Read more about Edge RAG in this blog: https://aka.ms/AzureEdgeAISearchenabledbyArc. AI Workloads for Disconnected Operations In fully disconnected (air-gapped or non-internet) environments, such as those often found in government and defense sectors, technologies like RAG, can be deployed on-premises or in secure private clouds. This is currently available with limited access. Use Cases: Video analysis: Automatically analyzes video and audio content to extract metadata such as objects and scenes. Use cases include live video and analysis, mission debriefing and training, and modern safety. Models consumption: A central repository for securely managing, sharing, and deploying AI/ML models. Use cases: model governance, rapid deployment of mission-specific models, and inter-agency collaboration. Retrieval-Augmented Generation (RAG): Combines LLMs with a document retrieval system to generate accurate, context-aware responses based on internal knowledge bases. Use cases include field briefings, legal and policy compliance, and cybersecurity incident response. Transforming Industries with AI: Real-World Stories from the Edge Across industries, organizations are embracing AI to solve complex challenges, enhance operations, and deliver better outcomes. From healthcare to manufacturing, retail to energy, and even national security, Azure AI solutions are powering innovation at scale. In the manufacturing sector, a global company sought to optimize production and reduce costly downtime. Azure AI Video Indexer monitored video feeds from production lines to catch defects early, while custom predictive maintenance models from the Model Catalog helped prevent equipment failures. RAG provided real-time insights into operations, empowering managers to make smarter decisions by asking questions. These tools collectively boosted efficiency, minimized downtime, and improved product quality. At Airports, Azure AI helped enhance passenger experience and safety. From monitoring queue lengths and tracking vehicles to detecting falls and identifying restricted area breaches, the combination of Azure Local, Video Indexer, Azure IoT for Operations, and custom AI created a smarter, safer airport environment. Retailers, too, are reaping the benefits. A major retail chain used Azure AI to understand in-store customer behavior through video analytics, optimize inventory with demand forecasting models, and personalize shopping experiences using RAG. These innovations led to better customer engagement, streamlined inventory management, and increased sales. In Healthcare, a leading provider operating multiple hospitals and clinics nationwide faced the daunting task of analyzing massive volumes of patient data—from medical records and imaging to real-time feeds from wearable devices. With strict privacy regulations in play, they turned to Azure AI. Using Azure AI Video Indexer, they analyzed imaging data like X-rays and MRIs to detect anomalies. The Model Catalog enabled predictive analytics to identify high-risk patients and forecast readmissions. Meanwhile, Retrieval-Augmented Generation (RAG) gave doctors instant access to patient histories and relevant medical literature. The result? More accurate diagnoses, better patient care, and full regulatory compliance. These stories highlight how Azure Arc enabled AI workloads are not just a set of tools—they are a catalyst for transformation. Whether it’s saving lives, improving safety, or driving business growth, the impact is real, measurable, and growing every day. Learn More Whether you are tuning in online or joining us in person, we wish you a fun and exciting Build 2025! The advancements in AI at the edge are set to revolutionize how we build, deploy, and manage applications, providing greater speed, agility, and security for businesses around the world. Recommended Build Sessions: Breakout session (BRK188): Power your AI apps across cloud and edge with Azure Arc Breakout session (BRK183): Improving App Health with Health Modeling and Chaos Engineering Breakout session (BRK 195): Inside Azure innovations with Mark Russinovich Breakout session (BRK 168): AI and Agent Observability in Azure AI Foundry and Azure Monitor1.4KViews2likes0CommentsJumpstart LocalBox 25H2 Update
LocalBox delivers a streamlined, one-click sandbox experience for exploring the full power of Azure Local. With this 25H2 release, we are introducing support for Azure VM Spot pricing for the LocalBox Client VM, removed service principal dependency and transitioned to Managed Identity, added support for deploying the LocalBox Client VM and Azure Local instance in separate regions, added dedicated PowerShell modules and updated LocalBox to the Azure Local 2505 release - making it possible for you to evaluate a range of new features and enhancements that elevate the functionality, performance, and user experience. Following our LocalBox rebranding last month, today, we are thrilled to announce our second major update – LocalBox 25H2! Key Azure Local Updates Azure Local 2505 Solution Version In this release, we have updated the base image for the Azure Local nodes to the 2505 solution version. Started in the previous Azure Local 2504 release, a new operating system was introduced for Azure Local deployments. For 2505, all new deployments of Azure Local will run the new OS version 26100.4061. This unlocks several new features: Registration and Deployment Updates Starting with this release, you can now download a specific version of Azure Local software instead of just the latest version. For each upcoming release, you will be able to choose from up to the last six supported versions. Security Updates The Dynamic Root of Trust for Measurement (DRTM) is enabled by default for all new 2504 deployments running OS version 26100.3775. Azure Local VM Updates Data disk expansion - You can now expand the size of a data disk attached to an Azure Local VM. For more information, see Expand the size of a data disk attached to an Azure Local VM. Live VM migration with GPU partitioning (GPU-P) - You can now live migrate VMs with GPU-P. You can read more about what is new in Azure Local 2505 in the documentation. Jumpstart LocalBox 25H2 Updates Features Cost Optimizations with Azure Spot VM Support LocalBox now supports enabling Azure VM Spot pricing for the Client VM, allowing users to take advantage of cost savings on unused Azure capacity. This feature is ideal for workloads that can tolerate interruptions, providing an economical option for testing and dev environments. By leveraging Spot pricing, users can significantly reduce their operational costs while maintaining the flexibility and scalability offered by Azure. You may leverage the advisor on the Azure Spot Virtual Machine pricing page to estimate costs for your selected region. Here is an example for running the LocalBox Client Virtual Machine in the East US region: The new deployment parameter enableAzureSpotPricing is disabled by default, so users who wants to take advantage of this capability will need to opt-in. Visit the LocalBox FAQ to see the updated price estimates for running LocalBox in your environment. Deploying the LocalBox Client VM and Azure Local Instance in Separate Regions Our users have been sharing with us feedback around the Azure capacity requirements of deploying LocalBox, specifically when it comes to regions with sufficient compute capacity (vCPU quotas) for the VM SKU (Standard E32s v5/v6) used in LocalBox. To address this, we have now introduced a new parameter for specifying the region the Azure Local instance resources will be deployed to. In the following example, LocalBox is deployed into Norway East while the Azure Local instance is deployed into Australia East. In practice, this makes it possible to deploy LocalBox into any region where users have sufficient vCPU quotas available for the LocalBox VM SKU (Standard E32s v5/v6). Enhanced Security - Support for Azure Managed Identity We have now introduced an enhanced security posture by removing the Service Principal Names (SPN) user requirement in favor of Azure Managed Identity at deployment time. This follows the same pattern we introduced in Jumpstart ArcBox last year, and now, with Azure Local fully support deployments without an SPN, we are excited to share this update in LocalBox. Dedicated PowerShell modules Arc Jumpstart has been evolving and growing significantly since its beginning more than 5 years ago. As the code base is growing, we see the opportunity to consolidate common code and separate large scripts into modules. Our first PowerShell module, Azure.Arc.Jumpstart.Common, was moved into its own repository — and was published to the PowerShell Gallery via a GitHub Actions workflow last month. 💥 With this LocalBox release, we have also separated functions in LocalBox into the newly dedicated Azure.Arc.Jumpstart.LocalBox module. Both modules are now installed during provisioning and leveraged in automation scripts. While these modules are targeted for use in automation, it makes the scripts readable for those who want to understand the logic and potentially contribute with bugfixes or new functionality. What we’ve achieved: ✅ New repo structure for PowerShell modules ✅ CI/CD pipeline using GitHub Actions ✅ Cross-platform testing on Linux, macOS, Windows PowerShell 5.1 & 7 ✅ Published module to PowerShell Gallery ✅ Sampler module used to scaffold and streamline the module structure 🎯 This is a big step toward better reusability and scalability for PowerShell in Jumpstart scenarios. As we continue building out new use cases, having this modular foundation will keep things clean, maintainable, and extensible. Check out our SDK repository on GitHub and the modules on PowerShell Gallery. Other Quality of Life Improvements We appreciate the continued feedback from the Jumpstart community and have incorporated several smaller changes to make it easier for users to deploy LocalBox. These include, but are not limited to: Added configuration of start/shutdown settings for the nested VMs to make sure that they are shutdown properly and started in the correct order when the LocalBox Client VM is stopped and started. Moved the manual deployment steps to a separate page for clarity Added information about the Pester-tests in the Troubleshooting section, including how to open the log-file to see which tests have failed Added shortcut to Hyper-V Manager on the desktop in the LocalBox Client VM Getting started! The latest update to LocalBox not only focuses on new features but also on enhancing overall cost and deployment experience. We invite our community to explore these new features and take full advantage of the enhanced capabilities of LocalBox. Your feedback is invaluable to us, and we look forward to hearing about your experiences and insights as you navigate these new enhancements. Get started today by visiting aka.ms/JumpstartLocalBox!960Views4likes2CommentsPublic Preview: Deploy OSS Large Language Models with KAITO on AKS on Azure Local
Announcement Along with Kubernetes AI Toolchain Operator (KAITO) on AKS GA release, we are thrilled to announce Public Preview refresh for KAITO on AKS on Azure Local. Customers can now enable KAITO as a cluster extension on AKS enabled by Azure Arc as part of cluster creation or day 2 using Az CLI. The seamless enablement experience makes it easy to get started with LLM deployment and fully consistent with AKS in the cloud. We also invest heavily to reduce frictions in LLM deployment such as recommending the right GPU SKU, validating preset models with GPUs and avoiding Out of Memory errors, etc. KAITO Use Cases Many of our lighthouse customers are exploring exciting opportunities to build, deploy and run AI Apps at the edge. We’ve seen many interesting scenarios like Pipeline Leak detection, Shrinkage detection, Factory line optimization or GenAI Assistant across many industry verticals. All these scenarios need a local AI model with edge data to satisfy low latency or regulatory requirements. With one simple command, customers can quickly get started with LLM in the edge-located Kubernetes cluster, and ready to deploy OSS models with OpenAI-compatible endpoints. Deploy & fine-tune LLM declaratively With KAITO extension, customers can author a simple YAML for inference workspace in Visual Studio Code or any text editor and deploy a variety of preset models ranging from Phi-4, Mistral, to Qwen with kubectl on any supported GPUs. In addition, customers can deploy any vLLM compatible text generation model from Hugging Face or even private weights models by following custom integration instructions. You can also customize base LLMs in the edge Kubernetes with Parameter Efficient Fine Tuning (PEFT) using qLoRA or LoRA method, just like the inference workspace deployment with YAML file. For more details, please visit the product documentation and KAITO Jumpstart Drops for more details. Compare and evaluate LLMs in AI Toolkit Customers can now use AI Toolkit, a popular extension in Visual Studio Code, to compare and evaluate LLMs whether it’s local or remote endpoint. With AI Toolkit playground and Bulk Run features, you can test and compare LLMs side by side and find out which model fits the best for your edge scenario. In addition, there are many built-in LLM Evaluators such as Coherence, Fluency, or Relevance that can be used to analyze model performance and generate numeric scores. For more details, please visit AI Toolkit Overview document. Monitor inference metrics in Managed Grafana The KAITO extension defaults to vLLM inference runtime. With vLLM runtime, customers can now monitor and visualize inference metrics with Azure Managed Prometheus and Azure Managed Grafana. Within a few configuration steps, e.g., enabling the extensions, labeling inference workspace, creating Service Monitor, the vLLM metrics will show up in Azure Monitor Workspace. To visualize them, customers can link the Grafana dashboard to Azure Monitor Workspace and view the metrics using the community dashboard. Please view product document and vLLM metric reference for more details. Get started today The landscape of LLM deployment and application is evolving at lightning speed - especially in the world of Kubernetes. With the KAITO extension, we're aiming to supercharge innovation around LLMs and streamline the journey from ideation to model endpoints to real-world impact. Dive into this blog as well as KAITO Jumpstart Drops to explore how KAITO can help you get up and running quickly on your own edge Kubernetes cluster. We’d love to hear your thoughts - drop your feedback or suggestions in the KAITO OSS Repo!836Views4likes2CommentsPublic Preview: Workload orchestration simplifying edge deployment at scale
Public Preview Announcement - workload orchestration Introduction: As enterprises continue to scale their edge infrastructure, IT teams face growing complexity in deploying, managing, and monitoring workloads across distributed environments. Today, we are excited to announce the Public Preview of workload orchestration — a purpose-built platform that redefines configuration and deployment management across enterprise environments. Workload orchestration is designed to help you centrally manage configurations for applications deployed in diverse locations (from factories and retail stores to restaurants and hospitals) while empowering on-site teams with flexibility. Modern enterprises increasingly deploy Kubernetes-based applications at the edge, where infrastructure diversity and operational constraints are the norm. Managing these with site-specific configurations traditionally requires creating and maintaining multiple variants of the same application for different sites – a process that is costly, error-prone, and hard to scale. Workload orchestration addresses this challenge by introducing a centralized, template-driven approach to configuration. With this platform, central IT can define application configurations once and reuse them across many deployments, ensuring consistency and compliance, while still allowing site owners to adjust parameters for their local needs within controlled guardrails. The result is a significantly simplified deployment experience that maintains both central governance and localized flexibility. Key features of workload orchestration The public preview release of workload orchestration includes several key innovations and capabilities designed to simplify how IT manages complex workload deployments: Powerful Template Framework & Schema Inheritance: Define application configurations and schemas one time and reuse or extend them for multiple deployments. Workload orchestration introduces a templating framework that lets central IT teams create a single source of truth for app configurations, which can then be inherited and customized by different sites as needed. This ensures consistency across deployments and streamlines the authoring process by eliminating duplicate work. Dependent Application Management: Manage and deploy interdependent applications seamlessly using orchestrated workflows. The platform supports configuring and deploying apps with dependencies via a guided CLI or an intuitive portal experience, reducing deployment friction and minimizing errors when rolling out complex, multi-tier applications. Custom Validation Rules: Ensure every configuration is right before it’s applied. Administrators can define pre-deployment validation expressions (rules) that automatically check parameter inputs and settings. This means that when site owners customize configurations, all inputs are validated against predefined rules to prevent misconfigurations, helping to reduce rollout failures. External Validation Rules: External validation enables you to verify the solution template through an external service, such as an Azure Function or a webhook. The external validation service receives events from the workload orchestration service and can execute custom validation logic. This design pattern is commonly used when customers require complex validation rules that exceed data type and expression-based checks. It allows the implementation of business-specific validation logic, thereby minimizing runtime errors. Integrated Monitoring & Unified Control: Track and manage deployments from a single pane of glass. Workload orchestration includes an integrated monitoring dashboard that provides near real-time visibility into deployment progress and the health of orchestrated workloads. From this centralized interface, you can pause, retry, or roll back deployments as needed, with full logging and compliance visibility for all actions. Enhanced Authoring Experience (No-Code UI with RBAC): We’ve built a web-based orchestration portal that offers a no-code configuration authoring experience. Configuration managers can easily define or update application settings via an intuitive UI – comparing previous configuration revisions side by side, copying values between versions, and defining hierarchical parameters with just a few clicks. This portal is secured with role-based access control (RBAC) and full audit logging, so non-developers and local operators can safely make approved adjustments without risking security or compliance. CLI and Automation Support: For IT admins and DevOps engineers, workload orchestration provides a command-line interface (CLI) optimized for automation. This enables scripted deployments and environment bootstrapping. Power users can integrate the orchestration into CI/CD pipelines or use it to programmatically manage application lifecycles across sites, using familiar CLI commands to deploy or update configurations in bulk. Fast Onboarding and Setup: Getting started with orchestrating your edge environments is quick. The platform offers guided setup workflows to configure your organizational hierarchy of edge sites, define user roles, and set up access policies in minutes. This means you can onboard your team and prepare your edge infrastructure for orchestration without lengthy configuration processes. Architecture & Workflow: Workload orchestration is a service built with cloud and edge components. At a high level, the cloud control plane of workload orchestration provides customers and opportunity to use a dedicated resource provider to define templates centrally which WO edge agents consume and contextualize based on required customization needed at edge locations. The overall object model is embedded in Azure Resource Manager thus providing customers fine grained RBAC (Role Based Access Control) for all workload orchestration resources. The key actions to manage WO are governed by an intuitive CLI and portal experience. There is also a simplified no code experience for non-technical onsite staff for authoring, monitoring and deploying solution with contextualized configurations. Important Details & Limitations: Preview Scope: During public preview, workload orchestration supports Kubernetes-based workloads at the edge (e.g., AKS edge deployments or Arc-enabled Kubernetes clusters). Support for other types of workloads or cloud VMs is coming soon. Regions and Availability: The service is available in East US and East US2 regions during preview. Integration Requirements: Using workload orchestration with your edge Kubernetes clusters require them to be connected (e.g., via Azure Arc) for full functionality. Getting Started with workload orchestration Availability: Workload orchestration is available in public preview starting 19 th May, 2025. For access to public preview, please complete the form to get access for your subscription or share your subscription details over email at configmanager@service.microsoft.com. Once you have shared the details, the team will get back to you with an update on your request! Try it Out: We encourage you to try workload orchestration with one of your real-world scenarios. A great way to start is to pick a small application that you typically deploy to a few edge sites and use the orchestration to deploy it. Create a template for that app, define a couple of parameters (like a site name or a configuration toggle), and run a deployment to two or three test sites. This hands-on trial will let you experience first-hand how the process works and the value it provides. As you grow more comfortable, you can expand to more sites or more complex applications. Because this is a preview, feel free to experiment — you can deploy to non-production clusters or test environments to see how the orchestration fits your workflow. Feedback and Engagement We’d love to hear your feedback! As you try out workload orchestration, please share your experiences, questions, and suggestions. You can leave a comment below this blog post – our team will be actively monitoring and responding to comments throughout the preview. Let us know what worked well, what could be improved, and any features you’d love to see in the future. Your insights are incredibly valuable to us and will help shape the product as we progress toward General Availability. If you encounter any issues or have urgent feedback, you can also engage with us through the following channels: Email at configmanager@service.microsoft.com or fill up the form at WOfeedback for feedback Email at configmanager@service.microsoft.com or fill up the form at WOReportIssuees for reporting issues Contact your Microsoft account representative or support channel and mention “workload orchestration Public Preview” – they can route your feedback to us as well. Occasionally, we may reach out to select preview customers for deeper feedback sessions or to participate in user research. If you’re interested in that, please mention it in your comment or forum post. We truly consider our preview users as co-creators of the product. Many of the features and improvements in workload orchestration have been influenced by early customer input. So, thank you in advance for sharing your thoughts and helping us ensure that this platform meets your needs! (Reminder: Since this is a public preview, it is not meant for production use yet. If you do decide to use it in a production scenario, do so with caution and be aware of the preview limitations. We will do our best to assist with any issues during preview). Learn More To help you get started and dive deeper into workload orchestration, we’ve prepared a set of resources: Workload orchestration Documentation – Overview and how-to guides: Learn about the architecture, concepts, and step-by-step instructions for using workload orchestration in our official docs. [WO documentation] Quick Start: Deploy Your First Application – Tutorial: Follow a guided tutorial to create a template and deploy a sample application to a simulated edge cluster using workload orchestration. [Quickstart] CLI Reference – Command reference: Detailed documentation of all workload orchestration CLI commands with examples. [CLI reference] Conclusion: We’re thrilled for you to explore workload orchestration and see how it can transform your edge deployment strategy. This public preview is a major step towards simplifying distributed workload management, and your participation and feedback are key to its success.917Views2likes0CommentsUnlocking AI Apps Across Boundaries with Azure
As we open the doors to Microsoft Build 2025, I’m thrilled to share the newest releases in our effort to enable teams to more rapidly develop and scale applications across boundaries: app development tools for Kubernetes (public preview), Kubernetes AI Toolchain Orchestrator [KAITO] (public preview), Foundry Local (public preview), workload orchestration (public preview) and Retrieval-Augmented Generation (RAG) capabilities on Azure Local (public preview). With our adaptive cloud approach, we offer a unified set of capabilities to enable your AI applications—whether they’re deployed to the public cloud, in hybrid environments, or at distributed edge locations. These capabilities include tools developers use every day, such as Visual Studio Code, to help build AI applications faster, better, and with greater security and resilience than ever before. Microsoft's Adaptive cloud approach to more rapidly developing and scaling applications across boundaries These new additions complement existing capabilities from Azure Arc for Kubernetes and Azure Kubernetes Service (AKS) enabled by Azure Arc, that support the hosting of containerized workloads, now with key capabilities designed to help expedite the creation of AI applications from model selection to edge-ready cluster provisioning (with GPU nodes), automated model deployment, lifecycle management and more. By combining KAITO with Azure Arc and Foundry Local in your workflow, Microsoft provides you with a more unified, flexible platform for building and running intelligent applications across boundaries. Learn more about our Arc-enabled AI story here. To help accelerate your adoption of cloud-native capabilities in distributed environments, Kubernetes-based app development tools extend essential services—such as container storage and secrets synchronization—to edge-located clusters. And we plan to expand this set of services in the future. This integration simplifies the deployment and management of applications across hybrid and multi-cloud environments. By unifying infrastructure and application lifecycle management, it empowers teams to move faster while maintaining consistency, security, and visibility. More details on each of these releases below. Here’s a glimpse of what they can mean for you, your workflow and your company. Many of these services are already making a difference for application teams at customers like Domino’s, Coles, Chevron, and Dick’s Sporting Goods. Providing them with greater speed and agility, as they build the solutions their customers and teams need. As customers continue to modernize their applications across hybrid, multi-cloud and distributed environments, many rely on trusted solutions from independent software vendors (ISVs). This is designed to help accelerate this journey—enabling partners to build, validate, and publish Arc-enabled Kubernetes applications directly to the Azure Marketplace. Building on the momentum from our initial launch at last year's Ignite, I'm excited to introduce a new wave of partner solutions to the Azure Arc ISV Partner Program. This latest expansion brings not only new partners, but also entirely new solution categories to the Azure Marketplace—including Security, Networking & Service Mesh, API Infrastructure & Management, and Monitoring & Observability. With just a few clicks, customers can now deploy enterprise-grade tools like HashiCorp Vault Enterprise, Istio by Solo.io, Traefik’s API stack, and Dynatrace Operator directly onto their Arc-enabled Kubernetes clusters. These additions to the Azure Arc ISV Partner Program reflect our commitment to supporting the full spectrum of cloud-native application needs. Explore the growing ecosystem of Arc-enabled solutions in the Azure Marketplace. RELEASES Here’s a recap of some of our newest feature releases that support our Adaptive cloud approach. App development tools for Kubernetes | Public Preview Kubernetes clusters enabled by Azure Arc helps power our adaptive cloud strategy. We are extending a set of fundamental, services that are fully validated, managed and deployed by Arc. The initial set of these services includes Azure Container Storage enabled by Azure Arc and Azure Key Vault . In the future we will be expanding and adding more of these foundational services. In addition, a Visual Studio Code extension is available for developers to kick start Kubernetes application development and turn their Kubernetes apps into Arc-enabled applications. This toolkit provides code samples and an environment to build, test and deploy Kubernetes applications. Figure 1: app development tools for Kubernetes in Azure Arc Retrieval-Augmented Generation (RAG) capabilities on Azure Local | Public Preview Edge RAG on Azure Local is a is a turnkey service, Azure Arc-enabled solution that brings Retrieval-Augmented Generation (RAG) capabilities to on-premises environments. It can help customers to build, evaluate, and deploy generative AI applications—like custom chat assistants—directly on their local data, without sending it to the cloud. This release is especially valuable for industries like manufacturing and healthcare, where data sovereignty, low latency, and IP protection are important. By supporting customer local deployment of language models, more secure data ingestion, and built-in tools for prompt engineering and evaluation, these capabilities help empower organizations to unlock AI insights while maintaining more control over their data. KAITO extension for AKS on Azure Local | Public Preview Kubernetes AI Toolchain Operator (KAITO) enabled by Azure Arc is designed to help simplify and scale AI model deployment across hybrid and edge environments. It enables developers to declaratively deploy AI models—whether from Microsoft’s AI Foundry, third-party hubs like Hugging Face, or customer-provided sources—on Arc-enabled Kubernetes clusters. It helps customers bring cloud-native AI capabilities to the edge, enable low-latency inference, more consistent lifecycle management, and operational control across diverse infrastructure. Try it out today using the “KAITO & AKS Arc” Jumpstart Drop! Figure 2: Deploy AI models on AKS in hybrid and edge environments using KAITO Workload orchestration | Public Preview Workload orchestration provides a centralized, template-driven platform for managing application configurations across distributed edge environments. It enables IT teams to define reusable templates, manage interdependent applications, and enforce custom validation rules—both built-in and external. It also includes integrated monitoring, a no-code authoring portal with RBAC, and CLI support for automation and CI/CD integration. Workload orchestration simplifies complex edge deployments by unifying configuration management and governance, empowering teams to scale faster with consistency, security, and flexibility. Foundry Local | Public Preview Foundry Local is the high-performance local AI runtime stack that helps bring Azure AI Foundry’s power to client devices. It includes CLI, SDK, and a local REST API for model inference, and integrates with the Azure AI Foundry catalog for model access and deployment. It can help provide performance optimizations for Windows and Apple Silicon, and the SDK enables code portability between local and cloud environments. Foundry Local, now available in preview on Windows and macOS, enables the creation and deployment of cross-platform AI applications that help operate models, tools, and agents directly on-device. This eliminates reliance on cloud connectivity and offers more enhanced control and flexibility. FIND US AT BUILD Breakout session (BRK188): Build and Scale your AI apps with Kubernetes and Azure Arc Breakout session (BRK183): Improving App Health with Health Modeling and Chaos Engineering Breakout session (BRK 195): Inside Azure innovations with Mark Russinovich Breakout session (BRK 168): AI and Agent Observability in Azure AI Foundry and Azure Monitor You can also come talk to us about building, deploying and managing applications for the Adaptive cloud at the Expert Meet Up Area. Whether you are tuning in online or joining us in person, I wish you a fun and exciting Build 2025!!1.2KViews1like0CommentsTransforming On-Premises Data with RAG Capabilities on Azure Local
Authored by Sanjana Mohan, Carmel Zolkov, and Moran Assaf, Edge RAG Product Management During Ignite 2024, we explored how Azure’s adaptive cloud approach is reshaping the AI landscape—enabling organizations to build, deploy, and scale AI solutions across hybrid and multicloud environments with consistency and control. That foundation is now evolving with a powerful new capability: Retrieval-Augmented Generation (RAG). RAG represents a pivotal shift in how enterprises can ground generative AI in their own data. By combining the reasoning power of large language models (LLMs) with real-time access to enterprise content, RAG enables more accurate, context-aware, and trustworthy responses. This is especially critical in hybrid environments where data is distributed across on-premises systems, edge locations, and multiple clouds. What is RAG? Retrieval-Augmented Generation (RAG) is a technique in AI that enhances the performance of language models by combining two steps: Retrieve: The model first fetches relevant information from external sources (e.g., documents, databases, or vector indexes). Generate: It then uses this retrieved content to generate more accurate, grounded, and context-aware responses. This approach helps reduce hallucinations, improves factual accuracy, and allows models to work with up-to-date or domain-specific data without retraining We’re excited to further expand RAG capabilities on Azure Local and enable customers to: Ground AI in their own data—whether stored in Azure, on-premises, or across multicloud environments—without needing to move or duplicate it. Maintain data sovereignty and compliance by keeping sensitive data within jurisdictional boundaries while still enabling AI to reason over it. Accelerate time to insight by integrating RAG into existing applications and workflows using Azure Arc. This evolution is part of our broader vision to make Azure the most open, extensible, and intelligent cloud for AI innovation—where your data, wherever it lives, becomes a strategic asset for transformation. RAG on Azure Local Customers can bring their private cloud data to language models to build generative AI applications and create a retrieval system for RAG-based applications. The capability is available as a first-party extension from Azure Arc for Kubernetes, packaging the end-to-end data ingestion and retrieval pipeline. It also includes essential developer features like prompt engineering, evaluation, and monitoring through a local developer portal. Image 1: The chat interface includes options to control the inference model and several parameters, as well as the system prompt that can be adjusted for the specific use case. The RAG capabilities on Azure Local enable organizations to bring Generative AI to their on-premises data, eliminating the necessity of transmitting any information to the cloud. This No-Code/Low-Code experience provides an intuitive interface, allowing users to deploy and manage AI models without the need for extensive programming skills while addressing several critical concerns: Data Privacy and Compliance: Maintains proprietary data on-premises, ensuring adherence to data protection regulations and internal policies. Reduced Latency: Processes data locally, resulting in faster response times essential for real-time applications. Bandwidth Efficiency: Eliminates the requirement to transfer large datasets to the cloud, conserving network resources. Scalability and Flexibility: Utilizes Azure Arc to manage and scale Kubernetes clusters seamlessly across diverse environments. Discovering the Advanced Capabilities of RAG on Azure Local Support for Hybrid Search, and soon Lazy Graph RAG, allowing robust, fast, low-cost indexing and providing quality and relevant answers regardless of query type. Evaluation flows: includes built-in evaluation features to assess the quality and performance of the RAG system. These features support multiple experimentation flows, allowing for concurrent experimentation and evaluation. Multi-Modality: supports multi-modal RAG, which includes handling images, documents, and soon videos. It uses the best parsers available for each media type, focusing on unstructured data hosted on Network File System (NFS) shares. This capability allows for comprehensive data analysis across different formats. Support for multiple languages: 100+ common languages for document ingestion and question-answer sessions Language Models Updates: ensures that language models are kept up to date with each extension update. This means that users will always have access to the latest advancements in language model technology, ensuring optimal performance and accuracy. Managed Responsible AI: ensures features to manage security and regulatory compliance, reducing the burden on developers. It ensures content safety and responsible AI practices are followed, helping developers navigate the complexities of regulatory requirements and maintain high standards of security. Image 2: The capability includes built-in evaluation feature, reducing the operational overhead of building and maintaining custom RAG solutions, based on Phi-4-Multi-Modal. Key Use Cases and Scenarios Deploying RAG at the edge on AKS clusters running on Azure Local empowers organizations to leverage generative AI capabilities while maintaining data sovereignty, ensuring compliance, and reducing latency. Here are some key use cases and scenarios: Financial Services A financial institution can utilize it to process and analyze sensitive data that must remain on-premises due to regulatory constraints. This enables use cases such as: Compliance Checks: Automating the review of transactions and documents to ensure they meet regulatory requirements. Customer Assistance: Providing personalized support and recommendations to customers based on their financial data. Sales Pitch Generation: Creating tailored sales pitches and marketing materials by analyzing customer data and preferences. Manufacturing A manufacturing company can deploy it to enhance operations and support factory floor activities. Key use cases include: Issue Resolution: Reducing the time to resolve issues by providing real-time troubleshooting assistance using local data. Operational Efficiency: Analyzing production data to optimize processes and improve efficiency. Predictive Maintenance: Using historical data to predict equipment failures and schedule maintenance proactively. Public Sector Public sectors can leverage it to derive insights from sensitive on-premises data, enabling various applications such as: Decision Making: Summarizing large datasets to provide actionable insights for quicker decision-making. Training and Education: Creating training materials and educational content by analyzing and summarizing relevant data. Public Safety: Enhancing public safety measures by analyzing local data to identify patterns and predict potential threats. Healthcare Healthcare providers can benefit from deploying it to manage and analyze patient data securely. Use cases include: Patient Care: Providing personalized treatment plans and recommendations based on patient data. Medical Research: Analyzing clinical data to support medical research and development. Operational Management: Improving hospital operations by analyzing data related to patient flow, resource utilization, and more. Retail Retail businesses can use it to enhance customer experiences and optimize operations. Key scenarios include: Personalized Marketing: Creating personalized marketing campaigns based on customer purchase history and preferences. Inventory Management: Analyzing sales data to optimize inventory levels and reduce stockouts. Customer Insights: Gaining insights into customer behavior and preferences to improve product offerings and services. By deploying RAG on Azure Local, organizations across various industries can harness the power of generative AI while ensuring data remains secure and compliant with local regulations. Resources Read this blog post about AI workloads running on Azure Local Learn more and see how you can get started here.1.3KViews3likes0Comments