Ignite 2024 is here, and nothing is more top of mind for customers than the potential to transform their businesses with AI wherever they operate. Today, we are excited to announce the preview of two new Arc-enabled services that extend the power of Azure’s AI platform to on-premises and edge environments.
Sign up to join the previews here!
An adaptive cloud approach to AI
The goal of Azure’s adaptive cloud approach is to extend just enough Azure to customers’ distributed environments. For many of these customers, valuable data is generated and stored locally, outside of the hyperscale cloud, whether due to regulation, latency, business continuity, or simply the large volume of data being generated in real time.
AI inferencing can only occur where the data exists. So, while the cloud has become the environment of choice for training models, we see a tremendous need to extend inferencing services beyond the cloud to enable complete cloud-to-edge AI scenarios.
Search on-premises data with generative AI
Over the past couple of years, generative AI has come to the forefront of AI innovation. Language models give any user the ability to interact with large, complex data sets in natural language.
Public tools like ChatGPT are great for queries about general knowledge, but they can’t answer questions about private enterprise data on which they were not trained.
Retrieval-Augmented Generation (RAG) helps address this need by augmenting language models with private data at query time. Cloud services like Azure AI Search and Azure AI Foundry simplify how customers can use RAG to ground language models in their enterprise data.
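To make the pattern concrete, here is a minimal, illustrative sketch of the retrieve-then-generate flow at the heart of RAG. The documents and the keyword-overlap scorer are toy stand-ins; a production pipeline would use vector embeddings and a retrieval service such as Azure AI Search.

```python
# Toy RAG sketch: retrieve the most relevant private documents, then
# build a prompt that grounds the language model in that context.

documents = [
    "Q3 on-prem sales for the Contoso region grew 12% year over year.",
    "Factory floor sensor logs are retained locally for 90 days.",
    "Employee handbook: remote work requests go through your manager.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the question with retrieved context so the model answers
    from private data rather than only its training corpus."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long are sensor logs kept on the factory floor?"))
```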
Today, we are announcing the preview of a new service that brings generative AI and RAG to your data at the edge.
Within minutes, customers can deploy an Arc extension that contains everything needed to start asking questions about their on-premises data, including:
- Popular small and large language models running locally with support for both CPU and GPU hardware
- A turnkey data ingestion and RAG pipeline that keeps all data completely local, with RBAC controls to prevent unauthorized access
- An out-of-the-box prompt engineering and evaluation tool to find the best settings for a particular dataset
- Azure-consistent APIs to integrate into business applications (a hypothetical call is sketched below), as well as a pre-packaged UI to get started quickly
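As a rough illustration of what integrating such an API into an application could look like, here is a hypothetical sketch. The endpoint URL and payload schema are assumptions modeled on common chat-completion conventions, not the documented preview API; consult the preview documentation for the actual contract.

```python
# Hypothetical call to the extension's local, Azure-consistent endpoint.
# The address and request shape below are illustrative assumptions.
import requests

ENDPOINT = "https://rag.contoso.local/v1/chat/completions"  # assumed local address

payload = {
    "messages": [
        {"role": "user", "content": "Summarize last week's maintenance reports."}
    ],
    "max_tokens": 256,
}

resp = requests.post(ENDPOINT, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```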
This service is available now in gated private preview for customers running Azure Local infrastructure, and we plan to make it available on other Arc-enabled infrastructure platforms in the near future.
Sign up here!
Deploy curated open-source AI models via Azure Arc
Azure’s AI platform also provides a catalog of curated AI models that are ready to deploy and expose consistent inferencing endpoints that integrate directly into customer applications. This not only makes deployment easy, but also gives customers confidence that the models are secure and validated.
These same needs exist at the edge, which is why we are now making a set of curated models deployable directly from the Azure Portal. These models have been selected, packaged, and tested specifically for edge deployments, and are currently available on Azure Local infrastructure:
- Phi-3.5 Mini (3.8 billion parameter language model)
- Mistral 7B (7.3 billion parameter language model)
- MMDetection YOLO (object detection)
- OpenAI Whisper Large (speech to text)
- Google T5 Base (translation)
Models can be deployed from a familiar Azure Portal wizard to an Arc-enabled AKS cluster running on premises. All of the models available today can run on CPU alone; Phi-3.5 and Mistral 7B also have GPU versions available for better performance.
Once deployment completes, the model can be managed directly in Azure ML Studio, and an inferencing endpoint becomes available on your local network.
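As a hypothetical example, invoking that local endpoint from an application might look like the following. The URL, authentication scheme, and request schema here are illustrative assumptions; the actual contract is shown in the endpoint's consumption details in Azure ML Studio.

```python
# Illustrative call to a local inferencing endpoint for a deployed model
# (here, Phi-3.5 Mini). Address, auth header, and payload are assumed.
import requests

ENDPOINT = "https://phi35-endpoint.contoso.local/score"  # assumed local address
API_KEY = "<endpoint-key-from-azure-ml-studio>"

payload = {"input_data": {"input_string": ["What maintenance is due this week?"]}}

resp = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```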
Wrap up
Sign up now to join either preview at the link below, or stop by and visit us in person at the Azure Arc and Azure Local Expert Meet Up station in the Azure Infrastructure neighborhood at Ignite. We’re excited to get these new capabilities into our customers’ hands and to hear from you how it’s going.