Overview
Azure is the world’s AI supercomputer, providing the most comprehensive AI capabilities, ranging from infrastructure and platform services to frontier models. We’ve seen emerging needs among Azure customers to use the same Azure-based solutions for AI/ML on the edge, with minimal latency, while staying compliant with industry regulations or government requirements.
Azure Kubernetes Service enabled by Azure Arc (AKS Arc) is a managed Kubernetes service that empowers customers to deploy and manage containerized workloads, whether in data centers or at edge locations. We want to ensure AKS Arc provides an optimal experience for AI/ML workloads on the edge, throughout the whole development lifecycle: AI infrastructure, model deployment, inference, fine-tuning, and application.
AI infrastructure
AKS Arc supports NVIDIA A2, A16, and T4 GPUs for compute-intensive workloads such as machine learning, deep learning, and model training. When GPUs are enabled in Azure Local, AKS Arc customers can provision GPU node pools from Azure and host AI/ML workloads in Kubernetes clusters on the edge. For more details, please see the instructions in GPU Nodepool in AKS Arc.
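As a minimal sketch (assuming an existing AKS Arc cluster and GPU-enabled Azure Local machines), a GPU node pool can be added with the Azure CLI; the resource names and the VM size below are placeholders, and the valid GPU VM sizes depend on the GPUs in your environment:

```bash
# Add a GPU node pool to an existing AKS Arc cluster.
# Names and the VM size are illustrative placeholders; check the
# GPU node pool documentation for the sizes your hardware supports.
az aksarc nodepool add \
  --name gpunodepool \
  --cluster-name my-aksarc-cluster \
  --resource-group my-resource-group \
  --node-count 1 \
  --node-vm-size Standard_NC4_A2   # example A2-based GPU size
```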
Model deployment and fine-tuning
Use KAITO for language model deployment, inference, and fine-tuning
Kubernetes AI Toolchain Operator (KAITO) is an open-source operator that automates and simplifies the management of model deployments on a Kubernetes cluster. With KAITO, you can deploy popular open-source language models such as Phi-3 and Falcon and host them in the cloud or on the edge. In addition to the models KAITO currently supports, you can also onboard and deploy custom language models by following this guidance in just a few steps.
AKS Arc has been validated with the latest KAITO operator via Helm-based installation, and customers can now use KAITO on the edge to:
- Deploy language models such as Falcon, Phi-3, or their custom models
- Automate and optimize AI/ML model inferencing for cost-effective deployments
- Fine-tune a model directly in a Kubernetes cluster
- Perform parameter-efficient fine-tuning using low-rank adaptation (LoRA)
- Perform parameter-efficient fine-tuning using quantized low-rank adaptation (QLoRA)
You can get started by installing KAITO and deploying a model for inference on your edge GPU nodes with the KAITO Quickstart Guidance. You may also refer to the KAITO experience in AKS in the cloud: Deploy an AI model with the AI toolchain operator (Preview).
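As a minimal sketch of the deployment flow (assuming KAITO is already installed on the cluster), the workspace below provisions GPU capacity and serves the falcon-7b preset for inference; the instance type is a placeholder and should match a GPU VM size available in your environment:

```bash
# Apply a KAITO workspace that pairs GPU capacity with the
# falcon-7b inference preset. The instanceType is illustrative.
kubectl apply -f - <<EOF
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # placeholder GPU size
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"
EOF
```

Once the workspace reports ready, KAITO exposes a cluster service endpoint that you can query with plain HTTP inference requests.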
Use Arc-enabled Machine Learning to train and deploy models on the edge
For customers who are already familiar with Azure Machine Learning (AML), Azure Arc-enabled ML extends AML in Azure and enables customers to target any Arc-enabled Kubernetes cluster for model training, evaluation, and inferencing. With the Arc ML extension running in AKS Arc, customers can meet data-residency requirements by storing data on premises during model training, and then deploy models in the cloud for global service access. To get started with the Arc ML extension, please view the instructions in the Azure Machine Learning documentation.
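As a hedged sketch of the setup (cluster and resource group names are placeholders), the Arc ML extension is installed against the connected cluster with the az k8s-extension CLI; the configuration settings shown are the documented toggles for enabling training and inferencing:

```bash
# Install the Azure Machine Learning extension on an Arc-connected
# cluster, enabling both training and inference workloads.
az k8s-extension create \
  --name azureml-extension \
  --extension-type Microsoft.AzureML.Kubernetes \
  --cluster-type connectedClusters \
  --cluster-name my-arc-cluster \
  --resource-group my-resource-group \
  --scope cluster \
  --config enableTraining=True enableInference=True \
           inferenceRouterServiceType=LoadBalancer allowInsecureConnections=True
```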
In addition, the AML extension can now be used for fully automated deployment of a curated list of pre-validated language and traditional AI models to AKS clusters, to perform CPU- and GPU-based inferencing, and to subsequently manage those models via Azure ML Studio. This experience is currently in gated preview; please view another Ignite blog for more details.
Use Azure AI Services with disconnected containers on the edge
Azure AI services enable customers to rapidly create cutting-edge AI applications with out-of-the-box and customizable APIs and models. They simplify the developer experience, embedding the ability to see, hear, speak, search, understand, and accelerate decision-making into applications. With disconnected Azure AI services containers, customers can now download the containers to an offline environment such as AKS Arc and use the same APIs available from Azure. Containers enable you to run Azure AI services APIs in your own environment, which is ideal for specific security and data governance requirements, and disconnected containers let you use several of these APIs without any connection to the internet. Currently, the following containers can be run in this manner:
- Speech to text
- Custom Speech to text
- Neural Text to speech
- Text Translation (Standard)
- Azure AI Vision - Read
- Document Intelligence
- Azure AI Language
  - Sentiment Analysis
  - Key Phrase Extraction
  - Language Detection
  - Summarization
  - Named Entity Recognition
  - Personally Identifiable Information (PII) detection
To get started with disconnected containers, please view the instructions at Use Docker containers in disconnected environments.
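As a hedged illustration of that workflow (the endpoint, key, and host paths are placeholders), a disconnected container is first run once while connected to download its license, then run fully offline against the stored license, following the pattern in the linked article; the speech-to-text image is used here as the example:

```bash
# One-time, connected step: download the license for offline use.
# Billing endpoint, API key, and host paths are placeholders.
docker run --rm -it -p 5000:5000 \
  -v /host/license:/license \
  mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:latest \
  eula=accept billing=<ENDPOINT_URI> apikey=<API_KEY> \
  DownloadLicense=True Mounts:License=/license

# Disconnected step: run the container offline with the stored license.
docker run --rm -it -p 5000:5000 \
  -v /host/license:/license -v /host/output:/output \
  mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:latest \
  eula=accept \
  Mounts:License=/license Mounts:Output=/output
```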
Build and deploy data and machine learning pipelines with Flyte
Flyte is an open-source orchestrator that facilitates building production-grade data and ML pipelines. It is a Kubernetes-native workflow automation tool that lets customers focus on experimentation and delivering business value without needing to be experts in infrastructure and resource management. Data scientists and ML engineers can use Flyte to create data pipelines that process petabyte-scale data, build analytics workflows for business or finance, or use it as an ML pipeline for industry applications.
AKS Arc has been validated with the latest Flyte operator via Helm-based installation, and customers are welcome to use Flyte for building data or ML pipelines. For more information, please view the instructions in Introduction to Flyte and Build and deploy data and machine learning pipelines with Flyte on Azure Kubernetes Service (AKS).
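As a minimal sketch of the Helm-based installation mentioned above, the single-binary Flyte chart can be installed from the official repository; the release and namespace names are illustrative, and a production setup needs a values file with storage and database configuration:

```bash
# Install the single-binary Flyte distribution on the cluster.
helm repo add flyteorg https://flyteorg.github.io/flyte
helm repo update
helm install flyte-binary flyteorg/flyte-binary \
  --namespace flyte --create-namespace   # add -f values.yaml for real deployments
```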
AI-powered edge applications with cloud-connected control plane
Azure AI Video Indexer, enabled by Azure Arc
Azure AI Video Indexer enabled by Arc brings video and audio analysis and generative AI to edge devices. It runs as an Azure Arc extension on AKS Arc and supports MP4 and other common video formats. It also supports several languages in all basic audio-related models. The Phi-3 language model is included and automatically connected with your Video Indexer extension. With Arc-enabled VI, you can bring the AI to the content for cases where indexed content can’t move to the cloud due to regulation, or where the data store is too large. Other use cases include using an on-premises workflow to lower indexing latency, or pre-indexing before uploading to the cloud. You can find more details in What is Azure AI Video Indexer enabled by Arc (Preview).
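As a hedged sketch (the names are placeholders, and the exact extension type and required settings should be verified against the linked article), the Video Indexer extension is deployed with the same az k8s-extension flow used by other Arc extensions:

```bash
# Deploy the Video Indexer extension on an Arc-connected cluster.
az k8s-extension create \
  --name videoindexer \
  --extension-type Microsoft.videoindexer \
  --cluster-type connectedClusters \
  --cluster-name my-arc-cluster \
  --resource-group my-resource-group \
  --scope cluster \
  --release-namespace video-indexer
# Additional configuration settings (Video Indexer account ID, endpoint URI,
# storage class) are required; see the linked documentation for the full list.
```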
Search on-premises data with a language model via Arc extension
Retrieval Augmented Generation (RAG) is an emerging approach for augmenting language models with private data, which is especially important for enterprise use cases. Cloud services such as Azure AI Search and Azure AI Studio simplify how customers use RAG to ground language models in their enterprise data in the cloud. The same experience is coming to the edge: customers can now deploy an Arc extension and ask questions about on-premises data within a few clicks. Please note this experience is currently in gated preview; see another Ignite blog for more details.
Conclusion
Developing and running AI workloads at distributed edge locations brings clear benefits, such as using the cloud as a universal control plane, data residency, reduced network bandwidth, and low latency. We hope the products and features described above benefit your scenarios and enable new ones in retail, manufacturing, logistics, energy, and more. As Microsoft-managed Kubernetes on the edge, AKS Arc not only hosts critical edge applications but is also optimized for AI workloads, from hardware and runtime to application. Please share your valuable feedback with us at aksarcfeedback@microsoft.com; we would love to hear about your scenarios and business impact.