azure ai foundry

161 Topics

Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning
Today we are introducing Phi-4, our 14B parameter state-of-the-art small language model (SLM) that excels at complex reasoning in areas such as math, in addition to conventional language processing. Phi-4 is the latest member of our Phi family of small language models and demonstrates what’s possible as we continue to probe the boundaries of SLMs. Phi-4 is available on Azure AI Foundry and on Hugging Face. Phi-4 Benchmarks Phi-4 outperforms comparable and larger models on math related reasoning due to advancements throughout the processes, including the use of high-quality synthetic datasets, curation of high-quality organic data, and post-training innovations. Phi-4 continues to push the frontier of size vs quality. Phi-4 is particularly good at math problems, for example here are the benchmarks for Phi-4 on math competition problems: Phi-4 performance on math competition problems To see more benchmarks read the newest technical paper released on arxiv. Enabling AI innovation safely and responsibly Building AI solutions responsibly is at the core of AI development at Microsoft. We have made our robust responsible AI capabilities available to customers building with Phi models, including Phi-3.5-mini optimized for Windows Copilot+ PCs. Azure AI Foundry provides users with a robust set of capabilities to help organizations measure, mitigate, and manage AI risks across the AI development lifecycle for traditional machine learning and generative AI applications. Azure AI evaluations in AI Foundry enable developers to iteratively assess the quality and safety of models and applications using built-in and custom metrics to inform mitigations. Additionally, Phi users can use Azure AI Content Safety features such as prompt shields, protected material detection, and groundedness detection. These capabilities can be leveraged as content filters with any language model included in our model catalog and developers can integrate these capabilities into their application easily through a single API. Once in production, developers can monitor their application for quality and safety, adversarial prompt attacks, and data integrity, making timely interventions with the help of real-time alerts. Phi-4 in action One example of the mathematical reasoning Phi-4 is capable of is demonstrated in this problem. Start Exploring Phi-4 is currently available on Azure AI Foundry and Hugging Face, take a look today.
ecekamar
Dec 13, 2024 Place Microsoft Foundry Blog
235KViews
20likes
22Comments
Integrate Custom Azure AI Agents with CoPilot Studio and M365 CoPilot
Integrating Custom Agents with Copilot Studio and M365 Copilot In today's fast-paced digital world, integrating custom agents with Copilot Studio and M365 Copilot can significantly enhance your company's digital presence and extend your CoPilot platform to your enterprise applications and data. This blog will guide you through the integration steps of bringing your custom Azure AI Agent Service within an Azure Function App, into a Copilot Studio solution and publishing it to M365 and Teams Applications. When Might This Be Necessary: Integrating custom agents with Copilot Studio and M365 Copilot is necessary when you want to extend customization to automate tasks, streamline processes, and provide better user experience for your end-users. This integration is particularly useful for organizations looking to streamline their AI Platform, extend out-of-the-box functionality, and leverage existing enterprise data and applications to optimize their operations. Custom agents built on Azure allow you to achieve greater customization and flexibility than using Copilot Studio agents alone. What You Will Need: To get started, you will need the following: Azure AI Foundry Azure OpenAI Service Copilot Studio Developer License Microsoft Teams Enterprise License M365 Copilot License Steps to Integrate Custom Agents: Create a Project in Azure AI Foundry: Navigate to Azure AI Foundry and create a project. Select 'Agents' from the 'Build and Customize' menu pane on the left side of the screen and click the blue button to create a new agent. Customize Your Agent: Your agent will automatically be assigned an Agent ID. Give your agent a name and assign the model your agent will use. Customize your agent with instructions: Add your knowledge source: You can connect to Azure AI Search, load files directly to your agent, link to Microsoft Fabric, or connect to third-party sources like Tripadvisor. In our example, we are only testing the CoPilot integration steps of the AI Agent, so we did not build out additional options of providing grounding knowledge or function calling here. Test Your Agent: Once you have created your agent, test it in the playground. If you are happy with it, you are ready to call the agent in an Azure Function. Create and Publish an Azure Function: Use the sample function code from the GitHub repository to call the Azure AI Project and Agent. Publish your Azure Function to make it available for integration. azure-ai-foundry-agent/function_app.py at main · azure-data-ai-hub/azure-ai-foundry-agent Connect your AI Agent to your Function: update the "AIProjectConnString" value to include your Project connection string from the project overview page of in the AI Foundry. Role Based Access Controls: We have to add a role for the function app on OpenAI service. Role-based access control for Azure OpenAI - Azure AI services | Microsoft Learn Enable Managed Identity on the Function App Grant "Cognitive Services OpenAI Contributor" role to the System-assigned managed identity to the Function App in the Azure OpenAI resource Grant "Azure AI Developer" role to the System-assigned managed identity for your Function App in the Azure AI Project resource from the AI Foundry Build a Flow in Power Platform: Before you begin, make sure you are working in the same environment you will use to create your CoPilot Studio agent. To get started, navigate to the Power Platform (https://make.powerapps.com) to build out a flow that connects your Copilot Studio solution to your Azure Function App. When creating a new flow, select 'Build an instant cloud flow' and trigger the flow using 'Run a flow from Copilot'. Add an HTTP action to call the Function using the URL and pass the message prompt from the end user with your URL. The output of your function is plain text, so you can pass the response from your Azure AI Agent directly to your Copilot Studio solution. Create Your Copilot Studio Agent: Navigate to Microsoft Copilot Studio and select 'Agents', then 'New Agent'. Make sure you are in the same environment you used to create your cloud flow. Now select ‘Create’ button at the top of the screen From the top menu, navigate to ‘Topics’ and ‘System’. We will open up the ‘Conversation boosting’ topic. When you first open the Conversation boosting topic, you will see a template of connected nodes. Delete all but the initial ‘Trigger’ node. Now we will rebuild the conversation boosting agent to call the Flow you built in the previous step. Select 'Add an Action' and then select the option for existing Power Automate flow. Pass the response from your Custom Agent to the end user and end the current topic. My existing Cloud Flow: Add action to connect to existing Cloud Flow: When this menu pops up, you should see the option to Run the flow you created. Here, mine does not have a very unique name, but you see my flow 'Run a flow from Copilot' as a Basic action menu item. If you do not see your cloud flow here add the flow to the default solution in the environment. Go to Solutions > select the All pill > Default Solution > then add the Cloud Flow you created to the solution. Then go back to Copilot Studio, refresh and the flow will be listed there. Now complete building out the conversation boosting topic: Make Agent Available in M365 Copilot: Navigate to the 'Channels' menu and select 'Teams + Microsoft 365'. Be sure to select the box to 'Make agent available in M365 Copilot'. Save and re-publish your Copilot Agent. It may take up to 24 hours for the Copilot Agent to appear in M365 Teams agents list. Once it has loaded, select the 'Get Agents' option from the side menu of Copilot and pin your Copilot Studio Agent to your featured agent list Now, you can chat with your custom Azure AI Agent, directly from M365 Copilot! Conclusion: By following these steps, you can successfully integrate custom Azure AI Agents with Copilot Studio and M365 Copilot, enhancing you’re the utility of your existing platform and improving operational efficiency. This integration allows you to automate tasks, streamline processes, and provide better user experience for your end-users. Give it a try! Curious of how to bring custom models from your AI Foundry to your CoPilot Studio solutions? Check out this blog
hannahabbott
Apr 16, 2025 Place Microsoft Foundry Blog
18KViews
3likes
11Comments
Unlocking Document Intelligence: Mistral OCR Now Available in Azure AI Foundry
Every organization has a treasure trove of information—buried not in databases, but in documents. From scanned contracts and handwritten forms to research papers and regulatory filings, this knowledge often sits locked in static formats, invisible to modern AI systems. Imagine if we could teach machines not just to read, but to truly understand the structure and nuance of these documents. What if equations, images, tables, and multilingual text could be seamlessly extracted, indexed, and acted upon—at scale? That future is here. Today we are announcing the launch of Mistral OCR in the Azure AI Foundry model catalog—a state-of-the-art Optical Character Recognition (OCR) model that brings intelligent document understanding to a whole new level. Designed for speed, precision, and multilingual versatility, Mistral OCR unlocks the potential of unstructured content with unmatched performance. From Patient Charts to Investment Reports—Built for Every Industry Mistral OCR’s ability to extract structure from complex documents makes it transformative across a range of verticals: Healthcare Hospitals and health systems can digitize clinical notes, lab results, and patient intake forms, transforming scanned content into structured data for downstream AI applications—improving care coordination, automation, and insights. Finance & Insurance From loan applications and KYC documents to claims forms and regulatory disclosures, Mistral OCR helps financial institutions process sensitive documents faster, more accurately, and with multilingual support—ensuring compliance and improving operational efficiency. Education & Research Academic institutions and research teams can turn PDFs of scientific papers, course materials, and diagrams into AI-readable formats. Mistral OCR’s support for equations, charts, and LaTeX-style formatting makes it ideal for scientific knowledge extraction. Legal & Government With its multilingual and high-fidelity OCR capabilities, legal teams and public agencies can digitize contracts, historical records, and filings—accelerating review workflows, preserving archival materials, and enabling transparent governance. Key Highlights of Mistral OCR According to Mistral their OCR model stands apart due to the following: State-of-the-Art Document Understanding Mistral OCR excels in parsing complex, multimodal documents—extracting tables, math, and figures with markdown-style clarity. It goes beyond recognition to deliver understanding. benchmark testing. Whether you’re working in Hindi, Arabic, French, or Chinese—this model adapts seamlessly. State-of-the-Art Document Understanding Mistral OCR excels in parsing complex, multimodal documents—extracting tables, math, and figures with markdown-style clarity. It goes beyond recognition to deliver understanding. Multilingual by Design With support for dozens of languages and scripts, Mistral OCR achieves 99%+ fuzzy match scores in benchmark testing. Whether you’re working in Hindi, Arabic, French, or Chinese—this model adapts seamlessly. Fastest in Its Class Process up to 2,000 pages per minute on a single node. This speed makes it ideal for enterprise document pipelines and real-time applications. Doc-as-Prompt + Structured Output Turn documents into intelligent prompts—then extract structured, JSON-formatted output for downstream use in agents, workflows, or analytics engines. Why use Mistral OCR on Azure AI Foundry? Mistral OCR is now available as serverless APIs through Models as a Service (MaaS) in Azure AI Foundry. This enables enterprise-scale workloads with ease. Network Isolation for Inferencing: Protect your data from public network access. Expanded Regional Availability: Access from multiple regions. Data Privacy and Security: Robust measures to ensure data protection. Quick Endpoint Provisioning: Set up an OCR endpoint in Azure AI Foundry in seconds. Azure AI ensures seamless integration, enhanced security, and rapid deployment for your AI needs. How to deploy Mistral OCR model in Azure AI Foundry? Prerequisites: If you don’t have an Azure subscription, get one here: https://azure.microsoft.com/en-us/pricing/purchase-options/pay-as-you-go Familiarize yourself with Azure AI Model Catalog Create an Azure AI Foundry hub and project. Make sure you pick East US, West US3, South Central US, West US, North Central US, East US 2 or Sweden Central as the Azure region for the hub. Create a deployment to obtain the inference API and key: Open the model card in the model catalog on Azure AI Foundry. Click on Deploy and select the Pay-as-you-go option. Subscribe to the Marketplace offer and deploy. You can also review the API pricing at this step. You should land on the deployment page that shows you the API and key in less than a minute. These steps are outlined in detail in the product documentation. From Documents to Decisions The ability to extract meaning from documents—accurately, at scale, and across languages—is no longer a bottleneck. With Mistral OCR now available in Azure AI Foundry, organizations can move beyond basic text extraction to unlock true document intelligence. This isn’t just about reading documents. It’s about transforming how we interact with the knowledge they contain. Try it. Build with it. And see what becomes possible when documents speak your language.
Naomi Moneypenny
Apr 09, 2025 Place Microsoft Foundry Blog
15KViews
2likes
8Comments
Step-by-Step Tutorial: Building an AI Agent Using Azure AI Foundry
This blog post provides a comprehensive tutorial on building an AI agent using Azure AI Agent service and the Azure AI Foundry portal. AI agents represent a powerful new paradigm in application development, offering a more intuitive and dynamic way to interact with software. They can understand natural language, reason about user requests, and take actions to fulfill those requests. This tutorial will guide you through the process of creating and deploying an intelligent agent on Azure. We'll cover setting up an Azure AI Foundry hub, crafting effective instructions to define the agent's behavior, including recognizing user intent, processing requests, and generating helpful responses. We'll also discuss testing the agent's conversational abilities and provide additional resources for expanding your knowledge of AI agents and the Azure AI ecosystem. This hands-on guide is perfect for anyone looking to explore the practical application of Azure's conversational AI capabilities and build intelligent virtual assistants. Join us as we dive into the exciting world of AI agents.
ShivamGoyal03
Feb 27, 2025 Place Educator Developer Blog
14KViews
2likes
2Comments
The Future of AI: Computer Use Agents Have Arrived
Discover the groundbreaking advancements in AI with Computer Use Agents (CUAs). In this blog, Marco Casalaina shares how to use the Responses API from Azure OpenAI Service, showcasing how CUAs can launch apps, navigate websites, and reason through tasks. Learn how CUAs utilize multimodal models for computer vision and AI frameworks to enhance automation. Explore the differences between CUAs and traditional Robotic Process Automation (RPA), and understand how CUAs can complement RPA systems. Dive into the future of automation and see how CUAs are set to revolutionize the way we interact with technology.
Marco_Casalaina
Apr 07, 2025 Place Microsoft Foundry Blog
11KViews
6likes
0Comments
Deepening our Partnership with Mistral AI on Azure AI Foundry
We’re excited to mark a new chapter in our collaboration with Mistral AI, a leading European AI innovator, with the launch of Mistral Document AI in Azure AI Foundry Models. This marks the first in a series of Mistral models coming to Azure as a serverless API, giving customers seamless access to Mistral’s cutting-edge capabilities, fully hosted, managed, and integrated into the Foundry ecosystem. This launch also deepens our support for sovereign cloud customers —especially in Europe. At Microsoft, we believe Sovereign AI is essential for enabling organizations and regulated industries to harness the full potential of AI while maintaining control over their security, data, and governance. As Satya Nadella has said, “We want every country, every organization, to build AI in a way that respects their sovereignty—of data, of applications, and of infrastructure.” By combining Mistral’s state-of-the-art models with Azure’s enterprise-grade reliability and scale we’re enabling customers to confidently deploy AI that meets strict regulatory and data sovereignty requirements. Mistral Document AI By the Mistral AI Team “Enterprises today are overwhelmed with documents—contracts, forms, research papers, invoices—holding critical information that’s often trapped in scanned images and PDFs. With nearly 90% of enterprise data stored in unstructured formats, traditional OCR simply can’t keep up. Mistral Document AI is built with a multimodal approach that combines vision and language understanding, it interprets documents with contextual intelligence and delivers structured outputs that reflect the original layout—tables remain tables, headings remain headings, and images are preserved alongside the text.” Key Capabilities Document Parsing: Mistral Document AI interprets complex layouts and extracts rich structures such as tables, charts, and LaTeX-formatted equations with markdown-style clarity. Multilingual & Multimodal: The model supports dozens of languages and understands both text and visual elements, making it well-suited for global, diverse datasets. Structured Output & Doc-as-Prompt: Mistral Document AI delivers results in structured formats like JSON, enabling easy downstream integration with databases or AI agents. This supports use cases like Retrieval-Augmented Generation (RAG), where document content becomes a prompt for subsequent queries. Use Cases Document Digitization: Process archives of scanned PDFs or handwritten forms into structured digital records. Knowledge Extraction: Transform research papers, technical manuals, or customer guides into machine-readable formats. RAG pipelines and Intelligent Agents: Integrate structured output into pipelines that feed AI systems for Q&A, summarization, and more. Mistral Document AI on Azure AI Foundry You can now access Mistral Document AI’s capabilities through Azure AI Foundry as a serverless Azure model, sold directly from Microsoft. One-Click Deployment (Serverless) – With a few clicks, you can deploy the model as a serverless REST API, without needing to provision any GPU machines or container hosts. This makes it easy to get started. Enterprise-Grade Security & Privacy – Because the model runs within your Azure environment, you get network isolation and data security out of the box. All inferencing happens in Azure’s cloud under your account, so your documents aren’t sent to a third-party server. Azure AI Foundry ensures your data stays private (no data leaves the Azure region you choose) and offers compliance with enterprise security standards. This is critical for sensitive use cases like banking or healthcare documents. Integrated Responsible AI Capabilities – With Mistral Doc AI running in Azure AI Foundry, you can apply Azure’s built-in Responsible AI tools—such as content filtering, safety system monitoring, and evaluation frameworks—to ensure your deployments align with your organization’s ethical and compliance standards. Observability & Monitoring – Foundry’s monitoring features give you full visibility into model usage, performance, and cost. You can track API calls, latency, and error rates, enabling proactive troubleshooting and optimization. Agent Services Enablement – You can connect Mistral Document AI to Azure AI Agent Service, enabling intelligent agents to process, reason over, and act on extracted document data—unlocking new automation and decision-making scenarios. Azure Ecosystem Integration – Once deployed, the Mistral Document AI endpoint can easily plug into your existing Azure workflows. And because it’s part of Foundry, you can manage it alongside other models in a unified way. This interoperability accelerates the development of intelligent applications. Getting Started: Deploying and Using Mistral Document AI on Azure Setting up Mistral Document AI on Azure AI Foundry is straightforward. Here’s a quick guide to get you up and running: Create an Azure AI Foundry workspace – Ensure you have an Azure subscription (pay-as-you-go, not a free trial) and create an AI Foundry hub and project in the Azure portal Deploy the Mistral Document AI model – In the Azure AI Foundry Model Catalog, search for “mistral-document-ai-2505”. Then click the Deploy button. You’ll be prompted to select a pricing plan – choose deploy. Call the Mistral Document AI API – Once deployed, using the model is as easy as calling a REST API. You can do this from any programming language or even a command-line tool like cURL. Integrate and iterate – With the OCR results in hand, you can integrate Mistral Document AI into your workflows. Conclusion Mistral Document AI joins Azure AI Foundry as one of the several tools available to help organizations unlock insights from unstructured documents. This launch reflects our continued commitment to bringing the latest, most capable models into Foundry, giving developers and enterprises more choice than ever. Whether you’re digitizing records, building knowledge bases, or enhancing your AI workflows, Azure AI Foundry offers powerful and accessible solutions. Pricing Model Name Pricing /1K pages mistral-document-ai-2505 Global $3 mistral-document-ai-2505 DataZone $3.3 Mistral OCR Global $1 Resources Explore Mistral Document AI MS Learn Github Code Samples
Naomi Moneypenny
Aug 15, 2025 Place Microsoft Foundry Blog
10KViews
3likes
3Comments
Step-by-step: Integrate Ollama Web UI to use Azure Open AI API with LiteLLM Proxy
Introductions Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) like Llama 3 and Mistral, enabling users to manage models, test them via a ChatGPT-like chat environment, and integrate them into applications through Ollama’s local API. While it excels for self-hosted models on platforms like Azure VMs, it does not natively support Azure OpenAI API endpoints—OpenAI’s proprietary models (e.g., GPT-4) remain accessible only through OpenAI’s managed API. However, tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with OpenAI’s API in hybrid workflows, while maintaining compliance and cost-efficiency. This setup empowers users to leverage both self-managed open-source models and cloud-based AI services. Problem Statement As of February 2025, Ollama WebUI, still do not support Azure Open AI API. The Ollama Web UI only support self-hosted Ollama API and managed OpenAI API service (PaaS). This will be an issue if users want to use Open AI models they already deployed on Azure AI Foundry. Objective To integrate Azure OpenAI API via LiteLLM proxy into with Ollama Web UI. LiteLLM translates Azure AI API requests into OpenAI-style requests on Ollama Web UI allowing users to use OpenAI models deployed on Azure AI Foundry. If you haven’t hosted Ollama WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Ollama WebUI deployed already. Step 1: Deploy OpenAI models on Azure Foundry. If you haven’t created an Azure AI Hub already, search for Azure AI Foundry on Azure, and click on the “+ Create” button > Hub. Fill out all the empty fields with the appropriate configuration and click on “Create”. After the Azure AI Hub is successfully deployed, click on the deployed resources and launch the Azure AI Foundry service. To deploy new models on Azure AI Foundry, find the “Models + Endpoints” section on the left hand side and click on “+ Deploy Model” button > “Deploy base model” A popup will appear, and you can choose which models to deploy on Azure AI Foundry. Please note that the o-series models are only available to select customers at the moment. You can request access to the o-series models by completing this request access form, and wait until Microsoft approves the access request. Click on “Confirm” and another popup will emerge. Now name the deployment and click on “Deploy” to deploy the model. Wait a few moments for the model to deploy. Once it successfully deployed, please save the “Target URI” and the API Key. Step 2: Deploy LiteLLM Proxy via Docker Container Before pulling the LiteLLM Image into the host environment, create a file named “litellm_config.yaml” and list down the models you deployed on Azure AI Foundry, along with the API endpoints and keys. Replace "API_Endpoint" and "API_Key" with “Target URI” and “Key” found from Azure AI Foundry respectively. Template for the “litellm_config.yaml” file. model_list: - model_name: [model_name] litellm_params: model: azure/[model_name_on_azure] api_base: "[API_ENDPOINT/Target_URI]" api_key: "[API_Key]" api_version: "[API_Version]" Tips: You can find the API version info at the end of the Target URI of the model's endpoint: Sample Endpoint - https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview Run the docker command below to start LiteLLM Proxy with the correct settings: docker run -d \ -v $(pwd)/litellm_config.yaml:/app/config.yaml \ -p 4000:4000 \ --name litellm-proxy-v1 \ --restart always \ ghcr.io/berriai/litellm:main-latest \ --config /app/config.yaml --detailed_debug Make sure to run the docker command inside the directory where you created the “litellm_config.yaml” file just now. The port used to listen for LiteLLM Proxy traffic is port 4000. Now that LiteLLM proxy had been deployed on port 4000, lets change the OpenAI API settings on Ollama WebUI. Navigate to Ollama WebUI’s Admin Panel settings > Settings > Connections > Under the OpenAI API section, write http://127.0.0.1:4000 as the API endpoint and set any key (You must write anything to make it work!). Click on “Save” button to reflect the changes. Refresh the browser and you should be able to see the AI models deployed on the Azure AI Foundry listed in the Ollama WebUI. Now let’s test the chat completion + Web Search capability using the "o1-mini" model on Ollama WebUI. Conclusion Hosting Ollama WebUI on an Azure VM and integrating it with OpenAI’s API via LiteLLM offers a powerful, flexible approach to AI deployment, combining the cost-efficiency of open-source models with the advanced capabilities of managed cloud services. While Ollama itself doesn’t support Azure OpenAI endpoints, the hybrid architecture empowers IT teams to balance data privacy (via self-hosted models on Azure AI Foundry) and cutting-edge performance (using Azure OpenAI API), all within Azure’s scalable ecosystem. This guide covers every step required to deploy your OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine and configure Ollama WebUI to support Azure AI endpoints. You can test and improve your AI model even more with the Ollama WebUI interface with Web Search, Text-to-Image Generation, etc. all in one place.
suzarilshah
Mar 06, 2025 Place Educator Developer Blog
10KViews
1like
4Comments
Introducing Microsoft Agent Factory
Microsoft Agent Factory is a new program designed for organizations that want to move from experimentation to execution faster. With a single plan, organizations can build agents with Work IQ, Fabric IQ, and Foundry IQ using Microsoft Foundry and Copilot Studio. They can also deploy their agents anywhere, including Microsoft 365 Copilot, with no upfront licensing and provisioning required. Eligible organizations can also tap into hands-on engagement from top AI Forward Deployed Engineers (FDEs) and access tailored role-based training to boost AI fluency across teams.
CyrilBelikoff
Nov 18, 2025 Place Microsoft Foundry Blog
9.8KViews
6likes
0Comments
Distillation: Turning Smaller Models into High-Performance, Cost-Effective Solutions
by Vishal Yadav, Nikhil Pandey Introduction Large Language Models (LLMs) have transformed the landscape of natural language processing (NLP) with their ability to understand and generate human-like text. However, their size and complexity often pose challenges in terms of deployment, speed, and cost. Usually for specialized niche tasks, we end up deploying the best available model even though we don’t utilize all its capabilities. This is where distillation comes in, offering a method to create (fine-tune) smaller, customized, more efficient models, while retaining much of the performance of a significantly larger state-of-the-art model. What is distillation? Distillation is a technique designed to transfer knowledge of a large pre-trained model (the "teacher") into a smaller model (the "student"), enabling the student model to achieve comparable performance to the teacher model. This technique allows users to leverage the high quality of larger LLMs, while reducing inference costs in a production environment, thanks to the smaller student model. How distillation works? In distillation, knowledge can be transferred from teacher to student model in several ways. Here, we specifically discuss response-based, offline distillation, where the student model learns to mimic the output (only predictions) of the teacher model, and the teacher model is not trained during distillation. Teacher Model: A large, high-capacity teacher model that is already pre-trained on massive datasets. This model has learnt rich representations and complex patterns from the data which allows it to generalize well even on unseen tasks. Knowledge Extraction: The teacher model generates outputs based on given inputs, which are then used as training data for the student model. This involves not just mimicking outputs but also understanding the underlying reasoning processes. Student Model Training: A smaller student model is trained using the extracted knowledge as a guide. The student model learns to mimic the teacher model's behavior and predictions on specific tasks. Advantages Reduced Size: The resulting student model is significantly smaller, making it easier to deploy in resource-constrained environments. Lower Cost: Running smaller models incurs lower operational costs while maintaining competitive performance levels. Task-Specific Optimization: Distillation can be tailored for specific applications, enhancing efficiency and accuracy. Performance: Smaller models exhibit significantly lower latency compared to larger models, which in turn boosts the throughput of the deployment. Customization: Distillation allows users to select desirable traits from multiple larger models and transfer them to smaller models. Personalization: Personality traits can be incorporated into the model, enabling it to respond with relevant answers when queried about its personality. Synthetic Data Generation: At scale data generation can be done either only for labels or from scratch using just seed/meta data. Generalization: Distillation can help student models generalize better by learning from the teacher model's knowledge and avoiding overfitting. Improved Multilingual Capabilities: The multilingual performance of smaller models can be significantly enhanced with the help of teacher models making them suitable for global applications. Distillation in Azure AI Foundry Distillation as a Service is now supported on Azure allowing a variety of task types and more to be added soon. Following tasks are supported. Summarization: Given a document (article) to summarize, generate an entity-dense summary of the document. Conversational Assistant: Generate AI assistant responses on single-turn and multi-turn conversational datasets. To generate each response, the available chat history and the current user prompt are utilized. Natural Language Understanding (NLU) o MATH: Generate numeric answers to math problems. o Natural Language Inference (NLI): Given premise and hypothesis, determine if premise entails the hypothesis, or contradicts the hypothesis, or is neutral i.e. neither entails not contradicts the hypothesis. o Multiple-Choice Question Answering: Given question and answer choices, determine the correct answer choice. Distillation Process Overview of the two-step distillation process: (1) Generate synthetic data using a task-specific, elaborate prompt (2) Train (and infer from) the student model using a shorter prompt (Figure source: https://arxiv.org/pdf/2410.18588) The distillation process involves two main steps: generate high quality synthetic data (labels) using the teacher model, followed by instruction-based finetuning of the student model. Data Generation High-quality data generation is crucial for the student model's performance. Azure provides a proprietary library of advanced prompts, to generate high-quality synthetic data for all supported tasks, utilizing techniques such as Chain of Thought (CoT) or Chain of Density (CoD), and other best practices. This option can be enabled by passing the `enable_chain_of_thought` parameter while invoking the distillation pipeline, ensuring reasoning-based answers and consequently high-quality data for distillation. Instruction Fine-Tuning The next step is to fine-tune the smaller model using the task-specific generated data. This involves using a concise, task-specific prompt and training with the input and generated output (excluding reasoning steps). These innovations ensure significant performance gains for a given task while minimizing the cost (number of tokens) for the user. When using user-provided prompts, the same prompt is applied in both data generation and fine-tuning. Distillation Code Snippet Distillation is supported by the Azure SDK and CLI. Support for this was added in version 1.22.0 of azure-ai-ml. Ensure that the azure-ai-ml package is >= 1.22.0 before using the code snippet below. Model Offerings Teacher Models Currently Meta Llama 3.1 405B Instruct is supported as the teacher model for distillation. Student Models Currently Meta Llama 3.1 8B Instruct is supported as the student model for distillation. Soon all Microsoft’s Phi 3 and 3.5 Instruct series models will also be available for distillation. The following table demonstrates our current and upcoming student model offerings. Student Model Region Availability Meta Llama 3.1 8B Instruct West US 3 Available Phi 3/3.5 Instruct East US 2 Coming Soon At the time of this writing, fine-tuning of Meta Llama 3.1 Instruct series of models, and deployment of such fine-tuned models, is only available in West US 3 region. Whereas fine-tuning of Microsoft’s Phi 3 Instruct series of models, and deployment of such fine-tuned models, is only available in East US 2 region. Ensure your AI Foundry project is setup in the appropriate region for your selected student model. Notebooks Distilling Large Language Models for NLI Tasks: A Practical Guide Notebook - Distillation with Large Language Models This notebook provides a comprehensive guide on how to distil a large teacher model into a smaller student model, specifically for Natural Language Inference (NLI) tasks. It uses the Meta Llama 3.1 405B Instruct as the teacher and the Meta Llama 3.1 8B Instruct as the student model. Key Highlights Teacher and Student Models: The process uses Meta Llama 3.1 405B Instruct as the teacher model and Meta Llama 3.1 8B Instruct as the student model. Prerequisites: Ensure you have subscribed to the required models and set up an AI Foundry project in the West US 3 region for distillation of a Meta Llama 3.1 8B Instruct student model. SDK Installation: Install necessary SDKs such as azure-ai-ml, azure-identity, and mlflow. Dataset Preparation: Use the ConjNLI dataset from Hugging Face for training and validation. Distillation Job: Configure and run the distillation job to transfer knowledge from the teacher to the student model. Deployment: Optionally, deploy the distilled model to a serverless endpoint and perform sample inferences. This notebook simplifies the complex task of model distillation, making it accessible even to those new to NLP and model training. Results Using the ConjNLI dataset and Chain-Of-Thought (CoT) distillation, we obtain the following accuracy (%) metrics. Dataset Student Model Teacher (405B) with CoT Prompting Student with CoT Prompting Student Distilled on CoT-prompted Teacher Output ConjNLI (dev) Meta Llama 3.1 8B Instruct 69.98 52.81 63.88 ConjNLI (dev) Phi 3 Mini 128k Instruct 69.98 43.98 57.78 Distillation with the Meta Llama 3.1 8B Instruct and Phi 3 Mini 128k Instruct student models provides approximately 21% and 31% improvement respectively over directly prompting the student model using CoT prompt. For detailed results on other datasets and tasks, we refer the user to check the published results in our knowledge distillation paper. Conclusion Distillation represents a significant step forward in development and deployment of LLM/SLM at scale. By transferring the knowledge from a large pre-trained model (teacher) to a smaller, more efficient model (student), distillation offers a practical solution to the challenges of deploying large models, such as high costs and complexity. This technique not only reduces model size and operational costs but also enhances the performance of student models for specific tasks. The support for distillation on Azure AI Foundry further simplifies the process, making it accessible for various applications, such as summarization, conversational assistance, and natural language understanding tasks. Furthermore, the detailed, hands-on example notebooks provided in Azure Github can help facilitate easier adoption. In summary, distillation not only bridges the gap between generalist understanding and specialized application but also makes the way for a more sustainable and practical approach to leveraging LLMs in real-world scenarios.
nikhilpandey
Dec 06, 2024 Place Microsoft Foundry Blog
8.8KViews
2likes
1Comment
Azure AI Foundry Models: Futureproof Your GenAI Applications
Years of Rapid Growth and Innovation The Azure AI Foundry Models journey started with the launch of Models as a Service (MaaS) in partnership with Meta Llama at Ignite 2023. Since then, we’ve rapidly expanded our catalog and capabilities: 2023: General Availability of the model catalog and launch of MaaS 2024: 1800+ models available including Cohere, Mistral, Meta, G42, AI21, Nixtla and more, with 250+ OSS models deployed on managed compute 2025 (Build): 10000+ models, new models sold directly by Microsoft, more managed compute models and expanded partnerships, introduction of advanced tooling like Model Leaderboard, Model Router, MCP Server, and Image Playground GenAI Trends Reshaping the Model Landscape To stay ahead of the curve, Azure AI Foundry Models is designed to support the most important trends in GenAI: Emergence of Reasoning-Centric Models Proliferation of Agentic AI and Multi-agent systems Expansion of Open-Source Ecosystems Multimodal Intelligence Becoming Mainstream Rise of Small, Efficient Models (SLMs) These trends are shaping a future where enterprises need not just access to models—but smart tools to pick, combine, and deploy the best ones for each task. A Platform Built for Flexibility and Scale Azure AI Foundry is more than a catalog—it’s your end-to-end platform for building with AI. You can: Explore over 10000+ models, including foundation, industry, multimodal, and reasoning models along with agents. Deploy using flexible options like PayGo, Managed Compute, or Provisioned Throughput (PTU) Monitor and optimize performance with integrated observability and compliance tooling Whether you're prototyping or scaling globally, Foundry gives you the flexibility you need. Two Core Model Categories 1. Models Sold Directly by Microsoft These models are hosted and billed directly by Microsoft under Microsoft Product Terms. They offer: Enterprise-grade SLAs and reliability Deep Azure service integration Responsible AI standards Flexible usage of reserved quota by using Azure AI Foundry Provisioned Throughput (PTU) across direct models including OpenAI, Meta, Mistral, Grok, DeepSeek and Black Forest Labs. Reduce AI workload costs on predictable consumption patterns with Azure AI Foundry Provisioned Throughput reservations. Learn more here Coming to the family of direct models from Azure: Grok 3 / Grok 3 Mini (from xAI) Flux Pro 1.1 Ultra (from Black Forest Labs) Llama 4 Scout & Maverick (from Meta) Codestral 2501, OCR (from Mistral) 2. Models from Partners & Community These models come from the broader ecosystem, including open-source and monetized partners. They are deployed as Managed Compute or Standard PayGo, and include models from Cohere, Paige and Saifr. We also have new industry models joining this ecosystem of partner and community models NVIDIA NIMs: ProteinMPNN, RFDiffusion, OpenFold2, MSA Paige AI: Virchow 2G, Virchow 2G-mini Microsoft Research: EvoDiff, BioEmu-1 Expanded capabilities that make model choice simpler and faster Azure AI Foundry Models isn’t just about more models. We’re introducing tools to help developers intelligently navigate model complexity: 1. Model Leaderboard Easily compare model performance across real-world tasks with: Transparent benchmark scores Task-specific rankings (summarization, RAG, classification, etc.) Live updates as new models are evaluated Whether you want the highest accuracy, fastest throughput, or best price-performance ratio—the leaderboard guides your selection. 2. Model Router Don’t pick just one—let Azure do the heavy lifting. Automatically route queries to the best available model Optimize based on speed, cost, or quality Supports dynamic fallback and load balancing This capability is a game-changer for agents, copilots, and apps that need adaptive intelligence. 3. Image/Video Playground A new visual interface for: Testing image generation models side-by-side Tuning prompts and decoding settings Evaluating output quality interactively This is particularly useful for multimodal experimentation across marketing, design, and research use cases. 4. MCP Server Enables model-aware orchestration, especially for agentic workloads: Tool use integration Multi-model planning and reasoning Unified coordination across model APIs A Futureproof Foundation With Azure AI Foundry Models, you're not just selecting from a list of models—you’re stepping into a full-stack, flexible, and future-ready AI environment: Choose the best model for your needs Deploy on your terms—serverless, managed, or reserved Rely on enterprise-grade performance, security, and governance Stay ahead with integrated innovation from Microsoft and the broader ecosystem The AI future isn’t one-size-fits-all—and neither is Azure AI Foundry. Explore Today : Azure AI Foundry
Naomi Moneypenny
May 19, 2025 Place Microsoft Foundry Blog
8KViews
0likes
0Comments