Azure AI Foundry
The Future of AI: Building Weird, Warm, and Wildly Effective AI Agents
Discover how humor and heart can transform AI experiences. From the playful Emotional Support Goose to the productivity-driven Penultimate Penguin, this post explores why designing with personality matters—and how Azure AI Foundry empowers creators to build tools that are not just efficient, but engaging.

Multi-agent Workflow with Human Approval using Agent Framework
In modern AI-driven workflows, balancing automation with human oversight is critical, especially for high-stakes decisions. The solution scripts from this article demonstrate an effective approach to orchestrating an Agent Framework-powered sequential workflow involving persistent Azure AI Foundry agents, integrating human approval for critical actions. By leveraging workflow checkpointing, the system ensures state persistence, allowing workflows to pause for human decisions and seamlessly resume from the last checkpoint once approval is granted. This article explores the implementation details of the scripts, including agent setup, human approval simulation and state management, providing a robust framework for building AI workflows with human-in-the-loop control.

Scenario

In this workflow example, three specialised agents collaborate sequentially to manage industrial sensor data with human-in-the-loop control. The Data Analyser Agent initiates the process by collecting pipeline sensor readings (pressure, temperature, flow rate), detecting anomalies and summarising findings in a structured format. These results are passed to the Risk Assessor Agent V2, which evaluates severity based on predefined thresholds and determines appropriate actions - such as scheduling maintenance or initiating an immediate shutdown. For high-impact decisions, the workflow pauses at a checkpoint, saving its state and requesting human approval before proceeding. Once approval is received, the Maintenance Scheduler Agent resumes the process, assigning tasks to relevant teams and confirming execution steps. This design ensures automation efficiency while maintaining critical human oversight for safety and compliance.

The workflow pauses when human approval is required and triggers an entry in a simulated external system for decision capture. Once the external system records an approve or reject decision, the workflow resumes, verifies the approval status in that system and then executes critical actions such as scheduling maintenance or shutting down equipment.

Code Overview

Setting up Azure Resources and Local Environment

Set up the Azure AI Foundry resource along with a large language model. Retrieve the endpoint details for the Azure AI Foundry project and the name of the deployed model, then update the relevant variables in your environment file accordingly:

AZURE_AI_PROJECT_ENDPOINT = "<AZURE AI FOUNDRY PROJECT ENDPOINT>"
AZURE_AI_MODEL_DEPLOYMENT_NAME = "<gpt-4o>"

Then install the dependencies listed in the requirements.txt file.

Creating Persistent Agents

The persistent Azure AI Foundry agents used in the workflow are created using the AIProjectClient class from the Azure AI Foundry SDK, which is installed implicitly as part of the Agent Framework SDK (Python) listed in the requirements.txt file. Once created, the agents can be viewed in the Azure AI Foundry portal under the Agents blade. You may observe that the tools are not yet associated with any agents; this association is established later, when the agent instances for the workflows are created. Creating agents in AI Foundry allows you to utilise and share them across multiple workflows and ensures they are accessible in a centralised location. After the agents are created, each is assigned an ID that can be accessed through the agent's `id` property or found in the Azure AI Foundry portal. These IDs will be required in the next script to use the agents as part of the Agent Framework orchestrated workflow.
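The reference script handles this step; as a minimal sketch of what agent creation looks like with the azure-ai-projects and azure-identity packages (the agent name and instructions below are illustrative, and the actual A03 script may differ):

```python
import os

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Connect to the Azure AI Foundry project using the endpoint from the environment file
project_client = AIProjectClient(
    endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
    credential=DefaultAzureCredential(),
)

# Create one persistent agent; the other two are created the same way
# with their own names and instructions (illustrative values below)
analyser = project_client.agents.create_agent(
    model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
    name="data-analyser-agent",
    instructions="Collect pipeline sensor readings, detect anomalies and summarise findings.",
)

# The generated ID is what the workflow script references later
print(f"Created agent: {analyser.id}")
```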
Another important point to note is that the Maintenance Scheduler Agent (v2) has special instructions that the [Schedule Maintenance] and [Immediate Shutdown] actions require a human approval result of [APPROVED]. This agent can retrieve the approval status when required, and per their instructions both the Risk Assessor Agent (v2) and the Maintenance Scheduler Agent (v2) add the [PENDING] keyword to messages if approval is required but not yet granted.

Reference Script: A03_Create_Multiple_Foundry_Agent_Persistent.py

Creating Sequential Workflow

The agents needed for the workflow are set up using the ChatAgent class from the Agent Framework, which invokes the Azure AI Foundry agents created earlier. The local agent tools are attached to each ChatAgent instance. A number of tools are made available to the agents to simulate actions that support the workflow demonstration.

Agent Tools

- get_data: Get the data (temperature, pressure, flow rate) for a given pump in JSON format.
- schedule_maintenance: Schedule maintenance for the given equipment; returns a maintenance request number.
- send_shutdown_equipment_notification: Send a notification for shutting down the given equipment and notifying relevant teams; returns a notification ID.
- send_approval_rejection_notification: Send a notification that the requested action was rejected by the human approver.
- request_human_approval: Request human approval for critical actions via an external system.
- get_human_approval_status: Check the human approval status from the external JSON file when the workflow is resumed.

The section of code below initialises in-memory checkpoint storage and builds a sequential workflow. Checkpointing is a feature of the Agent Framework that allows you to save the state of a workflow at specific points during its execution and resume from those points later. This example uses the feature to enable a long-running workflow that can pause and resume execution at a later time. The code starts a new workflow if no checkpoint file exists, or resumes from the last checkpoint if the file is found. When implementing this, you can assign a unique suffix, such as the workflow ID, to the checkpoint file to make sure the correct checkpoint is created and identified when resuming the workflow.

The code also checks whether human approval was requested by looking for the [PENDING] keyword in the chat messages, as mentioned above. The presence of this keyword determines whether the checkpoint needs to be saved. The entry in the external approval system is simulated by the approval_db.json file (in practice this could be an API call to the relevant system or database). This file is generated with status [PENDING] when human approval is required. The Maintenance Scheduler Agent (v2) will not action the tasks when the workflow is re-run unless approval is granted by manually updating the status to [APPROVED] in this file. Whether the approval is PENDING, APPROVED or REJECTED, the re-run workflow resumes from the required step instead of starting again from the very first step.

Reference Script: W04_Sequential_Workflow_Human_Approval.py
GitHub: View code sample
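To make the simulated approval store concrete, here is a minimal sketch of how the two approval tools could work against approval_db.json, using only the Python standard library; the helper signatures are illustrative and the reference script may differ:

```python
import json
from pathlib import Path

# Simulated external approval system described in the article
APPROVAL_DB = Path("approval_db.json")


def request_human_approval(action: str) -> str:
    """Record a new approval request with status PENDING (illustrative helper)."""
    APPROVAL_DB.write_text(json.dumps({"action": action, "status": "PENDING"}, indent=2))
    return "PENDING"


def get_human_approval_status() -> str:
    """Read the decision back; a human edits the file to APPROVED or REJECTED."""
    if not APPROVAL_DB.exists():
        return "NOT_REQUESTED"
    return json.loads(APPROVAL_DB.read_text()).get("status", "PENDING")
```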
Reasoning Effort for Foundry Agents

I am currently using the Azure AI Foundry Agents API and noticed that, unlike the base completions endpoint, there is no option to specify the "Reasoning Effort" parameter. Could you please confirm whether this feature is supported in the Agents API? If it is not yet supported, are there any plans to introduce Reasoning Effort control for the Agents API in future releases?
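For reference, this is the parameter the question refers to on the base chat completions endpoint. A minimal sketch using the openai package against an Azure OpenAI deployment; the endpoint, key, API version and deployment name are placeholders, and reasoning_effort applies to reasoning-model deployments such as the o-series:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-12-01-preview",                           # placeholder version
)

# reasoning_effort is accepted by the chat completions endpoint for reasoning models
response = client.chat.completions.create(
    model="<your-reasoning-model-deployment>",  # e.g. an o-series deployment
    reasoning_effort="medium",                  # low | medium | high
    messages=[{"role": "user", "content": "Summarise the plan in three steps."}],
)
print(response.choices[0].message.content)
```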
Announcing Public Preview: AI Toolkit for GitHub Copilot Prompt-First Agent Development

This week at GitHub Universe, we’re announcing the Public Preview of GitHub Copilot prompt-first agent development in the AI Toolkit for Visual Studio Code. With this release, building powerful AI agents is now simpler and faster - no need to wrestle with complex frameworks or orchestrators. Just start with natural language prompts and let GitHub Copilot guide you from concept to working agent code.

Accelerate Agent Development in VS Code

The AI Toolkit embeds agent development workflows directly into Visual Studio Code and GitHub Copilot, enabling you to transform ideas into production-ready agents within minutes. This unified experience empowers developers and product teams to:

- Select the best model for your agent scenario
- Build and orchestrate agents using Microsoft Agent Framework
- Trace agent behaviors
- Evaluate agent response quality

Select the best model for your scenario

Models are the foundation for building powerful agents. Using the AI Toolkit, you can already explore and experiment with a wide range of local and remote models. Copilot now recommends models tailored to your agent’s needs, helping you make informed choices quickly.

Build and orchestrate agents

Whether you’re creating a single agent or designing a multi-agent workflow, Copilot leverages the latest Microsoft Agent Framework to generate robust agent code. You can initiate agent creation with simple prompts and visualize workflows for greater clarity and control.

- Create a single agent using Copilot
- Create a multi-agent workflow using Copilot and visualize workflow execution

Trace agent behaviors

As agents become more sophisticated, understanding their actions is crucial. The AI Toolkit enables tracing via Copilot, collecting local traces and displaying detailed agent calls, all within VS Code.

Evaluate agent response quality

Copilot guides you through structured evaluation, recommending metrics and generating test datasets. Integrate evaluations into your CI/CD pipeline for continuous quality assurance and confident deployments.

Get started and share feedback

This release marks a significant step toward making AI agent development easier and more accessible in Visual Studio Code. Try out the AI Toolkit for Visual Studio Code, share your thoughts, and file issues and suggest features on our GitHub repo. Thank you for being a part of this journey with us!
The Future of AI: From Noise to Insight - An AI Agent for Customer Feedback

This post explores how Microsoft’s AI Futures team built a multi-agent system to transform scattered customer feedback into actionable insights. The solution aggregates feedback from multiple channels, uses advanced language models to cluster themes, summarize content, and identify sentiment, and delivers prioritized insights directly in Microsoft Teams. With human-in-the-loop safeguards, the system accelerates triage, prioritization, and follow-ups while maintaining compliance and traceability. Future enhancements include richer automation, trend visualization, and expanded feedback sources.
NVIDIA NIM for NVIDIA Nemotron, Cosmos, & Microsoft Trellis: Now Available in Azure AI Foundry

We’re excited to announce 7 new powerful NVIDIA NIM™ additions to Azure AI Foundry Models, now on Managed Compute. The latest wave of models—NVIDIA Nemotron Nano 9B v2, Llama 3.1 Nemotron Nano VL 8B, Llama 3.3 Nemotron Super 49B v1.5 (coming soon), Cosmos Reason1-7B, Cosmos Predict 2.5 (coming soon), Cosmos Transfer 2.5 (coming soon), and Microsoft Trellis—marks a significant leap forward in intelligent application development. Collectively, these models redefine what’s possible in advanced instruction-following, vision-language understanding, and efficient language modeling, empowering developers to build multimodal, visually rich, and context-aware solutions. By combining robust reasoning, flexible input handling, and enterprise-grade deployment options, these additions accelerate innovation across industries—from robotics and autonomous vehicles to immersive retail and digital twins—enabling smarter, safer, and more adaptive experiences at scale.

Meet the Models

| Model Name | Size | Primary Use Cases |
| --- | --- | --- |
| NVIDIA Nemotron Nano 9B v2 (Available Now) | 9B parameters | Multilingual Reasoning: multilingual and code-based reasoning tasks; Enterprise Agents: AI and productivity agents; Math/Science: scientific reasoning, advanced math; Coding: software engineering and tool calling |
| Llama 3.3 Nemotron Super 49B v1.5 (Coming Soon) | 49B | Enterprise Agents: AI and productivity agents; Math/Science: scientific reasoning, advanced math; Coding: software engineering and tool calling |
| Llama 3.1 Nemotron Nano VL 8B (Available Now) | 8B | Multimodal: vision-language tasks, document intelligence and understanding; Edge Agents: mobile and edge AI agents |
| Cosmos Reason1-7B (Available Now) | 7B | Robotics: planning and executing tasks with physical constraints; Autonomous Vehicles: understanding environments and making decisions; Video Analytics Agents: extracting insights and performing root-cause analysis from video data |
| Cosmos Predict 2.5 (Coming Soon) | 2B | Generalist Model: world state generation and prediction |
| Cosmos Transfer 2.5 (Coming Soon) | 2B | Structural Conditioning: physical AI |
| Microsoft TRELLIS by Microsoft Research (Available Now) | – | Digital Twins: generate accurate 3D assets from simple prompts; Immersive Retail experiences: photorealistic product models for AR, virtual try-ons; Game and simulation development: turn creative ideas into production-ready 3D content |

Meet the NVIDIA Nemotron Family

NVIDIA Nemotron Nano 9B v2: Compact power for high-performance reasoning and agentic tasks

Nemotron Nano 9B v2 is a high-efficiency large language model built with a hybrid Mamba-Transformer architecture, designed to excel in both reasoning and non-reasoning tasks.

- Efficient architecture for high-performance reasoning: Combines Mamba-2 and Transformer components to deliver strong reasoning capabilities with higher throughput.
- Extensive multilingual and code capabilities: Trained on diverse language and programming data, it performs exceptionally well across tasks involving natural language (English, German, French, Italian, Spanish and Japanese), code generation, and complex problem solving.
- Reasoning Budget Control: Supports runtime "thinking" budget control. During inference, the user can specify how many tokens the model is allowed to "think" for, helping balance speed, cost, and accuracy. For example, a user can tell the model to think for 1K or 3K tokens for different use cases, with far better cost predictability.

Fig 1. (image provided by NVIDIA)
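Once deployed, a NIM exposes an OpenAI-compatible endpoint, so a request-time reasoning budget could look like the following minimal sketch using the openai package; the endpoint URL, key, model identifier, and the max_thinking_tokens field are assumptions to verify against the model card and NIM documentation:

```python
from openai import OpenAI

# Point the OpenAI-compatible client at the deployed NIM endpoint (placeholder URL/key)
client = OpenAI(
    base_url="https://<your-nim-endpoint>/v1",
    api_key="<your-endpoint-key>",
)

response = client.chat.completions.create(
    model="nvidia/nvidia-nemotron-nano-9b-v2",  # assumed model identifier
    messages=[{"role": "user", "content": "Plan a maintenance schedule for pump P-101."}],
    # Assumed field for runtime thinking-budget control; confirm the exact
    # parameter name in the NIM documentation for this model.
    extra_body={"max_thinking_tokens": 1024},
)
print(response.choices[0].message.content)
```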
Nemotron Nano 9B v2 is built from the ground up with training data spanning 15 languages and 43 programming languages, giving it broad multilingual and coding fluency. Its capabilities were sharpened through advanced post-training techniques like GRPO and DPO, enabling it to reason deeply, follow instructions precisely, and adapt dynamically to different tasks.

-> Explore the model card on Azure AI Foundry

Llama 3.3 Nemotron Super 49B v1.5: High-throughput reasoning at scale

Llama 3.3 Nemotron Super 49B v1.5 (coming soon) is a significantly upgraded version of Llama-3.3-Nemotron-Super-49B-v1, a large language model derived from Meta Llama-3.3-70B-Instruct (the reference model) and optimized for advanced reasoning, instruction following, and tool use across a wide range of tasks.

- Excels in applications such as chatbots, AI agents, and retrieval-augmented generation (RAG) systems
- Balances accuracy and compute efficiency for enterprise-scale workloads
- Designed to run efficiently on a single NVIDIA H100 GPU, making it practical for real-world applications

Llama-3.3-Nemotron-Super-49B-v1.5 was trained through a multi-phase process combining human expertise, synthetic data, and advanced reinforcement learning techniques to refine its reasoning and instruction-following abilities. Its impressive performance across benchmarks like MATH500 (97.4%) and AIME 2024 (87.5%) highlights its strength in tackling complex tasks with precision and depth.

Llama 3.1 Nemotron Nano VL 8B: Multimodal intelligence for edge deployments

Llama 3.1 Nemotron Nano VL 8B is a compact vision-language model that excels in tasks such as report generation, Q&A, visual understanding, and document intelligence. It delivers low latency and high efficiency, reducing TCO. The model was trained on a diverse mix of human-annotated and synthetic data, enabling robust performance across multimodal tasks such as document understanding and visual question answering, and achieved strong results on evaluation benchmarks including DocVQA (91.2%), ChartQA (86.3%), AI2D (84.8%), and OCRBenchV2 English (60.1%).

-> Explore the model card on Azure AI Foundry

What Sets Nemotron Apart

NVIDIA Nemotron is a family of open models, datasets, recipes, and tools.

1. Open-source AI technologies: Open models, data, and recipes offer transparency, allowing developers to create trustworthy custom AI for their specific needs, from creating new agents to refining existing applications.
   - Open Weights: The NVIDIA Open Model License offers enterprises data control and flexible deployment.
   - Open Data: Models are trained with transparent, permissively-licensed NVIDIA data, available on Hugging Face, ensuring confidence in use. It also allows developers to train their own high-accuracy custom models with these open datasets.
   - Open Recipe: NVIDIA shares development techniques, like NAS, hybrid architecture, and Minitron, as well as NeMo tools, enabling customization or creation of custom models.
2. Highest Accuracy & Efficiency: Engineered for efficiency, Nemotron delivers industry-leading accuracy in the least amount of time for reasoning, vision, and agentic tasks.
3. Run Anywhere On Cloud: Packaged as NVIDIA NIM, for secure and reliable deployment of high-performance AI model inferencing across Azure platforms.

Meet the Cosmos Family

NVIDIA Cosmos™ is a world foundation model (WFM) development platform to advance physical AI.
At its core are Cosmos WFMs, openly available pretrained multimodal models that developers can use out-of-the-box for generating world states as videos and physical AI reasoning, or post-train to develop specialized physical AI models.

Cosmos Reason1-7B: Physical AI

Cosmos Reason1-7B combines chain-of-thought reasoning, flexible input handling for images and video, a compact 7B-parameter architecture, and advanced physical world understanding, making it ideal for real-time robotics, video analytics, and AI agents that require contextual, step-by-step decision-making in complex environments. This model transforms how AI and robotics interact with the real world, giving your systems the power to not just see and describe, but truly understand, reason, and make decisions in complex environments like factories, cities, and autonomous vehicles. With its ability to analyze video, plan robot actions, and verify safety protocols, Cosmos Reason1-7B helps developers build smarter, safer, and more adaptive solutions for real-world challenges.

Cosmos Reason1-7B is physical AI for 4 embodiments:

Fig. 2: Physical AI

Model Strengths

- Physical World Reasoning: Leverages prior knowledge, physics laws, and common sense to understand complex scenarios.
- Chain-of-Thought (CoT) Reasoning: Delivers contextual, step-by-step analysis for robust decision-making.
- Flexible Input: Handles images, video (up to 30 seconds, 1080p), and text with a 16k context window.
- Compact & Deployable: At 7B parameters, it runs efficiently from edge devices to the cloud.
- Production-Ready: Available via Hugging Face, GitHub, and NVIDIA NIM; integrates with industry-standard APIs.

Enterprise Use Cases

Cosmos Reason1-7B is more than a model, it’s a catalyst for building intelligent, adaptive solutions that help enterprises shape a safer, more efficient, and truly connected physical world.

Fig. 3: Use Cases

- Reimagine safety and efficiency by empowering AI agents to analyze millions of live streams and recorded videos, instantly verifying protocols and detecting risks in factories, cities, and industrial sites.
- Accelerate robotics innovation with advanced reasoning and planning, enabling robots to understand their environment, make methodical decisions, and perform complex tasks—from autonomous vehicles navigating busy streets to household robots assisting with daily chores.
- Transform data curation and annotation by automating the selection, labeling, and critiquing of massive, diverse datasets, fueling the next generation of AI with high-quality training data.
- Unlock smarter video analytics with chain-of-thought reasoning, allowing systems to summarize events, verify actions, and deliver actionable insights for security, compliance, and operational excellence.

-> Explore the model card on Azure AI Foundry

Also coming soon to Azure AI Foundry are two models of the Cosmos WFM, designed for world generation and data augmentation.

Cosmos Predict 2.5 (2B)

Cosmos Predict 2.5 is a next-generation world foundation model that generates realistic, controllable video worlds from text, images, or videos—all through a unified architecture. Trained on 200M+ high-quality clips and enhanced with reinforcement learning, it delivers stronger physics and prompt alignment while cutting compute cost and post-training time for faster Physical AI workflows.
Cosmos Transfer 2.5 (2B)

While Predict 2.5 generates worlds, Transfer 2.5 transforms structured simulation inputs—like segmentation, depth, or LiDAR maps—into photorealistic synthetic data for Physical AI training and development.

What Sets Cosmos Apart

- Built for Physical AI — Purpose-built for robotics, autonomous systems, and embodied agents that understand physics, motion, and spatial environments.
- Multimodal World Modeling — Combines images, video, depth, segmentation, LiDAR, and trajectories to create physics-aware, controllable world simulations.
- Scalable Synthetic Data Generation — Generates diverse, photorealistic data at scale using structured simulation inputs for faster Sim2Real training and adaptation.

Microsoft Trellis by Microsoft Research: Enterprise-ready 3D Generation

Microsoft Trellis is a cutting-edge 3D asset generation model developed by Microsoft Research, designed to create high-quality, versatile 3D assets, complete with shapes and textures, from text or image prompts. Seamlessly integrated within the NVIDIA NIM microservice, Trellis accelerates asset generation and empowers creators with flexible, production-ready outputs. Quickly generate high-fidelity 3D models from simple text or image prompts, perfect for industries like manufacturing, energy, and smart infrastructure looking to accelerate digital twin creation, predictive maintenance, and immersive training environments. From virtual try-ons in retail to production-ready assets in media, TRELLIS empowers teams to create stunning 3D content at scale, cutting down production time and unlocking new levels of interactivity and personalization.

-> Explore the model card on Azure AI Foundry

Pricing

The pricing breakdown consists of the Azure compute charges plus a flat fee per GPU for the NVIDIA AI Enterprise license that is required to use the NIM software.

Pay-as-you-go (per GPU hour):
- NIM surcharge: $1 per GPU hour
- Azure compute charges also apply, based on deployment configuration

Why use Managed Compute?

Managed Compute is a deployment option within Azure AI Foundry Models that lets you run large language models (LLMs), SLMs, Hugging Face models and custom models fully hosted on Azure infrastructure. It is a powerful deployment option for models not available via standard (pay-go) endpoints. It gives you:

- Custom model support: Deploy open-source or third-party models
- Infrastructure flexibility: Choose your own GPU SKUs (NVIDIA A10, A100, H100)
- Detailed control: Configure inference servers, protocols, and advanced settings
- Full integration: Works with Azure ML SDK, CLI, Prompt Flow, and REST APIs
- Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies

NVIDIA NIM Microservices on Azure

These models are available as NVIDIA NIM™ microservices on Azure AI Foundry. NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing. NIM microservices are pre-built, containerized AI endpoints that simplify deployment and scale across environments, allowing developers to run models securely and efficiently in the cloud. If you're ready to build smarter, more capable AI agents, start exploring Azure AI Foundry.

Build Trustworthy AI Solutions

Azure AI Foundry delivers managed compute designed for enterprise-grade security, privacy, and governance.
Every deployment of NIM microservices through Azure AI Foundry is backed by Microsoft’s Responsible AI principles and the Secure Future Initiative, ensuring fairness, reliability, and transparency so organizations can confidently build and scale agentic AI workflows.

How to Get Started in Azure AI Foundry

Begin by accessing the Azure AI Foundry portal, then follow the steps below.

1. Navigate to ai.azure.com.
2. Choose an AI Hub resource: In the top left, select an existing project backed by a (Hub) resource provider. If you do not have a Hub project, create one using the "+ Create New" link.
3. Deploy with NIM microservices: Use NVIDIA's optimized containers for secure, scalable deployment. Select Model Catalog from the left sidebar menu; in the "Collections" filter, select NVIDIA to see all the NIM microservices that are available on Azure AI Foundry.
4. Select the NIM you want to use.
5. Click Deploy.
6. Choose the deployment name and virtual machine (VM) type that you would like to use for your deployment. VM SKUs that are supported for the selected NIM, as specified in the model card, will be preselected. Note that this step requires sufficient quota in your Azure subscription for the selected VM type. If needed, follow the instructions to request a service quota increase.

You can also use the NVIDIA NeMo Agent Toolkit, designed to orchestrate, monitor, and optimize collaborative AI agents.

Note about the License

Users are responsible for compliance with the terms of the NVIDIA AI Product Agreement.

Learn More

- How to Deploy NVIDIA NIM Docs
- Learn more about accelerating agentic workflows with Azure AI Foundry, NVIDIA NIM, and NVIDIA NeMo Agent Toolkit
- Register for Microsoft Ignite 2025
Understanding Small Language Models

Small Language Models (SLMs) bring AI from the cloud to your device. Unlike Large Language Models that require massive compute and energy, SLMs run locally, offering speed, privacy, and efficiency. They’re ideal for edge applications like mobile, robotics, and IoT.

Deepening our Partnership with Mistral AI on Azure AI Foundry
We’re excited to mark a new chapter in our collaboration with Mistral AI, a leading European AI innovator, with the launch of Mistral Document AI in Azure AI Foundry Models. This marks the first in a series of Mistral models coming to Azure as a serverless API, giving customers seamless access to Mistral’s cutting-edge capabilities, fully hosted, managed, and integrated into the Foundry ecosystem.

This launch also deepens our support for sovereign cloud customers—especially in Europe. At Microsoft, we believe Sovereign AI is essential for enabling organizations and regulated industries to harness the full potential of AI while maintaining control over their security, data, and governance. As Satya Nadella has said, “We want every country, every organization, to build AI in a way that respects their sovereignty—of data, of applications, and of infrastructure.” By combining Mistral’s state-of-the-art models with Azure’s enterprise-grade reliability and scale, we’re enabling customers to confidently deploy AI that meets strict regulatory and data sovereignty requirements.

Mistral Document AI

By the Mistral AI Team: “Enterprises today are overwhelmed with documents—contracts, forms, research papers, invoices—holding critical information that’s often trapped in scanned images and PDFs. With nearly 90% of enterprise data stored in unstructured formats, traditional OCR simply can’t keep up. Mistral Document AI is built with a multimodal approach that combines vision and language understanding; it interprets documents with contextual intelligence and delivers structured outputs that reflect the original layout—tables remain tables, headings remain headings, and images are preserved alongside the text.”

Key Capabilities

- Document Parsing: Mistral Document AI interprets complex layouts and extracts rich structures such as tables, charts, and LaTeX-formatted equations with markdown-style clarity.
- Multilingual & Multimodal: The model supports dozens of languages and understands both text and visual elements, making it well-suited for global, diverse datasets.
- Structured Output & Doc-as-Prompt: Mistral Document AI delivers results in structured formats like JSON, enabling easy downstream integration with databases or AI agents. This supports use cases like Retrieval-Augmented Generation (RAG), where document content becomes a prompt for subsequent queries.

Use Cases

- Document Digitization: Process archives of scanned PDFs or handwritten forms into structured digital records.
- Knowledge Extraction: Transform research papers, technical manuals, or customer guides into machine-readable formats.
- RAG Pipelines and Intelligent Agents: Integrate structured output into pipelines that feed AI systems for Q&A, summarization, and more.

Mistral Document AI on Azure AI Foundry

You can now access Mistral Document AI’s capabilities through Azure AI Foundry as a serverless Azure model, sold directly from Microsoft.

- One-Click Deployment (Serverless) – With a few clicks, you can deploy the model as a serverless REST API, without needing to provision any GPU machines or container hosts. This makes it easy to get started.
- Enterprise-Grade Security & Privacy – Because the model runs within your Azure environment, you get network isolation and data security out of the box. All inferencing happens in Azure’s cloud under your account, so your documents aren’t sent to a third-party server.
Azure AI Foundry ensures your data stays private (no data leaves the Azure region you choose) and offers compliance with enterprise security standards. This is critical for sensitive use cases like banking or healthcare documents.
- Integrated Responsible AI Capabilities – With Mistral Document AI running in Azure AI Foundry, you can apply Azure’s built-in Responsible AI tools—such as content filtering, safety system monitoring, and evaluation frameworks—to ensure your deployments align with your organization’s ethical and compliance standards.
- Observability & Monitoring – Foundry’s monitoring features give you full visibility into model usage, performance, and cost. You can track API calls, latency, and error rates, enabling proactive troubleshooting and optimization.
- Agent Services Enablement – You can connect Mistral Document AI to Azure AI Agent Service, enabling intelligent agents to process, reason over, and act on extracted document data—unlocking new automation and decision-making scenarios.
- Azure Ecosystem Integration – Once deployed, the Mistral Document AI endpoint can easily plug into your existing Azure workflows. And because it’s part of Foundry, you can manage it alongside other models in a unified way. This interoperability accelerates the development of intelligent applications.

Getting Started: Deploying and Using Mistral Document AI on Azure

Setting up Mistral Document AI on Azure AI Foundry is straightforward. Here’s a quick guide to get you up and running:

1. Create an Azure AI Foundry workspace – Ensure you have an Azure subscription (pay-as-you-go, not a free trial) and create an AI Foundry hub and project in the Azure portal.
2. Deploy the Mistral Document AI model – In the Azure AI Foundry Model Catalog, search for “mistral-document-ai-2505”, then click the Deploy button. You’ll be prompted to select a pricing plan – choose deploy.
3. Call the Mistral Document AI API – Once deployed, using the model is as easy as calling a REST API. You can do this from any programming language or even a command-line tool like cURL (see the sketch at the end of this post).
4. Integrate and iterate – With the OCR results in hand, you can integrate Mistral Document AI into your workflows.

Conclusion

Mistral Document AI joins Azure AI Foundry as one of several tools available to help organizations unlock insights from unstructured documents. This launch reflects our continued commitment to bringing the latest, most capable models into Foundry, giving developers and enterprises more choice than ever. Whether you’re digitizing records, building knowledge bases, or enhancing your AI workflows, Azure AI Foundry offers powerful and accessible solutions.

Pricing

| Model Name | Pricing /1K pages |
| --- | --- |
| mistral-document-ai-2505 Global | $3 |
| mistral-document-ai-2505 DataZone | $3.3 |
| Mistral OCR Global | $1 |

Resources

- Explore Mistral Document AI
- MS Learn
- GitHub Code Samples
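As a quick illustration of step 3 above, here is a hedged sketch using plain HTTP via the requests package. The endpoint path, payload shape, and response fields are assumptions based on Mistral's OCR-style API; check the code view of your Foundry deployment for the exact schema:

```python
import requests

# Placeholders: copy the real endpoint URL and key from your Foundry deployment page
ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com/v1/ocr"
API_KEY = "<your-api-key>"

payload = {
    "model": "mistral-document-ai-2505",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/sample-invoice.pdf",  # illustrative document
    },
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()

# Assumed response shape: structured markdown-style text per page
for page in resp.json().get("pages", []):
    print(page.get("markdown", ""))
```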
Phi-4: Small Language Models That Pack a Punch

What Are Small Language Models, and Why Should You Care?

If you've been following AI development, you can probably recall "bigger is better" being the mantra for years. GPT-3.5 was 175 billion parameters, GPT-4 is even larger, and everyone seemed to be in an arms race to build the biggest model possible. But here's the thing: bigger models are expensive to run, slow to respond, and often overkill for what you actually need.

Small Language Models (SLMs) flip this script. These are models with fewer parameters (typically 1-15 billion) that are trained really thoughtfully on high-quality data. The outcome is models that can run on your laptop, respond instantly, and still handle complex reasoning tasks. From this you also get increased speed, privacy, and cost-effectiveness.

Microsoft's been exploring this space for a while. It started with Phi-1, which showed that small models trained on carefully curated "textbook-like" data could punch way above their weight class. Then came Phi-2 and Phi-3, each iteration getting better at reasoning and problem-solving.

Now we have Phi-4, and it's honestly impressive. At 14 billion parameters, it outperforms models that are 5 times its size on math and reasoning tasks. Microsoft trained it on 9.8 trillion tokens over three weeks, using a mix of synthetic data (generated by larger models like GPT-4o) and high-quality web content. The key innovation isn't just throwing more data at it: they were incredibly selective about what to include, focusing on teaching reasoning patterns rather than memorizing facts.

The Phi family has also expanded recently. There's Phi-4-mini at 3.8 billion parameters for even lighter deployments, and Phi-4-multimodal at 5.6 billion parameters that can handle text, images, and audio all at once. Pretty cool if you're building something that needs to understand screenshots or transcribe audio.

How Well Does It Actually Perform?

Let's talk numbers, because that's where Phi-4 really shines. On MMLU (a broad test of knowledge across 57 subjects), Phi-4 scores 84.8%. That's better than Phi-3's 77.9% and competitive with models like GPT-4o-mini. On MATH (competition-level math problems), it hits 56.1%, which is significantly higher than Phi-3's 42.5%. For code generation on HumanEval, it achieves 82.6%.

| Model | Parameters | MMLU | MATH | HumanEval |
| --- | --- | --- | --- | --- |
| Phi-3-medium | 14B | 77.9% | 42.5% | 62.5% |
| Phi-4 | 14B | 84.8% | 56.1% | 82.6% |
| Llama 3.3 | 70B | 86.0% | ~51% | ~73% |
| GPT-4o-mini | Unknown | ~82% | 52.2% | 87.2% |

Microsoft tested Phi-4 on the November 2024 AMC-10 and AMC-12 math competitions. These are tests that over 150,000 high school students take each year, and the questions appeared after all of Phi-4's training data was collected. Phi-4 beat not just similar-sized models, but also much larger ones. That suggests it's actually learned to reason, not just memorize benchmark answers.

The model also does well on GPQA (graduate-level science questions) and even outperforms its teacher model GPT-4o on certain reasoning tasks. That's pretty remarkable for a 14 billion parameter model.

If you're wondering about practical performance, Phi-4 runs about 2-4x faster than comparable larger models and uses significantly less memory. You can run it on a single GPU or even on newer AI-capable laptops with NPUs. That makes it practical for real-time applications where latency matters.

Try Phi-4 Yourself

You can start experimenting with Phi-4 right now without any complicated setup.

Azure AI Foundry

Microsoft's Azure AI Foundry is probably the quickest way to get started.
Once you're logged in:

1. Go to the Model Catalog and search for "Phi-4"
2. Click "Use this Model"
3. Select an active subscription in the subsequent pop-up and confirm
4. Deploy and start chatting or testing prompts

The playground lets you adjust parameters like temperature and see how the model responds. You can test it on math problems, coding questions, or reasoning tasks without writing any code. There's also a code view that shows you how to integrate it into your own applications.

Hugging Face (for open-source enthusiasts)

If you prefer to work with open-source tools, the model weights are available on Hugging Face. You can run it locally or use their hosted inference API:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/phi-4")
messages = [
    {"role": "user", "content": "What's the derivative of x²?"},
]
pipe(messages)
```

Other Options

The Phi Cookbook on GitHub has tons of examples for different use cases like RAG (retrieval-augmented generation), function calling, and multimodal inputs. If you want to run it locally with minimal setup, you can use Ollama (`ollama pull phi-4`) or LM Studio, which provides a nice GUI. The Azure AI Foundry Labs also has experimental features where you can test Phi-4-multimodal with audio and image inputs.

What's Next?

Phi-4 is surprisingly capable for its size, and it's practical enough to run almost anywhere. Whether you're building a chatbot, working on educational software, or just experimenting with AI, it's worth checking out. We might explore local deployment in more detail later, including how to build multi-agent systems where several SLMs work together, and maybe even look at fine-tuning Phi-4 for specific tasks. But for now, give it a try and see what you can build with it.

The model weights are MIT licensed, so you're free to use them commercially. Microsoft's made it pretty easy to get started, so there's really no reason not to experiment.

Resources:

- Azure AI Foundry
- Phi-4 on Hugging Face
- Phi Cookbook
- Phi-4 Technical Report
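Beyond the playground's code view, here is a hedged sketch of calling a Foundry-deployed Phi-4 endpoint from Python, assuming the azure-ai-inference package; the endpoint URL, key, and deployment name are placeholders to replace with the values from your deployment page:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders: use the endpoint and key shown on your deployment's page
client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    model="Phi-4",  # assumed deployment name
    messages=[UserMessage(content="Prove that the sum of two even numbers is even.")],
)
print(response.choices[0].message.content)
```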
Serverless MCP Agent with LangChain.js v1 — Burgers, Tools, and Traces 🍔

AI agents that can actually do stuff (not just chat) are the fun part nowadays, but wiring them cleanly into real APIs, keeping things observable, and shipping them to the cloud can get... messy. So we built a fresh end‑to‑end sample to show how to do it right with the brand new LangChain.js v1 and Model Context Protocol (MCP). In case you missed it, MCP is a recent open standard that makes it easy for LLM agents to consume tools and APIs, and LangChain.js, a great framework for building GenAI apps and agents, has first-class support for it. You can quickly get up to speed with the MCP for Beginners course and AI Agents for Beginners course.

This new sample gives you:

- A LangChain.js v1 agent that streams its result, along with reasoning + tool steps
- An MCP server exposing real tools (burger menu + ordering) from a business API
- A web interface with authentication, sessions history, and a debug panel (for developers)
- A production-ready multi-service architecture
- Serverless deployment on Azure in one command (azd up)

Yes, it’s a burger ordering system. Who doesn't like burgers? Grab your favorite beverage ☕, and let’s dive in for a quick tour!

TL;DR key takeaways

- New sample: full-stack Node.js AI agent using LangChain.js v1 + MCP tools
- Architecture: web app → agent API → MCP server → burger API
- Runs locally with a single npm start, deploys with azd up
- Uses streaming (NDJSON) with intermediate tool + LLM steps surfaced to the UI
- Ready to fork, extend, and plug into your own domain / tools

What will you learn here?

- What this sample is about and its high-level architecture
- What LangChain.js v1 brings to the table for agents
- How to deploy and run the sample
- How MCP tools can expose real-world APIs

Reference links for everything we use:

- GitHub repo
- LangChain.js docs
- Model Context Protocol
- Azure Developer CLI
- MCP Inspector

Use case

You want an AI assistant that can take a natural language request like “Order two spicy burgers and show me my pending orders” and:

- Understand intent (query menu, then place order)
- Call the right MCP tools in sequence, calling in turn the necessary APIs
- Stream progress (LLM tokens + tool steps)
- Return a clean final answer

Swap “burgers” for “inventory”, “bookings”, “support tickets”, or “IoT devices” and you’ve got a reusable pattern!

Sample overview

Before we play a bit with the sample, let's have a look at the main services implemented here:

| Service | Role | Tech |
| --- | --- | --- |
| Agent Web App (agent-webapp) | Chat UI + streaming + session history | Azure Static Web Apps, Lit web components |
| Agent API (agent-api) | LangChain.js v1 agent orchestration + auth + history | Azure Functions, Node.js |
| Burger MCP Server (burger-mcp) | Exposes burger API as tools over MCP (Streamable HTTP + SSE) | Azure Functions, Express, MCP SDK |
| Burger API (burger-api) | Business logic: burgers, toppings, orders lifecycle | Azure Functions, Cosmos DB |

Here's a simplified view of how they interact. There are also other supporting components like databases and storage not shown here for clarity. For this quickstart we'll only interact with the Agent Web App and the Burger MCP Server, as they are the main stars of the show here.

LangChain.js v1 agent features

The recent release of LangChain.js v1 is a huge milestone for the JavaScript AI community! It marks a significant shift from experimental tools to a production-ready framework. The new version doubles down on what’s needed to build robust AI applications, with a strong focus on agents.
This includes first-class support for streaming not just the final output, but also intermediate steps like tool calls and agent reasoning. This makes building transparent and interactive agent experiences (like the one in this sample) much more straightforward.

Quickstart

Requirements

- GitHub account
- Azure account (free signup, or if you're a student, get free credits here)
- Azure Developer CLI

Deploy and run the sample

We'll use GitHub Codespaces for a quick zero-install setup here, but if you prefer to run it locally, check the README. Click on the following link or open it in a new tab to launch a Codespace: Create Codespace. This will open a VS Code environment in your browser with the repo already cloned and all the tools installed and ready to go.

Provision and deploy to Azure

Open a terminal and run these commands:

```bash
# Install dependencies
npm install

# Login to Azure
azd auth login

# Provision and deploy all resources
azd up
```

Follow the prompts to select your Azure subscription and region. If you're unsure which one to pick, choose East US 2. The deployment will take about 15 minutes the first time, to create all the necessary resources (Functions, Static Web Apps, Cosmos DB, AI Models). If you're curious about what happens under the hood, you can take a look at the main.bicep file in the infra folder, which defines the infrastructure as code for this sample.

Test the MCP server

While the deployment is running, you can run the MCP server and API locally (even in Codespaces) to see how it works. Open another terminal and run:

```bash
npm start
```

This will start all services locally, including the Burger API and the MCP server, which will be available at http://localhost:3000/mcp. This may take a few seconds; wait until you see this message in the terminal:

🚀 All services ready 🚀

When these services are running without Azure resources provisioned, they will use in-memory data instead of Cosmos DB so you can experiment freely with the API and MCP server, though the agent won't be functional as it requires a LLM resource.

MCP tools

The MCP server exposes the following tools, which the agent can use to interact with the burger ordering system:

| Tool Name | Description |
| --- | --- |
| get_burgers | Get a list of all burgers in the menu |
| get_burger_by_id | Get a specific burger by its ID |
| get_toppings | Get a list of all toppings in the menu |
| get_topping_by_id | Get a specific topping by its ID |
| get_topping_categories | Get a list of all topping categories |
| get_orders | Get a list of all orders in the system |
| get_order_by_id | Get a specific order by its ID |
| place_order | Place a new order with burgers (requires userId, optional nickname) |
| delete_order_by_id | Cancel an order if it has not yet been started (status must be pending; requires userId) |

You can test these tools using the MCP Inspector. Open another terminal and run:

```bash
npx -y @modelcontextprotocol/inspector
```

Then open the URL printed in the terminal in your browser and connect using these settings:

- Transport: Streamable HTTP
- URL: http://localhost:3000/mcp
- Connection Type: Via Proxy (should be default)

Click on Connect, then try listing the tools first, and run the get_burgers tool to get the menu info.

Test the Agent Web App

After the deployment is completed, you can run the command npm run env to print the URLs of the deployed services. Open the Agent Web App URL in your browser (it should look like https://<your-web-app>.azurestaticapps.net).
You'll first be greeted by an authentication page; you can sign in either with your GitHub or Microsoft account, and then you should be able to access the chat interface. From there, you can start asking any question or use one of the suggested prompts; for example, try asking: "Recommend me an extra spicy burger". As the agent processes your request, you'll see the response streaming in real-time, along with the intermediate steps and tool calls. Once the response is complete, you can also unfold the debug panel to see the full reasoning chain and the tools that were invoked.

Tip: Our agent service also sends detailed tracing data using OpenTelemetry. You can explore these either in Azure Monitor for the deployed service, or locally using an OpenTelemetry collector. We'll cover this in more detail in a future post.

Wrap it up

Congratulations, you just finished spinning up a full-stack serverless AI agent using LangChain.js v1, MCP tools, and Azure’s serverless platform. Now it's your turn to dive into the code and extend it for your use cases! 😎 And don't forget to azd down once you're done to avoid any unwanted costs.

Going further

This was just a quick introduction to this sample, and you can expect more in-depth posts and tutorials soon. Since we're in the era of AI agents, we've also made sure that this sample can be explored and extended easily with code agents like GitHub Copilot. We even built a custom chat mode to help you discover and understand the codebase faster! Check out the Copilot setup guide in the repo to get started. If you like this sample, don't forget to star the repo ⭐️! You can also join us in the Azure AI community Discord to chat and ask any questions. Happy coding and burger ordering! 🍔