artifical intelligence
40 TopicsBuilding Knowledge-Grounded Conversational AI Agents with Azure Speech Photo Avatars
From Chat to Presence: The Next Step in Conversational AI Chat agents are now embedded across nearly every industry, from customer support on websites to direct integrations inside business applications designed to boost efficiency and productivity. As these agents become more capable and more visible, user expectations are also rising: conversations should feel natural, trustworthy, and engaging. While text‑only chat agents work well for many scenarios, voice‑enabled agents take a meaningful step forward by introducing a clearer persona and a stronger sense of presence, making interactions feel more human and intuitive (see healow Genie success story). In domains such as Retail, Healthcare, Education, and Corporate Training, adding a visual dimension through AI avatars further elevates the experience. Pairing voice with a lifelike visual representation improves inclusiveness, reduces interaction friction, and helps users better contextualize conversations—especially in scenarios that rely on trust, guidance, or repeated engagement. To support these experiences, Microsoft offers two AI avatar options through Azure Speech: Video Avatars, which are generally available and provide full‑ or partial‑body immersive representations, and Photo Avatars, currently in public preview, which deliver a headshot‑style visual well suited for web‑based agents and digital twin scenarios. Both options support custom avatars, enabling organizations to reflect their brand identity rather than relying solely on generic representations (see W2M custom video avatar). Choosing between Video Avatars and Photo Avatars is less about preference and more about intent. Video Avatars offer higher visual fidelity and immersion but require more extensive onboarding, such as high-quality recorded video of an avatar talent. Photo Avatars, by contrast, can be created from a single image, enabling a lighter‑weight onboarding process while still delivering a human‑centered experience. The right choice depends on the desired interaction style, visual presence, and target deployment scenario. What this solution demonstrates In this post, I walk through how to integrate Azure Speech Photo Avatars — powered by Microsoft Research's VASA-1 model — into a knowledge‑grounded conversational AI agent built on Azure AI Search. The goal is to show how voice, visuals, and retrieval‑augmented generation (RAG) can come together to create a more natural and engaging agent experience. The solution exposes a web‑based interface where users can speak naturally to the AI agent using their voice. The agent responds in real time using synthesized speech, while live transcriptions of the conversation are displayed in the UI to improve clarity and accessibility. To help compare different interaction patterns, the sample application supports three modes: 1) Photo Avatar mode, which adds a lifelike visual presence. 2) Video Avatar mode, which provides a more immersive, full‑motion experience. 3) Voice‑only mode, which focuses purely on speech‑to‑speech interaction. Key architectural components An end‑to‑end architecture for the solution is shown in the diagram below. The solution is composed of the following core services and building blocks: Microsoft Foundry — provides the platform for deploying, managing, and accessing the foundation models used by the application. Azure OpenAI — provides the Realtime API for speech‑to‑speech interaction in the voice‑only mode and the Chat Completions API used by backend services for reasoning and conversational responses. gpt‑4.1 — LLM used for reasoning tasks such as deciding when to invoke tool calls and summarizing responses. gpt-realtime-mini — LLM used for speech-to-speech interaction in the Voice-only mode. text‑embedding‑3‑large — LLM used for generating vector embeddings used in retrieval‑augmented generation. Azure Speech — delivers the real‑time speech‑to‑text (STT), text‑to‑speech (TTS), and AI avatars capabilities for both Photo Avatar and Video Avatar experiences. Azure Document Intelligence — extracts structured text, layout, and key information from source documents used to build the knowledge base. Azure AI Search — provides vector‑based retrieval to ground the language model with relevant, context‑aware content. Azure Container Apps — hosts the web UI frontend, backend services, and MCP server within a managed container runtime. Azure Container Apps Environment — defines a secure and isolated boundary for networking, scaling, and observability of the containerized workloads. Azure Container Registry — stores and manages Docker images used by the container applications. How you can try it yourself The complete sample implementation is available in the LiveChat AI Voice Assistant repository, which includes instructions for deploying the solution into your Azure environment. The repository uses Infrastructure as Code (IaC) deployment via Azure Developer CLI (azd) to orchestrate Azure resource provisioning and application deployment. Prerequisites: An Azure subscription with appropriate services and models' quota is required to deploy the solution. Getting the solution up and running in just three simple steps: Clone the repository and navigate to the project git clone https://github.com/mardianto-msft/azure-speech-ai-avatars.git cd azure-speech-ai-avatars Authenticate with Azure azd auth login Initialize and deploy the solution azd up Once deployed, you can access the sample application by opening the frontend service URL in a web browser. To demonstrate knowledge grounding, the sample includes source documents derived from Microsoft’s 2025 Annual Report and Shareholder Letter. These grounding documents can optionally be replaced with your own data, allowing the same architecture to be reused for domain‑specific or enterprise scenarios. When using the provided sample documents, you can ask questions such as: “How much was Microsoft’s net income in 2025?”, “What are Microsoft’s priorities according to the shareholder letter?”, “Who is Microsoft’s CEO?” Bringing Conversational AI Agents to Life This implementation of Azure Speech Photo Avatars serves as a practical starting point for building more engaging, knowledge‑grounded conversational AI agents. By combining voice interaction, visual presence, and retrieval‑augmented generation, Photo Avatars offer a lightweight yet powerful way to make AI agents feel more approachable, trustworthy, and human‑centered — especially in web‑based and enterprise scenarios. From here, the solution can be extended over time with capabilities such as long‑term memory, richer personalization, or more advanced multi‑agent orchestration. Whether used as a reference architecture or as the foundation for a production system, this approach demonstrates how Azure Speech Photo Avatars can help bridge the gap between conversational intelligence and meaningful user experience. By emphasizing accessibility, trust, and human‑centered design, it reflects Microsoft’s broader mission to empower every person and every organization on the planet to achieve more.152Views0likes0CommentsWhat’s trending on Hugging Face: PubMedBERT Base Embeddings, Paraphrase Multilingual MiniLM, BGE-M3
The embedding model landscape has evolved beyond one-size-fits-all solutions. Today’s developers navigate a set of deliberate trade‑offs: domain specialization to improve accuracy in vertical applications, multilingual capabilities to support global use cases, and retrieval strategies that optimize performance at scale. Once a model demonstrates strong semantic performance, predictable behavior, and broad community support, it often becomes a trusted reference baseline that developers build around and deploy with confidence. This week, we’re not spotlighting models that are new to Microsoft Foundry. Instead, we’re turning our attention to models that have managed to stay relevant in a rapidly expanding sea of options. This week's Model Monday's edition highlights three Hugging Face models including NeuML's PubMedBERT Base Embeddings for domain-specific medical text understanding, Sentence Transformers' Paraphrase Multilingual MiniLM for lightweight cross-lingual semantic similarity, and BAAI's BGE-M3 for multi-functional long-context retrieval across 100+ languages. Models of the week NeuML: PubMedBERT Base Embeddings Model Specs Parameters / size: 109M Context length: 512 tokens Primary task: Embeddings (medical domain) Why it's interesting Domain-specific performance gains: Fine-tuned on PubMed title-abstract pairs, achieving 95.62% average Pearson correlation across medical benchmarks—outperforming general-purpose models like gte-base (95.37%), bge-base-en-v1.5 (93.78%), and all-MiniLM-L6-v2 (93.46%) on medical literature tasks Production-validated for medical RAG: With 141K downloads and deployment in 30+ medical AI applications, this model demonstrates consistent real-world performance for clinical research, drug discovery, and biomedical semantic search pipelines Built on Microsoft's BiomedNLP foundation: Extends BioMed BERT family with sentence-transformers mean pooling, creating 768-dimensional embeddings optimized for medical literature clustering and retrieval Try it Clinical research sample prompt: Industry specific sample prompt: You're building a clinical decision support system for oncology. Deploy PubMedBERT Base Embeddings in Microsoft Foundry to index 50,000 recent cancer research abstracts from PubMed. A physician queries: "What are the cardiotoxicity risks of combining checkpoint inhibitors with anthracycline chemotherapy in elderly patients?" Embed the query, retrieve the top 10 most semantically similar abstracts using cosine similarity, and return citations with PubMed IDs for evidence-based treatment planning. Sentence Transformers: Paraphrase Multilingual MiniLM L12 v2 Model Specs Parameters / size: 117M Context length: 128 tokens Primary task: Embeddings (multilingual, sentence similarity) Why it's interesting Multilingual adoption: Supports 50+ languages including Arabic, Chinese, Hebrew, Hindi, Japanese, Korean, Russian, Thai, and Vietnamese—with 18.4 million downloads last month demonstrating production-scale validation across global deployments Compact architecture for edge deployment: At 117M parameters producing 384-dimensional embeddings, this model balances multilingual coverage with inference efficiency, making it ideal for resource-constrained environments or high-throughput applications Sentence-BERT foundation: Based on the influential Sentence-BERT paper (Reimers & Gurevych, 2019), using siamese BERT networks with mean pooling to create semantically meaningful sentence embeddings for clustering, paraphrase detection, and cross-lingual search Community-proven versatility: With 299 fine-tuned variants and 100+ Spaces implementations, this model serves as a peer reviewed starting point for multilingual semantic similarity tasks, from customer support ticket routing to cross-lingual document retrieval Try it E-commerce sample prompt: You're building a global customer support platform for an e-commerce company operating in 30 countries. Deploy Paraphrase Multilingual MiniLM in Microsoft Foundry to process incoming support tickets in English, Spanish, French, German, Portuguese, Japanese, and Korean. Embed each ticket as a 384-dimensional vector and cluster by semantic similarity to automatically route issues to specialized teams (payment, shipping, returns, technical). Flag duplicate tickets with cosine similarity > 0.85 to prevent redundant responses. BAAI: BGE-M3 Model Specs Parameters / size: ~560M Context length: 8192 tokens Primary task: Embeddings (multi-functional: dense, sparse, multi-vector) Why it's interesting Three retrieval modes in one model: Uniquely supports dense retrieval (1024-dim embeddings), sparse retrieval (lexical matching like BM25), and multi-vector retrieval (ColBERT-style fine-grained matching)—enabling hybrid search pipelines without maintaining separate models or indexes Exceptional long-context capability: 8192-token context window handles full documents, legal contracts, research papers, and lengthy technical content—validated on MLDR (13-language document retrieval) and NarrativeQA (long-form question answering) benchmarks Multilingual dominance: Outperforms OpenAI embeddings on MIRACL multilingual retrieval across 13+ languages and demonstrates strong zero-shot cross-lingual transfer on MKQA. Try it Legal document search sample prompt: You're building a legal document search system for a multinational law firm. Deploy BGE-M3 in Microsoft Foundry to index 5,000 full-length commercial contracts (average 6,000 tokens each) in English, French, German, and Spanish. A lawyer queries: "Find all force majeure clauses that exclude liability for pandemics or global health emergencies." Use hybrid retrieval: (1) dense embeddings for semantic similarity to capture concept variations like "Act of God" or "unforeseen circumstances", (2) sparse retrieval for exact keyword matches on "force majeure", "pandemic", "health emergency". Combine scores with weighted sum (0.6 dense + 0.4 sparse) and return top 15 contract sections with clause numbers and jurisdiction metadata. Getting started You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry76Views0likes0CommentsFoundry IQ: Unlocking ubiquitous knowledge for agents
Introducing Foundry IQ by Azure AI Search in Microsoft Foundry. Foundry IQ is a centralized knowledge layer that connects agents to data with the next generation of retrieval-augmented generation (RAG). Foundry IQ includes the following features: Knowledge bases: Available directly in the new Foundry portal, knowledge bases are reusable, topic-centric collections that ground multiple agents and applications through a single API. Automated indexed and federated knowledge sources – Expand what data an agent can reach by connecting to both indexed and remote knowledge sources. For indexed sources, Foundry IQ delivers automatic indexing, vectorization, and enrichment for text, images, and complex documents. Agentic retrieval engine in knowledge bases – A self-reflective query engine that uses AI to plan, select sources, search, rank and synthesize answers across sources with configurable “retrieval reasoning effort.” Enterprise-grade security and governance – Support for document-level access control, alignment with existing permissions models, and options for both indexed and remote data. Foundry IQ is available in public preview through the new Foundry portal and Azure portal with Azure AI Search. Foundry IQ is part of Microsoft's intelligence layer with Fabric IQ and Work IQ.34KViews6likes2CommentsNow in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image
This week's spotlight features three models from that demonstrate enterprise-grade AI across the full scope of modalities. From low latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains. This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding agent capabilities with 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows. Models of the week Qwen: Qwen3-Coder-Next Model Specs Parameters / size: 80B total (3B activated) Context length: 262,144 tokens Primary task: Text generation (coding agents, tool use) Why it's interesting Extreme efficiency: Activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware Built for agentic workflows: Excels at long-horizon reasoning, complex tool usage, and recovering from execution failures, a critical capability for autonomous development that go beyond simple code completion Benchmarks: Competitive performance with significantly larger models on SWE-bench and coding benchmarks (Technical Report) Try it Use Case Prompt Pattern Code generation with tool use Provide task context, available tools, and execution environment details Long-context refactoring Include full codebase context within 256k window with specific refactoring goals Autonomous debugging Present error logs, stack traces, and relevant code with failure recovery instructions Multi-file code synthesis Describe architecture requirements and file structure expectations Financial services sample prompt: You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage. Qwen: Qwen3-ASR-1.7B Model Specs Parameters / size: 1.7B Context length: 256 tokens (default), configurable up to 4096 Primary task: Automatic speech recognition (multilingual) Why it's interesting All-in-one multilingual capability: Single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions—eliminating the need to manage separate models per language Specialized audio versatility: Transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows State-of-the-art accuracy: Outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks. English: Tedlium 4.50 WER vs 7.69/6.15/6.84; Chinese: WenetSpeech 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper) Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines Try it Use Case Prompt Pattern Multilingual transcription Send audio files via API with automatic language detection Call center analytics Process customer service recordings to extract transcripts and identify languages Content moderation Transcribe user-generated audio content across multiple languages Meeting transcription Convert multilingual meeting recordings to text for documentation Customer support sample prompt: Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis. Tongyi-MAI: Z-Image Model Specs Parameters / size: 6B Context length: N/A (text-to-image) Primary task: Text-to-image generation Why it's interesting Undistilled foundation model: Full-capacity base without distillation preserves complete training signal with Classifier-Free Guidance support (a technique that improves prompt adherence and output quality), enabling complex prompt engineering and negative prompting that distilled models cannot achieve High output diversity: Generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency Aesthetic versatility: Handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper) Try it Use Case Prompt Pattern Multilingual transcription Send audio files via API with automatic language detection Call center analytics Process customer service recordings to extract transcripts and identify languages Content moderation Transcribe user-generated audio content across multiple languages Meeting transcription Convert multilingual meeting recordings to text for documentation E-commerce sample prompt: Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows. Getting started You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry653Views0likes0CommentsWhat is trending in Hugging Face on Microsoft Foundry? Feb, 2, 2026
Open‑source AI is moving fast, with important breakthroughs in reasoning, agentic systems, multimodality, and efficiency emerging every day. Hugging Face has been a leading platform where researchers, startups, and developers share and discover new models. Microsoft Foundry brings these trending Hugging Face models into a production‑ready experience, where developers can explore, evaluate, and deploy them within their Azure environment. Our weekly Model Monday’s series highlights Hugging Face models available in Foundry, focusing on what matters most to developers: why a model is interesting, where it fits, and how to put it to work quickly. This week’s Model Mondays edition highlights three Hugging Face models, including a powerful Mixture-of-Experts model from Z. AI designed for lightweight deployment, Meta’s unified foundation model for image and video segmentation, and MiniMax’s latest open-source agentic model optimized for complex workflows. Models of the week Z.AI’s GLM-4.7-flash Model Basics Model name: zai-org/GLM-4.7-Flash Parameters / size: 30B total -3B active Default settings: 131,072 max new tokens Primary task: Agentic, Reasoning and Coding Why this model matters Why it’s interesting: It utilizes a Mixture-of-Experts (MoE) architecture (30B total parameters and 3B active parameters) to offer a new option for lightweight deployment. It demonstrates strong performance on logic and reasoning benchmarks, outperforming similar sized models like gpt-oss-20b on AIME 25 and GPQA benchmarks. It supports advanced inference features like "Preserved Thinking" mode for multi-turn agentic tasks. Best‑fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications. What’s notable: From the Foundry catalog, users can deploy on a A100 instance or unsloth/GLM-4.7-Flash-GGUF on a CPU. ource SOTA scores among models of comparable size. Additionally, compared to similarly sized models, GLM-4.7-Flash demonstrates superior frontend and backend development capabilities. Click to see more: https://docs.z.ai Try it Use case Best‑practice prompt pattern Agentic coding (multi‑step repo work, debugging, refactoring) Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step‑by‑step execution, then a single consolidated result. Long‑context agent workflows (local or low‑cost autonomous agents) Call out long‑horizon consistency and context preservation. Instruct the model to retain earlier assumptions and decisions across turns. Now that you know GLM‑4.7‑Flash works best when you give it a clear goal and let it reason through a bounded task, here’s an example prompt that a product or engineering team might use to identify risks and propose mitigations: You are a software reliability analyst for a mid‑scale SaaS platform. Review recent incident reports, production logs, and customer issues to uncover edge‑case failures outside normal usage (e.g., rare inputs, boundary conditions, timing/concurrency issues, config drift, or unexpected feature interactions). Prioritize low‑frequency, high‑impact risks that standard testing misses. Recommend minimal, low‑cost fixes (validation, guardrails, fallback logic, or documentation). Deliver a concise executive summary with sections: Observed Edge Cases, Root Causes, User Impact, Recommended Lightweight Fixes, and Validation Steps. Meta's Segment Anything 3 (SAM3) Model Basics Model name: facebook/sam3 Parameters / size: 0.9B Primary task: Mask Generation, Promptable Concept Segmentation (PCS) Why this model matters Why it’s interesting: It handles a vastly larger set of open-vocabulary prompts than SAM 2, and unifies image and video segmentation capabilities. It includes a "SAM 3 Tracker" mode that acts as a drop-in replacement for SAM 2 workflows with improved performance. Best‑fit use cases: Open-vocabulary object detection, video object tracking, and automatic mask generation What’s notable: Introduces Promptable Concept Segmentation (PCS), allowing users to find all matching objects (e.g., "dial") via text prompt rather than just single instances. Try it This model enables users to identify specific objects within video footage and isolate them over extended periods. With just one line of code, it is possible to detect multiple similar objects simultaneously. The accompanying GIF demonstrates how SAM3 efficiently highlights players wearing white on the field as they appear and disappear from view. Additional examples are available at the following repository: https://github.com/facebookresearch/sam3/blob/main/assets/player.gif Use case Best‑practice prompt pattern Agentic coding (multi‑step repo work, debugging, refactoring) Treat SAM 3 as a concept detector, not an interactive click tool. Use short, concrete noun‑phrase concept prompts instead of describing the scene or asking questions. Example prompt: “yellow school bus” or “shipping containers”. Avoid verbs or full sentences. Video segmentation + object tracking Specify the same concept prompt once, then apply it across the video sequence. Do not restate the prompt per frame. Let the model maintain identity continuity. Example: “person wearing a red jersey”. Hard‑to‑name or visually subtle objects Use exemplar‑based prompts (image region or box) when text alone is ambiguous. Optionally combine positive and negative exemplars to refine the concept. Avoid over‑constraining with long descriptions. Using the GIF above as a leading example, here is a prompt that shows how SAM 3 turns raw sports footage into structured, reusable data. By identifying and tracking players based on visual concepts like jersey color so that sports leagues can turn tracked data into interactive experiences where automated player identification can relay stats, fun facts, etc when built into a larger application. Here is a prompt that will allow you to start identifying specific players across video: Act as a sports analytics operator analyzing football match footage. Segment and track all football players wearing blue jerseys across the video. Generate pixel‑accurate segmentation masks for each player and assign persistent instance IDs that remain stable during camera movement, zoom, and player occlusion. Exclude referees, opposing team jerseys, sidelines, and crowd. Output frame‑level masks and tracking metadata suitable for overlays, player statistics, and downstream analytics pipelines. MiniMax AI's MiniMax-M2.1 Model Basics Model name: MiniMaxAI/MiniMax-M2.1 Parameters / size: 229B-10B Active Default settings: 200,000 max new tokens Primary task: Agentic and Coding Why this model matters Why it’s interesting: It is optimized for robustness in coding, tool use, and long-horizon planning, outperforming Claude Sonnet 4.5 in multilingual scenarios. It excels in full-stack application development, capable of architecting apps "from zero to one”. Previous coding models focused on Python optimization, M2.1 brings enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The model delivers exceptional stability across various coding agent frameworks. Best‑fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications. What’s notable: The release of open-source weights for M2.1 delivers a massive leap over M2 on software engineering leaderboards. https://www.minimax.io/ Try it Use case Best‑practice prompt pattern End‑to‑end agentic coding (multi‑file edits, run‑fix loops) Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step‑by‑step execution, then a single consolidated result. Long‑horizon tool‑using agents (shell, browser, Python) Explicitly request stepwise planning and sequential tool use. M2.1’s interleaved thinking and improved instruction‑constraint handling are designed for complex, multi‑step analytical tasks that require evidence tracking and coherent synthesis, not conversational back‑and‑forth. Long‑context reasoning & analysis (large documents / logs) Declare the scope and desired output structure up front. MiniMax‑M2.1 performs best when the objective and final artifact are clear, allowing it to manage long context and maintain coherence. Because MiniMax‑M2.1 is designed to act as a long‑horizon analytical agent, it shines when you give it a clear end goal and let it work through large volumes of information—here’s a prompt a risk or compliance team could use in practice: You are a financial risk analysis agent. Analyze the following transaction logs and compliance policy documents to identify potential regulatory violations and systemic risk patterns. Plan your approach before executing. Work through the data step by step, referencing evidence where relevant. Deliver a final report with the following sections: Key Risk Patterns Identified, Supporting Evidence, Potential Regulatory Impact, Recommended Mitigations. Your response should be a complete, executive-ready report, not a conversational draft. Getting started You can deploy open‑source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry840Views0likes0CommentsBeyond the Model: Empower your AI with Data Grounding and Model Training
Discover how Microsoft Foundry goes beyond foundational models to deliver enterprise-grade AI solutions. Learn how data grounding, model tuning, and agentic orchestration unlock faster time-to-value, improved accuracy, and scalable workflows across industries.808Views6likes4CommentsAnswer synthesis in Foundry IQ: Quality metrics across 10,000 queries
With answers, you can control your entire RAG pipeline directly in Foundry IQ by Azure AI Search, without integrations. Responding only when the data supports it, answers delivers grounded, steerable, citation-rich responses and traces each piece of information to its original source. Here’s how it works and how it performed across our experiments.860Views0likes0CommentsPublishing Agents from Microsoft Foundry to Microsoft 365 Copilot & Teams
Better Together is a series on how Microsoft’s AI platforms work seamlessly to build, deploy, and manage intelligent agents at enterprise scale. As organizations embrace AI across every workflow, Microsoft Foundry, Microsoft 365, Agent 365, and Microsoft Copilot Studio are coming together to deliver a unified approach—from development to deployment to day-to-day operations. This three-part series explores how these technologies connect to help enterprises build AI agents that are secure, governed, and deeply integrated with Microsoft’s product ecosystem. Series Overview Part 1: Publishing from Foundry to Microsoft 365 Copilot and Microsoft Teams Part 2: Foundry + Agent 365 — Native Integration for Enterprise AI Part 3: Microsoft Copilot Studio Integration with Foundry Agents This blog focuses on Part 1: Publishing from Foundry to Microsoft 365 Copilot—how developers can now publish agents built in Foundry directly to Microsoft 365 Copilot and Teams in just a few clicks. Build once. Publish everywhere. Developers can now take an AI agent built in Microsoft Foundry and publish it directly to Microsoft 365 Copilot and Microsoft Teams in just a few clicks. The new streamlined publishing flow eliminates manual setup across Entra ID, Azure Bot Service, and manifest files, turning hours of configuration into a seamless, guided flow in the Foundry Playground. Simplifying Agent Publishing for Microsoft 365 Copilot & Microsoft Teams Previously, deploying a Foundry AI agent into Microsoft 365 Copilot and Microsoft Teams required multiple steps: app registration, bot provisioning, manifest editing, and admin approval. With the new Foundry → M365 integration, the process is straightforward and intuitive. Key capabilities No-code publishing — Prepare, package, and publish agents directly from Foundry Playground. Unified build — A single agent package powers multiple Microsoft 365 channels, including Teams Chat, Microsoft 365 Copilot Chat, and BizChat. Agent-type agnostic — Works seamlessly whether you have a prompt agent, hosted agent, or workflow agent. Built-in Governance — Every agent published to your organization is automatically routed through Microsoft 365 Admin Center (MAC) for review, approval, and monitoring. Downloadable package — Developers can download a .zip for local testing or submission to the Microsoft Marketplace. For pro-code developers, the experience is also simplified. A C# code-first sample in the Agent Toolkit for Visual Studio is searchable, featured, and ready to use. Why It Matters This integration isn’t just about convenience; it’s about scale, control, and trust. Faster time to value — Deliver intelligent agents where people already work, without infrastructure overhead. Enterprise control — Admins retain full oversight via Microsoft 365 Admin Center, with built-in approval, review and governance flows. Developer flexibility — Both low-code creators and pro-code developers benefit from the unified publishing experience. Better Together — This capability lays the groundwork for Agent 365 publishing and deeper M365 integrations. Real-world scenarios YoungWilliams built Priya, an AI agent that helps handle government service inquiries faster and more efficiently. Using the one-click publishing flow, Priya was quickly deployed to Microsoft Teams and M365 Copilot without manual setup. This allowed Young Williams’ customers to provide faster, more accurate responses while keeping governance and compliance intact. “Integrating Microsoft Foundry with Microsoft 365 Copilot fundamentally changed how we deliver AI solutions to our government partners,” said John Tidwell, CTO of YoungWilliams. “With Foundry’s one-click publishing to Teams and Copilot, we can take an idea from prototype to production in days instead of weeks—while maintaining the enterprise-grade security and governance our clients expect. It’s a game changer for how public services can adopt AI responsibly and at scale.” Availability Publishing from Foundry to M365 is in Public Preview within the Foundry Playground. Developers can explore the preview in Microsoft Foundry and test the Teams / M365 publishing flow today. SDK and CLI extensions for code-first publishing are generally available. What’s Next in the Better Together Series This blog is part of the broader Better Together series connecting Microsoft Foundry, Microsoft 365, Agent 365, and Microsoft Copilot Studio. Continue the journey: Foundry + Agent 365 — Native Integration for Enterprise AI (Link) Start building today [Quickstart — Publish an Agent to Microsoft 365 ] Try it now in the new Foundry Playground2.6KViews0likes2CommentsOptiMind: A small language model with optimization expertise
Turning a real world decision problem into a solver ready optimization model can take days—sometimes weeks—even for experienced teams. The hardest part is often not solving the problem; it’s translating business intent into precise mathematical objectives, constraints, and variables. OptiMind is designed to try and remove that bottleneck. This optimization‑aware language model translates natural‑language problem descriptions into solver‑ready mathematical formulations, can help organizations move from ideas to decisions faster. Now available through public preview as an experimental model through Microsoft Foundry, OptiMind targets one of the more expertise‑intensive steps in modern optimization workflows. Addressing the Optimization Bottleneck Mathematical optimization underpins many enterprise‑critical decisions—from designing supply chains and scheduling workforces to structuring financial portfolios and deploying networks. While today’s solvers can handle enormous and complex problem instances, formulating those problems remains a major obstacle. Defining objectives, constraints, and decision variables is an expertise‑driven process that often takes days or weeks, even when the underlying business problem is well understood. OptiMind tries to address this gap by automating and accelerating formulation. Developed by Microsoft Research, OptiMind transforms what was once a slow, error‑prone modeling task into a streamlined, repeatable step—freeing teams to focus on decision quality rather than syntax. What makes OptiMind different? OptiMind is not just as a language model, but as a specialized system built for real-world optimization tasks. Unlike general-purpose large language models adapted for optimization through prompting, OptiMind is purpose-built for mixed integer linear programming (MILP), and its design reflects this singular focus. At inference time, OptiMind follows a multi‑stage process: Problem classification (e.g., scheduling, routing, network design) Hint retrieval tailored to the identified problem class Solution generation in solver‑compatible formats such as GurobiPy Optional self‑correction, where multiple candidate formulations are generated and validated This design can improve reliability without relying on agentic orchestration or multiple large models. In internal evaluations on cleaned public benchmarks—including IndustryOR, Mamo‑Complex, and OptMATH—OptiMind demonstrated higher formulation accuracy than similarly sized open models and competitive performance relative to significantly larger systems. OptiMind improved accuracy by approximately 10 percent over the base model. In comparison to open-source models under 32 billion parameters, OptiMind was also found to match or exceed performance benchmarks. For more information on the model, please read the official research blog or the technical paper for OptiMind. Practical use cases: Unlocking efficiency across domains OptiMind is especially valuable where modeling effort—not solver capability—is the primary bottleneck. Typical use cases include: Supply Chain Network Design: Faster formulation of multi‑period network models and logistics flows Manufacturing and Workforce Scheduling: Easier capacity planning under complex operational constraints Logistics and Routing Optimization: Rapid modeling that captures real‑world constraints and variability Financial Portfolio Optimization: More efficient exploration of portfolios under regulatory and market constraints By reducing the time and expertise required to move from problem description to validated model, OptiMind helps teams reach actionable decisions faster and with greater confidence. Getting started OptiMind is available today as an experimental model, and Microsoft Research welcomes feedback from practitioners and enterprise teams. Next steps: Explore the research details: Read more about the model on Foundry Labs and the technical paper on arXiv Try the model: Access OptiMind through Microsoft Foundry Test sample code: Available in the OptiMind GitHub repository Take the next step in optimization innovation with OptiMind—empowering faster, more accurate, and cost-effective problem solving for the future of decision intelligence.1.5KViews0likes0CommentsFrom Zero to Hero: AgentOps - End-to-End Lifecycle Management for Production AI Agents
The shift from proof-of-concept AI agents to production-ready systems isn't just about better models—it's about building robust infrastructure that can develop, deploy, and maintain intelligent agents at enterprise scale. As organizations move beyond simple chatbots to agentic systems that plan, reason, and act autonomously, the need for comprehensive Agent LLMOps becomes critical. This guide walks through the complete lifecycle for building production AI agents, from development through deployment to monitoring, with special focus on leveraging Azure AI Foundry's hosted agents infrastructure. The Evolution: From Single-Turn Prompts to Agentic Workflows Traditional AI applications operated on a simple request-response pattern. Modern AI agents, however, are fundamentally different. They maintain state across multiple interactions, orchestrate complex multi-step workflows, and dynamically adapt their approach based on intermediate results. According to recent analysis, agentic workflows represent systems where language models and tools are orchestrated through a combination of predefined logic and dynamic decision-making. Unlike monolithic systems where a single model attempts everything, production agents break down complex tasks into specialized components that collaborate effectively. The difference is profound. A simple customer service chatbot might answer questions from a knowledge base. An agentic customer service system, however, can search multiple data sources, escalate to specialized sub-agents for technical issues, draft response emails, schedule follow-up tasks, and learn from each interaction to improve future responses. Stage 1: Development with any agentic framework Why LangGraph for Agent Development? LangGraph has emerged as a leading framework for building stateful, multi-agent applications. Unlike traditional chain-based approaches, LangGraph uses a graph-based architecture where each node represents a unit of work and edges define the workflow paths between them. The key advantages include: Explicit State Management: LangGraph maintains persistent state across nodes, making it straightforward to track conversation history, intermediate results, and decision points. This is critical for debugging complex agent behaviors. Visual Workflow Design: The graph structure provides an intuitive way to visualize and understand agent logic. When an agent misbehaves, you can trace execution through the graph to identify where things went wrong. Flexible Control Flows: LangGraph supports diverse orchestration patterns—single agent, multi-agent, hierarchical, sequential—all within one framework. You can start simple and evolve as requirements grow. Built-in Memory: Agents automatically store conversation histories and maintain context over time, enabling rich personalized interactions across sessions. Core LangGraph Components Nodes: Individual units of logic or action. A node might call an AI model, query a database, invoke an external API, or perform data transformation. Each node is a Python function that receives the current state and returns updates. Edges: Define the workflow paths between nodes. These can be conditional (routing based on the node's output) or unconditional (always proceeding to the next step). State: The data structure passed between nodes and updated through reducers. Proper state design is crucial—it should contain all information needed for decision-making while remaining manageable in size. Checkpoints: LangGraph's checkpointing mechanism saves state at each node, enabling features like human-in-the-loop approval, retry logic, and debugging. Implementing the Agentic Workflow Pattern A robust production agent typically follows a cyclical pattern of planning, execution, reflection, and adaptation: Planning Phase: The agent analyzes the user's request and creates a structured plan, breaking complex problems into manageable steps. Execution Phase: The agent carries out planned actions using appropriate tools—search engines, calculators, code interpreters, database queries, or API calls. Reflection Phase: After each action, the agent evaluates results against expected outcomes. This critical thinking step determines whether to proceed, retry with a different approach, or seek additional information. Decision Phase: Based on reflection, the agent decides the next course of action—continue to the next step, loop back to refine the approach, or conclude with a final response. This pattern handles real-world complexity far better than simple linear workflows. When an agent encounters unexpected results, the reflection phase enables adaptive responses rather than brittle failure. Example: Building a Research Agent with LangGraph from langgraph.graph import StateGraph, END from langchain_openai import ChatOpenAI from typing import TypedDict, List class AgentState(TypedDict): query: str plan: List[str] current_step: int research_results: List[dict] final_answer: str def planning_node(state: AgentState): # Agent creates a research plan llm = ChatOpenAI(model="gpt-4") plan = llm.invoke(f"Create a research plan for: {state['query']}") return {"plan": plan, "current_step": 0} def research_node(state: AgentState): # Execute current research step step = state['plan'][state['current_step']] # Perform web search, database query, etc. results = perform_research(step) return {"research_results": state['research_results'] + [results]} def reflection_node(state: AgentState): # Evaluate if we have enough information if len(state['research_results']) >= len(state['plan']): return {"next": "synthesize"} return {"next": "research", "current_step": state['current_step'] + 1} def synthesize_node(state: AgentState): # Generate final answer from all research llm = ChatOpenAI(model="gpt-4") answer = llm.invoke(f"Synthesize research: {state['research_results']}") return {"final_answer": answer} # Build the graph workflow = StateGraph(AgentState) workflow.add_node("planning", planning_node) workflow.add_node("research", research_node) workflow.add_node("reflection", reflection_node) workflow.add_node("synthesize", synthesize_node) workflow.add_edge("planning", "research") workflow.add_edge("research", "reflection") workflow.add_conditional_edges( "reflection", lambda s: s["next"], {"research": "research", "synthesize": "synthesize"} ) workflow.add_edge("synthesize", END) agent = workflow.compile() This pattern scales from simple workflows to complex multi-agent systems with dozens of specialized nodes. Stage 2: CI/CD Pipeline for AI Agents Traditional software CI/CD focuses on code quality, security, and deployment automation. Agent CI/CD must additionally handle model versioning, evaluation against behavioral benchmarks, and non-deterministic behavior. Build Phase: Packaging Agent Dependencies Unlike traditional applications, AI agents have unique packaging requirements: Model artifacts: Fine-tuned models, embeddings, or model configurations Vector databases: Pre-computed embeddings for knowledge retrieval Tool configurations: API credentials, endpoint URLs, rate limits Prompt templates: Versioned prompt engineering assets Evaluation datasets: Test cases for agent behavior validation Best practice is to containerize everything. Docker provides reproducibility across environments and simplifies dependency management: FROM python:3.11-slim WORKDIR /app COPY . user_agent/ WORKDIR /app/user_agent RUN if [ -f requirements.txt ]; then \ pip install -r requirements.txt; \ else \ echo "No requirements.txt found"; \ fi EXPOSE 8088 CMD ["python", "main.py"] Register Phase: Version Control Beyond Git Code versioning is necessary but insufficient for AI agents. You need comprehensive artifact versioning: Container Registry: Azure Container Registry stores Docker images with semantic versioning. Each agent version becomes an immutable artifact that can be deployed or rolled back at any time. Prompt Registry: Version control your prompts separately from code. Prompt changes can dramatically impact agent behavior, so treating them as first-class artifacts enables A/B testing and rapid iteration. Configuration Management: Store agent configurations (model selection, temperature, token limits, tool permissions) in version-controlled files. This ensures reproducibility and enables easy rollback. Evaluate Phase: Testing Non-Deterministic Behavior The biggest challenge in agent CI/CD is evaluation. Unlike traditional software where unit tests verify exact outputs, agents produce variable responses that must be evaluated holistically. Behavioral Testing: Define test cases that specify desired agent behaviors rather than exact outputs. For example, "When asked about product pricing, the agent should query the pricing API, handle rate limits gracefully, and present information in a structured format." Evaluation Metrics: Track multiple dimensions: Task completion rate: Did the agent accomplish the goal? Tool usage accuracy: Did it call the right tools with correct parameters? Response quality: Measured via LLM-as-judge or human evaluation Latency: Time to first token and total response time Cost: Token usage and API call expenses Adversarial Testing: Intentionally test edge cases—ambiguous requests, tool failures, rate limiting, conflicting information. Production agents will encounter these scenarios. Recent research on CI/CD for AI agents emphasizes comprehensive instrumentation from day one. Track every input, output, API call, token usage, and decision point. After accumulating production data, patterns emerge showing which metrics actually predict failures versus noise. Deploy Phase: Safe Production Rollouts Never deploy agents directly to production. Implement progressive delivery: Staging Environment: Deploy to a staging environment that mirrors production. Run automated tests and manual QA against real data (appropriately anonymized). Canary Deployment: Route a small percentage of traffic (5-10%) to the new version. Monitor error rates, latency, user satisfaction, and cost metrics. Automatically rollback if any metric degrades beyond thresholds. Blue-Green Deployment: Maintain two production environments. Deploy to the inactive environment, verify it's healthy, then switch traffic. Enables instant rollback by switching back. Feature Flags: Deploy new agent capabilities behind feature flags. Gradually enable them for specific user segments, gather feedback, and iterate before full rollout. Now since we know how to create an agent using langgraph, the next step will be understand how can we use this langgraph agent to deploy in Azure AI Foundry. Stage 3: Azure AI Foundry Hosted Agents Hosted agents are containerized agentic AI applications that run on Microsoft's Foundry Agent Service. They represent a paradigm shift from traditional prompt-based agents to fully code-driven, production-ready AI systems. When to Use Hosted Agents: ✅ Complex agentic workflows - Multi-step reasoning, branching logic, conditional execution ✅ Custom tool integration - External APIs, databases, internal systems ✅ Framework-specific features - LangGraph graphs, multi-agent orchestration ✅ Production scale - Enterprise deployments requiring autoscaling ✅ Auth- Identity and authentication, Security and compliance controls ✅ CI/CD integration - Automated testing and deployment pipelines Why Hosted Agents Matter Hosted agents bridge the gap between experimental AI prototypes and production systems: For Developers: Full control over agent logic via code Use familiar frameworks and tools Local testing before deployment Version control for agent code For Enterprises: No infrastructure management overhead Built-in security and compliance Scalable pay-as-you-go pricing Integration with existing Azure ecosystem For AI Systems: Complex reasoning patterns beyond prompts Stateful conversations with persistence Custom tool integration and orchestration Multi-agent collaboration Before you get started with Foundry. Deploy Foundry project using the starter code using AZ CLI. # Initialize a new agent project azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic # The template automatically provisions: # - Foundry resource and project # - Azure Container Registry # - Application Insights for monitoring # - Managed identities and RBAC # Deploy azd up The extension significantly reduces the operational burden. What previously required extensive Azure knowledge and infrastructure-as-code expertise now works with a few CLI commands. The extension significantly reduces the operational burden. What previously required extensive Azure knowledge and infrastructure-as-code expertise now works with a few CLI commands. Local Development to Production Workflow A streamlined workflow bridges development and production: Develop Locally: Build and test your LangGraph agent on your machine. Use the Foundry SDK to ensure compatibility with production APIs. Validate Locally: Run the agent locally against the Foundry Responses API to verify it works with production authentication and conversation management. Containerize: Package your agent in a Docker container with all dependencies. Deploy to Staging: Use azd deploy to push to a staging Foundry project. Run automated tests. Deploy to Production: Once validated, deploy to production. Foundry handles versioning, so you can maintain multiple agent versions and route traffic accordingly. Monitor and Iterate: Use Application Insights to monitor agent performance, identify issues, and plan improvements. Azure AI Toolkit for Visual Studio offer this great place to test your hosted agent. You can also test this using REST. FROM python:3.11-slim WORKDIR /app COPY . user_agent/ WORKDIR /app/user_agent RUN if [ -f requirements.txt ]; then \ pip install -r requirements.txt; \ else \ echo "No requirements.txt found"; \ fi EXPOSE 8088 CMD ["python", "main.py"] Once you are able to run agent and test in local playground. You want to move to the next step of registering, evaluating and deploying agent in Microsoft AI Foundry. CI/CD with GitHub Actions This repository includes a GitHub Actions workflow (`.github/workflows/mslearnagent-AutoDeployTrigger.yml`) that automatically builds and deploys the agent to Azure when changes are pushed to the main branch. 1. Set Up Service Principal # Create service principal az ad sp create-for-rbac \ --name "github-actions-agent-deploy" \ --role contributor \ --scopes /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP # Output will include: # - appId (AZURE_CLIENT_ID) # - tenant (AZURE_TENANT_ID) 2. Configure Federated Credentials # For GitHub Actions OIDC az ad app federated-credential create \ --id $APP_ID \ --parameters '{ "name": "github-actions-deploy", "issuer": "https://token.actions.githubusercontent.com", "subject": "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main", "audiences": ["api://AzureADTokenExchange"] }' 3. Set Required Permissions Critical: Service principal needs Azure AI User role on AI Services resource: # Get AI Services resource ID AI_SERVICES_ID=$(az cognitiveservices account show \ --name $AI_SERVICES_NAME \ --resource-group $RESOURCE_GROUP \ --query id -o tsv) # Assign Azure AI User role az role assignment create \ --assignee $SERVICE_PRINCIPAL_ID \ --role "Azure AI User" \ --scope $AI_SERVICES_ID 4. Configure GitHub Secrets Go to GitHub repository → Settings → Secrets and variables → Actions Add the following secrets: AZURE_CLIENT_ID=<from-service-principal> AZURE_TENANT_ID=<from-service-principal> AZURE_SUBSCRIPTION_ID=<your-subscription-id> AZURE_AI_PROJECT_ENDPOINT=<your-project-endpoint> ACR_NAME=<your-acr-name> IMAGE_NAME=calculator-agent AGENT_NAME=CalculatorAgent 5. Create GitHub Actions Workflow Create .github/workflows/deploy-agent.yml: name: Deploy Agent to Azure AI Foundry on: push: branches: - main paths: - 'main.py' - 'custom_state_converter.py' - 'requirements.txt' - 'Dockerfile' workflow_dispatch: inputs: version_tag: description: 'Version tag (leave empty for auto-increment)' required: false type: string permissions: id-token: write contents: read jobs: deploy: runs-on: ubuntu-latest steps: - name: Checkout code uses: actions/checkout@v4 - name: Generate version tag id: version run: | if [ -n "${{ github.event.inputs.version_tag }}" ]; then echo "VERSION=${{ github.event.inputs.version_tag }}" >> $GITHUB_OUTPUT else # Auto-increment version VERSION="v$(date +%Y%m%d-%H%M%S)" echo "VERSION=$VERSION" >> $GITHUB_OUTPUT fi - name: Azure Login (OIDC) uses: azure/login@v1 with: client-id: ${{ secrets.AZURE_CLIENT_ID }} tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - name: Set up Python uses: actions/setup-python@v4 with: python-version: '3.11' - name: Install Azure AI SDK run: | pip install azure-ai-projects azure-identity - name: Build and push Docker image run: | az acr build \ --registry ${{ secrets.ACR_NAME }} \ --image ${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }} \ --image ${{ secrets.IMAGE_NAME }}:latest \ --file Dockerfile \ . - name: Register agent version env: AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }} ACR_NAME: ${{ secrets.ACR_NAME }} IMAGE_NAME: ${{ secrets.IMAGE_NAME }} AGENT_NAME: ${{ secrets.AGENT_NAME }} VERSION: ${{ steps.version.outputs.VERSION }} run: | python - <<EOF import os from azure.ai.projects import AIProjectClient from azure.identity import DefaultAzureCredential from azure.ai.projects.models import ImageBasedHostedAgentDefinition project_endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"] credential = DefaultAzureCredential() project_client = AIProjectClient.from_connection_string( credential=credential, conn_str=project_endpoint, ) agent_name = os.environ["AGENT_NAME"] version = os.environ["VERSION"] image_uri = f"{os.environ['ACR_NAME']}.azurecr.io/{os.environ['IMAGE_NAME']}:{version}" agent_definition = ImageBasedHostedAgentDefinition( image=image_uri, cpu=1.0, memory="2Gi", ) agent = project_client.agents.create_or_update( agent_id=agent_name, body=agent_definition ) print(f"Agent version registered: {agent.version}") EOF - name: Start agent run: | echo "Agent deployed successfully with version ${{ steps.version.outputs.VERSION }}" - name: Deployment summary run: | echo "### Deployment Summary" >> $GITHUB_STEP_SUMMARY echo "- **Agent Name**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY echo "- **Image**: ${{ secrets.ACR_NAME }}.azurecr.io/${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY echo "- **Status**: Deployed" >> $GITHUB_STEP_SUMMARY 6. Add Container Status Verification To ensure deployments are truly successful, add a script to verify container startup before marking the pipeline as complete. Create wait_for_container.py: """ Wait for agent container to be ready. This script polls the agent container status until it's running successfully or times out. Designed for use in CI/CD pipelines to verify deployment. """ import os import sys import time import requests from typing import Optional, Dict, Any from azure.identity import DefaultAzureCredential class ContainerStatusWaiter: """Polls agent container status until ready or timeout.""" def __init__( self, project_endpoint: str, agent_name: str, agent_version: str, timeout_seconds: int = 600, poll_interval: int = 10, ): """ Initialize the container status waiter. Args: project_endpoint: Azure AI Foundry project endpoint agent_name: Name of the agent agent_version: Version of the agent timeout_seconds: Maximum time to wait (default: 10 minutes) poll_interval: Seconds between status checks (default: 10s) """ self.project_endpoint = project_endpoint.rstrip("/") self.agent_name = agent_name self.agent_version = agent_version self.timeout_seconds = timeout_seconds self.poll_interval = poll_interval self.api_version = "2025-11-15-preview" # Get Azure AD token credential = DefaultAzureCredential() token = credential.get_token("https://ml.azure.com/.default") self.headers = { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json", } def _get_container_url(self) -> str: """Build the container status URL.""" return ( f"{self.project_endpoint}/agents/{self.agent_name}" f"/versions/{self.agent_version}/containers/default" ) def get_container_status(self) -> Optional[Dict[str, Any]]: """Get current container status.""" url = f"{self._get_container_url()}?api-version={self.api_version}" try: response = requests.get(url, headers=self.headers, timeout=30) if response.status_code == 200: return response.json() elif response.status_code == 404: return None else: print(f"⚠️ Warning: GET container returned {response.status_code}") return None except Exception as e: print(f"⚠️ Warning: Error getting container status: {e}") return None def wait_for_container_running(self) -> bool: """ Wait for container to reach running state. Returns: True if container is running, False if timeout or error """ print(f"\n🔍 Checking container status for {self.agent_name} v{self.agent_version}") print(f"⏱️ Timeout: {self.timeout_seconds}s | Poll interval: {self.poll_interval}s") print("-" * 70) start_time = time.time() iteration = 0 while time.time() - start_time < self.timeout_seconds: iteration += 1 elapsed = int(time.time() - start_time) container = self.get_container_status() if not container: print(f"[{iteration}] ({elapsed}s) ⏳ Container not found yet, waiting...") time.sleep(self.poll_interval) continue # Extract status information status = ( container.get("status") or container.get("state") or container.get("provisioningState") or "Unknown" ) # Check for replicas information replicas = container.get("replicas", {}) ready_replicas = replicas.get("ready", 0) desired_replicas = replicas.get("desired", 0) print(f"[{iteration}] ({elapsed}s) 📊 Status: {status}") if replicas: print(f" 🔢 Replicas: {ready_replicas}/{desired_replicas} ready") # Check if container is running and ready if status.lower() in ["running", "succeeded", "ready"]: if desired_replicas == 0 or ready_replicas >= desired_replicas: print("\n" + "=" * 70) print("✅ Container is running and ready!") print("=" * 70) return True elif status.lower() in ["failed", "error", "cancelled"]: print("\n" + "=" * 70) print(f"❌ Container failed to start: {status}") print("=" * 70) return False time.sleep(self.poll_interval) # Timeout reached print("\n" + "=" * 70) print(f"⏱️ Timeout reached after {self.timeout_seconds}s") print("=" * 70) return False def main(): """Main entry point for CLI usage.""" project_endpoint = os.getenv("AZURE_AI_PROJECT_ENDPOINT") agent_name = os.getenv("AGENT_NAME") agent_version = os.getenv("AGENT_VERSION") timeout = int(os.getenv("TIMEOUT_SECONDS", "600")) poll_interval = int(os.getenv("POLL_INTERVAL_SECONDS", "10")) if not all([project_endpoint, agent_name, agent_version]): print("❌ Error: Missing required environment variables") sys.exit(1) waiter = ContainerStatusWaiter( project_endpoint=project_endpoint, agent_name=agent_name, agent_version=agent_version, timeout_seconds=timeout, poll_interval=poll_interval, ) success = waiter.wait_for_container_running() sys.exit(0 if success else 1) if __name__ == "__main__": main() Key Features: REST API Polling: Uses Azure AI Foundry REST API to check container status Timeout Handling: Configurable timeout (default 10 minutes) Progress Tracking: Shows iteration count and elapsed time Replica Checking: Verifies all desired replicas are ready Clear Output: Emoji-enhanced status messages for easy reading Exit Codes: Returns 0 for success, 1 for failure (CI/CD friendly) Update workflow to include verification: Add this step after starting the agent: - name: Start the new agent version id: start_agent env: FOUNDRY_ACCOUNT: ${{ steps.foundry_details.outputs.FOUNDRY_ACCOUNT }} PROJECT_NAME: ${{ steps.foundry_details.outputs.PROJECT_NAME }} AGENT_NAME: ${{ secrets.AGENT_NAME }} run: | LATEST_VERSION=$(az cognitiveservices agent show \ --account-name "$FOUNDRY_ACCOUNT" \ --project-name "$PROJECT_NAME" \ --name "$AGENT_NAME" \ --query "versions.latest.version" -o tsv) echo "AGENT_VERSION=$LATEST_VERSION" >> $GITHUB_OUTPUT az cognitiveservices agent start \ --account-name "$FOUNDRY_ACCOUNT" \ --project-name "$PROJECT_NAME" \ --name "$AGENT_NAME" \ --agent-version $LATEST_VERSION - name: Wait for container to be ready env: AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }} AGENT_NAME: ${{ secrets.AGENT_NAME }} AGENT_VERSION: ${{ steps.start_agent.outputs.AGENT_VERSION }} TIMEOUT_SECONDS: 600 POLL_INTERVAL_SECONDS: 15 run: | echo "⏳ Waiting for container to be ready..." python wait_for_container.py - name: Deployment Summary if: success() run: | echo "## Deployment Complete! 🚀" >> $GITHUB_STEP_SUMMARY echo "" >> $GITHUB_STEP_SUMMARY echo "- **Agent**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY echo "- **Status**: ✅ Container running and ready" >> $GITHUB_STEP_SUMMARY echo "" >> $GITHUB_STEP_SUMMARY echo "### Deployment Timeline" >> $GITHUB_STEP_SUMMARY echo "1. ✅ Image built and pushed to ACR" >> $GITHUB_STEP_SUMMARY echo "2. ✅ Agent version registered" >> $GITHUB_STEP_SUMMARY echo "3. ✅ Container started" >> $GITHUB_STEP_SUMMARY echo "4. ✅ Container verified as running" >> $GITHUB_STEP_SUMMARY - name: Deployment Failed Summary if: failure() run: | echo "## Deployment Failed ❌" >> $GITHUB_STEP_SUMMARY echo "" >> $GITHUB_STEP_SUMMARY echo "Please check the logs above for error details." >> $GITHUB_STEP_SUMMARY Benefits of Container Status Verification: Deployment Confidence: Know for certain that the container started successfully Early Failure Detection: Catch startup errors before users are affected CI/CD Gate: Pipeline only succeeds when container is actually ready Debugging Aid: Clear logs show container startup progress Timeout Protection: Prevents infinite waits with configurable timeout REST API Endpoints Used: GET {endpoint}/agents/{agent_name}/versions/{agent_version}/containers/default?api-version=2025-11-15-preview Response includes: status or state: Container state (Running, Failed, etc.) replicas.ready: Number of ready replicas replicas.desired: Target number of replicas error: Error details if failed Container States: Running/Ready: Container is operational InProgress: Container is starting up Failed/Error: Container failed to start Stopped: Container was stopped 7. Trigger Deployment # Automatic trigger - push to main git add . git commit -m "Update agent implementation" git push origin main # Manual trigger - via GitHub UI # Go to Actions → Deploy Agent to Azure AI Foundry → Run workflow Now this will trigger the Workflow as soon as you checkin the implementation code. You can play with the Agent in Foundry UI: Evaluation is now part the workflow You can also visualize the Evaluation in AI Foundry Best Practices for Production Agent LLMOps 1. Start with Simple Workflows, Add Complexity Gradually Don't build a complex multi-agent system on day one. Start with a single agent that does one task well. Once that's stable in production, add additional capabilities: Single agent with basic tool calling Add memory/state for multi-turn conversations Introduce specialized sub-agents for complex tasks Implement multi-agent collaboration This incremental approach reduces risk and enables learning from real usage before investing in advanced features. 2. Instrument Everything from Day One The worst time to add observability is after you have a production incident. Comprehensive instrumentation should be part of your initial development: Log every LLM call with inputs, outputs, token usage Track all tool invocations Record decision points in agent reasoning Capture timing metrics for every operation Log errors with full context After accumulating production data, you'll identify which metrics matter most. But you can't retroactively add logging for incidents that already occurred. 3. Build Evaluation into the Development Process Don't wait until deployment to evaluate agent quality. Integrate evaluation throughout development: Maintain a growing set of test conversations Run evaluations on every code change Track metrics over time to identify regressions Include diverse scenarios—happy path, edge cases, adversarial inputs Use LLM-as-judge for scalable automated evaluation, supplemented with periodic human review of sample outputs. 4. Embrace Non-Determinism, But Set Boundaries Agents are inherently non-deterministic, but that doesn't mean anything goes: Set acceptable ranges for variability in testing Use temperature and sampling controls to manage randomness Implement retry logic with exponential backoff Add fallback behaviors for when primary approaches fail Use assertions to verify critical invariants (e.g., "agent must never perform destructive actions without confirmation") 5. Prioritize Security and Governance from Day One Security shouldn't be an afterthought: Use managed identities and RBAC for all resource access Implement least-privilege principles—agents get only necessary permissions Add content filtering for inputs and outputs Monitor for prompt injection and jailbreak attempts Maintain audit logs for compliance Regularly review and update security policies 6. Design for Failure Your agents will fail. Design systems that degrade gracefully: Implement retry logic for transient failures Provide clear error messages to users Include fallback behaviors (e.g., escalate to human support) Never leave users stuck—always provide a path forward Log failures with full context for post-incident analysis 7. Balance Automation with Human Oversight Fully autonomous agents are powerful but risky. Consider human-in-the-loop workflows for high-stakes decisions: Draft responses that require approval before sending Request confirmation before executing destructive actions Escalate ambiguous situations to human operators Provide clear audit trails of agent actions 8. Manage Costs Proactively LLM API costs can escalate quickly at scale: Monitor token usage per conversation Set per-conversation token limits Use caching for repeated queries Choose appropriate models (not always the largest) Consider local models for suitable use cases Alert on cost anomalies that indicate runaway loops 9. Plan for Continuous Learning Agents should improve over time: Collect feedback on agent responses (thumbs up/down) Analyze conversations that required escalation Identify common failure patterns Fine-tune models on production interaction data (with appropriate consent) Iterate on prompts based on real usage Share learnings across the team 10. Document Everything Comprehensive documentation is critical as teams scale: Agent architecture and design decisions Tool configurations and API contracts Deployment procedures and runbooks Incident response procedures Version migration guides Evaluation methodologies Conclusion You now have a complete, production-ready AI agent deployed to Azure AI Foundry with: LangGraph-based agent orchestration Tool-calling capabilities Multi-turn conversation support Containerized deployment CI/CD automation Evaluation framework Multiple client implementations Key Takeaway LangGraph provides flexible agent orchestration with state management Azure AI Agent Server SDK simplifies deployment to Azure AI Foundry Custom state converter is critical for production deployments with tool calls CI/CD automation enables rapid iteration and deployment Evaluation framework ensures agent quality and performance Resources Azure AI Foundry Documentation LangGraph Documentation Azure AI Agent Server SDK OpenAI Responses API Thanks Manoranjan Rajguru https://www.linkedin.com/in/manoranjan-rajguru/2.2KViews2likes0Comments