azure ai foundry
100 TopicsThe Future of AI: Evaluating and optimizing custom RAG agents using Azure AI Foundry
This blog post explores best practices for evaluating and optimizing Retrieval-Augmented Generation (RAG) agents using Azure AI Foundry. It introduces the RAG triad metrics—Retrieval, Groundedness, and Relevance—and demonstrates how to apply them using Azure AI Search and agentic retrieval for custom agents. Readers will learn how to fine-tune search parameters, use end-to-end evaluation metrics and golden retrieval metrics like XDCG and Max Relevance, and leverage Azure AI Foundry tools to build trustworthy, high-performing AI agents.186Views0likes0CommentsJoin Us for a Technical Deep Dive and Q&A on Foundry Local - LLMs on device
Join us for an Ask Me Anything with the Foundry Local team on September 29th, 2025! Discover how Foundry Local is redefining edge AI with powerful features like on-device inference, enabling you to run models directly on your hardware, cutting costs and keeping your data secure. Whether you're customizing models to fit unique use cases or integrating seamlessly via SDKs, APIs, or CLI, Foundry Local offers scalable pathways to Azure AI Foundry as your needs evolve. It's the perfect solution for environments with limited connectivity, sensitive data requirements, low-latency demands, or early-stage experimentation before cloud deployment. If you're building smarter, leaner, and more private AI workflows, this AMA is your chance to dive deep with the team behind it all. What is Foundry Local? Foundry Local is a set of development tools designed to help you build and evaluate LLM applications on your local machine. It provides a curated collection of production-quality tools, including evaluation and prompt engineering capabilities, that are fully compatible with Azure AI. This allows for a seamless transition of your work from your local environment to the cloud. Don't miss this opportunity to connect with our experts and enhance your understanding of local LLM development. Foundry Local is an on-device AI inference solution offering performance, privacy, customization, and cost advantages. It integrates seamlessly into your existing workflows and applications through an intuitive CLI, SDK, and REST API. Key features On-Device Inference: Run models locally on your own hardware, reducing your costs while keeping all your data on your device. Model Customization: Select from preset models or use your own to meet specific requirements and use cases. Cost Efficiency: Eliminate recurring cloud service costs by using your existing hardware, making AI more accessible. Seamless Integration: Connect with your applications through an SDK, API endpoints, or the CLI, with easy scaling to Azure AI Foundry as your needs grow. How to Join: Register to Join the Azure AI Foundry Discord Community Event 29th Sept 2025 9am Pacific Time UTC−08:00 Unlock Accelerated Local LLM Development Discover how Foundry Local can enhance your development process and explore the possibilities for building robust LLM applications. Whether you're a seasoned AI developer or just getting started, this session is your chance to get hands-on insights into the innovative world of Azure AI Foundry. Event Highlights: An in-depth overview of the Foundry Local CLI and SDK. Interactive demo with step-by-step examples. Best practices for local AI Inference and models Transitioning your local development to cloud solutions or vice-versa Why Attend? Gain expert insights into Foundry Local, and ask questions about using Foundry Local Network with fellow AI professionals and developers. Enhance your AI development skills with practical examples. Stay at the forefront of LLM application development. Speakers Product Manager Foundry Local Maanav Dalal Product Manager |Foundry Local Microsoft Maanav Dalal is a PM on the AI Frameworks team. He's super inquisitive about the ways you use AI in daily life, so be encouraged to strike up a conversation with him about that. LinkedIn ProfileBuilding AI Apps with the Foundry Local C# SDK
What Is Foundry Local? Foundry Local is a lightweight runtime designed to run AI models directly on user devices. It supports a wide range of hardware (CPU, GPU, NPU) and provides a consistent developer experience across platforms. The SDKs are available in multiple languages, including Python, JavaScript, Rust, and now C#. Why a C# SDK? The C# SDK brings Foundry Local into the heart of the .NET ecosystem. It allows developers to: Download and manage models locally. Run inference using OpenAI-compatible APIs. Integrate seamlessly with existing .NET applications. This means you can build intelligent apps that run offline, reduce latency, and maintain data privacy—all without sacrificing developer productivity. Bootstrap Process: How the SDK Gets You Started One of the most developer-friendly aspects of the C# SDK is its automatic bootstrap process. Here's what happens under the hood when you initialise the SDK: Service Discovery and Startup The SDK automatically locates the Foundry Local installation on the device and starts the inference service if it's not already running. Model Download and Caching If the specified model isn't already cached locally, the SDK will download the most performant model variant (e.g. GPU, CPU, NPU) for the end user's hardware from the Foundry model catalog. This ensures you're always working with the latest optimised version. Model Loading into Inference Service Once downloaded (or retrieved from cache), the model is loaded into the Foundry Local inference engine, ready to serve requests. This streamlined process means developers can go from zero to inference with just a few lines of code—no manual setup or configuration required. Leverage Your Existing AI Stack One of the most exciting aspects of the Foundry Local C# SDK is its compatibility with popular AI tools such as: OpenAI SDK - Foundry local provides an OpenAI compliant chat completions (and embedding) API meaning. If you’re already using `OpenAI` chat completions API, you can reuse your existing code with minimal changes. Semantic Kernel - Foundry Local also integrates well with Semantic Kernel, Microsoft’s open-source framework for building AI agents. You can use Foundry Local models as plugins or endpoints within Semantic Kernel workflows—enabling advanced capabilities like memory, planning, and tool calling. Quick Start Example Follow these three steps: 1. Create a new project Create a new C# project and navigate to it: dotnet new console -n hello-foundry-local cd hello-foundry-local 2. Install NuGet packages Install the following NuGet packages into your project: dotnet add package Microsoft.AI.Foundry.Local --version 0.1.0 dotnet add package OpenAI --version 2.2.0-beta.4 3. Use the OpenAI SDK with Foundry Local The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK. Copy-and-paste the following code into a C# file named Program.cs: using Microsoft.AI.Foundry.Local; using OpenAI; using OpenAI.Chat; using System.ClientModel; using System.Diagnostics.Metrics; var alias = "phi-3.5-mini"; var manager = await FoundryLocalManager.StartModelAsync(aliasOrModelId: alias); var model = await manager.GetModelInfoAsync(aliasOrModelId: alias); ApiKeyCredential key = new ApiKeyCredential(manager.ApiKey); OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions { Endpoint = manager.Endpoint }); var chatClient = client.GetChatClient(model?.ModelId); var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue'"); Console.Write($"[ASSISTANT]: "); foreach (var completionUpdate in completionUpdates) { if (completionUpdate.ContentUpdate.Count > 0) { Console.Write(completionUpdate.ContentUpdate[0].Text); } } Run the code using the following command: dotnet run Final thoughts The Foundry Local C# SDK empowers developers to build intelligent, privacy-preserving applications that run anywhere. Whether you're working on desktop, mobile, or embedded systems, this SDK offers a robust and flexible way to bring AI closer to your users. Ready to get started? Dive into the official documentation: Getting started guide C# Reference documentation You can also make contributions to the C# SDK by creating a PR on GitHub: Foundry Local on GitHub199Views0likes0CommentsBuild Multi‑Agent AI Systems with Microsoft
Like many of you I have been on a journey to build AI systems where multiple agents (AI models with tools and autonomy) collaborate to solve complex tasks. In this post, I want to share the engineering challenges we faced, the architecture we designed with Azure AI Foundry, and the lessons learned along the way. Our goal is to empower AI engineers and developers to leverage multi-agent systems for real-world applications, with the benefit of Microsoft’s tools, research insights, and enterprise-grade platform. Why Multi‑Agent Systems? The Need for AI Teamwork Building a single AI agent to perform a task is often straightforward. However, many real-world processes are too complex for one agent alone. Tasks like in-depth research, enterprise workflow automation, or multi-step customer service involve context switching and specialized knowledge that overwhelm a lone chatbot. Multi-agent systems address this by distributing work across specialized agents while maintaining coordination. This approach brings several advantages: Scalability: Workloads can be split among agents, enabling horizontal scaling as tasks or data increase. More agents can handle more subtasks in parallel, avoiding bottlenecks. Specialisation: Each agent can be fine-tuned for a specific role or domain (e.g. research, summarisation, data extraction), which improves performance and maintainability. No single model has to be a master of all trades. Flexibility: Modular agents can be reused in different workflows or recombined to create new capabilities. It’s easy to extend the system by adding or swapping an agent without redesigning everything. Robustness: If one agent fails or underperforms, others can pick up the slack. Decoupling tasks means the overall system can tolerate faults better than a monolithic agent. This mirrors how human teams work: we achieve more by dividing and conquering complex problems. In fact, internal experiments and industry reports have shown that groups of AI agents can significantly outperform a single powerful model on complex, open-ended tasks. For example, Anthropic found a multi-agent system (Claude agents working together) answered 90% more queries correctly than a single-agent approach in one evaluation. The ability to operate in parallel is key – our experience likewise showed that multiple agents exploring different aspects of a problem can cover far more ground, albeit with increased resource usage. Challenge: A downside of multi-agent setups is they consume more resources (more model calls, more tokens) than single-agent runs. In Anthropic’s research, multi-agent systems used ~15× the tokens of a single chat session. We’ve observed similarly that letting agents think and interact in depth pays off in better results, but at a cost. Ensuring the task’s value justifies the cost is important when choosing a multi-agent solution. Designing the Architecture: Orchestration via a Lead Agent To harness these benefits, we designed a multi-agent architecture built around an orchestrator-worker pattern – very similar to Anthropic’s “lead agent and subagents” approach. In Azure AI Foundry (our enterprise AI platform), this takes shape as Connected Agents: a mechanism where a main agent can spawn and coordinate child agents to handle sub-tasks. The main agent is the brain of the operation, responsible for understanding the user’s request, breaking it into parts, and delegating those parts to the appropriate specialist agents. Each agent in the system is defined with three core components: Instructions (prompt/policy): defining the agent’s goal, role, and constraints (its “game plan”). Model: an LLM that powers the agent’s reasoning and dialogue (e.g. GPT-4 or other models available in Foundry). Tools: external capabilities the agent can invoke to get information or take actions (e.g. web search, databases, APIs). By composing agents with different instructions and tools, we create a team where each agent has a clear role. The main agent’s role is orchestration; the sub-agents focus on specific tasks. This separation of concerns makes the system easier to understand and debug, and prevents any single context window from becoming overloaded. How it works (overview): When a user query comes in, the lead agent analyzes the request and devises a plan. It may decide that multiple pieces of information or steps are needed. The lead agent then spins up subordinate agents in parallel to gather or compute those pieces]. Each sub-agent operates with its own context window and tools, exploring one aspect of the task. They report their findings back to the lead agent, which integrates the results and decides if more exploration is required. The loop continues until the lead agent is satisfied that it can produce a final answer, at which point it consolidates everything and returns the result to the user. This orchestrator/sub-agent pattern is powerful because it lets complex tasks be solved through natural language delegation rather than hard-coded logic. Notably, the main agent doesn’t need an if/else tree written by us to decide which sub-agent handles what; it uses the language model’s reasoning to route tasks. In Azure AI Foundry’s Connected Agents, the primary agent simply says (in effect) “You, Agent A, do X; You, Agent B, do Y,” and the platform handles the rest—no custom orchestration code needed. This drastically simplified our development: we focus on crafting the right prompts and agent designs, and let the AI figure out the coordination. Example: Sales Assistant with Specialist Agents To make this concrete, imagine a Sales Preparation Assistant that helps a sales team research a client before a meeting. Instead of trying to cram all knowledge and skills into one model, we give the assistant a team of four sub-agents, each an expert in a different area. The main agent (“Sales Assistant”) will ask each specialist for input and then compile a briefing. Agent Role Purpose & Task Example Tools/Models Used Market Research Agent Gathers industry trends and news related to the client’s sector. Bing Web Search, internal news API Competitive Analysis Agent Finds information on the client’s competitors and market position. Web Search, Company DB Customer Insights Agent Summarises the client’s history and interactions (from CRM data). Azure Cognitive Search on CRM, GPT-4 Financial Analysis Agent Reviews the client’s financial data and recent performance. Finance database query tool, Excel APIs Main Sales Assistant Orchestrator that delegates to the above agents, then synthesises a final report for the sales team. GPT-4 (with instructions to compile and format results) In this scenario, the Main Sales Assistant agent would ask each sub-agent to report on their specialty (market news, competition, CRM insights, finances). Rather than one AI trying to do it all (and possibly missing nuances), we have focused mini-AIs each doing a thorough job in parallel. This approach was shown to reduce the overall time required and improve the quality of the final output. In early trials, such multi-agent setups often succeed where single agents fall short – for instance, finding all relevant facts across disparate sources and preparing a comprehensive briefing more quickly. Development is easier too: if tomorrow we need to add a “Regulatory Compliance Agent” for a new client requirement, we can plug it in without retraining or heavily modifying the others. Orchestration under the hood: Azure AI Foundry’s Agent Service provides the runtime that makes all this work reliably. It manages the message passing between the main agent and sub-agents, ensures each tool invocation is executed (with retries on failure), and keeps a structured log of the entire multi-agent conversation (we call it a thread). This means developers don’t have to manually implement how agents call each other or share data; the platform handles those mechanics. Foundry also supports true agent-to-agent messaging if agents need to talk directly, but often a hierarchical pattern (through the main agent) suffices for task delegation. Tools, Knowledge, and the Model Context Protocol (MCP) For agents to be effective, especially in enterprise scenarios, they must integrate with external knowledge sources and services – no single LLM knows everything or can perform all actions. Microsoft’s approach emphasizes a rich tool integration layer. In our system, tools range from web search and databases to APIs for taking real actions (sending emails, executing workflows, etc.) Equipping agents with the right tools extends their capabilities dramatically: an agent can retrieve up-to-date info, pull data behind corporate firewalls, or trigger business processes. One key innovation is the Model-Context Protocol (MCP), which Foundry uses to manage tools. MCP provides a structured way for agents to discover and use tools dynamically at runtime. Traditionally, if you wanted your AI agent to use a new tool, you might have to hard-code that tool’s API and update the agent’s code or prompt. With MCP, tools are defined on a central tool server (with descriptions and endpoints), and agents can query this server to see what tools are available. The agent’s SDK then generates the necessary code “stubs” to call the tool on the fly. This means: Easier maintenance: You can add, update, or remove tools in one place (the MCP registry) without changing the agent’s code. When the Finance database API updates, just update its MCP entry; all agents automatically get the new version next time they run. Dynamic adaptability: Agents can choose tools based on context. For example, a research agent might discover that a new MarketAnalysisAPI tool is available and start using it for a finance query, whereas previously it only had a generic web search. Separation of concerns: Those building AI agents can rely on domain experts to maintain the tool definitions, while they focus on the agent logic. Agents treat tools in a uniform way, as functions they can call. In practice, tool selection became a critical part of our agent design. A lesson we learned is that giving agents access to the right tool, with a clear description, can make or break their performance. If a tool’s description is vague or overlapping with another, the agent might choose the wrong approach and wander down a blind alley. For instance, we saw cases where an agent would stubbornly query an internal knowledge base for information that actually only existed on the web, simply because the tool prompt made the web search sound less relevant. We addressed this by carefully curating tool descriptions and even building an internal tool-testing agent that automatically tries out tools and suggests better descriptions for them. Ensuring each tool had a distinct purpose and clear usage guidance dramatically improved our agents’ success rate in choosing the optimal tool for a given job. Finally, multi-modal support is worth noting. Some tasks involve not just text, but images or other media. Our multi-agent architecture, especially with Azure AI Foundry, can incorporate vision-capable models as agents or tools. For example, an “Image Analysis Agent” could be part of a team, or an agent might call a vision API tool. The Telco customer service demo (using Foundry + OpenAI Agent SDK) featured an agent that could handle image uploads (like an ID document) by invoking an image-processing function. The orchestration framework doesn’t fundamentally change with multi-modality, it simply treats the vision model as another specialist agent or tool in the conversation. The ability to plug in different AI skills (text, vision, search, etc.) under a unified agent system is a big advantage of Microsoft’s approach: the agent team becomes cross-functional, each member with their own modality or expertise, collectively solving richer tasks than any single foundation model could. Reliability, Safety, and Enterprise-Grade Engineering While the basic idea of agents chatting and calling tools is elegant, productionizing this system for enterprise use brought serious engineering challenges. We needed our multi-agent system to be reliable, controllable, and secure. Here are the key areas we focused on and how we addressed them: Observability and Debugging Multi-agent chains can be complex and non-deterministic each run might involve different paths as agents make choices. Early on, we realized that treating the system as a black box was untenable. Developers and operators must be able to observe what each agent is “thinking” and doing, or else diagnosing issues would be impossible. Azure AI Foundry’s Agent Service was built with full conversation traceability in mind. Every message between agents (and to the user), every tool invocation and result, is captured in a structured thread log. We integrated this with Azure Application Insights telemetry, so one can monitor performance, latencies, errors, and even token consumption of agents in real time. This tracing proved invaluable. For example, when a complex workflow wasn’t producing the expected outcome, we could replay the entire agent conversation step by step to see where things went awry. In one instance, we found that two sub-agents were given slightly overlapping responsibilities, causing them to waste time retrieving nearly identical information. The logs and message transcripts made this immediately clear, guiding us to tighten the role definitions. Moreover, because the system logs are structured (not just free-form text), we could build automatic analysis tools like checking how often an agent hits a retry or how many cycles a conversation goes through before completion – to spot anomalies. This kind of observability was something the open-source community also highlighted as crucial; in fact, Sematic Kernel, AutoGen frameworks introduce metrics tracking and message tracing for exactly this reason. We also developed visual debugging tools. One example is the AutoGen Studio (a low-code interface from Microsoft Research) which allows developers to visually inspect agent interactions in real time, pause agents, or adjust their behavior on the fly. This interactive approach accelerates the prompt-engineering loop: one can watch agents argue or collaborate live, and intervene if needed. Such capabilities turned out to be vital for understanding emergent behaviors in multi-agent setups. Coordination Complexity and State Management As more agents come into play, keeping them coordinated and preserving shared context is hard. Early versions of agents would sometimes spawn excessive numbers of agents or get stuck in loops. For instance, one of our prototypes (before we applied strict limits) ended up in a degenerate state where two agents kept handing control back and forth without making progress. This taught us to implement guardrails and smarter orchestration policies. In Azure AI Foundry, beyond the simple connected-agent pattern, we introduced a more structured orchestration capability called Multi-Agent Workflows. This lets developers explicitly define states, transitions, and triggers in a workflow that involves multiple agents. It’s like flowcharting the high-level process that the agents should follow, including how they pass data around. We use this for long-running or highly critical processes where you want extra determinism for example, an onboarding workflow might have clearly defined phases (Verification → Provisioning → Notification) each handled by different agents, and you want to ensure the process doesn’t derail. The workflow engine enforces that the system moves to the next state only when all agents in the current state have completed and certain conditions (triggers) are met. It also provides persistence: if the process needs to wait (say, for an external event or simply because it’s a lengthy task), the state is saved and can be resumed later without losing context. These workflow features were a response to reliability needs, they give fine-grained control and error recovery in multi-agent systems that operate over extended periods. In practice, we learned to use the simpler Connected Agents approach for quick, on-the-fly delegations (it’s amazingly capable with minimal setup), and reserve Workflow Orchestration for scenarios where we must guarantee a robust sequence over minutes, hours, or days. By having both options, we can strike a balance between flexibility and control as needed. Trust, Safety, and Governance When you let AI agents act autonomously (especially if they can use tools that modify data or interact with the real world), safety is paramount. From day one, our design included enterprise-grade safety measures: Content Filtering and Policy Enforcement: All AI outputs go through content filters to catch disallowed content or potential prompt injection attacks. The Foundry Agent Service has integrated guardrails so that even if an agent tries something risky (e.g., a tool returns a sensitive info that should not be shown), policies can prevent misuse or leakage. For example, we configured financial analysis agents with rules not to output certain PII or to stop if they detect a regulatory compliance issue, handing off to a human instead. Identity and Access Control: Agents operate with identities managed via Microsoft Entra ID (Azure AD). This means every action an agent takes can be attributed and audited. Role-Based Access Control (RBAC) is enforced: an agent only has access to the data and APIs its role permits. If an agent’s credentials are compromised or misused, Azure’s standard auditing can alert us. Essentially, agents are first-class service principals in our cloud stack. Network Isolation and Compliance: For enterprise deployments, Azure AI Foundry allows agents to run in isolated networks (so they can’t arbitrarily call external services unless allowed) and to use customer-managed storage and search indices. This addresses the data governance aspect, we can ensure an agent looking up internal documents only sees what it’s supposed to, and all data stays within compliant boundaries. Auditability: As mentioned earlier, every decision an agent makes (every tool it calls, every answer it gives) is recorded. This is crucial for trust, if a multi-agent system is making business decisions, we need to be able to explain and justify those decisions later. By retaining the full reasoning trace and sources, we make the system’s work transparent and auditable. In fact, our “Deep Research” agents output not just answers but also a log of how they arrived at that answer, including citations to source material for each claim. This level of detail is a must-have in regulated industries or any high-stakes use case. Overall, baking in trust and safety by design was a non-negotiable requirement. It does introduce some overhead – e.g., being strict about content filtering can sometimes stop an agent from a creative solution until we refine its prompt or the filter thresholds, but it’s worth it for the confidence it gives to deploy these agents at scale. Performance and Cost Considerations We touched on the resource cost of multi-agent systems. Another challenge was ensuring the system runs efficiently. Without care, adding agents can linearly increase cost and latency. We mitigated this in a few ways: Parallelism: We make agents run concurrently wherever possible. Our lead agents typically fire off multiple sub-agents at once rather than sequentially waiting for one then starting the next. Also, our agents themselves can issue parallel tool calls. In fact, we enabled some of our retrieval agents to batch multiple search queries and send them all at the same time. Anthropic reported that this kind of parallelism cut their research task times by up to 90%, and we’ve observed similar dramatic speed-ups. By doing in 1 minute what a single agent might take 10 minutes to do step-by-step, we make the approach far more practical. Of course, the flip side is hitting many APIs and LLM endpoints concurrently can spike usage costs; we carefully monitor usage and recommend multi-agent mode only when needed for the problem complexity. Scaling rules and agent limits: One lesson learned was to prevent “agent sprawl.” We devised guidelines (and encoded some in prompts) about how many sub-agents to use for a given task complexity. For simple fact queries, the main agent is encouraged to handle it alone or with at most one helper; for moderately complex tasks, maybe spin up 2–3; only truly complex projects get a dozen specialists. This avoids the situation where an overzealous orchestrator might launch an army of agents and overkill the problem. These limits were informed by experimentation and echo the principle of scaling effort to the problem size. Model selection: Multi-agent systems don’t always need the largest model for every agent. We often use a mix of model sizes to optimize cost. For instance, a straightforward data extraction agent might be powered by a cheaper GPT-3.5, while the synthesis agent uses GPT-4 for the final answer quality. Foundry makes it easy to deploy a range of model endpoints (including open-source Llama-based models) and each agent can pick the one best suited. We learned that using an expensive model for a simple sub-task is wasteful; a smaller model with the right tools can do the job just as well. This mix-and-match approach helped keep our compute costs in check without sacrificing outcome quality. Lessons Learned and Best Practices Building these multi-agent systems was an iterative learning process. Here are some of the key lessons and best practices that emerged, which we believe will be useful to anyone developing their own: Let’s expand on a couple of these points: Prompt engineering for multi-agent is different: We quickly discovered that writing prompts for a team of agents is an order of magnitude more complex than for a single chatbot. Not only do you have to get each agent’s behavior right, you must shape how they interact. One principle that served us well was: “Think like your agents.” When debugging, we’d often step through the conversation from each agent’s perspective, almost role-playing as them, to see why they might be doing something silly. If an agent was repeating another’s results, maybe our instructions were too vague and they didn’t realise that sub-task was already covered. The fix would be to clarify the division of labour in the lead agent’s prompt or introduce an ordering (e.g., Agent B only runs after Agent A’s info is in, etc.). Another principle: teach the orchestrator to delegate effectively. The main agent’s prompt now includes explicit guidance on how to break down tasks and how to phrase sub-agent assignments with plenty of detail. We learned that if the lead just says “Research topic X” to two different agents, they might both do the same thing. Now, the lead agent provides distinct objectives and context to each sub-agent (e.g., focus one on recent news, another on historical data, etc.). This reduced redundancy and missed coverage dramatically. Let the AI help improve itself: One delightful surprise was that large models can be quite good at analyzing and refining their own strategies when asked. We sometimes gave an agent a chance to critique its output or plan, essentially a self-reflection step. In other cases we had a “judge” agent evaluate the final answers against criteria (accuracy, completeness, etc.) These evaluations not only gave us a score for benchmarking changes, but the judge’s feedback (being an LLM) often highlighted exactly where an agent went off track or missed something. In a sense, we used one AI to tell us how to make another AI better. This kind of meta-prompting and self-correction became a powerful tool in our development cycle, allowing faster iteration without full human-in-the-loop at every turn. Know when to simplify: Not every problem needs a fleet of agents. A big lesson was to use the simplest approach that works. If a single agent with a smart prompt can handle a task reliably, that’s fine! We reserved multi-agent mode for when there was clear added value e.g., problems requiring parallel exploration, different expertise, or lengthy reasoning that benefits from splitting into parts. This discipline kept our systems leaner and easier to maintain. It also helped us explain the value to stakeholders: we could justify the complexity by pointing to concrete gains (like a task that went from 2 hours by a single high-end model to 10 minutes by a team of agents with better results). Conclusion and Next Steps Multi-agent AI systems have moved from intriguing research demos to practical, production-ready solutions. Our journey involved close collaboration between teams such as those who built open-source frameworks like AutoGen to experiment with multi-agent interactions) and the Azure AI product teams (who turned these concepts into the robust Azure AI Foundry Agent Service). Along the way, we learned how to orchestrate LLMs at scale, how to keep them in check, and how to squeeze the most value out of agent collaboration. Today, Azure AI Foundry’s Agents platform provides a unified environment to develop, test, and deploy multi-agent systems, complete with the orchestration, observability, and safety features to make them enterprise-ready. The public preview of features like Connected Agents and Deep Research (which is essentially an advanced research agent that uses the web + analysis in a multi-step process) is already enabling customers to build “AI teams” that tackle complex workflows. This is just the beginning. We’re continuing to improve the platform with feedback from developers: upcoming releases will further tighten integration with the broader Azure ecosystem (for example, more seamless use of Azure Cognitive Search, Excel as a tool, etc.), expand the library of pre-built agent templates in the Agent Catalog (so you can start with a solid example for common scenarios), and introduce more advanced coordination patterns inspired by real-world use cases. If you’re an AI engineer or developer eager to explore multi-agent systems, now is a great time to dive in. Here are some resources to get you started: Microsoft AI Agents for Beginners - Learn all about AI Agents with this FREE curricula Azure AI Foundry Documentation – Learn more about the Agent Service and how to configure agents, tools, and workflows. Microsoft Learn Modules – step-by-step tutorial to build a connected multi-agent solution (for example, a ticket triage system) using Azure AI Foundry Agent Service. This will walk you through setting up agents and using the SDK. Microsoft MCP for Beginners: Integrating MCP Tools – Another tutorial focused on the Model Context Protocol, showing how to enable dynamic tool discovery for your agents. Azure AI Foundry Agent Catalog – Browse a growing collection of open-sourced agent examples contributed by Microsoft and partners, covering scenarios from content compliance to manufacturing optimization. These samples are great starting points to see how multi-agent code is structured in real projects. Multi-agent systems represent a significant shift in how we conceptualise AI solutions: from single brilliant assistants to teams of specialised agents working in concert. The engineering journey hasn’t been easy we navigated challenges in coordination, built new tooling for control, and refined prompts endlessly. But the end result is a new class of AI applications that are more powerful, resilient, and tunable. We hope the insights shared here help you in your own journey to build with AI agents. We’re excited to see what you will create with these technologies. As we continue to push the frontier of agentic AI (both in research and in Azure), one thing is clear: many minds – human or AI – are often better than one. Happy building! Userful References Introducing Multi-Agent Orchestration in Foundry Agent Service – Build ... Building a multimodal, multi-agent system using Azure AI Agent Service ... How we built our multi-agent research system \ Anthropic What is Azure AI Foundry Agent Service? - Azure AI Foundry Multi-Agent Systems and MCP Tools Integration with Azure AI Foundry ... Introducing Deep Research in Azure AI Foundry Agent Service AutoGen v0.4: Advancing the development of agentic AI systemsBuild Enterprise-Ready AI Agents with the New Azure Postgres LangChain + LangGraph Connector
AI agents are only as powerful as the data layer behind them. That’s why we’re excited to announce native LangChain + LangGraph connector for Azure Database for PostgreSQL. With this release, Postgres becomes your single source of truth for AI agents, handling knowledge retrieval, chat history, and long-term memory all in one place. This new connector is packed with everything you need to build secure, scalable and enterprise-ready AI agents on Azure without the complexity. With EntraID authentication, DiskANN acceleration, vector store, and a dedicated agent store, you can go from prototype to production on Azure faster than ever. You can quickly get started with the LangChain + LangGraph connector today pip install langchain-azure-postgresql In this post, we’ll cover: How Azure Postgres connector for LangGraph can serve as the single persistence + retrieval layer for an AI agent New first-class connector for LangChain +LangGraph A practical example to help you get started Azure PostgreSQL as the single persistence + retrieval layer for an AI agent When building AI agents today, developers face a fragmented stack: Vector storage and search require a library, service or separate database. Chat history & short-term memory need yet another data source. Long-term memory often means bolting on yet another system. This sprawl leads to complex integrations, higher costs, and weaker security, making it hard to scale AI agents reliably. The Solution The new Azure Postgres connector for LangChain + LangGraph transforms your Azure Postgres database to the single persistence + retrieval layer for AI agents. Instead of working on a fragmented stack, developers can now: Run embeddings + semantic search with built-in DiskANN acceleration in the same database that powers their application logic. Persist chat history and short-term memory and keep agent conversations grounded via seamless context retrieval from data stored in Postgres. Capture, retrieve, and evolve knowledge over time with a built-in long-term memory without bolting on external systems. All in one database, simplified, secure, and enterprise ready. Postgres becomes the persistent and retrieval data layer for your AI agent. Built for Enterprise Readiness: LangChain + LangGraph Connector This release unlocks several new capabilities that make it easy to build robust, production-ready agents: Auth with EntraID: Enterprise-grade identity to securely connect LangChain + LangGraph workflows to Azure Database for PostgreSQL within a centrally managed security perimeter based on identity. DiskANN & Extensions: First-class support for faster vector search using pgvector combined with DiskANN indexing, enabling support for high-dimensional vectors and cost-efficient search. Additionally, helper functions ensure your favorite extensions are installed. Native Vector Store: Store and query embeddings, enabling semantic search and Retrieval-Augmented Generation (RAG) scenarios. Dedicated Agent Store: Persist agent state, memory, and chat history with structured access patterns, perfect for multi-turn conversations and long-term context. Together, these features give developers a turnkey persistence solution for building reliable AI agents without stitching together multiple storage systems. Using LangGraph on Azure Database for PostgreSQL Using LangGraph with Azure Database for PostgreSQL is easy. Enable the vector & pg_diskann Extension: Allowlist the vector and pg_diskann extension within your server configuration. Import LangChain + LangGraph connector pip install langchain-azure-postgresql pip install -qU langchain-openai pip install -qU azure-identity Login to Azure, to your Entra ID Run az login in your terminal, where you will also run the LangGraph code. az login To get started, you need to set up a production-ready vector store for your agent in a few lines of code. # 1. Auth: Securely connect to Azure Postgres connection_pool = AzurePGConnectionPool(azure_conn_info=ConnectionInfo(host=os.environ["PGHOST"])) #2. Create embeddings embeddings = AzureOpenAIEmbeddings(model="text-embedding-3-small") # 3. Initialize a vector store in Postgres with DiskANN vector_store = AzurePGVectorStore(connection=connection, embedding=embeddings) Use LangGraph to build a sample agent. Here’s a practical example that combines vector search and checkpointer inside Postgres: #4 Define the tool for data retrieval. def get_data_from_vector_store(query: str) -> str: """Get data from the vector store.""" results = vector_store.similarity_search(query) return results #5 Define the agent, checkpointer and memory store. with connection_pool.getconn() as conn: agent = create_react_agent( model=model, tools=[get_data_from_vector_store], checkpointer=PostgresSaver(conn) ) #6 Run the agent and print results config = {"configurable": {"thread_id": "1", "user_id": "1"}} response = agent.invoke( {"messages": [{"role": "user", "content": "What does my database say about cats? Make sure you address me with my name"}]}, config ) for msg in response["messages"][-2:]: msg.pretty_print() With just a few lines of code, you can: Uses the vector store backed by Postgres Enable DiskANN for semantic search Use checkpointers for short-term conversation history Learn More This is just the beginning. With native LangChain + LangGraph support in Azure PostgreSQL, developers can now rely on a single, secure, high-performance data layer for building the next generation of AI agents. 👉 Ready to start? All the code are available in the Azure Postgres Agents Demo GitHub repository. See how easy it is to bring your AI agent to life on Azure. 👉 Check out the docs for more details on the LangChain + LangGraph connector.Announcing Live Interpreter API - Now in Public Preview
Today, we’re excited to introduce Live Interpreter –a breakthrough new capability in Azure Speech Translation – that makes real-time, multilingual communication effortless. Live Interpreter continuously identifies the language being spoken without requiring you to set an input language and delivers low latency speech-to-speech translation in a natural voice that preserves the speaker’s style and tone.4.5KViews1like0CommentsModel Mondays S2E01 Recap: Advanced Reasoning Session
About Model Mondays Want to know what Reasoning models are and how you can build advanced reasoning scenarios like a Deep Research agent using Azure AI Foundry? Check out this recap from Model Mondays Season 2 Ep 1. Model Mondays is a weekly series to help you build your model IQ in three steps: 1. Catch the 5-min Highlights on Monday, to get up to speed on model news 2. Catch the 15-min Spotlight on Monday, for a deep-dive into a model or tool 3. Catch the 30-min AMA on Friday, for a Q&A session with subject matter experts Want to follow along? Register Here- to watch upcoming livestreams for Season 2 Visit The Forum- to see the full AMA schedule for Season 2 Register Here - to join the AMA on Friday Jun 20 Spotlight On: Advanced Reasoning This week, the Model Mondays spotlight was on Advanced Reasoning with subject matter expert Marlene Mhangami. In this blog post, I'll talk about my five takeaways from this episode: Why Are Reasoning Models Important? What Is an Advanced Reasoning Scenario? How Can I Get Started with Reasoning Models ? Spotlight: My Aha Moment Highlights: What’s New in Azure AI 1. Why Are Reasoning Models Important? In today's fast-evolving AI landscape, it's no longer enough for models to just complete text or summarize content. We need AI that can: Understand multi-step tasks Make decisions based on logic Plan sequences of actions or queries Connect context across turns Reasoning models are large language models (LLMs) trained with reinforcement learning techniques to "think" before they answer. Rather than simply generating a response based on probability, these models follow an internal thought process producing a chain of reasoning before responding. This makes them ideal for complex problem-solving tasks. And they’re the foundation of building intelligent, context-aware agents. They enable next-gen AI workflows in everything from customer support to legal research and healthcare diagnostics. Reason: They allow AI to go beyond surface-level response and deliver solutions that reflect understanding, not just language patterning. 2. What does Advanced Reasoning involve? An advanced reasoning scenario is one where a model: Breaks a complex prompt into smaller steps Retrieves relevant external data Uses logic to connect dots Outputs a structured, reasoned answer Example: A user asks: What are the financial and operational risks of expanding a startup to Southeast Asia in 2025? This is the kind of question that requires extensive research and analysis. A reasoning model might tackle this by: Retrieving reports on Southeast Asia market conditions Breaking down risks into financial, political, and operational buckets Cross-referencing data with recent trends Returning a reasoned, multi-part answer 3. How Can I Get Started with Reasoning Models? To get started, you need to visit a catalog that has examples of these models. Try the GitHub Models Marketplace and look for the reasoning category in the filter. Try the Azure AI Foundry model catalog and look for reasoning models by name. Example: The o-series of models from Azure Open AI The DeepSeek-R1 models The Grok 3 models The Phi-4 reasoning models Next, you can use SDKs or Playground for exploring the model capabiliies. 1. Try Lab 331 - for a beginner-friendly guide. 2. Try Lab 333 - for an advanced project. 3. Try the GitHub Model Playground - to compare reasoning and GPT models. 4. Try the Deep Research Agent using LangChain - sample as a great starting project. Have questions or comments? Join the Friday AMA on Azure AI Foundry Discord: 4. Spotlight: My Aha Moment Before this session, I thought reasoning meant longer or more detailed responses. But this session helped me realize that reasoning means structured thinking — models now plan, retrieve, and respond with logic. This inspired me to think about building AI agents that go beyond chat and actually assist users like a teammate. It also made me want to dive deeper into LangChain + Azure AI workflows to build mini-agents for real-world use. 5. Highlights: What’s New in Azure AI Here’s what’s new in the Azure AI Foundry: Direct From Azure Models - Try hosted models like OpenAI GPT on PTU plans SORA Video Playground - Generate video from prompts via SORA models Grok 3 Models - Now available for secure, scalable LLM experiences DeepSeek R1-0528 - A reasoning-optimized, Microsoft-tuned open-source model These are all available in the Azure Model Catalog and can be tried with your Azure account. Did You Know? Your first step is to find the right model for your task. But what if you could have the model automatically selected for you_ based on the prompt you provide? That's the magic of Model Router a deployable AI chat model that dynamically selects the best LLM based on your prompt. Instead of choosing one model manually, the Router makes that choice in real time. Currently, this works with a fixed set of Azure OpenAI models, including a reasoning model option. Keep an eye on the documentation for more updates. Why it’s powerful: Saves cost by switching between models based on complexity Optimizes performance by selecting the right model for the task Lets you test and compare model outputs quickly Try it out in Azure AI Foundry or read more in the Model Catalog Coming Up Next Next week, we dive into Model Context Protocol, an open protocol that empowers agentic AI applications by making it easier to discover and integrate knowledge and action tools with your model choices. Register Here to get reminded - and join us live on Monday! Join The Community Great devs don't build alone! In a fast-pased developer ecosystem, there's no time to hunt for help. That's why we have the Azure AI Developer Community. Join us today and let's journey together! Join the Discord - for real-time chats, events & learning Explore the Forum - for AMA recaps, Q&A, and help! About Me. I'm Sharda, a Gold Microsoft Learn Student Ambassador interested in cloud and AI. Find me on Github, Dev.to,, Tech Community and Linkedin. In this blog series I have summarizef my takeaways from this week's Model Mondays livestream .365Views0likes0CommentsModel Mondays S2:E3 Understanding SLMs and Reasoning with Mojan Javaheripi
This week in Model Mondays, we focus on Small Language Models (SLMs) and Reasoning — and learn how reasoning models leverage inference-time scaling to execute complex tasks, but how can we use these in resource-constrained devices? Read on for my recap of Mojan Javaheripi's insights on Phi-4 reasoning models that are redefining small language models (SLM) for the agentic era of apps. About Model Mondays Model Mondays is a weekly series designed to help you build your Azure AI Foundry Model IQ step by step. Here's how it works: 5-Minute Highlights – Quick news and updates about Azure AI models and tools on Monday 15-Minute Spotlight – Deep dive into a key model, protocol, or feature on Monday 30-Minute AMA on Friday – Live Q&A with subject matter experts from Monday livestream If you want to grow your skills with the latest in AI model development, Model Mondays is the place to start. Want to follow along? Register Here - to watch upcoming Model Monday livestreams Watch Playlists to replay past Model Monday episodes Register Here- to join the AMA on SLMs and Reasoning on Friday Jul 03 Visit The Forum - to view Foundry Friday AMAs and recaps This post was generated with AI help and human revision & review. To learn more about our motivation and workflows, please refer to this document in our website. We are continuing to experiment with ideas here - feedback is welcome! Just drop us a comment and let us know! Spotlight On: SLMs and Reasoning Missed watching the livestream? Catch up on the episode below - and visit https://aka.ms/model-mondays/playlist to catch up on all the previous episodes in the series. And check out the Discussion Forum post here for all the resources and updates from the AMA and more. 1. What is this topic and why is it important? Small Language Models (SLMs) like Phi-4 represent a breakthrough in making advanced reasoning capabilities accessible on resource-constrained devices. While large language models require massive computational resources, SLMs can deliver sophisticated reasoning while running on edge devices, mobile phones, and local hardware. This is crucial because it democratizes AI access, reduces latency, and enables privacy-preserving applications where data doesn't need to leave the device. Reasoning models use inference-time scaling, meaning they can "think" longer about complex problems to arrive at better solutions. Phi-4 specifically excels at mathematical reasoning, code generation, and logical problem-solving while maintaining a smaller footprint than traditional large models. 2. What is one key takeaway from the episode? The key insight is that Phi-4 proves that model size doesn't always correlate with reasoning capability. Through advanced training techniques and architectural improvements, SLMs can achieve reasoning performance that rivals much larger models while being practical for deployment in real-world, resource-constrained environments. This opens up entirely new possibilities for agentic applications that can run locally and respond quickly. 3. How can I get started? To get started with SLMs and reasoning: 1. Explore Phi-4 on Azure AI Foundry Model Catalog 2. Try the reasoning capabilities in Azure AI Foundry Playground 3. Download and experiment with Phi-4 for local development 4. Check out the sample applications and use cases in the Azure AI Foundry documentation What's new in Azure AI Foundry? Azure AI Foundry continues to evolve to support the growing ecosystem of Small Language Models and agentic apps. Recently, new capabilities have been added to make it easier to fine-tune SLMs like Phi-4 directly in Azure AI Studio. Updates include: Enhanced Model Catalog: Easier discovery of SLMs, reasoning models, and multi-modal models. Improved Prompt Flow Integration: Now with templates specifically designed for SLM-based reasoning tasks. New Evaluation Tools: Built-in model comparison dashboards to quickly test reasoning performance across different SLM variants. Edge Deployment Support: Simplified workflows for packaging and deploying SLMs to local devices and edge environments. Want to get a summary of ALL the news from Jun 2025? Just visit this post on Azure AI Foundry Discussions Forum for all the links! My A-Ha Moment The biggest Aha moment for me was realizing that a model doesn’t need to be huge to be smart. Phi-4 proved that small models can actually handle complex reasoning tasks just like big models. What really clicked for me: You don’t need heavy GPUs or cloud servers. These models can run on mobile phones, edge devices, or small local machines. Your data stays on your device, which is great for privacy and faster responses. It’s a total game changer because now we can build intelligent apps that work even on low-resource devices. Coming up Next Week Next week, we dive into AI Developer Experiences with Leo Yao. We'll explore how to streamline the AI developer journey from model selection and usage to evaluation and app deployment using the AI Toolkit and Azure AI Foundry extensions for Visual Studio Code. Discover the key capabilities they provide for generative AI app & agent development. Register Here! to be notified - then watch live on YouTube below. Join The Community Great devs don't build alone! In a fast-pased developer ecosystem, there's no time to hunt for help. That's why we have the Azure AI Developer Community. Join us today and let's journey together! 1. Join the Discord - for real-time chats, events & learning 2. Explore the Forum - for AMA recaps, Q&A, and help! About Me: I'm Sharda, a Gold Microsoft Learn Student Ambassador interested in cloud and AI. Find me on Github, Dev.to, Tech Community and Linkedin. In this blog series I have summarized my takeaways from this week's Model Mondays livestream.202Views0likes0CommentsModel Mondays S2:E6 Understanding Research & Innovation with SeokJin Han and Saumil Shrivastava
In this week's blog post, we dive into the cutting-edge research happening at Azure AI Foundry Labs. From the MCP Server that makes it easy to experiment with new models and tools, to Magentic-UI that brings human-centered agent workflows to life, there’s a lot to unpack!131Views0likes0CommentsPower Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration
Introduction Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you’ll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app’s responses leap off the screen in a human-like voice. By the end of this guide, you’ll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let’s dive in. Why Azure AI Speech? We use Azure AI Speech service in Open Web UI to enable voice interactions directly within web applications. This allows users to: Speak commands or input instead of typing, making the interface more accessible and user-friendly. Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio. Provide a more natural and hands-free experience especially on devices like smartphones or tablets. In short, integrating Azure AI Speech service into Open Web UI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features. If you haven’t hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn More about OpenWeb UI here. Deploy Azure AI Speech service in Azure. Navigate to the Azure Portal and search for Azure AI Speech on the Azure portal search bar. Create a new Speech Service by filling up the fields in the resource creation page. Click on “Create” to finalize the setup. After the resource has been deployed, click on “View resource” button and you should be redirected to the Azure AI Speech service page. The page should display the API Keys and Endpoints for Azure AI Speech services, which you can use in Open Web UI. Settings things up in Open Web UI Speech to Text settings (STT) Head to the Open Web UI Admin page > Settings > Audio. Paste the API Key obtained from the Azure AI Speech service page into the API key field below. Unless you use different Azure Region, or want to change the default configurations for the STT settings, leave all settings to blank. Text to Speech settings (TTS) Now, let's proceed with configuring the TTS Settings on OpenWeb UI by toggling the TTS Engine to Azure AI Speech option. Again, paste the API Key obtained from Azure AI Speech service page and leave all settings to blank. You can change the TTS Voice from the dropdown selection in the TTS settings as depicted in the image below: Click Save to reflect the change. Expected Result Now, let’s test if everything works well. Open a new chat / temporary chat on Open Web UI and click on the Call / Record button. The STT Engine (Azure AI Speech) should identify your voice and provide a response based on the voice input. To test the TTS feature, click on the Read Aloud (Speaker Icon) under any response from Open Web UI. The TTS Engine should reflect Azure AI Speech service! Conclusion And that’s a wrap! You’ve just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure’s neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it’s all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.846Views2likes1Comment