Introducing Azure AI Travel Agents: A Flagship MCP-Powered Sample for AI Travel Solutions
We are excited to introduce AI Travel Agents, a sample application with enterprise functionality that demonstrates how developers can coordinate multiple AI agents (written in multiple languages) to explore travel planning scenarios. It's built with LlamaIndex.TS for agent orchestration, Model Context Protocol (MCP) for structured tool interactions, and Azure Container Apps for scalable deployment.

TL;DR: Experience the power of MCP and Azure Container Apps with The AI Travel Agents! Try out the live demo locally on your computer for free to see real-time agent collaboration in action. Share your feedback on our community forum. We're already planning enhancements, like new MCP-integrated agents and secure communication between the AI agents and MCP servers.

NOTE: This example uses mock data and is intended for demonstration purposes rather than production use.

The Challenge: Scaling Personalized Travel Planning

Travel agencies grapple with complex tasks: analyzing diverse customer needs, recommending destinations, and crafting itineraries, all while integrating real-time data like trending spots or logistics. Traditional systems falter on latency, scalability, and coordination, leading to delays and frustrated clients. The AI Travel Agents tackles these issues with a technical trifecta:

- LlamaIndex.TS orchestrates six AI agents for efficient task handling.
- MCP equips agents with travel-specific data and tools.
- Azure Container Apps ensures scalable, serverless deployment.

This architecture delivers operational efficiency and personalized service at scale, transforming chaos into opportunity.

LlamaIndex.TS: Orchestrating AI Agents

The heart of The AI Travel Agents is LlamaIndex.TS, a powerful agentic framework that orchestrates multiple AI agents to handle travel planning tasks. Built on a Node.js backend, LlamaIndex.TS manages agent interactions in a seamless and intelligent manner:

- Task Delegation: The Triage Agent analyzes queries and routes them to specialized agents, like the Itinerary Planning Agent, ensuring efficient workflows.
- Agent Coordination: LlamaIndex.TS maintains context across interactions, enabling coherent responses for complex queries, such as multi-city trip plans.
- LLM Integration: Connects to Azure OpenAI, GitHub Models, or any local LLM using Foundry Local for advanced AI capabilities.

LlamaIndex.TS's modular design supports extensibility, allowing new agents to be added with ease. LlamaIndex.TS is the conductor, ensuring agents work in sync to deliver accurate, timely results. Its lightweight orchestration minimizes latency, making it ideal for real-time applications.

MCP: Fueling Agents with Data and Tools

The Model Context Protocol (MCP) empowers AI agents by providing travel-specific data and tools, enhancing their functionality. MCP acts as a data and tool hub:

- Real-Time Data: Supplies up-to-date travel information, such as trending destinations or seasonal events, via the Web Search Agent using Bing Search.
- Tool Access: Connects agents to external tools, like the .NET-based customer query analyzer for sentiment analysis, the Python-based itinerary planner for trip schedules, or the destination recommendation tools written in Java.

For example, when the Destination Recommendation Agent needs current travel trends, MCP delivers them via the Web Search Agent. This modularity allows new tools to be integrated seamlessly, future-proofing the platform. MCP's role is to enrich agent capabilities, leaving orchestration to LlamaIndex.TS.
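To make the MCP side more tangible, here is a minimal sketch of what one of these travel-focused MCP servers could look like, using the FastMCP helper from the official MCP Python SDK. The tool name and mock data below are hypothetical illustrations, not the sample's actual implementation (which ships .NET, Python, and Java servers):

```python
# Minimal MCP server sketch (pip install mcp).
# The tool and its mock catalog are hypothetical, for illustration only.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("destination-recommendation")

@mcp.tool()
def recommend_destinations(interest: str) -> list[str]:
    """Return mock destination suggestions for a given interest."""
    catalog = {
        "beaches": ["Lisbon", "Crete"],
        "hiking": ["Interlaken", "Banff"],
    }
    return catalog.get(interest.lower(), ["Tokyo", "Reykjavik"])

if __name__ == "__main__":
    # Serve over stdio so an MCP client (such as the LlamaIndex.TS
    # orchestrator) can discover and call the tool.
    mcp.run()
```

An orchestrator like LlamaIndex.TS can then list this server's tools at runtime and route agent requests to them, which is exactly the decoupling the sample leans on.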
Azure Container Apps: Scalability and Resilience

Azure Container Apps powers The AI Travel Agents sample application with a serverless, scalable platform for deploying microservices. It ensures the application handles varying workloads with ease:

- Dynamic Scaling: Automatically adjusts container instances based on demand, managing booking surges without downtime.
- Polyglot Microservices: Supports .NET (Customer Query), Python (Itinerary Planning), Java (Destination Recommendation), and Node.js services in isolated containers.
- Observability: Integrates tracing, metrics, and logging, enabling real-time monitoring.
- Serverless Efficiency: Abstracts infrastructure, reducing costs and accelerating deployment.

Azure Container Apps' global infrastructure delivers low-latency performance, critical for travel agencies serving clients worldwide.

The AI Agents: A Quick Look

While MCP and Azure Container Apps are the stars, they support a team of multiple AI agents that drive the application's functionality. Built and orchestrated with LlamaIndex.TS via MCP, these agents collaborate to handle travel planning tasks:

- Triage Agent: Directs queries to the right agent, leveraging MCP for task delegation.
- Customer Query Agent: Analyzes customer needs (emotions, intents), using .NET tools.
- Destination Recommendation Agent: Suggests tailored destinations, using Java.
- Itinerary Planning Agent: Crafts efficient itineraries, powered by Python.
- Web Search Agent: Fetches real-time data via Bing Search.

These agents rely on MCP's real-time communication and Azure Container Apps' scalability to deliver responsive, accurate results. It's worth noting, though, that this sample application uses mock data for demonstration purposes. In a real-world scenario, the application would communicate with an MCP server plugged into a real production travel API.

Key Features and Benefits

The AI Travel Agents offers features that showcase the power of MCP and Azure Container Apps:

- Real-Time Chat: A responsive Angular UI streams agent responses via MCP's SSE, ensuring fluid interactions.
- Modular Tools: MCP enables tools like analyze_customer_query to integrate seamlessly, supporting future additions.
- Scalable Performance: Azure Container Apps ensures the UI, backend, and MCP servers handle high traffic effortlessly.
- Transparent Debugging: An accordion UI displays agent reasoning, providing backend insights.

Benefits: Efficiency: LlamaIndex.TS streamlines operations. Personalization: MCP's data drives tailored recommendations. Scalability: Azure ensures reliability at scale.

Thank You to Our Contributors!

The AI Travel Agents wouldn't exist without the incredible work of our contributors. Their expertise in MCP development, Azure deployment, and AI orchestration brought this project to life. A special shoutout to:

- Pamela Fox – Leading the development of the Python MCP server.
- Aaron Powell and Justin Yoo – Leading the development of the .NET MCP server.
- Rory Preddy – Leading the development of the Java MCP server.
- Lee Stott and Kinfey Lo – Leading the development of the Foundry Local support.
- Anthony Chu and Vyom Nagrani – Leading the Azure Container Apps roadmap.
- Matt Soucoup and Julien Dubois – Leading the ACA DevRel strategy.
- Wassim Chegham – Architected MCP and backend orchestration.

And many more! See the GitHub repository for all contributors. Thank you for your dedication to pushing the boundaries of AI and cloud technology!

Try It Out

Experience the power of MCP and Azure Container Apps with The AI Travel Agents!
Try out the live demo locally on your computer for free to see real-time agent collaboration in action.

Conclusion

Developers can explore the open-source project on GitHub today, complete with setup and deployment instructions. Share your feedback on our community forum. We're already planning enhancements, like new MCP-integrated agents and secure communication between the AI agents and MCP servers. This is still a work in progress, and we welcome all kinds of contributions. Please fork and star the repo to stay tuned for updates! We would love your feedback; continue the discussion in the Azure AI Foundry Discord: aka.ms/foundry/discord. On behalf of the Microsoft DevRel Team.

Swagger Auto-Generation on MCP Server
Would you like to generate a swagger.json directly on an MCP server on-the-fly? In many use cases, using remote MCP servers is not uncommon. In particular, if you're using Azure API Management (APIM), Azure API Center (APIC), or Copilot Studio in Power Platform, integrating with remote MCP servers is inevitable.

Introducing the Microsoft Agent Framework
Introducing the Microsoft Agent Framework: A Unified Foundation for AI Agents and Workflows

The landscape of AI development is evolving rapidly, and Microsoft is at the forefront with the release of the Microsoft Agent Framework, an open-source SDK designed to empower developers to build intelligent, multi-agent systems with ease and precision. Whether you're working in .NET or Python, this framework offers a unified, extensible foundation that merges the best of Semantic Kernel and AutoGen, while introducing powerful new capabilities for agent orchestration and workflow design.

Introducing Microsoft Agent Framework: The Open-Source Engine for Agentic AI Apps | Azure AI Foundry Blog
Introducing Microsoft Agent Framework | Microsoft Azure Blog

Why Another Agent Framework?

Both Semantic Kernel and AutoGen have pioneered agentic development: Semantic Kernel with its enterprise-grade features and AutoGen with its research-driven abstractions. The Microsoft Agent Framework is the next generation of both, built by the same teams to unify their strengths:

- AutoGen's simplicity in multi-agent orchestration.
- Semantic Kernel's robustness in thread-based state management, telemetry, and type safety.
- New capabilities like graph-based workflows, checkpointing, and human-in-the-loop support.

This convergence means developers no longer have to choose between experimentation and production. The Agent Framework is designed to scale from single-agent prototypes to complex, enterprise-ready systems.

Core Capabilities

AI Agents

AI agents are autonomous entities powered by LLMs that can process user inputs, make decisions, call tools and MCP servers, and generate responses. They support providers like Azure OpenAI, OpenAI, and Azure AI, and can be enhanced with:

- Agent threads for state management.
- Context providers for memory.
- Middleware for action interception.
- MCP clients for tool integration.

Use cases include customer support, education, code generation, research assistance, and more, especially where tasks are dynamic and underspecified.

Workflows

Workflows are graph-based orchestrations that connect multiple agents and functions to perform complex, multi-step tasks. They support:

- Type-based routing
- Conditional logic
- Checkpointing
- Human-in-the-loop interactions
- Multi-agent orchestration patterns (sequential, concurrent, hand-off, Magentic)

Workflows are ideal for structured, long-running processes that require reliability and modularity.

Developer Experience

The Agent Framework is designed to be intuitive and powerful:

- Installation: Python: pip install agent-framework | .NET: dotnet add package Microsoft.Agents.AI
- Integration: Works with Foundry SDK, MCP SDK, A2A SDK, and M365 Copilot Agents
- Samples and Manifests: Explore declarative agent manifests and code samples
- Learning Resources: Microsoft Learn modules, AI Agents for Beginners, AI Show demos, Azure AI Foundry Discord community

Migration and Compatibility

If you're currently using Semantic Kernel or AutoGen, migration guides are available to help you transition smoothly. The framework is designed to be backward-compatible where possible, and future updates will continue to support community contributions via the GitHub repository.

Important Considerations

The Agent Framework is in public preview. Feedback and issues are welcome on the GitHub repository. When integrating with third-party servers or agents, review data sharing practices and compliance boundaries carefully.
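To give a feel for the developer experience, here is a minimal Python sketch in the spirit of the framework's getting-started samples. Since the SDK is in public preview, treat the import path and client name (agent_framework.azure, AzureOpenAIChatClient) as assumptions to verify against the current docs:

```python
# Minimal Agent Framework sketch (pip install agent-framework).
# Import paths and client names are assumptions based on preview samples.
import asyncio

from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import AzureCliCredential

async def main() -> None:
    # Create a simple agent backed by an Azure OpenAI chat deployment.
    agent = AzureOpenAIChatClient(credential=AzureCliCredential()).create_agent(
        name="TravelHelper",
        instructions="You are a concise travel planning assistant.",
    )
    result = await agent.run("Suggest a three-day itinerary for Lisbon.")
    print(result.text)

asyncio.run(main())
```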
The Microsoft Agent Framework marks a pivotal moment in AI development, bringing together research innovation and enterprise readiness into a single, open-source foundation. Whether you're building your first agent or orchestrating a fleet of them, this framework gives you the tools to do it safely, scalably, and intelligently. Ready to get started? Download the SDK, explore the documentation, and join the community shaping the future of AI agents.

Week 2: Microsoft Agents Hack Online Events and Readiness Resources
2025 is the year of AI agents! But what exactly is an agent, and how can you build one? Whether you're a seasoned developer or just starting out, this FREE three-week virtual hackathon is your chance to dive deep into AI agent development. Register Now: https://aka.ms/agentshack

🔥 Learn from expert-led sessions streamed live on YouTube, covering top frameworks like Semantic Kernel, AutoGen, the new Azure AI Agents SDK and the Microsoft 365 Agents SDK.

Week 2 Events: April 14th-18th

| Day/Time | Topic | Track |
| --- | --- | --- |
| 4/14 08:00 AM PT | Building custom engine agents with Azure AI Foundry and Visual Studio Code | Copilots |
| 4/15 07:00 AM PT | Your first AI Agent in JS with Azure AI Agent Service | JS |
| 4/15 09:00 AM PT | Building Agentic Applications with AutoGen v0.4 | Python |
| 4/15 12:00 PM PT | AI Agents + .NET Aspire | C# |
| 4/15 03:00 PM PT | Prototyping AI Agents with GitHub Models | Python |
| 4/16 04:00 AM PT | Multi-agent AI apps with Semantic Kernel and Azure Cosmos DB | C# |
| 4/16 06:00 AM PT | Building declarative agents with Microsoft Copilot Studio & Teams Toolkit | Copilots |
| 4/16 07:00 AM PT | Prompting is the New Scripting: Meet GenAIScript | JS |
| 4/16 09:00 AM PT | Building agents with an army of models from the Azure AI model catalog | Python |
| 4/16 12:00 PM PT | Multi-Agent API with LangGraph and Azure Cosmos DB | Python |
| 4/16 03:00 PM PT | Mastering Agentic RAG | Python |
| 4/17 06:00 AM PT | Build your own agent with OpenAI, .NET, and Copilot Studio | C# |
| 4/17 09:00 AM PT | Building smarter Python AI agents with code interpreters | Python |
| 4/17 12:00 PM PT | Building Java AI Agents using LangChain4j and Dynamic Sessions | Java |
| 4/17 03:00 PM PT | Agentic Voice Mode Unplugged | Python |

Build Multi‑Agent AI Systems with Microsoft
Like many of you, I have been on a journey to build AI systems where multiple agents (AI models with tools and autonomy) collaborate to solve complex tasks. In this post, I want to share the engineering challenges we faced, the architecture we designed with Azure AI Foundry, and the lessons learned along the way. Our goal is to empower AI engineers and developers to leverage multi-agent systems for real-world applications, with the benefit of Microsoft's tools, research insights, and enterprise-grade platform.

Why Multi‑Agent Systems? The Need for AI Teamwork

Building a single AI agent to perform a task is often straightforward. However, many real-world processes are too complex for one agent alone. Tasks like in-depth research, enterprise workflow automation, or multi-step customer service involve context switching and specialized knowledge that overwhelm a lone chatbot. Multi-agent systems address this by distributing work across specialized agents while maintaining coordination. This approach brings several advantages:

- Scalability: Workloads can be split among agents, enabling horizontal scaling as tasks or data increase. More agents can handle more subtasks in parallel, avoiding bottlenecks.
- Specialisation: Each agent can be fine-tuned for a specific role or domain (e.g. research, summarisation, data extraction), which improves performance and maintainability. No single model has to be a master of all trades.
- Flexibility: Modular agents can be reused in different workflows or recombined to create new capabilities. It's easy to extend the system by adding or swapping an agent without redesigning everything.
- Robustness: If one agent fails or underperforms, others can pick up the slack. Decoupling tasks means the overall system can tolerate faults better than a monolithic agent.

This mirrors how human teams work: we achieve more by dividing and conquering complex problems. In fact, internal experiments and industry reports have shown that groups of AI agents can significantly outperform a single powerful model on complex, open-ended tasks. For example, Anthropic found a multi-agent system (Claude agents working together) answered 90% more queries correctly than a single-agent approach in one evaluation. The ability to operate in parallel is key: our experience likewise showed that multiple agents exploring different aspects of a problem can cover far more ground, albeit with increased resource usage.

Challenge: A downside of multi-agent setups is that they consume more resources (more model calls, more tokens) than single-agent runs. In Anthropic's research, multi-agent systems used ~15× the tokens of a single chat session. We've observed similarly that letting agents think and interact in depth pays off in better results, but at a cost. Ensuring the task's value justifies the cost is important when choosing a multi-agent solution.

Designing the Architecture: Orchestration via a Lead Agent

To harness these benefits, we designed a multi-agent architecture built around an orchestrator-worker pattern, very similar to Anthropic's "lead agent and subagents" approach. In Azure AI Foundry (our enterprise AI platform), this takes shape as Connected Agents: a mechanism where a main agent can spawn and coordinate child agents to handle sub-tasks. The main agent is the brain of the operation, responsible for understanding the user's request, breaking it into parts, and delegating those parts to the appropriate specialist agents.
Each agent in the system is defined with three core components:

- Instructions (prompt/policy): defining the agent's goal, role, and constraints (its "game plan").
- Model: an LLM that powers the agent's reasoning and dialogue (e.g. GPT-4 or other models available in Foundry).
- Tools: external capabilities the agent can invoke to get information or take actions (e.g. web search, databases, APIs).

By composing agents with different instructions and tools, we create a team where each agent has a clear role. The main agent's role is orchestration; the sub-agents focus on specific tasks. This separation of concerns makes the system easier to understand and debug, and prevents any single context window from becoming overloaded.

How it works (overview): When a user query comes in, the lead agent analyzes the request and devises a plan. It may decide that multiple pieces of information or steps are needed. The lead agent then spins up subordinate agents in parallel to gather or compute those pieces. Each sub-agent operates with its own context window and tools, exploring one aspect of the task. They report their findings back to the lead agent, which integrates the results and decides if more exploration is required. The loop continues until the lead agent is satisfied that it can produce a final answer, at which point it consolidates everything and returns the result to the user.

This orchestrator/sub-agent pattern is powerful because it lets complex tasks be solved through natural language delegation rather than hard-coded logic. Notably, the main agent doesn't need an if/else tree written by us to decide which sub-agent handles what; it uses the language model's reasoning to route tasks. In Azure AI Foundry's Connected Agents, the primary agent simply says (in effect) "You, Agent A, do X; You, Agent B, do Y," and the platform handles the rest, with no custom orchestration code needed. This drastically simplified our development: we focus on crafting the right prompts and agent designs, and let the AI figure out the coordination.

Example: Sales Assistant with Specialist Agents

To make this concrete, imagine a Sales Preparation Assistant that helps a sales team research a client before a meeting. Instead of trying to cram all knowledge and skills into one model, we give the assistant a team of four sub-agents, each an expert in a different area. The main agent ("Sales Assistant") will ask each specialist for input and then compile a briefing.

| Agent | Role / Purpose & Task | Example Tools/Models Used |
| --- | --- | --- |
| Market Research Agent | Gathers industry trends and news related to the client's sector. | Bing Web Search, internal news API |
| Competitive Analysis Agent | Finds information on the client's competitors and market position. | Web Search, Company DB |
| Customer Insights Agent | Summarises the client's history and interactions (from CRM data). | Azure Cognitive Search on CRM, GPT-4 |
| Financial Analysis Agent | Reviews the client's financial data and recent performance. | Finance database query tool, Excel APIs |
| Main Sales Assistant | Orchestrator that delegates to the above agents, then synthesises a final report for the sales team. | GPT-4 (with instructions to compile and format results) |

In this scenario, the Main Sales Assistant agent would ask each sub-agent to report on their specialty (market news, competition, CRM insights, finances). Rather than one AI trying to do it all (and possibly missing nuances), we have focused mini-AIs each doing a thorough job in parallel.
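Wiring a specialist up to the main agent is mostly declarative. The sketch below uses the connected-agent support in the Azure AI Projects/Agents Python SDK; the endpoint, model deployment, names, and instructions are illustrative assumptions, so check them against the current Foundry samples:

```python
# Sketch: exposing a specialist to an orchestrator via Connected Agents.
# Endpoint, model, names, and instructions are illustrative assumptions.
from azure.ai.agents.models import ConnectedAgentTool
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

client = AIProjectClient(
    endpoint="<your-project-endpoint>", credential=DefaultAzureCredential()
)

# A specialist that researches market trends for the client's sector.
market_agent = client.agents.create_agent(
    model="gpt-4o",
    name="market_research_agent",
    instructions="Summarize current industry trends and news for a given company.",
)

# Expose the specialist to the orchestrator as a callable tool.
market_tool = ConnectedAgentTool(
    id=market_agent.id,
    name="market_research_agent",
    description="Gathers industry trends and news related to the client's sector.",
)

# The main agent delegates in natural language; no routing code is needed.
sales_assistant = client.agents.create_agent(
    model="gpt-4o",
    name="sales_assistant",
    instructions="Prepare a client briefing. Delegate research to connected agents.",
    tools=market_tool.definitions,
)
```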
This approach was shown to reduce the overall time required and improve the quality of the final output. In early trials, such multi-agent setups often succeed where single agents fall short, for instance, finding all relevant facts across disparate sources and preparing a comprehensive briefing more quickly. Development is easier too: if tomorrow we need to add a "Regulatory Compliance Agent" for a new client requirement, we can plug it in without retraining or heavily modifying the others.

Orchestration under the hood: Azure AI Foundry's Agent Service provides the runtime that makes all this work reliably. It manages the message passing between the main agent and sub-agents, ensures each tool invocation is executed (with retries on failure), and keeps a structured log of the entire multi-agent conversation (we call it a thread). This means developers don't have to manually implement how agents call each other or share data; the platform handles those mechanics. Foundry also supports true agent-to-agent messaging if agents need to talk directly, but often a hierarchical pattern (through the main agent) suffices for task delegation.

Tools, Knowledge, and the Model Context Protocol (MCP)

For agents to be effective, especially in enterprise scenarios, they must integrate with external knowledge sources and services; no single LLM knows everything or can perform all actions. Microsoft's approach emphasizes a rich tool integration layer. In our system, tools range from web search and databases to APIs for taking real actions (sending emails, executing workflows, etc.). Equipping agents with the right tools extends their capabilities dramatically: an agent can retrieve up-to-date info, pull data behind corporate firewalls, or trigger business processes.

One key innovation is the Model Context Protocol (MCP), which Foundry uses to manage tools. MCP provides a structured way for agents to discover and use tools dynamically at runtime. Traditionally, if you wanted your AI agent to use a new tool, you might have to hard-code that tool's API and update the agent's code or prompt. With MCP, tools are defined on a central tool server (with descriptions and endpoints), and agents can query this server to see what tools are available. The agent's SDK then generates the necessary code "stubs" to call the tool on the fly. This means:

- Easier maintenance: You can add, update, or remove tools in one place (the MCP registry) without changing the agent's code. When the Finance database API updates, just update its MCP entry; all agents automatically get the new version next time they run.
- Dynamic adaptability: Agents can choose tools based on context. For example, a research agent might discover that a new MarketAnalysisAPI tool is available and start using it for a finance query, whereas previously it only had a generic web search.
- Separation of concerns: Those building AI agents can rely on domain experts to maintain the tool definitions, while they focus on the agent logic. Agents treat tools in a uniform way, as functions they can call.

In practice, tool selection became a critical part of our agent design. A lesson we learned is that giving agents access to the right tool, with a clear description, can make or break their performance. If a tool's description is vague or overlapping with another, the agent might choose the wrong approach and wander down a blind alley.
For instance, we saw cases where an agent would stubbornly query an internal knowledge base for information that actually only existed on the web, simply because the tool prompt made the web search sound less relevant. We addressed this by carefully curating tool descriptions and even building an internal tool-testing agent that automatically tries out tools and suggests better descriptions for them. Ensuring each tool had a distinct purpose and clear usage guidance dramatically improved our agents' success rate in choosing the optimal tool for a given job.

Finally, multi-modal support is worth noting. Some tasks involve not just text, but images or other media. Our multi-agent architecture, especially with Azure AI Foundry, can incorporate vision-capable models as agents or tools. For example, an "Image Analysis Agent" could be part of a team, or an agent might call a vision API tool. The Telco customer service demo (using Foundry + OpenAI Agent SDK) featured an agent that could handle image uploads (like an ID document) by invoking an image-processing function. The orchestration framework doesn't fundamentally change with multi-modality; it simply treats the vision model as another specialist agent or tool in the conversation. The ability to plug in different AI skills (text, vision, search, etc.) under a unified agent system is a big advantage of Microsoft's approach: the agent team becomes cross-functional, each member with their own modality or expertise, collectively solving richer tasks than any single foundation model could.

Reliability, Safety, and Enterprise-Grade Engineering

While the basic idea of agents chatting and calling tools is elegant, productionizing this system for enterprise use brought serious engineering challenges. We needed our multi-agent system to be reliable, controllable, and secure. Here are the key areas we focused on and how we addressed them.

Observability and Debugging

Multi-agent chains can be complex and non-deterministic: each run might involve different paths as agents make choices. Early on, we realized that treating the system as a black box was untenable. Developers and operators must be able to observe what each agent is "thinking" and doing, or else diagnosing issues would be impossible. Azure AI Foundry's Agent Service was built with full conversation traceability in mind. Every message between agents (and to the user), and every tool invocation and result, is captured in a structured thread log. We integrated this with Azure Application Insights telemetry, so one can monitor performance, latencies, errors, and even token consumption of agents in real time.

This tracing proved invaluable. For example, when a complex workflow wasn't producing the expected outcome, we could replay the entire agent conversation step by step to see where things went awry. In one instance, we found that two sub-agents were given slightly overlapping responsibilities, causing them to waste time retrieving nearly identical information. The logs and message transcripts made this immediately clear, guiding us to tighten the role definitions. Moreover, because the system logs are structured (not just free-form text), we could build automated analyses, like checking how often an agent hits a retry or how many cycles a conversation goes through before completion, to spot anomalies.
This kind of observability was something the open-source community also highlighted as crucial; in fact, the Semantic Kernel and AutoGen frameworks introduced metrics tracking and message tracing for exactly this reason. We also developed visual debugging tools. One example is AutoGen Studio (a low-code interface from Microsoft Research), which allows developers to visually inspect agent interactions in real time, pause agents, or adjust their behavior on the fly. This interactive approach accelerates the prompt-engineering loop: one can watch agents argue or collaborate live, and intervene if needed. Such capabilities turned out to be vital for understanding emergent behaviors in multi-agent setups.

Coordination Complexity and State Management

As more agents come into play, keeping them coordinated and preserving shared context is hard. Early versions of our agents would sometimes spawn excessive numbers of agents or get stuck in loops. For instance, one of our prototypes (before we applied strict limits) ended up in a degenerate state where two agents kept handing control back and forth without making progress. This taught us to implement guardrails and smarter orchestration policies.

In Azure AI Foundry, beyond the simple connected-agent pattern, we introduced a more structured orchestration capability called Multi-Agent Workflows. This lets developers explicitly define states, transitions, and triggers in a workflow that involves multiple agents. It's like flowcharting the high-level process that the agents should follow, including how they pass data around. We use this for long-running or highly critical processes where you want extra determinism; for example, an onboarding workflow might have clearly defined phases (Verification → Provisioning → Notification), each handled by different agents, and you want to ensure the process doesn't derail. The workflow engine enforces that the system moves to the next state only when all agents in the current state have completed and certain conditions (triggers) are met. It also provides persistence: if the process needs to wait (say, for an external event, or simply because it's a lengthy task), the state is saved and can be resumed later without losing context.

These workflow features were a response to reliability needs: they give fine-grained control and error recovery in multi-agent systems that operate over extended periods. In practice, we learned to use the simpler Connected Agents approach for quick, on-the-fly delegations (it's amazingly capable with minimal setup), and reserve Workflow Orchestration for scenarios where we must guarantee a robust sequence over minutes, hours, or days. By having both options, we can strike a balance between flexibility and control as needed.

Trust, Safety, and Governance

When you let AI agents act autonomously (especially if they can use tools that modify data or interact with the real world), safety is paramount. From day one, our design included enterprise-grade safety measures:

- Content Filtering and Policy Enforcement: All AI outputs go through content filters to catch disallowed content or potential prompt injection attacks. The Foundry Agent Service has integrated guardrails so that even if an agent tries something risky (e.g., a tool returns sensitive info that should not be shown), policies can prevent misuse or leakage. For example, we configured financial analysis agents with rules not to output certain PII and to stop if they detect a regulatory compliance issue, handing off to a human instead.
- Identity and Access Control: Agents operate with identities managed via Microsoft Entra ID (Azure AD). This means every action an agent takes can be attributed and audited. Role-Based Access Control (RBAC) is enforced: an agent only has access to the data and APIs its role permits. If an agent's credentials are compromised or misused, Azure's standard auditing can alert us. Essentially, agents are first-class service principals in our cloud stack.
- Network Isolation and Compliance: For enterprise deployments, Azure AI Foundry allows agents to run in isolated networks (so they can't arbitrarily call external services unless allowed) and to use customer-managed storage and search indices. This addresses the data governance aspect: we can ensure an agent looking up internal documents only sees what it's supposed to, and all data stays within compliant boundaries.
- Auditability: As mentioned earlier, every decision an agent makes (every tool it calls, every answer it gives) is recorded. This is crucial for trust: if a multi-agent system is making business decisions, we need to be able to explain and justify those decisions later. By retaining the full reasoning trace and sources, we make the system's work transparent and auditable. In fact, our "Deep Research" agents output not just answers but also a log of how they arrived at each answer, including citations to source material for each claim. This level of detail is a must-have in regulated industries or any high-stakes use case.

Overall, baking in trust and safety by design was a non-negotiable requirement. It does introduce some overhead (e.g., strict content filtering can sometimes stop an agent from a creative solution until we refine its prompt or the filter thresholds), but it's worth it for the confidence it gives to deploy these agents at scale.

Performance and Cost Considerations

We touched on the resource cost of multi-agent systems. Another challenge was ensuring the system runs efficiently. Without care, adding agents can linearly increase cost and latency. We mitigated this in a few ways:

- Parallelism: We make agents run concurrently wherever possible. Our lead agents typically fire off multiple sub-agents at once rather than sequentially waiting for one before starting the next. Our agents themselves can also issue parallel tool calls; in fact, we enabled some of our retrieval agents to batch multiple search queries and send them all at the same time. Anthropic reported that this kind of parallelism cut their research task times by up to 90%, and we've observed similar dramatic speed-ups. By doing in 1 minute what a single agent might take 10 minutes to do step-by-step, we make the approach far more practical. Of course, the flip side is that hitting many APIs and LLM endpoints concurrently can spike usage costs; we carefully monitor usage and recommend multi-agent mode only when the problem's complexity warrants it.
- Scaling rules and agent limits: One lesson learned was to prevent "agent sprawl." We devised guidelines (and encoded some in prompts) about how many sub-agents to use for a given task complexity. For simple fact queries, the main agent is encouraged to handle it alone or with at most one helper; for moderately complex tasks, maybe spin up 2–3; only truly complex projects get a dozen specialists. This avoids the situation where an overzealous orchestrator might launch an army of agents and overkill the problem.
These limits were informed by experimentation and echo the principle of scaling effort to the problem size.

- Model selection: Multi-agent systems don't always need the largest model for every agent. We often use a mix of model sizes to optimize cost. For instance, a straightforward data extraction agent might be powered by a cheaper GPT-3.5, while the synthesis agent uses GPT-4 for the final answer quality. Foundry makes it easy to deploy a range of model endpoints (including open-source Llama-based models), and each agent can pick the one best suited. We learned that using an expensive model for a simple sub-task is wasteful; a smaller model with the right tools can do the job just as well. This mix-and-match approach helped keep our compute costs in check without sacrificing outcome quality.

Lessons Learned and Best Practices

Building these multi-agent systems was an iterative learning process, and several key lessons and best practices emerged that we believe will be useful to anyone developing their own. Let's expand on a couple of these points:

Prompt engineering for multi-agent is different: We quickly discovered that writing prompts for a team of agents is an order of magnitude more complex than for a single chatbot. Not only do you have to get each agent's behavior right, you must shape how they interact. One principle that served us well was: "Think like your agents." When debugging, we'd often step through the conversation from each agent's perspective, almost role-playing as them, to see why they might be doing something silly. If an agent was repeating another's results, maybe our instructions were too vague and they didn't realise that sub-task was already covered. The fix would be to clarify the division of labour in the lead agent's prompt or introduce an ordering (e.g., Agent B only runs after Agent A's info is in). Another principle: teach the orchestrator to delegate effectively. The main agent's prompt now includes explicit guidance on how to break down tasks and how to phrase sub-agent assignments with plenty of detail. We learned that if the lead just says "Research topic X" to two different agents, they might both do the same thing. Now, the lead agent provides distinct objectives and context to each sub-agent (e.g., focusing one on recent news and another on historical data). This reduced redundancy and missed coverage dramatically.

Let the AI help improve itself: One delightful surprise was that large models can be quite good at analyzing and refining their own strategies when asked. We sometimes gave an agent a chance to critique its output or plan, essentially a self-reflection step. In other cases, we had a "judge" agent evaluate the final answers against criteria (accuracy, completeness, etc.). These evaluations not only gave us a score for benchmarking changes, but the judge's feedback (being an LLM) often highlighted exactly where an agent went off track or missed something. In a sense, we used one AI to tell us how to make another AI better. This kind of meta-prompting and self-correction became a powerful tool in our development cycle, allowing faster iteration without a full human-in-the-loop at every turn.

Know when to simplify: Not every problem needs a fleet of agents. A big lesson was to use the simplest approach that works. If a single agent with a smart prompt can handle a task reliably, that's fine!
We reserved multi-agent mode for when there was clear added value, e.g., problems requiring parallel exploration, different expertise, or lengthy reasoning that benefits from splitting into parts. This discipline kept our systems leaner and easier to maintain. It also helped us explain the value to stakeholders: we could justify the complexity by pointing to concrete gains (like a task that went from 2 hours with a single high-end model to 10 minutes with a team of agents, with better results).

Conclusion and Next Steps

Multi-agent AI systems have moved from intriguing research demos to practical, production-ready solutions. Our journey involved close collaboration between teams, such as those who built open-source frameworks like AutoGen (to experiment with multi-agent interactions) and the Azure AI product teams (who turned these concepts into the robust Azure AI Foundry Agent Service). Along the way, we learned how to orchestrate LLMs at scale, how to keep them in check, and how to squeeze the most value out of agent collaboration. Today, Azure AI Foundry's Agents platform provides a unified environment to develop, test, and deploy multi-agent systems, complete with the orchestration, observability, and safety features to make them enterprise-ready. The public preview of features like Connected Agents and Deep Research (essentially an advanced research agent that uses the web + analysis in a multi-step process) is already enabling customers to build "AI teams" that tackle complex workflows.

This is just the beginning. We're continuing to improve the platform with feedback from developers: upcoming releases will further tighten integration with the broader Azure ecosystem (for example, more seamless use of Azure Cognitive Search, Excel as a tool, etc.), expand the library of pre-built agent templates in the Agent Catalog (so you can start with a solid example for common scenarios), and introduce more advanced coordination patterns inspired by real-world use cases. If you're an AI engineer or developer eager to explore multi-agent systems, now is a great time to dive in. Here are some resources to get you started:

- Microsoft AI Agents for Beginners – Learn all about AI agents with this FREE curriculum.
- Azure AI Foundry Documentation – Learn more about the Agent Service and how to configure agents, tools, and workflows.
- Microsoft Learn Modules – A step-by-step tutorial to build a connected multi-agent solution (for example, a ticket triage system) using Azure AI Foundry Agent Service. This will walk you through setting up agents and using the SDK.
- Microsoft MCP for Beginners: Integrating MCP Tools – Another tutorial focused on the Model Context Protocol, showing how to enable dynamic tool discovery for your agents.
- Azure AI Foundry Agent Catalog – Browse a growing collection of open-sourced agent examples contributed by Microsoft and partners, covering scenarios from content compliance to manufacturing optimization. These samples are great starting points to see how multi-agent code is structured in real projects.

Multi-agent systems represent a significant shift in how we conceptualise AI solutions: from single brilliant assistants to teams of specialised agents working in concert. The engineering journey hasn't been easy; we navigated challenges in coordination, built new tooling for control, and refined prompts endlessly. But the end result is a new class of AI applications that are more powerful, resilient, and tunable.
We hope the insights shared here help you in your own journey to build with AI agents. We're excited to see what you will create with these technologies. As we continue to push the frontier of agentic AI (both in research and in Azure), one thing is clear: many minds, human or AI, are often better than one. Happy building!

Useful References

- Introducing Multi-Agent Orchestration in Foundry Agent Service – Build ...
- Building a multimodal, multi-agent system using Azure AI Agent Service ...
- How we built our multi-agent research system – Anthropic
- What is Azure AI Foundry Agent Service? – Azure AI Foundry
- Multi-Agent Systems and MCP Tools Integration with Azure AI Foundry ...
- Introducing Deep Research in Azure AI Foundry Agent Service
- AutoGen v0.4: Advancing the development of agentic AI systems

Build a smart shopping AI Agent with memory using the Azure AI Foundry Agent service
When we think about human intelligence, memory is one of the first things that comes to mind. It's what enables us to learn from our experiences, adapt to new situations, and make more informed decisions over time. Similarly, AI agents become smarter with memory. For example, an agent can remember your past purchases, your budget, and your preferences, and suggest gifts for your friends based on what it learned from past conversations. Agents usually break tasks into steps (plan → search → call API → parse → write), but without memory they might forget what happened in earlier steps. Agents repeat tool calls, fetch the same data again, or miss simple rules like "always refer to the user by their name." As a result of repeating the same context over and over again, agents can spend more tokens, return slower results, and provide inconsistent answers. You can read my other article about why memory is important for AI agents.

In this article, we'll explore why memory is so important for AI agents and walk through an example of a Smart Shopping Assistant to see how memory makes it more helpful and personalized. You will learn how to integrate Memori with the Azure AI Foundry Agent Service.

Smart Shopping Experience With Memory for an AI Agent

This demo showcases an agent that remembers customer preferences, shopping behavior, and purchase history to deliver personalized recommendations and experiences. The demo walks through five shopping scenarios where the assistant remembers customer preferences, budgets, and past purchases to give personalized recommendations. From buying Apple products and work setups to gifts, home needs, and books, the assistant adapts to each need and suggests complementary options.

- Learns Customer Preferences: Remembers past purchases and preferences
- Provides Personalized Recommendations: Suggests products based on shopping history
- Budget-Aware Shopping: Considers customer budget constraints
- Cross-Category Intelligence: Connects purchases across different product categories
- Gift Recommendations: Suggests gifts based on the customer's history
- Contextual Conversations: Maintains shopping context across interactions

Check the GitHub repo with the full agent source code and try out the live demo.

How Smart Shopping Assistant Works

We use the Azure AI Foundry Agent Service to build the shopping assistant and add Memori, an open-source memory solution, to give it persistent memory. You can check out the Memori GitHub repo here: https://github.com/GibsonAI/memori. We connect Memori to a local SQLite database, so the assistant can store and recall information. You can also use any other relational database, like PostgreSQL or MySQL. Note that this is a simplified version of the actual smart shopping assistant implementation. Check out the GitHub repo code for the full version.
```python
import json
import os
from datetime import datetime

from azure.ai.agents.models import FunctionTool
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv
from memori import Memori, create_memory_tool
from memori.core.providers import ProviderConfig  # import path assumed; adjust to your Memori version

load_dotenv()

# Constants
DATABASE_PATH = "sqlite:///smart_shopping_memory.db"
NAMESPACE = "smart_shopping_assistant"

# Create Azure provider configuration for Memori
azure_provider = ProviderConfig.from_azure(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
)

# Initialize Memori for persistent memory
memory_system = Memori(
    database_connect=DATABASE_PATH,
    conscious_ingest=True,
    auto_ingest=True,
    verbose=False,
    provider_config=azure_provider,
    namespace=NAMESPACE,
)

# Enable the memory system
memory_system.enable()

# Create memory tool for agents
memory_tool = create_memory_tool(memory_system)


def search_memory(query: str) -> str:
    """Search customer's shopping history and preferences"""
    try:
        if not query.strip():
            return json.dumps({"error": "Please provide a search query"})
        result = memory_tool.execute(query=query.strip())
        memory_result = (
            str(result) if result else "No relevant shopping history found"
        )
        return json.dumps(
            {
                "shopping_history": memory_result,
                "search_query": query,
                "timestamp": datetime.now().isoformat(),
            }
        )
    except Exception as e:
        return json.dumps({"error": f"Memory search error: {str(e)}"})

# ...
```

This setup records every conversation, and user preferences are saved under a namespace called smart_shopping_assistant. We plug Memori into the Azure AI Foundry agent as a function tool. The agent can call search_memory() to look at the shopping history each time.

```python
# ...
functions = FunctionTool(functions={search_memory})  # FunctionTool expects a set of callables

# Get configuration from environment
project_endpoint = os.environ["PROJECT_ENDPOINT"]
model_name = os.environ["MODEL_DEPLOYMENT_NAME"]

# Initialize the AIProjectClient
project_client = AIProjectClient(
    endpoint=project_endpoint, credential=DefaultAzureCredential()
)

print("Creating Smart Shopping Assistant...")
instructions = """You are an advanced AI shopping assistant with memory capabilities.
You help customers find products, remember their preferences, track purchase history,
and provide personalized recommendations."""

agent = project_client.agents.create_agent(
    model=model_name,
    name="smart-shopping-assistant",
    instructions=instructions,
    tools=functions.definitions,
)

thread = project_client.agents.threads.create()
print(f"Created shopping assistant with ID: {agent.id}")
print(f"Created thread with ID: {thread.id}")
# ...
```

This integration makes the Azure-powered agent memory-aware: it can search customer history, remember preferences, and use that knowledge when responding.

Setting Up and Running the AI Foundry Agent with Memory

Go to the Azure AI Foundry portal and create a project by following the guide in the Microsoft docs. Deploy a model like GPT-4o. You will need the Project Endpoint and Model Deployment Name to run the example.

1. Install the required libraries:

```
pip install memorisdk azure-ai-projects azure-identity python-dotenv
```

2. Set your Azure environment variables:

```
# Azure AI Foundry Project Configuration
export PROJECT_ENDPOINT="https://your-project.eastus2.ai.azure.com"

# Azure OpenAI Configuration
export AZURE_OPENAI_API_KEY="your-azure-openai-api-key-here"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o"
export AZURE_OPENAI_API_VERSION="2024-12-01-preview"
```

3. Run the demo:

```
python smart_shopping_demo.py
```

The script runs predefined conversations to show how the assistant works in real life. Example:

"Hi! I'm looking for a new smartphone. I prefer Apple products and my budget is around $1000."

The assistant responds by considering previous preferences, suggesting the iPhone 15 Pro and accessories, and remembering your price preference for the future. So next time, it might suggest AirPods Pro too.

How Memori Helps

Memori decides which long-term memories are important enough to promote into short-term memory, so agents always have the right context at the right time. Memori adds powerful memory features for AI agents:

- Structured memory: Learns and validates preferences using Pydantic-based logic
- Short-term vs. long-term memory: You decide what's important to keep
- Multi-agent memory: Shared knowledge between different agents
- Automatic conversation recording: Just one line of code
- Multi-tenancy: Achieved with namespaces, so you can handle many users in the same setup

What You Can Build with This

You can customize the demo further by:

- Expanding the product catalog with real inventory and categories that matter to your store.
- Adding new tools like "track my order," "compare two products," or "alert me when the price drops."
- Connecting to a real store API (Shopify, WooCommerce, Magento, or a custom backend) so recommendations are instantly shoppable.
- Enabling cross-device memory, so the assistant remembers the same user whether they're on web, mobile, or even a voice assistant.
- Integrating with payment and delivery services, letting users complete purchases right inside the conversation.

Final Thoughts

AI agents become truly useful when they can remember. With Memori + Azure AI Foundry, you can build assistants that learn from each interaction, get smarter over time, and deliver delightful, personal experiences.

European AI and Cloud Summit 2025
Düsseldorf, Germany welcomed 3,000+ tech enthusiasts and hosted the Microsoft for Startups Cloud AI Pitch Competition between May 26-28, 2025. We were pleased to attend the European AI & Cloud, Collaboration, and Biz Apps Summits, to participate and to gain so much back from everyone. It was a packed week filled with insights, feedback, and fun. Below is a recap of various aspects of the event, across keynotes, general sessions, breakout sessions, and the Expo Hall.

The event in a nutshell:

- 3,000+ attendees
- 237 speakers overall – 98 Microsoft Valued Professionals, 16 Microsoft Regional Directors, 51 from Microsoft Product Groups and Engineering
- 306 sessions | 13 tutorials (workshops)
- 70 sponsors | One giant Expo Hall

PreDay Workshop: AI Beginner Development Powerclass

This workshop was designed to give you a hands-on introduction to the core concepts and best practices for interacting with OpenAI models in the Azure AI Foundry portal. Innovate with Azure OpenAI's GPT-4o multimodal model in this hands-on experience in Azure AI Foundry. Learn the core concepts and best practices to effectively generate with text, sound, and images using GPT-4o-mini, DALL-E, and GPT-4o-realtime. Create AI assistants that enhance user experiences and drive innovation.

Workshop for you: azure-ai-foundry/ai-tutorials. This repo includes a collection of tutorials to help you get started with building Generative AI applications using Azure AI Foundry. Each tutorial is designed to be self-contained and provides step-by-step instructions to guide you in the development process.

Keynotes & general sessions

Day 1 Keynote: The Future of AI Is Already Here – Marco Casalaina, VP Products of Azure AI and AI Futurist at Microsoft

This session discussed the new and revolutionary changes that you're about to see in AI, and how many of them are available for you to try now. Marco shared how AI is becoming ubiquitous, multimodal, multilingual, and autonomous, and how it will change our lives and our businesses. This session covered:

• Incredible advances in multilingual AI
• How Copilot (and every AI) are grounded to data, and how we do it in Azure OpenAI
• Responsible AI, including evaluation for correctness, and real-time content safety
• The rise of AI agents
• And how AI is going to move from question-answering to taking action

Day 2 Keynote: Leveraging Microsoft AI: Navigating the EU AI Act and Unlocking Future Opportunities – Azar Koulibaly, General Manager and Associate General Counsel

In this session for developers and business decision makers, Azar set the stage for Microsoft's advancements in AI and how they align with the latest regulatory framework, exploring the EU AI Act, its key components, and its implications for AI development and deployment within the European Union. The audience gained a comprehensive understanding of the EU AI Act's objectives, including the promotion of trustworthy AI, the mitigation of risks, and the enhancement of transparency and accountability. Attendees learned how Microsoft's Cloud and AI Services provide robust support for compliance with these new regulations, ensuring that AI projects are both innovative and legally sound, and about Microsoft Trust Center resources. He delved into the opportunities that come with using Microsoft's state-of-the-art tools, services, and technologies, and how partnering with Microsoft can accelerate your AI initiatives, drive business growth, and create competitive advantages in an evolving regulatory landscape.
Join us to unlock the full potential of AI while navigating the complexities of the EU AI Act with confidence.

General Sessions

It is crucial to ensure your organization is technically ready for the full potential of AI. The sessions focused on technical readiness and ensuring you have the latest guidance. Our experts shared best practices and provided guidance on how to leverage AI and Azure AI Foundry to maximize the benefits of agents, LLMs, and generative AI within your organization.

Expo Hall + Cloud AI Startup Stage + tutorials

The Expo Hall was buzzing with demos, discussions, interviews, podcasts, lightning talks, popcorn, catering trucks, cotton candy, SWAG, prizes, and community. There was a busy Cloud AI Startup Stage, a Business Stage, and shorter talks delivered in front of a shiny Airstream trailer.

Cloud AI Startup Stage

This was a highly informative and engaging event focused on Artificial Intelligence (AI) and its potential for startups. Microsoft for Startups is a platform that provides startups with the resources, tools, and support they need to succeed. This portion of the event offered value to budding entrepreneurs and established startups looking to scale. For example, on day 2, we focused on accelerating innovation with Microsoft Founders Hub and Azure AI. Startups could kickstart their journey with Azure credits and gain access to 30+ tools and services, benefiting from additional credits and offerings as they evolve. It's a great way for startups to navigate technical and business challenges.

Cloud AI Startup Pitch: The Microsoft for Startups AI Pitch Competition and Startup Stage at European Cloud Summit 2025

The AI Empowerment session provided an in-depth overview of the various AI services available through Microsoft Azure AI Foundry and how these cutting-edge technologies can be integrated into business operations. This was perfect for startups looking to get started with AI or those interested in joining the Microsoft for Startups programme. The Spotlight on Innovation session showcased innovative startups from the European Cloud Summit, giving attendees a unique insight into the cutting-edge ideas that are shaping our future. The Empowering Innovation session featured a panel of experts and successful startup founders sharing insights on leveraging Microsoft technologies, navigating the startup ecosystem, and securing funding. This was valuable for budding entrepreneurs or established startups looking to scale.

Startup Showcases

Holistic AI – End to End AI Governance Platform, Raj Bharat Patel: Securing Advantage in the Era of Agentic AI

With this autonomy comes high variability: the difference between a minor efficiency and a major one could mean millions in savings. Conversely, a seemingly small misstep could cascade into catastrophic reputational or compliance risk. The stakes are high, but so is the potential. The next frontier introduces a new paradigm: AI managing AI. As organizations deploy swarms of autonomous agents across business functions, the challenge expands beyond governing human-AI interactions. Now, it's about ensuring that AI agents can monitor, evaluate, and optimize each other, in real time, at scale.
This demands a shift in both architecture and mindset: toward native-AI platforms and a new human role, moving from human-in-the-loop to human-on-the-loop, strategically overseeing autonomous systems rather than micromanaging them. In this session, Raj Bharat Patel explored how forward-thinking enterprises can prepare for this emerging reality, building governance and orchestration infrastructures that enable scale, speed, and safety in the age of agentic AI.

D-ID | The #1 Choice for AI Generated Video Creation Platform, Yaniv Levi

In a world increasingly powered by AI, how do we make digital experiences feel more human? At D-ID we enable businesses to create lifelike, interactive avatars that are transforming the way users communicate with AI, making it more intuitive, personal, and memorable. In this session, we shared how our collaboration with Microsoft for Startups helped us scale and innovate, enabling seamless integration with Microsoft's tools to enhance customer-facing AI agents and LLM-powered solutions while adding a powerful and personalized new layer of humanlike expression.

Startup Pitch Competition

Day 2 of the event focused on accelerating innovation with Microsoft for Startups and Azure AI Foundry. Startups can kickstart their journey with Azure credits and gain access to 30+ tools and services, benefiting from additional credits and offerings as they evolve. Azure AI Foundry provides access to an extensive range of AI models, including OpenAI, Meta, NVIDIA, and Hugging Face. Moreover, startups can navigate technical and business challenges with the help of free 1:1 sessions with Microsoft experts. The Cloud Startup Stage Competition showcased the most innovative startups in the Microsoft Azure and Azure OpenAI ecosystem, highlighting their groundbreaking solutions and business models. It was a celebration of innovation and success and provided insights into the experiences, challenges, and future plans of these startups.

After the judges heard the pitches, the winners were:
1st Place: Graia
2nd Place: iThink365
3rd Place: ShArc

Overall, this event was highly informative, engaging, and valuable for anyone interested in AI and its potential for startups. The Microsoft for Startups programme and Azure AI Foundry are powerful tools that can help startups achieve success and transform their ideas into successful businesses.

In the end...

We are grateful for this year's active and engaging #CollabSummit, #CloudSummit, and #BizAppSummit: so much goodness, caring, learning, and fun! Great questions, stories, understanding of your concerns, and shared fun. Thank you and see you next year! We look forward to seeing you in Cologne, May 5-7, for the Collaboration Summit (@CollabSummit), Cloud Summit (@EUCloudSummit), and BizApps Summit (@BizAppsSummit), and to continuing the discussion around AI in our Azure AI Discord Community.
Quest 6 - I want to build an AI Agent

Quest 6 of the JS AI Build-a-thon marks a major milestone: building your first intelligent AI agent using the Azure AI Foundry VS Code extension. In this quest, you'll design, test, and integrate an agent that can use tools like Bing Search, respond to goals, and adapt in real time. With updated instructions, real-world workflows, and powerful tooling, this is where your AI app gets truly smart.
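To give a feel for what the quest builds toward, here is a rough sketch of creating and running an agent programmatically. It assumes the beta @azure/ai-projects JavaScript SDK and a PROJECT_CONNECTION_STRING environment variable; exact package and method names may differ from what the extension scaffolds for you, and tool wiring (such as Bing Search grounding) is omitted.

```typescript
import { AIProjectsClient } from "@azure/ai-projects";
import { DefaultAzureCredential } from "@azure/identity";

// Assumption: a Foundry project connection string is available in the environment.
const client = AIProjectsClient.fromConnectionString(
  process.env.PROJECT_CONNECTION_STRING ?? "",
  new DefaultAzureCredential()
);

async function main() {
  // Create an agent with goal-oriented instructions.
  const agent = await client.agents.createAgent("gpt-4o-mini", {
    name: "quest6-agent", // hypothetical name for illustration
    instructions: "You are a helpful assistant that plans tasks step by step.",
  });

  // Start a conversation thread and post a user goal.
  const thread = await client.agents.createThread();
  await client.agents.createMessage(thread.id, {
    role: "user",
    content: "Find three things to do this weekend.",
  });

  // Run the agent on the thread and poll until it finishes.
  let run = await client.agents.createRun(thread.id, agent.id);
  while (run.status === "queued" || run.status === "in_progress") {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    run = await client.agents.getRun(thread.id, run.id);
  }

  // Read back the agent's reply.
  const messages = await client.agents.listMessages(thread.id);
  console.log(JSON.stringify(messages, null, 2));
}

main().catch(console.error);
```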
Use Prompty with Foundry Local

Prompty is a powerful tool for managing prompts in AI applications. Not only does it allow you to easily test your prompts during development, but it also provides observability, understandability, and portability. Here's how to use Prompty with Foundry Local to support your AI applications with on-device inference.

Foundry Local

At the Build '25 conference, Microsoft announced Foundry Local, a new tool that allows developers to run AI models locally on their devices. Foundry Local offers developers several benefits, including performance, privacy, and cost savings.

Why Prompty?

When you build AI applications with Foundry Local, or with other language model hosts, consider using Prompty to manage your prompts. With Prompty, you store your prompts in separate files, making it easy to test and adjust them without changing your code. Prompty also supports templating, allowing you to create dynamic prompts that adapt to different contexts or user inputs.

Using Prompty with Foundry Local

The most convenient way to use Prompty with Foundry Local is to create a new configuration for Foundry Local. Using a separate configuration allows you to seamlessly test your prompts without having to repeat the configuration for every prompt. It also allows you to easily switch between different configurations, such as Foundry Local and other language model hosts.

Install Prompty and Foundry Local

To get started, install the Prompty Visual Studio Code extension and Foundry Local. Start Foundry Local from the command line by running foundry service start and note the URL on which it listens for requests, such as http://localhost:5272 or http://localhost:5273.

Create a new Prompty configuration for Foundry Local

If you don't have a Prompty file yet, create one to easily access Prompty settings. In Visual Studio Code, open the Explorer, right-click to open the context menu, and select New Prompty. This creates a basic.prompty file in your workspace.

Create the Foundry Local configuration

From the status bar, select default to open the Prompty configuration picker. When prompted to select the configuration, choose Add or Edit.... In the settings pane, choose Edit in settings.json. In the settings.json file, add a new configuration for Foundry Local to the prompty.modelConfigurations collection, for example (the comments are explanatory only):

```json
{
  // Foundry Local model ID that you want to use
  "name": "Phi-4-mini-instruct-generic-gpu",
  // API type; Foundry Local exposes OpenAI-compatible APIs
  "type": "openai",
  // API key; required by the OpenAI SDK, but not used by Foundry Local
  "api_key": "local",
  // The URL where Foundry Local exposes its API
  "base_url": "http://localhost:5272/v1"
}
```

Important: be sure to check that you use the correct URL for Foundry Local. If you started Foundry Local on a different port, adjust the URL accordingly.

Save your changes and go back to the .prompty file. Once again, select the default configuration from the status bar, and choose Phi-4-mini-instruct-generic-gpu from the list. Since the model and API are now configured centrally, you can remove them from the .prompty file.
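For reference, here is a minimal sketch of what the .prompty file might look like at this point, with the model section removed in favor of the shared Foundry Local configuration. It follows the general Prompty format (YAML front matter plus templated prompt body); the file that New Prompty generates for you may differ:

```
---
# Illustrative example; your generated basic.prompty may contain different fields.
name: Basic Prompt
description: A basic prompt that answers user questions
sample:
  question: What can I do in Düsseldorf in one day?
---
system:
You are a helpful assistant. Answer briefly and accurately.

user:
{{question}}
```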
Test your prompts

With the newly created Foundry Local configuration selected, press F5 in the .prompty file to test the prompt. The first time you run the prompt, it may take a few seconds because Foundry Local needs to load the model. Eventually, you should see the response from Foundry Local in the output pane.

Summary

Using Prompty with Foundry Local allows you to easily manage and test your prompts while running AI models locally. By creating a dedicated Prompty configuration for Foundry Local, you can conveniently test your prompts with Foundry Local models and switch between different model hosts and models if needed.
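Because Foundry Local exposes OpenAI-compatible APIs, the prompts you test this way carry over directly to application code. Here is a minimal sketch using the openai npm package, assuming Foundry Local is running on port 5272 with the Phi-4-mini-instruct-generic-gpu model from the configuration above:

```typescript
import OpenAI from "openai";

// Point the OpenAI client at the local Foundry Local endpoint.
// The API key is required by the SDK but not used by Foundry Local.
const client = new OpenAI({
  baseURL: "http://localhost:5272/v1",
  apiKey: "local",
});

async function main() {
  const response = await client.chat.completions.create({
    // Foundry Local model ID, matching the Prompty configuration
    model: "Phi-4-mini-instruct-generic-gpu",
    messages: [
      { role: "system", content: "You are a helpful assistant. Answer briefly and accurately." },
      { role: "user", content: "What can I do in Düsseldorf in one day?" },
    ],
  });
  console.log(response.choices[0].message.content);
}

main().catch(console.error);
```

As with the Prompty configuration, adjust the base URL if Foundry Local is listening on a different port.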