mcp
35 TopicsUnlocking AI-Driven Data Access: Azure Database for MySQL Support via the Azure MCP Server
Step into a new era of data-driven intelligence with the fusion of Azure MCP Server and Azure Database for MySQL, where your MySQL data is no longer just stored, but instantly conversational, intelligent and action-ready. By harnessing the open-standard Model Context Protocol (MCP), your AI agents can now query, analyze and automate in natural language, accessing tables, surfacing insights and acting on your MySQL-driven business logic as easily as chatting with a colleague. It’s like giving your data a voice and your applications a brain, all within Azure’s trusted cloud platform. We are excited to announce that we have added support for Azure Database for MySQL in Azure MCP Server. The Azure MCP Server leverages the Model Context Protocol (MCP) to allow AI agents to seamlessly interact with various Azure services to perform context-aware operations such as querying databases and managing cloud resources. Building on this foundation, the Azure MCP Server now offers a set of tools that AI agents and apps can invoke to interact with Azure Database for MySQL - enabling them to list and query databases, retrieve schema details of tables, and access server configurations and parameters. These capabilities are delivered through the same standardized interface used for other Azure services, making it easier to the adopt the MCP standard for leveraging AI to work with your business data and operations across the Azure ecosystem. Before we delve into these new tools and explore how to get started with them, let’s take a moment to refresh our understanding of MCP and the Azure MCP Server - what they are, how they work, and why they matter. MCP architecture and key components The Model Context Protocol (MCP) is an emerging open protocol designed to integrate AI models with external data sources and services in a scalable, standardized, and secure manner. MCP dictates a client-server architecture with four key components: MCP Host, MCP Client, MCP Server and external data sources, services and APIs that provide the data context required to enhance AI models. To explain briefly, an MCP Host (AI apps and agents) includes an MCP client component that connects to one or more MCP Servers. These servers are lightweight programs that securely interface with external data sources, services and APIs and exposes them to MCP clients in the form of standardized capabilities called tools, resources and prompts. Learn more: MCP Documentation What is Azure MCP Server? Azure offers a multitude of cloud services that help developers build robust applications and AI solutions to address business needs. The Azure MCP Server aims to expose these powerful services for agentic usage, allowing AI systems to perform operations that are context-aware of your Azure resources and your business data within them, while ensuring adherence to the Model Context Protocol. It supports a wide range of Azure services and tools including Azure AI Search, Azure Cosmos DB, Azure Storage, Azure Monitor, Azure CLI and Developer CLI extensions. This means that you can empower AI agents, apps and tools to: Explore your Azure resources, such as listing and retrieving details on your Azure subscriptions, resource groups, services, databases, and tables. Search, query and analyze your data and logs. Execute CLI and Azure Developer CLI commands directly, and more! Learn more: Azure MCP Server GitHub Repository Introducing new Azure MCP Server tools to interact with Azure Database for MySQL The Azure MCP Server now includes the following tools that allow AI agents to interact with Azure Database for MySQL and your valuable business data residing in these servers, in accordance with the MCP standard: Tool Description Example Prompts azmcp_mysql_server_list List all MySQL servers in a subscription & resource group "List MySQL servers in resource group 'prod-rg'." "Show MySQL servers in region 'eastus'." azmcp_mysql_server_config_get Retrieve the configuration of a MySQL server "What is the backup retention period for server 'my-mysql-server'?" "Show storage allocation for server 'my-mysql-server'." azmcp_mysql_server_param_get Retrieve a specific parameter of a MySQL server "Is slow_query_log enabled on server my-mysql-server?" "Get innodb_buffer_pool_size for server my-mysql-server." azmcp_mysql_server_param_set Set a specific parameter of a MySQL server to a specific value "Set max_connections to 500 on server my-mysql-server." "Set wait_timeout to 300 on server my-mysql-server." azmcp_mysql_table_list List all tables in a MySQL database "List tables starting with 'tmp_' in database 'appdb'." "How many tables are in database 'analytics'?" azmcp_mysql_table_schema_get Get the schema of a specific table in a MySQL database "Show indexes for table 'transactions' in database 'billing'." "What is the primary key for table 'users' in database 'auth'?" azmcp_mysql_database_query Executes a SELECT query on a MySQL Database. The query must start with SELECT and cannot contain any destructive SQL operations for security reasons. “How many orders were placed in the last 30 days in the salesdb.orders table?” “Show the number of new users signed up in the last week in appdb.users grouped by day.” These interactions are secured using Microsoft Entra authentication, which enables seamless, identity-based access to Azure Database for MySQL - eliminating the need for password storage and enhancing overall security. How are these new tools in the Azure MCP Server different from the standalone MCP Server for Azure Database for MySQL? We have integrated the key capabilities of the Azure Database for MySQL MCP server into the Azure MCP Server, making it easier to connect your agentic apps not only to Azure Database for MySQL but also to other Azure services through one unified and secure interface! How to get started Installing and running the Azure MCP Server is quick and easy! Use GitHub Copilot in Visual Studio Code to gain meaningful insights from your business data in Azure Database for MySQL. Pre-requisites Install Visual Studio Code. Install GitHub Copilot and GitHub Copilot Chat extensions. An Azure Database for MySQL with Microsoft Entra authentication enabled. Ensure that the MCP Server is installed on a system with network connectivity and credentials to connect to Azure Database for MySQL. Installation and Testing Please use this guide for installation: Azure MCP Server Installation Guide Try the following prompts with your Azure Database for MySQL: Azure Database for MySQL tools for Azure MCP Server Try it out and share your feedback! Start using Azure MCP Server with the MySQL tools today and let our cloud services become your AI agent’s most powerful ally. We’re counting on your feedback - every comment, suggestion, or bug-report helps us build better tools together. Stay tuned: more features and capabilities are on the horizon! Feel free to comment below or write to us with your feedback and queries at AskAzureDBforMySQL@service.microsoft.com.119Views0likes0CommentsLearn MCP from our free livestream series in December
Our Python + MCP series is a three-part, hands-on journey into one of the most important emerging technologies of 2025: MCP (Model Context Protocol) — an open standard for extending AI agents and chat interfaces with real-world tools, data, and execution environments. Whether you're building custom GitHub Copilot tools, powering internal developer agents, or creating AI-augmented applications, MCP provides the missing interoperability layer between LLMs and the systems they need to act on. Across the series, we move from local prototyping, to cloud deployment, to enterprise-grade authentication and security, all powered by Python and the FastMCP SDK. Each session builds on the last, showing how MCP servers can evolve from simple localhost services to fully authenticated, production-ready services running in the cloud — and how agents built with frameworks like Langchain and Microsoft’s agent-framework can consume them at every stage. 🔗 Register for the entire series. You can also scroll down to learn about each live stream and register for individual sessions. In addition to the live streams, you can also join office hours after each session in Foundry Discord to ask any follow-up questions. To get started with your MCP learnings before the series, check out the free MCP-for-beginners course on GitHub. Building MCP servers with FastMCP 16 December, 2025 | 6:00 PM - 7:00 PM (UTC) Coordinated Universal Time Register for the stream on Reactor In the intro session of our Python + MCP series, we dive into the hottest technology of 2025: MCP (Model Context Protocol). This open protocol makes it easy to extend AI agents and chatbots with custom functionality, making them more powerful and flexible. We demonstrate how to use the Python FastMCP SDK to build an MCP server running locally and consume that server from chatbots like GitHub Copilot. Then we build our own MCP client to consume the server. Finally, we discover how easy it is to connect AI agent frameworks like Langchain and Microsoft agent-framework to MCP servers. Deploying MCP servers to the cloud 17 December, 2025 | 6:00 PM - 7:00 PM (UTC) Coordinated Universal Time Register for the stream on Reactor In our second session of the Python + MCP series, we're deploying MCP servers to the cloud! We'll walk through the process of containerizing a FastMCP server with Docker and deploying to Azure Container Apps, and also demonstrate a FastMCP server running directly on Azure Functions. Then we'll explore private networking options for MCP servers, using virtual networks that restrict external access to internal MCP tools and agents. Authentication for MCP servers 18 December, 2025 | 6:00 PM - 7:00 PM (UTC) Coordinated Universal Time Register for the stream on Reactor In our third session of the Python + MCP series, we're exploring the best ways to build authentication layers on top of your MCP servers. That could be as simple as an API key to gate access, but for the servers that provide user-specific data, we need to use an OAuth2-based authentication flow. MCP authentication is built on top of OAuth2 but with additional requirements like PRM and DCR/CIMD, which can make it difficult to implement fully. In this session, we'll demonstrate the full MCP auth flow, and provide examples that implement MCP Auth on top of Microsoft Entra.Function Calling with Small Language Models
In our previous article on running Phi-4 locally, we built a web-enhanced assistant that could search the internet and provide informed answers. Here's what that implementation looked like: def web_enhanced_query(question): # 1. ALWAYS search (hardcoded decision) search_results = search_web(question) # 2. Inject results into prompt prompt = f"""Here are recent search results: {search_results} Question: {question} Using only the information above, give a clear answer.""" # 3. Model just summarizes what it reads return ask_phi4(endpoint, model_id, prompt) Today, we're upgrading to true function calling. With this, we have ability to transform small language models from passive text generators into intelligent agents that can: Decide when to use external tools Reason which tool bests fit each task Execute real-world actions thrugh apis Function calling represents a significant evolution in AI capabilities. Let's understand where this positions our small language models: Agent Classification Framework Simple Reflex Agents (Basic) React to immediate input with predefined rules Example: Thermostat, basic chatbot Without function calling, models operate here Model-Based Agents (Intermediate) Maintain internal state and context Example: Robot vacuum with room mapping Function calling enables this level Goal-Based Agents (Advanced) Plan multi-step sequences to achieve objectives Example: Route planner, task scheduler Function calling + reasoning enables this Learning Agents (Expert) Adapt and improve over time Example: Recommendation systems Future: Function calling + fine-tuning As usual with these articles, let's get ready to get our hands dirty! Project Setup Let's set up our environment for building function-calling assistants. Prerequisites First, ensure you have Foundry Local installed and a model running. We'll use Qwen 2.5-7B for this tutorial as it has excellent function calling support. Important: Not all small language models support function calling equally. Qwen 2.5 was specifically trained for this capability and provides a reliable experience through Foundry Local. # 1. Check Foundry Local is installed foundry --version # 2. Start the Foundry Local service foundry service start # 3. Download and run Qwen 2.5-7B foundry model run qwen2.5-7b Python Environment Setup # 1. Create Python virtual environment python -m venv venv source venv/bin/activate # Windows: venv\Scripts\activate # 2. Install dependencies pip install openai requests python-dotenv # 3. Get a free OpenWeatherMap API key # Sign up at: https://openweathermap.org/api ``` Create `.env` file: ``` OPENWEATHER_API_KEY=your_api_key_here ``` Building a Weather-Aware Assistant So in this scenario, a user wants to plan outdoor activities but needs weather context. Without function calling, You will get something like this: User: "Should I schedule my team lunch outside at 2pm in Birmingham?" Model: "That depends on weather conditions. Please check the forecast for rain and temperature." However, with fucntion-calling you get an answer that is able to look up the weather and reply with the needed context. We will do that now. Understanding Foundry Local's Function Calling Implementation Before we start coding, there's an important implementation detail to understand. Foundry Local uses a non-standard function calling format. Instead of returning function calls in the standard OpenAI tool_calls field, Qwen models return the function call as JSON text in the response content. For example, when you ask about weather, instead of: # Standard OpenAI format message.tool_calls = [ {"name": "get_weather", "arguments": {"location": "Birmingham"}} ] You get: # Foundry Local format message.content = '{"name": "get_weather", "arguments": {"location": "Birmingham"}}' This means we need to parse the JSON from the content ourselves. Don't worry—this is straightforward, and I'll show you exactly how to handle it! Step 1: Define the Weather Tool Create weather_assistant.py: import os from openai import OpenAI import requests import json import re from dotenv import load_dotenv load_dotenv() # Initialize Foundry Local client client = OpenAI( base_url="http://127.0.0.1:59752/v1/", api_key="not-needed" ) # Define weather tool tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get current weather information for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city or location name" }, "units": { "type": "string", "description": "Temperature units", "enum": ["celsius", "fahrenheit"], "default": "celsius" } }, "required": ["location"] } } } ] A tool is necessary because it provides the model with a structured specification of what external functions are available and how to use them. The tool definition contains the function name, description, parameters schema, and information returned. Step 2: Implement the Weather Function def get_weather(location: str, units: str = "celsius") -> dict: """Fetch weather data from OpenWeatherMap API""" api_key = os.getenv("OPENWEATHER_API_KEY") url = "http://api.openweathermap.org/data/2.5/weather" params = { "q": location, "appid": api_key, "units": "metric" if units == "celsius" else "imperial" } response = requests.get(url, params=params, timeout=5) response.raise_for_status() data = response.json() temp_unit = "°C" if units == "celsius" else "°F" return { "location": data["name"], "temperature": f"{round(data['main']['temp'])}{temp_unit}", "feels_like": f"{round(data['main']['feels_like'])}{temp_unit}", "conditions": data["weather"][0]["description"], "humidity": f"{data['main']['humidity']}%", "wind_speed": f"{round(data['wind']['speed'] * 3.6)} km/h" } The model calls this function to get the weather data. it contacts OpenWeatherMap API, gets real weather data and returns it as a python dictionary Step 3: Parse Function Calls from Content This is the crucial step where we handle Foundry Local's non-standard format: def parse_function_call(content: str): """Extract function call JSON from model response""" if not content: return None json_pattern = r'\{"name":\s*"get_weather",\s*"arguments":\s*\{[^}]+\}\}' match = re.search(json_pattern, content) if match: try: return json.loads(match.group()) except json.JSONDecodeError: pass try: parsed = json.loads(content.strip()) if isinstance(parsed, dict) and "name" in parsed: return parsed except json.JSONDecodeError: pass return None Step 4: Main Chat Function with Function Calling and lastly, calling the model. Notice the tools and tool_choice parameter. Tools tells the model it is allowed to output a tool_call requesting that the function be executed. While tool_choice instructs the model how to decide whether to call a tool. def chat(user_message: str) -> str: """Process user message with function calling support""" messages = [ {"role": "user", "content": user_message} ] response = client.chat.completions.create( model="qwen2.5-7b-instruct-generic-cpu:4", messages=messages, tools=tools, tool_choice="auto", temperature=0.3, max_tokens=500 ) message = response.choices[0].message if message.content: function_call = parse_function_call(message.content) if function_call and function_call.get("name") == "get_weather": print(f"\n[Function Call] {function_call.get('name')}({function_call.get('arguments')})") args = function_call.get("arguments", {}) weather_data = get_weather(**args) print(f"[Result] {weather_data}\n") final_prompt = f"""User asked: "{user_message}" Weather data: {json.dumps(weather_data, indent=2)} Provide a natural response based on this weather information.""" final_response = client.chat.completions.create( model="qwen2.5-7b-instruct-generic-cpu:4", messages=[{"role": "user", "content": final_prompt}], max_tokens=200, temperature=0.7 ) return final_response.choices[0].message.content return message.content Step 5: Run the script Now put all the above together and run the script def main(): """Interactive weather assistant""" print("\nWeather Assistant") print("=" * 50) print("Ask about weather or general questions.") print("Type 'exit' to quit\n") while True: user_input = input("You: ").strip() if user_input.lower() in ['exit', 'quit']: print("\nGoodbye!") break if user_input: response = chat(user_input) print(f"Assistant: {response}\n") if __name__ == "__main__": if not os.getenv("OPENWEATHER_API_KEY"): print("Error: OPENWEATHER_API_KEY not set") print("Set it with: export OPENWEATHER_API_KEY='your_key_here'") exit(1) main() Note: Make sure Qwen 2.5 is running in Foundry Local in a new terminal Now let's talk about Model Context Protocol! Our weather assistant works beautifully with a single function, but what happens when you need dozens of tools? Database queries, file operations, calendar integration, email—each would require similar setup code. This is where Model Context Protocol (MCP) comes in. MCP is an open standard that provides pre-built, standardized servers for common tools. Instead of writing custom integration code for every capability, you can connect to MCP servers that handle the complexity for you. With MCP, You only need one command to enable weather, database, and file access npx @modelcontextprotocol/server-weather npx @modelcontextprotocol/server-sqlite npx @modelcontextprotocol/server-filesystem Your model automatically discovers and uses these tools without custom integration code. Learn more: Model Context Protocol Documentation EdgeAI Course - Module 03: MCP Integration Key Takeaways Function calling transforms models into agents - From passive text generators to active problem-solvers Qwen 2.5 has excellent function calling support - Specifically trained for reliable tool use Foundry Local uses non-standard format - Parse JSON from content instead of tool_calls field Start simple, then scale with MCP - Build one tool to understand the pattern, then leverage standards Documentation Running Phi-4 Locally with Foundry Local Phi-4: Small Language Models That Pack a Punch Microsoft Foundry Local GitHub EdgeAI for Beginners Course OpenWeatherMap API Documentation Model Context Protocol Qwen 2.5 Documentation Thank you for reading! I hope this article helps you build more capable AI agents with small language models. Function calling opens up incredible possibilities—from simple weather assistants to complex multi-tool workflows. Start with one tool, understand the pattern, and scale from there.252Views0likes0CommentsAzure Skilling at Microsoft Ignite 2025
The energy at Microsoft Ignite was unmistakable. Developers, architects, and technical decision-makers converged in San Francisco to explore the latest innovations in cloud technology, AI applications, and data platforms. Beyond the keynotes and product announcements was something even more valuable: an integrated skilling ecosystem designed to transform how you build with Azure. This year Azure Skilling at Microsoft Ignite 2025 brought together distinct learning experiences, over 150+ hands-on labs, and multiple pathways to industry-recognized credentials—all designed to help you master skills that matter most in today's AI-driven cloud landscape. Just Launched at Ignite Microsoft Ignite 2025 offered an exceptional array of learning opportunities, each designed to meet developers anywhere on the skilling journey. Whether you joined us in-person or on-demand in the virtual experience, multiple touchpoints are available to deepen your Azure expertise. Ignite 2025 is in the books, but you can still engage with the latest Microsoft skilling opportunities, including: The Azure Skills Challenge provides a gamified learning experience that lets you compete while completing task-based achievements across Azure's most critical technologies. These challenges aren't just about badges and bragging rights—they're carefully designed to help you advance technical skills and prepare for Microsoft role-based certifications. The competitive element adds urgency and motivation, turning learning into an engaging race against the clock and your peers. For those seeking structured guidance, Plans on Learn offer curated sets of content designed to help you achieve specific learning outcomes. These carefully assembled learning journeys include built-in milestones, progress tracking, and optional email reminders to keep you on track. Each plan represents 12-15 hours of focused learning, taking you from concept to capability in areas like AI application development, data platform modernization, or infrastructure optimization. The Microsoft Reactor Azure Skilling Series, running December 3-11, brings skilling to life through engaging video content, mixing regular programming with special Ignite-specific episodes. This series will deliver technical readiness and programming guidance in a livestream presentation that's more digestible than traditional documentation. Whether you're catching episodes live with interactive Q&A or watching on-demand later, you’ll get world-class instruction that makes complex topics approachable. Beyond Ignite: Your Continuous Learning Journey Here's the critical insight that separates Ignite attendees who transform their careers from those who simply collect swag: the real learning begins after the event ends. Microsoft Ignite is your launchpad, not your destination. Every module you start, every lab you complete, and every challenge you tackle connects to a comprehensive learning ecosystem on Microsoft Learn that's available 24/7, 365 days a year. Think of Ignite as your intensive immersion experience—the moment when you gain context, build momentum, and identify the skills that will have the biggest impact on your work. What you do in the weeks and months following determines whether that momentum compounds into career-defining expertise or dissipates into business as usual. For those targeting career advancement through formal credentials, Microsoft Certifications, Applied Skills and AI Skills Navigator, provide globally recognized validation of your expertise. Applied Skills focus on scenario-based competencies, demonstrating that you can build and deploy solutions, not simply answer theoretical questions. Certifications cover role-based scenarios for developers, data engineers, AI engineers, and solution architects. The assessment experiences include performance-based testing in dedicated Azure tenants where you complete real configuration and development tasks. And finally, the NEW AI Skills Navigator is an agentic learning space, bringing together AI-powered skilling experiences and credentials in a single, unified experience with Microsoft, LinkedIn Learning and GitHub – all in one spot Why This Matters: The Competitive Context The cloud skills race is intensifying. While our competitors offer robust training and content, Microsoft's differentiation comes not from having more content—though our 1.4 million module completions last fiscal year and 35,000+ certifications awarded speak to scale—but from integration of services to orchestrate workflows. Only Microsoft offers a truly unified ecosystem where GitHub Copilot accelerates your development, Azure AI services power your applications, and Azure platform services deploy and scale your solutions—all backed by integrated skilling content that teaches you to maximize this connected experience. When you continue your learning journey after Ignite, you're not just accumulating technical knowledge. You're developing fluency in an integrated development environment that no competitor can replicate. You're learning to leverage AI-powered development tools, cloud-native architectures, and enterprise-grade security in ways that compound each other's value. This unified expertise is what transforms individual developers into force-multipliers for their organizations. Start Now, Build Momentum, Never Stop Microsoft Ignite 2025 offered the chance to compress months of learning into days of intensive, hands-on experience, but you can still take part through the on-demand videos, the Global Ignite Skills Challenge, visiting the GitHub repos for the /Ignite25 labs, the Reactor Azure Skilling Series, and the curated Plans on Learn provide multiple entry points regardless of your current skill level or preferred learning style. But remember: the developers who extract the most value from Ignite are those who treat the event as the beginning, not the culmination, of their learning journey. They join hackathons, contribute to GitHub repositories, and engage with the Azure community on Discord and technical forums. The question isn't whether you'll learn something valuable from Microsoft Ignite 2025-that's guaranteed. The question is whether you'll convert that learning into sustained momentum that compounds over months and years into career-defining expertise. The ecosystem is here. The content is ready. Your skilling journey doesn't end when Ignite does—it accelerates.Unleashing the Power of Model Context Protocol (MCP): A Game-Changer in AI Integration
Artificial Intelligence is evolving rapidly, and one of the most pressing challenges is enabling AI models to interact effectively with external tools, data sources, and APIs. The Model Context Protocol (MCP) solves this problem by acting as a bridge between AI models and external services, creating a standardized communication framework that enhances tool integration, accessibility, and AI reasoning capabilities. What is Model Context Protocol (MCP)? MCP is a protocol designed to enable AI models, such as Azure OpenAI models, to interact seamlessly with external tools and services. Think of MCP as a universal USB-C connector for AI, allowing language models to fetch information, interact with APIs, and execute tasks beyond their built-in knowledge. Key Features of MCP Standardized Communication – MCP provides a structured way for AI models to interact with various tools. Tool Access & Expansion – AI assistants can now utilize external tools for real-time insights. Secure & Scalable – Enables safe and scalable integration with enterprise applications. Multi-Modal Integration – Supports STDIO, SSE (Server-Sent Events), and WebSocket communication methods. MCP Architecture & How It Works MCP follows a client-server architecture that allows AI models to interact with external tools efficiently. Here’s how it works: Components of MCP MCP Host – The AI model (e.g., Azure OpenAI GPT) requesting data or actions. MCP Client – An intermediary service that forwards the AI model's requests to MCP servers. MCP Server – Lightweight applications that expose specific capabilities (APIs, databases, files, etc.). Data Sources – Various backend systems, including local storage, cloud databases, and external APIs. Data Flow in MCP The AI model sends a request (e.g., "fetch user profile data"). The MCP client forwards the request to the appropriate MCP server. The MCP server retrieves the required data from a database or API. The response is sent back to the AI model via the MCP client. Integrating MCP with Azure OpenAI Services Microsoft has integrated MCP with Azure OpenAI Services, allowing GPT models to interact with external services and fetch live data. This means AI models are no longer limited to static knowledge but can access real-time information. Benefits of Azure OpenAI Services + MCP Integration ✔ Real-time Data Fetching – AI assistants can retrieve fresh information from APIs, databases, and internal systems. ✔ Contextual AI Responses – Enhances AI responses by providing accurate, up-to-date information. ✔ Enterprise-Ready – Secure and scalable for business applications, including finance, healthcare, and retail. Hands-On Tools for MCP Implementation To implement MCP effectively, Microsoft provides two powerful tools: Semantic Workbench and AI Gateway. Microsoft Semantic Workbench A development environment for prototyping AI-powered assistants and integrating MCP-based functionalities. Features: Build and test multi-agent AI assistants. Configure settings and interactions between AI models and external tools. Supports GitHub Codespaces for cloud-based development. Explore Semantic Workbench Workbench interface examples Microsoft AI Gateway A plug-and-play interface that allows developers to experiment with MCP using Azure API Management. Features: Credential Manager – Securely handle API credentials. Live Experimentation – Test AI model interactions with external tools. Pre-built Labs – Hands-on learning for developers. Explore AI Gateway Setting Up MCP with Azure OpenAI Services Step 1: Create a Virtual Environment First, create a virtual environment using Python: python -m venv .venv Activate the environment: # Windows venv\Scripts\activate # MacOS/Linux source .venv/bin/activate Step 2: Install Required Libraries Create a requirements.txt file and add the following dependencies: langchain-mcp-adapters langgraph langchain-openai Then, install the required libraries: pip install -r requirements.txt Step 3: Set Up OpenAI API Key Ensure you have your OpenAI API key set up: # Windows setx OPENAI_API_KEY "<your_api_key> # MacOS/Linux export OPENAI_API_KEY=<your_api_key> Building an MCP Server This server performs basic mathematical operations like addition and multiplication. Create the Server File First, create a new Python file: touch math_server.py Then, implement the server: from mcp.server.fastmcp import FastMCP # Initialize the server mcp = FastMCP("Math") MCP.tool() def add(a: int, b: int) -> int: return a + b MCP.tool() def multiply(a: int, b: int) -> int: return a * b if __name__ == "__main__": mcp.run(transport="stdio") Your MCP server is now ready to run. Building an MCP Client This client connects to the MCP server and interacts with it. Create the Client File First, create a new file: touch client.py Then, implement the client: import asyncio from mcp import ClientSession, StdioServerParameters from langchain_openai import ChatOpenAI from mcp.client.stdio import stdio_client # Define server parameters server_params = StdioServerParameters( command="python", args=["math_server.py"], ) # Define the model model = ChatOpenAI(model="gpt-4o") async def run_agent(): async with stdio_client(server_params) as (read, write): async with ClientSession(read, write) as session: await session.initialize() tools = await load_mcp_tools(session) agent = create_react_agent(model, tools) agent_response = await agent.ainvoke({"messages": "what's (4 + 6) x 14?"}) return agent_response["messages"][3].content if __name__ == "__main__": result = asyncio.run(run_agent()) print(result) Your client is now set up and ready to interact with the MCP server. Running the MCP Server and Client Step 1: Start the MCP Server Open a terminal and run: python math_server.py This starts the MCP server, making it available for client connections. Step 2: Run the MCP Client In another terminal, run: python client.py Expected Output 140 This means the AI agent correctly computed (4 + 6) x 14 using both the MCP server and GPT-4o. Conclusion Integrating MCP with Azure OpenAI Services enables AI applications to securely interact with external tools, enhancing functionality beyond text-based responses. With standardized communication and improved AI capabilities, developers can build smarter and more interactive AI-powered solutions. By following this guide, you can set up an MCP server and client, unlocking the full potential of AI with structured external interactions. Next Steps: Explore more MCP tools and integrations. Extend your MCP setup to work with additional APIs. Deploy your solution in a cloud environment for broader accessibility. For further details, visit the GitHub repository for MCP integration examples and best practices. MCP GitHub Repository MCP Documentation Semantic Workbench AI Gateway MCP Video Walkthrough MCP Blog MCP Github End to End Demo57KViews10likes6CommentsPython + IA: Resumen y Recursos
Acabamos de concluir nuestra serie sobre Python + IA, un recorrido completo de nueve sesiones donde exploramos a fondo cómo usar modelos de inteligencia artificial generativa desde Python. Durante la serie presentamos varios tipos de modelos, incluyendo LLMs, modelos de embeddings y modelos de visión. Profundizamos en técnicas populares como RAG, tool calling y salidas estructuradas. Evaluamos la calidad y seguridad de la IA mediante evaluaciones automatizadas y red-teaming. Finalmente, desarrollamos agentes de IA con frameworks populares de Python y exploramos el nuevo Model Context Protocol (MCP). Para que puedas aplicar lo aprendido, todos nuestros ejemplos funcionan con GitHub Models, un servicio que ofrece modelos gratuitos a todos los usuarios de GitHub para experimentación y aprendizaje. Aunque no hayas asistido a las sesiones en vivo, ¡todavía puedes acceder a todos los materiales usando los enlaces de abajo! Si eres instructor, puedes usar las diapositivas y el código en tus propias clases. Python + IA: Modelos de Lenguaje Grandes (LLMs) 📺 Ver grabación En esta sesión exploramos los LLMs, los modelos que impulsan ChatGPT y GitHub Copilot. Usamos Python con paquetes como OpenAI SDK y LangChain, experimentamos con prompt engineering y ejemplos few-shot, y construimos una aplicación completa basada en LLMs. También explicamos la importancia de la concurrencia y el streaming en apps de IA. Diapositivas: aka.ms/pythonia/diapositivas/llms Código: python-openai-demos Guía de repositorio: video Python + IA: Embeddings Vectoriales 📺 Ver grabación En nuestra segunda sesión, aprendemos sobre los modelos de embeddings vectoriales, que convierten texto o imágenes en arreglos numéricos. Comparamos métricas de distancia, aplicamos cuantización y experimentamos con modelos multimodales. Diapositivas: aka.ms/pythonia/diapositivas/embeddings Código: vector-embedding-demos Guía de repositorio: video Python + IA: Retrieval Augmented Generation (RAG) 📺 Ver grabación Descubrimos cómo usar RAG para mejorar las respuestas de los LLMs añadiendo contexto relevante. Construimos flujos RAG en Python con distintas fuentes (CSVs, sitios web, documentos y bases de datos) y terminamos con una aplicación completa basada en Azure AI Search. Diapositivas: aka.ms/pythonia/diapositivas/rag Código: python-openai-demos Guía de repositorio: video Python + IA: Modelos de Visión 📺 Ver grabación Los modelos de visión aceptan texto e imágenes, como GPT-4o y GPT-4o mini. Creamos una app de chat con imágenes, realizamos extracción de datos y construimos un motor de búsqueda multimodal. Diapositivas: aka.ms/pythonia/diapositivas/vision Código: vector-embeddings Guía de repositorio: video Python + IA: Salidas Estructuradas 📺 Ver grabación Aprendemos a generar respuestas estructuradas con LLMs usando Pydantic BaseModel. Este enfoque permite validación automática de los resultados, útil para extracción de entidades, clasificación y flujos de agentes. Diapositivas: aka.ms/pythonia/diapositivas/salidas Código: python-openai-demos y entity-extraction-demos Guía de repositorio: video Python + IA: Calidad y Seguridad 📺 Ver grabación Analizamos cómo usar la IA de forma segura y cómo evaluar la calidad de las respuestas. Mostramos cómo configurar Azure AI Content Safety y usar el Azure AI Evaluation SDK para medir resultados de los modelos. Diapositivas: aka.ms/pythonia/diapositivas/calidad Código: ai-quality-safety-demos Guía de repositorio: video Python + IA: Tool Calling 📺 Ver grabación Exploramos el tool calling, base para crear agentes de IA. Definimos herramientas con esquemas JSON y funciones Python, manejamos llamadas paralelas y flujos iterativos. Diapositivas: aka.ms/pythonia/diapositivas/herramientas Código: python-openai-demos Guía de repositorio: video Python + IA: Agentes de IA 📺 Ver grabación Creamos agentes de IA con frameworks como el agent-framework de Microsoft y LangGraph, mostrando arquitecturas con múltiples herramientas, supervisores y flujos con intervención humana. Diapositivas: aka.ms/pythonia/diapositivas/agentes Código: python-ai-agents-demos Guía de repositorio: video Python + IA: Model Context Protocol (MCP) 📺 Ver grabación Cerramos la serie con MCP (Model Context Protocol), la tecnología más innovadora de 2025. Mostramos cómo usar el SDK de FastMCP en Python para crear un servidor MCP local, conectarlo a GitHub Copilot, construir un cliente MCP y conectar frameworks como LangGraph y agent-framework. También discutimos los riesgos de seguridad asociados. Diapositivas: aka.ms/pythonia/diapositivas/mcp Código: python-ai-mcp-demos Guía de repositorio: video Además Si tienen preguntas, por favor, en el canal #Espanol en nuestro Discord: https://aka.ms/pythonia/discord Todos los jueves tengo office hours: https://aka.ms/pythonia/horas Encuentra más tutoriales 100% en español sobre Python + AI en https://youtube.com/@lagpsLevel up your Python + AI skills with our complete series
We've just wrapped up our live series on Python + AI, a comprehensive nine-part journey diving deep into how to use generative AI models from Python. The series introduced multiple types of models, including LLMs, embedding models, and vision models. We dug into popular techniques like RAG, tool calling, and structured outputs. We assessed AI quality and safety using automated evaluations and red-teaming. Finally, we developed AI agents using popular Python agents frameworks and explored the new Model Context Protocol (MCP). To help you apply what you've learned, all of our code examples work with GitHub Models, a service that provides free models to every GitHub account holder for experimentation and education. Even if you missed the live series, you can still access all the material using the links below! If you're an instructor, feel free to use the slides and code examples in your own classes. If you're a Spanish speaker, check out the Spanish version of the series. Python + AI: Large Language Models 📺 Watch recording In this session, we explore Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We use Python to interact with LLMs using popular packages like the OpenAI SDK and LangChain. We experiment with prompt engineering and few-shot examples to improve outputs. We also demonstrate how to build a full-stack app powered by LLMs and explain the importance of concurrency and streaming for user-facing AI apps. Slides for this session Code repository with examples: python-openai-demos Python + AI: Vector embeddings 📺 Watch recording In our second session, we dive into a different type of model: the vector embedding model. A vector embedding is a way to encode text or images as an array of floating-point numbers. Vector embeddings enable similarity search across many types of content. In this session, we explore different vector embedding models, such as the OpenAI text-embedding-3 series, through both visualizations and Python code. We compare distance metrics, use quantization to reduce vector size, and experiment with multimodal embedding models. Slides for this session Code repository with examples: vector-embedding-demos Python + AI: Retrieval Augmented Generation 📺 Watch recording In our third session, we explore one of the most popular techniques used with LLMs: Retrieval Augmented Generation. RAG is an approach that provides context to the LLM, enabling it to deliver well-grounded answers for a particular domain. The RAG approach works with many types of data sources, including CSVs, webpages, documents, and databases. In this session, we walk through RAG flows in Python, starting with a simple flow and culminating in a full-stack RAG application based on Azure AI Search. Slides for this session Code repository with examples: python-openai-demos Python + AI: Vision models 📺 Watch recording Our fourth session is all about vision models! Vision models are LLMs that can accept both text and images, such as GPT-4o and GPT-4o mini. You can use these models for image captioning, data extraction, question answering, classification, and more! We use Python to send images to vision models, build a basic chat-with-images app, and create a multimodal search engine. Slides for this session Code repository with examples: openai-chat-vision-quickstart Python + AI: Structured outputs 📺 Watch recording In our fifth session, we discover how to get LLMs to output structured responses that adhere to a schema. In Python, all you need to do is define a Pydantic BaseModel to get validated output that perfectly meets your needs. We focus on the structured outputs mode available in OpenAI models, but you can use similar techniques with other model providers. Our examples demonstrate the many ways you can use structured responses, such as entity extraction, classification, and agentic workflows. Slides for this session Code repository with examples: python-openai-demos Python + AI: Quality and safety 📺 Watch recording This session covers a crucial topic: how to use AI safely and how to evaluate the quality of AI outputs. There are multiple mitigation layers when working with LLMs: the model itself, a safety system on top, the prompting and context, and the application user experience. We focus on Azure tools that make it easier to deploy safe AI systems into production. We demonstrate how to configure the Azure AI Content Safety system when working with Azure AI models and how to handle errors in Python code. Then we use the Azure AI Evaluation SDK to evaluate the safety and quality of output from your LLM. Slides for this session Code repository with examples: ai-quality-safety-demos Python + AI: Tool calling 📺 Watch recording In the final part of the series, we focus on the technologies needed to build AI agents, starting with the foundation: tool calling (also known as function calling). We define tool call specifications using both JSON schema and Python function definitions, then send these definitions to the LLM. We demonstrate how to properly handle tool call responses from LLMs, enable parallel tool calling, and iterate over multiple tool calls. Understanding tool calling is absolutely essential before diving into agents, so don't skip over this foundational session. Slides for this session Code repository with examples: python-openai-demos Python + AI: Agents 📺 Watch recording In the penultimate session, we build AI agents! We use Python AI agent frameworks such as the new agent-framework from Microsoft and the popular LangGraph framework. Our agents start simple and then increase in complexity, demonstrating different architectures such as multiple tools, supervisor patterns, graphs, and human-in-the-loop workflows. Slides for this session Code repository with examples: python-ai-agent-frameworks-demos Python + AI: Model Context Protocol 📺 Watch recording In the final session, we dive into the hottest technology of 2025: MCP (Model Context Protocol). This open protocol makes it easy to extend AI agents and chatbots with custom functionality, making them more powerful and flexible. We demonstrate how to use the Python FastMCP SDK to build an MCP server running locally and consume that server from chatbots like GitHub Copilot. Then we build our own MCP client to consume the server. Finally, we discover how easy it is to connect AI agent frameworks like LangGraph and Microsoft agent-framework to MCP servers. With great power comes great responsibility, so we briefly discuss the security risks that come with MCP, both as a user and as a developer. Slides for this session Code repository with examples: python-mcp-demo1.3KViews0likes0CommentsServerless MCP Agent with LangChain.js v1 — Burgers, Tools, and Traces 🍔
AI agents that can actually do stuff (not just chat) are the fun part nowadays, but wiring them cleanly into real APIs, keeping things observable, and shipping them to the cloud can get... messy. So we built a fresh end‑to‑end sample to show how to do it right with the brand new LangChain.js v1 and Model Context Protocol (MCP). In case you missed it, MCP is a recent open standard that makes it easy for LLM agents to consume tools and APIs, and LangChain.js, a great framework for building GenAI apps and agents, has first-class support for it. You can quickly get up speed with the MCP for Beginners course and AI Agents for Beginners course. This new sample gives you: A LangChain.js v1 agent that streams its result, along reasoning + tool steps An MCP server exposing real tools (burger menu + ordering) from a business API A web interface with authentication, sessions history, and a debug panel (for developers) A production-ready multi-service architecture Serverless deployment on Azure in one command ( azd up ) Yes, it’s a burger ordering system. Who doesn't like burgers? Grab your favorite beverage ☕, and let’s dive in for a quick tour! TL;DR key takeaways New sample: full-stack Node.js AI agent using LangChain.js v1 + MCP tools Architecture: web app → agent API → MCP server → burger API Runs locally with a single npm start , deploys with azd up Uses streaming (NDJSON) with intermediate tool + LLM steps surfaced to the UI Ready to fork, extend, and plug into your own domain / tools What will you learn here? What this sample is about and its high-level architecture What LangChain.js v1 brings to the table for agents How to deploy and run the sample How MCP tools can expose real-world APIs Reference links for everything we use GitHub repo LangChain.js docs Model Context Protocol Azure Developer CLI MCP Inspector Use case You want an AI assistant that can take a natural language request like “Order two spicy burgers and show me my pending orders” and: Understand intent (query menu, then place order) Call the right MCP tools in sequence, calling in turn the necessary APIs Stream progress (LLM tokens + tool steps) Return a clean final answer Swap “burgers” for “inventory”, “bookings”, “support tickets”, or “IoT devices” and you’ve got a reusable pattern! Sample overview Before we play a bit with the sample, let's have a look at the main services implemented here: Service Role Tech Agent Web App ( agent-webapp ) Chat UI + streaming + session history Azure Static Web Apps, Lit web components Agent API ( agent-api ) LangChain.js v1 agent orchestration + auth + history Azure Functions, Node.js Burger MCP Server ( burger-mcp ) Exposes burger API as tools over MCP (Streamable HTTP + SSE) Azure Functions, Express, MCP SDK Burger API ( burger-api ) Business logic: burgers, toppings, orders lifecycle Azure Functions, Cosmos DB Here's a simplified view of how they interact: There are also other supporting components like databases and storage not shown here for clarity. For this quickstart we'll only interact with the Agent Web App and the Burger MCP Server, as they are the main stars of the show here. LangChain.js v1 agent features The recent release of LangChain.js v1 is a huge milestone for the JavaScript AI community! It marks a significant shift from experimental tools to a production-ready framework. The new version doubles down on what’s needed to build robust AI applications, with a strong focus on agents. This includes first-class support for streaming not just the final output, but also intermediate steps like tool calls and agent reasoning. This makes building transparent and interactive agent experiences (like the one in this sample) much more straightforward. Quickstart Requirements GitHub account Azure account (free signup, or if you're a student, get free credits here) Azure Developer CLI Deploy and run the sample We'll use GitHub Codespaces for a quick zero-install setup here, but if you prefer to run it locally, check the README. Click on the following link or open it in a new tab to launch a Codespace: Create Codespace This will open a VS Code environment in your browser with the repo already cloned and all the tools installed and ready to go. Provision and deploy to Azure Open a terminal and run these commands: # Install dependencies npm install # Login to Azure azd auth login # Provision and deploy all resources azd up Follow the prompts to select your Azure subscription and region. If you're unsure of which one to pick, choose East US 2 . The deployment will take about 15 minutes the first time, to create all the necessary resources (Functions, Static Web Apps, Cosmos DB, AI Models). If you're curious about what happens under the hood, you can take a look at the main.bicep file in the infra folder, which defines the infrastructure as code for this sample. Test the MCP server While the deployment is running, you can run the MCP server and API locally (even in Codespaces) to see how it works. Open another terminal and run: npm start This will start all services locally, including the Burger API and the MCP server, which will be available at http://localhost:3000/mcp . This may take a few seconds, wait until you see this message in the terminal: 🚀 All services ready 🚀 When these services are running without Azure resources provisioned, they will use in-memory data instead of Cosmos DB so you can experiment freely with the API and MCP server, though the agent won't be functional as it requires a LLM resource. MCP tools The MCP server exposes the following tools, which the agent can use to interact with the burger ordering system: Tool Name Description get_burgers Get a list of all burgers in the menu get_burger_by_id Get a specific burger by its ID get_toppings Get a list of all toppings in the menu get_topping_by_id Get a specific topping by its ID get_topping_categories Get a list of all topping categories get_orders Get a list of all orders in the system get_order_by_id Get a specific order by its ID place_order Place a new order with burgers (requires userId , optional nickname ) delete_order_by_id Cancel an order if it has not yet been started (status must be pending , requires userId ) You can test these tools using the MCP Inspector. Open another terminal and run: npx -y @modelcontextprotocol/inspector Then open the URL printed in the terminal in your browser and connect using these settings: Transport: Streamable HTTP URL: http://localhost:3000/mcp Connection Type: Via Proxy (should be default) Click on Connect, then try listing the tools first, and run get_burgers tool to get the menu info. Test the Agent Web App After the deployment is completed, you can run the command npm run env to print the URLs of the deployed services. Open the Agent Web App URL in your browser (it should look like https://<your-web-app>.azurestaticapps.net ). You'll first be greeted by an authentication page, you can sign in either with your GitHub or Microsoft account and then you should be able to access the chat interface. From there, you can start asking any question or use one of the suggested prompts, for example try asking: Recommend me an extra spicy burger . As the agent processes your request, you'll see the response streaming in real-time, along with the intermediate steps and tool calls. Once the response is complete, you can also unfold the debug panel to see the full reasoning chain and the tools that were invoked: Tip: Our agent service also sends detailed tracing data using OpenTelemetry. You can explore these either in Azure Monitor for the deployed service, or locally using an OpenTelemetry collector. We'll cover this in more detail in a future post. Wrap it up Congratulations, you just finished spinning up a full-stack serverless AI agent using LangChain.js v1, MCP tools, and Azure’s serverless platform. Now it's your turn to dive in the code and extend it for your use cases! 😎 And don't forget to azd down once you're done to avoid any unwanted costs. Going further This was just a quick introduction to this sample, and you can expect more in-depth posts and tutorials soon. Since we're in the era of AI agents, we've also made sure that this sample can be explored and extended easily with code agents like GitHub Copilot. We even built a custom chat mode to help you discover and understand the codebase faster! Check out the Copilot setup guide in the repo to get started. You can quickly get up speed with the MCP for Beginners course and AI Agents for Beginners course. If you like this sample, don't forget to star the repo ⭐️! You can also join us in the Azure AI community Discord to chat and ask any questions. Happy coding and burger ordering! 🍔Orchestrating Multi-Agent Intelligence: MCP-Driven Patterns in Agent Framework
Building reliable AI systems requires modular, stateful coordination and deterministic workflows that enable agents to collaborate seamlessly. The Microsoft Agent Framework provides these foundations, with memory, tracing, and orchestration built in. This implementation demonstrates four multi-agentic patterns — Single Agent, Handoff, Reflection, and Magentic Orchestration — showcasing different interaction models and collaboration strategies. From lightweight domain routing to collaborative planning and self-reflection, these patterns highlight the framework’s flexibility. At the core is Model Context Protocol (MCP), connecting agents, tools, and memory through a shared context interface. Persistent session state, conversation thread history, and checkpoint support are handled via Cosmos DB when configured, with an in-memory dictionary as a default fallback. This setup enables dynamic pattern swapping, performance comparison, and traceable multi-agent interactions — all within a unified, modular runtime. Business Scenario: Contoso Customer Support Chatbot Contoso’s chatbot handles multi-domain customer inquiries like billing anomalies, promotion eligibility, account locks, and data usage questions. These require combining structured data (billing, CRM, security logs, promotions) with unstructured policy documents processed via vector embeddings. Using MCP, the system orchestrates tool calls to fetch real-time structured data and relevant policy content, ensuring policy-aligned, auditable responses without exposing raw databases. This enables the assistant to explain anomalies, recommend actions, confirm eligibility, guide account recovery, and surface risk indicators—reducing handle time and improving first-contact resolution while supporting richer multi-agent reasoning. Architecture & Core Concepts The Contoso chatbot leverages the Microsoft Agent Framework to deliver a modular, stateful, and workflow-driven architecture. At its core, the system consists of: Base Agent: All agent patterns—single agent, reflection, handoff and magentic orchestration—inherit from a common base class, ensuring consistent interfaces for message handling, tool invocation, and state management. Backend: A FastAPI backend manages session routing, agent execution, and workflow orchestration. Frontend: A React-based UI (or Streamlit alternative) streams responses in real-time and visualizes agent reasoning and tool calls. Modular Runtime and Pattern Swapping One of the most powerful aspects of this implementation is its modular runtime design. Each agentic pattern—Single, Reflection, Handoff, and Magnetic—plugs into a shared execution pipeline defined by the base agent and MCP integration. By simply updating the .env configuration (e.g., agent_module=handoff), developers can swap in and out entire coordination strategies without touching the backend, frontend, or memory layers. This makes it easy to compare agent styles side by side, benchmark reasoning behaviors, and experiment with orchestration logic—all while maintaining a consistent, deterministic runtime. The same MCP connectors, FastAPI backend, and Cosmos/in-memory state management work seamlessly across every pattern, enabling rapid iteration and reliable evaluation. # Dynamic agent pattern loading agent_module_path = os.getenv("AGENT_MODULE") agent_module = __import__(agent_module_path, fromlist=["Agent"]) Agent = getattr(agent_module, "Agent") # Common MCP setup across all patterns async def _create_tools(self, headers: Dict[str, str]) -> List[MCPStreamableHTTPTool] | None: if not self.mcp_server_uri: return None return [MCPStreamableHTTPTool( name="mcp-streamable", url=self.mcp_server_uri, headers=headers, timeout=30, request_timeout=30, )] Memory & State Management State management is critical for multi-turn conversations and cross-agent workflows. The system supports two out-of-the-box options: Persistent Storage (Cosmos DB) Acts as the durable, enterprise-ready backend. Stores serialized conversation threads and workflow checkpoints keyed by tenant and session ID. Ensures data durability and auditability across restarts. In-Memory Session Store Default fallback when Cosmos DB credentials are not configured. Maintains ephemeral state per session for fast prototyping or lightweight use cases. All patterns leverage the same thread-based state abstraction, enabling: Session isolation: Each user session maintains its own state and history. Checkpointing: Multi-agent workflows can snapshot shared and executor-local state at any point, supporting pause/resume and fault recovery. Model Context Protocol (MCP): Acts as the connector between agents and tools, standardizing how data is fetched and results are returned to agents, whether querying structured databases or unstructured knowledge sources. Core Principles Across all patterns, the framework emphasizes: Modularity: Components are interchangeable—agents, tools, and state stores can be swapped without disrupting the system. Stateful Coordination: Multi-agent workflows coordinate through shared and local state, enabling complex reasoning without losing context. Deterministic Workflows: While agents operate autonomously, the workflow layer ensures predictable, auditable execution of multi-agent tasks. Unified Execution: From single-agent Q&A to complex Magentic orchestrations, every agent follows the same execution lifecycle and integrates seamlessly with MCP and the state store. Multi-Agent Patterns: Workflow and Coordination With the architecture and core concepts established, we can now explore the agentic patterns implemented in the Contoso chatbot. Each pattern builds on the base agent and MCP integration but differs in how agents orchestrate tasks and communicate with one another to handle multi-domain customer queries. In the sections that follow, we take a deeper dive into each pattern’s workflow and examine the under-the-hood communication flows between agents: Single Agent – A simple, single-domain agent handling straightforward queries. Reflection Agent – Allows agents to introspect and refine their outputs. Handoff Pattern – Routes conversations intelligently to specialized agents across domains. Magentic Orchestration – Coordinates multiple specialist agents for complex, parallel tasks. For each pattern, the focus will be on how agents communicate and coordinate, showing the practical orchestration mechanisms in action. Single Intelligent Agent The Single Agent Pattern represents the simplest orchestration style within the framework. Here, a single autonomous agent handles all reasoning, decision-making, and tool interactions directly — without delegation or multi-agent coordination. When a user submits a request, the single agent processes the query using all tools, memory, and data sources available through the Model Context Protocol (MCP). It performs retrieval, reasoning, and response composition in a single, cohesive loop. Communication Flow: User Input → Agent: The user submits a question or command. Agent → MCP Tools: The agent invokes one or more tools (e.g., vector retrieval, structured queries, or API calls) to gather relevant context and data. Agent → User: The agent synthesizes the tool outputs, applies reasoning, and generates the final response to the user. Session Memory: Throughout the exchange, the agent stores conversation history and extracted entities in the configured memory store (in-memory or Cosmos DB). Key Communication Principles: Single Responsibility: One agent performs both reasoning and action, ensuring fast response times and simpler state management. Direct Tool Invocation: The agent has direct access to all registered tools through MCP, enabling flexible retrieval and action chaining. Stateful Execution: The session memory preserves dialogue context, allowing the agent to maintain continuity across user turns. Deterministic Behavior: The workflow is fully predictable — input, reasoning, tool call, and output occur in a linear sequence. Reflection pattern The Reflection Pattern introduces a lightweight, two-agent communication loop designed to improve the quality and reliability of responses through structured self-review. In this setup, a Primary Agent first generates an initial response to the user’s query. This draft is then passed to a Reviewer Agent, whose role is to critique and refine the response—identifying gaps, inaccuracies, or missed context. Finally, the Primary Agent incorporates this feedback and produces a polished final answer for the user. This process introduces one round of reflection and improvement without adding excessive latency, balancing quality with responsiveness. Communication Flow: User Input → Primary Agent: The user submits a query. Primary Agent → Reviewer Agent: The primary generates an initial draft and passes it to the reviewer. Reviewer Agent → Primary Agent: The reviewer provides feedback or suggested improvements. Primary Agent → User: The primary revises its response and sends the refined version back to the user. Key Communication Principles: Two-Stage Dialogue: Structured interaction between Primary and Reviewer ensures each output undergoes quality assurance. Focused Review: The Reviewer doesn’t recreate answers—it critiques and enhances, reducing redundancy. Stateful Context: Both agents operate over the same shared memory, ensuring consistency between draft and revision. Deterministic Flow: A single reflection round guarantees predictable latency while still improving answer quality. Transparent Traceability: Each step—initial draft, feedback, and final output—is logged, allowing developers to audit reasoning or assess quality improvements over time. In practice, this pattern enables the system to reason about its own output before responding, yielding clearer, more accurate, and policy-aligned answers without requiring multiple independent retries. Handoff Pattern When a user request arrives, the system first routes it through an Intent Classifier (or triage agent) to determine which domain specialist should handle the conversation. Once identified, control is handed off directly to that Specialist Agent, which uses its own tools, domain knowledge, and state context to respond. This specialist continues to handle the user interaction as long as the conversation stays within its domain. If the user’s intent shifts — for example, moving from billing to security — the conversation is routed back to the Intent Classifier, which re-assigns it to the correct specialist agent. This pattern reduces latency and maintains continuity by minimizing unnecessary routing. Each handoff is tracked through the shared state store, ensuring seamless context carry-over and full traceability of decisions. Key Communication Principles: Dynamic Routing: The Intent Classifier routes user input to the right specialist domain. Domain Persistence: The specialist remains active while the user stays within its domain. Context Continuity: Conversation history and entities persist across agents through the shared state store. Traceable Handoffs: Every routing decision is logged for observability and auditability. Low Latency: Responses are faster since domain-appropriate agents handle queries directly. In practice, this means a user could begin a conversation about billing, continue seamlessly, and only be re-routed when switching topics — without losing any conversational context or history. Magentic Pattern The Magentic Pattern is designed for open-ended, multi-faceted tasks that require multiple agents to collaborate. It introduces a Manager (Planner) Agent, which interprets the user’s goal, breaks it into subtasks, and orchestrates multiple Specialist Agents to execute those subtasks. The Manager creates and maintains a Task Ledger, which tracks the status, dependencies, and results of each specialist’s work. As specialists perform their tool calls or reasoning, the Manager monitors their progress, gathers intermediate outputs, and can dynamically re-plan, dispatch additional tasks, or adjust the overall workflow. When all subtasks are complete, the Manager synthesizes the combined results into a coherent final response for the user. Key Communication Principles: Centralized Orchestration: The Manager coordinates all agent interactions and workflow logic. Parallel and Sequential Execution: Specialists can work simultaneously or in sequence based on task dependencies. Task Ledger: Acts as a transparent record of all task assignments, updates, and completions. Dynamic Re-planning: The Manager can modify or extend workflows in real time based on intermediate findings. Shared Memory: All agents access the same state store for consistent context and result sharing. Unified Output: The Manager consolidates results into one response, ensuring coherence across multi-agent reasoning. In practice, Magentic orchestration enables complex reasoning where the system might combine insights from multiple agents — e.g., billing, product, and security — and present a unified recommendation or resolution to the user. Choosing the Right Agent for Your Use Case Selecting the appropriate agent pattern hinges on the complexity of the task and the level of coordination required. As use cases evolve from straightforward queries to intricate, multi-step processes, the need for specialized orchestration increases. Below is a decision matrix to guide your choice: Feature / Requirement Single Agent Reflection Agent Handoff Pattern Magentic Orchestration Handles simple, domain-bound tasks ✔ ✔ ✖ ✖ Supports review / quality assurance ✖ ✔ ✖ ✔ Multi-domain routing ✖ ✖ ✔ ✔ Open-ended / complex workflows ✖ ✖ ✖ ✔ Parallel agent collaboration ✖ ✖ ✖ ✔ Direct tool access ✔ ✔ ✔ ✔ Low latency / fast response ✔ ✔ ✔ ✖ Easy to implement / low orchestration ✔ ✔ ✖ ✖ Dive Deeper: Explore, Build, and Innovate We've explored various agent patterns, from Single Agent to Magentic Orchestration, each tailored to different use cases and complexities. To see these patterns in action, we invite you to explore our Github repo. Clone the repo, experiment with the examples, and adapt them to your own scenarios. Additionally, beyond the patterns discussed here, the repository also features a Human-in-the-Loop (HITL) workflow designed for fraud detection. This workflow integrates human oversight into AI decision-making, ensuring higher accuracy and reliability. For an in-depth look at this approach, we recommend reading our detailed blog post: Building Human-in-the-loop AI Workflows with Microsoft Agent Framework | Microsoft Community Hub Engage with these resources, and start building intelligent, reliable, and scalable AI systems today! This repository and content is developed and maintained by James Nguyen, Nicole Serafino, Kranthi Kumar Manchikanti, Heena Ugale, and Tim Sullivan.From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs
As AI engineers, we’ve spent years optimizing models for the cloud, scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint it’s now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance. Why Edge AI, Why Now? Edge AI isn’t just about running models locally, it’s about rethinking the entire lifecycle: - Latency: Decisions in milliseconds, not round-trips to the cloud. - Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance. - Resilience: Offline-first apps that don’t break when the network does. - Cost: Reduced cloud compute and bandwidth overhead. With Windows AI PCs powered by Intel and Qualcomm NPUs and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency. What You’ll Learn in Edge AI for Beginners The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment. Multi-Language Support This content is available in over 48 languages, so you can read and study in your native language. What You'll Master This course takes you from fundamental concepts to production-ready implementations, covering: Small Language Models (SLMs) optimized for edge deployment Hardware-aware optimization across diverse platforms Real-time inference with privacy-preserving capabilities Production deployment strategies for enterprise applications Why EdgeAI Matters Edge AI represents a paradigm shift that addresses critical modern challenges: Privacy & Security: Process sensitive data locally without cloud exposure Real-time Performance: Eliminate network latency for time-critical applications Cost Efficiency: Reduce bandwidth and cloud computing expenses Resilient Operations: Maintain functionality during network outages Regulatory Compliance: Meet data sovereignty requirements Edge AI Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making. Core Principles: On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs) Offline capability: Functions without persistent internet connectivity Low latency: Immediate responses suited for real-time systems Data sovereignty: Keeps sensitive data local, improving security and compliance Small Language Models (SLMs) SLMs like Phi-4, Mistral-7B, Qwen and Gemma are optimized versions of larger LLMs, trained or distilled for: Reduced memory footprint: Efficient use of limited edge device memory Lower compute demand: Optimized for CPU and edge GPU performance Faster startup times: Quick initialization for responsive applications They unlock powerful NLP capabilities while meeting the constraints of: Embedded systems: IoT devices and industrial controllers Mobile devices: Smartphones and tablets with offline capabilities IoT Devices: Sensors and smart devices with limited resources Edge servers: Local processing units with limited GPU resources Personal Computers: Desktop and laptop deployment scenarios Course Modules & Navigation Course duration. 10 hours of content Module Topic Focus Area Key Content Level Duration 📖 00 Introduction to EdgeAI Foundation & Context EdgeAI Overview • Industry Applications • SLM Introduction • Learning Objectives Beginner 1-2 hrs 📚 01 EdgeAI Fundamentals Cloud vs Edge AI comparison EdgeAI Fundamentals • Real World Case Studies • Implementation Guide • Edge Deployment Beginner 3-4 hrs 🧠 02 SLM Model Foundations Model families & architecture Phi Family • Qwen Family • Gemma Family • BitNET • μModel • Phi-Silica Beginner 4-5 hrs 🚀 03 SLM Deployment Practice Local & cloud deployment Advanced Learning • Local Environment • Cloud Deployment Intermediate 4-5 hrs ⚙️ 04 Model Optimization Toolkit Cross-platform optimization Introduction • Llama.cpp • Microsoft Olive • OpenVINO • Apple MLX • Workflow Synthesis Intermediate 5-6 hrs 🔧 05 SLMOps Production Production operations SLMOps Introduction • Model Distillation • Fine-tuning • Production Deployment Advanced 5-6 hrs 🤖 06 AI Agents & Function Calling Agent frameworks & MCP Agent Introduction • Function Calling • Model Context Protocol Advanced 4-5 hrs 💻 07 Platform Implementation Cross-platform samples AI Toolkit • Foundry Local • Windows Development Advanced 3-4 hrs 🏭 08 Foundry Local Toolkit Production-ready samples Sample applications (see details below) Expert 8-10 hrs Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing. Developer Highlights - 🔧 Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration. - 🧩 ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU. - 🎮 DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps. - 🖥️ Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference. Local AI: Beyond the Edge Local AI isn’t just about inference, it’s about autonomy. Imagine agents that: - Learn from local context - Adapt to user behavior - Respect privacy by design With tools like Agent Framework, Azure AI Foundry and Windows Copilot Studio, and Foundry Local developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency. Try It Yourself Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT devices Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.