mcp
49 TopicsIf You're Building AI on Azure, ECS 2026 is Where You Need to Be
Let me be direct: there's a lot of noise in the conference calendar. Generic cloud events. Vendor showcases dressed up as technical content. Sessions that look great on paper but leave you with nothing you can actually ship on Monday. ECS 2026 isn't that. As someone who will be on stage at Cologne this May, I can tell you the European Collaboration Summit combined with the European AI & Cloud Summit and European Biz Apps Summit is one of the few events I've seen where engineers leave with real, production-applicable knowledge. Three days. Three summits. 3,000+ attendees. One of the largest Microsoft-focused events in Europe, and it keeps getting better. If you're building AI systems on Azure, designing cloud-native architectures, or trying to figure out how to take your AI experiments to production — this is where the conversation is happening. What ECS 2026 Actually Is ECS 2026 runs May 5–7 at Confex in Cologne, Germany. It brings together three co-located summits under one roof: European Collaboration Summit — Microsoft 365, Teams, Copilot, and governance European AI & Cloud Summit — Azure architecture, AI agents, cloud security, responsible AI European BizApps Summit — Power Platform, Microsoft Fabric, Dynamics For Azure engineers and AI developers, the European AI & Cloud Summit is your primary destination. But don't ignore the overlap, some of the most interesting AI conversations happen at the intersection of collaboration tooling and cloud infrastructure. The scale matters here: 3,000+ attendees, 100+ sessions, multiple deep-dive tracks, and a speaker lineup that includes Microsoft executives, Regional Directors, and MVPs who have built, broken, and rebuilt production systems. The Azure + AI Track - What's Actually On the Agenda The AI & Cloud Summit agenda is built around real technical depth. Not "intro to AI" content, actual architecture decisions, patterns that work, and lessons from things that didn't. Here's what you can expect: AI Agents and Agentic Systems This is where the energy is right now, and ECS is leaning in. Expect sessions covering how to design agent workflows, chain reasoning steps, handle memory and state, and integrate with Azure AI services. Marco Casalaina, VP of Products for Azure AI at Microsoft, is speaking if you want to understand the direction of the Azure AI platform from the people building it, this is a direct line. Azure Architecture at Scale Cloud-native patterns, microservices, containers, and the architectural decisions that determine whether your system holds up under real load. These sessions go beyond theory you'll hear from engineers who've shipped these designs at enterprise scale. Observability, DevOps, and Production AI Getting AI to production is harder than the demos suggest. Sessions here cover monitoring AI systems, integrating LLMs into CI/CD pipelines, and building the operational practices that keep AI in production reliable and governable. Cloud Security and Compliance Security isn't optional when you're putting AI in front of users or connecting it to enterprise data. Tracks cover identity, access patterns, responsible AI governance, and how to design systems that satisfy compliance requirements without becoming unmaintainable. Pre-Conference Deep Dives One underrated part of ECS: the pre-conference workshops. These are extended, hands-on sessions typically 3–6 hours that let you go deep on a single topic with an expert. Think of them as intensive short courses where you can actually work through the material, not just watch slides. If you're newer to a particular area of Azure AI, or you want to build fluency in a specific pattern before the main conference sessions, these are worth the early travel. The Speaker Quality Is Different Here The ECS speaker roster includes Microsoft executives, Microsoft MVPs, and Regional Directors, people who have real accountability for the products and patterns they're presenting. You'll hear from over 20 Microsoft speakers: Marco Casalaina — VP of Products, Azure AI at Microsoft Adam Harmetz — VP of Product at Microsoft, Enterprise Agent And dozens of MVPs and Regional Directors who are in the field every day, solving the same problems you are. These aren't keynote-only speakers — they're in the session rooms, at the hallway track, available for real conversations. The Hallway Track Is Not a Cliché I know "networking" sounds like a corporate afterthought. At ECS it genuinely isn't. When you put 3,000 practitioners, engineers, architects, DevOps leads, security specialists in one venue for three days, the conversations between sessions are often more valuable than the sessions themselves. You get candid answers to "how are you actually handling X in production?" that you won't find in documentation. The European Microsoft community is tight-knit and collaborative. ECS is where that community concentrates. Why This Matters Right Now We're in a period where AI development is moving fast but the engineering discipline around it is still maturing. Most teams are figuring out: How to move from AI prototype to production system How to instrument and observe AI behaviour reliably How to design agent systems that don't become unmaintainable How to satisfy security and compliance requirements in AI-integrated architectures ECS 2026 is one of the few places where you can get direct answers to these questions from people who've solved them — not theoretically, but in production, on Azure, in the last 12 months. If you go, you'll come back with practical patterns you can apply immediately. That's the bar I hold events to. ECS consistently clears it. Register and Explore the Agenda Register for ECS 2026: ecs.events Explore the AI & Cloud Summit agenda: cloudsummit.eu/en/agenda Dates: May 5–7, 2026 | Location: Confex, Cologne, Germany Early registration is worth it the pre-conference workshops fill up. And if you're coming, find me, I'll be the one talking too much about AI agents and Azure deployments. See you in Cologne.MCP Demystified: Tools vs Resources vs Prompts Explained Simply
Introduction When developers start working with Model Context Protocol (MCP), one of the most confusing parts is understanding the difference between MCP Tools, Resources, and Prompts. All three are important components in modern AI application development, but they serve completely different purposes. In real-world AI systems like chatbots, AI agents, and copilots, using these components correctly can make your application scalable, clean, and easy to maintain. If used incorrectly, it can lead to confusion, bugs, and poor system design. In this article, we will clearly explain the difference between MCP Tools, Resources, and Prompts in simple words, using real-world examples and practical explanations. This guide is helpful for both beginner and intermediate developers working with AI and MCP. What Are MCP Tools? MCP Tools are functions or services that an AI model can use to perform real-world actions. These actions usually involve doing something outside the AI system, such as calling an API, updating a database, or sending a message. In simple terms, Tools represent what the AI can do. Real-World Analogy Think of MCP Tools like service workers in a company. For example, a delivery person delivers packages, a support agent updates tickets, and a payment system processes transactions. Similarly, MCP Tools perform specific tasks when requested by the AI. Examples of MCP Tools A tool that fetches user details from a database A tool that sends emails or notifications A tool that creates or updates support tickets A tool that calls third-party APIs like payment gateways A tool that triggers workflows in enterprise systems Key Understanding Tools are action-based. They execute operations and return results. Whenever your AI needs to "do something," you should use a Tool. What Are MCP Resources? MCP Resources are data sources that the AI model can access to read information. These are typically read-only and provide context or knowledge to the AI. In simple terms, Resources represent what the AI can read or see. Real-World Analogy Think of MCP Resources like books in a library or documents in a company. You can read and learn from them, but you cannot directly change their content. Examples of MCP Resources A database table containing customer information A knowledge base with FAQs and documentation System logs that track user activity Configuration files or static datasets Company policy documents or guidelines Key Understanding Resources are data-based. They provide information but do not perform any action. Whenever your AI needs information to make a decision, you should use a Resource. What Are MCP Prompts? MCP Prompts are structured instructions or templates that guide how the AI model should think, behave, and respond. In simple terms, Prompts represent how you instruct the AI. Real-World Analogy Think of Prompts like instructions given to an employee. For example, “Write a professional email,” “Summarize this report,” or “Answer politely to the customer.” These instructions shape how the output is generated. Examples of MCP Prompts A prompt to summarize customer feedback A prompt to generate a support response in a polite tone A prompt to analyze data and provide insights A prompt to translate text into another language A prompt to generate code based on requirements Key Understanding Prompts are instruction-based. They define how the AI should process input and generate output. Key Differences Between MCP Tools, Resources, and Prompts Understanding the difference between MCP Tools, Resources, and Prompts is important for building scalable AI systems. Tools vs Resources vs Prompts Tools are used for performing actions Resources are used for reading data Prompts are used for guiding AI behavior Detailed Comparison Tools interact with external systems and can change data or trigger operations Resources only provide data and do not modify anything Prompts control how the AI thinks, responds, and formats its output Comparison Table Aspect MCP Tools MCP Resources MCP Prompts Purpose Perform actions Provide data Guide behavior Nature Active Passive Instructional Usage API calls, updates Data reading AI response generation Output Action result Data Generated content How MCP Tools, Resources, and Prompts Work Together In real-world AI systems, these three components are used together to create powerful workflows. Step-by-Step Flow The user sends a request to the AI system The Prompt defines how the AI should understand and respond The AI fetches required information from Resources If an action is required, the AI uses a Tool The AI combines everything and generates a final response Practical Example Consider an AI customer support system: The Prompt ensures the response is polite and helpful The Resource provides customer history and previous tickets The Tool updates the ticket status or sends an email notification This combination helps build intelligent, real-world AI applications. Advantages of Understanding MCP Concepts Helps developers design clean and scalable AI architecture Improves clarity in system design and reduces confusion Enhances performance by separating responsibilities Makes debugging and maintenance easier Supports faster development of AI-powered applications Common Mistakes Developers Make Using Tools when only data retrieval is needed Treating Resources as editable systems Writing vague or unclear Prompts Mixing responsibilities between Tools, Resources, and Prompts Not structuring MCP components properly in applications Best Practices for Using MCP Tools, Resources, and Prompts Clearly define the role of each component before implementation Use Tools only for actions that change system state or trigger operations Use Resources strictly for reading and retrieving data Write clear, specific, and well-structured Prompts Test Tools, Resources, and Prompts independently before integration Keep your architecture modular and easy to scale Summary Understanding the difference between MCP Tools, Resources, and Prompts is essential for modern AI application development using Model Context Protocol. Tools allow AI systems to perform actions, Resources provide the necessary data, and Prompts guide how the AI behaves and generates responses. When these components are used correctly, developers can build scalable, efficient, and intelligent AI systems. Mastering these MCP concepts will help you design better architectures and create powerful AI-driven applications in today’s evolving technology landscape.Building MCP servers with Entra ID and pre-authorized clients
The Model Context Protocol (MCP) gives AI agents a standard way to call external tools, but things get more complicated when those tools need to know who the user is. In this post, I’ll show how to build an MCP server with the Python FastMCP package that authenticates users with Microsoft Entra ID when they connect from a pre-authorized client such as VS Code. If you need to build a server that works with any MCP clients, read my previous blog post. With Microsoft Entra as the authorization server, supporting arbitrary clients currently requires adding an OAuth proxy in front, which increases security risk. This post focuses on the simpler pre-authorized-client path instead. MCP auth Let’s start by digging into the MCP auth spec, since that explains both the shape of the flow and the constraints we run into with Entra. The MCP specification includes an authorization protocol based on OAuth 2.1, so an MCP client can send a request that includes a Bearer token from an authorization server, and the MCP server can validate that token. In OAuth 2.1 terms, the MCP client is acting as the OAuth client, the MCP server is the resource server, the signed-in user is the resource owner, and the authorization server issues an access token. In this case, Entra will be our authorization server. We can't necessarily use any OAuth-compatible authorization servers, as MCP auth requires more than just the core OAuth 2.1 functionality. In OAuth, the authorization server needs a relationship with the client. MCP auth describes three options: Pre-registration: the auth server has a pre-existing relationship and has the client ID in its database already CIMD (Client Identity Metadata Document): the MCP client sends the URL of its CIMD, a JSON document that describes its attributes, and the auth server bases its interactions on that information. DCR (Dynamic Client Registration): when the auth server sees a new client, it explicitly registers it and stores the client information in its own data. DCR is now considered a "legacy" path, as the hope is for CIMD to be the supported path in the future. For each MCP scenario - each combination of MCP server, MCP client, and authorization server - we need to determine which of those options are viable and optimal. Here's one way of thinking through it: VS Code supports all of MCP auth, so its MCP client includes both CIMD and DCR support. However, the Microsoft Entra authorization server does not support CIMD or DCR. That leaves us with only one official option: pre-registration. If we desperately need support for arbitrary clients, it is possible to put a CIMD/DCR proxy in front of Entra, as discussed in my previous blog post, but the Entra team discourages that approach due to increased security risks. When using pre-registration, the auth flow is relatively simple (but still complex, because hey, this is OAuth!): User asks to use auth-restricted MCP server MCP client makes a request to MCP server without a bearer token MCP server responds with an HTTP 401 and a pointer to its PRM (Protected Resource Metadata) document MCP client reads PRM to discover the authorization server and options MCP client redirects to authorization server, including its client ID User signs into authorization server Authorization server returns authorization code MCP client exchanges authorization code for access token Authorization server returns access token MCP client re-tries original request, but now with bearer token included MCP server validates bearer token and returns successfully Here's what that looks like: Now let's dig into the code for implementing MCP auth with the pre-registered VS Code client. Registering the MCP server with Entra Before the server can use Entra to authorize users, we need to register the server with Entra via an app registration. We can do registration using the Azure Portal, Azure CLI, Microsoft Graph SDK, or even Bicep. In this case, I use the Python MS Graph SDK as it allows me to specify everything programmatically. First, I create the Entra app registration, specifying the sign-in audience (single-tenant) and configuring the MCP server as a protected resource: scope_id = str(uuid.uuid4()) Application( display_name="Entra App for MCP server", sign_in_audience="AzureADMyOrg", api=ApiApplication( requested_access_token_version=2, oauth2_permission_scopes=[ PermissionScope( admin_consent_description="Allows access to the MCP server as the signed-in user.", admin_consent_display_name="Access MCP Server", id=scope_id, is_enabled=True, type="User", user_consent_description="Allow access to the MCP server on your behalf.", user_consent_display_name="Access MCP Server", value="user_impersonation") ], pre_authorized_applications=[ PreAuthorizedApplication( app_id=VSCODE_CLIENT_ID, delegated_permission_ids=[scope_id], )])) The api parameter is doing the heavy lifting, ensuring that other applications (like VS Code) can request permission to access the server on behalf of a user. Here's what each parameter does: requested_access_token_version=2: Entra ID has two token formats (v1.0 and v2.0). We need v2.0 because that's what FastMCP's token validator expects. oauth2_permission_scopes: This defines a permission called user_impersonation that MCP clients can request when connecting to your server. It's the server saying: "I accept tokens that let an MCP client act on behalf of a signed-in user." Without at least one scope defined, no MCP client can obtain a token for your server — Entra wouldn't know what permission to grant. The name user_impersonation is a convention (we could call it anything), but it clearly signals that the MCP client is accessing your server as the user, not as itself. pre_authorized_applications: This list tells Entra which client applications are pre-approved to request tokens for this server’s API without showing an extra consent prompt to the user. In this case, I list VS Code’s application ID and tie it to the user_impersonation scope, so VS Code can request a token for the MCP server as the signed-in user. Thanks to that configuration, when VS Code requests a token, it will request a token with the scope "api://{app_id}/user_impersonation" , and the FastMCP server will validate that incoming tokens contain that scope. Next, I create a Service Principal for that Entra app registration, which represents the Entra app in my tenant request_principal = ServicePrincipal(app_id=app.app_id, display_name=app.display_name) await graph_client.service_principals.post(request_principal) Securing credentials for Entra app registrations I also need a way for the server to prove that it can use that Entra app registration. There are three options: Client secret: Easiest to set up, but since it's a secret, it must be stored securely, protected carefully, and rotated regularly. Certificate: Stronger than a client secret and generally better suited for production, but it still requires certificate storage, renewal, and lifecycle management. Managed identity as Federated Identity Credential (MI-as-FIC): No stored secret, no certificate to manage, and usually the best choice when your app is hosted on Azure. No support for local development however. I wanted the best of both worlds: easy local development on my machine, but the most secure production story for deployment on Azure Container Apps. So I actually created two Entra app registrations, one for local with client secret, and one for production with managed identity. Here's how I set up the password for the local Entra app: password_credential = await graph_client.applications.by_application_id(app.id).add_password.post( AddPasswordPostRequestBody( password_credential=PasswordCredential(display_name="FastMCPSecret"))) It's a bit trickier to set up the MI-as-FIC, since we first need to provision the managed identity and associate that with our Azure Container Apps resource. I set all of that up in Bicep, and then after provisioning completes, I run this code to configure a FIC using the managed identity: fic = FederatedIdentityCredential( name="miAsFic", issuer=f"https://login.microsoftonline.com/{tenant_id}/v2.0", subject=managed_identity_principal_id, audiences=["api://AzureADTokenExchange"], ) await graph_client.applications.by_application_id( prod_app_id ).federated_identity_credentials.post(fic) Since I now have two Entra app registrations, I make sure that the environment variables in my local .env point to the secret-secured local Entra app registration, and the environment variables on my Azure Container App point to the FIC-secured prod Entra app registration. Granting admin consent This next step is only necessary if the MCP server uses the on-behalf-of (OBO) flow to exchange the incoming access token for a token to a downstream API, such as Microsoft Graph. In this case, my demo server uses OBO so it can query Microsoft Graph to check the signed-in user's group membership. The earlier code added VS Code as a pre-authorized application, but that only allows VS Code to obtain a token for the MCP server itself; it does not grant the MCP server permission to call Microsoft Graph on the user's behalf. Because the MCP sign-in flow in VS Code does not include a separate consent step for those downstream Graph scopes, I grant admin consent up front so the OBO exchange can succeed. This code grants the admin consent to the associated service principal for the Graph API resource and scopes: server_principal = await graph_client.service_principals_with_app_id(app.app_id).get() graph_principal = await graph_client.service_principals_with_app_id( "00000003-0000-0000-c000-000000000000" # Graph API ).get() await graph_client.oauth2_permission_grants.post( OAuth2PermissionGrant( client_id=server_principal.id, consent_type="AllPrincipals", resource_id=graph_principal.id, scope="User.Read email offline_access openid profile", ) ) If our MCP server needed to use an OBO flow with another resource server, we could request additional grants for those resources and scopes. Our Entra app registration is now ready for the MCP server, so let's move on to see the server code. Using FastMCP servers with Entra In our MCP server code, we configure FastMCP's RemoteAuthProvider based on the details from the Entra app registration process: from fastmcp.server.auth import RemoteAuthProvider from fastmcp.server.auth.providers.azure import AzureJWTVerifier verifier = AzureJWTVerifier( client_id=ENTRA_CLIENT_ID, tenant_id=AZURE_TENANT_ID, required_scopes=["user_impersonation"], ) auth = RemoteAuthProvider( token_verifier=verifier, authorization_servers=[f"https://login.microsoftonline.com/{AZURE_TENANT_ID}/v2.0"], base_url=base_url, ) Notice that we do not need to pass in a client secret at this point, even when using the local Entra app registration. FastMCP validates the tokens using Entra's public keys - no Entra app credentials needed. To make it easy for our MCP tools to access an identifier for the currently logged in user, we define a middleware that inspects the claims of the current token using FastMCP's get_access_token() and sets the "oid" (Entra object identifier) in the state: class UserAuthMiddleware(Middleware): def _get_user_id(self): token = get_access_token() if not (token and hasattr(token, "claims")): return None return token.claims.get("oid") async def on_call_tool(self, context: MiddlewareContext, call_next): user_id = self._get_user_id() if context.fastmcp_context is not None: await context.fastmcp_context.set_state("user_id", user_id) return await call_next(context) async def on_read_resource(self, context: MiddlewareContext, call_next): user_id = self._get_user_id() if context.fastmcp_context is not None: await context.fastmcp_context.set_state("user_id", user_id) return await call_next(context) When we initialize the FastMCP server, we set the auth provider and include that middleware: mcp = FastMCP("Expenses Tracker", auth=auth, middleware=[UserAuthMiddleware()]) Now, every request made to the MCP server will require authentication. The server will return a 401 if a valid token isn't provided, and that 401 will prompt the VS Code MCP client to kick off the MCP authorization flow. Inside each tool, we can grab the user id from the state, and use that to customize the response for the user, like to store or query items in a database. MCP.tool async def add_user_expense( date: Annotated[date, "Date of the expense in YYYY-MM-DD format"], amount: Annotated[float, "Positive numeric amount of the expense"], description: Annotated[str, "Human-readable description of the expense"], ctx: Context, ): """Add a new expense to Cosmos DB.""" user_id = await ctx.get_state("user_id") if not user_id: return "Error: Authentication required (no user_id present)" expense_item = { "id": str(uuid.uuid4()), "user_id": user_id, "date": date.isoformat(), "amount": amount, "description": description } await cosmos_container.create_item(body=expense_item) Using OBO flow in FastMCP server Remember when we granted admin consent for the Entra app registration earlier? That means we can use an OBO flow inside the MCP server, to make calls to the Graph API on behalf of the signed-in user. To make it easier to exchange and validate tokens, we use the Python MSAL SDK and configure a ConfidentialClientApplication . When using the local secret-secured Entra app registration, this is all we need to set it up: from msal import ConfidentialClientApplication confidential_client = ConfidentialClientApplication( client_id=entra_client_id, client_credential=os.environ["ENTRA_DEV_CLIENT_SECRET"], authority=f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}", token_cache=TokenCache(), ) When using the production FIC-secured Entra app registration, we need a function that returns tokens for the managed identity: from msal import ManagedIdentityClient, TokenCache, UserAssignedManagedIdentity mi_client = ManagedIdentityClient( UserAssignedManagedIdentity(client_id=os.environ["AZURE_CLIENT_ID"]), http_client=requests.Session(), token_cache=TokenCache()) def _get_mi_assertion(): result = mi_client.acquire_token_for_client(resource="api://AzureADTokenExchange") if "access_token" not in result: raise RuntimeError(f"Failed to get MI assertion: {result.get('error_description', 'unknown error')}") return result["access_token"] confidential_client = ConfidentialClientApplication( client_id=entra_client_id, client_credential={"client_assertion": _get_mi_assertion}, authority=f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}", token_cache=TokenCache()) Inside any code that requires OBO, we ask MSAL to exchange the MCP access token for a Graph API access token: graph_resource_access_token = confidential_client.acquire_token_on_behalf_of( user_assertion=access_token.token, scopes=["https://graph.microsoft.com/.default"] ) graph_token = graph_resource_access_token["access_token"] Once we successfully acquire the token, we can use that token with the Graph API, for any operations permitted by the scopes in the admin consent granted earlier. For this example, we call the Graph API to check whether the logged in user is a member of a particular Entra group: client = httpx.AsyncClient() url = ("https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group" f"?$filter=id eq '{group_id}'&$count=true") response = await client.get( url, headers={ "Authorization": f"Bearer {graph_token}", "ConsistencyLevel": "eventual", }) data = response.json() membership_count = data.get("@odata.count", 0) is_admin = membership_count > 0 FastMCP 3.0 now provides a way to restrict tool visibility based on authorization checks, so I wrapped the above code in a function and set it as the auth constraint for the admin tool: async def require_admin_group(ctx: AuthContext) -> bool: graph_token = exchange_for_graph_token(ctx.token.token) return await check_user_in_group(graph_token, admin_group_id) @mcp.tool(auth=require_admin_group) async def get_expense_stats(ctx: Context): """Get expense statistics. Only accessible to admins.""" ... FastMCP will run that function both when an MCP client requests the list of tools, to determine which tools can be seen by the current user, and again when a user tries to use that tool, for an added just-in-time security check. This is just one way to use an OBO flow however. You can use it directly inside tools, like to query for more details from the Graph API, upload documents to OneDrive/SharePoint/Notes, send emails, etc. All together now For the full code, check out the open source azure-cosmosdb-identity-aware-mcp-server repository. The most relevant files for the Entra authentication setup are: auth_init.py: Creates the Entra app registrations for production and local development, defines the delegated user_impersonation scope, pre-authorizes VS Code, creates the service principal, and grants admin consent for the Microsoft Graph scopes used in the OBO flow. auth_postprovision.py: Adds the federated identity credential (FIC) after deployment so the container app's managed identity can act as the production Entra app without storing a client secret. main.py: Implements the MCP server using FastMCP's RemoteAuthProvider and AzureJWTVerifier for direct Entra authentication, plus OBO-based Microsoft Graph calls for admin group membership checks. As always, please let me know if you have further questions or ideas for other Entra integrations. Acknowledgements: Thank you to Matt Gotteiner for his guidance in implementing the OBO flow and review of the blog post.Agents League: Meet the Winners
Agents League brought together developers from around the world to build AI agents using Microsoft's developer tools. With 100+ submissions across three tracks, choosing winners was genuinely difficult. Today, we're proud to announce the category champions. 🎨 Creative Apps Winner: CodeSonify View project CodeSonify turns source code into music. As a genuinely thoughtful system, its functions become ascending melodies, loops create rhythmic patterns, conditionals trigger chord changes, and bugs produce dissonant sounds. It supports 7 programming languages and 5 musical styles, with each language mapped to its own key signature and code complexity directly driving the tempo. What makes CodeSonify stand out is the depth of execution. CodeSonify team delivered three integrated experiences: a web app with real-time visualization and one-click MIDI export, an MCP server exposing 5 tools inside GitHub Copilot in VS Code Agent Mode, and a diff sonification engine that lets you hear a code review. A clean refactor sounds harmonious. A messy one sounds chaotic. The team even built the MIDI generator from scratch in pure TypeScript with zero external dependencies. Built entirely with GitHub Copilot assistance, this is one of those projects that makes you think about code differently. 🧠 Reasoning Agents Winner: CertPrep Multi-Agent System View project CertPrep Multi-Agent System team built a production-grade 8-agent system for personalized Microsoft certification exam preparation, supporting 9 exam families including AI-102, AZ-204, AZ-305, and more. Each agent has a distinct responsibility: profiling the learner, generating a week-by-week study schedule, curating learning paths, tracking readiness, running mock assessments, and issuing a GO / CONDITIONAL GO / NOT YET booking recommendation. The engineering behind the scene here is impressive. A 3-tier LLM fallback chain ensures the system runs reliably even without Azure credentials, with the full pipeline completing in under 1 second in mock mode. A 17-rule guardrail pipeline validates every agent boundary. Study time allocation uses the Largest Remainder algorithm to guarantee no domain is silently zeroed out. 342 automated tests back it all up. This is what thoughtful multi-agent architecture looks like in practice. 💼 Enterprise Agents Winner: Whatever AI Assistant (WAIA) View project WAIA is a production-ready multi-agent system for Microsoft 365 Copilot Chat and Microsoft Teams. A workflow agent routes queries to specialized HR, IT, or Fallback agents, transparently to the user, handling both RAG-pattern Q&A and action automation — including IT ticket submission via a SharePoint list. Technically, it's a showcase of what serious enterprise agent development looks like: a custom MCP server secured with OAuth Identity Passthrough, streaming responses via the OpenAI Responses API, Adaptive Cards for human-in-the-loop approval flows, a debug mode accessible directly from Teams or Copilot, and full OpenTelemetry integration visible in the Foundry portal. Franck also shipped end-to-end automated Bicep deployment so the solution can land in any Azure environment. It's polished, thoroughly documented, and built to be replicated. Thank you To every developer who submitted and shipped projects during Agents League: thank you 💜 Your creativity and innovation brought Agents League to life! 👉 Browse all submissions on GitHubBuild and Deploy a Microsoft Foundry Hosted Agent: A Hands-On Workshop
Agents are easy to demo, hard to ship. Most teams can put together a convincing prototype quickly. The harder part starts afterwards: shaping deterministic tools, validating behaviour with tests, building a CI path, packaging for deployment, and proving the experience through a user-facing interface. That is where many promising projects slow down. This workshop helps you close that gap without unnecessary friction. You get a guided path from local run to deployment handoff, then complete the journey with a working chat UI that calls your deployed hosted agent through the project endpoint. What You Will Build This is a hands-on, end-to-end learning experience for building and deploying AI agents with Microsoft Foundry. The lab provides a guided and practical journey through hosted-agent development, including deterministic tool design, prompt-guided workflows, CI validation, deployment preparation, and UI integration. It’s designed to reduce setup friction with a ready-to-run experience. It is a prompt-based development lab using Copilot guidance and MCP-assisted workflow options during deployment. It’s a .NET 10 workshop that includes local development, Copilot-assisted coding, CI, secure deployment to Azure, and a working chat UI. A local hosted agent that responds on the responses contract Deterministic tool improvements in core logic with xUnit coverage A GitHub Actions CI workflow for restore, build, test, and container validation An Azure-ready deployment path using azd, ACR image publishing, and Foundry manifest apply A Blazor chat UI that calls openai/v1/responses with agent_reference A repeatable implementation shape that teams can adapt to real projects Who This Lab Is For AI developers and software engineers who prefer learning by building Motivated beginners who want a guided, step-by-step path Experienced developers who want a practical hosted-agent reference implementation Architects evaluating deployment shape, validation strategy, and operational readiness Technical decision-makers who need to see how demos become deployable systems Why Hosted Agents Hosted agents run your code in a managed environment. That matters because it reduces the amount of infrastructure plumbing you need to manage directly, while giving you a clearer path to secure, observable, team-friendly deployments. Prompt-only demos are still useful. They are quick, excellent for ideation, and often the right place to start. Hosted agents complement that approach when you need custom code, tool-backed logic, and a deployment process that can be repeated by a team. Think of this lab as the bridge: you keep the speed of prompt-based iteration, then layer in the real-world patterns needed to run reliably. What You Will Learn 1) Orchestration You will practise workflow-oriented reasoning through implementation-shape recommendations and multi-step readiness scenarios. The lab introduces orchestration concepts at a practical level, rather than as a dedicated orchestration framework deep dive. 2) Tool Integration You will connect deterministic tools and understand how tool calls fit into predictable execution paths. This is a core focus of the workshop and is backed by tests in the solution. 3) Retrieval Patterns (What This Lab Covers Today) This workshop does not include a full RAG implementation with embeddings and vector search. Instead, it focuses on deterministic local tools and hosted-agent response flow, giving you a strong foundation before adding retrieval infrastructure in a follow-on phase. 4) Observability You will see light observability foundations through OpenTelemetry usage in the host and practical verification during local and deployed checks. This is introductory coverage intended to support debugging and confidence building. 5) Responsible AI You will apply production-minded safety basics, including secure secret handling and review hygiene. A full Responsible AI policy and evaluation framework is not the primary goal of this workshop, but the workflow does encourage safe habits from the start. 6) Secure Deployment Path You will move from local implementation to Azure deployment with a secure, practical workflow: azd provisioning, ACR publishing, manifest deployment, hosted-agent start, status checks, and endpoint validation. The Learning Journey The overall flow is simple and memorable: clone, open, run, iterate, deploy, observe. clone -> open -> run -> iterate -> deploy -> observe You are not expected to memorize every command. The lab is structured to help you learn through small, meaningful wins that build confidence. Your First 15 Minutes: Quick Wins Open the repo and understand the lab structure in a few minutes Set project endpoint and model deployment environment variables Run the host locally and validate the responses endpoint Inspect the deterministic tools in WorkshopLab.Core Run tests and see how behaviour changes are verified Review the deployment path so local work maps to Azure steps Understand how the UI validates end-to-end behaviour after deployment Leave the first session with a working baseline and a clear next step That first checkpoint is important. Once you see a working loop on your own machine, the rest of the workshop becomes much easier to finish. Using Copilot and MCP in the Workflow This lab emphasises prompt-based development patterns that help you move faster while still learning the underlying architecture. You are not only writing code, you are learning to describe intent clearly, inspect generated output, and iterate with discipline. Copilot supports implementation and review in the coding labs. MCP appears as a practical deployment option for hosted-agent lifecycle actions, provided your tools are authenticated to the correct tenant and project context. Together, this creates a development rhythm that is especially useful for learning: Define intent with clear prompts Generate or adjust implementation details Validate behaviour through tests and UI checks Deploy and observe outcomes in Azure Refine based on evidence, not guesswork That same rhythm transfers well to real projects. Even if your production environment differs, the patterns from this workshop are adaptable. Production-Minded Tips As you complete the lab, keep a production mindset from day one: Reliability: keep deterministic logic small, testable, and explicit Security: Treat secrets, identity, and access boundaries as first-class concerns Observability: use telemetry and status checks to speed up debugging Governance: keep deployment steps explicit so teams can review and repeat them You do not need to solve everything in one pass. The goal is to build habits that make your agent projects safer and easier to evolve. Start Today: If you have been waiting for the right time to move from “interesting demo” to “practical implementation”, this is the moment. The workshop is structured for self-study, and the steps are designed to keep your momentum high. Start here: https://github.com/microsoft/Hosted_Agents_Workshop_Lab Want deeper documentation while you go? These official guides are great companions: Hosted agent quickstart Hosted agent deployment guide When you finish, share what you built. Post a screenshot or short write-up in a GitHub issue/discussion, on social, or in comments with one lesson learned. Your example can help the next developer get unstuck faster. Copy/Paste Progress Checklist [ ] Clone the workshop repo [ ] Complete local setup and run the agent [ ] Make one prompt-based behaviour change [ ] Validate with tests and chat UI [ ] Run CI checks [ ] Provision and deploy via Azure and Foundry workflow [ ] Review observability signals and refine [ ] Share what I built + one takeaway Common Questions How long does it take? Most developers can complete a meaningful pass in a few focused sessions of 60-75 mins. You can get the first local success quickly, then continue through deployment and refinement at your own pace. Do I need an Azure subscription? Yes, for provisioning and deployment steps. You can still begin local development and testing before completing all Azure activities. Is it beginner-friendly? Yes. The labs are written for beginners, run in sequence, and include expected outcomes for each stage. Can I adapt it beyond .NET? Yes. The implementation in this workshop is .NET 10, but the architecture and development patterns can be adapted to other stacks. What if I am evaluating for a team? This lab is a strong team evaluation asset because it demonstrates end-to-end flow: local dev, integration patterns, CI, secure deployment, and operational visibility. Closing This workshop gives you more than theory. It gives you a practical path from first local run to deployed hosted agent, backed by tests, CI, and a user-facing UI validation loop. If you want a build-first route into Microsoft Foundry hosted-agent development, this is an excellent place to start. Begin now: https://github.com/microsoft/Hosted_Agents_Workshop_Lab411Views0likes0CommentsBuilding a Smart Building HVAC Digital Twin with AI Copilot Using Foundry Local
Introduction Building operations teams face a constant challenge: optimizing HVAC systems for energy efficiency while maintaining occupant comfort and air quality. Traditional building management systems display raw sensor data, temperatures, pressures, CO₂ levels—but translating this into actionable insights requires deep HVAC expertise. What if operators could simply ask "Why is the third floor so warm?" and get an intelligent answer grounded in real building state? This article demonstrates building a sample smart building digital twin with an AI-powered operations copilot, implemented using DigitalTwin, React, Three.js, and Microsoft Foundry Local. You'll learn how to architect physics-based simulators that model thermal dynamics, implement 3D visualizations of building systems, integrate natural language AI control, and design fault injection systems for testing and training. Whether you're building IoT platforms for commercial real estate, designing energy management systems, or implementing predictive maintenance for building automation, this sample provides proven patterns for intelligent facility operations. Why Digital Twins Matter for Building Operations Physical buildings generate enormous operational data but lack intelligent interpretation layers. A 50,000 square foot office building might have 500+ sensors streaming metrics every minute, zone temperatures, humidity levels, equipment runtimes, energy consumption. Traditional BMS (Building Management Systems) visualize this data as charts and gauges, but operators must manually correlate patterns, diagnose issues, and predict failures. Digital twins solve this through physics-based simulation coupled with AI interpretation. Instead of just displaying current temperature readings, a digital twin models thermal dynamics, heat transfer rates, HVAC response characteristics, occupancy impacts. When conditions deviate from expectations, the twin compares observed versus predicted states, identifying root causes. Layer AI on top, and operators get natural language explanations: "The conference room is 3 degrees too warm because the VAV damper is stuck at 40% open, reducing airflow by 60%." This application focuses on HVAC, the largest building energy consumer, typically 40-50% of total usage. Optimizing HVAC by just 10% through better controls can save thousands of dollars monthly while improving occupant satisfaction. The digital twin enables "what-if" scenarios before making changes: "What happens to energy consumption and comfort if we raise the cooling setpoint by 2 degrees during peak demand response events?" Architecture: Three-Tier Digital Twin System The application implements a clean three-tier architecture separating visualization, simulation, and state management: The frontend uses React with Three.js for 3D visualization. Users see an interactive 3D model of the three-floor building with color-coded zones indicating temperature and CO₂ levels. Click any equipment, AHUs, VAVs, chillers, to see detailed telemetry. The control panel enables adjusting setpoints, running simulation steps, and activating demand response scenarios. Real-time charts display KPIs: energy consumption, comfort compliance, air quality levels. The backend Node.js/Express server orchestrates simulation and state management. It maintains the digital twin state as JSON, the single source of truth for all equipment, zones, and telemetry. REST API endpoints handle control requests, simulation steps, and AI copilot queries. WebSocket connections push real-time updates to the frontend for live monitoring. The HVAC simulator implements physics-based models: 1R1C thermal models for zones, affinity laws for fan power, chiller COP calculations, CO₂ mass balance equations. Foundry Local provides AI copilot capabilities. The backend uses foundry-local-sdk to query locally running models. Natural language queries ("How's the lobby temperature?") get answered with building state context. The copilot can explain anomalies, suggest optimizations, and even execute commands when explicitly requested. Implementing Physics-Based HVAC Simulation Accurate simulation requires modeling actual HVAC physics. The simulator implements several established building energy models: // backend/src/simulator/thermal-model.js class ZoneThermalModel { // 1R1C (one resistance, one capacitance) thermal model static calculateTemperatureChange(zone, delta_t_seconds) { const C_thermal = zone.volume * 1.2 * 1000; // Heat capacity (J/K) const R_thermal = zone.r_value * zone.envelope_area; // Thermal resistance // Internal heat gains (occupancy, equipment, lighting) const Q_internal = zone.occupancy * 100 + // 100W per person zone.equipment_load + zone.lighting_load; // Cooling/heating from HVAC const airflow_kg_s = zone.vav.airflow_cfm * 0.0004719; // CFM to kg/s const c_p_air = 1006; // Specific heat of air (J/kg·K) const Q_hvac = airflow_kg_s * c_p_air * (zone.vav.supply_temp - zone.temperature); // Envelope losses const Q_envelope = (zone.outdoor_temp - zone.temperature) / R_thermal; // Net energy balance const Q_net = Q_internal + Q_hvac + Q_envelope; // Temperature change: Q = C * dT/dt const dT = (Q_net / C_thermal) * delta_t_seconds; return zone.temperature + dT; } } This model captures essential thermal dynamics while remaining computationally fast enough for real-time simulation. It accounts for internal heat generation from occupants and equipment, HVAC cooling/heating contributions, and heat loss through the building envelope. The CO₂ model uses mass balance equations: class AirQualityModel { static calculateCO2Change(zone, delta_t_seconds) { // CO₂ generation from occupants const G_co2 = zone.occupancy * 0.0052; // L/s per person at rest // Outdoor air ventilation rate const V_oa = zone.vav.outdoor_air_cfm * 0.000471947; // CFM to m³/s // CO₂ concentration difference (indoor - outdoor) const delta_CO2 = zone.co2_ppm - 400; // Outdoor ~400ppm // Mass balance: dC/dt = (G - V*ΔC) / Volume const dCO2_dt = (G_co2 - V_oa * delta_CO2) / zone.volume; return zone.co2_ppm + (dCO2_dt * delta_t_seconds); } } These models execute every simulation step, updating the entire building state: async function simulateStep(twin, timestep_minutes) { const delta_t = timestep_minutes * 60; // Convert to seconds // Update each zone for (const zone of twin.zones) { zone.temperature = ZoneThermalModel.calculateTemperatureChange(zone, delta_t); zone.co2_ppm = AirQualityModel.calculateCO2Change(zone, delta_t); } // Update equipment based on zone demands for (const vav of twin.vavs) { updateVAVOperation(vav, twin.zones); } for (const ahu of twin.ahus) { updateAHUOperation(ahu, twin.vavs); } updateChillerOperation(twin.chiller, twin.ahus); updateBoilerOperation(twin.boiler, twin.ahus); // Calculate system KPIs twin.kpis = calculateSystemKPIs(twin); // Detect alerts twin.alerts = detectAnomalies(twin); // Persist updated state await saveTwinState(twin); return twin; } 3D Visualization with React and Three.js The frontend renders an interactive 3D building view that updates in real-time as conditions change. Using React Three Fiber simplifies Three.js integration with React's component model: // frontend/src/components/BuildingView3D.jsx import { Canvas } from '@react-three/fiber'; import { OrbitControls } from '@react-three/drei'; export function BuildingView3D({ twinState }) { return ( {/* Render building floors */} {twinState.zones.map(zone => ( selectZone(zone.id)} /> ))} {/* Render equipment */} {twinState.ahus.map(ahu => ( ))} ); } function ZoneMesh({ zone, onClick }) { const color = getTemperatureColor(zone.temperature, zone.setpoint); return ( ); } function getTemperatureColor(current, setpoint) { const deviation = current - setpoint; if (Math.abs(deviation) < 1) return '#00ff00'; // Green: comfortable if (Math.abs(deviation) < 3) return '#ffff00'; // Yellow: acceptable return '#ff0000'; // Red: uncomfortable } This visualization immediately shows building state at a glance, operators see "hot spots" in red, comfortable zones in green, and can click any area for detailed metrics. Integrating AI Copilot for Natural Language Control The AI copilot transforms building data into conversational insights. Instead of navigating multiple screens, operators simply ask questions: // backend/src/routes/copilot.js import { FoundryLocalClient } from 'foundry-local-sdk'; const foundry = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT }); router.post('/api/copilot/chat', async (req, res) => { const { message } = req.body; // Load current building state const twin = await loadTwinState(); // Build context for AI const context = buildBuildingContext(twin); const completion = await foundry.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: `You are an HVAC operations assistant for a 3-floor office building. Current Building State: ${context} Answer questions about equipment status, comfort conditions, and energy usage. Provide specific, actionable information based on the current data. Do not speculate beyond provided information.` }, { role: 'user', content: message } ], temperature: 0.3, max_tokens: 300 }); res.json({ response: completion.choices[0].message.content, model: 'phi-4', timestamp: new Date().toISOString() }); }); function buildBuildingContext(twin) { const alerts = twin.alerts.filter(a => a.active); const zoneStatus = twin.zones.map(z => `${z.name}: ${z.temperature.toFixed(1)}°F (setpoint: ${z.setpoint}°F), CO₂: ${z.co2_ppm}ppm, ${z.occupancy} people` ); return ` Zones: ${zoneStatus.join('\n')} Active Alerts: ${alerts.length} ${alerts.map(a => `- ${a.severity}: ${a.message}`).join('\n')} Energy: ${twin.kpis.total_power_kw.toFixed(1)} kW Chiller: ${twin.chiller.percent_load}% load, ${twin.chiller.cop.toFixed(2)} COP `.trim(); } Example interactions: User: "Why is the third floor conference room warm?" AI: "The Executive Suite on Floor 3 is currently 75.2°F, 3 degrees above the 72°F setpoint. This is due to high occupancy (8 people) and VAV-F3-EXEC operating at reduced airflow (285 CFM vs 450 CFM capacity). AHU-2 is supplying adequate cool air at 55°F. Recommend checking VAV damper position—it may be partially stuck." User: "What's our current energy usage?" AI: "Total building power consumption is 47.3 kW. Breakdown: Chiller (24.1 kW, 51%), AHU fans (12.8 kW, 27%), Pumps (7.2 kW, 15%), Other (3.2 kW, 7%). Chiller is operating at 62% load with a COP of 4.8, which is efficient for current conditions." User: "Activate demand response level 2" AI: "I cannot directly control building systems for safety reasons. To activate DR Level 2, use the Control Panel and click 'Demand Response' → 'Level 2'. This will raise cooling setpoints by 3°F and reduce auxiliary loads, targeting 15% energy reduction." The AI provides grounded, specific answers citing actual equipment IDs and metrics. It refuses to directly execute control commands, instead guiding operators to explicit control interfaces, a critical safety pattern for building systems. Fault Injection for Testing and Training Real building operations experience equipment failures, stuck dampers, sensor drift, communication losses. The digital twin includes comprehensive fault injection capabilities to train operators and test control logic: // backend/src/simulator/fault-injector.js const FAULT_CATALOG = { chillerFailure: { description: 'Chiller compressor failure', apply: (twin) => { twin.chiller.status = 'FAULT'; twin.chiller.cooling_output = 0; twin.alerts.push({ id: 'chiller-fault', severity: 'CRITICAL', message: 'Chiller compressor failure - no cooling available', equipment: 'CHILLER-01' }); } }, stuckVAVDamper: { description: 'VAV damper stuck at current position', apply: (twin, vavId) => { const vav = twin.vavs.find(v => v.id === vavId); vav.damper_stuck = true; vav.damper_position_fixed = vav.damper_position; twin.alerts.push({ id: `vav-stuck-${vavId}`, severity: 'HIGH', message: `VAV ${vavId} damper stuck at ${vav.damper_position}%`, equipment: vavId }); } }, sensorDrift: { description: 'Temperature sensor reading 5°F high', apply: (twin, zoneId) => { const zone = twin.zones.find(z => z.id === zoneId); zone.sensor_drift = 5.0; zone.temperature_measured = zone.temperature_actual + 5.0; } }, communicationLoss: { description: 'Equipment communication timeout', apply: (twin, equipmentId) => { const equipment = findEquipmentById(twin, equipmentId); equipment.comm_status = 'OFFLINE'; equipment.stale_data = true; twin.alerts.push({ id: `comm-loss-${equipmentId}`, severity: 'MEDIUM', message: `Lost communication with ${equipmentId}`, equipment: equipmentId }); } } }; router.post('/api/twin/fault', async (req, res) => { const { faultType, targetEquipment } = req.body; const twin = await loadTwinState(); const fault = FAULT_CATALOG[faultType]; if (!fault) { return res.status(400).json({ error: 'Unknown fault type' }); } fault.apply(twin, targetEquipment); await saveTwinState(twin); res.json({ message: `Applied fault: ${fault.description}`, affectedEquipment: targetEquipment, timestamp: new Date().toISOString() }); }); Operators can inject faults to practice diagnosis and response. Training scenarios might include: "The chiller just failed during a heat wave, how do you maintain comfort?" or "Multiple VAV dampers are stuck, which zones need immediate attention?" Key Takeaways and Production Deployment Building a physics-based digital twin with AI capabilities requires balancing simulation accuracy with computational performance, providing intuitive visualization while maintaining technical depth, and enabling AI assistance without compromising safety. Key architectural lessons: Physics models enable prediction: Comparing predicted vs observed behavior identifies anomalies that simple thresholds miss 3D visualization improves spatial understanding: Operators immediately see which floors or zones need attention AI copilots accelerate diagnosis: Natural language queries get answers in seconds vs. minutes of manual data examination Fault injection validates readiness: Testing failure scenarios prepares operators for real incidents JSON state enables integration: Simple file-based state makes connecting to real BMS systems straightforward For production deployment, connect the twin to actual building systems via BACnet, Modbus, or MQTT integrations. Replace simulated telemetry with real sensor streams. Calibrate model parameters against historical building performance. Implement continuous learning where the twin's predictions improve as it observes actual building behavior. The complete implementation with simulation engine, 3D visualization, AI copilot, and fault injection system is available at github.com/leestott/DigitalTwin. Clone the repository and run the startup scripts to explore the digital twin, no building hardware required. Resources and Further Reading Smart Building HVAC Digital Twin Repository - Complete source code and simulation engine Setup and Quick Start Guide - Installation instructions and usage examples Microsoft Foundry Local Documentation - AI integration reference HVAC Simulation Documentation - Physics model details and calibration Three.js Documentation - 3D visualization framework ASHRAE Standards - Building energy modeling standardsAI‑Powered Troubleshooting for Microsoft Purview Data Lifecycle Management
Announcing the DLM Diagnostics MCP Server! Microsoft Purview Data Lifecycle Management (DLM) policies are critical for meeting compliance and governance requirements across Microsoft 365 workloads. However, when something goes wrong – such as retention policies not applying, archive mailboxes not expanding, or inactive mailboxes not getting purged – diagnosing the issue can be challenging and time‑consuming. To simplify and accelerate this process, we are excited to announce the open‑source release of the DLM Diagnostics Model Context Protocol (MCP) Server, an AI‑powered diagnostic server that allows AI assistants to safely investigate Microsoft Purview DLM issues using read‑only PowerShell diagnostics. GitHub repository: https://github.com/microsoft/purview-dlm-mcp The troubleshooting challenge When you notice issues such as: “Retention policy shows Success, but content isn’t being deleted” “Archiving is enabled, but items never move to the archive mailbox” The investigation typically involves: Connecting to Exchange Online and Security & Compliance PowerShell sessions Running 5–15 diagnostic cmdlets in a specific order Interpreting command output using multiple troubleshooting reference guides (TSGs) Correlating policy distribution, holds, archive configuration, and workload behavior Producing a root‑cause summary and recommended remediation steps This workflow requires deep familiarity with DLM internals and is largely manual. Introducing the DLM Diagnostics MCP Server The DLM Diagnostics MCP Server automates this diagnostic workflow by allowing AI assistants – such as GitHub Copilot, Claude Desktop, and other MCP‑compatible clients – to investigate DLM issues step by step. An administrator simply describes the symptom in natural language. The AI assistant then: Executes read‑only PowerShell diagnostics Evaluates results against known troubleshooting patterns Identifies likely root causes Presents recommended remediation steps (never executed automatically) Produces a complete audit trail of the investigation All diagnostics are performed under a strict security model to ensure safety and auditability. What is the Model Context Protocol (MCP)? The Model Context Protocol (MCP) is an open standard that enables AI assistants to interact with external tools and data sources in a secure and structured way. You can think of MCP as a “USB port for AI”: Any MCP‑compatible client can connect to an MCP server The server exposes well‑defined tools The AI can use those tools safely and deterministically The DLM Diagnostics MCP Server exposes Purview DLM diagnostics as MCP tools, enabling AI assistants to run PowerShell diagnostics, retrieve execution logs, and surface Microsoft Learn documentation. More information: https://modelcontextprotocol.io Diagnostic tools exposed by the server The server exposes four MCP tools. 1. Run read‑only PowerShell diagnostics This tool executes PowerShell commands against Exchange Online and Security & Compliance sessions using a strict allow list. Only read‑only cmdlets are permitted: Allowed verbs: Get-*, Test-*, Export-* Blocked verbs: Set-*, New-*, Remove-*, Enable-*, Invoke-*, and others Every command is validated before execution. Example: Archive mailbox not working Admin: “Archiving is not working for john.doe@contoso.com” The AI follows the archive troubleshooting guide: 1 Step 1 – Check archive mailbox status 2 Get-Mailbox -Identity john.doe@contoso.com | 3 Format-List ArchiveStatus, ArchiveState 4 5 Step 2 – Check archive mailbox size 6 Get-MailboxStatistics -Identity john.doe@contoso.com -Archive | 7 Format-List TotalItemSize, ItemCount 8 9 Step 3 – Check auto-expanding archive 10 Get-Mailbox -Identity john.doe@contoso.com | 11 Format-List AutoExpandingArchiveEnabled Finding The archive mailbox is not enabled. Recommended action (not executed automatically): 1 Enable-Mailbox <user mailbox> –Archive All remediation steps are presented as text only for administrator review. 2. Retrieve the execution log Every diagnostic session is fully logged, including: Command executed Timestamp Duration Status Output Admins can retrieve the complete investigation as a Markdown‑formatted audit trail, making it easy to attach to incident records or compliance documentation. 3. Microsoft Learn documentation lookup If a question does not match a diagnostic scenario – such as “How do I create a retention policy?” – the server falls back to curated Microsoft Learn documentation. The documentation lookup covers 11 Purview areas, including: Retention policies and labels Archive and inactive mailboxes eDiscovery Audit Communication compliance Records management Adaptive scopes 4. Create a GitHub issue (create_issue) create_issue lets the assistant open a feature request in the project’s GitHub repo and attach key session details (such as the commands run and any failures) to help maintainers reproduce and prioritize the request. Example: File a feature request from a failed diagnostic ✅ Created GitHub issue #42 Title: Allowlist should allow Get-ComplianceTag cmdlet Category: feature request Labels: enhancement URL: https://github.com/microsoft/purview-dlm-mcp/issues/42 Session context included: 3 commands executed, 1 failure Security and safety model Security is enforced at multiple layers: Read‑only allow list: Only approved diagnostic cmdlets can run No stored credentials: Authentication uses MSAL interactive sign‑in Session isolation: Each server instance runs in its own PowerShell process Full audit trail: Every command and result is logged No automatic remediation: Fixes are never executed by the server This design ensures diagnostics are safe to run even in sensitive compliance environments. Supported diagnostic scenarios The server currently includes 12 troubleshooting reference guides, covering common DLM issues such as: Retention policy shows Success but content is not retained or deleted Policy status shows Error or PolicySyncTimeout Items do not move to archive mailbox Auto‑expanding archive not triggering Inactive mailbox creation failures SubstrateHolds and Recoverable Items growth Teams messages not deleting Conflicts between MRM and Purview retention Adaptive scope misconfiguration Auto‑apply label failures SharePoint site deletion blocked by retention Unified Audit Configuration validation Each guide maps symptoms to diagnostic checks and remediation guidance. Getting started Prerequisites Node.js 18 or later PowerShell 7 ExchangeOnlineManagement module (v3.4+) Exchange Online administrator permissions Required permissions Option Roles Notes Least-privilege Global Reader + Compliance Administrator Recommended, covers both EXO and S&C read access. Single role group Organization Management Covers both workloads but broader than necessary. Full admin Global Administrator Works but overly broad, not recommended. Exchange Online (Connect-ExchangeOnline): cmdlets like Get-Mailbox, Get-MailboxStatistics, Export-MailboxDiagnosticLogs, Get-OrganizationConfig Security & Compliance (Connect-IPPSSession): cmdlets like Get-RetentionCompliancePolicy, Get-RetentionComplianceRule, Get-AdaptiveScope, Get-ComplianceTag Exchange cmdlets require EXO roles; compliance cmdlets require S&C roles. Without both, some diagnostics will fail with permission errors. Why both workloads? The server connects to two PowerShell sessions: The authenticating user (DLM_UPN) needs read access to both Exchange Online and Security & Compliance PowerShell sessions. MCP client configuration The server can be connected to IDE like Claude Desktop or Visual Studio Code (GitHub Copilot) using MCP configuration. Include this configuration in your MCP config JSON file (for VS Code, use .vscode/mcp.json; for Claude Desktop, use claude_desktop_config.json) { "mcpServers": { "dlm-diagnostics": { "command": "npx", "args": [ "-y", "@microsoft/purview-dlm-mcp" ], "env": { "DLM_UPN": "admin@yourtenant.onmicrosoft.com", "DLM_ORGANIZATION": "yourtenant.onmicrosoft.com", "DLM_COMMAND_TIMEOUT_MS": "180000" } } } } Summary The DLM Diagnostics MCP Server brings AI‑assisted, auditable, and safe troubleshooting to Microsoft Purview Data Lifecycle Management. By combining structured troubleshooting guides with read‑only PowerShell diagnostics and MCP, it significantly reduces the time and expertise required to diagnose complex DLM issues. We invite you to try it out, provide feedback, and contribute to the project via GitHub. GitHub repository: https://github.com/microsoft/purview-dlm-mcp Rishabh Kumar, Victor Legat & Purview Data Lifecycle Management Team1.6KViews2likes0CommentsMicrosoft Agent Framework, Microsoft Foundry, MCP, Aspire를 활용한 실전 예제 만들기
AI 에이전트를 개발하는 것은 점점 쉬워지고 있습니다. 하지만 여러 서비스, 상태 관리, 프로덕션 인프라를 갖춘 실제 애플리케이션의 일부로 배포하는 것은 여전히 복잡합니다. 실제로 .NET 개발자 커뮤니티에서는 로컬 머신과 클라우드 네이티브 방식의 클라우드 환경 모두에서 실제로 동작하는 실전 예제에 대한 요구가 많았습니다. 그래서 준비했습니다! Microsoft Agent Framework과 Microsoft Foundry, MCP(Model Context Protocol), Aspire등을 어떻게 프로덕션 상황에서 조합할 수 있는지를 보여주는 오픈소스 Interview Coach 샘플입니다. AI 코치가 인성 면접 질문과 기술 면접 질문을 안내한 후, 요약을 제공하는 효율적인 면접 시뮬레이터입니다. 이 포스트에서는 어떤 패턴을 사용했고 해당 패턴이 해결할 수 있는 문제를 다룹니다. Interview Coach 데모 앱을 방문해 보세요. 왜 Microsoft Agent Framework을 써야 하나요? .NET으로 AI 에이전트를 구축해 본 적이 있다면, Semantic Kernel이나 AutoGen, 또는 두 가지 모두를 사용해 본 적이 있을 겁니다. Microsoft Agent Framework는 그 다음 단계로서, 각각의 프로젝트에서 효과적이었던 부분을 하나의 프레임워크로 통합했습니다. AutoGen의 에이전트 추상화와 Semantic Kernel의 엔터프라이즈 기능(상태 관리, 타입 안전성, 미들웨어, 텔레메트리 등)을 하나로 통합했습니다. 또한 멀티 에이전트 오케스트레이션을 위한 그래프 기반 워크플로우도 추가했습니다. 그렇다면 .NET 개발자에게 이것이 어떤 의미로 다가올까요? 하나의 프레임워크. Semantic Kernel과 AutoGen 사이에서 더 이상 고민할 필요가 없습니다. 익숙한 패턴. 에이전트는 의존성 주입, IChatClient , 그리고 ASP.NET 앱과 동일한 호스팅 모델을 사용합니다. 프로덕션을 위한 설계. OpenTelemetry, 미들웨어 파이프라인, Aspire 통합이 포함되어 있습니다. 멀티 에이전트 오케스트레이션. 순차 실행, 동시 실행, 핸드오프 패턴, 그룹 채팅 등 다양한 멀티 에이전트 오케스트레이션 패턴을 지원합니다. Interview Coach는 이 모든 것을 Hello World가 아닌 실제 애플리케이션에 적용합니다. 왜 Microsoft Foundry를 써야 하나요? AI 에이전트에는 모델 말고도 더 많은 무언가가 필요합니다. 우선 인프라가 필요하겠죠. Microsoft Foundry는 AI 애플리케이션을 구축하고 관리하기 위한 Azure 플랫폼이며, Microsoft Agent Framework의 권장 백엔드입니다. Foundry는 자체 포털에서 아래와 같은 내용을 제공합니다: 모델 액세스. OpenAI, Meta, Mistral 등의 모델 카탈로그를 하나의 엔드포인트로 제공합니다. 콘텐츠 세이프티. 에이전트가 벗어나지 않도록 기본으로 제공하는 콘텐츠 조정 및 PII 감지 기능이 있습니다. 비용 최적화 라우팅. 에이전트의 요청을 자동으로 최적의 모델로 라우팅합니다. 평가 및 파인튜닝. 에이전트 품질을 측정하고 시간이 지남에 따라 개선할 수 있습니다. 엔터프라이즈 거버넌스. Entra ID와 Microsoft Defender를 통한 ID, 액세스 제어, 규정 준수를 지원합니다. Interview Coach에서 Foundry는 에이전트를 구동하는 모델 엔드포인트를 제공합니다. 에이전트 코드가 IChatClient 인터페이스를 사용하기 때문에, Foundry는 LLM 선택을 위한 설정에 불과할 수도 있겠지만, 에이전트가 필요로 하는 가장 많은 도구를 기본적으로 제공하는 선택지입니다. Interview Coach는 무엇을 하나요? Interview Coach는 모의 면접을 진행하는 대화형 AI입니다. 이력서와 채용 공고를 제공하면, 에이전트가 나머지를 처리합니다: 접수. 이력서와 목표 직무 설명을 수집합니다. 행동 면접. 경험에 맞춘 STAR 기법 질문을 합니다. 기술 면접. 직무별 기술 질문을 합니다. 요약. 구체적인 피드백과 함께 성과 리뷰를 생성합니다. Blazor 웹 UI를 통해 실시간으로 응답 스트리밍을 제공하며 사용자와 에이전트간 상호작용합니다. 아키텍처 개요 애플리케이션은 Aspire를 통해 다양한 서비스를 오케스트레이션합니다: LLM 제공자. 다양한 모델 액세스를 위한 Microsoft Foundry (권장). WebUI. 면접 대화를 위한 Blazor 채팅 인터페이스. 에이전트. Microsoft Agent Framework로 구축된 면접 로직. MarkItDown MCP 서버. Microsoft의 MarkItDown을 통해 이력서(PDF, DOCX)를 마크다운으로 변환합니다. InterviewData MCP 서버. SQLite에 세션을 저장하는 .NET MCP 서버. Aspire가 서비스 디스커버리, 상태 확인, 텔레메트리를 처리합니다. 각 컴포넌트는 별도의 프로세스로 실행시키며, 하나의 커맨드 만으로 전체를 시작할 수 있습니다. 패턴 1: 멀티 에이전트 핸드오프 이 샘플에서 가장 흥미로운 부분이기도 한 핸드오프 패턴으로 멀티 에이전트 시나리오를 구성했습니다. 하나의 에이전트가 모든 것을 처리하는 대신, 면접은 다섯 개의 전문 에이전트로 나뉩니다: 에이전트 역할 도구 Triage 메시지를 적절한 전문가에게 라우팅 없음 (순수 라우팅) Receptionist 세션 생성, 이력서 및 채용 공고 수집 MarkItDown + InterviewData Behavioral Interviewer STAR 기법을 활용한 행동 면접 질문 진행 InterviewData Technical Interviewer 직무별 기술 질문 진행 InterviewData Summarizer 최종 면접 요약 생성 InterviewData 핸드오프 패턴에서는 하나의 에이전트가 대화의 전체 제어권을 다음 에이전트에게 넘깁니다. 그러면 넘겨 받는 에이전트가 모든 제어권을 인수합니다. 이는 주 에이전트가 다른 에이전트를 도우미로 호출하면서도 제어권을 유지하는 "agent-as-tools(도구로서의 에이전트)" 방식과는 다릅니다. 핸드오프 워크플로우를 어떻게 구성하는지 살펴보시죠: var workflow = AgentWorkflowBuilder .CreateHandoffBuilderWith(triageAgent) .WithHandoffs(triageAgent, [receptionistAgent, behaviouralAgent, technicalAgent, summariserAgent]) .WithHandoffs(receptionistAgent, [behaviouralAgent, triageAgent]) .WithHandoffs(behaviouralAgent, [technicalAgent, triageAgent]) .WithHandoffs(technicalAgent, [summariserAgent, triageAgent]) .WithHandoff(summariserAgent, triageAgent) .Build(); 면접 상황을 상상해 본다면 기본적으로 순차적인 방식으로 진행합니다: Receptionist → Behavioral → Technical → Summarizer. 각 전문가가 직접 다음으로 핸드오프합니다. 예상치 못한 상황이 발생하면, 에이전트는 재라우팅을 위해 Triage로 돌아갑니다. 이 샘플에는 더 간단한 배포를 위한 단일 에이전트 모드도 포함하고 있어, 두 가지 접근 방식을 나란히 비교할 수 있습니다. 패턴 2: 도구 통합을 위한 MCP 이 프로젝트에서 도구는 에이전트 내부에 구현하는 대신 MCP(Model Context Protocol) 서버를 통해 통합합니다. 동일한 MarkItDown 서버가 완전히 다른 에이전트 프로젝트에서도 쓰일 수 있으며, 도구 개발팀은 에이전트 개발팀과 독립적으로 배포할 수 있습니다. MCP는 또한 언어에 구애받지 않으므로, 이 샘플 앱에서 쓰인 MarkItDown은 Python 기반의 서버이고, 에이전트는 .NET 기반으로 동작합니다. 에이전트는 시작 시 MCP 클라이언트를 통해 도구를 발견하고, 적절한 에이전트에게 전달합니다: var receptionistAgent = new ChatClientAgent( chatClient: chatClient, name: "receptionist", instructions: "You are the Receptionist. Set up sessions and collect documents...", tools: [.. markitdownTools, .. interviewDataTools]); 각 에이전트는 필요한 도구만 받습니다. Triage는 도구를 받지 않고(라우팅만 수행), 면접관은 세션 액세스를, Receptionist는 문서 파싱과 세션 액세스를 받습니다. 이는 최소 권한 원칙을 따릅니다. 패턴 3: Aspire 오케스트레이션 Aspire가 모든 것을 하나로 연결합니다. 앱 호스트는 서비스 토폴로지를 정의합니다: 어떤 서비스가 존재하는지, 서로 어떻게 의존하는지, 어떤 구성을 받는지. 다음을 제공합니다: 서비스 디스커버리. 서비스가 하드코딩된 URL이 아닌 이름으로 서로를 찾습니다. 상태 확인. Aspire 대시보드에서 모든 컴포넌트의 상태를 보여줍니다. 분산 추적. 공유 서비스 기본값을 통해 OpenTelemetry가 연결됩니다. 단일 커맨드 시작. aspire run --file ./apphost.cs 로 모든 것을 시작합니다. 배포 시, azd up 으로 전체 애플리케이션을 Azure Container Apps에 푸시합니다. 시작하기 사전 요구 사항 .NET 10 SDK 이상 Azure 구독 Microsoft Foundry 프로젝트 Docker Desktop 또는 기타 컨테이너 런타임 로컬에서 실행하기 git clone https://github.com/Azure-Samples/interview-coach-agent-framework.git cd interview-coach-agent-framework # 자격 증명 구성 dotnet user-secrets --file ./apphost.cs set MicrosoftFoundry:Project:Endpoint "<your-endpoint>" dotnet user-secrets --file ./apphost.cs set MicrosoftFoundry:Project:ApiKey "<your-key>" # 모든 서비스 시작 aspire run --file ./apphost.cs Aspire 대시보드를 열고, 모든 서비스가 Running으로 표시될 때까지 기다린 후, WebUI 엔드포인트를 클릭하여 모의 면접을 시작하세요. 핸드오프 패턴이 어떻게 동작하는지 DevUI에서 시각화한 모습입니다. 이 채팅 UI를 사용하여 면접 후보자로서 에이전트와 상호작용할 수 있습니다. Azure에 배포하기 azd auth login azd up 배포를 위해서는 이게 사실상 전부입니다! Aspire와 azd 가 나머지를 처리합니다. 배포와 테스트를 완료한 후, 다음 명령어를 실행하여 모든 리소스를 안전하게 삭제할 수 있습니다: azd down --force --purge 이 샘플에서 배울 수 있는 것 Interview Coach를 통해 다음을 경험하게 됩니다: Microsoft Foundry를 모델 백엔드로 사용하기 Microsoft Agent Framework로 단일 에이전트 및 멀티 에이전트 시스템 구축하기 핸드오프 오케스트레이션으로 전문 에이전트 간 워크플로우 분할하기 에이전트 코드와 독립적으로 MCP 도구 서버 생성 및 사용하기 Aspire로 멀티 서비스 애플리케이션 오케스트레이션하기 일관되고 구조화된 동작을 생성하는 프롬프트 작성하기 azd up 으로 모든 것 배포하기 사용해 보세요 전체 소스 코드는 GitHub에 있습니다: Azure-Samples/interview-coach-agent-framework Microsoft Agent Framework가 처음이라면, 프레임워크 문서와 Hello World 샘플부터 시작하세요. 그런 다음 여기로 돌아와서 더 큰 프로젝트에서 각 부분이 어떻게 결합되는지 확인하세요. 이러한 패턴으로 무언가를 만들었다면, 이슈를 열어 알려주세요. 다음 계획 다음과 같은 통합 시나리오를 현재 작업 중입니다. 작업이 끝나는 대로 이 샘플 앱을 업데이트 하도록 하겠습니다. Microsoft Foundry Agent Service GitHub Copilot A2A 참고 자료 Microsoft Agent Framework 문서 Microsoft Agent Framework 프리뷰 소개 Microsoft Agent Framework, 릴리스 후보 도달 Microsoft Foundry 문서 Microsoft Foundry Agent Service Microsoft Foundry 포털 Microsoft.Extensions.AI Model Context Protocol 사양 Aspire 문서 ASP.NET BlazorMCP vs mcp-cli: Dynamic Tool Discovery for Token-Efficient AI Agents
Introduction The AI agent ecosystem is evolving rapidly, and with it comes a scaling challenge that many developers are hitting context window bloat. When building systems that integrate with multiple MCP (Model Context Protocol) servers, you're forced to load all tool definitions upfront—consuming thousands of tokens just to describe what tools could be available. mcp-cli: a lightweight tool that changes how we interact with MCP servers. But before diving into mcp-cli, it's essential to understand the foundational protocol itself, the design trade-offs between static and dynamic approaches, and how they differ fundamentally. Part 1: Understanding MCP (Model Context Protocol) What is MCP? The Model Context Protocol (MCP) is an open standard for connecting AI agents and applications to external tools, APIs, and data sources. Think of it as a universal interface that allows: AI Agents (Claude, Gemini, etc.) to discover and call tools Tool Providers to expose capabilities in a standardized way Seamless Integration between diverse systems without custom adapters New to MCP see https://aka.ms/mcp-for-beginners How MCP Works MCP operates on a simple premise: define tools with clear schemas and let clients discover and invoke them. Basic MCP Flow: Tool Provider (MCP Server) ↓ [Tool Definitions + Schemas] ↓ AI Agent / Client ↓ [Discover Tools] → [Invoke Tools] → [Get Results] Example: A GitHub MCP server exposes tools like: search_repositories - Search GitHub repos create_issue - Create a GitHub issue list_pull_requests - List open PRs Each tool comes with a JSON schema describing its parameters, types, and requirements. The Static Integration Problem Traditionally, MCP integration works like this: Startup: Load ALL tool definitions from all servers Context Window: Send every tool schema to the AI model Invocation: Model chooses which tool to call Execution: Tool is invoked and result returned The Problem: When you have multiple MCP servers, the token cost becomes substantial: Scenario Token Count 6 MCP Servers, 60 tools (static loading) ~47,000 tokens After dynamic discovery ~400 tokens Token Reduction 99% 🚀 For a production system with 10+ servers exposing 100+ tools, you're burning through thousands of tokens just describing capabilities, leaving less context for actual reasoning and problem-solving. Key Issues: ❌ Reduced effective context length for actual work ❌ More frequent context compactions ❌ Hard limits on simultaneous MCP servers ❌ Higher API costs Part 2: Enter mcp-cli – Dynamic Context Discovery What is mcp-cli? mcp-cli is a lightweight CLI tool (written in Bun, compiled to a single binary) that implements dynamic context discovery for MCP servers. Instead of loading everything upfront, it pulls in information only when needed. Static vs. Dynamic: The Paradigm Shift Traditional MCP (Static Context): AI Agent Says: "Load all tool definitions from all servers" ↓ Context Window Bloat ❌ ↓ Limited space for reasoning mcp-cli (Dynamic Discovery): AI Agent Says: "What servers exist?" ↓ mcp-cli responds AI Agent Says: "What are the params for tool X?" ↓ mcp-cli responds AI Agent Says: "Execute tool X" mcp-cli executes and responds Result: You only pay for information you actually use. ✅ Core Capabilities mcp-cli provides three primary commands: 1. Discover - What servers and tools exist? mcp-cli Lists all configured MCP servers and their tools. 2. Inspect - What does a specific tool do? mcp-cli info <server> <tool> Returns the full JSON schema for a tool (parameters, descriptions, types). 3. Execute - Run a tool mcp-cli call <server> <tool> '{"arg": "value"}' Executes the tool and returns results. Key Features of mcp-cli Feature Benefit Stdio & HTTP Support Works with both local and remote MCP servers Connection Pooling Lazy-spawn daemon avoids repeated startup overhead Tool Filtering Control which tools are available via allowedTools/disabledTools Glob Searching Find tools matching patterns: mcp-cli grep "*mail*" AI Agent Ready Designed for use in system instructions and agent skills Lightweight Single binary, minimal dependencies Part 3: Detailed Comparison Table Aspect Traditional MCP mcp-cli Protocol HTTP/REST or Stdio Stdio/HTTP (via CLI) Context Loading Static (upfront) Dynamic (on-demand) Tool Discovery All at once Lazy enumeration Schema Inspection Pre-loaded On-request Token Usage High (~47k for 60 tools) Low (~400 for 60 tools) Best For Direct server integration AI agent tool use Implementation Server-side focus CLI-side focus Complexity Medium Low (CLI handles it) Startup Time One call Multiple calls (optimized) Scaling Limited by context Unlimited (pay per use) Integration Custom implementation Pre-built mcp-cli Part 4: When to Use Each Approach Use Traditional MCP (HTTP Endpoints) when: ✅ Building a direct server integration ✅ You have few tools (< 10) and don't care about context waste ✅ You need full control over HTTP requests/responses ✅ You're building a specialized integration (not AI agents) ✅ Real-time synchronous calls are required Use mcp-cli when: ✅ Integrating with AI agents (Claude, Gemini, etc.) ✅ You have multiple MCP servers (> 2-3) ✅ Token efficiency is critical ✅ You want a standardized, battle-tested tool ✅ You prefer CLI-based automation ✅ Connection pooling and lazy loading are beneficial ✅ You're building agent skills or system instructions Conclusion MCP (Model Context Protocol) defines the standard for tool sharing and discovery. mcp-cli is the practical tool that makes MCP efficient for AI agents by implementing dynamic context discovery. The fundamental difference: MCP mcp-cli What The protocol standard The CLI tool Where Both server and client Client-side CLI Problem Solved Tool standardization Context bloat Architecture Protocol Implementation Think of it this way: MCP is the language, mcp-cli is the interpreter that speaks fluently. For AI agent systems, dynamic discovery via mcp-cli is becoming the standard. For direct integrations, traditional MCP HTTP endpoints work fine. The choice depends on your use case, but increasingly, the industry is trending toward mcp-cli for its efficiency and scalability. Resources MCP Specification mcp-cli GitHub New to MCP see https://aka.ms/mcp-for-beginners Practical demo: AnveshMS/mcp-cli-example1.1KViews0likes0CommentsGiving Your AI Agents Reliable Skills with the Agent Skills SDK
AI agents are becoming increasingly capable, but they often do not have the context they need to do real work reliably. Your agent can reason well, but it does not actually know how to do the specific things your team needs it to do. For example, it cannot follow your company's incident response playbook, it does not know your escalation policy, and it has no idea how to page the on-call engineer at 3 AM. There are many ways to close this gap, from RAG to custom tool implementations. Agent Skills is one approach that stands out because it is designed around portability and progressive disclosure, keeping context window usage minimal while giving agents access to deep expertise on demand. What is Agent Skills? Agent Skills is an open format for giving agents new capabilities and expertise. The format was originally developed by Anthropic and released as an open standard. It is now supported by a growing list of agent products including Claude Code, VS Code, GitHub, OpenAI Codex, Cursor, Gemini CLI, and many others. As defined in the spec, a skill is a folder on disk containing a SKILL.md file with metadata and instructions, plus optional scripts, references, and assets: incident-response/ SKILL.md # Required: instructions + metadata references/ # Optional: additional documentation severity-levels.md escalation-policy.md scripts/ # Optional: executable code page-oncall.sh assets/ # Optional: templates, diagrams, data files The SKILL.md file has YAML frontmatter with a name and description (so agents know when the skill is relevant), followed by markdown instructions that tell the agent how to perform the task. The format is intentionally simple: self-documenting, extensible, and portable. What makes this design practical is progressive disclosure. The spec is built around the idea that agents should not load everything at once. It works in three stages: Discovery: At startup, agents load only the name and description of each available skill, just enough to know when it might be relevant. Activation: When a task matches a skill's description, the agent reads the full SKILL.md instructions into context. Execution: The agent follows the instructions, optionally loading referenced files or executing bundled scripts as needed. This keeps agents fast while giving them access to deep context on demand. The format is well-designed and widely adopted, but if you want to use skills from your own agents, there is a gap between the spec and a working implementation. The Agent Skills SDK Conceptually, a skill is more than a folder. It is a unit of expertise: a name, a description, a body of instructions, and a set of supporting resources. The file layout is one way to represent that, but there is nothing about the concept that requires a filesystem. The Agent Skills SDK is an open-source Python library built around that idea, treating skills as abstract units of expertise that can be stored anywhere and consumed by any agent framework. It does this by addressing two challenges that come up when you try to use the format from your own agents. The first is where skills live. The spec defines skills as folders on disk, and the tools that support the format today all assume skills are local files. Files are inherently portable, and that is one of the format's strengths. But in the real world, not every team can or wants to serve skills from the filesystem. Maybe your team keeps them in an S3 bucket. Maybe they are in Azure Blob Storage behind your CDN. Maybe they live in a database alongside the rest of your application data. At the moment, if your skills are not on the local filesystem, you are on your own. The SDK changes where skills are served from, not how they are authored. The content and format stay the same regardless of the storage backend, so skills remain portable across providers. The second is how agents consume them. The spec defines the progressive disclosure pattern but actually implementing it in your agent requires real work. You need to figure out how to validate skills against the spec, generate a catalog for the system prompt, expose the right tools for on-demand content retrieval, and handle the back-and-forth of the agent requesting metadata, then the body, then individual references or scripts. That is a lot of plumbing regardless of where the skills are stored, and the work multiplies if you want to support more than one agent framework. The SDK solves both by separating where skills come from (providers) from how agents use them (integrations), so you can mix and match freely. Load skills from the filesystem today, move them to an HTTP server tomorrow, swap in a custom database provider next month, and your agent code does not change at all. How the SDK works The SDK is a set of Python packages organized around two ideas: storage-agnostic providers and progressive disclosure. The provider abstraction means your skills can live anywhere. The SDK ships with providers for the local filesystem and static HTTP servers, but the SkillProvider interface is simple enough that you can write your own in a few methods. A Cosmos DB provider, a Git provider, a SharePoint provider, whatever makes sense for your team. The rest of the SDK does not care where the data comes from. On top of that, the SDK implements the progressive disclosure pattern from the spec as a set of tools that any LLM agent can use. At startup, the SDK generates a skills catalog containing each skill's name and description. Your agent injects this catalog into its system prompt so it knows what is available. Then, during a conversation, the agent calls tools to retrieve content on demand, following the same discovery-activation-execution flow the spec describes. Here is the flow in practice: You register skills from any source (local files, an HTTP server, your own database). The SDK generates a catalog and tool usage instructions, which you inject into the system prompt. The agent calls tools to retrieve content on demand. This matters because context windows are finite. An incident response skill might have a main body, three reference documents, two scripts, and a flowchart. The agent should not load all of that upfront. It should read the body first, then pull the escalation policy only when the conversation actually gets to escalation. A quick example Here is what it looks like in practice. Start by loading a skill from the filesystem: from pathlib import Path from agentskills_core import SkillRegistry from agentskills_fs import LocalFileSystemSkillProvider provider = LocalFileSystemSkillProvider(Path("my-skills")) registry = SkillRegistry() await registry.register("incident-response", provider) Now wire it into a LangChain agent: from langchain.agents import create_agent from agentskills_langchain import get_tools, get_tools_usage_instructions tools = get_tools(registry) skills_catalog = await registry.get_skills_catalog(format="xml") tool_usage_instructions = get_tools_usage_instructions() system_prompt = ( "You are an SRE assistant. Use the available skill tools to look up " "incident response procedures, severity definitions, and escalation " "policies. Always cite which reference document you used.\n\n" f"{skills_catalog}\n\n" f"{tool_usage_instructions}" ) agent = create_agent( llm, tools, system_prompt=system_prompt, ) That is it. The agent now knows what skills are available and has tools to fetch their content. When a user asks "How do I handle a SEV1 incident?", the agent will call get_skill_body to read the instructions, then get_skill_reference to pull the severity levels document, all without you writing any of that retrieval logic. The same pattern works with Microsoft Agent Framework: from agentskills_agentframework import get_tools, get_tools_usage_instructions tools = get_tools(registry) skills_catalog = await registry.get_skills_catalog(format="xml") tool_usage_instructions = get_tools_usage_instructions() system_prompt = ( "You are an SRE assistant. Use the available skill tools to look up " "incident response procedures, severity definitions, and escalation " "policies. Always cite which reference document you used.\n\n" f"{skills_catalog}\n\n" f"{tool_usage_instructions}" ) agent = Agent( client=client, instructions=system_prompt, tools=tools, ) What is in the SDK The SDK is split into small, composable packages so you only install what you need: agentskills-core handles registration, validation, the skills catalog, and the progressive disclosure API. It also defines the SkillProvider interface that all providers implement. agentskills-fs and agentskills-http are the two built-in providers. The filesystem provider loads skills from local directories. The HTTP provider loads them from any static file host: S3, Azure Blob Storage, GitHub Pages, a CDN, or anything that serves files over HTTP. agentskills-langchain and agentskills-agentframework generate framework-native tools and tool usage instructions from a skill registry. agentskills-mcp-server spins up an MCP server that exposes skill tool access and usage as tools and resources, so any MCP-compatible client can use them. Because providers and integrations are separate packages, you can combine them however you want. Use the filesystem provider during development, switch to the HTTP provider in production, or write a custom provider that reads skills from your own database. The integration layer does not need to know or care. Where to go from here The full source, working examples, and detailed API docs are on GitHub: github.com/pratikxpanda/agentskills-sdk The repo includes end-to-end examples for both LangChain and Microsoft Agent Framework, covering filesystem providers, HTTP providers, and MCP. There is also a sample incident-response skill you can use to try things out. A proposal to contribute this SDK to the official agentskills repository has been submitted. If you find it useful, feel free to show your support on the GitHub issue. To learn more about the Agent Skills format itself: What are skills? covers the format and why it matters. Specification is the complete format reference for SKILL.md files. Integrate skills explains how to add skills support to your agent. Example skills on GitHub are a good starting point for writing your own. The SDK is MIT licensed and contributions are welcome. If you have questions or ideas, post a question here or open an issue on the repo.