mcp
45 TopicsBuilding a Smart Building HVAC Digital Twin with AI Copilot Using Foundry Local
Introduction Building operations teams face a constant challenge: optimizing HVAC systems for energy efficiency while maintaining occupant comfort and air quality. Traditional building management systems display raw sensor data, temperatures, pressures, CO₂ levels—but translating this into actionable insights requires deep HVAC expertise. What if operators could simply ask "Why is the third floor so warm?" and get an intelligent answer grounded in real building state? This article demonstrates building a sample smart building digital twin with an AI-powered operations copilot, implemented using DigitalTwin, React, Three.js, and Microsoft Foundry Local. You'll learn how to architect physics-based simulators that model thermal dynamics, implement 3D visualizations of building systems, integrate natural language AI control, and design fault injection systems for testing and training. Whether you're building IoT platforms for commercial real estate, designing energy management systems, or implementing predictive maintenance for building automation, this sample provides proven patterns for intelligent facility operations. Why Digital Twins Matter for Building Operations Physical buildings generate enormous operational data but lack intelligent interpretation layers. A 50,000 square foot office building might have 500+ sensors streaming metrics every minute, zone temperatures, humidity levels, equipment runtimes, energy consumption. Traditional BMS (Building Management Systems) visualize this data as charts and gauges, but operators must manually correlate patterns, diagnose issues, and predict failures. Digital twins solve this through physics-based simulation coupled with AI interpretation. Instead of just displaying current temperature readings, a digital twin models thermal dynamics, heat transfer rates, HVAC response characteristics, occupancy impacts. When conditions deviate from expectations, the twin compares observed versus predicted states, identifying root causes. Layer AI on top, and operators get natural language explanations: "The conference room is 3 degrees too warm because the VAV damper is stuck at 40% open, reducing airflow by 60%." This application focuses on HVAC, the largest building energy consumer, typically 40-50% of total usage. Optimizing HVAC by just 10% through better controls can save thousands of dollars monthly while improving occupant satisfaction. The digital twin enables "what-if" scenarios before making changes: "What happens to energy consumption and comfort if we raise the cooling setpoint by 2 degrees during peak demand response events?" Architecture: Three-Tier Digital Twin System The application implements a clean three-tier architecture separating visualization, simulation, and state management: The frontend uses React with Three.js for 3D visualization. Users see an interactive 3D model of the three-floor building with color-coded zones indicating temperature and CO₂ levels. Click any equipment, AHUs, VAVs, chillers, to see detailed telemetry. The control panel enables adjusting setpoints, running simulation steps, and activating demand response scenarios. Real-time charts display KPIs: energy consumption, comfort compliance, air quality levels. The backend Node.js/Express server orchestrates simulation and state management. It maintains the digital twin state as JSON, the single source of truth for all equipment, zones, and telemetry. REST API endpoints handle control requests, simulation steps, and AI copilot queries. WebSocket connections push real-time updates to the frontend for live monitoring. The HVAC simulator implements physics-based models: 1R1C thermal models for zones, affinity laws for fan power, chiller COP calculations, CO₂ mass balance equations. Foundry Local provides AI copilot capabilities. The backend uses foundry-local-sdk to query locally running models. Natural language queries ("How's the lobby temperature?") get answered with building state context. The copilot can explain anomalies, suggest optimizations, and even execute commands when explicitly requested. Implementing Physics-Based HVAC Simulation Accurate simulation requires modeling actual HVAC physics. The simulator implements several established building energy models: // backend/src/simulator/thermal-model.js class ZoneThermalModel { // 1R1C (one resistance, one capacitance) thermal model static calculateTemperatureChange(zone, delta_t_seconds) { const C_thermal = zone.volume * 1.2 * 1000; // Heat capacity (J/K) const R_thermal = zone.r_value * zone.envelope_area; // Thermal resistance // Internal heat gains (occupancy, equipment, lighting) const Q_internal = zone.occupancy * 100 + // 100W per person zone.equipment_load + zone.lighting_load; // Cooling/heating from HVAC const airflow_kg_s = zone.vav.airflow_cfm * 0.0004719; // CFM to kg/s const c_p_air = 1006; // Specific heat of air (J/kg·K) const Q_hvac = airflow_kg_s * c_p_air * (zone.vav.supply_temp - zone.temperature); // Envelope losses const Q_envelope = (zone.outdoor_temp - zone.temperature) / R_thermal; // Net energy balance const Q_net = Q_internal + Q_hvac + Q_envelope; // Temperature change: Q = C * dT/dt const dT = (Q_net / C_thermal) * delta_t_seconds; return zone.temperature + dT; } } This model captures essential thermal dynamics while remaining computationally fast enough for real-time simulation. It accounts for internal heat generation from occupants and equipment, HVAC cooling/heating contributions, and heat loss through the building envelope. The CO₂ model uses mass balance equations: class AirQualityModel { static calculateCO2Change(zone, delta_t_seconds) { // CO₂ generation from occupants const G_co2 = zone.occupancy * 0.0052; // L/s per person at rest // Outdoor air ventilation rate const V_oa = zone.vav.outdoor_air_cfm * 0.000471947; // CFM to m³/s // CO₂ concentration difference (indoor - outdoor) const delta_CO2 = zone.co2_ppm - 400; // Outdoor ~400ppm // Mass balance: dC/dt = (G - V*ΔC) / Volume const dCO2_dt = (G_co2 - V_oa * delta_CO2) / zone.volume; return zone.co2_ppm + (dCO2_dt * delta_t_seconds); } } These models execute every simulation step, updating the entire building state: async function simulateStep(twin, timestep_minutes) { const delta_t = timestep_minutes * 60; // Convert to seconds // Update each zone for (const zone of twin.zones) { zone.temperature = ZoneThermalModel.calculateTemperatureChange(zone, delta_t); zone.co2_ppm = AirQualityModel.calculateCO2Change(zone, delta_t); } // Update equipment based on zone demands for (const vav of twin.vavs) { updateVAVOperation(vav, twin.zones); } for (const ahu of twin.ahus) { updateAHUOperation(ahu, twin.vavs); } updateChillerOperation(twin.chiller, twin.ahus); updateBoilerOperation(twin.boiler, twin.ahus); // Calculate system KPIs twin.kpis = calculateSystemKPIs(twin); // Detect alerts twin.alerts = detectAnomalies(twin); // Persist updated state await saveTwinState(twin); return twin; } 3D Visualization with React and Three.js The frontend renders an interactive 3D building view that updates in real-time as conditions change. Using React Three Fiber simplifies Three.js integration with React's component model: // frontend/src/components/BuildingView3D.jsx import { Canvas } from '@react-three/fiber'; import { OrbitControls } from '@react-three/drei'; export function BuildingView3D({ twinState }) { return ( {/* Render building floors */} {twinState.zones.map(zone => ( selectZone(zone.id)} /> ))} {/* Render equipment */} {twinState.ahus.map(ahu => ( ))} ); } function ZoneMesh({ zone, onClick }) { const color = getTemperatureColor(zone.temperature, zone.setpoint); return ( ); } function getTemperatureColor(current, setpoint) { const deviation = current - setpoint; if (Math.abs(deviation) < 1) return '#00ff00'; // Green: comfortable if (Math.abs(deviation) < 3) return '#ffff00'; // Yellow: acceptable return '#ff0000'; // Red: uncomfortable } This visualization immediately shows building state at a glance, operators see "hot spots" in red, comfortable zones in green, and can click any area for detailed metrics. Integrating AI Copilot for Natural Language Control The AI copilot transforms building data into conversational insights. Instead of navigating multiple screens, operators simply ask questions: // backend/src/routes/copilot.js import { FoundryLocalClient } from 'foundry-local-sdk'; const foundry = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT }); router.post('/api/copilot/chat', async (req, res) => { const { message } = req.body; // Load current building state const twin = await loadTwinState(); // Build context for AI const context = buildBuildingContext(twin); const completion = await foundry.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: `You are an HVAC operations assistant for a 3-floor office building. Current Building State: ${context} Answer questions about equipment status, comfort conditions, and energy usage. Provide specific, actionable information based on the current data. Do not speculate beyond provided information.` }, { role: 'user', content: message } ], temperature: 0.3, max_tokens: 300 }); res.json({ response: completion.choices[0].message.content, model: 'phi-4', timestamp: new Date().toISOString() }); }); function buildBuildingContext(twin) { const alerts = twin.alerts.filter(a => a.active); const zoneStatus = twin.zones.map(z => `${z.name}: ${z.temperature.toFixed(1)}°F (setpoint: ${z.setpoint}°F), CO₂: ${z.co2_ppm}ppm, ${z.occupancy} people` ); return ` Zones: ${zoneStatus.join('\n')} Active Alerts: ${alerts.length} ${alerts.map(a => `- ${a.severity}: ${a.message}`).join('\n')} Energy: ${twin.kpis.total_power_kw.toFixed(1)} kW Chiller: ${twin.chiller.percent_load}% load, ${twin.chiller.cop.toFixed(2)} COP `.trim(); } Example interactions: User: "Why is the third floor conference room warm?" AI: "The Executive Suite on Floor 3 is currently 75.2°F, 3 degrees above the 72°F setpoint. This is due to high occupancy (8 people) and VAV-F3-EXEC operating at reduced airflow (285 CFM vs 450 CFM capacity). AHU-2 is supplying adequate cool air at 55°F. Recommend checking VAV damper position—it may be partially stuck." User: "What's our current energy usage?" AI: "Total building power consumption is 47.3 kW. Breakdown: Chiller (24.1 kW, 51%), AHU fans (12.8 kW, 27%), Pumps (7.2 kW, 15%), Other (3.2 kW, 7%). Chiller is operating at 62% load with a COP of 4.8, which is efficient for current conditions." User: "Activate demand response level 2" AI: "I cannot directly control building systems for safety reasons. To activate DR Level 2, use the Control Panel and click 'Demand Response' → 'Level 2'. This will raise cooling setpoints by 3°F and reduce auxiliary loads, targeting 15% energy reduction." The AI provides grounded, specific answers citing actual equipment IDs and metrics. It refuses to directly execute control commands, instead guiding operators to explicit control interfaces, a critical safety pattern for building systems. Fault Injection for Testing and Training Real building operations experience equipment failures, stuck dampers, sensor drift, communication losses. The digital twin includes comprehensive fault injection capabilities to train operators and test control logic: // backend/src/simulator/fault-injector.js const FAULT_CATALOG = { chillerFailure: { description: 'Chiller compressor failure', apply: (twin) => { twin.chiller.status = 'FAULT'; twin.chiller.cooling_output = 0; twin.alerts.push({ id: 'chiller-fault', severity: 'CRITICAL', message: 'Chiller compressor failure - no cooling available', equipment: 'CHILLER-01' }); } }, stuckVAVDamper: { description: 'VAV damper stuck at current position', apply: (twin, vavId) => { const vav = twin.vavs.find(v => v.id === vavId); vav.damper_stuck = true; vav.damper_position_fixed = vav.damper_position; twin.alerts.push({ id: `vav-stuck-${vavId}`, severity: 'HIGH', message: `VAV ${vavId} damper stuck at ${vav.damper_position}%`, equipment: vavId }); } }, sensorDrift: { description: 'Temperature sensor reading 5°F high', apply: (twin, zoneId) => { const zone = twin.zones.find(z => z.id === zoneId); zone.sensor_drift = 5.0; zone.temperature_measured = zone.temperature_actual + 5.0; } }, communicationLoss: { description: 'Equipment communication timeout', apply: (twin, equipmentId) => { const equipment = findEquipmentById(twin, equipmentId); equipment.comm_status = 'OFFLINE'; equipment.stale_data = true; twin.alerts.push({ id: `comm-loss-${equipmentId}`, severity: 'MEDIUM', message: `Lost communication with ${equipmentId}`, equipment: equipmentId }); } } }; router.post('/api/twin/fault', async (req, res) => { const { faultType, targetEquipment } = req.body; const twin = await loadTwinState(); const fault = FAULT_CATALOG[faultType]; if (!fault) { return res.status(400).json({ error: 'Unknown fault type' }); } fault.apply(twin, targetEquipment); await saveTwinState(twin); res.json({ message: `Applied fault: ${fault.description}`, affectedEquipment: targetEquipment, timestamp: new Date().toISOString() }); }); Operators can inject faults to practice diagnosis and response. Training scenarios might include: "The chiller just failed during a heat wave, how do you maintain comfort?" or "Multiple VAV dampers are stuck, which zones need immediate attention?" Key Takeaways and Production Deployment Building a physics-based digital twin with AI capabilities requires balancing simulation accuracy with computational performance, providing intuitive visualization while maintaining technical depth, and enabling AI assistance without compromising safety. Key architectural lessons: Physics models enable prediction: Comparing predicted vs observed behavior identifies anomalies that simple thresholds miss 3D visualization improves spatial understanding: Operators immediately see which floors or zones need attention AI copilots accelerate diagnosis: Natural language queries get answers in seconds vs. minutes of manual data examination Fault injection validates readiness: Testing failure scenarios prepares operators for real incidents JSON state enables integration: Simple file-based state makes connecting to real BMS systems straightforward For production deployment, connect the twin to actual building systems via BACnet, Modbus, or MQTT integrations. Replace simulated telemetry with real sensor streams. Calibrate model parameters against historical building performance. Implement continuous learning where the twin's predictions improve as it observes actual building behavior. The complete implementation with simulation engine, 3D visualization, AI copilot, and fault injection system is available at github.com/leestott/DigitalTwin. Clone the repository and run the startup scripts to explore the digital twin, no building hardware required. Resources and Further Reading Smart Building HVAC Digital Twin Repository - Complete source code and simulation engine Setup and Quick Start Guide - Installation instructions and usage examples Microsoft Foundry Local Documentation - AI integration reference HVAC Simulation Documentation - Physics model details and calibration Three.js Documentation - 3D visualization framework ASHRAE Standards - Building energy modeling standardsAgents League: Meet the Winners
Agents League brought together developers from around the world to build AI agents using Microsoft's developer tools. With 100+ submissions across three tracks, choosing winners was genuinely difficult. Today, we're proud to announce the category champions. 🎨 Creative Apps Winner: CodeSonify View project CodeSonify turns source code into music. As a genuinely thoughtful system, its functions become ascending melodies, loops create rhythmic patterns, conditionals trigger chord changes, and bugs produce dissonant sounds. It supports 7 programming languages and 5 musical styles, with each language mapped to its own key signature and code complexity directly driving the tempo. What makes CodeSonify stand out is the depth of execution. CodeSonify team delivered three integrated experiences: a web app with real-time visualization and one-click MIDI export, an MCP server exposing 5 tools inside GitHub Copilot in VS Code Agent Mode, and a diff sonification engine that lets you hear a code review. A clean refactor sounds harmonious. A messy one sounds chaotic. The team even built the MIDI generator from scratch in pure TypeScript with zero external dependencies. Built entirely with GitHub Copilot assistance, this is one of those projects that makes you think about code differently. 🧠 Reasoning Agents Winner: CertPrep Multi-Agent System View project CertPrep Multi-Agent System team built a production-grade 8-agent system for personalized Microsoft certification exam preparation, supporting 9 exam families including AI-102, AZ-204, AZ-305, and more. Each agent has a distinct responsibility: profiling the learner, generating a week-by-week study schedule, curating learning paths, tracking readiness, running mock assessments, and issuing a GO / CONDITIONAL GO / NOT YET booking recommendation. The engineering behind the scene here is impressive. A 3-tier LLM fallback chain ensures the system runs reliably even without Azure credentials, with the full pipeline completing in under 1 second in mock mode. A 17-rule guardrail pipeline validates every agent boundary. Study time allocation uses the Largest Remainder algorithm to guarantee no domain is silently zeroed out. 342 automated tests back it all up. This is what thoughtful multi-agent architecture looks like in practice. 💼 Enterprise Agents Winner: Whatever AI Assistant (WAIA) View project WAIA is a production-ready multi-agent system for Microsoft 365 Copilot Chat and Microsoft Teams. A workflow agent routes queries to specialized HR, IT, or Fallback agents, transparently to the user, handling both RAG-pattern Q&A and action automation — including IT ticket submission via a SharePoint list. Technically, it's a showcase of what serious enterprise agent development looks like: a custom MCP server secured with OAuth Identity Passthrough, streaming responses via the OpenAI Responses API, Adaptive Cards for human-in-the-loop approval flows, a debug mode accessible directly from Teams or Copilot, and full OpenTelemetry integration visible in the Foundry portal. Franck also shipped end-to-end automated Bicep deployment so the solution can land in any Azure environment. It's polished, thoroughly documented, and built to be replicated. Thank you To every developer who submitted and shipped projects during Agents League: thank you 💜 Your creativity and innovation brought Agents League to life! 👉 Browse all submissions on GitHubAI‑Powered Troubleshooting for Microsoft Purview Data Lifecycle Management
Announcing the DLM Diagnostics MCP Server! Microsoft Purview Data Lifecycle Management (DLM) policies are critical for meeting compliance and governance requirements across Microsoft 365 workloads. However, when something goes wrong – such as retention policies not applying, archive mailboxes not expanding, or inactive mailboxes not getting purged – diagnosing the issue can be challenging and time‑consuming. To simplify and accelerate this process, we are excited to announce the open‑source release of the DLM Diagnostics Model Context Protocol (MCP) Server, an AI‑powered diagnostic server that allows AI assistants to safely investigate Microsoft Purview DLM issues using read‑only PowerShell diagnostics. GitHub repository: https://github.com/microsoft/purview-dlm-mcp The troubleshooting challenge When you notice issues such as: “Retention policy shows Success, but content isn’t being deleted” “Archiving is enabled, but items never move to the archive mailbox” The investigation typically involves: Connecting to Exchange Online and Security & Compliance PowerShell sessions Running 5–15 diagnostic cmdlets in a specific order Interpreting command output using multiple troubleshooting reference guides (TSGs) Correlating policy distribution, holds, archive configuration, and workload behavior Producing a root‑cause summary and recommended remediation steps This workflow requires deep familiarity with DLM internals and is largely manual. Introducing the DLM Diagnostics MCP Server The DLM Diagnostics MCP Server automates this diagnostic workflow by allowing AI assistants – such as GitHub Copilot, Claude Desktop, and other MCP‑compatible clients – to investigate DLM issues step by step. An administrator simply describes the symptom in natural language. The AI assistant then: Executes read‑only PowerShell diagnostics Evaluates results against known troubleshooting patterns Identifies likely root causes Presents recommended remediation steps (never executed automatically) Produces a complete audit trail of the investigation All diagnostics are performed under a strict security model to ensure safety and auditability. What is the Model Context Protocol (MCP)? The Model Context Protocol (MCP) is an open standard that enables AI assistants to interact with external tools and data sources in a secure and structured way. You can think of MCP as a “USB port for AI”: Any MCP‑compatible client can connect to an MCP server The server exposes well‑defined tools The AI can use those tools safely and deterministically The DLM Diagnostics MCP Server exposes Purview DLM diagnostics as MCP tools, enabling AI assistants to run PowerShell diagnostics, retrieve execution logs, and surface Microsoft Learn documentation. More information: https://modelcontextprotocol.io Diagnostic tools exposed by the server The server exposes four MCP tools. 1. Run read‑only PowerShell diagnostics This tool executes PowerShell commands against Exchange Online and Security & Compliance sessions using a strict allow list. Only read‑only cmdlets are permitted: Allowed verbs: Get-*, Test-*, Export-* Blocked verbs: Set-*, New-*, Remove-*, Enable-*, Invoke-*, and others Every command is validated before execution. Example: Archive mailbox not working Admin: “Archiving is not working for john.doe@contoso.com” The AI follows the archive troubleshooting guide: 1 Step 1 – Check archive mailbox status 2 Get-Mailbox -Identity john.doe@contoso.com | 3 Format-List ArchiveStatus, ArchiveState 4 5 Step 2 – Check archive mailbox size 6 Get-MailboxStatistics -Identity john.doe@contoso.com -Archive | 7 Format-List TotalItemSize, ItemCount 8 9 Step 3 – Check auto-expanding archive 10 Get-Mailbox -Identity john.doe@contoso.com | 11 Format-List AutoExpandingArchiveEnabled Finding The archive mailbox is not enabled. Recommended action (not executed automatically): 1 Enable-Mailbox <user mailbox> –Archive All remediation steps are presented as text only for administrator review. 2. Retrieve the execution log Every diagnostic session is fully logged, including: Command executed Timestamp Duration Status Output Admins can retrieve the complete investigation as a Markdown‑formatted audit trail, making it easy to attach to incident records or compliance documentation. 3. Microsoft Learn documentation lookup If a question does not match a diagnostic scenario – such as “How do I create a retention policy?” – the server falls back to curated Microsoft Learn documentation. The documentation lookup covers 11 Purview areas, including: Retention policies and labels Archive and inactive mailboxes eDiscovery Audit Communication compliance Records management Adaptive scopes 4. Create a GitHub issue (create_issue) create_issue lets the assistant open a feature request in the project’s GitHub repo and attach key session details (such as the commands run and any failures) to help maintainers reproduce and prioritize the request. Example: File a feature request from a failed diagnostic ✅ Created GitHub issue #42 Title: Allowlist should allow Get-ComplianceTag cmdlet Category: feature request Labels: enhancement URL: https://github.com/microsoft/purview-dlm-mcp/issues/42 Session context included: 3 commands executed, 1 failure Security and safety model Security is enforced at multiple layers: Read‑only allow list: Only approved diagnostic cmdlets can run No stored credentials: Authentication uses MSAL interactive sign‑in Session isolation: Each server instance runs in its own PowerShell process Full audit trail: Every command and result is logged No automatic remediation: Fixes are never executed by the server This design ensures diagnostics are safe to run even in sensitive compliance environments. Supported diagnostic scenarios The server currently includes 12 troubleshooting reference guides, covering common DLM issues such as: Retention policy shows Success but content is not retained or deleted Policy status shows Error or PolicySyncTimeout Items do not move to archive mailbox Auto‑expanding archive not triggering Inactive mailbox creation failures SubstrateHolds and Recoverable Items growth Teams messages not deleting Conflicts between MRM and Purview retention Adaptive scope misconfiguration Auto‑apply label failures SharePoint site deletion blocked by retention Unified Audit Configuration validation Each guide maps symptoms to diagnostic checks and remediation guidance. Getting started Prerequisites Node.js 18 or later PowerShell 7 ExchangeOnlineManagement module (v3.4+) Exchange Online administrator permissions Required permissions Option Roles Notes Least-privilege Global Reader + Compliance Administrator Recommended, covers both EXO and S&C read access. Single role group Organization Management Covers both workloads but broader than necessary. Full admin Global Administrator Works but overly broad, not recommended. Exchange Online (Connect-ExchangeOnline): cmdlets like Get-Mailbox, Get-MailboxStatistics, Export-MailboxDiagnosticLogs, Get-OrganizationConfig Security & Compliance (Connect-IPPSSession): cmdlets like Get-RetentionCompliancePolicy, Get-RetentionComplianceRule, Get-AdaptiveScope, Get-ComplianceTag Exchange cmdlets require EXO roles; compliance cmdlets require S&C roles. Without both, some diagnostics will fail with permission errors. Why both workloads? The server connects to two PowerShell sessions: The authenticating user (DLM_UPN) needs read access to both Exchange Online and Security & Compliance PowerShell sessions. MCP client configuration The server can be connected to IDE like Claude Desktop or Visual Studio Code (GitHub Copilot) using MCP configuration. Include this configuration in your MCP config JSON file (for VS Code, use .vscode/mcp.json; for Claude Desktop, use claude_desktop_config.json) { "mcpServers": { "dlm-diagnostics": { "command": "npx", "args": [ "-y", "@microsoft/purview-dlm-mcp" ], "env": { "DLM_UPN": "admin@yourtenant.onmicrosoft.com", "DLM_ORGANIZATION": "yourtenant.onmicrosoft.com", "DLM_COMMAND_TIMEOUT_MS": "180000" } } } } Summary The DLM Diagnostics MCP Server brings AI‑assisted, auditable, and safe troubleshooting to Microsoft Purview Data Lifecycle Management. By combining structured troubleshooting guides with read‑only PowerShell diagnostics and MCP, it significantly reduces the time and expertise required to diagnose complex DLM issues. We invite you to try it out, provide feedback, and contribute to the project via GitHub. GitHub repository: https://github.com/microsoft/purview-dlm-mcp Rishabh Kumar, Victor Legat & Purview Data Lifecycle Management Team1.3KViews2likes0CommentsMicrosoft Agent Framework, Microsoft Foundry, MCP, Aspire를 활용한 실전 예제 만들기
AI 에이전트를 개발하는 것은 점점 쉬워지고 있습니다. 하지만 여러 서비스, 상태 관리, 프로덕션 인프라를 갖춘 실제 애플리케이션의 일부로 배포하는 것은 여전히 복잡합니다. 실제로 .NET 개발자 커뮤니티에서는 로컬 머신과 클라우드 네이티브 방식의 클라우드 환경 모두에서 실제로 동작하는 실전 예제에 대한 요구가 많았습니다. 그래서 준비했습니다! Microsoft Agent Framework과 Microsoft Foundry, MCP(Model Context Protocol), Aspire등을 어떻게 프로덕션 상황에서 조합할 수 있는지를 보여주는 오픈소스 Interview Coach 샘플입니다. AI 코치가 인성 면접 질문과 기술 면접 질문을 안내한 후, 요약을 제공하는 효율적인 면접 시뮬레이터입니다. 이 포스트에서는 어떤 패턴을 사용했고 해당 패턴이 해결할 수 있는 문제를 다룹니다. Interview Coach 데모 앱을 방문해 보세요. 왜 Microsoft Agent Framework을 써야 하나요? .NET으로 AI 에이전트를 구축해 본 적이 있다면, Semantic Kernel이나 AutoGen, 또는 두 가지 모두를 사용해 본 적이 있을 겁니다. Microsoft Agent Framework는 그 다음 단계로서, 각각의 프로젝트에서 효과적이었던 부분을 하나의 프레임워크로 통합했습니다. AutoGen의 에이전트 추상화와 Semantic Kernel의 엔터프라이즈 기능(상태 관리, 타입 안전성, 미들웨어, 텔레메트리 등)을 하나로 통합했습니다. 또한 멀티 에이전트 오케스트레이션을 위한 그래프 기반 워크플로우도 추가했습니다. 그렇다면 .NET 개발자에게 이것이 어떤 의미로 다가올까요? 하나의 프레임워크. Semantic Kernel과 AutoGen 사이에서 더 이상 고민할 필요가 없습니다. 익숙한 패턴. 에이전트는 의존성 주입, IChatClient , 그리고 ASP.NET 앱과 동일한 호스팅 모델을 사용합니다. 프로덕션을 위한 설계. OpenTelemetry, 미들웨어 파이프라인, Aspire 통합이 포함되어 있습니다. 멀티 에이전트 오케스트레이션. 순차 실행, 동시 실행, 핸드오프 패턴, 그룹 채팅 등 다양한 멀티 에이전트 오케스트레이션 패턴을 지원합니다. Interview Coach는 이 모든 것을 Hello World가 아닌 실제 애플리케이션에 적용합니다. 왜 Microsoft Foundry를 써야 하나요? AI 에이전트에는 모델 말고도 더 많은 무언가가 필요합니다. 우선 인프라가 필요하겠죠. Microsoft Foundry는 AI 애플리케이션을 구축하고 관리하기 위한 Azure 플랫폼이며, Microsoft Agent Framework의 권장 백엔드입니다. Foundry는 자체 포털에서 아래와 같은 내용을 제공합니다: 모델 액세스. OpenAI, Meta, Mistral 등의 모델 카탈로그를 하나의 엔드포인트로 제공합니다. 콘텐츠 세이프티. 에이전트가 벗어나지 않도록 기본으로 제공하는 콘텐츠 조정 및 PII 감지 기능이 있습니다. 비용 최적화 라우팅. 에이전트의 요청을 자동으로 최적의 모델로 라우팅합니다. 평가 및 파인튜닝. 에이전트 품질을 측정하고 시간이 지남에 따라 개선할 수 있습니다. 엔터프라이즈 거버넌스. Entra ID와 Microsoft Defender를 통한 ID, 액세스 제어, 규정 준수를 지원합니다. Interview Coach에서 Foundry는 에이전트를 구동하는 모델 엔드포인트를 제공합니다. 에이전트 코드가 IChatClient 인터페이스를 사용하기 때문에, Foundry는 LLM 선택을 위한 설정에 불과할 수도 있겠지만, 에이전트가 필요로 하는 가장 많은 도구를 기본적으로 제공하는 선택지입니다. Interview Coach는 무엇을 하나요? Interview Coach는 모의 면접을 진행하는 대화형 AI입니다. 이력서와 채용 공고를 제공하면, 에이전트가 나머지를 처리합니다: 접수. 이력서와 목표 직무 설명을 수집합니다. 행동 면접. 경험에 맞춘 STAR 기법 질문을 합니다. 기술 면접. 직무별 기술 질문을 합니다. 요약. 구체적인 피드백과 함께 성과 리뷰를 생성합니다. Blazor 웹 UI를 통해 실시간으로 응답 스트리밍을 제공하며 사용자와 에이전트간 상호작용합니다. 아키텍처 개요 애플리케이션은 Aspire를 통해 다양한 서비스를 오케스트레이션합니다: LLM 제공자. 다양한 모델 액세스를 위한 Microsoft Foundry (권장). WebUI. 면접 대화를 위한 Blazor 채팅 인터페이스. 에이전트. Microsoft Agent Framework로 구축된 면접 로직. MarkItDown MCP 서버. Microsoft의 MarkItDown을 통해 이력서(PDF, DOCX)를 마크다운으로 변환합니다. InterviewData MCP 서버. SQLite에 세션을 저장하는 .NET MCP 서버. Aspire가 서비스 디스커버리, 상태 확인, 텔레메트리를 처리합니다. 각 컴포넌트는 별도의 프로세스로 실행시키며, 하나의 커맨드 만으로 전체를 시작할 수 있습니다. 패턴 1: 멀티 에이전트 핸드오프 이 샘플에서 가장 흥미로운 부분이기도 한 핸드오프 패턴으로 멀티 에이전트 시나리오를 구성했습니다. 하나의 에이전트가 모든 것을 처리하는 대신, 면접은 다섯 개의 전문 에이전트로 나뉩니다: 에이전트 역할 도구 Triage 메시지를 적절한 전문가에게 라우팅 없음 (순수 라우팅) Receptionist 세션 생성, 이력서 및 채용 공고 수집 MarkItDown + InterviewData Behavioral Interviewer STAR 기법을 활용한 행동 면접 질문 진행 InterviewData Technical Interviewer 직무별 기술 질문 진행 InterviewData Summarizer 최종 면접 요약 생성 InterviewData 핸드오프 패턴에서는 하나의 에이전트가 대화의 전체 제어권을 다음 에이전트에게 넘깁니다. 그러면 넘겨 받는 에이전트가 모든 제어권을 인수합니다. 이는 주 에이전트가 다른 에이전트를 도우미로 호출하면서도 제어권을 유지하는 "agent-as-tools(도구로서의 에이전트)" 방식과는 다릅니다. 핸드오프 워크플로우를 어떻게 구성하는지 살펴보시죠: var workflow = AgentWorkflowBuilder .CreateHandoffBuilderWith(triageAgent) .WithHandoffs(triageAgent, [receptionistAgent, behaviouralAgent, technicalAgent, summariserAgent]) .WithHandoffs(receptionistAgent, [behaviouralAgent, triageAgent]) .WithHandoffs(behaviouralAgent, [technicalAgent, triageAgent]) .WithHandoffs(technicalAgent, [summariserAgent, triageAgent]) .WithHandoff(summariserAgent, triageAgent) .Build(); 면접 상황을 상상해 본다면 기본적으로 순차적인 방식으로 진행합니다: Receptionist → Behavioral → Technical → Summarizer. 각 전문가가 직접 다음으로 핸드오프합니다. 예상치 못한 상황이 발생하면, 에이전트는 재라우팅을 위해 Triage로 돌아갑니다. 이 샘플에는 더 간단한 배포를 위한 단일 에이전트 모드도 포함하고 있어, 두 가지 접근 방식을 나란히 비교할 수 있습니다. 패턴 2: 도구 통합을 위한 MCP 이 프로젝트에서 도구는 에이전트 내부에 구현하는 대신 MCP(Model Context Protocol) 서버를 통해 통합합니다. 동일한 MarkItDown 서버가 완전히 다른 에이전트 프로젝트에서도 쓰일 수 있으며, 도구 개발팀은 에이전트 개발팀과 독립적으로 배포할 수 있습니다. MCP는 또한 언어에 구애받지 않으므로, 이 샘플 앱에서 쓰인 MarkItDown은 Python 기반의 서버이고, 에이전트는 .NET 기반으로 동작합니다. 에이전트는 시작 시 MCP 클라이언트를 통해 도구를 발견하고, 적절한 에이전트에게 전달합니다: var receptionistAgent = new ChatClientAgent( chatClient: chatClient, name: "receptionist", instructions: "You are the Receptionist. Set up sessions and collect documents...", tools: [.. markitdownTools, .. interviewDataTools]); 각 에이전트는 필요한 도구만 받습니다. Triage는 도구를 받지 않고(라우팅만 수행), 면접관은 세션 액세스를, Receptionist는 문서 파싱과 세션 액세스를 받습니다. 이는 최소 권한 원칙을 따릅니다. 패턴 3: Aspire 오케스트레이션 Aspire가 모든 것을 하나로 연결합니다. 앱 호스트는 서비스 토폴로지를 정의합니다: 어떤 서비스가 존재하는지, 서로 어떻게 의존하는지, 어떤 구성을 받는지. 다음을 제공합니다: 서비스 디스커버리. 서비스가 하드코딩된 URL이 아닌 이름으로 서로를 찾습니다. 상태 확인. Aspire 대시보드에서 모든 컴포넌트의 상태를 보여줍니다. 분산 추적. 공유 서비스 기본값을 통해 OpenTelemetry가 연결됩니다. 단일 커맨드 시작. aspire run --file ./apphost.cs 로 모든 것을 시작합니다. 배포 시, azd up 으로 전체 애플리케이션을 Azure Container Apps에 푸시합니다. 시작하기 사전 요구 사항 .NET 10 SDK 이상 Azure 구독 Microsoft Foundry 프로젝트 Docker Desktop 또는 기타 컨테이너 런타임 로컬에서 실행하기 git clone https://github.com/Azure-Samples/interview-coach-agent-framework.git cd interview-coach-agent-framework # 자격 증명 구성 dotnet user-secrets --file ./apphost.cs set MicrosoftFoundry:Project:Endpoint "<your-endpoint>" dotnet user-secrets --file ./apphost.cs set MicrosoftFoundry:Project:ApiKey "<your-key>" # 모든 서비스 시작 aspire run --file ./apphost.cs Aspire 대시보드를 열고, 모든 서비스가 Running으로 표시될 때까지 기다린 후, WebUI 엔드포인트를 클릭하여 모의 면접을 시작하세요. 핸드오프 패턴이 어떻게 동작하는지 DevUI에서 시각화한 모습입니다. 이 채팅 UI를 사용하여 면접 후보자로서 에이전트와 상호작용할 수 있습니다. Azure에 배포하기 azd auth login azd up 배포를 위해서는 이게 사실상 전부입니다! Aspire와 azd 가 나머지를 처리합니다. 배포와 테스트를 완료한 후, 다음 명령어를 실행하여 모든 리소스를 안전하게 삭제할 수 있습니다: azd down --force --purge 이 샘플에서 배울 수 있는 것 Interview Coach를 통해 다음을 경험하게 됩니다: Microsoft Foundry를 모델 백엔드로 사용하기 Microsoft Agent Framework로 단일 에이전트 및 멀티 에이전트 시스템 구축하기 핸드오프 오케스트레이션으로 전문 에이전트 간 워크플로우 분할하기 에이전트 코드와 독립적으로 MCP 도구 서버 생성 및 사용하기 Aspire로 멀티 서비스 애플리케이션 오케스트레이션하기 일관되고 구조화된 동작을 생성하는 프롬프트 작성하기 azd up 으로 모든 것 배포하기 사용해 보세요 전체 소스 코드는 GitHub에 있습니다: Azure-Samples/interview-coach-agent-framework Microsoft Agent Framework가 처음이라면, 프레임워크 문서와 Hello World 샘플부터 시작하세요. 그런 다음 여기로 돌아와서 더 큰 프로젝트에서 각 부분이 어떻게 결합되는지 확인하세요. 이러한 패턴으로 무언가를 만들었다면, 이슈를 열어 알려주세요. 다음 계획 다음과 같은 통합 시나리오를 현재 작업 중입니다. 작업이 끝나는 대로 이 샘플 앱을 업데이트 하도록 하겠습니다. Microsoft Foundry Agent Service GitHub Copilot A2A 참고 자료 Microsoft Agent Framework 문서 Microsoft Agent Framework 프리뷰 소개 Microsoft Agent Framework, 릴리스 후보 도달 Microsoft Foundry 문서 Microsoft Foundry Agent Service Microsoft Foundry 포털 Microsoft.Extensions.AI Model Context Protocol 사양 Aspire 문서 ASP.NET BlazorMCP vs mcp-cli: Dynamic Tool Discovery for Token-Efficient AI Agents
Introduction The AI agent ecosystem is evolving rapidly, and with it comes a scaling challenge that many developers are hitting context window bloat. When building systems that integrate with multiple MCP (Model Context Protocol) servers, you're forced to load all tool definitions upfront—consuming thousands of tokens just to describe what tools could be available. mcp-cli: a lightweight tool that changes how we interact with MCP servers. But before diving into mcp-cli, it's essential to understand the foundational protocol itself, the design trade-offs between static and dynamic approaches, and how they differ fundamentally. Part 1: Understanding MCP (Model Context Protocol) What is MCP? The Model Context Protocol (MCP) is an open standard for connecting AI agents and applications to external tools, APIs, and data sources. Think of it as a universal interface that allows: AI Agents (Claude, Gemini, etc.) to discover and call tools Tool Providers to expose capabilities in a standardized way Seamless Integration between diverse systems without custom adapters New to MCP see https://aka.ms/mcp-for-beginners How MCP Works MCP operates on a simple premise: define tools with clear schemas and let clients discover and invoke them. Basic MCP Flow: Tool Provider (MCP Server) ↓ [Tool Definitions + Schemas] ↓ AI Agent / Client ↓ [Discover Tools] → [Invoke Tools] → [Get Results] Example: A GitHub MCP server exposes tools like: search_repositories - Search GitHub repos create_issue - Create a GitHub issue list_pull_requests - List open PRs Each tool comes with a JSON schema describing its parameters, types, and requirements. The Static Integration Problem Traditionally, MCP integration works like this: Startup: Load ALL tool definitions from all servers Context Window: Send every tool schema to the AI model Invocation: Model chooses which tool to call Execution: Tool is invoked and result returned The Problem: When you have multiple MCP servers, the token cost becomes substantial: Scenario Token Count 6 MCP Servers, 60 tools (static loading) ~47,000 tokens After dynamic discovery ~400 tokens Token Reduction 99% 🚀 For a production system with 10+ servers exposing 100+ tools, you're burning through thousands of tokens just describing capabilities, leaving less context for actual reasoning and problem-solving. Key Issues: ❌ Reduced effective context length for actual work ❌ More frequent context compactions ❌ Hard limits on simultaneous MCP servers ❌ Higher API costs Part 2: Enter mcp-cli – Dynamic Context Discovery What is mcp-cli? mcp-cli is a lightweight CLI tool (written in Bun, compiled to a single binary) that implements dynamic context discovery for MCP servers. Instead of loading everything upfront, it pulls in information only when needed. Static vs. Dynamic: The Paradigm Shift Traditional MCP (Static Context): AI Agent Says: "Load all tool definitions from all servers" ↓ Context Window Bloat ❌ ↓ Limited space for reasoning mcp-cli (Dynamic Discovery): AI Agent Says: "What servers exist?" ↓ mcp-cli responds AI Agent Says: "What are the params for tool X?" ↓ mcp-cli responds AI Agent Says: "Execute tool X" mcp-cli executes and responds Result: You only pay for information you actually use. ✅ Core Capabilities mcp-cli provides three primary commands: 1. Discover - What servers and tools exist? mcp-cli Lists all configured MCP servers and their tools. 2. Inspect - What does a specific tool do? mcp-cli info <server> <tool> Returns the full JSON schema for a tool (parameters, descriptions, types). 3. Execute - Run a tool mcp-cli call <server> <tool> '{"arg": "value"}' Executes the tool and returns results. Key Features of mcp-cli Feature Benefit Stdio & HTTP Support Works with both local and remote MCP servers Connection Pooling Lazy-spawn daemon avoids repeated startup overhead Tool Filtering Control which tools are available via allowedTools/disabledTools Glob Searching Find tools matching patterns: mcp-cli grep "*mail*" AI Agent Ready Designed for use in system instructions and agent skills Lightweight Single binary, minimal dependencies Part 3: Detailed Comparison Table Aspect Traditional MCP mcp-cli Protocol HTTP/REST or Stdio Stdio/HTTP (via CLI) Context Loading Static (upfront) Dynamic (on-demand) Tool Discovery All at once Lazy enumeration Schema Inspection Pre-loaded On-request Token Usage High (~47k for 60 tools) Low (~400 for 60 tools) Best For Direct server integration AI agent tool use Implementation Server-side focus CLI-side focus Complexity Medium Low (CLI handles it) Startup Time One call Multiple calls (optimized) Scaling Limited by context Unlimited (pay per use) Integration Custom implementation Pre-built mcp-cli Part 4: When to Use Each Approach Use Traditional MCP (HTTP Endpoints) when: ✅ Building a direct server integration ✅ You have few tools (< 10) and don't care about context waste ✅ You need full control over HTTP requests/responses ✅ You're building a specialized integration (not AI agents) ✅ Real-time synchronous calls are required Use mcp-cli when: ✅ Integrating with AI agents (Claude, Gemini, etc.) ✅ You have multiple MCP servers (> 2-3) ✅ Token efficiency is critical ✅ You want a standardized, battle-tested tool ✅ You prefer CLI-based automation ✅ Connection pooling and lazy loading are beneficial ✅ You're building agent skills or system instructions Conclusion MCP (Model Context Protocol) defines the standard for tool sharing and discovery. mcp-cli is the practical tool that makes MCP efficient for AI agents by implementing dynamic context discovery. The fundamental difference: MCP mcp-cli What The protocol standard The CLI tool Where Both server and client Client-side CLI Problem Solved Tool standardization Context bloat Architecture Protocol Implementation Think of it this way: MCP is the language, mcp-cli is the interpreter that speaks fluently. For AI agent systems, dynamic discovery via mcp-cli is becoming the standard. For direct integrations, traditional MCP HTTP endpoints work fine. The choice depends on your use case, but increasingly, the industry is trending toward mcp-cli for its efficiency and scalability. Resources MCP Specification mcp-cli GitHub New to MCP see https://aka.ms/mcp-for-beginners Practical demo: AnveshMS/mcp-cli-exampleGiving Your AI Agents Reliable Skills with the Agent Skills SDK
AI agents are becoming increasingly capable, but they often do not have the context they need to do real work reliably. Your agent can reason well, but it does not actually know how to do the specific things your team needs it to do. For example, it cannot follow your company's incident response playbook, it does not know your escalation policy, and it has no idea how to page the on-call engineer at 3 AM. There are many ways to close this gap, from RAG to custom tool implementations. Agent Skills is one approach that stands out because it is designed around portability and progressive disclosure, keeping context window usage minimal while giving agents access to deep expertise on demand. What is Agent Skills? Agent Skills is an open format for giving agents new capabilities and expertise. The format was originally developed by Anthropic and released as an open standard. It is now supported by a growing list of agent products including Claude Code, VS Code, GitHub, OpenAI Codex, Cursor, Gemini CLI, and many others. As defined in the spec, a skill is a folder on disk containing a SKILL.md file with metadata and instructions, plus optional scripts, references, and assets: incident-response/ SKILL.md # Required: instructions + metadata references/ # Optional: additional documentation severity-levels.md escalation-policy.md scripts/ # Optional: executable code page-oncall.sh assets/ # Optional: templates, diagrams, data files The SKILL.md file has YAML frontmatter with a name and description (so agents know when the skill is relevant), followed by markdown instructions that tell the agent how to perform the task. The format is intentionally simple: self-documenting, extensible, and portable. What makes this design practical is progressive disclosure. The spec is built around the idea that agents should not load everything at once. It works in three stages: Discovery: At startup, agents load only the name and description of each available skill, just enough to know when it might be relevant. Activation: When a task matches a skill's description, the agent reads the full SKILL.md instructions into context. Execution: The agent follows the instructions, optionally loading referenced files or executing bundled scripts as needed. This keeps agents fast while giving them access to deep context on demand. The format is well-designed and widely adopted, but if you want to use skills from your own agents, there is a gap between the spec and a working implementation. The Agent Skills SDK Conceptually, a skill is more than a folder. It is a unit of expertise: a name, a description, a body of instructions, and a set of supporting resources. The file layout is one way to represent that, but there is nothing about the concept that requires a filesystem. The Agent Skills SDK is an open-source Python library built around that idea, treating skills as abstract units of expertise that can be stored anywhere and consumed by any agent framework. It does this by addressing two challenges that come up when you try to use the format from your own agents. The first is where skills live. The spec defines skills as folders on disk, and the tools that support the format today all assume skills are local files. Files are inherently portable, and that is one of the format's strengths. But in the real world, not every team can or wants to serve skills from the filesystem. Maybe your team keeps them in an S3 bucket. Maybe they are in Azure Blob Storage behind your CDN. Maybe they live in a database alongside the rest of your application data. At the moment, if your skills are not on the local filesystem, you are on your own. The SDK changes where skills are served from, not how they are authored. The content and format stay the same regardless of the storage backend, so skills remain portable across providers. The second is how agents consume them. The spec defines the progressive disclosure pattern but actually implementing it in your agent requires real work. You need to figure out how to validate skills against the spec, generate a catalog for the system prompt, expose the right tools for on-demand content retrieval, and handle the back-and-forth of the agent requesting metadata, then the body, then individual references or scripts. That is a lot of plumbing regardless of where the skills are stored, and the work multiplies if you want to support more than one agent framework. The SDK solves both by separating where skills come from (providers) from how agents use them (integrations), so you can mix and match freely. Load skills from the filesystem today, move them to an HTTP server tomorrow, swap in a custom database provider next month, and your agent code does not change at all. How the SDK works The SDK is a set of Python packages organized around two ideas: storage-agnostic providers and progressive disclosure. The provider abstraction means your skills can live anywhere. The SDK ships with providers for the local filesystem and static HTTP servers, but the SkillProvider interface is simple enough that you can write your own in a few methods. A Cosmos DB provider, a Git provider, a SharePoint provider, whatever makes sense for your team. The rest of the SDK does not care where the data comes from. On top of that, the SDK implements the progressive disclosure pattern from the spec as a set of tools that any LLM agent can use. At startup, the SDK generates a skills catalog containing each skill's name and description. Your agent injects this catalog into its system prompt so it knows what is available. Then, during a conversation, the agent calls tools to retrieve content on demand, following the same discovery-activation-execution flow the spec describes. Here is the flow in practice: You register skills from any source (local files, an HTTP server, your own database). The SDK generates a catalog and tool usage instructions, which you inject into the system prompt. The agent calls tools to retrieve content on demand. This matters because context windows are finite. An incident response skill might have a main body, three reference documents, two scripts, and a flowchart. The agent should not load all of that upfront. It should read the body first, then pull the escalation policy only when the conversation actually gets to escalation. A quick example Here is what it looks like in practice. Start by loading a skill from the filesystem: from pathlib import Path from agentskills_core import SkillRegistry from agentskills_fs import LocalFileSystemSkillProvider provider = LocalFileSystemSkillProvider(Path("my-skills")) registry = SkillRegistry() await registry.register("incident-response", provider) Now wire it into a LangChain agent: from langchain.agents import create_agent from agentskills_langchain import get_tools, get_tools_usage_instructions tools = get_tools(registry) skills_catalog = await registry.get_skills_catalog(format="xml") tool_usage_instructions = get_tools_usage_instructions() system_prompt = ( "You are an SRE assistant. Use the available skill tools to look up " "incident response procedures, severity definitions, and escalation " "policies. Always cite which reference document you used.\n\n" f"{skills_catalog}\n\n" f"{tool_usage_instructions}" ) agent = create_agent( llm, tools, system_prompt=system_prompt, ) That is it. The agent now knows what skills are available and has tools to fetch their content. When a user asks "How do I handle a SEV1 incident?", the agent will call get_skill_body to read the instructions, then get_skill_reference to pull the severity levels document, all without you writing any of that retrieval logic. The same pattern works with Microsoft Agent Framework: from agentskills_agentframework import get_tools, get_tools_usage_instructions tools = get_tools(registry) skills_catalog = await registry.get_skills_catalog(format="xml") tool_usage_instructions = get_tools_usage_instructions() system_prompt = ( "You are an SRE assistant. Use the available skill tools to look up " "incident response procedures, severity definitions, and escalation " "policies. Always cite which reference document you used.\n\n" f"{skills_catalog}\n\n" f"{tool_usage_instructions}" ) agent = Agent( client=client, instructions=system_prompt, tools=tools, ) What is in the SDK The SDK is split into small, composable packages so you only install what you need: agentskills-core handles registration, validation, the skills catalog, and the progressive disclosure API. It also defines the SkillProvider interface that all providers implement. agentskills-fs and agentskills-http are the two built-in providers. The filesystem provider loads skills from local directories. The HTTP provider loads them from any static file host: S3, Azure Blob Storage, GitHub Pages, a CDN, or anything that serves files over HTTP. agentskills-langchain and agentskills-agentframework generate framework-native tools and tool usage instructions from a skill registry. agentskills-mcp-server spins up an MCP server that exposes skill tool access and usage as tools and resources, so any MCP-compatible client can use them. Because providers and integrations are separate packages, you can combine them however you want. Use the filesystem provider during development, switch to the HTTP provider in production, or write a custom provider that reads skills from your own database. The integration layer does not need to know or care. Where to go from here The full source, working examples, and detailed API docs are on GitHub: github.com/pratikxpanda/agentskills-sdk The repo includes end-to-end examples for both LangChain and Microsoft Agent Framework, covering filesystem providers, HTTP providers, and MCP. There is also a sample incident-response skill you can use to try things out. A proposal to contribute this SDK to the official agentskills repository has been submitted. If you find it useful, feel free to show your support on the GitHub issue. To learn more about the Agent Skills format itself: What are skills? covers the format and why it matters. Specification is the complete format reference for SKILL.md files. Integrate skills explains how to add skills support to your agent. Example skills on GitHub are a good starting point for writing your own. The SDK is MIT licensed and contributions are welcome. If you have questions or ideas, post a question here or open an issue on the repo.Level up your Python + AI skills with our complete series
We've just wrapped up our live series on Python + AI, a comprehensive nine-part journey diving deep into how to use generative AI models from Python. The series introduced multiple types of models, including LLMs, embedding models, and vision models. We dug into popular techniques like RAG, tool calling, and structured outputs. We assessed AI quality and safety using automated evaluations and red-teaming. Finally, we developed AI agents using popular Python agents frameworks and explored the new Model Context Protocol (MCP). To help you apply what you've learned, all of our code examples work with GitHub Models, a service that provides free models to every GitHub account holder for experimentation and education. Even if you missed the live series, you can still access all the material using the links below! If you're an instructor, feel free to use the slides and code examples in your own classes. If you're a Spanish speaker, check out the Spanish version of the series. Python + AI: Large Language Models 📺 Watch recording In this session, we explore Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We use Python to interact with LLMs using popular packages like the OpenAI SDK and LangChain. We experiment with prompt engineering and few-shot examples to improve outputs. We also demonstrate how to build a full-stack app powered by LLMs and explain the importance of concurrency and streaming for user-facing AI apps. Slides for this session Code repository with examples: python-openai-demos Python + AI: Vector embeddings 📺 Watch recording In our second session, we dive into a different type of model: the vector embedding model. A vector embedding is a way to encode text or images as an array of floating-point numbers. Vector embeddings enable similarity search across many types of content. In this session, we explore different vector embedding models, such as the OpenAI text-embedding-3 series, through both visualizations and Python code. We compare distance metrics, use quantization to reduce vector size, and experiment with multimodal embedding models. Slides for this session Code repository with examples: vector-embedding-demos Python + AI: Retrieval Augmented Generation 📺 Watch recording In our third session, we explore one of the most popular techniques used with LLMs: Retrieval Augmented Generation. RAG is an approach that provides context to the LLM, enabling it to deliver well-grounded answers for a particular domain. The RAG approach works with many types of data sources, including CSVs, webpages, documents, and databases. In this session, we walk through RAG flows in Python, starting with a simple flow and culminating in a full-stack RAG application based on Azure AI Search. Slides for this session Code repository with examples: python-openai-demos Python + AI: Vision models 📺 Watch recording Our fourth session is all about vision models! Vision models are LLMs that can accept both text and images, such as GPT-4o and GPT-4o mini. You can use these models for image captioning, data extraction, question answering, classification, and more! We use Python to send images to vision models, build a basic chat-with-images app, and create a multimodal search engine. Slides for this session Code repository with examples: openai-chat-vision-quickstart Python + AI: Structured outputs 📺 Watch recording In our fifth session, we discover how to get LLMs to output structured responses that adhere to a schema. In Python, all you need to do is define a Pydantic BaseModel to get validated output that perfectly meets your needs. We focus on the structured outputs mode available in OpenAI models, but you can use similar techniques with other model providers. Our examples demonstrate the many ways you can use structured responses, such as entity extraction, classification, and agentic workflows. Slides for this session Code repository with examples: python-openai-demos Python + AI: Quality and safety 📺 Watch recording This session covers a crucial topic: how to use AI safely and how to evaluate the quality of AI outputs. There are multiple mitigation layers when working with LLMs: the model itself, a safety system on top, the prompting and context, and the application user experience. We focus on Azure tools that make it easier to deploy safe AI systems into production. We demonstrate how to configure the Azure AI Content Safety system when working with Azure AI models and how to handle errors in Python code. Then we use the Azure AI Evaluation SDK to evaluate the safety and quality of output from your LLM. Slides for this session Code repository with examples: ai-quality-safety-demos Python + AI: Tool calling 📺 Watch recording In the final part of the series, we focus on the technologies needed to build AI agents, starting with the foundation: tool calling (also known as function calling). We define tool call specifications using both JSON schema and Python function definitions, then send these definitions to the LLM. We demonstrate how to properly handle tool call responses from LLMs, enable parallel tool calling, and iterate over multiple tool calls. Understanding tool calling is absolutely essential before diving into agents, so don't skip over this foundational session. Slides for this session Code repository with examples: python-openai-demos Python + AI: Agents 📺 Watch recording In the penultimate session, we build AI agents! We use Python AI agent frameworks such as the new agent-framework from Microsoft and the popular LangGraph framework. Our agents start simple and then increase in complexity, demonstrating different architectures such as multiple tools, supervisor patterns, graphs, and human-in-the-loop workflows. Slides for this session Code repository with examples: python-ai-agent-frameworks-demos Python + AI: Model Context Protocol 📺 Watch recording In the final session, we dive into the hottest technology of 2025: MCP (Model Context Protocol). This open protocol makes it easy to extend AI agents and chatbots with custom functionality, making them more powerful and flexible. We demonstrate how to use the Python FastMCP SDK to build an MCP server running locally and consume that server from chatbots like GitHub Copilot. Then we build our own MCP client to consume the server. Finally, we discover how easy it is to connect AI agent frameworks like LangGraph and Microsoft agent-framework to MCP servers. With great power comes great responsibility, so we briefly discuss the security risks that come with MCP, both as a user and as a developer. Slides for this session Code repository with examples: python-mcp-demo7.9KViews3likes0CommentsAgents League: Two Weeks, Three Tracks, One Challenge
We're inviting all developers to join Agents League, running February 16-27. It's a two-week challenge where you'll build AI agents using production-ready tools, learn from live coding sessions, and get feedback directly from Microsoft product teams. We've put together starter kits for each track to help you get up and running quickly that also includes requirements and guidelines. Whether you want to explore what GitHub Copilot can do beyond autocomplete, build reasoning agents on Microsoft Foundry, or create enterprise integrations for Microsoft 365 Copilot, we have a track for you. Important: Register first to be eligible for prizes and your digital badge. Without registration, you won't qualify for awards or receive a badge when you submit. What Is Agents League? It's a 2-week competition that combines learning with building: 📽️ Live coding battles – Watch Product teams, MVPs and community members tackle challenges in real-time on Microsoft Reactor 💻 Async challenges – Build at your own pace, on your schedule 💬 Discord community – Connect with other participants, join AMAs, and get help when you need it 🏆 Prizes – $500 per track winner, plus GitHub Copilot Pro subscriptions for top picks The Three Tracks 🎨 Creative Apps — Build with GitHub Copilot (Chat, CLI, or SDK) 🧠 Reasoning Agents — Build with Microsoft Foundry 💼 Enterprise Agents — Build with M365 Agents Toolkit (or Copilot Studio) More details on each track below, or jump straight to the starter kits. The Schedule Agents League starts on February 16th and runs through Feburary 27th. Within 2 weeks, we host live battles on Reactor and AMA sessions on Discord. Week 1: Live Battles (Feb 17-19) We're kicking off with live coding battles streamed on Microsoft Reactor. Watch experienced developers compete in real-time, explaining their approach and architectural decisions as they go. Tue Feb 17, 9 AM PT — 🎨 Creative Apps battle Wed Feb 18, 9 AM PT — 🧠 Reasoning Agents battle Thu Feb 19, 9 AM PT — 💼 Enterprise Agents battle All sessions are recorded, so you can watch on your own schedule. Week 2: Build + AMAs (Feb 24-26) This is your time to build and ask questions on Discord. The async format means you work when it suits you, evenings, weekends, whatever fits your schedule. We're also hosting AMAs on Discord where you can ask questions directly to Microsoft experts and product teams: Tue Feb 24, 9 AM PT — 🎨 Creative Apps AMA Wed Feb 25, 9 AM PT — 🧠 Reasoning Agents AMA Thu Feb 26, 9 AM PT — 💼 Enterprise Agents AMA Bring your questions, get help when you're stuck, and share what you're building with the community. Pick Your Track We've created a starter kit for each track with setup guides, project ideas, and example scenarios to help you get started quickly. 🎨 Creative Apps Tool: GitHub Copilot (Chat, CLI, or SDK) Build innovative, imaginative applications that showcase the potential of AI-assisted development. All application types are welcome, web apps, CLI tools, games, mobile apps, desktop applications, and more. The starter kit walks you through GitHub Copilot's different modes and provides prompting tips to get the best results. View the Creative Apps starter kit. 🧠 Reasoning Agents Tool: Microsoft Foundry (UI or SDK) and/or Microsoft Agent Framework Build a multi-agent system that leverages advanced reasoning capabilities to solve complex problems. This track focuses on agents that can plan, reason through multi-step problems, and collaborate. The starter kit includes architecture patterns, reasoning strategies (planner-executor, critic/verifier, self-reflection), and integration guides for tools and MCP servers. View the Reasoning Agents starter kit. 💼 Enterprise Agents Tool: M365 Agents Toolkit or Copilot Studio Create intelligent agents that extend Microsoft 365 Copilot to address real-world enterprise scenarios. Your agent must work on Microsoft 365 Copilot Chat. Bonus points for: MCP server integration, OAuth security, Adaptive Cards UI, connected agents (multi-agent architecture). View the Enterprise Agents starter kit. Prizes & Recognition To be eligible for prizes and your digital badge, you must register before submitting your project. Category Winners ($500 each): 🎨 Creative Apps winner 🧠 Reasoning Agents winner 💼 Enterprise Agents winner GitHub Copilot Pro subscriptions: Community Favorite (voted by participants on Discord) Product Team Picks (selected by Microsoft product teams) Everyone who registers and submits a project wins: A digital badge to showcase their participation. Beyond the prizes, every participant gets feedback from the teams who built these tools, a valuable opportunity to learn and improve your approach to AI agent development. How to Get Started Register first — This is required to be eligible for prizes and to receive your digital badge. Without registration, your submission won't qualify for awards or a badge. Pick a track — Choose one track. Explore the starter kits to help you decide. Watch the battles — See how experienced developers approach these challenges. Great for learning even if you're still deciding whether to compete. Build your project — You have until Feb 27. Work on your own schedule. Submit via GitHub — Open an issue using the project submission template. Join us on Discord — Get help, share your progress, and vote for your favorite projects on Discord. Links Register: https://aka.ms/agentsleague/register Starter Kits: https://github.com/microsoft/agentsleague/starter-kits Discord: https://aka.ms/agentsleague/discord Live Battles: https://aka.ms/agentsleague/battles Submit Project: Project submission templateUsing on-behalf-of flow for Entra-based MCP servers
In December, we presented a series about MCP, culminating in a session about adding authentication to MCP servers. I demoed a Python MCP server that uses Microsoft Entra for authentication, requiring users to first login to the Microsoft tenant before they could use a tool. Many developers asked how they could take the Entra integration further, like to check the user's group membership or query their OneDrive. That requires using an "on-behalf-of" flow, where the MCP server uses the user's Entra identity to call another API, like the Microsoft Graph API. In this blog post, I will explain how to use Entra with OBO flow in a Python FastMCP server. How MCP servers can use Entra authentication The MCP authorization specification is based on OAuth2, with some additional features tacked on top. Every MCP client is actually an OAuth2 client, and each MCP server is an OAuth2 resource server: MCP auth adds these features to help clients determine how to authorize a server: Protected resource metadata (PRM): Implemented on the MCP server, provides details about the authorization server and method Authorization server metadata: Implemented on the authorization server, gives URLs for OAuth2 endpoints Additionally, to allow MCP servers to work with arbitrary MCP clients, MCP auth supports either of these client registration methods: Dynamic Client Registration (DCR): Implemented on the authorization server, it can register new MCP clients as OAuth2 clients, even if it hasn't seen them before. Client ID Metadata Documents (CIMD): An alternative to DCR, this requires both the MCP client to make a CIMD document available on a server, and requires the authorization server to fetch the CIMD document for details about the client. Microsoft Entra does support authorization server metadata, but it does not support either DCR or CIMD. That's actually fine if you are building an MCP server that's only going to be used with pre-authorized clients, like if the server will only be used with VS Code or with a specific internal MCP client. But, if you are building an MCP server that can be used with arbitrary MCP clients, then either DCR or CIMD is required. So what do we do? Fortunately, the FastMCP SDK implements DCR on top of Entra using an OAuth proxy pattern. FastMCP acts as the authorization server, intercepting requests and forwarding to Entra when needed, and storing OAuth client information in a designated database (like in-memory or Cosmos DB). ⚠️ Warning: This proxy approach is intended only for development and testing scenarios. For production deployments, Microsoft recommends using pre‑registered client applications where client identifiers and permissions are explicitly created, reviewed, and approved on a per-app basis. Let's walk through the steps to set that up. Registering the server with Entra Before the server can use Entra to authorize users, we need to register the server with Entra via an app registration. We can do registration using the Azure Portal, Azure CLI, Microsoft Graph SDK, or even Bicep. In this demo, I use the Python MS Graph SDK as it allows me to specify everything programmatically. First, I create the Entra app registration, specifying the sign-in audience (single-tenant), redirect URIs (including local MCP server, deployed MCP server, and VS Code redirect URIs), and the scopes for the exposed API. request_app = Application( display_name="FastMCP Server App", sign_in_audience="AzureADMyOrg", # Single tenant web=WebApplication( redirect_uris=[ "http://localhost:8000/auth/callback", "https://vscode.dev/redirect", "http://127.0.0.1:33418", "https://deployedurl.com/auth/callback" ], ), api=ApiApplication( oauth2_permission_scopes=[ PermissionScope( id=uuid.UUID("{" + str(uuid.uuid4()) + "}"), admin_consent_display_name="Access FastMCP Server", admin_consent_description="Allows access to the FastMCP server as the signed-in user.", user_consent_display_name="Access FastMCP Server", user_consent_description="Allow access to the FastMCP server on your behalf", is_enabled=True, value="mcp-access", type="User", )], requested_access_token_version=2, # Required by FastMCP ) ) app = await graph_client.applications.post(request_app) await graph_client.applications.by_application_id(app.id).patch( Application(identifier_uris=[f"api://{app.app_id}"])) Thanks to that configuration, when an MCP client like VS Code requests an OAuth2 token, it will request a token with the scope "api://{app.app_id}/mcp-access", and the FastMCP server will validate that incoming tokens contain that scope. Next, I create a Service Principal for that Entra app registration, which represents the Entra app in my tenant request_principal = ServicePrincipal(app_id=app.app_id, display_name=app.display_name) await graph_client.service_principals.post(request_principal) I need a way for the server to prove that it can use that Entra app registration, so I register a secret: password_credential = await graph_client.applications.by_application_id(app.id).add_password.post( AddPasswordPostRequestBody( password_credential=PasswordCredential(display_name="FastMCPSecret"))) Ideally, I would like to move away from secrets, as Entra now has support for using federated identity credentials for Entra app registrations instead, but that form of credential isn't supported yet in the FastMCP SDK. If you choose to use a secret, make sure that you store the secret securely. Granting admin consent This next step is only necessary when our MCP server wants to use an OBO flow to exchange access tokens for other resource server tokens (Graph API tokens, in this case). For the OBO flow to work, the Entra app registration needs permission to call the Graph API on behalf of users. If we controlled the client, we could force it to request the required scopes as part of the initial login dialog. However, since we are configuring this server to work with arbitrary MCP clients, we don't have that option. Instead, we grant admin consent to the Entra app for the necessary scopes, such that no Graph API consent dialog is needed. This code grants admin consent to the associated service principal for the Graph API resource and scopes: server_principal = await graph_client.service_principals_with_app_id(app.app_id).get() grant = GrantDefinition( principal_id=server_principal.id, resource_app_id="00000003-0000-0000-c000-000000000000", # Graph API scopes=["User.Read", "email", "offline_access", "openid", "profile"], target_label="server application") resource_principal = await graph_client.service_principals_with_app_id(grant.resource_app_id).get() desired_scope = grant.scope_string() await graph_client.oauth2_permission_grants.post( OAuth2PermissionGrant( client_id=grant.principal_id, consent_type="AllPrincipals", resource_id=resource_principal.id, scope=desired_scope)) If our MCP server needed to use an OBO flow with another resource server, we could request additional grants for those resources and scopes. Our Entra app registration is now ready for the MCP server, so let's move on to see the server code. Using FastMCP servers with Entra In our MCP server code, we configure FastMCP's built in AzureProvider based off the details from the Entra app registration process: auth = AzureProvider( client_id=os.environ["ENTRA_PROXY_AZURE_CLIENT_ID"], client_secret=os.environ["ENTRA_PROXY_AZURE_CLIENT_SECRET"], tenant_id=os.environ["AZURE_TENANT_ID"], base_url=entra_base_url, # MCP server URL required_scopes=["mcp-access"], client_storage=oauth_client_store, # in-memory or Cosmos DB ) To make it easy for our MCP tools to access an identifier for the currently logged in user, we define a middleware that inspects the claims of the current token using FastMCP's get_access_token() and sets the "oid" (Entra object identifier) in the state: class UserAuthMiddleware(Middleware): def _get_user_id(self): token = get_access_token() if not (token and hasattr(token, "claims")): return None return token.claims.get("oid") async def on_call_tool(self, context: MiddlewareContext, call_next): user_id = self._get_user_id() if context.fastmcp_context is not None: context.fastmcp_context.set_state("user_id", user_id) return await call_next(context) async def on_read_resource(self, context: MiddlewareContext, call_next): user_id = self._get_user_id() if context.fastmcp_context is not None: context.fastmcp_context.set_state("user_id", user_id) return await call_next(context) When we initialize the FastMCP server, we set the auth provider and include that middleware: mcp = FastMCP("Expenses Tracker", auth=auth, middleware=[UserAuthMiddleware()]) Now, every request made to the MCP server will require authentication. The server will return a 401 if a valid token isn't provided, and that 401 will prompt the MCP client to kick off the MCP authorization flow. Inside each tool, we can grab the user id from the state, and use that to customize the response for the user, like to store or query items in a database. .tool async def add_user_expense( date: Annotated[date, "Date of the expense in YYYY-MM-DD format"], amount: Annotated[float, "Positive numeric amount of the expense"], description: Annotated[str, "Human-readable description of the expense"], ctx: Context, ): """Add a new expense to Cosmos DB.""" user_id = ctx.get_state("user_id") if not user_id: return "Error: Authentication required (no user_id present)" expense_item = { "id": str(uuid.uuid4()), "user_id": user_id, "date": date.isoformat(), "amount": amount, "description": description } await cosmos_container.create_item(body=expense_item) Using OBO flow in FastMCP server Now we can move on to using an OBO flow inside an MCP tool, to access the Graph API on behalf of the user. To make it easy to exchange Entra tokens for Graph tokens, we use the Python MSAL SDK, configuring a ConfidentialClientApplication based on our Entra app registration details: confidential_client = ConfidentialClientApplication( client_id=os.environ["ENTRA_PROXY_AZURE_CLIENT_ID"], client_credential=os.environ["ENTRA_PROXY_AZURE_CLIENT_SECRET"], authority=f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}", token_cache=TokenCache(), ) Inside the tool that requires OBO, we ask MSAL to exchange the MCP access token for a Graph API access token: access_token = get_access_token() graph_resource_access_token = confidential_client.acquire_token_on_behalf_of( user_assertion=access_token.token, scopes=["https://graph.microsoft.com/.default"] ) graph_token = graph_resource_access_token["access_token"] Once we successfully acquire the token, we can use that token with the Graph API, for any operations permitted by the scopes in the admin consent granted earlier. For this example, we call the Graph API to check whether the logged in user is a member of a particular Entra group, and restrict tool usage if not: async with httpx.AsyncClient() as client: url = ("https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group" f"?$filter=id eq '{group_id}'&$count=true") response = await client.get( url, headers={ "Authorization": f"Bearer {graph_token}", "ConsistencyLevel": "eventual", }) data = response.json() membership_count = data.get("@odata.count", 0) You could imagine many other ways to use an OBO flow however, like to query for more details from the Graph API, upload documents to OneDrive/SharePoint/Notes, send emails, and more! All together now For the full code, check out the open source python-mcp-demos repository, and follow the deployment steps for Entra. The most relevant code files are: auth_init.py: Creates the Entra app registration, service principal, client secret, and grants admin consent for OBO flow. auth_update.py: Updates the app registration's redirect URIs after deployment, adding the deployed server URL. auth_entra_mcp.py: The MCP server itself, configured with FastMCP's AzureProvider and tools that use OBO for group membership checks. I want to reiterate once more that the OAuth proxy approach is intended only for development and testing scenarios. For production deployments, Microsoft recommends using pre‑registered client applications where client identifiers and permissions are explicitly created, reviewed, and approved on a per-app basis. I hope that in the future, Entra will formally support MCP authorization via the CIMD protocol, so that we can build MCP servers with Entra auth that work with MCP clients in a fully secure and production-ready way. As always, please let us know if you have further questions or ideas for other Entra integrations. Acknowledgements: Thank you to Matt Gotteiner for his guidance in implementing the OBO flow and review of the blog post.Season of AI - MCP
Join us for an exciting deep dive into the Model Context Protocol (MCP) - a revolutionary open protocol that's transforming how AI systems interact with data sources and tools. This 90-minute session is part of our Season of AI series, designed to keep you at the forefront of AI innovation. #### What You'll Learn Discover how MCP is standardizing AI-data source connections and enabling more intelligent, context-aware AI applications. ### Event Highlights Understanding MCP Fundamentals: Learn the core concepts and architecture of the Model Context Protocol Real-World Applications: Explore practical use cases and implementation scenarios Hands-On Demonstration: See MCP in action with live coding examples Integration Strategies: Discover how to integrate MCP into your AI workflows Best Practices: Learn industry-standard approaches for MCP implementation Q&A Session: Get your questions answered by an expert practitioner146Views0likes0Comments