agents
Microsoft 365 & Power Platform Community call
💡 The Microsoft 365 & Power Platform Development bi-weekly community call focuses on different use cases and features within Microsoft 365 and the Power Platform - across Microsoft 365 Copilot, Copilot Studio, SharePoint, Power Apps and more. 👏 If you're looking to catch up on the latest news and updates, including cool community demos, this call is for you!

📅 On the 12th of February we'll have the following agenda:
- Copilot prompt of the week
- CommunityDays.org update
- Microsoft 365 Maturity model
- Latest on PnP Framework and Core SDK extension
- Latest on PnP PowerShell
- Latest on script samples
- Latest Copilot pro dev samples
- Latest on Power Platform samples
- Picture time with the Together Mode!
- Mohammed Amer (Atea Global Services) – Reverse Engineering: Teaching GitHub Copilot to Configure Vitest Unit Testing for your SPFx apps
- Peter Paul Kirschner (ACP Cubido) – Creating React Office Breakout game with SPFx - Vision, Motion, and a Little Chaos

📅 Download the recurring invite from https://aka.ms/community/m365-powerplat-dev-call-invite
📞 & 📺 Join the Microsoft Teams meeting live at https://aka.ms/community/m365-powerplat-dev-call-join
👋 See you in the call!

💡 Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc.)? We are always looking for presenters - volunteer for a community call demo at https://aka.ms/community/request/demo

📖 Resources:
- Previous community call recordings and demos from the Microsoft Community Learning YouTube channel at https://aka.ms/community/youtube
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

🧡 Sharing is caring!

Microsoft 365 & Power Platform product updates call
💡 The Microsoft 365 & Power Platform product updates call concentrates on different use cases and features within Microsoft 365 and the Power Platform. The call covers topics like Microsoft 365 Copilot, Copilot Studio, Microsoft Teams, Power Platform, Microsoft Graph, Microsoft Viva, Microsoft Search, Microsoft Lists, SharePoint, Power Automate, Power Apps and more. 👏 The weekly Tuesday call is for all community members to see Microsoft PMs, engineering and Cloud Advocates showcasing the art of the possible with Microsoft 365 and Power Platform.

📅 On the 10th of February we'll have the following agenda:
- News and updates from Microsoft
- Together mode group photo
- Aaron Glick – Introduction to Microsoft Teams on personal devices for frontline employees
- Steve Pucelik – Add AI capabilities to your SharePoint Embedded app
- Mithuna Soundararaj & Akash Ravi – Getting started with Agents in OneDrive

📞 & 📺 Join the Microsoft Teams meeting live at https://aka.ms/community/ms-speakers-call-join
🗓️ Download the recurring invite for this weekly call from https://aka.ms/community/ms-speakers-call-invite
👋 See you in the call!

💡 Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc.)? We are always looking for presenters - volunteer for a community call demo at https://aka.ms/community/request/demo

📖 Resources:
- Previous community call recordings and demos from the Microsoft Community Learning YouTube channel at https://aka.ms/community/youtube
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

🧡 Sharing is caring!

How Sales Development Agent Helps Teams Scale Outbound Outreach Without Sacrificing Quality
Enterprise sales organizations face a persistent challenge: scaling outbound operations while maintaining message quality, brand consistency, and conversion performance. As teams grow and lead volumes increase, the gap between strategic intent and execution widens, making it harder for sellers to spend time on higher-quality leads and opportunities.

The Sales Development Agent (SDA) addresses this gap through a fundamentally different approach. Rather than relying on sellers to manually handle repetitive qualification and early-stage outreach, SDA consistently executes your defined playbook at scale, freeing your team to focus on what they do best: building relationships and closing deals with pre-qualified, high-intent prospects.

This post examines how SDA systematizes best practices, enables responsive two-way engagement, and delivers measurable performance improvements. It also includes a rigorous, transparent comparison of SDA performance against ChatGPT using identical inputs and evaluation criteria.

Operationalizing Strategy Across the Enterprise

In most large organizations today, outbound quality depends heavily on individual execution. Sellers must:
- Adapt messaging frameworks to specific contexts under time constraints
- Maintain brand voice and positioning consistency across thousands of interactions
- Personalize outreach while balancing speed and quality

These manual processes compound across teams, geographies, and business units, making consistency difficult to achieve and nearly impossible to maintain during periods of growth or organizational change. The Sales Development Agent reduces this operational complexity by embedding your outbound strategy into interactions.

Performance Validation: Early Deployment Results at Microsoft

Microsoft's Small and Medium Enterprises & Channel (SME&C) organization served as an early adopter of the Sales Development Agent, focused on engaging underserved SMB customers with limited prior Microsoft engagement, with the goal of driving a hyperpersonalized, high-quality experience and relationship with Microsoft and its cloud solutions.

The Sales Development Agent is reframing how Microsoft deploys its sales capacity. Rather than requiring sellers to handle repetitive qualification work across thousands of early-stage leads, SDA absorbs this foundational activity, enabling sellers to focus their expertise on pre-qualified opportunities further down the funnel, where their strategic judgment and relationship-building skills drive the greatest impact.

During a 20-week pilot starting in February and concluding in June 2025, the Sales Development Agent engaged more than 70,000 existing Microsoft SMB customers. Customers engaged by the Sales Development Agent showed an 8-percentage-point increase in opportunity conversion rate, effectively doubling the opportunity yield compared to manual seller-led outreach using the same lead pools, timeframes, and follow-up processes.

Starting with Microsoft's smallest customers provides an opportunity to refine the approach before expanding to larger segments, ultimately transforming how sales capacity is allocated across the entire customer base and moving sellers from repetitive qualification to high-value activities like opportunity management and deal closure.

Note: Results from pilot deployments may not be representative of all use cases or implementations. Performance may vary based on industry, lead quality, organizational context, and implementation approach.
How SDA Works at Scale Centralized Strategy Definition Organizations provide SDA with value propositions, brand guidelines, proven messaging examples, guardrails, and CTAs. This creates a single source of truth for outbound communications. Configurable Quality Standards SDA adapts to your organization's definition of effective outreach, including personalization, email structure, and your messaging priorities. Consistent Application Across All Touchpoints Whether managing 100 or 1,000 outbound interactions, across multiple teams or markets, SDA maintains strategic alignment without variance in quality or brand representation. Strategic Impact Consistency at scale: Every message reflects organizational strategy, regardless of volume or team composition Operational efficiency: Reduced time spent on repetitive personalization and message iteration Predictable performance: Quality remains stable during high-volume periods, organizational transitions, or rapid scaling SDA functions as an operational layer that helps ensure strategic decisions translate into consistent execution, allowing sales professionals to spend less time on repetitive qualification and more time on high-intent opportunities. Beyond Initial Outreach: Managing Full Conversation Cycles Most AI-assisted email solutions generate single outbound messages. SDA extends beyond initial contact to manage complete conversation cycles within the guardrails defined by sales leadership. Intelligent Two-Way Engagement When prospects respond, SDA maintains conversation continuity by: Addressing clarifying questions with accurate, contextually relevant information Providing appropriate details drawn from organizational playbooks and documentation Maintaining tone, positioning, and brand voice throughout the exchange This enables organizations to maintain response velocity and engagement quality without proportional increases in headcount. Governance-Based Escalation SDA automatically routes conversations to human sales professionals when it identifies: High intent buying signals requiring strategic engagement Sentiment shifts or concerns requiring nuanced handling Complex scenarios demanding human judgment and relationship building Leadership teams define escalation thresholds and autonomy boundaries, ensuring SDA augments conventional sales expertise. The result is increased conversation capacity without degradation in response quality, prospect experience, or conversion performance. Results and quality We’ve recently announced the Microsoft Sales Bench, a new collection of evaluation benchmarks designed to assess the performance of AI-powered sales agents across real-world scenarios. This framework brings together purpose-built metrics, hundreds of sales-specific scenarios, and composite scoring validated by both human and AI judges. Today, we’re extending the Microsoft Sales Bench with an additional benchmark: the Microsoft Sales Development Agent Bench, focused on measuring how effectively AI agents scale sales team’s capacity, systematize best practices, enable responsive two-way engagement and qualify leads. SDA vs. ChatGPT To understand how SDA performs in real-world outbound scenarios, we conducted a controlled comparison against ChatGPT under strictly identical conditions. The purpose of this evaluation was straightforward: to determine whether a sales-tuned agent meaningfully outperforms a general-purpose model when both are given exactly the same inputs. 
Sales teams need clarity on whether SDA's grounding, structure, and playbook integration translate into better outreach in practice, and our early results show that they do. This evaluation was completed on 11/24/2025 using Version 1 of the Sales Development Agent and ChatGPT (GPT-4.1, accessed via the ChatGPT UI).

Evaluation Methodology

Systems Evaluated:
- Sales Development Agent (SDA): Version 1 (November 2025)
- ChatGPT (GPT-4.1): Accessed through the ChatGPT web UI

Both models were required to follow the same output schema and received the same contextual inputs.

Test Dataset: The evaluation was run on early scenarios which reflect real-world enterprise sales conditions. These give us a grounded, realistic environment to compare personalization depth, recency integration and structural consistency across models. The evaluation included 390 test scenarios spanning 35 industries and company sizes ranging from 55 to 1.2 million employees.

Evaluation Process: We designed the evaluation to ensure both systems were tested under identical conditions.

1. Identical Input Payload: Both systems received the same structured context based on the SDA evaluation framework:
- Prospect profile
- Company and industry context
- Product knowledge
- Sales playbook guidance
- Tone and brand guidelines
- Required email schema + HTML formatting rules (subject + body paragraphs)
This removed any advantage from model-specific prior knowledge.

2. Shared System Prompt Requirements: Both models used a system prompt which enforces:
- A concise, personalized outreach email
- No invented facts
- A consistent email structure with paragraph boundaries
This removed prompt-engineering differences and ensured alignment in expectations.

3. Blinded Evaluation: Evaluators scored all outputs blindly, without knowing which system generated which email. This eliminated potential bias in scoring.

4. Scoring Rubric (1–10): Emails were evaluated on five quality dimensions:
- Clarity: Assesses whether the email communicates its message precisely and without unnecessary complexity, avoiding jargon and ensuring each sentence adds value.
- Personalization: Evaluates how specifically the email is tailored to the target company by referencing concrete details from their context (e.g., initiatives, recent events, or specific goals).
- Recency: Assesses whether the email draws on events, updates, or announcements from the context provided, and whether those are recent relative to the date the email was generated.
- Relevance: Evaluates how directly and realistically the solution in the email addresses a plausible, active business challenge or opportunity for the target company.
- Structure: Evaluates the logical organization of the email, ensuring it flows smoothly from hook to problem to solution to call-to-action (CTA) with coherent transitions.

Each dimension was scored from 1 (poor) to 10 (excellent). Scores were then combined into an overall composite score using a weighted average across dimensions.

Quantitative Performance Results

Across all quality dimensions, SDA delivered improved results over ChatGPT, in particular on Recency, which can drive outbound performance.

| Metric          | ChatGPT | SDA  | Difference |
|-----------------|---------|------|------------|
| Clarity         | 8.95    | 8.99 | +0.04      |
| Personalization | 8.56    | 8.84 | +0.28      |
| Recency         | 3.50    | 7.60 | +4.10      |
| Relevance       | 8.69    | 8.99 | +0.30      |
| Structure       | 8.77    | 8.99 | +0.23      |
| Overall         | 7.69    | 8.68 | +0.99      |
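For readers who want to see how a composite like the Overall row is produced, here is a minimal sketch of the weighted-average scoring described above. The actual dimension weights used in the evaluation aren't published, so the weights below are illustrative placeholders, not Microsoft's values:

# Minimal sketch of the weighted composite score described above.
# Assumption: real dimension weights are not published; these are placeholders.
DIMENSION_WEIGHTS = {
    "clarity": 0.20,
    "personalization": 0.25,
    "recency": 0.25,
    "relevance": 0.20,
    "structure": 0.10,
}

def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of the 1-10 dimension scores."""
    total_weight = sum(DIMENSION_WEIGHTS.values())
    weighted = sum(DIMENSION_WEIGHTS[name] * dimension_scores[name]
                   for name in DIMENSION_WEIGHTS)
    return round(weighted / total_weight, 2)

# Example with SDA's published dimension scores
print(composite_score({
    "clarity": 8.99, "personalization": 8.84, "recency": 7.60,
    "relevance": 8.99, "structure": 8.99,
}))  # ~8.6 with these placeholder weights; the published Overall used Microsoft's own weighting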
Qualitative Performance Observations

Why Recency Matters Most: In sales outreach, incorporating the prospect's latest activity dramatically increases relevance and response rates. SDA's strong performance on Recency reflects its ability to systematically surface and integrate these critical signals, while general-purpose models often overlook them when provided the same information.

Beyond the quantitative scores, evaluators noted several consistent patterns:
- SDA grounded recency more reliably: SDA consistently incorporated the latest prospect activity and marketing interactions; ChatGPT often overlooked them.
- SDA delivered deeper, more accurate personalization: It aligned messaging tightly to the prospect's role, industry, and context. ChatGPT tended to generalize, even with identical inputs.
- SDA maintained stricter structure: SDA's outputs consistently followed paragraph boundaries and clean sequencing; ChatGPT occasionally drifted.
- SDA avoided introducing unsupported details: Its grounding constraints ensured messages stayed tied to provided inputs. ChatGPT sometimes generalized or hallucinated, introducing details not present in the inputs.

Future Development

These results represent our initial evaluation baseline, but the consistently high scores indicate that our current framework isn't yet challenging enough to drive the next wave of quality improvements. Our early rubric was designed to validate foundational outbound quality, but as the product matures we will introduce more rigorous scenarios, sharper scoring criteria, and additional dimensions to better distinguish strong performance from exceptional performance. High early scores do not signal that SDA has reached its quality ceiling; they simply show that our evaluation framework must mature as the product does.

Commitment to Transparency and Independent Validation

Microsoft intends to make the full evaluation framework available in the coming months, enabling customers to replicate these results, benchmark SDA against their own playbooks and data, and independently validate performance in their environments.

For Enterprise Decision-Makers: This will enable you to validate SDA performance against your specific use cases, lead profiles, and quality standards before deployment decisions, using your own data and success criteria.

For Development Teams: You will be able to access the evaluation methodology, run comparative tests with your playbooks and data, and measure performance differences in your operational environment.

Strategic Value for Enterprise Sales Organizations

SDA enables sales organizations to:
- Maintain quality at scale: Deliver consistent, high-quality outreach across expanding operations without proportional resource increases
- Reduce operational friction: Eliminate repetitive personalization and message iteration, reallocating time to high-value activities
- Increase response capacity: Manage higher conversation volumes while maintaining response quality and velocity
- Optimize how teams spend their time: Ensure sales professionals engage at moments requiring expertise, relationship building, and strategic judgment
- Systematize institutional knowledge: Transform playbooks and best practices from static documentation into operational reality

When best practices become systematic rather than aspirational, sales teams can redirect their expertise toward the activities that truly differentiate enterprise sales performance: relationship development, strategic account management, and closing deals with pre-qualified, high-intent prospects.

Important Disclaimers

Performance Results: Quality scores reflect results from controlled pilot deployments and evaluations with specific customer environments and use cases.
Actual results may vary significantly based on industry vertical, lead quality, organizational context, implementation approach, existing sales processes, and numerous other factors. These results should not be considered guaranteed or typical outcomes.

Competitive Comparison: The ChatGPT evaluation was conducted in November 2025 using GPT-4.1 accessed via the ChatGPT web UI. ChatGPT capabilities, features, and performance may have changed since this evaluation. The comparison reflects performance under specific test conditions and may not represent performance across all possible use cases or implementations.

Product Evolution: Both SDA and competitive solutions continue to evolve. Evaluation results represent a point-in-time comparison and should be periodically reassessed as products develop.

If you're interested in learning more:
- Check out this article: Use and collaborate with agents | Microsoft Learn
- Read the D365 blog: Powering Frontier Firms with agentic business applications
- Watch this demo video

How to Build Safe Natural Language-Driven APIs
TL;DR: Building production natural language APIs requires separating semantic parsing from execution. Use LLMs to translate user text into canonical structured requests (via schemas), then execute those requests deterministically. Key patterns: schema completion for clarification, confidence gates to prevent silent failures, code-based ontologies for normalization, and an orchestration layer. This keeps language as input, not as your API contract.

Introduction

APIs that accept natural language as input are quickly becoming the norm in the age of agentic AI apps and LLMs. From search and recommendations to workflows and automation, users increasingly expect to "just ask" and get results. But treating natural language as an API contract introduces serious risks in production systems:
- Nondeterministic behavior
- Prompt-driven business logic
- Difficult debugging and replay
- Silent failures that are hard to detect

In this post, I'll describe a production-grade architecture for building safe, natural language-driven APIs: one that embraces LLMs for intent discovery and entity extraction while preserving the determinism, observability, and reliability that backend systems require. This approach is based on building real systems using Azure OpenAI and LangGraph, and on lessons learned the hard way.

The Core Problem with Natural Language APIs

Natural language is an excellent interface for humans. It is a poor interface for systems. When APIs accept raw text directly and execute logic based on it, several problems emerge:
- The API contract becomes implicit and unversioned
- Small prompt changes cause behavioral changes
- Business logic quietly migrates into prompts

In short: language becomes the contract, and that's fragile. The solution is not to avoid natural language, but to contain it.

A Key Principle: Natural Language Is Input, Not a Contract

So how do we contain it? The answer lies in treating natural language fundamentally differently than we treat traditional API inputs. The most important design decision we made was this: natural language should be translated into structure, not executed directly. That single principle drives the entire architecture. Instead of building "chatty APIs," we split responsibilities clearly:
- Natural language is used for intent discovery and entity extraction
- Structured data is used for execution

Two Explicit API Layers

This principle translates into a concrete architecture with two distinct API layers, each with a single, clear responsibility.

1. Semantic Parse API (Natural Language → Structure)

This API:
- Accepts user text
- Extracts intent and entities using LLMs
- Completes a predefined schema
- Asks clarifying questions when required
- Returns a canonical, structured request
- Does not execute business logic

Think of this as a compiler, not an engine.

2. Structured Execution API (Structure → Action)

This API:
- Accepts only structured input
- Calls downstream systems to process the request and get results
- Is deterministic and versioned
- Contains no natural language handling
- Is fully testable and replayable

This is where execution happens.

Why This Separation Matters

Separating these layers gives you:
- A stable, versionable API contract
- Freedom to improve NLP without breaking clients
- Clear ownership boundaries
- Deterministic execution paths

Most importantly, it prevents LLM behavior from leaking into core business logic.

Canonical Schemas Are the Backbone

Now that we've established the two-layer architecture, let's dive into what makes it work: canonical schemas.
Each supported intent is defined by a canonical schema that lives in code. Example (simplified): this schema is used when a user is looking for similar product recommendations. The entities capture which product to use as reference and how to bias the recommendations toward price or quality.

{
  "intent": "recommend_similar",
  "entities": {
    "reference_product_id": "string",
    "price_bias": "number (-1 to 1)",
    "quality_bias": "number (-1 to 1)"
  }
}

Schemas define:
- Required vs optional fields
- Allowed ranges and types
- Validation rules

They are the contract, not the prompt.

When a user says "show me products like the blue backpack but cheaper", the LLM extracts:
- Intent: recommend_similar
- reference_product_id: "blue_backpack_123"
- price_bias: -0.8 (strongly prefer cheaper)
- quality_bias: 0.0 (neutral)

The schema ensures that even if the user phrased it as "find alternatives to item 123 with better pricing" or "cheaper versions of that blue bag", the output is always the same structure. The natural language variation is absorbed at the semantic layer. The execution layer receives a consistent, validated request every time. This decoupling is what makes the system maintainable.

Schema Completion, Not Free-Form Chat

But what happens when the user's input doesn't contain all the information needed to complete the schema? This is where structured clarification comes in. A common misconception is that clarification means "chatting until it feels right." In production systems, clarification is schema completion. If required fields are missing or ambiguous, the semantic API responds with:
- What information is missing
- A targeted clarification question
- The current schema state

Example response:

{
  "status": "needs_clarification",
  "missing_fields": ["reference_product_id"],
  "question": "Which product should I compare against?",
  "state": {
    "intent": "recommend_similar",
    "entities": {
      "reference_product_id": null,
      "price_bias": -0.3,
      "quality_bias": 0.4
    }
  }
}

The state object is the memory. The API itself remains stateless.

A Complete Conversation Flow

To illustrate how schema completion works in practice, here's a full conversation flow where the user's initial request is missing required information.

Initial Request:
User: "Show me cheaper alternatives with good quality"

API Response (needs clarification):

{
  "status": "needs_clarification",
  "missing_fields": ["reference_product_id"],
  "question": "Which product should I compare against?",
  "state": {
    "intent": "recommend_similar",
    "entities": {
      "reference_product_id": null,
      "price_bias": -0.3,
      "quality_bias": 0.4
    }
  }
}

Follow-up Request:
User: "The blue backpack"

Client sends:

{
  "user_input": "The blue backpack",
  "state": {
    "intent": "recommend_similar",
    "entities": {
      "reference_product_id": null,
      "price_bias": -0.3,
      "quality_bias": 0.4
    }
  }
}

API Response (complete):

{
  "status": "complete",
  "canonical_request": {
    "intent": "recommend_similar",
    "entities": {
      "reference_product_id": "blue_backpack_123",
      "price_bias": -0.3,
      "quality_bias": 0.4
    }
  }
}

The client passes the state back with each clarification. The API remains stateless, while the client manages the conversation context. Once complete, the canonical_request can be sent directly to the execution API.
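Because the canonical schemas live in code, they can also act as runtime validators. Below is a minimal sketch of the recommend_similar schema expressed as a Pydantic model - Pydantic is an assumption here (the post doesn't prescribe a validation library), and the constraints simply mirror the ranges described above:

from typing import Literal, Optional
from pydantic import BaseModel, Field

class RecommendSimilarEntities(BaseModel):
    reference_product_id: Optional[str] = None        # required before execution
    price_bias: float = Field(0.0, ge=-1.0, le=1.0)   # documented -1..1 range
    quality_bias: float = Field(0.0, ge=-1.0, le=1.0)

class RecommendSimilarRequest(BaseModel):
    intent: Literal["recommend_similar"] = "recommend_similar"
    entities: RecommendSimilarEntities

    def missing_fields(self) -> list[str]:
        # Drives the needs_clarification response shown above
        return ["reference_product_id"] if self.entities.reference_product_id is None else []

# The LLM proposes values; the schema validates them before execution
proposed = RecommendSimilarRequest(
    entities=RecommendSimilarEntities(price_bias=-0.3, quality_bias=0.4)
)
print(proposed.missing_fields())  # ['reference_product_id'] -> ask a targeted question

If validation fails or required fields are still unset, the semantic API can turn that directly into a needs_clarification response like the ones shown earlier, instead of forwarding the request.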
Why LangGraph Fits This Problem Perfectly

With schemas and clarification flows defined, we need a way to orchestrate the semantic parsing workflow reliably. This is where LangGraph becomes valuable. LangGraph allows semantic parsing to be modeled as a structured, deterministic workflow with explicit decision points:

1. Classify intent: Determine what the user wants to do from a predefined set of supported actions
2. Extract candidate entities: Pull out relevant parameters from the natural language input using the LLM
3. Merge into schema state: Map the extracted values into the canonical schema structure
4. Validate required fields: Check if all mandatory fields are present and values are within acceptable ranges
5. Either complete or request clarification: Return the canonical request if complete, or ask a targeted question if information is missing

Each node has a single responsibility. Validation and routing are done in code, not by the LLM. LangGraph provides:
- Explicit state transitions
- Deterministic routing
- Observable execution
- Safe retries

Used this way, it becomes a powerful orchestration tool, not a conversational agent.

Confidence Gates Prevent Silent Failures

Structured workflows handle the process, but there's another critical safety mechanism we need: knowing when the LLM isn't confident about its extraction. Even when outputs are structurally valid, they may not be reliable. We require the semantic layer to emit a confidence score. If confidence falls below a threshold, execution is blocked and clarification is requested. This simple rule eliminates an entire class of silent misinterpretations that are otherwise very hard to detect.

Example: When a user says "Show me items similar to the bag", the LLM might extract:

{
  "intent": "recommend_similar",
  "confidence": 0.55,
  "entities": {
    "reference_product_id": "generic_bag_001",
    "confidence_scores": {
      "reference_product_id": 0.4
    }
  }
}

The overall confidence is low (0.55), and the entity confidence for reference_product_id is very low (0.4) because "the bag" is ambiguous. There might be hundreds of bags in the catalog. Instead of proceeding with a potentially wrong guess, the API responds:

{
  "status": "needs_clarification",
  "reason": "low_confidence",
  "question": "I found multiple bags. Did you mean the blue backpack, the leather tote, or the travel duffel?",
  "confidence": 0.55
}

This prevents the system from silently executing the wrong recommendation and provides a better user experience.

Lightweight Ontologies (Keep Them in Code)

Beyond confidence scoring, we need a way to normalize the variety of terms users might use into consistent canonical values. We also introduced lightweight, code-level ontologies:
- Allowed intents
- Required entities per intent
- Synonym-to-canonical mappings
- Cross-field validation rules

These live in code and configuration, not in prompts. LLMs propose values. Code enforces meaning.

Example: Consider these user inputs that all mean the same thing:
- "Show me cheaper options"
- "Find budget-friendly alternatives"
- "I want something more affordable"
- "Give me lower-priced items"

The LLM might extract different values: "cheaper", "budget-friendly", "affordable", "lower-priced". The ontology maps all of these to a canonical value:

PRICE_BIAS_SYNONYMS = {
    "cheaper": -0.7,
    "budget-friendly": -0.7,
    "affordable": -0.7,
    "lower-priced": -0.7,
    "expensive": 0.7,
    "premium": 0.7,
    "high-end": 0.7
}

When the LLM extracts "budget-friendly", the code normalizes it to -0.7 for the price_bias field. Similarly, cross-field validation catches logical inconsistencies:

if entities["price_bias"] < -0.5 and entities["quality_bias"] > 0.5:
    return clarification("You want cheaper items with higher quality. This might be difficult. Should I prioritize price or quality?")

The LLM proposes. The ontology normalizes. The validation enforces business rules.
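To show how these pieces could hang together, here is a minimal LangGraph-style sketch of the semantic parse workflow: classify, extract, then validate with a confidence gate. The node bodies are stubbed placeholders (real classification and extraction would call the LLM and apply the ontology), so treat it as an illustration of the orchestration shape rather than a production implementation:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class ParseState(TypedDict, total=False):
    user_input: str
    intent: str
    entities: dict
    confidence: float
    status: str
    question: str

def classify_intent(state: ParseState) -> dict:
    # Placeholder: in practice an LLM call constrained to the supported intents
    return {"intent": "recommend_similar"}

def extract_entities(state: ParseState) -> dict:
    # Placeholder: LLM extraction plus ontology normalization (e.g. PRICE_BIAS_SYNONYMS)
    return {"entities": {"reference_product_id": None, "price_bias": -0.7}, "confidence": 0.55}

def validate(state: ParseState) -> dict:
    # Code, not the LLM, decides whether the schema is complete and trustworthy
    missing = [f for f in ("reference_product_id",) if not state["entities"].get(f)]
    if missing or state["confidence"] < 0.7:
        return {"status": "needs_clarification",
                "question": "Which product should I compare against?"}
    return {"status": "complete"}

workflow = StateGraph(ParseState)
workflow.add_node("classify", classify_intent)
workflow.add_node("extract", extract_entities)
workflow.add_node("validate", validate)
workflow.set_entry_point("classify")
workflow.add_edge("classify", "extract")
workflow.add_edge("extract", "validate")
workflow.add_edge("validate", END)

semantic_parser = workflow.compile()
print(semantic_parser.invoke({"user_input": "Show me cheaper options like the bag"}))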
What About Latency?

A common concern with multi-step semantic parsing is performance. In practice, we observed:
- Intent classification: ~40 ms
- Entity extraction: ~200 ms
- Validation and routing: ~1 ms

Total overhead: ~250–300 ms. For chat-driven user experiences, this is well within acceptable bounds and far cheaper than incorrect or inconsistent execution.

Key Takeaways

Let's bring it all together. If you're building APIs that accept natural language in production:
- Do not make language your API contract
- Translate language into canonical structure
- Own schema completion server-side
- Use LLMs for discovery and extraction, not execution
- Treat safety and determinism as first-class requirements

Natural language is an input format. Structure is the contract.

Closing Thoughts

LLMs make it easy to build impressive demos. Building safe, reliable systems with them requires discipline. By separating semantic interpretation from execution, and by using tools like Azure OpenAI and LangGraph thoughtfully, you can build natural language-driven APIs that scale, evolve, and behave predictably in production. Hopefully, this architecture saves you a few painful iterations.

The Perfect Fusion of GitHub Copilot SDK and Cloud Native
In today's rapidly evolving AI landscape, we've witnessed the transformation from simple chatbots to sophisticated agent systems. As a developer and technology evangelist, I've observed an emerging trend—it's not about making AI omnipotent, but about enabling each AI Agent to achieve excellence in specific domains. Today, I want to share an exciting technology stack: GitHub Copilot SDK (a development toolkit that embeds production-grade agent engines into any application) + Agent-to-Agent (A2A) Protocol (a communication standard enabling standardized agent collaboration) + Cloud Native Deployment (the infrastructure foundation for production systems). Together, these three components enable us to build truly collaborative multi-agent systems. 1. From AI Assistants to Agent Engines: Redefining Capability Boundaries Traditional AI assistants often pursue "omnipotence"—attempting to answer any question you throw at them. However, in real production environments, this approach faces serious challenges: Inconsistent Quality: A single model trying to write code, perform data analysis, and generate creative content struggles to achieve professional standards in each domain Context Pollution: Mixing prompts from different tasks leads to unstable model outputs Difficult Optimization: Adjusting prompts for one task type may negatively impact performance on others High Development Barrier: Building agents from scratch requires handling planning, tool orchestration, context management, and other complex logic GitHub proposed a revolutionary approach—instead of forcing developers to build agent frameworks from scratch, provide a production-tested, programmable agent engine. This is the core value of the GitHub Copilot SDK. Evolution from Copilot CLI to SDK GitHub Copilot CLI is a powerful command-line tool that can: Plan projects and features Modify files and execute commands Use custom agents Delegate tasks to cloud execution Integrate with MCP servers The GitHub Copilot SDK extracts the agentic core behind Copilot CLI and offers it as a programmable layer for any application. This means: You're no longer confined to terminal environments You can embed this agent engine into GUI applications, web services, and automation scripts You gain access to the same execution engine validated by millions of users Just like in the real world, we don't expect one person to be a doctor, lawyer, and engineer simultaneously. Instead, we provide professional tools and platforms that enable professionals to excel in their respective domains. 2. GitHub Copilot SDK: Embedding Copilot CLI's Agentic Core into Any App Before diving into multi-agent systems, we need to understand a key technology: GitHub Copilot SDK. What is GitHub Copilot SDK? GitHub Copilot SDK (now in technical preview) is a programmable agent execution platform. It allows developers to embed the production-tested agentic core from GitHub Copilot CLI directly into any application. Simply put, the SDK provides: Out-of-the-box Agent Loop: No need to build planners, tool orchestration, or context management from scratch Multi-model Support: Choose different AI models (like GPT-4, Claude Sonnet) for different task phases Tool and Command Integration: Built-in file editing, command execution, and MCP server integration capabilities Streaming Real-time Responses: Support for progress updates on long-running tasks Multi-language Support: SDKs available for Node.js, Python, Go, and .NET Why is the SDK Critical for Building Agents? 
Building an agentic workflow from scratch is extremely difficult. You need to handle:
- Context management across multiple conversation turns
- Orchestration of tools and commands
- Routing between different models
- MCP server integration
- Permission control, safety boundaries, and error handling

GitHub Copilot SDK abstracts away all this underlying complexity. You only need to focus on:
- Defining agent professional capabilities (through Skill files)
- Providing domain-specific tools and constraints
- Implementing business logic

SDK Usage Examples

Python Example (from actual project implementation):

from copilot import CopilotClient

# Initialize client
copilot_client = CopilotClient()
await copilot_client.start()

# Create session and load Skill
session = await copilot_client.create_session({
    "model": "claude-sonnet-4.5",
    "streaming": True,
    "skill_directories": ["/path/to/skills/blog/SKILL.md"]
})

# Send task
await session.send_and_wait({
    "prompt": "Write a technical blog about multi-agent systems"
}, timeout=600)

Skill System: Professionalizing Agents

While the SDK provides a powerful execution engine, how do we make agents perform professionally in specific domains? The answer is Skill files. A Skill file is a standardized capability definition containing:
- Capability Declaration: Explicitly tells the system "what I can do" (e.g., blog generation, PPT creation)
- Domain Knowledge: Preset best practices, standards, and terminology guidelines
- Workflow: Defines the complete execution path from input to output
- Output Standards: Ensures generated content meets format and quality requirements

Through the combination of Skill files + SDK, we can build truly professional agents rather than generic "jack-of-all-trades assistants."

3. A2A Protocol: Enabling Seamless Agent Collaboration

Once we have professional agents, the next challenge is: how do we make them work together? This is the core problem the Agent-to-Agent (A2A) Protocol aims to solve.

Three Core Mechanisms of A2A Protocol

1. Agent Discovery (Service Discovery)

Each agent exposes its capability card through the standardized /.well-known/agent-card.json endpoint, acting like a business card that tells other agents "what I can do":

{
  "name": "blog_agent",
  "description": "Blog generation with DeepSearch",
  "primaryKeywords": ["blog", "article", "write"],
  "skills": [{
    "id": "blog_generation",
    "tags": ["blog", "writing"],
    "examples": ["Write a blog about..."]
  }],
  "capabilities": { "streaming": true }
}

2. Intelligent Routing

The Orchestrator matches tasks with agent capabilities through scoring. The project's routing algorithm implements keyword matching and exclusion detection (a minimal sketch of this scoring appears at the end of this A2A section):
- Positive Matching: If a task contains an agent's primaryKeywords, score +0.5
- Negative Exclusion: If a task contains other agents' keywords, score -0.3

This way, when users say "write a blog about cloud native," the system automatically selects the Blog Agent; when they say "create a tech presentation PPT," it routes to the PPT Agent.

3. SSE Streaming (Real-time Streaming)

For time-consuming tasks (like generating a 5000-word blog), A2A uses Server-Sent Events to push real-time progress, allowing users to see the agent working instead of just waiting. This is crucial for user experience.
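As promised above, here is a minimal sketch of the keyword-based routing score. It is a simplification of the project's algorithm, applying only the +0.5 positive-match and -0.3 exclusion rules over each agent card's primaryKeywords; the actual Orchestrator may weigh keywords differently:

def route_task(task: str, agent_cards: list[dict]) -> str:
    """Pick the agent whose capability card best matches the task text."""
    task_lower = task.lower()
    scores = {}
    for card in agent_cards:
        score = 0.0
        # Positive matching: the task mentions this agent's keywords
        if any(kw in task_lower for kw in card["primaryKeywords"]):
            score += 0.5
        # Negative exclusion: the task mentions another agent's keywords
        for other in agent_cards:
            if other is not card and any(kw in task_lower for kw in other["primaryKeywords"]):
                score -= 0.3
        scores[card["name"]] = score
    return max(scores, key=scores.get)

cards = [
    {"name": "blog_agent", "primaryKeywords": ["blog", "article", "write"]},
    {"name": "ppt_agent", "primaryKeywords": ["ppt", "presentation", "slides"]},
]
print(route_task("write a blog about cloud native", cards))  # blog_agent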
4. Cloud Native Deployment: Making Agent Systems Production-Ready

Even the most powerful technology is just a toy if it can't be deployed to production environments. This project demonstrates a complete deployment of a multi-agent system to a cloud-native platform (Azure Container Apps).

Why Choose Cloud Native?
- Elastic Scaling: When blog generation requests surge, the Blog Agent can auto-scale; it scales down to zero during idle times to save costs
- Independent Evolution: Each agent has its own Docker image and deployment pipeline; updating the Blog Agent doesn't affect the PPT Agent
- Fault Isolation: If one agent crashes, it won't bring down the entire system; the Orchestrator automatically degrades
- Global Distribution: Through Azure Container Apps, agents can be deployed across multiple global regions to reduce latency

Container Deployment Essentials

Each agent in the project has a standardized Dockerfile:

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8001
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]

Combined with the deploy-to-aca.sh script, one-click deployment to Azure:

# Build and push image
az acr build --registry myregistry --image blog-agent:latest .

# Deploy to Container Apps
az containerapp create \
  --name blog-agent \
  --resource-group my-rg \
  --environment my-env \
  --image myregistry.azurecr.io/blog-agent:latest \
  --secrets github-token=$COPILOT_TOKEN \
  --env-vars COPILOT_GITHUB_TOKEN=secretref:github-token

5. Real-World Results: From "Works" to "Works Well"

Let's see how this system performs in real scenarios. Suppose a user initiates a request: "Write a technical blog about Kubernetes multi-tenancy security, including code examples and best practices"

System Execution Flow:
1. Orchestrator receives the request and scans all agents' capability cards
2. Keyword matching: "write" + "blog" → Blog Agent scores 1.0, PPT Agent scores 0.0
3. Routes to Blog Agent, loads technical writing Skill
4. Blog Agent initiates DeepSearch to collect latest K8s security materials
5. SSE real-time push: "Collecting materials..." → "Generating outline..." → "Writing content..."
6. Returns complete blog after 5 minutes, including code highlighting, citation sources, and best practices summary

Compared to traditional "omnipotent" AI assistants, this system's advantages:
✅ Professionalism: Blog Agent trained with technical writing Skills produces content with clear structure, accurate terminology, and executable code
✅ Visibility: Users see progress throughout, knowing what the AI is doing
✅ Extensibility: Adding new agents (video script, data analysis) in the future requires no changes to existing architecture

6. Key Technical Challenges and Solutions

Challenge 1: Inaccurate Agent Capability Descriptions Leading to Routing Errors
Solution:
- Define clear primaryKeywords and examples in Agent Cards
- Implement an exclusion detection mechanism to prevent tasks from being routed to unsuitable agents

Challenge 2: Poor User Experience for Long-Running Tasks
Solution:
- Fully adopt SSE streaming, pushing working/completed/error status in real time
- Display progress hints in status messages so users know what the system is doing

Challenge 3: Sensitive Information Leakage Risk
Solution:
- Use Azure Key Vault or Container Apps Secrets to manage GitHub Tokens
- Inject via environment variables, never hardcode in code or images
- Check required environment variables in deployment scripts to prevent configuration errors

7. Future Outlook: SDK-Driven Multi-Agent Ecosystem

This project is just the beginning.
As GitHub Copilot SDK and A2A Protocol mature, we can build richer agent ecosystems: Actual SDK Application Scenarios According to GitHub's official blog, development teams have already used the Copilot SDK to build: YouTube Chapter Generator: Automatically generates timestamped chapter markers for videos Custom Agent GUIs: Visual agent interfaces for specific business scenarios Speech-to-Command Workflows: Control desktop applications through voice AI Battle Games: Interactive competitive experiences with AI Intelligent Summary Tools: Automatic extraction and summarization of key information Multi-Agent System Evolution Directions 🏪 Agent Marketplace: Developers can publish specialized agents (legal documents, medical reports, etc.) that plug-and-play via A2A protocol 🔗 Cascade Orchestration: Orchestrator automatically breaks down complex tasks, calling multiple agents collaboratively (e.g., "write blog + generate images + create PPT") 🌐 Cross-Platform Interoperability: Based on A2A standards, agents developed by different companies can call each other, breaking down data silos ⚙️ Automated Workflows: Delegate routine repetitive work to agent chains, letting humans focus on creative work 🎯 Vertical Domain Specialization: Combined with Skill files, build high-precision agents in professional fields like finance, healthcare, and legal Core Value of the SDK The significance of GitHub Copilot SDK lies in: it empowers every developer to become a builder of agent systems. You don't need deep learning experts, you don't need to implement agent frameworks yourself, and you don't even need to manage GPU clusters. You only need to: Install the SDK (npm install github/copilot-sdk) Define your business logic and tools Write Skill files describing professional capabilities Call the SDK's execution engine And you can build production-grade intelligent agent applications. Summary: From Demo to Production GitHub Copilot SDK + A2A + Cloud Native isn't three independent technology stacks, but a complete methodology: GitHub Copilot SDK provides an out-of-the-box agent execution engine—handling planning, tool orchestration, context management, and other underlying complexity Skill files enable agents with domain-specific professional capabilities—defining best practices, workflows, and output standards A2A Protocol enables standardized communication and collaboration between agents—implementing service discovery, intelligent routing, and streaming Cloud Native makes the entire system production-ready—containerization, elastic scaling, fault isolation For developers, this means we no longer need to build agent frameworks from scratch or struggle with the black magic of prompt engineering. We only need to: Use GitHub Copilot SDK to obtain a production-grade agent execution engine Write domain-specific Skill files to define professional capabilities Follow A2A protocol to implement standard interfaces between agents Deploy to cloud platforms through containerization And we can build AI Agent systems that are truly usable, well-designed, and production-ready. 🚀 Start Building Complete project code is open source: https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/tree/main/code/GitHubCopilotAgents_A2A Follow the README guide and deploy your first Multi-Agent system in 30 minutes! 
References
- GitHub Copilot SDK Official Announcement - Build an agent into any app with the GitHub Copilot SDK
- GitHub Copilot SDK Repository - github.com/github/copilot-sdk
- A2A Protocol Official Specification - a2a-protocol.org/latest/
- Project Source Code - Multi-AI-Agents-Cloud-Native
- Azure Container Apps Documentation - learn.microsoft.com/azure/container-apps

Building Interactive Agent UIs with AG-UI and Microsoft Agent Framework
Introduction Picture this: You've built an AI agent that analyzes financial data. A user uploads a quarterly report and asks: "What are the top three expense categories?" Behind the scenes, your agent parses the spreadsheet, aggregates thousands of rows, and generates visualizations. All in 20 seconds. But the user? They see a loading spinner. Nothing else. No "reading file" message, no "analyzing data" indicator, no hint that progress is being made. They start wondering: Is it frozen? Should I refresh? The problem isn't the agent's capabilities - it's the communication gap between the agent running on the backend and the user interface. When agents perform multi-step reasoning, call external APIs, or execute complex tool chains, users deserve to see what's happening. They need streaming updates, intermediate results, and transparent progress indicators. Yet most agent frameworks force developers to choose between simple request/response patterns or building custom solutions to stream updates to their UIs. This is where AG-UI comes in. AG-UI is a fairly new event-based protocol that standardizes how agents communicate with user interfaces. Instead of every framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of structured events that work consistently across different agent implementations. When an agent starts processing, calls a tool, generates text, or encounters an error, the UI receives explicit, typed events in real time. The beauty of AG-UI is its framework-agnostic design. While this blog post demonstrates integration with Microsoft Agent Framework (MAF), the same AG-UI protocol works with LangGraph, CrewAI, or any other compliant framework. Write your UI code once, and it works with any AG-UI-compliant backend. (Note: MAF supports both Python and .NET - this blog post focuses on the Python implementation.) TL;DR The Problem: Users don't get real-time updates while AI agents work behind the scenes - no progress indicators, no transparency into tool calls, and no insight into what's happening. The Solution: AG-UI is an open, event-based protocol that standardizes real-time communication between AI agents and user interfaces. Instead of each development team and framework inventing custom streaming solutions, AG-UI provides a shared vocabulary of structured events (like TOOL_CALL_START, TEXT_MESSAGE_CONTENT, RUN_FINISHED) that work across any compliant framework. Key Benefits: Framework-agnostic - Write UI code once, works with LangGraph, Microsoft Agent Framework, CrewAI, and more Real-time observability - See exactly what your agent is doing as it happens Server-Sent Events - Built on standard HTTP for universal compatibility Protocol-managed state - No manual conversation history tracking In This Post: You'll learn why AG-UI exists, how it works, and build a complete working application using Microsoft Agent Framework with Python - from server setup to client implementation. 
What You'll Learn This blog post walks through: Why AG-UI exists - how agent-UI communication has evolved and what problems current approaches couldn't solve How the protocol works - the key design choices that make AG-UI simple, reliable, and framework-agnostic Protocol architecture - the generic components and how AG-UI integrates with agent frameworks Building an AG-UI application - a complete working example using Microsoft Agent Framework with server, client, and step-by-step setup Understanding events - what happens under the hood when your agent runs and how to observe it Thinking in events - how building with AG-UI differs from traditional APIs, and what benefits this brings Making the right choice - when AG-UI is the right fit for your project and when alternatives might be better Estimated reading time: 15 minutes Who this is for: Developers building AI agents who want to provide real-time feedback to users, and teams evaluating standardized approaches to agent-UI communication To appreciate why AG-UI matters, we need to understand the journey that led to its creation. Let's trace how agent-UI communication has evolved through three distinct phases. The Evolution of Agent-UI Communication AI agents have become more capable over time. As they evolved, the way they communicated with user interfaces had to evolve as well. Here's how this evolution unfolded. Phase 1: Simple Request/Response In the early days of AI agent development, the interaction model was straightforward: send a question, wait for an answer, display the result. This synchronous approach mirrored traditional API calls and worked fine for simple scenarios. # Simple, but limiting response = agent.run("What's the weather in Paris?") display(response) # User waits... and waits... Works for: Quick queries that complete in seconds, simple Q&A interactions where immediate feedback and interactivity aren't critical. Breaks down: When agents need to call multiple tools, perform multi-step reasoning, or process complex queries that take 30+ seconds. Users see nothing but a loading spinner, with no insight into what's happening or whether the agent is making progress. This creates a poor user experience and makes it impossible to show intermediate results or allow user intervention. Recognizing these limitations, development teams began experimenting with more sophisticated approaches. Phase 2: Custom Streaming Solutions As agents became more sophisticated, teams recognized the need for incremental feedback and interactivity. Rather than waiting for the complete response, they implemented custom streaming solutions to show partial results as they became available. # Every team invents their own format for chunk in agent.stream("What's the weather?"): display(chunk) # But what about tool calls? Errors? Progress? This was a step forward for building interactive agent UIs, but each team solved the problem differently. Also, different frameworks had incompatible approaches - some streamed only text tokens, others sent structured JSON, and most provided no visibility into critical events like tool calls or errors. 
The problem:
- No standardization across frameworks - client code that works with LangGraph won't work with CrewAI, requiring separate implementations for each agent backend
- Each implementation handles tool calls differently - some send nothing during tool execution, others send unstructured messages
- Complex state management - clients must track conversation history, manage reconnections, and handle edge cases manually

The industry needed a better solution - a common protocol that could work across all frameworks while maintaining the benefits of streaming.

Phase 3: Standardized Protocol (AG-UI)

AG-UI emerged as a response to the fragmentation problem. Instead of each framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of events that work consistently across different agent implementations.

# Standardized events everyone understands
async for event in agent.run_stream("What's the weather?"):
    if event.type == "TEXT_MESSAGE_CONTENT":
        display_text(event.delta)
    elif event.type == "TOOL_CALL_START":
        show_tool_indicator(event.tool_name)
    elif event.type == "TOOL_CALL_RESULT":
        show_tool_result(event.result)

The key difference is structured observability. Rather than guessing what the agent is doing from unstructured text, clients receive explicit events for every stage of execution: when the agent starts, when it generates text, when it calls a tool, when that tool completes, and when the entire run finishes.

What's different: A standardized vocabulary of event types, complete observability into agent execution, and framework-agnostic clients that work with any AG-UI-compliant backend. You write your UI code once, and it works whether the backend uses Microsoft Agent Framework, LangGraph, or any other framework that speaks AG-UI.

Now that we've seen why AG-UI emerged and what problems it solves, let's examine the specific design decisions that make the protocol work. These choices weren't arbitrary - each one addresses concrete challenges in building reliable, observable agent-UI communication.

The Design Decisions Behind AG-UI

Why Server-Sent Events (SSE)?

| Aspect         | WebSockets            | SSE (AG-UI)              |
|----------------|-----------------------|--------------------------|
| Complexity     | Bidirectional         | Unidirectional (simpler) |
| Firewall/Proxy | Sometimes blocked     | Standard HTTP            |
| Reconnection   | Manual implementation | Built-in browser support |
| Use case       | Real-time games, chat | Agent responses (one-way)|

For agent interactions, you typically only need server→client communication, making SSE a simpler choice. SSE solves the transport problem - how events travel from server to client. But once connected, how does the protocol handle conversation state across multiple interactions?

Why Protocol-Managed Threads?

# Without protocol threads (client manages):
conversation_history = []
conversation_history.append({"role": "user", "content": message})
response = agent.complete(conversation_history)
conversation_history.append({"role": "assistant", "content": response})
# Complex, error-prone, doesn't work with multiple clients

# With AG-UI (protocol manages):
thread = agent.get_new_thread()  # Server creates and manages thread
agent.run_stream(message, thread=thread)  # Server maintains context
# Simple, reliable, shareable across clients

With transport and state management handled, the final piece is the actual messages flowing through the connection. What information should the protocol communicate, and how should it be structured?

Why Standardized Event Types?
Instead of parsing unstructured text, clients get typed events: RUN_STARTED - Agent begins (start loading UI) TEXT_MESSAGE_CONTENT - Text chunk (stream to user) TOOL_CALL_START - Tool invoked (show "searching...", "calculating...") TOOL_CALL_RESULT - Tool finished (show result, update UI) RUN_FINISHED - Complete (hide loading) This lets UIs react intelligently without custom parsing logic. Now that we understand the protocol's design choices, let's see how these pieces fit together in a complete system. Architecture Overview Here's how the components interact: The communication between these layers relies on a well-defined set of event types. Here are the core events that flow through the SSE connection: Core Event Types AG-UI provides a standardized set of event types to describe what's happening during an agent's execution: RUN_STARTED - agent begins execution TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END - streaming segments of text TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END, TOOL_CALL_RESULT - tool execution events RUN_FINISHED - agent has finished execution RUN_ERROR - error information This model lets the UI update as the agent runs, rather than waiting for the final response. The generic architecture above applies to any AG-UI implementation. Now let's see how this translates to Microsoft Agent Framework. AG-UI with Microsoft Agent Framework While AG-UI is framework-agnostic, this blog post demonstrates integration with Microsoft Agent Framework (MAF) using Python. MAF is available in both Python and .NET, giving you flexibility to build AG-UI applications in your preferred language. Understanding how MAF implements the protocol will help you build your own applications or work with other compliant frameworks. Integration Architecture The Microsoft Agent Framework integration involves several specialized layers that handle protocol translation and execution orchestration: Understanding each layer: FastAPI Endpoint - Handles HTTP requests and establishes SSE connections for streaming AgentFrameworkAgent - Protocol wrapper that translates between AG-UI events and Agent Framework operations Orchestrators - Manage execution flow, coordinate tool calling sequences, and handle state transitions ChatAgent - Your agent implementation with instructions, tools, and business logic ChatClient - Interface to the underlying language model (Azure OpenAI, OpenAI, or other providers) The good news? When you call add_agent_framework_fastapi_endpoint, all the middleware layers are configured automatically. You simply provide your ChatAgent, and the integration handles protocol translation, event streaming, and state management behind the scenes. Now that we understand both the protocol architecture and the Microsoft Agent Framework integration, let's build a working application. Hands-On: Building Your First AG-UI Application This section demonstrates how to build an AG-UI server and client using Microsoft Agent Framework and FastAPI. Prerequisites Before building your first AG-UI application, ensure you have: Python 3.10 or later installed Basic understanding of async/await patterns in Python Azure CLI installed and authenticated (az login) Azure OpenAI service endpoint and deployment configured (setup guide) Cognitive Services OpenAI Contributor role for your Azure OpenAI resource You'll also need to install the AG-UI integration package: pip install agent-framework-ag-ui --pre This automatically installs agent-framework-core, fastapi, and uvicorn as dependencies. 
With your environment configured, let's create the server that will host your agent and expose it via the AG-UI protocol. Building the Server Let's create a FastAPI server that hosts an AI agent and exposes it via AG-UI: # server.py import os from typing import Annotated from dotenv import load_dotenv from fastapi import FastAPI from pydantic import Field from agent_framework import ChatAgent, ai_function from agent_framework.azure import AzureOpenAIChatClient from agent_framework_ag_ui import add_agent_framework_fastapi_endpoint from azure.identity import DefaultAzureCredential # Load environment variables from .env file load_dotenv() # Validate environment configuration openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") model_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME") if not openai_endpoint: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_ENDPOINT") if not model_deployment: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_DEPLOYMENT_NAME") # Define tools the agent can use @ai_function def get_order_status( order_id: Annotated[str, Field(description="The order ID to look up (e.g., ORD-001)")] ) -> dict: """Look up the status of a customer order. Returns order status, tracking number, and estimated delivery date. """ # Simulated order lookup orders = { "ORD-001": {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}, "ORD-002": {"status": "processing", "tracking": None, "eta": "Jan 23, 2026"}, "ORD-003": {"status": "delivered", "tracking": "1Z999AA3", "eta": "Delivered Jan 20"}, } return orders.get(order_id, {"status": "not_found", "message": "Order not found"}) # Initialize Azure OpenAI client chat_client = AzureOpenAIChatClient( credential=DefaultAzureCredential(), endpoint=openai_endpoint, deployment_name=model_deployment, ) # Configure the agent with custom instructions and tools agent = ChatAgent( name="CustomerSupportAgent", instructions="""You are a helpful customer support assistant. You have access to a get_order_status tool that can look up order information. IMPORTANT: When a user mentions an order ID (like ORD-001, ORD-002, etc.), you MUST call the get_order_status tool to retrieve the actual order details. Do NOT make up or guess order information. After calling get_order_status, provide the actual results to the user in a friendly format.""", chat_client=chat_client, tools=[get_order_status], ) # Initialize FastAPI application app = FastAPI( title="AG-UI Customer Support Server", description="Interactive AI agent server using AG-UI protocol with tool calling" ) # Mount the AG-UI endpoint add_agent_framework_fastapi_endpoint(app, agent, path="/chat") def main(): """Entry point for the AG-UI server.""" import uvicorn print("Starting AG-UI server on http://localhost:8000") uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info") # Run the application if __name__ == "__main__": main() What's happening here: We define a tool: get_order_status with the AI_function decorator Use Annotated and Field for parameter descriptions to help the agent understand when and how to use the tool We create an Azure OpenAI chat client with credential authentication The ChatAgent is configured with domain-specific instructions and the tools parameter add_agent_framework_fastapi_endpoint automatically handles SSE streaming and tool execution The server exposes the agent at the /chat endpoint Note: This example uses Azure OpenAI, but AG-UI works with any chat model. 
You can also integrate with Azure AI Foundry's model catalog or use other LLM providers. Tool calling is supported by most modern LLMs including GPT-4, GPT-4o, and Claude models. To run this server: # Set your Azure OpenAI credentials export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o" # Start the server python server.py With your server running and exposing the AG-UI endpoint, the next step is building a client that can connect and consume the event stream. Streaming Results to Clients With the server running, clients can connect and stream events as the agent processes requests. Here's a Python client that demonstrates the streaming capabilities: # client.py import asyncio import os from dotenv import load_dotenv from agent_framework import ChatAgent, FunctionCallContent, FunctionResultContent from agent_framework_ag_ui import AGUIChatClient # Load environment variables from .env file load_dotenv() async def interactive_chat(): """Interactive chat session with streaming responses.""" # Connect to the AG-UI server base_url = os.getenv("AGUI_SERVER_URL", "http://localhost:8000/chat") print(f"Connecting to: {base_url}\n") # Initialize the AG-UI client client = AGUIChatClient(endpoint=base_url) # Create a local agent representation agent = ChatAgent(chat_client=client) # Start a new conversation thread conversation_thread = agent.get_new_thread() print("Chat started! Type 'exit' or 'quit' to end the session.\n") try: while True: # Collect user input user_message = input("You: ") # Handle empty input if not user_message.strip(): print("Please enter a message.\n") continue # Check for exit commands if user_message.lower() in ["exit", "quit", "bye"]: print("\nGoodbye!") break # Stream the agent's response print("Agent: ", end="", flush=True) # Track tool calls to avoid duplicate prints seen_tools = set() async for update in agent.run_stream(user_message, thread=conversation_thread): # Display text content if update.text: print(update.text, end="", flush=True) # Display tool calls and results for content in update.contents: if isinstance(content, FunctionCallContent): # Only print each tool call once if content.call_id not in seen_tools: seen_tools.add(content.call_id) print(f"\n[Calling tool: {content.name}]", flush=True) elif isinstance(content, FunctionResultContent): # Only print each result once result_id = f"result_{content.call_id}" if result_id not in seen_tools: seen_tools.add(result_id) result_text = content.result if isinstance(content.result, str) else str(content.result) print(f"[Tool result: {result_text}]", flush=True) print("\n") # New line after response completes except KeyboardInterrupt: print("\n\nChat interrupted by user.") except ConnectionError as e: print(f"\nConnection error: {e}") print("Make sure the server is running.") except Exception as e: print(f"\nUnexpected error: {e}") def main(): """Entry point for the AG-UI client.""" asyncio.run(interactive_chat()) if __name__ == "__main__": main() Key features: The client connects to the AG-UI endpoint using AGUIChatClient with the endpoint parameter run_stream() yields updates containing text and content as they arrive Tool calls are detected using FunctionCallContent and displayed with [Calling tool: ...] Tool results are detected using FunctionResultContent and displayed with [Tool result: ...] 
Deduplication logic (seen_tools set) prevents printing the same tool call multiple times as it streams Thread management maintains conversation context across messages Graceful error handling for connection issues To use the client: # Optional: specify custom server URL export AGUI_SERVER_URL="http://localhost:8000/chat" # Start the interactive chat python client.py Example Session: Connecting to: http://localhost:8000/chat Chat started! Type 'exit' or 'quit' to end the session. You: What's the status of order ORD-001? Agent: [Calling tool: get_order_status] [Tool result: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}] Your order ORD-001 has been shipped! - Tracking Number: 1Z999AA1 - Estimated Delivery Date: January 25, 2026 You can use the tracking number to monitor the delivery progress. You: Can you check ORD-002? Agent: [Calling tool: get_order_status] [Tool result: {"status": "processing", "tracking": null, "eta": "Jan 23, 2026"}] Your order ORD-002 is currently being processed. - Status: Processing - Estimated Delivery: January 23, 2026 Your order should ship soon, and you'll receive a tracking number once it's on the way. You: exit Goodbye! The client we just built handles events at a high level, abstracting away the details. But what's actually flowing through that SSE connection? Let's peek under the hood. Event Types You'll See As the server streams back responses, clients receive a series of structured events. If you were to observe the raw SSE stream (e.g., using curl), you'd see events like: curl -N http://localhost:8000/chat \ -H "Content-Type: application/json" \ -H "Accept: text/event-stream" \ -d '{"messages": [{"role": "user", "content": "What'\''s the status of order ORD-001?"}]}' Sample event stream (with tool calling): data: {"type":"RUN_STARTED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} data: {"type":"TEXT_MESSAGE_START","messageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c","role":"assistant"} data: {"type":"TOOL_CALL_START","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","toolCallName":"get_order_status","parentMessageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"{\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"order"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"_id"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\":\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"ORD"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"-"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"001"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\"}"} data: {"type":"TOOL_CALL_END","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y"} data: {"type":"TOOL_CALL_RESULT","messageId":"f048cb0a-a049-4a51-9403-a05e4820438a","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","content":"{\"status\": \"shipped\", \"tracking\": \"1Z999AA1\", \"eta\": \"Jan 25, 2026\"}","role":"tool"} data: {"type":"TEXT_MESSAGE_START","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","role":"assistant"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"Your"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" order"} data: 
{"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" ORD"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"-"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"001"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" has"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" been"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" shipped"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"!"} ... (additional TEXT_MESSAGE_CONTENT events streaming the response) ... data: {"type":"TEXT_MESSAGE_END","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf"} data: {"type":"RUN_FINISHED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} Understanding the flow: RUN_STARTED - Agent begins processing the request TEXT_MESSAGE_START - First message starts (will contain tool calls) TOOL_CALL_START - Agent invokes the get_order_status tool Multiple TOOL_CALL_ARGS events - Arguments stream incrementally as JSON chunks ({"order_id":"ORD-001"}) TOOL_CALL_END - Tool invocation structure complete TOOL_CALL_RESULT - Tool execution finished with result data TEXT_MESSAGE_START - Second message starts (the final response) Multiple TEXT_MESSAGE_CONTENT events - Response text streams word-by-word TEXT_MESSAGE_END - Response message complete RUN_FINISHED - Entire run completed successfully This granular event model enables rich UI experiences - showing tool execution indicators ("Searching...", "Calculating..."), displaying intermediate results, and providing complete transparency into the agent's reasoning process. Seeing the raw events helps, but truly working with AG-UI requires a shift in how you think about agent interactions. Let's explore this conceptual change. The Mental Model Shift Traditional API Thinking # Imperative: Call and wait response = agent.run("What's 2+2?") print(response) # "The answer is 4" Mental model: Function call with return value AG-UI Thinking # Reactive: Subscribe to events async for event in agent.run_stream("What's 2+2?"): match event.type: case "RUN_STARTED": show_loading() case "TEXT_MESSAGE_CONTENT": display_chunk(event.delta) case "RUN_FINISHED": hide_loading() Mental model: Observable stream of events This shift feels similar to: Moving from synchronous to async code Moving from REST to event-driven architecture Moving from polling to pub/sub This mental shift isn't just philosophical - it unlocks concrete benefits that weren't possible with request/response patterns. What You Gain Observability # You can SEE what the agent is doing TOOL_CALL_START: "get_order_status" TOOL_CALL_ARGS: {"order_id": "ORD-001"} TOOL_CALL_RESULT: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"} TEXT_MESSAGE_START: "Your order ORD-001 has been shipped..." Interruptibility # Future: Cancel long-running operations async for event in agent.run_stream(query): if user_clicked_cancel: await agent.cancel(thread_id, run_id) break Transparency # Users see the reasoning process "Looking up order ORD-001..." "Order found: Status is 'shipped'" "Retrieving tracking information..." "Your order has been shipped with tracking number 1Z999AA1..." 
To put these benefits in context, here's how AG-UI compares to traditional approaches across key dimensions:

AG-UI vs. Traditional Approaches

Aspect | Traditional REST | Custom Streaming | AG-UI
Connection Model | Request/Response | Varies | Server-Sent Events
State Management | Manual | Manual | Protocol-managed
Tool Calling | Invisible | Custom format | Standardized events
Framework | Varies | Framework-locked | Framework-agnostic
Browser Support | Universal | Varies | Universal
Implementation | Simple | Complex | Moderate
Ecosystem | N/A | Isolated | Growing

You've now seen AG-UI's design principles, implementation details, and conceptual foundations. But the most important question remains: should you actually use it?

Conclusion: Is AG-UI Right for Your Project?

AG-UI represents a shift toward standardized, observable agent interactions. Before adopting it, understand where the protocol stands and whether it fits your needs.

Protocol Maturity

The protocol is stable enough for production use but still evolving. Ready now: the core specification is stable, the Microsoft Agent Framework integration is available, the FastAPI/Python implementation is mature, and basic streaming and threading work reliably.

Choose AG-UI If You
- Are building new agent projects - no legacy API to maintain, and you want future compatibility with the emerging ecosystem
- Need streaming observability - multi-step workflows where users benefit from seeing each stage of execution
- Want framework flexibility - the same client code works with any AG-UI-compliant backend
- Are comfortable with evolving standards - you can adapt to protocol changes as it matures

Stick with Alternatives If You
- Have working solutions - custom streaming works well and the migration cost isn't justified
- Need guaranteed stability - mission-critical systems where breaking changes are unacceptable
- Build simple agents - single-step request/response without tool calling or streaming needs
- Work in a risk-averse environment - large existing implementations where proven approaches are required

Beyond individual project decisions, it's worth considering AG-UI's role in the broader ecosystem.

The Bigger Picture

While this blog post focused on Microsoft Agent Framework, AG-UI's true power lies in its broader mission: creating a common language for agent-UI communication across the entire ecosystem. As more frameworks adopt it, the real value emerges: write your UI once, work with any compliant agent framework. Think of it like GraphQL for APIs or OpenAPI for REST - a standardization layer that benefits the entire ecosystem.

The protocol is young, but the problem it solves is real. Whether you adopt it now or wait for broader adoption, understanding AG-UI helps you make informed architectural decisions for your agent applications.

Ready to dive deeper? Here are the official resources to continue your AG-UI journey.
Resources AG-UI & Microsoft Agent Framework Getting Started with AG-UI (Microsoft Learn) - Official tutorial AG-UI Integration Overview - Architecture and concepts AG-UI Protocol Specification - Official protocol documentation Backend Tool Rendering - Adding function tools Security Considerations - Production security guidance Microsoft Agent Framework Documentation - Framework overview AG-UI Dojo Examples - Live demonstrations UI Components & Integration CopilotKit for Microsoft Agent Framework - React component library Community & Support Microsoft Q&A - Community support Agent Framework GitHub - Source code and issues Related Technologies Azure AI Foundry Documentation - Azure AI platform FastAPI Documentation - Web framework Server-Sent Events (SSE) Specification - Protocol standard This blog post introduces AG-UI with Microsoft Agent Framework, focusing on fundamental concepts and building your first interactive agent application.Building Agents with GitHub Copilot SDK: A Practical Guide to Automated Tech Update Tracking
Introduction In the rapidly evolving tech landscape, staying on top of key project updates is crucial. This article explores how to leverage GitHub's newly released Copilot SDK to build intelligent agent systems, featuring a practical case study on automating daily update tracking and analysis for Microsoft's Agent Framework. GitHub Copilot SDK: Embedding AI Capabilities into Any Application SDK Overview On January 22, 2026, GitHub officially launched the GitHub Copilot SDK technical preview, marking a new era in AI agent development. The SDK provides these core capabilities: Production-grade execution loop: The same battle-tested agentic engine powering GitHub Copilot CLI Multi-language support: Node.js, Python, Go, and .NET Multi-model routing: Flexible model selection for different tasks MCP server integration: Native Model Context Protocol support Real-time streaming: Support for streaming responses and live interactions Tool orchestration: Automated tool invocation and command execution Core Advantages Building agentic workflows from scratch presents numerous challenges: Context management across conversation turns Orchestrating tools and commands Routing between models Handling permissions, safety boundaries, and failure modes The Copilot SDK encapsulates all this complexity. As Mario Rodriguez, GitHub's Chief Product Officer, explains: "The SDK takes the agentic power of Copilot CLI and makes it available in your favorite programming language... GitHub handles authentication, model management, MCP servers, custom agents, and chat sessions plus streaming. That means you are in control of what gets built on top of those building blocks." Quick Start Examples Here's a simple TypeScript example using the Copilot SDK: import { CopilotClient } from "@github/copilot-sdk"; const client = new CopilotClient(); await client.start(); const session = await client.createSession({ model: "gpt-5", }); await session.send({ prompt: "Hello, world!" }); And in Python, it's equally straightforward: from copilot import CopilotClient client = CopilotClient() await client.start() session = await client.create_session({ "model": "claude-sonnet-4.5", "streaming": True, "skill_directories": ["./.copilot_skills/pr-analyzer/SKILL.md"] }) await session.send_and_wait({ "prompt": "Analyze PRs from microsoft/agent-framework merged yesterday" }) Real-World Case Study: Automated Agent Framework Daily Updates Project Background agent-framework-update-everyday is an automated system built with GitHub Copilot SDK and CLI that tracks daily code changes in Microsoft's Agent Framework and generates high-quality technical blog posts. 
System Architecture The project leverages the following technology stack: GitHub Copilot CLI (@github/copilot): Command-line AI capabilities GitHub Copilot SDK (github-copilot-sdk): Programmatic AI interactions Copilot Skills: Custom PR analysis behaviors GitHub Actions: CI/CD automation pipeline Core Workflow The system runs fully automated via GitHub Actions, executing Monday through Friday at UTC 00:00 with these steps: Step Action Description 1 Checkout repository Clone the repo using actions/checkout@v4 2 Setup Node.js Configure Node.js 22 environment for Copilot CLI 3 Install Copilot CLI Install via npm i -g github/copilot 4 Setup Python Configure Python 3.11 environment 5 Install Python dependencies Install github-copilot-sdk package 6 Run PR Analysis Execute pr_trigger_v2.py with Copilot authentication 7 Commit and push Auto-commit generated blog posts to repository Technical Implementation Details 1. Copilot Skill Definition The project uses a custom Copilot Skill (.copilot_skills/pr-analyzer/SKILL.md) to define: PR analysis behavior patterns Blog post structure requirements Breaking changes priority strategy Code snippet extraction rules This skill-based approach enables the AI agent to focus on domain-specific tasks and produce higher-quality outputs. 2. Python SDK Integration The core script pr_trigger_v2.py demonstrates Python SDK usage: from copilot import CopilotClient # Initialize client client = CopilotClient() await client.start() # Create session with model and skill specification session = await client.create_session({ "model": "claude-sonnet-4.5", "streaming": True, "skill_directories": ["./.copilot_skills/pr-analyzer/SKILL.md"] }) # Send analysis request await session.send_and_wait({ "prompt": "Analyze PRs from microsoft/agent-framework merged yesterday" }) 3. CI/CD Integration The GitHub Actions workflow (.github/workflows/daily-pr-analysis.yml) ensures automated execution: name: Daily PR Analysis on: schedule: - cron: '0 0 * * 1-5' # Monday-Friday at UTC 00:00 workflow_dispatch: # Support manual triggers jobs: analyze: runs-on: ubuntu-latest steps: - name: Setup and Run Analysis env: COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }} run: | npm i -g github/copilot pip install github-copilot-sdk --break-system-packages python pr_trigger_v2.py Output Results The system automatically generates structured blog posts saved in the blog/ directory with naming convention: blog/agent-framework-pr-summary-{YYYY-MM-DD}.md Each post includes: Breaking Changes (highlighted first) Major Updates (with code examples) Minor Updates and Bug Fixes Summary and impact assessment Latest Advancements in GitHub Copilot CLI Released alongside the SDK, Copilot CLI has also received major updates, making it an even more powerful development tool: Enhanced Core Capabilities Persistent Memory: Cross-session context retention and intelligent compaction Multi-Model Collaboration: Choose different models for explore, plan, and review workflows Autonomous Execution: Custom agent support Agent skill system Full MCP support Async task delegation Real-World Applications Development teams have already built innovative applications using the SDK: YouTube chapter generators Custom GUI interfaces for agents Speech-to-command workflows for desktop apps Games where you compete with AI Content summarization tools These examples showcase the flexibility and power of the Copilot SDK. 
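As a side note on the output step described above, the date-stamped file name is easy to reproduce. Here is a small illustrative sketch (a hypothetical helper, not code from the project) of building that path in Python:

import datetime
import pathlib

# Hypothetical helper: compute the output path following the project's naming convention
today = datetime.date.today().isoformat()  # e.g. "2026-01-23"
post_path = pathlib.Path("blog") / f"agent-framework-pr-summary-{today}.md"
post_path.parent.mkdir(exist_ok=True)      # ensure the blog/ directory exists
post_path.write_text("# Agent Framework PR Summary\n")  # placeholder content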
SDK vs CLI: Complementary, Not Competing

Understanding the relationship between SDK and CLI is important:
- CLI: An interactive tool for end users, providing a complete development experience
- SDK: A programmable layer for developers to build customized applications

The SDK essentially provides programmatic access to the CLI's core capabilities, enabling developers to:
- Integrate Copilot agent capabilities into any environment
- Build graphical user interfaces
- Create personal productivity tools
- Run custom internal agents in enterprise workflows

GitHub handles the underlying authentication, model management, MCP servers, and session management, while developers focus on building value on top of these building blocks.

Best Practices and Recommendations

Based on experience from the agent-framework-update-everyday project, here are practical recommendations:

1. Leverage Copilot Skills Effectively
Define clear skill files that specify:
- Input and output formats for tasks
- Rules for handling edge cases
- Quality standards and priorities

2. Choose Models Wisely
Use different models for different tasks:
- Exploratory tasks: use more powerful models (e.g., GPT-5)
- Execution tasks: use faster models (e.g., Claude Sonnet)
- Cost-sensitive tasks: balance performance and budget

3. Implement Robust Error Handling
AI calls in CI/CD environments need to consider:
- Network timeout and retry strategies
- API rate limit handling
- Output validation and fallback mechanisms

4. Secure Authentication Management
Use fine-grained Personal Access Tokens (PAT):
- Create dedicated Copilot access tokens
- Set the minimum permission scope (Copilot Requests: Read)
- Store them securely using GitHub Secrets

5. Version Control and Traceability
Automated systems should:
- Log metadata for each execution
- Preserve historical outputs for comparison
- Implement auditable change tracking

Future Outlook

The release of the GitHub Copilot SDK marks the democratization of AI agent development. Developers can now:
- Lower development barriers: no need to deeply understand complex AI infrastructure
- Accelerate innovation: focus on business logic rather than underlying implementation
- Integrate flexibly: embed AI capabilities into any application scenario
- Ship production-ready systems: leverage proven execution loops and security mechanisms

As the SDK moves from technical preview to general availability, we can expect:
- Official support for more languages
- A richer tool ecosystem
- More powerful MCP integration capabilities
- Community-driven best practice libraries

Conclusion

This article demonstrates how to build practical automation systems with the GitHub Copilot SDK through the agent-framework-update-everyday project. The case study not only validates the SDK's technical capabilities but, more importantly, showcases a new development paradigm: using AI agents as programmable building blocks, integrated into daily development workflows, to free up developer creativity.

Whether you want to build personal productivity tools, enterprise internal agents, or innovative AI applications, the Copilot SDK provides a solid technical foundation. Visit github/copilot-sdk to start your AI agent journey today!

Reference Resources
- GitHub Copilot SDK Official Repository
- Agent Framework Update Everyday Project
- GitHub Copilot CLI Documentation
- Microsoft Agent Framework
- Build an agent into any app with the GitHub Copilot SDK