llm
40 TopicsAI Toolkit Extension Pack for Visual Studio Code: Ignite 2025 Update
Unlock the Latest Agentic App Capabilities The Ignite 2025 update delivers a major leap forward for the AI Toolkit extension pack in VS Code, introducing a unified, end-to-end environment for building, visualizing, and deploying agentic applications to Microsoft Foundry, and the addition of Anthropic’s frontier Claude models in the Model Catalog! This release enables developers to build and debug locally in VS Code, then deploy to the cloud with a single click. Seamlessly switch between VS Code and the Foundry portal for visualization, orchestration, and evaluation, creating a smooth roundtrip workflow that accelerates innovation and delivers a truly unified AI development experience. Download the http://aka.ms/aitoolkit today and start building next-generation agentic apps in VS Code! What Can You Do with the AI Toolkit Extension Pack? Access Anthropic models in the Model Catalog Following the Microsoft, NVIDIA and Anthropic strategic partnerships announcement today, we are excited to share that Anthropic’s frontier Claude models including Claude Sonnet 4.5, Claude Opus 4.1, and Claude Haiku 4.5, are now integrated into the AI Toolkit, providing even more choices and flexibility when building intelligent applications and AI agents. Build AI Agents Using GitHub Copilot Scaffold agent applications using best-practice patterns, tool-calling examples, tracing hooks, and test scaffolds, all powered by Copilot and aligned with the Microsoft Agent Framework. Generate agent code in Python or .NET, giving you flexibility to target your preferred runtime. Build and Customize YAML Workflows Design YAML-based workflows in the Foundry portal, then continue editing and testing directly in VS Code. To customize your YAML-based workflows, instantly convert it to Agent Framework code using GitHub Copilot. Upgrade from declarative design to code-first customization without starting from scratch. Visualize Multi-Agent Workflows Envision your code-based agent workflows with an interactive graph visualizer that reveals each component and how they connect Watch in real-time how each node lights up as you run your agent. Use the visualizer to understand and debug complex agent graphs, making iteration fast and intuitive. Experiment, Debug, and Evaluate Locally Use the Hosted Agents Playground to quickly interact with your agents on your development machine. Leverage local tracing support to debug reasoning steps, tool calls, and latency hotspots—so you can quickly diagnose and fix issues. Define metrics, tasks, and datasets for agent evaluation, then implement metrics using the Foundry Evaluation SDK and orchestrate evaluations runs with the help of Copilot. Seamless Integration Across Environments Jump from Foundry Portal to VS Code Web for a development environment in your preferred code editor setting. Open YAML workflows, playgrounds, and agent templates directly in VS Code for editing and deployment. How to Get Started Install the AI Toolkit extension pack from the VS Code marketplace. Check out documentation. Get started with building workflows with Microsoft Foundry in VS Code 1. Work with Hosted (Pro-code) Agent workflows in VS Code 2. Work with Declarative (Low-code) Agent workflows in VS Code Feedback & Support Try out the extensions and let us know what you think! File issues or feedback on our GitHub repo for Foundry extension and AI Toolkit extension. Your input helps us make continuous improvements.2.7KViews4likes0CommentsGitHub Copilot SDK and Hybrid AI in Practice: Automating README to PPT Transformation
Introduction In today's rapidly evolving AI landscape, developers often face a critical choice: should we use powerful cloud-based Large Language Models (LLMs) that require internet connectivity, or lightweight Small Language Models (SLMs) that run locally but have limited capabilities? The answer isn't either-or—it's hybrid models—combining the strengths of both to create AI solutions that are secure, efficient, and powerful. This article explores hybrid model architectures through the lens of GenGitHubRepoPPT, demonstrating how to elegantly combine Microsoft Foundry Local, GitHub Copilot SDK, and other technologies to automatically generate professional PowerPoint presentations from GitHub README files. 1. Hybrid Model Scenarios and Value 1.1 What Are Hybrid Models? Hybrid AI Models strategically combine locally-running Small Language Models (SLMs) with cloud-based Large Language Models (LLMs) within the same application, selecting the most appropriate model for each task based on its unique characteristics. Core Principles: Local Processing for Sensitive Data: Privacy-critical content analysis happens on-device Cloud for Value Creation: Complex reasoning and creative generation leverage cloud power Balancing Cost and Performance: High-frequency, simple tasks run locally to minimize API costs 1.2 Typical Hybrid Model Use Cases Use Case Local SLM Role Cloud LLM Role Value Proposition Intelligent Document Processing Text extraction, structural analysis Content refinement, format conversion Privacy protection + Professional output Code Development Assistant Syntax checking, code completion Complex refactoring, architecture advice Fast response + Deep insights Customer Service Systems Intent recognition, FAQ handling Complex issue resolution Reduced latency + Enhanced quality Content Creation Platforms Keyword extraction, outline generation Article writing, multilingual translation Cost control + Creative assurance 1.3 Why Choose Hybrid Models? Three Core Advantages: Privacy and Security Sensitive data never leaves local devices Compliant with GDPR, HIPAA, and other regulations Ideal for internal corporate documents and personal information Cost Optimization Reduces cloud API call frequency Local models have zero usage fees Predictable operational costs Performance and Reliability Local processing eliminates network latency Partial functionality in offline environments Cloud models ensure high-quality output 2. Core Technology Analysis 2.1 Large Language Models (LLMs): Cloud Intelligence Representatives What are LLMs? Large Language Models are deep learning-based natural language processing models, typically with billions to trillions of parameters. Through training on massive text datasets, they've acquired powerful language understanding and generation capabilities. Representative Models: Claude Sonnet 4.5: Anthropic's flagship model, excelling at long-context processing and complex reasoning GPT-5.2 Series: OpenAI's general-purpose language models Gemini: Google's multimodal large models LLM Advantages: ✅ Exceptional text generation quality ✅ Powerful contextual understanding ✅ Support for complex reasoning tasks ✅ Continuous model updates and optimization Typical Applications: Professional document writing (technical reports, business plans) Code generation and refactoring Multilingual translation Creative content creation 2.2 Small Language Models (SLMs) and Microsoft Foundry Local 2.2.1 SLM Characteristics Small Language Models typically have 1B-7B parameters, designed specifically for resource-constrained environments. Mainstream SLM Model Families: Microsoft Phi Family (Phi Family): Inference-optimized efficient models Alibaba Qwen Family (Qwen Family): Excellent Chinese language capabilities Mistral Series: Outstanding performance with small parameter counts SLM Advantages: ⚡ Low-latency response (millisecond-level) 💰 Zero API costs 🔒 Fully local, data stays on-device 📱 Suitable for edge device deployment 2.2.2 Microsoft Foundry Local: The Foundation of Local AI Foundry Local is Microsoft's local AI runtime tool, enabling developers to easily run SLMs on Windows or macOS devices. Core Features: OpenAI-Compatible API # Using Foundry Local is like using OpenAI API from openai import OpenAI from foundry_local import FoundryLocalManager manager = FoundryLocalManager("qwen2.5-7b-instruct") client = OpenAI( base_url=manager.endpoint, api_key=manager.api_key ) Hardware Acceleration Support CPU: General computing support GPU: NVIDIA, AMD, Intel graphics acceleration NPU: Qualcomm, Intel AI-specific chips Apple Silicon: Neural Engine optimization Based on ONNX Runtime Cross-platform compatibility Highly optimized inference performance Supports model quantization (INT4, INT8) Convenient Model Management # View available models foundry model list # Run a model foundry model run qwen2.5-7b-instruct-generic-cpu:4 # Check running status foundry service ps Foundry Local Application Value: 🎓 Educational Scenarios: Students can learn AI development without cloud subscriptions 🏢 Enterprise Environments: Process sensitive data while maintaining compliance 🧪 R&D Testing: Rapid prototyping without API cost concerns ✈️ Offline Environments: Works on planes, subways, and other no-network scenarios 2.3 GitHub Copilot SDK: The Express Lane from Agent to Business Value 2.3.1 What is GitHub Copilot SDK? GitHub Copilot SDK, released as a technical preview on January 22, 2026, is a game-changer for AI Agent development. Unlike other AI SDKs, Copilot SDK doesn't just provide API calling interfaces—it delivers a complete, production-grade Agent execution engine. Why is it revolutionary? Traditional AI application development requires you to build: ❌ Context management systems (multi-turn conversation state) ❌ Tool orchestration logic (deciding when to call which tool) ❌ Model routing mechanisms (switching between different LLMs) ❌ MCP server integration ❌ Permission and security boundaries ❌ Error handling and retry mechanisms Copilot SDK provides all of this out-of-the-box, letting you focus on business logic rather than underlying infrastructure. 2.3.2 Core Advantages: The Ultra-Short Path from Concept to Code Production-Grade Agent Engine: Battle-Tested Reliability Copilot SDK uses the same Agent core as GitHub Copilot CLI, which means: ✅ Validated in millions of real-world developer scenarios ✅ Capable of handling complex multi-step task orchestration ✅ Automatic task planning and execution ✅ Built-in error recovery mechanisms Real-World Example: In the GenGitHubRepoPPT project, we don't need to hand-write the "how to convert outline to PPT" logic—we simply tell Copilot SDK the goal, and it automatically: Analyzes outline structure Plans slide layouts Calls file creation tools Applies formatting logic Handles multilingual adaptation # Traditional approach: requires hundreds of lines of code for logic def create_ppt_traditional(outline): slides = parse_outline(outline) for slide in slides: layout = determine_layout(slide) content = format_content(slide) apply_styling(content, layout) # ... more manual logic return ppt_file # Copilot SDK approach: focus on business intent session = await client.create_session({ "model": "claude-sonnet-4.5", "streaming": True, "skill_directories": [skills_dir] }) session.send_and_wait({"prompt": prompt}, timeout=600) Custom Skills: Reusable Encapsulation of Business Knowledge This is one of Copilot SDK's most powerful features. In traditional AI development, you need to provide complete prompts and context with every call. Skills allow you to: Define once, reuse forever: # .copilot_skills/ppt/SKILL.md # PowerPoint Generation Expert Skill ## Expertise You are an expert in business presentation design, skilled at transforming technical content into easy-to-understand visual presentations. ## Workflow 1. **Structure Analysis** - Identify outline hierarchy (titles, subtitles, bullet points) - Determine topic and content density for each slide 2. **Layout Selection** - Title slide: Use large title + subtitle layout - Content slides: Choose single/dual column based on bullet count - Technical details: Use code block or table layouts 3. **Visual Optimization** - Apply professional color scheme (corporate blue + accent colors) - Ensure each slide has a visual focal point - Keep bullets to 5-7 items per page 4. **Multilingual Adaptation** - Choose appropriate fonts based on language (Chinese: Microsoft YaHei, English: Calibri) - Adapt text direction and layout conventions ## Output Requirements Generate .pptx files meeting these standards: - 16:9 widescreen ratio - Consistent visual style - Editable content (not images) - File size < 5MB Business Code Generation Capability This is the core value of this project. Unlike generic LLM APIs, Copilot SDK with Skills can generate truly executable business code. Comparison Example: Aspect Generic LLM API Copilot SDK + Skills Task Description Requires detailed prompt engineering Concise business intent suffices Output Quality May need multiple adjustments Professional-grade on first try Code Execution Usually example code Directly generates runnable programs Error Handling Manual implementation required Agent automatically handles and retries Multi-step Tasks Manual orchestration needed Automatic planning and execution Comparison of manual coding workload: Task Manual Coding Copilot SDK Processing logic code ~500 lines ~10 lines configuration Layout templates ~200 lines Declared in Skill Style definitions ~150 lines Declared in Skill Error handling ~100 lines Automatically handled Total ~950 lines ~10 lines + Skill file Tool Calling & MCP Integration: Connecting to the Real World Copilot SDK doesn't just generate code—it can directly execute operations: 🗃️ File System Operations: Create, read, modify files 🌐 Network Requests: Call external APIs 📊 Data Processing: Use pandas, numpy, and other libraries 🔧 Custom Tools: Integrate your business logic 3. GenGitHubRepoPPT Case Study 3.1 Project Overview GenGitHubRepoPPT is an innovative hybrid AI solution that combines local AI models with cloud-based AI agents to automatically generate professional PowerPoint presentations from GitHub repository README files in under 5 minutes. Technical Architecture: 3.2 Why Adopt a Hybrid Model? Stage 1: Local SLM Processes Sensitive Data Task: Analyze GitHub README, extract key information, generate structured outline Reasons for choosing Qwen-2.5-7B + Foundry Local: Privacy Protection README may contain internal project information Local processing ensures data doesn't leave the device Complies with data compliance requirements Cost Effectiveness Each analysis processes thousands of tokens Cloud API costs are significant in high-frequency scenarios Local models have zero additional fees Performance Qwen-2.5-7B excels at text analysis tasks Outstanding Chinese support Acceptable CPU inference latency (typically 2-3 seconds) Stage 2: Cloud LLM + Copilot SDK Creates Business Value Task: Create well-formatted PowerPoint files based on outline Reasons for choosing Claude Sonnet 4.5 + Copilot SDK: Automated Business Code Generation Traditional approach pain points: Need to hand-write 500+ lines of code for PPT layout logic Require deep knowledge of python-pptx library APIs Style and formatting code is error-prone Multilingual support requires additional conditional logic Copilot SDK solution: Declare business rules and best practices through Skills Agent automatically generates and executes required code Zero-code implementation of complex layout logic Development time reduced from 2-3 days to 2-3 hours Ultra-Short Path from Intent to Execution Comparison: Different ways to implement "Generate professional PPT" 3. Production-Grade Reliability and Quality Assurance Battle-tested Agent engine: Uses the same core as GitHub Copilot CLI Validated in millions of real-world scenarios Automatically handles edge cases and errors Consistent output quality: Professional standards ensured through Skills Automatic validation of generated files Built-in retry and error recovery mechanisms 4. Rapid Iteration and Optimization Capability Scenario: Client requests PPT style adjustment The GitHub Repo https://github.com/kinfey/GenGitHubRepoPPT 4. Summary 4.1 Core Value of Hybrid Models + Copilot SDK The GenGitHubRepoPPT project demonstrates how combining hybrid models with Copilot SDK creates a new paradigm for AI application development. Privacy and Cost Balance The hybrid approach allows sensitive README analysis to happen locally using Qwen-2.5-7B, ensuring data never leaves the device while incurring zero API costs. Meanwhile, the value-creating work—generating professional PowerPoint presentations—leverages Claude Sonnet 4.5 through Copilot SDK, delivering quality that justifies the per-use cost. From Code to Intent Traditional AI development required writing hundreds of lines of code to handle PPT generation logic, layout selection, style application, and error handling. With Copilot SDK and Skills, developers describe what they want in natural language, and the Agent automatically generates and executes the necessary code. What once took 3-5 days now takes 3-4 hours, with 95% less code to maintain. Automated Business Code Generation Copilot SDK doesn't just provide code examples—it generates complete, executable business logic. When you request a multilingual PPT, the Agent understands the requirement, selects appropriate fonts, generates the implementation code, executes it with error handling, validates the output, and returns a ready-to-use file. Developers focus on business intent rather than implementation details. 4.2 Technology Trends The Shift to Intent-Driven Development We're witnessing a fundamental change in how developers work. Rather than mastering every programming language detail and framework API, developers are increasingly defining what they want through declarative Skills. Copilot SDK represents this future: you describe capabilities in natural language, and AI Agents handle the code generation and execution automatically. Edge AI and Cloud AI Integration The evolution from pure cloud LLMs (powerful but privacy-concerning) to pure local SLMs (private but limited) has led to today's hybrid architectures. GenGitHubRepoPPT exemplifies this trend: local models handle data analysis and structuring, while cloud models tackle complex reasoning and professional output generation. This combination delivers fast, secure, and professional results. Democratization of Agent Development Copilot SDK dramatically lowers the barrier to building AI applications. Senior engineers see 10-20x productivity gains. Mid-level engineers can now build sophisticated agents that were previously beyond their reach. Even junior engineers and business experts can participate by writing Skills that capture domain knowledge without deep technical expertise. The future isn't about whether we can build AI applications—it's about how quickly we can turn ideas into reality. References Projects and Code GenGitHubRepoPPT GitHub Repository - Case study project Microsoft Foundry Local - Local AI runtime GitHub Copilot SDK - Agent development SDK Copilot SDK Getting Started Tutorial - Official quick start Deep Dive: Copilot SDK Build an Agent into Any App with GitHub Copilot SDK - Official announcement GitHub Copilot SDK Cookbook - Practical examples Copilot CLI Official Documentation - CLI tool documentation Learning Resources Edge AI for Beginners - Edge AI introductory course Azure AI Foundry Documentation - Azure AI documentation GitHub Copilot Extensions Guide - Extension development guideBuilding Interactive Agent UIs with AG-UI and Microsoft Agent Framework
Introduction Picture this: You've built an AI agent that analyzes financial data. A user uploads a quarterly report and asks: "What are the top three expense categories?" Behind the scenes, your agent parses the spreadsheet, aggregates thousands of rows, and generates visualizations. All in 20 seconds. But the user? They see a loading spinner. Nothing else. No "reading file" message, no "analyzing data" indicator, no hint that progress is being made. They start wondering: Is it frozen? Should I refresh? The problem isn't the agent's capabilities - it's the communication gap between the agent running on the backend and the user interface. When agents perform multi-step reasoning, call external APIs, or execute complex tool chains, users deserve to see what's happening. They need streaming updates, intermediate results, and transparent progress indicators. Yet most agent frameworks force developers to choose between simple request/response patterns or building custom solutions to stream updates to their UIs. This is where AG-UI comes in. AG-UI is a fairly new event-based protocol that standardizes how agents communicate with user interfaces. Instead of every framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of structured events that work consistently across different agent implementations. When an agent starts processing, calls a tool, generates text, or encounters an error, the UI receives explicit, typed events in real time. The beauty of AG-UI is its framework-agnostic design. While this blog post demonstrates integration with Microsoft Agent Framework (MAF), the same AG-UI protocol works with LangGraph, CrewAI, or any other compliant framework. Write your UI code once, and it works with any AG-UI-compliant backend. (Note: MAF supports both Python and .NET - this blog post focuses on the Python implementation.) TL;DR The Problem: Users don't get real-time updates while AI agents work behind the scenes - no progress indicators, no transparency into tool calls, and no insight into what's happening. The Solution: AG-UI is an open, event-based protocol that standardizes real-time communication between AI agents and user interfaces. Instead of each development team and framework inventing custom streaming solutions, AG-UI provides a shared vocabulary of structured events (like TOOL_CALL_START, TEXT_MESSAGE_CONTENT, RUN_FINISHED) that work across any compliant framework. Key Benefits: Framework-agnostic - Write UI code once, works with LangGraph, Microsoft Agent Framework, CrewAI, and more Real-time observability - See exactly what your agent is doing as it happens Server-Sent Events - Built on standard HTTP for universal compatibility Protocol-managed state - No manual conversation history tracking In This Post: You'll learn why AG-UI exists, how it works, and build a complete working application using Microsoft Agent Framework with Python - from server setup to client implementation. What You'll Learn This blog post walks through: Why AG-UI exists - how agent-UI communication has evolved and what problems current approaches couldn't solve How the protocol works - the key design choices that make AG-UI simple, reliable, and framework-agnostic Protocol architecture - the generic components and how AG-UI integrates with agent frameworks Building an AG-UI application - a complete working example using Microsoft Agent Framework with server, client, and step-by-step setup Understanding events - what happens under the hood when your agent runs and how to observe it Thinking in events - how building with AG-UI differs from traditional APIs, and what benefits this brings Making the right choice - when AG-UI is the right fit for your project and when alternatives might be better Estimated reading time: 15 minutes Who this is for: Developers building AI agents who want to provide real-time feedback to users, and teams evaluating standardized approaches to agent-UI communication To appreciate why AG-UI matters, we need to understand the journey that led to its creation. Let's trace how agent-UI communication has evolved through three distinct phases. The Evolution of Agent-UI Communication AI agents have become more capable over time. As they evolved, the way they communicated with user interfaces had to evolve as well. Here's how this evolution unfolded. Phase 1: Simple Request/Response In the early days of AI agent development, the interaction model was straightforward: send a question, wait for an answer, display the result. This synchronous approach mirrored traditional API calls and worked fine for simple scenarios. # Simple, but limiting response = agent.run("What's the weather in Paris?") display(response) # User waits... and waits... Works for: Quick queries that complete in seconds, simple Q&A interactions where immediate feedback and interactivity aren't critical. Breaks down: When agents need to call multiple tools, perform multi-step reasoning, or process complex queries that take 30+ seconds. Users see nothing but a loading spinner, with no insight into what's happening or whether the agent is making progress. This creates a poor user experience and makes it impossible to show intermediate results or allow user intervention. Recognizing these limitations, development teams began experimenting with more sophisticated approaches. Phase 2: Custom Streaming Solutions As agents became more sophisticated, teams recognized the need for incremental feedback and interactivity. Rather than waiting for the complete response, they implemented custom streaming solutions to show partial results as they became available. # Every team invents their own format for chunk in agent.stream("What's the weather?"): display(chunk) # But what about tool calls? Errors? Progress? This was a step forward for building interactive agent UIs, but each team solved the problem differently. Also, different frameworks had incompatible approaches - some streamed only text tokens, others sent structured JSON, and most provided no visibility into critical events like tool calls or errors. The problem: No standardization across frameworks - client code that works with LangGraph won't work with Crew AI, requiring separate implementations for each agent backend Each implementation handles tool calls differently - some send nothing during tool execution, others send unstructured messages Complex state management - clients must track conversation history, manage reconnections, and handle edge cases manually The industry needed a better solution - a common protocol that could work across all frameworks while maintaining the benefits of streaming. Phase 3: Standardized Protocol (AG-UI) AG-UI emerged as a response to the fragmentation problem. Instead of each framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of events that work consistently across different agent implementations. # Standardized events everyone understands async for event in agent.run_stream("What's the weather?"): if event.type == "TEXT_MESSAGE_CONTENT": display_text(event.delta) elif event.type == "TOOL_CALL_START": show_tool_indicator(event.tool_name) elif event.type == "TOOL_CALL_RESULT": show_tool_result(event.result) The key difference is structured observability. Rather than guessing what the agent is doing from unstructured text, clients receive explicit events for every stage of execution: when the agent starts, when it generates text, when it calls a tool, when that tool completes, and when the entire run finishes. What's different: A standardized vocabulary of event types, complete observability into agent execution, and framework-agnostic clients that work with any AG-UI-compliant backend. You write your UI code once, and it works whether the backend uses Microsoft Agent Framework, LangGraph, or any other framework that speaks AG-UI. Now that we've seen why AG-UI emerged and what problems it solves, let's examine the specific design decisions that make the protocol work. These choices weren't arbitrary - each one addresses concrete challenges in building reliable, observable agent-UI communication. The Design Decisions Behind AG-UI Why Server-Sent Events (SSE)? Aspect WebSockets SSE (AG-UI) Complexity Bidirectional Unidirectional (simpler) Firewall/Proxy Sometimes blocked Standard HTTP Reconnection Manual implementation Built-in browser support Use case Real-time games, chat Agent responses (one-way) For agent interactions, you typically only need server→client communication, making SSE a simpler choice. SSE solves the transport problem - how events travel from server to client. But once connected, how does the protocol handle conversation state across multiple interactions? Why Protocol-Managed Threads? # Without protocol threads (client manages): conversation_history = [] conversation_history.append({"role": "user", "content": message}) response = agent.complete(conversation_history) conversation_history.append({"role": "assistant", "content": response}) # Complex, error-prone, doesn't work with multiple clients # With AG-UI (protocol manages): thread = agent.get_new_thread() # Server creates and manages thread agent.run_stream(message, thread=thread) # Server maintains context # Simple, reliable, shareable across clients With transport and state management handled, the final piece is the actual messages flowing through the connection. What information should the protocol communicate, and how should it be structured? Why Standardized Event Types? Instead of parsing unstructured text, clients get typed events: RUN_STARTED - Agent begins (start loading UI) TEXT_MESSAGE_CONTENT - Text chunk (stream to user) TOOL_CALL_START - Tool invoked (show "searching...", "calculating...") TOOL_CALL_RESULT - Tool finished (show result, update UI) RUN_FINISHED - Complete (hide loading) This lets UIs react intelligently without custom parsing logic. Now that we understand the protocol's design choices, let's see how these pieces fit together in a complete system. Architecture Overview Here's how the components interact: The communication between these layers relies on a well-defined set of event types. Here are the core events that flow through the SSE connection: Core Event Types AG-UI provides a standardized set of event types to describe what's happening during an agent's execution: RUN_STARTED - agent begins execution TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END - streaming segments of text TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END, TOOL_CALL_RESULT - tool execution events RUN_FINISHED - agent has finished execution RUN_ERROR - error information This model lets the UI update as the agent runs, rather than waiting for the final response. The generic architecture above applies to any AG-UI implementation. Now let's see how this translates to Microsoft Agent Framework. AG-UI with Microsoft Agent Framework While AG-UI is framework-agnostic, this blog post demonstrates integration with Microsoft Agent Framework (MAF) using Python. MAF is available in both Python and .NET, giving you flexibility to build AG-UI applications in your preferred language. Understanding how MAF implements the protocol will help you build your own applications or work with other compliant frameworks. Integration Architecture The Microsoft Agent Framework integration involves several specialized layers that handle protocol translation and execution orchestration: Understanding each layer: FastAPI Endpoint - Handles HTTP requests and establishes SSE connections for streaming AgentFrameworkAgent - Protocol wrapper that translates between AG-UI events and Agent Framework operations Orchestrators - Manage execution flow, coordinate tool calling sequences, and handle state transitions ChatAgent - Your agent implementation with instructions, tools, and business logic ChatClient - Interface to the underlying language model (Azure OpenAI, OpenAI, or other providers) The good news? When you call add_agent_framework_fastapi_endpoint, all the middleware layers are configured automatically. You simply provide your ChatAgent, and the integration handles protocol translation, event streaming, and state management behind the scenes. Now that we understand both the protocol architecture and the Microsoft Agent Framework integration, let's build a working application. Hands-On: Building Your First AG-UI Application This section demonstrates how to build an AG-UI server and client using Microsoft Agent Framework and FastAPI. Prerequisites Before building your first AG-UI application, ensure you have: Python 3.10 or later installed Basic understanding of async/await patterns in Python Azure CLI installed and authenticated (az login) Azure OpenAI service endpoint and deployment configured (setup guide) Cognitive Services OpenAI Contributor role for your Azure OpenAI resource You'll also need to install the AG-UI integration package: pip install agent-framework-ag-ui --pre This automatically installs agent-framework-core, fastapi, and uvicorn as dependencies. With your environment configured, let's create the server that will host your agent and expose it via the AG-UI protocol. Building the Server Let's create a FastAPI server that hosts an AI agent and exposes it via AG-UI: # server.py import os from typing import Annotated from dotenv import load_dotenv from fastapi import FastAPI from pydantic import Field from agent_framework import ChatAgent, ai_function from agent_framework.azure import AzureOpenAIChatClient from agent_framework_ag_ui import add_agent_framework_fastapi_endpoint from azure.identity import DefaultAzureCredential # Load environment variables from .env file load_dotenv() # Validate environment configuration openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") model_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME") if not openai_endpoint: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_ENDPOINT") if not model_deployment: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_DEPLOYMENT_NAME") # Define tools the agent can use @ai_function def get_order_status( order_id: Annotated[str, Field(description="The order ID to look up (e.g., ORD-001)")] ) -> dict: """Look up the status of a customer order. Returns order status, tracking number, and estimated delivery date. """ # Simulated order lookup orders = { "ORD-001": {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}, "ORD-002": {"status": "processing", "tracking": None, "eta": "Jan 23, 2026"}, "ORD-003": {"status": "delivered", "tracking": "1Z999AA3", "eta": "Delivered Jan 20"}, } return orders.get(order_id, {"status": "not_found", "message": "Order not found"}) # Initialize Azure OpenAI client chat_client = AzureOpenAIChatClient( credential=DefaultAzureCredential(), endpoint=openai_endpoint, deployment_name=model_deployment, ) # Configure the agent with custom instructions and tools agent = ChatAgent( name="CustomerSupportAgent", instructions="""You are a helpful customer support assistant. You have access to a get_order_status tool that can look up order information. IMPORTANT: When a user mentions an order ID (like ORD-001, ORD-002, etc.), you MUST call the get_order_status tool to retrieve the actual order details. Do NOT make up or guess order information. After calling get_order_status, provide the actual results to the user in a friendly format.""", chat_client=chat_client, tools=[get_order_status], ) # Initialize FastAPI application app = FastAPI( title="AG-UI Customer Support Server", description="Interactive AI agent server using AG-UI protocol with tool calling" ) # Mount the AG-UI endpoint add_agent_framework_fastapi_endpoint(app, agent, path="/chat") def main(): """Entry point for the AG-UI server.""" import uvicorn print("Starting AG-UI server on http://localhost:8000") uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info") # Run the application if __name__ == "__main__": main() What's happening here: We define a tool: get_order_status with the AI_function decorator Use Annotated and Field for parameter descriptions to help the agent understand when and how to use the tool We create an Azure OpenAI chat client with credential authentication The ChatAgent is configured with domain-specific instructions and the tools parameter add_agent_framework_fastapi_endpoint automatically handles SSE streaming and tool execution The server exposes the agent at the /chat endpoint Note: This example uses Azure OpenAI, but AG-UI works with any chat model. You can also integrate with Azure AI Foundry's model catalog or use other LLM providers. Tool calling is supported by most modern LLMs including GPT-4, GPT-4o, and Claude models. To run this server: # Set your Azure OpenAI credentials export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o" # Start the server python server.py With your server running and exposing the AG-UI endpoint, the next step is building a client that can connect and consume the event stream. Streaming Results to Clients With the server running, clients can connect and stream events as the agent processes requests. Here's a Python client that demonstrates the streaming capabilities: # client.py import asyncio import os from dotenv import load_dotenv from agent_framework import ChatAgent, FunctionCallContent, FunctionResultContent from agent_framework_ag_ui import AGUIChatClient # Load environment variables from .env file load_dotenv() async def interactive_chat(): """Interactive chat session with streaming responses.""" # Connect to the AG-UI server base_url = os.getenv("AGUI_SERVER_URL", "http://localhost:8000/chat") print(f"Connecting to: {base_url}\n") # Initialize the AG-UI client client = AGUIChatClient(endpoint=base_url) # Create a local agent representation agent = ChatAgent(chat_client=client) # Start a new conversation thread conversation_thread = agent.get_new_thread() print("Chat started! Type 'exit' or 'quit' to end the session.\n") try: while True: # Collect user input user_message = input("You: ") # Handle empty input if not user_message.strip(): print("Please enter a message.\n") continue # Check for exit commands if user_message.lower() in ["exit", "quit", "bye"]: print("\nGoodbye!") break # Stream the agent's response print("Agent: ", end="", flush=True) # Track tool calls to avoid duplicate prints seen_tools = set() async for update in agent.run_stream(user_message, thread=conversation_thread): # Display text content if update.text: print(update.text, end="", flush=True) # Display tool calls and results for content in update.contents: if isinstance(content, FunctionCallContent): # Only print each tool call once if content.call_id not in seen_tools: seen_tools.add(content.call_id) print(f"\n[Calling tool: {content.name}]", flush=True) elif isinstance(content, FunctionResultContent): # Only print each result once result_id = f"result_{content.call_id}" if result_id not in seen_tools: seen_tools.add(result_id) result_text = content.result if isinstance(content.result, str) else str(content.result) print(f"[Tool result: {result_text}]", flush=True) print("\n") # New line after response completes except KeyboardInterrupt: print("\n\nChat interrupted by user.") except ConnectionError as e: print(f"\nConnection error: {e}") print("Make sure the server is running.") except Exception as e: print(f"\nUnexpected error: {e}") def main(): """Entry point for the AG-UI client.""" asyncio.run(interactive_chat()) if __name__ == "__main__": main() Key features: The client connects to the AG-UI endpoint using AGUIChatClient with the endpoint parameter run_stream() yields updates containing text and content as they arrive Tool calls are detected using FunctionCallContent and displayed with [Calling tool: ...] Tool results are detected using FunctionResultContent and displayed with [Tool result: ...] Deduplication logic (seen_tools set) prevents printing the same tool call multiple times as it streams Thread management maintains conversation context across messages Graceful error handling for connection issues To use the client: # Optional: specify custom server URL export AGUI_SERVER_URL="http://localhost:8000/chat" # Start the interactive chat python client.py Example Session: Connecting to: http://localhost:8000/chat Chat started! Type 'exit' or 'quit' to end the session. You: What's the status of order ORD-001? Agent: [Calling tool: get_order_status] [Tool result: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}] Your order ORD-001 has been shipped! - Tracking Number: 1Z999AA1 - Estimated Delivery Date: January 25, 2026 You can use the tracking number to monitor the delivery progress. You: Can you check ORD-002? Agent: [Calling tool: get_order_status] [Tool result: {"status": "processing", "tracking": null, "eta": "Jan 23, 2026"}] Your order ORD-002 is currently being processed. - Status: Processing - Estimated Delivery: January 23, 2026 Your order should ship soon, and you'll receive a tracking number once it's on the way. You: exit Goodbye! The client we just built handles events at a high level, abstracting away the details. But what's actually flowing through that SSE connection? Let's peek under the hood. Event Types You'll See As the server streams back responses, clients receive a series of structured events. If you were to observe the raw SSE stream (e.g., using curl), you'd see events like: curl -N http://localhost:8000/chat \ -H "Content-Type: application/json" \ -H "Accept: text/event-stream" \ -d '{"messages": [{"role": "user", "content": "What'\''s the status of order ORD-001?"}]}' Sample event stream (with tool calling): data: {"type":"RUN_STARTED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} data: {"type":"TEXT_MESSAGE_START","messageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c","role":"assistant"} data: {"type":"TOOL_CALL_START","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","toolCallName":"get_order_status","parentMessageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"{\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"order"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"_id"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\":\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"ORD"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"-"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"001"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\"}"} data: {"type":"TOOL_CALL_END","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y"} data: {"type":"TOOL_CALL_RESULT","messageId":"f048cb0a-a049-4a51-9403-a05e4820438a","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","content":"{\"status\": \"shipped\", \"tracking\": \"1Z999AA1\", \"eta\": \"Jan 25, 2026\"}","role":"tool"} data: {"type":"TEXT_MESSAGE_START","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","role":"assistant"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"Your"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" order"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" ORD"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"-"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"001"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" has"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" been"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" shipped"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"!"} ... (additional TEXT_MESSAGE_CONTENT events streaming the response) ... data: {"type":"TEXT_MESSAGE_END","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf"} data: {"type":"RUN_FINISHED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} Understanding the flow: RUN_STARTED - Agent begins processing the request TEXT_MESSAGE_START - First message starts (will contain tool calls) TOOL_CALL_START - Agent invokes the get_order_status tool Multiple TOOL_CALL_ARGS events - Arguments stream incrementally as JSON chunks ({"order_id":"ORD-001"}) TOOL_CALL_END - Tool invocation structure complete TOOL_CALL_RESULT - Tool execution finished with result data TEXT_MESSAGE_START - Second message starts (the final response) Multiple TEXT_MESSAGE_CONTENT events - Response text streams word-by-word TEXT_MESSAGE_END - Response message complete RUN_FINISHED - Entire run completed successfully This granular event model enables rich UI experiences - showing tool execution indicators ("Searching...", "Calculating..."), displaying intermediate results, and providing complete transparency into the agent's reasoning process. Seeing the raw events helps, but truly working with AG-UI requires a shift in how you think about agent interactions. Let's explore this conceptual change. The Mental Model Shift Traditional API Thinking # Imperative: Call and wait response = agent.run("What's 2+2?") print(response) # "The answer is 4" Mental model: Function call with return value AG-UI Thinking # Reactive: Subscribe to events async for event in agent.run_stream("What's 2+2?"): match event.type: case "RUN_STARTED": show_loading() case "TEXT_MESSAGE_CONTENT": display_chunk(event.delta) case "RUN_FINISHED": hide_loading() Mental model: Observable stream of events This shift feels similar to: Moving from synchronous to async code Moving from REST to event-driven architecture Moving from polling to pub/sub This mental shift isn't just philosophical - it unlocks concrete benefits that weren't possible with request/response patterns. What You Gain Observability # You can SEE what the agent is doing TOOL_CALL_START: "get_order_status" TOOL_CALL_ARGS: {"order_id": "ORD-001"} TOOL_CALL_RESULT: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"} TEXT_MESSAGE_START: "Your order ORD-001 has been shipped..." Interruptibility # Future: Cancel long-running operations async for event in agent.run_stream(query): if user_clicked_cancel: await agent.cancel(thread_id, run_id) break Transparency # Users see the reasoning process "Looking up order ORD-001..." "Order found: Status is 'shipped'" "Retrieving tracking information..." "Your order has been shipped with tracking number 1Z999AA1..." To put these benefits in context, here's how AG-UI compares to traditional approaches across key dimensions: AG-UI vs. Traditional Approaches Aspect Traditional REST Custom Streaming AG-UI Connection Model Request/Response Varies Server-Sent Events State Management Manual Manual Protocol-managed Tool Calling Invisible Custom format Standardized events Framework Varies Framework-locked Framework-agnostic Browser Support Universal Varies Universal Implementation Simple Complex Moderate Ecosystem N/A Isolated Growing You've now seen AG-UI's design principles, implementation details, and conceptual foundations. But the most important question remains: should you actually use it? Conclusion: Is AG-UI Right for Your Project? AG-UI represents a shift toward standardized, observable agent interactions. Before adopting it, understand where the protocol stands and whether it fits your needs. Protocol Maturity The protocol is stable enough for production use but still evolving: Ready now: Core specification stable, Microsoft Agent Framework integration available, FastAPI/Python implementation mature, basic streaming and threading work reliably. Choose AG-UI If You Building new agent projects - No legacy API to maintain, want future compatibility with emerging ecosystem Need streaming observability - Multi-step workflows where users benefit from seeing each stage of execution Want framework flexibility - Same client code works with any AG-UI-compliant backend Comfortable with evolving standards - Can adapt to protocol changes as it matures Stick with Alternatives If You Have working solutions - Custom streaming working well, migration cost not justified Need guaranteed stability - Mission-critical systems where breaking changes are unacceptable Build simple agents - Single-step request/response without tool calling or streaming needs Risk-averse environment - Large existing implementations where proven approaches are required Beyond individual project decisions, it's worth considering AG-UI's role in the broader ecosystem. The Bigger Picture While this blog post focused on Microsoft Agent Framework, AG-UI's true power lies in its broader mission: creating a common language for agent-UI communication across the entire ecosystem. As more frameworks adopt it, the real value emerges: write your UI once, work with any compliant agent framework. Think of it like GraphQL for APIs or OpenAPI for REST - a standardization layer that benefits the entire ecosystem. The protocol is young, but the problem it solves is real. Whether you adopt it now or wait for broader adoption, understanding AG-UI helps you make informed architectural decisions for your agent applications. Ready to dive deeper? Here are the official resources to continue your AG-UI journey. Resources AG-UI & Microsoft Agent Framework Getting Started with AG-UI (Microsoft Learn) - Official tutorial AG-UI Integration Overview - Architecture and concepts AG-UI Protocol Specification - Official protocol documentation Backend Tool Rendering - Adding function tools Security Considerations - Production security guidance Microsoft Agent Framework Documentation - Framework overview AG-UI Dojo Examples - Live demonstrations UI Components & Integration CopilotKit for Microsoft Agent Framework - React component library Community & Support Microsoft Q&A - Community support Agent Framework GitHub - Source code and issues Related Technologies Azure AI Foundry Documentation - Azure AI platform FastAPI Documentation - Web framework Server-Sent Events (SSE) Specification - Protocol standard This blog post introduces AG-UI with Microsoft Agent Framework, focusing on fundamental concepts and building your first interactive agent application.Study Buddy: Learning Data Science and Machine Learning with an AI Sidekick
If you've ever wished for a friendly companion to guide you through the world of data science and machine learning, you're not alone. As part of the "For Beginners" curriculum, I recently built a Study Buddy Agent, an AI-powered assistant designed to help learners explore data science interactively, intuitively, and joyfully. Why a Study Buddy? Learning something new can be overwhelming, especially when you're navigating complex topics like machine learning, statistics, or Python programming. The Study Buddy Agent is here to change that. It brings the curriculum to life by answering questions, offering explanations, and nudging learners toward deeper understanding, all in a conversational format. Think of it as your AI-powered lab partner: always available, never judgmental, and endlessly curious. Built with chatmodes, Powered by Purpose The agent lives inside a .chatmodes file in the https://github.com/microsoft/Data-Science-For-Beginners/blob/main/.github/chatmodes/study-mode.chatmode.md. This file defines how the agent behaves, what tone it uses, and how it interacts with learners. I designed it to be friendly, encouraging, and beginner-first—just like the curriculum itself. It’s not just about answering questions. The Study Buddy is trained to: Reinforce key concepts from the curriculum Offer hints and nudges when learners get stuck Encourage exploration and experimentation Celebrate progress and milestones What’s Under the Hood? The agent uses GitHub Copilot's chatmode, which allows developers to define custom behaviors for AI agents. By aligning the agent’s responses with the curriculum’s learning objectives, we ensure that learners stay on track while enjoying the flexibility of conversational learning. How You Can Use It YouTube Video here: Study Buddy - Data Science AI Sidekick Clone the repo: Head to the https://github.com/microsoft/Data-Science-For-Beginners and clone it locally or use Codespaces. Open the GitHub Copilot Chat, and select Study Buddy: This will activate the Study Buddy. Start chatting: Ask questions, explore topics, and let the agent guide you. What’s Next? This is just the beginning. I’m exploring ways to: Expand the agent to other beginner curriculums (Web Dev, AI, IoT) Integrate feedback loops so learners can shape the agent’s evolution Final Thoughts In my role, I believe learning should be inclusive, empowering, and fun. The Study Buddy Agent is a small step toward that vision, a way to make data science feel less like a mountain and more like a hike with a good friend. Try it out, share your feedback, and let’s keep building tools that make learning magical. Join us on Discord to share your feedback.Is it a bug or a feature? Using Prompty to automatically track and tag issues.
Introduction You’ve probably noticed a theme in my recent posts: tackling challenges with AI-powered solutions. In my latest project, I needed a fast way to classify and categorize GitHub issues using a predefined set of tags. The tag data was there, but the connections between issues and tags weren’t. To bridge that gap, I combined Azure OpenAI Service, Prompty, and a GitHub to automatically extract and assign the right labels. By automating issue tagging, I was able to: Streamline contributor workflows with consistent, on-time labels that simplify triage Improve repository hygiene by keeping issues well-organized, searchable, and easy to navigate Eliminate repetitive maintenance so the team can focus on community growth and developer empowerment Scale effortlessly as the project expands, turning manual chores into intelligent automation Challenge: 46 issues, no tags The Prompty repository currently hosts 46 relevant, but untagged, issues. To automate labeling, I first defined a complete tag taxonomy. Then I built a solution using: Prompty for prompt templating and function calling Azure OpenAI (gpt-4o-mini) to classify each issue Azure AI Search for retrieval-augmented context (RAG) Python to orchestrate the workflow and integrate with GitHub By the end, you’ll have an autonomous agent that fetches open issues, matches them against your custom taxonomy, and applies labels back on GitHub. Prerequisites: An Azure account with Azure AI Search and Azure OpenAI enabled Python and Prompty installed Clone the repo and install dependencies: pip install -r requirements.txt Step 1: Define the prompt template We’ll use Prompty to structure our LLM instructions. If you haven’t yet, install the Prompty VS Code extension and refer to the Prompty docs to get started. Prompty combines: Tooling to configure and deploy models Runtime for executing prompts and function calls Specification (YAML) for defining prompts, inputs, and outputs Our Prompty is set to use gpt-4o-mini and below is our sample input: sample: title: Including Image in System Message tags: ${file:tags.json} description: An error arises in the flow, coming up starting from the "complete" block. It seems like it is caused by placing a static image in the system prompt, since removing it causes the issue to go away. Please let me know if I can provide additional context. The inputs will be the tags file implemented using RAG, then we will fetch the issue title and description from GitHub once a new issue is posted. Next, in our Prompty file, we gave instructions of how the LLLM should work as follows: system: You are an intelligent GitHub issue tagging assistant. Available tags: ${inputs} {% if tags.tags %} ## Available Tags {% for tag in tags.tags %} name: {{tag.name}} description: {{tag.description}} {% endfor %} {% endif %} Guidelines: 1. Only select tags that exactly match the provided list above 2. If no tags apply, return an empty array [] 3. Return ONLY a valid JSON array of strings, nothing else 4. Do not explain your choices or add any other text Use your understanding of the issue and refer to documentation at https://prompty.ai to match appropriate tags. Tags may refer to: - Issue type (e.g., bug, enhancement, documentation) - Tool or component (e.g., tool:cli, tracer:json-tracer) - Technology or integration (e.g., integration:azure, runtime:python) - Conceptual elements (e.g., asset:template-loading) Return only a valid JSON array of the issue title, description and tags. If the issue does not fit in any of the categories, return an empty array with: ["No tags apply to this issue. Please review the issue and try again."] Example: Issue Title: "App crashes when running in Azure CLI" Issue Body: "Running the generated code in Azure CLI throws a Python runtime error." Tag List: ["bug", "tool:cli", "runtime:python", "integration:azure"] Output: ["bug", "tool:cli", "runtime:python", "integration:azure"] user: Issue Title: {{title}} Issue Description: {{description}} Once the Prompty file was ready, I right clicked on the file and converted it to Prompty code, which provided a Python base code to get started from, instead of building from scratch. Step 2: enrich with context using Azure AI Search To be able to generate labels for our issues, I created a sample of tags, around 20, each with a title and a description of what it does. As a starting point, I started with Azure AI Foundry, where I uploaded the data and created an index. This typically takes about 1hr to successfully complete. Next, I implemented a retrieval function: def query_azure_search(query_text): """Query Azure AI Search for relevant documents and tags.""" search_client = SearchClient( endpoint=SEARCH_SERVICE_ENDPOINT, index_name=SEARCH_INDEX_NAME, credential=AzureKeyCredential(SEARCH_API_KEY) ) # Perform the search results = search_client.search( search_text=query_text, query_type=QueryType.SIMPLE, top=5 # Retrieve top 5 results ) # Extract content and tags from results documents = [doc["content"] for doc in results] tags = [doc.get("tags", []) for doc in results] # Assuming "tags" is a field in the index # Flatten and deduplicate tags unique_tags = list(set(tag for tag_list in tags for tag in tag_list)) return documents, unique_tags Step 3: Orchestrate the Workflow In addition, to adding RAG, I added functions in the basic.py file to: fetch_github_issues: calls the GitHub REST API to list open issues and filters out any that already have labels. run_with_rag: on the issues selected, calls the query_azure_search to append any retrieved docs, tags the issues and parses the JSON output from the prompt to a list for the labels label_issue: patches the issue to apply a list of labels. process_issues: this fetches all unlabelled issues, extracts the rag pipeline to generate the tags, and calls the labels_issue tag to apply the tags scheduler loop: this runs every so often to check if there's a new issue and apply a label Step 4: Validate and Run Ensure all .env variables are set (API keys, endpoints, token). Install dependencies and execute using: python basic.py Create a new GitHub issue and watch as your agent assigns tags in real time. Below is a short demo video here to illustrate the workflow. Next Steps Migrate from PATs to a GitHub App for tighter security Create multi-agent application and add an evaluator agent to review tags before publishing Integrate with GitHub Actions or Azure Pipelines for CI/CD Conclusion and Resources By combining Prompty, Azure AI Search, and Azure OpenAI, you can fully automate GitHub issue triage—improving consistency, saving time, and scaling effortlessly. Adapt this pattern to any classification task in your own workflows! You can learn more using the following resources: Prompty documentation to learn more on Prompty Agents for Beginners course to learn how you can build your own agentIntroducing langchain-azure-storage: Azure Storage integrations for LangChain
We're excited to introduce langchain-azure-storage , the first official Azure Storage integration package built by Microsoft for LangChain 1.0. As part of its launch, we've built a new Azure Blob Storage document loader (currently in public preview) that improves upon prior LangChain community implementations. This new loader unifies both blob and container level access, simplifying loader integration. More importantly, it offers enhanced security through default OAuth 2.0 authentication, supports reliably loading millions to billions of documents through efficient memory utilization, and allows pluggable parsing, so you can leverage other document loaders to parse specific file formats. What are LangChain document loaders? A typical Retrieval‑Augmented Generation (RAG) pipeline follows these main steps: Collect source content (PDFs, DOCX, Markdown, CSVs) — often stored in Azure Blob Storage. Parse into text and associated metadata (i.e., represented as LangChain Document objects). Chunk + embed those documents and store in a vector store (e.g., Azure AI Search, Postgres pgvector, etc.). At query time, retrieve the most relevant chunks and feed them to an LLM as grounded context. LangChain document loaders make steps 1–2 turnkey and consistent so the rest of the stack (splitters, vector stores, retrievers) “just works”. See this LangChain RAG tutorial for a full example of these steps when building a RAG application in LangChain. How can the Azure Blob Storage document loader help? The langchain-azure-storage package offers the AzureBlobStorageLoader , a document loader that simplifies retrieving documents stored in Azure Blob Storage for use in a LangChain RAG application. Key benefits of the AzureBlobStorageLoader include: Flexible loading of Azure Storage blobs to LangChain Document objects. You can load blobs as documents from an entire container, a specific prefix within a container, or by blob names. Each document loaded corresponds 1:1 to a blob in the container. Lazy loading support for improved memory efficiency when dealing with large document sets. Documents can now be loaded one-at-a-time as you iterate over them instead of all at once. Automatically uses DefaultAzureCredential to enable seamless OAuth 2.0 authentication across various environments, from local development to Azure-hosted services. You can also explicitly pass your own credential (e.g., ManagedIdentityCredential , SAS token). Pluggable parsing. Easily customize how documents are parsed by providing your own LangChain document loader to parse downloaded blob content. Using the Azure Blob Storage document loader Installation To install the langchain-azure-storage package, run: pip install langchain-azure-storage Loading documents from a container To load all blobs from an Azure Blob Storage container as LangChain Document objects, instantiate the AzureBlobStorageLoader with the Azure Storage account URL and container name: from langchain_azure_storage.document_loaders import AzureBlobStorageLoader loader = AzureBlobStorageLoader( "https://<your-storage-account>.blob.core.windows.net/", "<your-container-name>" ) # lazy_load() yields one Document per blob for all blobs in the container for doc in loader.lazy_load(): print(doc.metadata["source"]) # The "source" metadata contains the full URL of the blob print(doc.page_content) # The page_content contains the blob's content decoded as UTF-8 text Loading documents by blob names To only load specific blobs as LangChain Document objects, you can additionally provide a list of blob names: from langchain_azure_storage.document_loaders import AzureBlobStorageLoader loader = AzureBlobStorageLoader( "https://<your-storage-account>.blob.core.windows.net/", "<your-container-name>", ["<blob-name-1>", "<blob-name-2>"] ) # lazy_load() yields one Document per blob for only the specified blobs for doc in loader.lazy_load(): print(doc.metadata["source"]) # The "source" metadata contains the full URL of the blob print(doc.page_content) # The page_content contains the blob's content decoded as UTF-8 text Pluggable parsing By default, loaded Document objects contain the blob's UTF-8 decoded content. To parse non-UTF-8 content (e.g., PDFs, DOCX, etc.) or chunk blob content into smaller documents, provide a LangChain document loader via the loader_factory parameter. When loader_factory is provided, the AzureBlobStorageLoader processes each blob with the following steps: Downloads the blob to a new temporary file Passes the temporary file path to the loader_factory callable to instantiate a document loader Uses that loader to parse the file and yield Document objects Cleans up the temporary file For example, below shows parsing PDF documents with the PyPDFLoader from the langchain-community package: from langchain_azure_storage.document_loaders import AzureBlobStorageLoader from langchain_community.document_loaders import PyPDFLoader # Requires langchain-community and pypdf packages loader = AzureBlobStorageLoader( "https://<your-storage-account>.blob.core.windows.net/", "<your-container-name>", prefix="pdfs/", # Only load blobs that start with "pdfs/" loader_factory=PyPDFLoader # PyPDFLoader will parse each blob as a PDF ) # Each blob is downloaded to a temporary file and parsed by PyPDFLoader instance for doc in loader.lazy_load(): print(doc.page_content) # Content parsed by PyPDFLoader (yields one Document per page in the PDF) This file path-based interface allows you to use any LangChain document loader that accepts a local file path as input, giving you access to a wide range of parsers for different file formats. Migrating from community document loaders to langchain-azure-storage If you're currently using AzureBlobStorageContainerLoader or AzureBlobStorageFileLoader from the langchain-community package, the new AzureBlobStorageLoader provides an improved alternative. This section provides step-by-step guidance for migrating to the new loader. Steps to migrate To migrate to the new Azure Storage document loader, make the following changes: Depend on the langchain-azure-storage package Update import statements from langchain_community.document_loaders to langchain_azure_storage.document_loaders . Change class names from AzureBlobStorageFileLoader and AzureBlobStorageContainerLoader to AzureBlobStorageLoader . Update document loader constructor calls to: Use an account URL instead of a connection string. Specify UnstructuredLoader as the loader_factory to continue to use Unstructured for parsing documents. Enable Microsoft Entra ID authentication in environment (e.g., run az login or configure managed identity) instead of using connection string authentication. Migration samples Below shows code snippets of what usage patterns look like before and after migrating from langchain-community to langchain-azure-storage : Before migration from langchain_community.document_loaders import AzureBlobStorageContainerLoader, AzureBlobStorageFileLoader container_loader = AzureBlobStorageContainerLoader( "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net", "<container>", ) file_loader = AzureBlobStorageFileLoader( "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net", "<container>", "<blob>" ) After migration from langchain_azure_storage.document_loaders import AzureBlobStorageLoader from langchain_unstructured import UnstructuredLoader # Requires langchain-unstructured and unstructured packages container_loader = AzureBlobStorageLoader( "https://<account>.blob.core.windows.net", "<container>", loader_factory=UnstructuredLoader # Only needed if continuing to use Unstructured for parsing ) file_loader = AzureBlobStorageLoader( "https://<account>.blob.core.windows.net", "<container>", "<blob>", loader_factory=UnstructuredLoader # Only needed if continuing to use Unstructured for parsing ) What's next? We're excited for you to try the new Azure Blob Storage document loader and would love to hear your feedback! Here are some ways you can help shape the future of langchain-azure-storage : Show support for interface stabilization - The document loader is currently in public preview and the interface may change in future versions based on feedback. If you'd like to see the current interface marked as stable, upvote the proposal PR to show your support. Report issues or suggest improvements - Found a bug or have an idea to make the document loaders better? File an issue on our GitHub repository. Propose new LangChain integrations - Interested in other ways to use Azure Storage with LangChain (e.g., checkpointing for agents, persistent memory stores, retriever implementations)? Create a feature request or write to us to let us know. Your input is invaluable in making langchain-azure-storage better for the entire community! Resources langchain-azure GitHub repository langchain-azure-storage PyPI package AzureBlobStorageLoader usage guide AzureBlobStorageLoader documentation referenceOrchestrating Multi-Agent Intelligence: MCP-Driven Patterns in Agent Framework
Building reliable AI systems requires modular, stateful coordination and deterministic workflows that enable agents to collaborate seamlessly. The Microsoft Agent Framework provides these foundations, with memory, tracing, and orchestration built in. This implementation demonstrates four multi-agentic patterns — Single Agent, Handoff, Reflection, and Magentic Orchestration — showcasing different interaction models and collaboration strategies. From lightweight domain routing to collaborative planning and self-reflection, these patterns highlight the framework’s flexibility. At the core is Model Context Protocol (MCP), connecting agents, tools, and memory through a shared context interface. Persistent session state, conversation thread history, and checkpoint support are handled via Cosmos DB when configured, with an in-memory dictionary as a default fallback. This setup enables dynamic pattern swapping, performance comparison, and traceable multi-agent interactions — all within a unified, modular runtime. Business Scenario: Contoso Customer Support Chatbot Contoso’s chatbot handles multi-domain customer inquiries like billing anomalies, promotion eligibility, account locks, and data usage questions. These require combining structured data (billing, CRM, security logs, promotions) with unstructured policy documents processed via vector embeddings. Using MCP, the system orchestrates tool calls to fetch real-time structured data and relevant policy content, ensuring policy-aligned, auditable responses without exposing raw databases. This enables the assistant to explain anomalies, recommend actions, confirm eligibility, guide account recovery, and surface risk indicators—reducing handle time and improving first-contact resolution while supporting richer multi-agent reasoning. Architecture & Core Concepts The Contoso chatbot leverages the Microsoft Agent Framework to deliver a modular, stateful, and workflow-driven architecture. At its core, the system consists of: Base Agent: All agent patterns—single agent, reflection, handoff and magentic orchestration—inherit from a common base class, ensuring consistent interfaces for message handling, tool invocation, and state management. Backend: A FastAPI backend manages session routing, agent execution, and workflow orchestration. Frontend: A React-based UI (or Streamlit alternative) streams responses in real-time and visualizes agent reasoning and tool calls. Modular Runtime and Pattern Swapping One of the most powerful aspects of this implementation is its modular runtime design. Each agentic pattern—Single, Reflection, Handoff, and Magnetic—plugs into a shared execution pipeline defined by the base agent and MCP integration. By simply updating the .env configuration (e.g., agent_module=handoff), developers can swap in and out entire coordination strategies without touching the backend, frontend, or memory layers. This makes it easy to compare agent styles side by side, benchmark reasoning behaviors, and experiment with orchestration logic—all while maintaining a consistent, deterministic runtime. The same MCP connectors, FastAPI backend, and Cosmos/in-memory state management work seamlessly across every pattern, enabling rapid iteration and reliable evaluation. # Dynamic agent pattern loading agent_module_path = os.getenv("AGENT_MODULE") agent_module = __import__(agent_module_path, fromlist=["Agent"]) Agent = getattr(agent_module, "Agent") # Common MCP setup across all patterns async def _create_tools(self, headers: Dict[str, str]) -> List[MCPStreamableHTTPTool] | None: if not self.mcp_server_uri: return None return [MCPStreamableHTTPTool( name="mcp-streamable", url=self.mcp_server_uri, headers=headers, timeout=30, request_timeout=30, )] Memory & State Management State management is critical for multi-turn conversations and cross-agent workflows. The system supports two out-of-the-box options: Persistent Storage (Cosmos DB) Acts as the durable, enterprise-ready backend. Stores serialized conversation threads and workflow checkpoints keyed by tenant and session ID. Ensures data durability and auditability across restarts. In-Memory Session Store Default fallback when Cosmos DB credentials are not configured. Maintains ephemeral state per session for fast prototyping or lightweight use cases. All patterns leverage the same thread-based state abstraction, enabling: Session isolation: Each user session maintains its own state and history. Checkpointing: Multi-agent workflows can snapshot shared and executor-local state at any point, supporting pause/resume and fault recovery. Model Context Protocol (MCP): Acts as the connector between agents and tools, standardizing how data is fetched and results are returned to agents, whether querying structured databases or unstructured knowledge sources. Core Principles Across all patterns, the framework emphasizes: Modularity: Components are interchangeable—agents, tools, and state stores can be swapped without disrupting the system. Stateful Coordination: Multi-agent workflows coordinate through shared and local state, enabling complex reasoning without losing context. Deterministic Workflows: While agents operate autonomously, the workflow layer ensures predictable, auditable execution of multi-agent tasks. Unified Execution: From single-agent Q&A to complex Magentic orchestrations, every agent follows the same execution lifecycle and integrates seamlessly with MCP and the state store. Multi-Agent Patterns: Workflow and Coordination With the architecture and core concepts established, we can now explore the agentic patterns implemented in the Contoso chatbot. Each pattern builds on the base agent and MCP integration but differs in how agents orchestrate tasks and communicate with one another to handle multi-domain customer queries. In the sections that follow, we take a deeper dive into each pattern’s workflow and examine the under-the-hood communication flows between agents: Single Agent – A simple, single-domain agent handling straightforward queries. Reflection Agent – Allows agents to introspect and refine their outputs. Handoff Pattern – Routes conversations intelligently to specialized agents across domains. Magentic Orchestration – Coordinates multiple specialist agents for complex, parallel tasks. For each pattern, the focus will be on how agents communicate and coordinate, showing the practical orchestration mechanisms in action. Single Intelligent Agent The Single Agent Pattern represents the simplest orchestration style within the framework. Here, a single autonomous agent handles all reasoning, decision-making, and tool interactions directly — without delegation or multi-agent coordination. When a user submits a request, the single agent processes the query using all tools, memory, and data sources available through the Model Context Protocol (MCP). It performs retrieval, reasoning, and response composition in a single, cohesive loop. Communication Flow: User Input → Agent: The user submits a question or command. Agent → MCP Tools: The agent invokes one or more tools (e.g., vector retrieval, structured queries, or API calls) to gather relevant context and data. Agent → User: The agent synthesizes the tool outputs, applies reasoning, and generates the final response to the user. Session Memory: Throughout the exchange, the agent stores conversation history and extracted entities in the configured memory store (in-memory or Cosmos DB). Key Communication Principles: Single Responsibility: One agent performs both reasoning and action, ensuring fast response times and simpler state management. Direct Tool Invocation: The agent has direct access to all registered tools through MCP, enabling flexible retrieval and action chaining. Stateful Execution: The session memory preserves dialogue context, allowing the agent to maintain continuity across user turns. Deterministic Behavior: The workflow is fully predictable — input, reasoning, tool call, and output occur in a linear sequence. Reflection pattern The Reflection Pattern introduces a lightweight, two-agent communication loop designed to improve the quality and reliability of responses through structured self-review. In this setup, a Primary Agent first generates an initial response to the user’s query. This draft is then passed to a Reviewer Agent, whose role is to critique and refine the response—identifying gaps, inaccuracies, or missed context. Finally, the Primary Agent incorporates this feedback and produces a polished final answer for the user. This process introduces one round of reflection and improvement without adding excessive latency, balancing quality with responsiveness. Communication Flow: User Input → Primary Agent: The user submits a query. Primary Agent → Reviewer Agent: The primary generates an initial draft and passes it to the reviewer. Reviewer Agent → Primary Agent: The reviewer provides feedback or suggested improvements. Primary Agent → User: The primary revises its response and sends the refined version back to the user. Key Communication Principles: Two-Stage Dialogue: Structured interaction between Primary and Reviewer ensures each output undergoes quality assurance. Focused Review: The Reviewer doesn’t recreate answers—it critiques and enhances, reducing redundancy. Stateful Context: Both agents operate over the same shared memory, ensuring consistency between draft and revision. Deterministic Flow: A single reflection round guarantees predictable latency while still improving answer quality. Transparent Traceability: Each step—initial draft, feedback, and final output—is logged, allowing developers to audit reasoning or assess quality improvements over time. In practice, this pattern enables the system to reason about its own output before responding, yielding clearer, more accurate, and policy-aligned answers without requiring multiple independent retries. Handoff Pattern When a user request arrives, the system first routes it through an Intent Classifier (or triage agent) to determine which domain specialist should handle the conversation. Once identified, control is handed off directly to that Specialist Agent, which uses its own tools, domain knowledge, and state context to respond. This specialist continues to handle the user interaction as long as the conversation stays within its domain. If the user’s intent shifts — for example, moving from billing to security — the conversation is routed back to the Intent Classifier, which re-assigns it to the correct specialist agent. This pattern reduces latency and maintains continuity by minimizing unnecessary routing. Each handoff is tracked through the shared state store, ensuring seamless context carry-over and full traceability of decisions. Key Communication Principles: Dynamic Routing: The Intent Classifier routes user input to the right specialist domain. Domain Persistence: The specialist remains active while the user stays within its domain. Context Continuity: Conversation history and entities persist across agents through the shared state store. Traceable Handoffs: Every routing decision is logged for observability and auditability. Low Latency: Responses are faster since domain-appropriate agents handle queries directly. In practice, this means a user could begin a conversation about billing, continue seamlessly, and only be re-routed when switching topics — without losing any conversational context or history. Magentic Pattern The Magentic Pattern is designed for open-ended, multi-faceted tasks that require multiple agents to collaborate. It introduces a Manager (Planner) Agent, which interprets the user’s goal, breaks it into subtasks, and orchestrates multiple Specialist Agents to execute those subtasks. The Manager creates and maintains a Task Ledger, which tracks the status, dependencies, and results of each specialist’s work. As specialists perform their tool calls or reasoning, the Manager monitors their progress, gathers intermediate outputs, and can dynamically re-plan, dispatch additional tasks, or adjust the overall workflow. When all subtasks are complete, the Manager synthesizes the combined results into a coherent final response for the user. Key Communication Principles: Centralized Orchestration: The Manager coordinates all agent interactions and workflow logic. Parallel and Sequential Execution: Specialists can work simultaneously or in sequence based on task dependencies. Task Ledger: Acts as a transparent record of all task assignments, updates, and completions. Dynamic Re-planning: The Manager can modify or extend workflows in real time based on intermediate findings. Shared Memory: All agents access the same state store for consistent context and result sharing. Unified Output: The Manager consolidates results into one response, ensuring coherence across multi-agent reasoning. In practice, Magentic orchestration enables complex reasoning where the system might combine insights from multiple agents — e.g., billing, product, and security — and present a unified recommendation or resolution to the user. Choosing the Right Agent for Your Use Case Selecting the appropriate agent pattern hinges on the complexity of the task and the level of coordination required. As use cases evolve from straightforward queries to intricate, multi-step processes, the need for specialized orchestration increases. Below is a decision matrix to guide your choice: Feature / Requirement Single Agent Reflection Agent Handoff Pattern Magentic Orchestration Handles simple, domain-bound tasks ✔ ✔ ✖ ✖ Supports review / quality assurance ✖ ✔ ✖ ✔ Multi-domain routing ✖ ✖ ✔ ✔ Open-ended / complex workflows ✖ ✖ ✖ ✔ Parallel agent collaboration ✖ ✖ ✖ ✔ Direct tool access ✔ ✔ ✔ ✔ Low latency / fast response ✔ ✔ ✔ ✖ Easy to implement / low orchestration ✔ ✔ ✖ ✖ Dive Deeper: Explore, Build, and Innovate We've explored various agent patterns, from Single Agent to Magentic Orchestration, each tailored to different use cases and complexities. To see these patterns in action, we invite you to explore our Github repo. Clone the repo, experiment with the examples, and adapt them to your own scenarios. Additionally, beyond the patterns discussed here, the repository also features a Human-in-the-Loop (HITL) workflow designed for fraud detection. This workflow integrates human oversight into AI decision-making, ensuring higher accuracy and reliability. For an in-depth look at this approach, we recommend reading our detailed blog post: Building Human-in-the-loop AI Workflows with Microsoft Agent Framework | Microsoft Community Hub Engage with these resources, and start building intelligent, reliable, and scalable AI systems today! This repository and content is developed and maintained by James Nguyen, Nicole Serafino, Kranthi Kumar Manchikanti, Heena Ugale, and Tim Sullivan.From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs
As AI engineers, we’ve spent years optimizing models for the cloud, scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint it’s now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance. Why Edge AI, Why Now? Edge AI isn’t just about running models locally, it’s about rethinking the entire lifecycle: - Latency: Decisions in milliseconds, not round-trips to the cloud. - Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance. - Resilience: Offline-first apps that don’t break when the network does. - Cost: Reduced cloud compute and bandwidth overhead. With Windows AI PCs powered by Intel and Qualcomm NPUs and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency. What You’ll Learn in Edge AI for Beginners The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment. Multi-Language Support This content is available in over 48 languages, so you can read and study in your native language. What You'll Master This course takes you from fundamental concepts to production-ready implementations, covering: Small Language Models (SLMs) optimized for edge deployment Hardware-aware optimization across diverse platforms Real-time inference with privacy-preserving capabilities Production deployment strategies for enterprise applications Why EdgeAI Matters Edge AI represents a paradigm shift that addresses critical modern challenges: Privacy & Security: Process sensitive data locally without cloud exposure Real-time Performance: Eliminate network latency for time-critical applications Cost Efficiency: Reduce bandwidth and cloud computing expenses Resilient Operations: Maintain functionality during network outages Regulatory Compliance: Meet data sovereignty requirements Edge AI Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making. Core Principles: On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs) Offline capability: Functions without persistent internet connectivity Low latency: Immediate responses suited for real-time systems Data sovereignty: Keeps sensitive data local, improving security and compliance Small Language Models (SLMs) SLMs like Phi-4, Mistral-7B, Qwen and Gemma are optimized versions of larger LLMs, trained or distilled for: Reduced memory footprint: Efficient use of limited edge device memory Lower compute demand: Optimized for CPU and edge GPU performance Faster startup times: Quick initialization for responsive applications They unlock powerful NLP capabilities while meeting the constraints of: Embedded systems: IoT devices and industrial controllers Mobile devices: Smartphones and tablets with offline capabilities IoT Devices: Sensors and smart devices with limited resources Edge servers: Local processing units with limited GPU resources Personal Computers: Desktop and laptop deployment scenarios Course Modules & Navigation Course duration. 10 hours of content Module Topic Focus Area Key Content Level Duration 📖 00 Introduction to EdgeAI Foundation & Context EdgeAI Overview • Industry Applications • SLM Introduction • Learning Objectives Beginner 1-2 hrs 📚 01 EdgeAI Fundamentals Cloud vs Edge AI comparison EdgeAI Fundamentals • Real World Case Studies • Implementation Guide • Edge Deployment Beginner 3-4 hrs 🧠 02 SLM Model Foundations Model families & architecture Phi Family • Qwen Family • Gemma Family • BitNET • μModel • Phi-Silica Beginner 4-5 hrs 🚀 03 SLM Deployment Practice Local & cloud deployment Advanced Learning • Local Environment • Cloud Deployment Intermediate 4-5 hrs ⚙️ 04 Model Optimization Toolkit Cross-platform optimization Introduction • Llama.cpp • Microsoft Olive • OpenVINO • Apple MLX • Workflow Synthesis Intermediate 5-6 hrs 🔧 05 SLMOps Production Production operations SLMOps Introduction • Model Distillation • Fine-tuning • Production Deployment Advanced 5-6 hrs 🤖 06 AI Agents & Function Calling Agent frameworks & MCP Agent Introduction • Function Calling • Model Context Protocol Advanced 4-5 hrs 💻 07 Platform Implementation Cross-platform samples AI Toolkit • Foundry Local • Windows Development Advanced 3-4 hrs 🏭 08 Foundry Local Toolkit Production-ready samples Sample applications (see details below) Expert 8-10 hrs Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing. Developer Highlights - 🔧 Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration. - 🧩 ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU. - 🎮 DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps. - 🖥️ Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference. Local AI: Beyond the Edge Local AI isn’t just about inference, it’s about autonomy. Imagine agents that: - Learn from local context - Adapt to user behavior - Respect privacy by design With tools like Agent Framework, Azure AI Foundry and Windows Copilot Studio, and Foundry Local developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency. Try It Yourself Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT devices Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.¡Curso oficial y gratuito de GenAI y Python! 🚀
¿Quieres aprender a usar modelos de IA generativa en tus aplicaciones de Python?Estamos organizando una serie de nueve transmisiones en vivo, en inglés y español, totalmente dedicadas a la IA generativa. Vamos a cubrir modelos de lenguaje (LLMs), modelos de embeddings, modelos de visión, y también técnicas como RAG, function calling y structured outputs. Además, te mostraremos cómo construir Agentes y servidores MCP, y hablaremos sobre seguridad en IA y evaluaciones, para asegurarnos de que tus modelos y aplicaciones generen resultados seguros. 🔗 Regístrate para toda la serie. Además de las transmisiones en vivo, puedes unirte a nuestras office hours semanales en el AI Foundry Discord de para hacer preguntas que no se respondan durante el chat. ¡Nos vemos en los streams! 👋🏻 Here’s your HTML converted into clean, readable text format (perfect for a newsletter, blog post, or social media caption): Modelos de Lenguaje 📅 7 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor ¡Únete a la primera sesión de nuestra serie de Python + IA! En esta sesión, hablaremos sobre los Modelos de Lenguaje (LLMs), los modelos que impulsan ChatGPT y GitHub Copilot. Usaremos Python para interactuar con LLMs utilizando paquetes como el SDK de OpenAI y Langchain. Experimentaremos con prompt engineering y ejemplos few-shot para mejorar los resultados. También construiremos una aplicación full stack impulsada por LLMs y explicaremos la importancia de la concurrencia y el streaming en apps de IA orientadas al usuario. 👉 Si querés seguir los ejemplos en vivo, asegurate de tener una cuenta de GitHub. Embeddings Vectoriales 📅 8 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor En la segunda sesión de Python + IA, exploraremos los embeddings vectoriales, una forma de codificar texto o imágenes como arrays de números decimales. Estos modelos permiten realizar búsquedas por similitud en distintos tipos de contenido. Usaremos modelos como la serie text-embedding-3 de OpenAI, visualizaremos resultados en Python y compararemos métricas de distancia. También veremos cómo aplicar cuantización y cómo usar modelos multimodales de embedding. 👉 Si querés seguir los ejemplos en vivo, asegurate de tener una cuenta de GitHub. Recuperación-Aumentada Generación (RAG) 📅 9 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor En la tercera sesión, exploraremos RAG, una técnica que envía contexto al LLM para obtener respuestas más precisas dentro de un dominio específico. Usaremos distintas fuentes de datos —CSVs, páginas web, documentos, bases de datos— y construiremos una app RAG full-stack con Azure AI Search. Modelos de Visión 📅 14 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor ¡La cuarta sesión trata sobre modelos de visión como GPT-4o y 4o-mini! Estos modelos pueden procesar texto e imágenes, generando descripciones, extrayendo datos, respondiendo preguntas o clasificando contenido. Usaremos Python para enviar imágenes a los modelos, crear una app de chat con imágenes e integrarlos en flujos RAG. 👉 Si querés seguir los ejemplos en vivo, asegurate de tener una cuenta de GitHub. Salidas Estructuradas 📅 15 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor En la quinta sesión aprenderemos a hacer que los LLMs generen respuestas estructuradas según un esquema. Exploraremos el modo structured outputs de OpenAI y cómo aplicarlo para extracción de entidades, clasificación y flujos con agentes. 👉 Si querés seguir los ejemplos en vivo, asegurate de tener una cuenta de GitHub. Calidad y Seguridad 📅 16 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor En la sexta sesión hablaremos sobre cómo usar IA de manera segura y evaluar la calidad de las salidas. Mostraremos cómo configurar Azure AI Content Safety, manejar errores en código Python y evaluar resultados con el SDK de Evaluación de Azure AI. Tool Calling 📅 21 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor En la última semana de la serie, nos enfocamos en tool calling (function calling), la base para construir agentes de IA. Aprenderemos a definir herramientas en Python o JSON, manejar respuestas de los modelos y habilitar llamadas paralelas y múltiples iteraciones. 👉 Si querés seguir los ejemplos en vivo, asegurate de tener una cuenta de GitHub. Agentes de IA 📅 22 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor ¡En la penúltima sesión construiremos agentes de IA! Usaremos frameworks como Langgraph, Semantic Kernel, Autogen, y Pydantic AI. Empezaremos con ejemplos simples y avanzaremos a arquitecturas más complejas como round-robin, supervisor, graphs y ReAct. Model Context Protocol (MCP) 📅 23 de octubre, 2025 | 10:00 PM - 11:00 PM (UTC) 🔗 Regístrate para la transmisión en Reactor Cerramos la serie con Model Context Protocol (MCP), la tecnología abierta más candente de 2025. Aprenderás a usar FastMCP para crear un servidor MCP local y conectarlo a chatbots como GitHub Copilot. También veremos cómo integrar MCP con frameworks de agentes como Langgraph, Semantic Kernel y Pydantic AI. Y, por supuesto, hablaremos sobre los riesgos de seguridad y las mejores prácticas para desarrolladores. ¿Querés que lo reformatee para publicación en Markdown (para blogs o repos) o en texto plano con emojis y separadores estilo redes sociales?