ai foundry
51 TopicsHow to Build Safe Natural Language-Driven APIs
TL;DR Building production natural language APIs requires separating semantic parsing from execution. Use LLMs to translate user text into canonical structured requests (via schemas), then execute those requests deterministically. Key patterns: schema completion for clarification, confidence gates to prevent silent failures, code-based ontologies for normalization, and an orchestration layer. This keeps language as input, not as your API contract. Introduction APIs that accept natural language as input are quickly becoming the norm in the age of agentic AI apps and LLMs. From search and recommendations to workflows and automation, users increasingly expect to "just ask" and get results. But treating natural language as an API contract introduces serious risks in production systems: Nondeterministic behavior Prompt-driven business logic Difficult debugging and replay Silent failures that are hard to detect In this post, I'll describe a production-grade architecture for building safe, natural language-driven APIs: one that embraces LLMs for intent discovery and entity extraction while preserving the determinism, observability, and reliability that backend systems require. This approach is based on building real systems using Azure OpenAI and LangGraph, and on lessons learned the hard way. The Core Problem with Natural Language APIs Natural language is an excellent interface for humans. It is a poor interface for systems. When APIs accept raw text directly and execute logic based on it, several problems emerge: The API contract becomes implicit and unversioned Small prompt changes cause behavioral changes Business logic quietly migrates into prompts In short: language becomes the contract, and that's fragile. The solution is not to avoid natural language, but to contain it. A Key Principle: Natural Language Is Input, Not a Contract So how do we contain it? The answer lies in treating natural language fundamentally differently than we treat traditional API inputs. The most important design decision we made was this: Natural language should be translated into structure, not executed directly. That single principle drives the entire architecture. Instead of building "chatty APIs," we split responsibilities clearly: Natural language is used for intent discovery and entity extraction Structured data is used for execution Two Explicit API Layers This principle translates into a concrete architecture with two distinct API layers, each with a single, clear responsibility. 1. Semantic Parse API (Natural Language → Structure) This API: Accepts user text Extracts intent and entities using LLMs Completes a predefined schema Asks clarifying questions when required Returns a canonical, structured request Does not execute business logic Think of this as a compiler, not an engine. 2. Structured Execution API (Structure → Action) This API: Accepts only structured input Calls downstream systems to process the request and get results Is deterministic and versioned Contains no natural language handling Is fully testable and replayable This is where execution happens. Why This Separation Matters Separating these layers gives you: A stable, versionable API contract Freedom to improve NLP without breaking clients Clear ownership boundaries Deterministic execution paths Most importantly, it prevents LLM behavior from leaking into core business logic. Canonical Schemas Are the Backbone Now that we've established the two-layer architecture, let's dive into what makes it work: canonical schemas. Each supported intent is defined by a canonical schema that lives in code. Example (simplified): This schema is used when a user is looking for similar product recommendations. The entities capture which product to use as reference and how to bias the recommendations toward price or quality. { "intent": "recommend_similar", "entities": { "reference_product_id": "string", "price_bias": "number (-1 to 1)", "quality_bias": "number (-1 to 1)" } } Schemas define: Required vs optional fields Allowed ranges and types Validation rules They are the contract, not the prompt. When a user says "show me products like the blue backpack but cheaper", the LLM extracts: Intent: recommend_similar reference_product_id: "blue_backpack_123" price_bias: -0.8 (strongly prefer cheaper) quality_bias: 0.0 (neutral) The schema ensures that even if the user phrased it as "find alternatives to item 123 with better pricing" or "cheaper versions of that blue bag", the output is always the same structure. The natural language variation is absorbed at the semantic layer. The execution layer receives a consistent, validated request every time. This decoupling is what makes the system maintainable. Schema Completion, Not Free-Form Chat But what happens when the user's input doesn't contain all the information needed to complete the schema? This is where structured clarification comes in. A common misconception is that clarification means "chatting until it feels right." In production systems, clarification is schema completion. If required fields are missing or ambiguous, the semantic API responds with: What information is missing A targeted clarification question The current schema state Example response: { "status": "needs_clarification", "missing_fields": ["reference_product_id"], "question": "Which product should I compare against?", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } The state object is the memory. The API itself remains stateless. A Complete Conversation Flow To illustrate how schema completion works in practice, here's a full conversation flow where the user's initial request is missing required information: Initial Request: User: "Show me cheaper alternatives with good quality" API Response (needs clarification): { "status": "needs_clarification", "missing_fields": ["reference_product_id"], "question": "Which product should I compare against?", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } Follow-up Request: User: "The blue backpack" Client sends: { "user_input": "The blue backpack", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } API Response (complete): { "status": "complete", "canonical_request": { "intent": "recommend_similar", "entities": { "reference_product_id": "blue_backpack_123", "price_bias": -0.3, "quality_bias": 0.4 } } } The client passes the state back with each clarification. The API remains stateless, while the client manages the conversation context. Once complete, the canonical_request can be sent directly to the execution API. Why LangGraph Fits This Problem Perfectly With schemas and clarification flows defined, we need a way to orchestrate the semantic parsing workflow reliably. This is where LangGraph becomes valuable. LangGraph allows semantic parsing to be modeled as a structured, deterministic workflow with explicit decision points: Classify intent: Determine what the user wants to do from a predefined set of supported actions Extract candidate entities: Pull out relevant parameters from the natural language input using the LLM Merge into schema state: Map the extracted values into the canonical schema structure Validate required fields: Check if all mandatory fields are present and values are within acceptable ranges Either complete or request clarification: Return the canonical request if complete, or ask a targeted question if information is missing Each node has a single responsibility. Validation and routing are done in code, not by the LLM. LangGraph provides: Explicit state transitions Deterministic routing Observable execution Safe retries Used this way, it becomes a powerful orchestration tool, not a conversational agent. Confidence Gates Prevent Silent Failures Structured workflows handle the process, but there's another critical safety mechanism we need: knowing when the LLM isn't confident about its extraction. Even when outputs are structurally valid, they may not be reliable. We require the semantic layer to emit a confidence score. If confidence falls below a threshold, execution is blocked and clarification is requested. This simple rule eliminates an entire class of silent misinterpretations that are otherwise very hard to detect. Example: When a user says "Show me items similar to the bag", the LLM might extract: { "intent": "recommend_similar", "confidence": 0.55, "entities": { "reference_product_id": "generic_bag_001", "confidence_scores": { "reference_product_id": 0.4 } } } The overall confidence is low (0.55), and the entity confidence for reference_product_id is very low (0.4) because "the bag" is ambiguous. There might be hundreds of bags in the catalog. Instead of proceeding with a potentially wrong guess, the API responds: { "status": "needs_clarification", "reason": "low_confidence", "question": "I found multiple bags. Did you mean the blue backpack, the leather tote, or the travel duffel?", "confidence": 0.55 } This prevents the system from silently executing the wrong recommendation and provides a better user experience. Lightweight Ontologies (Keep Them in Code) Beyond confidence scoring, we need a way to normalize the variety of terms users might use into consistent canonical values. We also introduced lightweight, code-level ontologies: Allowed intents Required entities per intent Synonym-to-canonical mappings Cross-field validation rules These live in code and configuration, not in prompts. LLMs propose values. Code enforces meaning. Example: Consider these user inputs that all mean the same thing: "Show me cheaper options" "Find budget-friendly alternatives" "I want something more affordable" "Give me lower-priced items" The LLM might extract different values: "cheaper", "budget-friendly", "affordable", "lower-priced". The ontology maps all of these to a canonical value: PRICE_BIAS_SYNONYMS = { "cheaper": -0.7, "budget-friendly": -0.7, "affordable": -0.7, "lower-priced": -0.7, "expensive": 0.7, "premium": 0.7, "high-end": 0.7 } When the LLM extracts "budget-friendly", the code normalizes it to -0.7 for the price_bias field. Similarly, cross-field validation catches logical inconsistencies: if entities["price_bias"] < -0.5 and entities["quality_bias"] > 0.5: return clarification("You want cheaper items with higher quality. This might be difficult. Should I prioritize price or quality?") The LLM proposes. The ontology normalizes. The validation enforces business rules. What About Latency? A common concern with multi-step semantic parsing is performance. In practice, we observed: Intent classification: ~40 ms Entity extraction: ~200 ms Validation and routing: ~1 ms Total overhead: ~250–300 ms. For chat-driven user experiences, this is well within acceptable bounds and far cheaper than incorrect or inconsistent execution. Key Takeaways Let's bring it all together. If you're building APIs that accept natural language in production: Do not make language your API contract Translate language into canonical structure Own schema completion server-side Use LLMs for discovery and extraction, not execution Treat safety and determinism as first-class requirements Natural language is an input format. Structure is the contract. Closing Thoughts LLMs make it easy to build impressive demos. Building safe, reliable systems with them requires discipline. By separating semantic interpretation from execution, and by using tools like Azure OpenAI and LangGraph thoughtfully, you can build natural language-driven APIs that scale, evolve, and behave predictably in production. Hopefully, this architecture saves you a few painful iterations.Benchmarking Local AI Models
Introduction Selecting the right AI model for your application requires more than reading benchmark leaderboards. Published benchmarks measure academic capabilities, question answering, reasoning, coding, but your application has specific requirements: latency budgets, hardware constraints, quality thresholds. How do you know if Phi-4 provides acceptable quality for your document summarization use case? Will Qwen2.5-0.5B meet your 100ms response time requirement? Does your edge device have sufficient memory for Phi-3.5 Mini? The answer lies in empirical testing: running actual models on your hardware with your workload patterns. This article demonstrates building a comprehensive model benchmarking platform using FLPerformance, Node.js, React, and Microsoft Foundry Local. You'll learn how to implement scientific performance measurement, design meaningful benchmark suites, visualize multi-dimensional comparisons, and make data-driven model selection decisions. Whether you're evaluating models for production deployment, optimizing inference costs, or validating hardware specifications, this platform provides the tools for rigorous performance analysis. Why Model Benchmarking Requires Purpose-Built Tools You cannot assess model performance by running a few manual tests and noting the results. Scientific benchmarking demands controlled conditions, statistically significant sample sizes, multi-dimensional metrics, and reproducible methodology. Understand why purpose-built tooling is essential. Performance is multi-dimensional. A model might excel at throughput (tokens per second) but suffer at latency (time to first token). Another might generate high-quality outputs slowly. Your application might prioritize consistency over average performance, a model with variable response times (high p95/p99 latency) creates poor user experiences even if averages look good. Measuring all dimensions simultaneously enables informed tradeoffs. Hardware matters enormously. Benchmark results from NVIDIA A100 GPUs don't predict performance on consumer laptops. NPU acceleration changes the picture again. Memory constraints affect which models can even load. Test on your actual deployment hardware or comparable specifications to get actionable results. Concurrency reveals bottlenecks. A model handling one request excellently might struggle with ten concurrent requests. Real applications experience variable load, measuring only single-threaded performance misses critical scalability constraints. Controlled concurrency testing reveals these limits. Statistical rigor prevents false conclusions. Running a prompt once and noting the response time tells you nothing about performance distribution. Was this result typical? An outlier? You need dozens or hundreds of trials to establish p50/p95/p99 percentiles, understand variance, and detect stability issues. Comparison requires controlled experiments. Different prompts, different times of day, different system loads, all introduce confounding variables. Scientific comparison runs identical workloads across models sequentially, controlling for external factors. Architecture: Three-Layer Performance Testing Platform FLPerformance implements a clean separation between orchestration, measurement, and presentation: The frontend React application provides model management, benchmark configuration, test execution, and results visualization. Users add models from the Foundry Local catalog, configure benchmark parameters (iterations, concurrency, timeout values), launch test runs, and view real-time progress. The results dashboard displays comparison tables, latency distribution charts, throughput graphs, and "best model for..." recommendations. The backend Node.js/Express server orchestrates tests and captures metrics. It manages the single Foundry Local service instance, loads/unloads models as needed, executes benchmark suites with controlled concurrency, measures comprehensive metrics (TTFT, TPOT, total latency, throughput, error rates), and persists results to JSON storage. WebSocket connections provide real-time progress updates during long benchmark runs. Foundry Local SDK integration uses the official foundry-local-sdk npm package. The SDK manages service lifecycle, starting, stopping, health checkin, and handles model operations, downloading, loading into memory, unloading. It provides OpenAI-compatible inference APIs for consistent request formatting across models. The architecture supports simultaneous testing of multiple models by loading them one at a time, running identical benchmarks, and aggregating results for comparison: User Initiates Benchmark Run ↓ Backend receives {models: [...], suite: "default", iterations: 10} ↓ For each model: 1. Load model into Foundry Local 2. Execute benchmark suite - For each prompt in suite: * Run N iterations * Measure TTFT, TPOT, total time * Track errors and timeouts * Calculate tokens/second 3. Aggregate statistics (mean, p50, p95, p99) 4. Unload model ↓ Store results with metadata ↓ Return comparison data to frontend ↓ Visualize performance metrics Implementing Scientific Measurement Infrastructure Accurate performance measurement requires instrumentation that captures multiple dimensions without introducing measurement overhead: // src/server/benchmark.js import { performance } from 'perf_hooks'; export class BenchmarkExecutor { constructor(foundryClient, options = {}) { this.client = foundryClient; this.options = { iterations: options.iterations || 10, concurrency: options.concurrency || 1, timeout_ms: options.timeout_ms || 30000, warmup_iterations: options.warmup_iterations || 2 }; } async runBenchmarkSuite(modelId, prompts) { const results = []; // Warmup phase (exclude from results) console.log(`Running ${this.options.warmup_iterations} warmup iterations...`); for (let i = 0; i < this.options.warmup_iterations; i++) { await this.executePrompt(modelId, prompts[0].text); } // Actual benchmark runs for (const prompt of prompts) { console.log(`Benchmarking prompt: ${prompt.id}`); const measurements = []; for (let i = 0; i < this.options.iterations; i++) { const measurement = await this.executeMeasuredPrompt( modelId, prompt.text ); measurements.push(measurement); // Small delay between iterations to stabilize await sleep(100); } results.push({ prompt_id: prompt.id, prompt_text: prompt.text, measurements, statistics: this.calculateStatistics(measurements) }); } return { model_id: modelId, timestamp: new Date().toISOString(), config: this.options, results }; } async executeMeasuredPrompt(modelId, promptText) { const measurement = { success: false, error: null, ttft_ms: null, // Time to first token tpot_ms: null, // Time per output token total_ms: null, tokens_generated: 0, tokens_per_second: 0 }; try { const startTime = performance.now(); let firstTokenTime = null; let tokenCount = 0; // Streaming completion to measure TTFT const stream = await this.client.chat.completions.create({ model: modelId, messages: [{ role: 'user', content: promptText }], max_tokens: 200, temperature: 0.7, stream: true }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { if (firstTokenTime === null) { firstTokenTime = performance.now(); measurement.ttft_ms = firstTokenTime - startTime; } tokenCount++; } } const endTime = performance.now(); measurement.total_ms = endTime - startTime; measurement.tokens_generated = tokenCount; if (tokenCount > 1 && firstTokenTime) { // TPOT = time after first token / (tokens - 1) const timeAfterFirstToken = endTime - firstTokenTime; measurement.tpot_ms = timeAfterFirstToken / (tokenCount - 1); measurement.tokens_per_second = 1000 / measurement.tpot_ms; } measurement.success = true; } catch (error) { measurement.error = error.message; measurement.success = false; } return measurement; } calculateStatistics(measurements) { const successful = measurements.filter(m => m.success); const total = measurements.length; if (successful.length === 0) { return { success_rate: 0, error_rate: 1.0, sample_size: total }; } const ttfts = successful.map(m => m.ttft_ms).sort((a, b) => a - b); const tpots = successful.map(m => m.tpot_ms).filter(v => v !== null).sort((a, b) => a - b); const totals = successful.map(m => m.total_ms).sort((a, b) => a - b); const throughputs = successful.map(m => m.tokens_per_second).filter(v => v > 0); return { success_rate: successful.length / total, error_rate: (total - successful.length) / total, sample_size: total, ttft: { mean: mean(ttfts), median: percentile(ttfts, 50), p95: percentile(ttfts, 95), p99: percentile(ttfts, 99), min: Math.min(...ttfts), max: Math.max(...ttfts) }, tpot: tpots.length > 0 ? { mean: mean(tpots), median: percentile(tpots, 50), p95: percentile(tpots, 95) } : null, total_latency: { mean: mean(totals), median: percentile(totals, 50), p95: percentile(totals, 95), p99: percentile(totals, 99) }, throughput: { mean_tps: mean(throughputs), median_tps: percentile(throughputs, 50) } }; } } function mean(arr) { return arr.reduce((sum, val) => sum + val, 0) / arr.length; } function percentile(sortedArr, p) { const index = Math.ceil((sortedArr.length * p) / 100) - 1; return sortedArr[Math.max(0, index)]; } function sleep(ms) { return new Promise(resolve => setTimeout(resolve, ms)); } This measurement infrastructure captures: Time to First Token (TTFT): Critical for perceived responsiveness—users notice delays before output begins Time Per Output Token (TPOT): Determines generation speed after first token—affects throughput Total latency: End-to-end time—matters for batch processing and high-volume scenarios Tokens per second: Overall throughput metric—useful for capacity planning Statistical distributions: Mean alone masks variability—p95/p99 reveal tail latencies that impact user experience Success/error rates: Stability metrics—some models timeout or crash under load Designing Meaningful Benchmark Suites Benchmark quality depends on prompt selection. Generic prompts don't reflect real application behavior. Design suites that mirror actual use cases: // benchmarks/suites/default.json { "name": "default", "description": "General-purpose benchmark covering diverse scenarios", "prompts": [ { "id": "short-factual", "text": "What is the capital of France?", "category": "factual", "expected_tokens": 5 }, { "id": "medium-explanation", "text": "Explain how photosynthesis works in 3-4 sentences.", "category": "explanation", "expected_tokens": 80 }, { "id": "long-reasoning", "text": "Analyze the economic factors that led to the 2008 financial crisis. Discuss at least 5 major causes with supporting details.", "category": "reasoning", "expected_tokens": 250 }, { "id": "code-generation", "text": "Write a Python function that finds the longest palindrome in a string. Include docstring and example usage.", "category": "coding", "expected_tokens": 150 }, { "id": "creative-writing", "text": "Write a short story (3 paragraphs) about a robot learning to paint.", "category": "creative", "expected_tokens": 200 } ] } This suite covers multiple dimensions: Length variation: Short (5 tokens), medium (80), long (250)—tests models across output ranges Task diversity: Factual recall, explanation, reasoning, code, creative—reveals capability breadth Token predictability: Expected token counts enable throughput calculations For production applications, create custom suites matching your actual workload: { "name": "customer-support", "description": "Simulates actual customer support queries", "prompts": [ { "id": "product-question", "text": "How do I reset my password for the customer portal?" }, { "id": "troubleshooting", "text": "I'm getting error code 503 when trying to upload files. What should I do?" }, { "id": "policy-inquiry", "text": "What is your refund policy for annual subscriptions?" } ] } Visualizing Multi-Dimensional Performance Comparisons Raw numbers don't reveal insights—visualization makes patterns obvious. The frontend implements several comparison views: Comparison Table shows side-by-side metrics: // frontend/src/components/ResultsTable.jsx export function ResultsTable({ results }) { return ( {results.map(result => ( ))} Model TTFT (ms) TPOT (ms) Throughput (tok/s) P95 Latency Error Rate {result.model_id} {result.stats.ttft.median.toFixed(0)} (p95: {result.stats.ttft.p95.toFixed(0)}) {result.stats.tpot?.median.toFixed(1) || 'N/A'} {result.stats.throughput.median_tps.toFixed(1)} {result.stats.total_latency.p95.toFixed(0)} ms 0.05 ? 'error' : 'success'}> {(result.stats.error_rate * 100).toFixed(1)}% ); } Latency Distribution Chart reveals performance consistency: // Using Chart.js for visualization export function LatencyChart({ results }) { const data = { labels: results.map(r => r.model_id), datasets: [ { label: 'Median (p50)', data: results.map(r => r.stats.total_latency.median), backgroundColor: 'rgba(75, 192, 192, 0.5)' }, { label: 'p95', data: results.map(r => r.stats.total_latency.p95), backgroundColor: 'rgba(255, 206, 86, 0.5)' }, { label: 'p99', data: results.map(r => r.stats.total_latency.p99), backgroundColor: 'rgba(255, 99, 132, 0.5)' } ] }; return ( ); } Recommendations Engine synthesizes multi-dimensional comparison: export function generateRecommendations(results) { const recommendations = []; // Find fastest TTFT (best perceived responsiveness) const fastestTTFT = results.reduce((best, r) => r.stats.ttft.median < best.stats.ttft.median ? r : best ); recommendations.push({ category: 'Fastest Response', model: fastestTTFT.model_id, reason: `Lowest median TTFT: ${fastestTTFT.stats.ttft.median.toFixed(0)}ms` }); // Find highest throughput const highestThroughput = results.reduce((best, r) => r.stats.throughput.median_tps > best.stats.throughput.median_tps ? r : best ); recommendations.push({ category: 'Best Throughput', model: highestThroughput.model_id, reason: `Highest tok/s: ${highestThroughput.stats.throughput.median_tps.toFixed(1)}` }); // Find most consistent (lowest p95-p50 spread) const mostConsistent = results.reduce((best, r) => { const spread = r.stats.total_latency.p95 - r.stats.total_latency.median; const bestSpread = best.stats.total_latency.p95 - best.stats.total_latency.median; return spread < bestSpread ? r : best; }); recommendations.push({ category: 'Most Consistent', model: mostConsistent.model_id, reason: 'Lowest latency variance (p95-p50 spread)' }); return recommendations; } Key Takeaways and Benchmarking Best Practices Effective model benchmarking requires scientific methodology, comprehensive metrics, and application-specific testing. FLPerformance demonstrates that rigorous performance measurement is accessible to any development team. Critical principles for model evaluation: Test on target hardware: Results from cloud GPUs don't predict laptop performance Measure multiple dimensions: TTFT, TPOT, throughput, consistency all matter Use statistical rigor: Single runs mislead—capture distributions with adequate sample sizes Design realistic workloads: Generic benchmarks don't predict your application's behavior Include warmup iterations: Model loading and JIT compilation affect early measurements Control concurrency: Real applications handle multiple requests—test at realistic loads Document methodology: Reproducible results require documented procedures and configurations The complete benchmarking platform with model management, measurement infrastructure, visualization dashboards, and comprehensive documentation is available at github.com/leestott/FLPerformance. Clone the repository and run the startup script to begin evaluating models on your hardware. Resources and Further Reading FLPerformance Repository - Complete benchmarking platform Quick Start Guide - Setup and first benchmark run Microsoft Foundry Local Documentation - SDK reference and model catalog Architecture Guide - System design and SDK integration Benchmarking Best Practices - Methodology and troubleshootingBuilding Interactive Agent UIs with AG-UI and Microsoft Agent Framework
Introduction Picture this: You've built an AI agent that analyzes financial data. A user uploads a quarterly report and asks: "What are the top three expense categories?" Behind the scenes, your agent parses the spreadsheet, aggregates thousands of rows, and generates visualizations. All in 20 seconds. But the user? They see a loading spinner. Nothing else. No "reading file" message, no "analyzing data" indicator, no hint that progress is being made. They start wondering: Is it frozen? Should I refresh? The problem isn't the agent's capabilities - it's the communication gap between the agent running on the backend and the user interface. When agents perform multi-step reasoning, call external APIs, or execute complex tool chains, users deserve to see what's happening. They need streaming updates, intermediate results, and transparent progress indicators. Yet most agent frameworks force developers to choose between simple request/response patterns or building custom solutions to stream updates to their UIs. This is where AG-UI comes in. AG-UI is a fairly new event-based protocol that standardizes how agents communicate with user interfaces. Instead of every framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of structured events that work consistently across different agent implementations. When an agent starts processing, calls a tool, generates text, or encounters an error, the UI receives explicit, typed events in real time. The beauty of AG-UI is its framework-agnostic design. While this blog post demonstrates integration with Microsoft Agent Framework (MAF), the same AG-UI protocol works with LangGraph, CrewAI, or any other compliant framework. Write your UI code once, and it works with any AG-UI-compliant backend. (Note: MAF supports both Python and .NET - this blog post focuses on the Python implementation.) TL;DR The Problem: Users don't get real-time updates while AI agents work behind the scenes - no progress indicators, no transparency into tool calls, and no insight into what's happening. The Solution: AG-UI is an open, event-based protocol that standardizes real-time communication between AI agents and user interfaces. Instead of each development team and framework inventing custom streaming solutions, AG-UI provides a shared vocabulary of structured events (like TOOL_CALL_START, TEXT_MESSAGE_CONTENT, RUN_FINISHED) that work across any compliant framework. Key Benefits: Framework-agnostic - Write UI code once, works with LangGraph, Microsoft Agent Framework, CrewAI, and more Real-time observability - See exactly what your agent is doing as it happens Server-Sent Events - Built on standard HTTP for universal compatibility Protocol-managed state - No manual conversation history tracking In This Post: You'll learn why AG-UI exists, how it works, and build a complete working application using Microsoft Agent Framework with Python - from server setup to client implementation. What You'll Learn This blog post walks through: Why AG-UI exists - how agent-UI communication has evolved and what problems current approaches couldn't solve How the protocol works - the key design choices that make AG-UI simple, reliable, and framework-agnostic Protocol architecture - the generic components and how AG-UI integrates with agent frameworks Building an AG-UI application - a complete working example using Microsoft Agent Framework with server, client, and step-by-step setup Understanding events - what happens under the hood when your agent runs and how to observe it Thinking in events - how building with AG-UI differs from traditional APIs, and what benefits this brings Making the right choice - when AG-UI is the right fit for your project and when alternatives might be better Estimated reading time: 15 minutes Who this is for: Developers building AI agents who want to provide real-time feedback to users, and teams evaluating standardized approaches to agent-UI communication To appreciate why AG-UI matters, we need to understand the journey that led to its creation. Let's trace how agent-UI communication has evolved through three distinct phases. The Evolution of Agent-UI Communication AI agents have become more capable over time. As they evolved, the way they communicated with user interfaces had to evolve as well. Here's how this evolution unfolded. Phase 1: Simple Request/Response In the early days of AI agent development, the interaction model was straightforward: send a question, wait for an answer, display the result. This synchronous approach mirrored traditional API calls and worked fine for simple scenarios. # Simple, but limiting response = agent.run("What's the weather in Paris?") display(response) # User waits... and waits... Works for: Quick queries that complete in seconds, simple Q&A interactions where immediate feedback and interactivity aren't critical. Breaks down: When agents need to call multiple tools, perform multi-step reasoning, or process complex queries that take 30+ seconds. Users see nothing but a loading spinner, with no insight into what's happening or whether the agent is making progress. This creates a poor user experience and makes it impossible to show intermediate results or allow user intervention. Recognizing these limitations, development teams began experimenting with more sophisticated approaches. Phase 2: Custom Streaming Solutions As agents became more sophisticated, teams recognized the need for incremental feedback and interactivity. Rather than waiting for the complete response, they implemented custom streaming solutions to show partial results as they became available. # Every team invents their own format for chunk in agent.stream("What's the weather?"): display(chunk) # But what about tool calls? Errors? Progress? This was a step forward for building interactive agent UIs, but each team solved the problem differently. Also, different frameworks had incompatible approaches - some streamed only text tokens, others sent structured JSON, and most provided no visibility into critical events like tool calls or errors. The problem: No standardization across frameworks - client code that works with LangGraph won't work with Crew AI, requiring separate implementations for each agent backend Each implementation handles tool calls differently - some send nothing during tool execution, others send unstructured messages Complex state management - clients must track conversation history, manage reconnections, and handle edge cases manually The industry needed a better solution - a common protocol that could work across all frameworks while maintaining the benefits of streaming. Phase 3: Standardized Protocol (AG-UI) AG-UI emerged as a response to the fragmentation problem. Instead of each framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of events that work consistently across different agent implementations. # Standardized events everyone understands async for event in agent.run_stream("What's the weather?"): if event.type == "TEXT_MESSAGE_CONTENT": display_text(event.delta) elif event.type == "TOOL_CALL_START": show_tool_indicator(event.tool_name) elif event.type == "TOOL_CALL_RESULT": show_tool_result(event.result) The key difference is structured observability. Rather than guessing what the agent is doing from unstructured text, clients receive explicit events for every stage of execution: when the agent starts, when it generates text, when it calls a tool, when that tool completes, and when the entire run finishes. What's different: A standardized vocabulary of event types, complete observability into agent execution, and framework-agnostic clients that work with any AG-UI-compliant backend. You write your UI code once, and it works whether the backend uses Microsoft Agent Framework, LangGraph, or any other framework that speaks AG-UI. Now that we've seen why AG-UI emerged and what problems it solves, let's examine the specific design decisions that make the protocol work. These choices weren't arbitrary - each one addresses concrete challenges in building reliable, observable agent-UI communication. The Design Decisions Behind AG-UI Why Server-Sent Events (SSE)? Aspect WebSockets SSE (AG-UI) Complexity Bidirectional Unidirectional (simpler) Firewall/Proxy Sometimes blocked Standard HTTP Reconnection Manual implementation Built-in browser support Use case Real-time games, chat Agent responses (one-way) For agent interactions, you typically only need server→client communication, making SSE a simpler choice. SSE solves the transport problem - how events travel from server to client. But once connected, how does the protocol handle conversation state across multiple interactions? Why Protocol-Managed Threads? # Without protocol threads (client manages): conversation_history = [] conversation_history.append({"role": "user", "content": message}) response = agent.complete(conversation_history) conversation_history.append({"role": "assistant", "content": response}) # Complex, error-prone, doesn't work with multiple clients # With AG-UI (protocol manages): thread = agent.get_new_thread() # Server creates and manages thread agent.run_stream(message, thread=thread) # Server maintains context # Simple, reliable, shareable across clients With transport and state management handled, the final piece is the actual messages flowing through the connection. What information should the protocol communicate, and how should it be structured? Why Standardized Event Types? Instead of parsing unstructured text, clients get typed events: RUN_STARTED - Agent begins (start loading UI) TEXT_MESSAGE_CONTENT - Text chunk (stream to user) TOOL_CALL_START - Tool invoked (show "searching...", "calculating...") TOOL_CALL_RESULT - Tool finished (show result, update UI) RUN_FINISHED - Complete (hide loading) This lets UIs react intelligently without custom parsing logic. Now that we understand the protocol's design choices, let's see how these pieces fit together in a complete system. Architecture Overview Here's how the components interact: The communication between these layers relies on a well-defined set of event types. Here are the core events that flow through the SSE connection: Core Event Types AG-UI provides a standardized set of event types to describe what's happening during an agent's execution: RUN_STARTED - agent begins execution TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END - streaming segments of text TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END, TOOL_CALL_RESULT - tool execution events RUN_FINISHED - agent has finished execution RUN_ERROR - error information This model lets the UI update as the agent runs, rather than waiting for the final response. The generic architecture above applies to any AG-UI implementation. Now let's see how this translates to Microsoft Agent Framework. AG-UI with Microsoft Agent Framework While AG-UI is framework-agnostic, this blog post demonstrates integration with Microsoft Agent Framework (MAF) using Python. MAF is available in both Python and .NET, giving you flexibility to build AG-UI applications in your preferred language. Understanding how MAF implements the protocol will help you build your own applications or work with other compliant frameworks. Integration Architecture The Microsoft Agent Framework integration involves several specialized layers that handle protocol translation and execution orchestration: Understanding each layer: FastAPI Endpoint - Handles HTTP requests and establishes SSE connections for streaming AgentFrameworkAgent - Protocol wrapper that translates between AG-UI events and Agent Framework operations Orchestrators - Manage execution flow, coordinate tool calling sequences, and handle state transitions ChatAgent - Your agent implementation with instructions, tools, and business logic ChatClient - Interface to the underlying language model (Azure OpenAI, OpenAI, or other providers) The good news? When you call add_agent_framework_fastapi_endpoint, all the middleware layers are configured automatically. You simply provide your ChatAgent, and the integration handles protocol translation, event streaming, and state management behind the scenes. Now that we understand both the protocol architecture and the Microsoft Agent Framework integration, let's build a working application. Hands-On: Building Your First AG-UI Application This section demonstrates how to build an AG-UI server and client using Microsoft Agent Framework and FastAPI. Prerequisites Before building your first AG-UI application, ensure you have: Python 3.10 or later installed Basic understanding of async/await patterns in Python Azure CLI installed and authenticated (az login) Azure OpenAI service endpoint and deployment configured (setup guide) Cognitive Services OpenAI Contributor role for your Azure OpenAI resource You'll also need to install the AG-UI integration package: pip install agent-framework-ag-ui --pre This automatically installs agent-framework-core, fastapi, and uvicorn as dependencies. With your environment configured, let's create the server that will host your agent and expose it via the AG-UI protocol. Building the Server Let's create a FastAPI server that hosts an AI agent and exposes it via AG-UI: # server.py import os from typing import Annotated from dotenv import load_dotenv from fastapi import FastAPI from pydantic import Field from agent_framework import ChatAgent, ai_function from agent_framework.azure import AzureOpenAIChatClient from agent_framework_ag_ui import add_agent_framework_fastapi_endpoint from azure.identity import DefaultAzureCredential # Load environment variables from .env file load_dotenv() # Validate environment configuration openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") model_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME") if not openai_endpoint: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_ENDPOINT") if not model_deployment: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_DEPLOYMENT_NAME") # Define tools the agent can use @ai_function def get_order_status( order_id: Annotated[str, Field(description="The order ID to look up (e.g., ORD-001)")] ) -> dict: """Look up the status of a customer order. Returns order status, tracking number, and estimated delivery date. """ # Simulated order lookup orders = { "ORD-001": {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}, "ORD-002": {"status": "processing", "tracking": None, "eta": "Jan 23, 2026"}, "ORD-003": {"status": "delivered", "tracking": "1Z999AA3", "eta": "Delivered Jan 20"}, } return orders.get(order_id, {"status": "not_found", "message": "Order not found"}) # Initialize Azure OpenAI client chat_client = AzureOpenAIChatClient( credential=DefaultAzureCredential(), endpoint=openai_endpoint, deployment_name=model_deployment, ) # Configure the agent with custom instructions and tools agent = ChatAgent( name="CustomerSupportAgent", instructions="""You are a helpful customer support assistant. You have access to a get_order_status tool that can look up order information. IMPORTANT: When a user mentions an order ID (like ORD-001, ORD-002, etc.), you MUST call the get_order_status tool to retrieve the actual order details. Do NOT make up or guess order information. After calling get_order_status, provide the actual results to the user in a friendly format.""", chat_client=chat_client, tools=[get_order_status], ) # Initialize FastAPI application app = FastAPI( title="AG-UI Customer Support Server", description="Interactive AI agent server using AG-UI protocol with tool calling" ) # Mount the AG-UI endpoint add_agent_framework_fastapi_endpoint(app, agent, path="/chat") def main(): """Entry point for the AG-UI server.""" import uvicorn print("Starting AG-UI server on http://localhost:8000") uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info") # Run the application if __name__ == "__main__": main() What's happening here: We define a tool: get_order_status with the AI_function decorator Use Annotated and Field for parameter descriptions to help the agent understand when and how to use the tool We create an Azure OpenAI chat client with credential authentication The ChatAgent is configured with domain-specific instructions and the tools parameter add_agent_framework_fastapi_endpoint automatically handles SSE streaming and tool execution The server exposes the agent at the /chat endpoint Note: This example uses Azure OpenAI, but AG-UI works with any chat model. You can also integrate with Azure AI Foundry's model catalog or use other LLM providers. Tool calling is supported by most modern LLMs including GPT-4, GPT-4o, and Claude models. To run this server: # Set your Azure OpenAI credentials export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o" # Start the server python server.py With your server running and exposing the AG-UI endpoint, the next step is building a client that can connect and consume the event stream. Streaming Results to Clients With the server running, clients can connect and stream events as the agent processes requests. Here's a Python client that demonstrates the streaming capabilities: # client.py import asyncio import os from dotenv import load_dotenv from agent_framework import ChatAgent, FunctionCallContent, FunctionResultContent from agent_framework_ag_ui import AGUIChatClient # Load environment variables from .env file load_dotenv() async def interactive_chat(): """Interactive chat session with streaming responses.""" # Connect to the AG-UI server base_url = os.getenv("AGUI_SERVER_URL", "http://localhost:8000/chat") print(f"Connecting to: {base_url}\n") # Initialize the AG-UI client client = AGUIChatClient(endpoint=base_url) # Create a local agent representation agent = ChatAgent(chat_client=client) # Start a new conversation thread conversation_thread = agent.get_new_thread() print("Chat started! Type 'exit' or 'quit' to end the session.\n") try: while True: # Collect user input user_message = input("You: ") # Handle empty input if not user_message.strip(): print("Please enter a message.\n") continue # Check for exit commands if user_message.lower() in ["exit", "quit", "bye"]: print("\nGoodbye!") break # Stream the agent's response print("Agent: ", end="", flush=True) # Track tool calls to avoid duplicate prints seen_tools = set() async for update in agent.run_stream(user_message, thread=conversation_thread): # Display text content if update.text: print(update.text, end="", flush=True) # Display tool calls and results for content in update.contents: if isinstance(content, FunctionCallContent): # Only print each tool call once if content.call_id not in seen_tools: seen_tools.add(content.call_id) print(f"\n[Calling tool: {content.name}]", flush=True) elif isinstance(content, FunctionResultContent): # Only print each result once result_id = f"result_{content.call_id}" if result_id not in seen_tools: seen_tools.add(result_id) result_text = content.result if isinstance(content.result, str) else str(content.result) print(f"[Tool result: {result_text}]", flush=True) print("\n") # New line after response completes except KeyboardInterrupt: print("\n\nChat interrupted by user.") except ConnectionError as e: print(f"\nConnection error: {e}") print("Make sure the server is running.") except Exception as e: print(f"\nUnexpected error: {e}") def main(): """Entry point for the AG-UI client.""" asyncio.run(interactive_chat()) if __name__ == "__main__": main() Key features: The client connects to the AG-UI endpoint using AGUIChatClient with the endpoint parameter run_stream() yields updates containing text and content as they arrive Tool calls are detected using FunctionCallContent and displayed with [Calling tool: ...] Tool results are detected using FunctionResultContent and displayed with [Tool result: ...] Deduplication logic (seen_tools set) prevents printing the same tool call multiple times as it streams Thread management maintains conversation context across messages Graceful error handling for connection issues To use the client: # Optional: specify custom server URL export AGUI_SERVER_URL="http://localhost:8000/chat" # Start the interactive chat python client.py Example Session: Connecting to: http://localhost:8000/chat Chat started! Type 'exit' or 'quit' to end the session. You: What's the status of order ORD-001? Agent: [Calling tool: get_order_status] [Tool result: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}] Your order ORD-001 has been shipped! - Tracking Number: 1Z999AA1 - Estimated Delivery Date: January 25, 2026 You can use the tracking number to monitor the delivery progress. You: Can you check ORD-002? Agent: [Calling tool: get_order_status] [Tool result: {"status": "processing", "tracking": null, "eta": "Jan 23, 2026"}] Your order ORD-002 is currently being processed. - Status: Processing - Estimated Delivery: January 23, 2026 Your order should ship soon, and you'll receive a tracking number once it's on the way. You: exit Goodbye! The client we just built handles events at a high level, abstracting away the details. But what's actually flowing through that SSE connection? Let's peek under the hood. Event Types You'll See As the server streams back responses, clients receive a series of structured events. If you were to observe the raw SSE stream (e.g., using curl), you'd see events like: curl -N http://localhost:8000/chat \ -H "Content-Type: application/json" \ -H "Accept: text/event-stream" \ -d '{"messages": [{"role": "user", "content": "What'\''s the status of order ORD-001?"}]}' Sample event stream (with tool calling): data: {"type":"RUN_STARTED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} data: {"type":"TEXT_MESSAGE_START","messageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c","role":"assistant"} data: {"type":"TOOL_CALL_START","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","toolCallName":"get_order_status","parentMessageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"{\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"order"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"_id"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\":\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"ORD"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"-"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"001"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\"}"} data: {"type":"TOOL_CALL_END","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y"} data: {"type":"TOOL_CALL_RESULT","messageId":"f048cb0a-a049-4a51-9403-a05e4820438a","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","content":"{\"status\": \"shipped\", \"tracking\": \"1Z999AA1\", \"eta\": \"Jan 25, 2026\"}","role":"tool"} data: {"type":"TEXT_MESSAGE_START","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","role":"assistant"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"Your"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" order"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" ORD"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"-"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"001"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" has"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" been"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" shipped"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"!"} ... (additional TEXT_MESSAGE_CONTENT events streaming the response) ... data: {"type":"TEXT_MESSAGE_END","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf"} data: {"type":"RUN_FINISHED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} Understanding the flow: RUN_STARTED - Agent begins processing the request TEXT_MESSAGE_START - First message starts (will contain tool calls) TOOL_CALL_START - Agent invokes the get_order_status tool Multiple TOOL_CALL_ARGS events - Arguments stream incrementally as JSON chunks ({"order_id":"ORD-001"}) TOOL_CALL_END - Tool invocation structure complete TOOL_CALL_RESULT - Tool execution finished with result data TEXT_MESSAGE_START - Second message starts (the final response) Multiple TEXT_MESSAGE_CONTENT events - Response text streams word-by-word TEXT_MESSAGE_END - Response message complete RUN_FINISHED - Entire run completed successfully This granular event model enables rich UI experiences - showing tool execution indicators ("Searching...", "Calculating..."), displaying intermediate results, and providing complete transparency into the agent's reasoning process. Seeing the raw events helps, but truly working with AG-UI requires a shift in how you think about agent interactions. Let's explore this conceptual change. The Mental Model Shift Traditional API Thinking # Imperative: Call and wait response = agent.run("What's 2+2?") print(response) # "The answer is 4" Mental model: Function call with return value AG-UI Thinking # Reactive: Subscribe to events async for event in agent.run_stream("What's 2+2?"): match event.type: case "RUN_STARTED": show_loading() case "TEXT_MESSAGE_CONTENT": display_chunk(event.delta) case "RUN_FINISHED": hide_loading() Mental model: Observable stream of events This shift feels similar to: Moving from synchronous to async code Moving from REST to event-driven architecture Moving from polling to pub/sub This mental shift isn't just philosophical - it unlocks concrete benefits that weren't possible with request/response patterns. What You Gain Observability # You can SEE what the agent is doing TOOL_CALL_START: "get_order_status" TOOL_CALL_ARGS: {"order_id": "ORD-001"} TOOL_CALL_RESULT: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"} TEXT_MESSAGE_START: "Your order ORD-001 has been shipped..." Interruptibility # Future: Cancel long-running operations async for event in agent.run_stream(query): if user_clicked_cancel: await agent.cancel(thread_id, run_id) break Transparency # Users see the reasoning process "Looking up order ORD-001..." "Order found: Status is 'shipped'" "Retrieving tracking information..." "Your order has been shipped with tracking number 1Z999AA1..." To put these benefits in context, here's how AG-UI compares to traditional approaches across key dimensions: AG-UI vs. Traditional Approaches Aspect Traditional REST Custom Streaming AG-UI Connection Model Request/Response Varies Server-Sent Events State Management Manual Manual Protocol-managed Tool Calling Invisible Custom format Standardized events Framework Varies Framework-locked Framework-agnostic Browser Support Universal Varies Universal Implementation Simple Complex Moderate Ecosystem N/A Isolated Growing You've now seen AG-UI's design principles, implementation details, and conceptual foundations. But the most important question remains: should you actually use it? Conclusion: Is AG-UI Right for Your Project? AG-UI represents a shift toward standardized, observable agent interactions. Before adopting it, understand where the protocol stands and whether it fits your needs. Protocol Maturity The protocol is stable enough for production use but still evolving: Ready now: Core specification stable, Microsoft Agent Framework integration available, FastAPI/Python implementation mature, basic streaming and threading work reliably. Choose AG-UI If You Building new agent projects - No legacy API to maintain, want future compatibility with emerging ecosystem Need streaming observability - Multi-step workflows where users benefit from seeing each stage of execution Want framework flexibility - Same client code works with any AG-UI-compliant backend Comfortable with evolving standards - Can adapt to protocol changes as it matures Stick with Alternatives If You Have working solutions - Custom streaming working well, migration cost not justified Need guaranteed stability - Mission-critical systems where breaking changes are unacceptable Build simple agents - Single-step request/response without tool calling or streaming needs Risk-averse environment - Large existing implementations where proven approaches are required Beyond individual project decisions, it's worth considering AG-UI's role in the broader ecosystem. The Bigger Picture While this blog post focused on Microsoft Agent Framework, AG-UI's true power lies in its broader mission: creating a common language for agent-UI communication across the entire ecosystem. As more frameworks adopt it, the real value emerges: write your UI once, work with any compliant agent framework. Think of it like GraphQL for APIs or OpenAPI for REST - a standardization layer that benefits the entire ecosystem. The protocol is young, but the problem it solves is real. Whether you adopt it now or wait for broader adoption, understanding AG-UI helps you make informed architectural decisions for your agent applications. Ready to dive deeper? Here are the official resources to continue your AG-UI journey. Resources AG-UI & Microsoft Agent Framework Getting Started with AG-UI (Microsoft Learn) - Official tutorial AG-UI Integration Overview - Architecture and concepts AG-UI Protocol Specification - Official protocol documentation Backend Tool Rendering - Adding function tools Security Considerations - Production security guidance Microsoft Agent Framework Documentation - Framework overview AG-UI Dojo Examples - Live demonstrations UI Components & Integration CopilotKit for Microsoft Agent Framework - React component library Community & Support Microsoft Q&A - Community support Agent Framework GitHub - Source code and issues Related Technologies Azure AI Foundry Documentation - Azure AI platform FastAPI Documentation - Web framework Server-Sent Events (SSE) Specification - Protocol standard This blog post introduces AG-UI with Microsoft Agent Framework, focusing on fundamental concepts and building your first interactive agent application.Published agent from Foundry doesn't work at all in Teams and M365
I've switched to the new version of Azure AI Foundry (New) and created a project there. Within this project, I created an Agent and connected two custom MCP servers to it. The agent works correctly inside Foundry Playground and responds to all test queries as expected. My goal was to make this agent available for my organization in Microsoft Teams / Microsoft 365 Copilot, so I followed all the steps described in the official Microsoft documentation: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/publish-copilot?view=foundry Issue description The first problems started at Step 8 (Publishing the agent). Organization scope publishing I published the agent using Organization scope. The agent appeared in Microsoft Admin Center in the list of agents. However, when an administrator from my organization attempted to approve it, the approval always failed with a generic error: “Sorry, something went wrong” No diagnostic information, error codes, or logs were provided. We tried recreating and republishing the agent multiple times, but the result was always the same. Shared scope publishing As a workaround, I published the agent using Shared scope. In this case, the agent finally appeared in Microsoft Teams and Microsoft 365 Copilot. I can now see the agent here: Microsoft Teams → Copilot Microsoft Teams → Applications → Manage applications However, this revealed the main issue. Main problem The published agent cannot complete any query in Teams, despite the fact that: The agent works perfectly in Foundry Playground The agent responds correctly to the same prompts before publishing In Teams, every query results in messages such as: “Sorry, something went wrong. Try to complete a query later.” Simplification test To exclude MCP or instruction-related issues, I performed the following: Disabled all MCP tools Removed all complex instructions Left only a minimal system prompt: “When the user types 123, return 456” I then republished the agent. The agent appeared in Teams again, but the behavior did not change — it does not respond at all. Permissions warning in Teams When I go to: Teams → Applications → Manage Applications → My agent → View details I see a red warning label: “Permissions needed. Ask your IT admin to add InfoConnect Agent to this team/chat/meeting.” This message is confusing because: The administrator has already added all required permissions All relevant permissions were granted in Microsoft Entra ID Admin consent was provided Because of this warning, I also cannot properly share the agent with my colleagues. Additional observation I have a similar agent configured in Copilot Studio: It shows the same permissions warning However, that agent still responds correctly in Teams It can also successfully call some MCP tools This suggests that the issue is specific to Azure AI Foundry agents, not to Teams or tenant-wide permissions in general. Steps already taken to resolve the issue Configured all required RBAC roles in Azure Portal according to: https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/rbac-foundry?view=foundry-classic During publishing, an agent-bot application was automatically created. I added my account to this bot with the Azure AI User role I also assigned Azure AI User to: The project’s Managed Identity The project resource itself Verified all permissions related to AI agents publishing in: Microsoft Admin Center Microsoft Teams Admin Center Simplified and republished the agent multiple times Deleted the automatically created agent-bot and allowed Foundry to recreate it Created a new Foundry project, configured several simple agents, and published them — the same issue occurs Tried publishing with different models: gpt-4.1, o4-mini Manually configured permissions in: Microsoft Entra ID → App registrations / Enterprise applications → API permissions Added both Delegated and Application permissions and granted Admin consent Added myself and my colleagues as Azure AI User in: Foundry → Project → Project users Followed all steps mentioned in this related discussion: https://techcommunity.microsoft.com/discussions/azure-ai-foundry-discussions/unable-to-publish-foundry-agent-to-m365-copilot-or-teams/4481420 Questions How can I make a Foundry agent work correctly in Microsoft Teams? Why does the agent fail to process requests in Teams while working correctly in Foundry? What does the “Permissions needed” warning actually mean for Foundry agents? How can I properly share the agent with other users in my organization? Any guidance, diagnostics, or clarification on the correct publishing and permission model for Foundry agents in Teams would be greatly appreciated.SolvedMicrosoft Foundry for VS Code: January 2026 Update
Enhanced Workflow and Agent Experience The January 2026 update for Microsoft Foundry extension in VS Code brings a follow update to the capabilities we introduced during Ignite of last year. We’re excited to announce a set of powerful updates that make building and managing AI workflows in Azure AI Foundry even more seamless. These enhancements are designed to give developers greater flexibility, visibility, and control when working with multi-agent systems and workflows. Support for Multiple Workflows in the Visualizer Managing complex AI solutions often involves multiple workflows. With this update, the Workflow Visualizer now supports viewing and navigating multiple workflows in a single project. This makes it easier to design, debug, and optimize interconnected workflows without switching contexts. View and Test Prompt Agents in the Playground Prompt agents are a critical part of orchestrating intelligent behaviors. You can now view all prompt agents directly in the Playground and test them interactively. This feature helps you validate agent logic and iterate quickly, ensuring your prompts deliver the desired outcomes. Open Code files Transparency and customization are key for developers. We’ve introduced the ability to open sample code files for all agents, including: Prompt agents YAML-based workflows Hosted agents Foundry classic agents This gives you the ability to programmatically run agents, enabling adding these agents into your existing project. Separated Resource View for v1 and v2 Agents To reduce confusion and improve clarity, we’ve introduced a separated resource view for Foundry Classic resources and agents. This makes it simple to distinguish between legacy and new-generation agents, ensuring you always know which resources you’re working with. How to Get Started Download the extension here: Microsoft Foundry in VS Code Marketplace Get started with building agents and workflows with Microsoft Foundry in VS Code MS Learn Docs Feedback & Support These improvements are part of our ongoing commitment to deliver a developer-first experience in Microsoft Foundry. Whether you’re orchestrating multi-agent workflows or fine-tuning prompt logic, these features help you build smarter, faster, and with greater confidence. Try out the extensions and let us know what you think! File issues or feedback on our GitHub repo for Foundry extension. Your input helps us make continuous improvements.🚀 AI Toolkit for VS Code: January 2026 Update
Happy New Year! 🎆 We are kicking off 2026 with a major set of updates designed to streamline how you build, test, and deploy AI agents. This month, we’ve focused on aligning with the latest GitHub Copilot standards, introducing powerful new debugging tools, and enhancing our support for enterprise-grade models via Microsoft Foundry. 💡 From Copilot Instructions to Agent Skills The biggest architectural shift following the latest VS Code Copilot standards, in v0.28.1 is the transition from Copilot Instructions to Copilot Skills. This transition has equipped GitHub Copilot specialized skills on developing AI agents using Microsoft Foundry and Agent Framework in a cost-efficient way. In AI Toolkit, we have migrated our Copilot Tools from the Custom Instructions to Agent Skills. This change allows for a more capable integration within GitHub Copilot Chat. 🔄 Enhanced AIAgentExpert: Our custom agent now has a deeper understanding of workflow code generation and evaluation planning/execution. 🧹Automatic Migration: When you upgrade to v0.28.1, the toolkit will automatically clean up your old instructions to ensure a seamless transition to the new skills-based framework. 🏗️ Major Enhancements to Agent Development Our v0.28.0 milestone release brought significant improvements to how agents are authored and authenticated. 🔒 Anthropic & Entra Auth Support We’ve expanded the Agent Builder and Playground to support Anthropic models using Entra Auth types. This provides enterprise developers with a more secure way to leverage Claude models within the Agent Framework while maintaining strict authentication standards. 🏢 Foundry-First Development We are prioritizing the Microsoft Foundry ecosystem to provide a more robust development experience: Foundry v2: Code generation for agents now defaults to Foundry v2. ⚡ Eval Tool: You can now generate evaluation code directly within the toolkit to create and run evaluations in Microsoft Foundry. 📊 Model Catalog: We’ve optimized the Model Catalog to prioritize Foundry models and improved general loading performance. 🏎️ 💻 Performance and Local Models For developers building on Windows, we continue to optimize the local model experience: Profiling for Windows ML: Version 0.28.0 introduces profiling features for Windows ML-based local models, allowing you to monitor performance and resource utilization directly within VS Code. Platform Optimization: To keep the interface clean, we’ve removed the Windows AI API tab from the Model Catalog when running on Linux and macOS platforms. 🐛 Squashing Bugs & Polishing the Experience Codespaces Fix: Resolved a crash occurring when selecting images in the Playground while using GitHub Codespaces. Resource Management: Fixed a delay where newly added models wouldn't immediately appear in the "My Resources" view. Claude Compatibility: Fixed an issue where non-empty content was required for Claude models when used via the AI Toolkit in GitHub Copilot. 🚀 Getting Started Ready to experience the future of AI development? Here's how to get started: 📥 Download: Install the AI Toolkit from the Visual Studio Code Marketplace 📖 Learn: Explore our comprehensive AI Toolkit Documentation 🔍 Discover: Check out the complete changelog for v0.24.0 We'd love to hear from you! Whether it's a feature request, bug report, or feedback on your experience, join the conversation and contribute directly on our GitHub repository. Happy Coding! 💻✨From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs
As AI engineers, we’ve spent years optimizing models for the cloud, scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint it’s now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance. Why Edge AI, Why Now? Edge AI isn’t just about running models locally, it’s about rethinking the entire lifecycle: - Latency: Decisions in milliseconds, not round-trips to the cloud. - Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance. - Resilience: Offline-first apps that don’t break when the network does. - Cost: Reduced cloud compute and bandwidth overhead. With Windows AI PCs powered by Intel and Qualcomm NPUs and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency. What You’ll Learn in Edge AI for Beginners The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment. Multi-Language Support This content is available in over 48 languages, so you can read and study in your native language. What You'll Master This course takes you from fundamental concepts to production-ready implementations, covering: Small Language Models (SLMs) optimized for edge deployment Hardware-aware optimization across diverse platforms Real-time inference with privacy-preserving capabilities Production deployment strategies for enterprise applications Why EdgeAI Matters Edge AI represents a paradigm shift that addresses critical modern challenges: Privacy & Security: Process sensitive data locally without cloud exposure Real-time Performance: Eliminate network latency for time-critical applications Cost Efficiency: Reduce bandwidth and cloud computing expenses Resilient Operations: Maintain functionality during network outages Regulatory Compliance: Meet data sovereignty requirements Edge AI Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making. Core Principles: On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs) Offline capability: Functions without persistent internet connectivity Low latency: Immediate responses suited for real-time systems Data sovereignty: Keeps sensitive data local, improving security and compliance Small Language Models (SLMs) SLMs like Phi-4, Mistral-7B, Qwen and Gemma are optimized versions of larger LLMs, trained or distilled for: Reduced memory footprint: Efficient use of limited edge device memory Lower compute demand: Optimized for CPU and edge GPU performance Faster startup times: Quick initialization for responsive applications They unlock powerful NLP capabilities while meeting the constraints of: Embedded systems: IoT devices and industrial controllers Mobile devices: Smartphones and tablets with offline capabilities IoT Devices: Sensors and smart devices with limited resources Edge servers: Local processing units with limited GPU resources Personal Computers: Desktop and laptop deployment scenarios Course Modules & Navigation Course duration. 10 hours of content Module Topic Focus Area Key Content Level Duration 📖 00 Introduction to EdgeAI Foundation & Context EdgeAI Overview • Industry Applications • SLM Introduction • Learning Objectives Beginner 1-2 hrs 📚 01 EdgeAI Fundamentals Cloud vs Edge AI comparison EdgeAI Fundamentals • Real World Case Studies • Implementation Guide • Edge Deployment Beginner 3-4 hrs 🧠 02 SLM Model Foundations Model families & architecture Phi Family • Qwen Family • Gemma Family • BitNET • μModel • Phi-Silica Beginner 4-5 hrs 🚀 03 SLM Deployment Practice Local & cloud deployment Advanced Learning • Local Environment • Cloud Deployment Intermediate 4-5 hrs ⚙️ 04 Model Optimization Toolkit Cross-platform optimization Introduction • Llama.cpp • Microsoft Olive • OpenVINO • Apple MLX • Workflow Synthesis Intermediate 5-6 hrs 🔧 05 SLMOps Production Production operations SLMOps Introduction • Model Distillation • Fine-tuning • Production Deployment Advanced 5-6 hrs 🤖 06 AI Agents & Function Calling Agent frameworks & MCP Agent Introduction • Function Calling • Model Context Protocol Advanced 4-5 hrs 💻 07 Platform Implementation Cross-platform samples AI Toolkit • Foundry Local • Windows Development Advanced 3-4 hrs 🏭 08 Foundry Local Toolkit Production-ready samples Sample applications (see details below) Expert 8-10 hrs Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing. Developer Highlights - 🔧 Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration. - 🧩 ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU. - 🎮 DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps. - 🖥️ Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference. Local AI: Beyond the Edge Local AI isn’t just about inference, it’s about autonomy. Imagine agents that: - Learn from local context - Adapt to user behavior - Respect privacy by design With tools like Agent Framework, Azure AI Foundry and Windows Copilot Studio, and Foundry Local developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency. Try It Yourself Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT devices Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.Building a Multi-Agent System with Azure AI Agent Service: Campus Event Management
Personal Background My name is Peace Silly. I studied French and Spanish at the University of Oxford, where I developed a strong interest in how language is structured and interpreted. That curiosity about syntax and meaning eventually led me to computer science, which I came to see as another language built on logic and structure. In the academic year 2024–2025, I completed the MSc Computer Science at University College London, where I developed this project as part of my Master’s thesis. Project Introduction Can large-scale event management be handled through a simple chat interface? This was the question that guided my Master’s thesis project at UCL. As part of the Industry Exchange Network (IXN) and in collaboration with Microsoft, I set out to explore how conversational interfaces and autonomous AI agents could simplify one of the most underestimated coordination challenges in campus life: managing events across multiple departments, societies, and facilities. At large universities, event management is rarely straightforward. Rooms are shared between academic timetables, student societies, and one-off events. A single lecture theatre might host a departmental seminar in the morning, a society meeting in the afternoon, and a careers talk in the evening, each relying on different systems, staff, and communication chains. Double bookings, last-minute cancellations, and maintenance issues are common, and coordinating changes often means long email threads, manual spreadsheets, and frustrated users. These inefficiencies do more than waste time; they directly affect how a campus functions day to day. When venues are unavailable or notifications fail to reach the right people, even small scheduling errors can ripple across entire departments. A smarter, more adaptive approach was needed, one that could manage complex workflows autonomously while remaining intuitive and human for end users. The result was the Event Management Multi-Agent System, a cloud-based platform where staff and students can query events, book rooms, and reschedule activities simply by chatting. Behind the scenes, a network of Azure-powered AI agents collaborates to handle scheduling, communication, and maintenance in real time, working together to keep the campus running smoothly. The user scenario shown in the figure below exemplifies the vision that guided the development of this multi-agent system. Starting with Microsoft Learning Resources I began my journey with Microsoft’s tutorial Build Your First Agent with Azure AI Foundry which introduced the fundamentals of the Azure AI Agent Service and provided an ideal foundation for experimentation. Within a few weeks, using the Azure Foundry environment, I extended those foundations into a fully functional multi-agent system. Azure Foundry’s visual interface was an invaluable learning space. It allowed me to deploy, test, and adjust model parameters such as temperature, system prompts, and function calling while observing how each change influenced the agents’ reasoning and collaboration. Through these experiments, I developed a strong conceptual understanding of orchestration and coordination before moving to the command line for more complex development later. When development issues inevitably arose, I relied on the Discord support community and the GitHub forum for troubleshooting. These communities were instrumental in addressing configuration issues and providing practical examples, ensuring that each agent performed reliably within the shared-thread framework. This early engagement with Microsoft’s learning materials not only accelerated my technical progress but also shaped how I approached experimentation, debugging, and iteration. It transformed a steep learning curve into a structured, hands-on process that mirrored professional software development practice. A Decentralised Team of AI Agents The system’s intelligence is distributed across three specialised agents, powered by OpenAI’s GPT-4.1 models through Azure OpenAI Service. They each perform a distinct role within the event management workflow: Scheduling Agent – interprets natural language requests, checks room availability, and allocates suitable venues. Communications Agent – notifies stakeholders when events are booked, modified, or cancelled. Maintenance Agent – monitors room readiness, posts fault reports when venues become unavailable, and triggers rescheduling when needed. Each agent operates independently but communicates through a shared thread, a transparent message log that serves as the coordination backbone. This thread acts as a persistent state space where agents post updates, react to changes, and maintain a record of every decision. For example, when a maintenance fault is detected, the Maintenance Agent logs the issue, the Scheduling Agent identifies an alternative venue, and the Communications Agent automatically notifies attendees. These interactions happen autonomously, with each agent responding to the evolving context recorded in the shared thread. Interfaces and Backend The system was designed with both developer-focused and user-facing interfaces, supporting rapid iteration and intuitive interaction. The Terminal Interface Initially, the agents were deployed and tested through a terminal interface, which provided a controlled environment for debugging and verifying logic step by step. This setup allowed quick testing of individual agents and observation of their interactions within the shared thread. The Chat Interface As the project evolved, I introduced a lightweight chat interface to make the system accessible to staff and students. This interface allows users to book rooms, query events, and reschedule activities using plain language. Recognising that some users might still want to see what happens behind the scenes, I added an optional toggle that reveals the intermediate steps of agent reasoning. This transparency feature proved valuable for debugging and for more technical users who wanted to understand how the agents collaborated. When a user interacts with the chat interface, they are effectively communicating with the Scheduling Agent, which acts as the primary entry point. The Scheduling Agent interprets natural-language commands such as “Book the Engineering Auditorium for Friday at 2 PM” or “Reschedule the robotics demo to another room.” It then coordinates with the Maintenance and Communications Agents to complete the process. Behind the scenes, the chat interface connects to a FastAPI backend responsible for core logic and data access. A Flask + HTMX layer handles lightweight rendering and interactivity, while the Azure AI Agent Service manages orchestration and shared-thread coordination. This combination enables seamless agent communication and reliable task execution without exposing any of the underlying complexity to the end user. Automated Notifications and Fault Detection Once an event is scheduled, the Scheduling Agent posts the confirmation to the shared thread. The Communications Agent, which subscribes to thread updates, automatically sends notifications to all relevant stakeholders by email. This ensures that every participant stays informed without any manual follow-up. The Maintenance Agent runs routine availability checks. If a fault is detected, it logs the issue to the shared thread, prompting the Scheduling Agent to find an alternative room. The Communications Agent then notifies attendees of the change, ensuring minimal disruption to ongoing events. Testing and Evaluation The system underwent several layers of testing to validate both functional and non-functional requirements. Unit and Integration Tests Backend reliability was evaluated through unit and integration tests to ensure that room allocation, conflict detection, and database operations behaved as intended. Automated test scripts verified end-to-end workflows for event creation, modification, and cancellation across all agents. Integration results confirmed that the shared-thread orchestration functioned correctly, with all test cases passing consistently. However, coverage analysis revealed that approximately 60% of the codebase was tested, leaving some areas such as Azure service integration and error-handling paths outside automated validation. These trade-offs were deliberate, balancing test depth with project scope and the constraints of mocking live dependencies. Azure AI Evaluation While functional testing confirmed correctness, it did not capture the agents’ reasoning or language quality. To assess this, I used Azure AI Evaluation, which measures conversational performance across metrics such as relevance, coherence, fluency, and groundedness. The results showed high scores in relevance (4.33) and groundedness (4.67), confirming the agents’ ability to generate accurate and context-aware responses. However, slightly lower fluency scores and weaker performance in multi-turn tasks revealed a retrieval–execution gap typical in task-oriented dialogue systems. Limitations and Insights The evaluation also surfaced several key limitations: Synthetic data: All tests were conducted with simulated datasets rather than live campus systems, limiting generalisability. Scalability: A non-functional requirement in the form of horizontal scalability was not tested. The architecture supports scaling conceptually but requires validation under heavier load. Despite these constraints, the testing process confirmed that the system was both technically reliable and linguistically robust, capable of autonomous coordination under normal conditions. The results provided a realistic picture of what worked well and what future iterations should focus on improving. Impact and Future Work This project demonstrates how conversational AI and multi-agent orchestration can streamline real operational processes. By combining Azure AI Agent Services with modular design principles, the system automates scheduling, communication, and maintenance while keeping the user experience simple and intuitive. The architecture also establishes a foundation for future extensions: Predictive maintenance to anticipate venue faults before they occur. Microsoft Teams integration for seamless in-chat scheduling. Scalability testing and real-user trials to validate performance at institutional scale. Beyond its technical results, the project underscores the potential of multi-agent systems in real-world coordination tasks. It illustrates how modularity, transparency, and intelligent orchestration can make everyday workflows more efficient and human-centred. Acknowledgements What began with a simple Microsoft tutorial evolved into a working prototype that reimagines how campuses could manage their daily operations through conversation and collaboration. This was both a challenging and rewarding journey, and I am deeply grateful to Professor Graham Roberts (UCL) and Professor Lee Stott (Microsoft) for their guidance, feedback, and support throughout the project.498Views4likes1CommentAI Upskilling Framework Level 3 Building
The Global AI Community is excited to bring you the latest updates on AI Upskilling Framework Level 3 Building, straight from Microsoft Ignite! This session dives deep into advanced concepts for building agentic workflows and showcases new announcements that will help developers accelerate their Agentic AI journey.Exploring the Future of AI Agents with Microsoft Foundry
Why Agentic AI Matters AI agents are no longer a distant vision—they’re here and transforming how businesses operate. According to industry analysts: Over 1 billion AI agents are expected to be in use by 2028. 80% of organisations plan to integrate agents within the next 2–3 years. By 2026, 40% of enterprise apps will include task-specific AI agents. Why this surge? Agents address critical challenges such as inefficiencies in manual processes, human error, lack of visibility, and scalability issues. They enable autonomous decision-making, with projections suggesting that by 2028, half of day-to-day work decisions will be made autonomously. From Chatbots to Intelligent Agents As Mary Joe highlighted, early chatbots relied on rigid rules and regular expressions, often leading to frustrating user experiences. The introduction of large language models (LLMs) changed the game, making interactions more natural. But true autonomy, where systems act on our behalf, required more than conversational AI. Agentic AI combines: Reasoning and planning capabilities. Tools and APIs for real-world actions. Memory for learning and improving over time. This evolution moves us beyond simple input-output interactions to intelligent systems that can execute workflows, validate data, and deliver outcomes. Microsoft Foundry: Your Platform for Building Agents Microsoft Foundry offers a Platform-as-a-Service (PaaS) approach for creating AI agents, striking a balance between control and ease of use. Key components include: Model Catalogue: Access models from OpenAI, Anthropic, Mistral, and more. Foundry Agent Service: Build and customise agents with integrated tools. Foundry IQ: Knowledge grounding for accurate responses. Control Plane: Ensures safety, trust, and observability in production. Whether you need full control (Infrastructure-as-a-Service) or simplicity (Software-as-a-Service via Copilot Studio), Foundry provides flexibility for diverse scenarios. What Makes an AI Solution Agentic? Unlike traditional AI apps that perform narrow tasks (e.g., extracting text from receipts), agentic solutions: Analyse inputs using LLMs and system instructions. Integrate tools for actions like file search, code execution, or API calls. Retain memory for contextual learning. Operate autonomously across workflows. Real-World Use Cases Agentic AI unlocks new possibilities across industries: Expense Management: Automate claims and approvals. Employee Onboarding: Personalised learning paths and skills navigation. Customer Support: Intelligent assistants for FAQs and troubleshooting. Data Analytics: Interactive insights and reporting with Fabric agents. Multi-agent systems can coordinate complex tasks, with specialised agents handling subtasks under a central orchestrator. Getting Started with Microsoft Foundry Creating your first agent is simple: Sign in at https://ai.azure.com and create a Foundry project. Select a model (e.g., GPT-4.1 mini) and configure deployment options. Customise instructions to define your agent’s persona and tasks. Add tools like file search or code interpreter for extended functionality. Test and iterate using the agent playground, then export code to Visual Studio Code for deployment. For detailed guidance, explore the https://learn.microsoft.com/training. Follow the skilling plan for this series Plans | Microsoft Learn Get started with AI Agents https://aka.ms/ai-agents-fundamentals Join the Community Stay connected and keep learning: Discord: Engage with developers building agents. https://aka.ms/foundry/discord GitHub Discussions: Share ideas and troubleshoot. https://aka.ms/foundrydevs Office Hours: Get direct support from product teams. Final Thoughts Agentic AI is reshaping the way we work, enabling systems to act, learn, and collaborate. With Microsoft Foundry, developers have the tools to build secure, scalable, and intelligent agents today not tomorrow. Join the sessions at https://aka.ms/AzureSkilling-Ignite/25