azure
197 TopicsOptimising AI Costs with Microsoft Foundry Model Router
Microsoft Foundry Model Router analyses each prompt in real-time and forwards it to the most appropriate LLM from a pool of underlying models. Simple requests go to fast, cheap models; complex requests go to premium ones, all automatically. I built an interactive demo app so you can see the routing decisions, measure latencies, and compare costs yourself. This post walks through how it works, what we measured, and when it makes sense to use. The Problem: One Model for Everything Is Wasteful Traditional deployments force a single choice: Strategy Upside Downside Use a small model Fast, cheap Struggles with complex tasks Use a large model Handles everything Overpay for simple tasks Build your own router Full control Maintenance burden; hard to optimise Most production workloads are mixed-complexity. Classification, FAQ look-ups, and data extraction sit alongside code analysis, multi-constraint planning, and long-document summarisation. Paying premium-model prices for the simple 40% is money left on the table. The Solution: Model Router Model Router is a trained language model deployed as a single Azure endpoint. For each incoming request it: Analyses the prompt — complexity, task type, context length Selects an underlying model from the routing pool Forwards the request and returns the response Exposes the choice via the response.model field You interact with one deployment. No if/else routing logic in your code. Routing Modes Mode Goal Trade-off Balanced (default) Best cost-quality ratio General-purpose Cost Minimise spend May use smaller models more aggressively Quality Maximise accuracy Higher cost for complex tasks Modes are configured in the Foundry Portal, no code change needed to switch. Building the Demo To make routing decisions tangible, we built a React + TypeScript app that sends the same prompt through both Model Router and a fixed standard deployment (e.g. GPT-5-nano), then compares: Which model the router selected Latency (ms) Token usage (prompt + completion) Estimated cost (based on per-model pricing) Select a prompt, choose a routing mode, and hit Run Both to compare side-by-side What You Can Do 10 pre-built prompts spanning simple classification to complex multi-constraint planning Custom prompt input enter any text and benchmarks run automatically Three routing modes switch and re-run to see how distribution changes Batch mode run all 10 prompts in one click to gather aggregate stats API Integration The integration is a standard Azure OpenAI chat completion call. The only difference is the deployment name ( model-router instead of a specific model): const response = await fetch( `${endpoint}/openai/deployments/model-router/chat/completions?api-version=2024-10-21`, { method: 'POST', headers: { 'Content-Type': 'application/json', 'api-key': apiKey, }, body: JSON.stringify({ messages: [{ role: 'user', content: prompt }], max_completion_tokens: 1024, }), } ); const data = await response.json(); // The key insight: response.model reveals the underlying model const selectedModel = data.model; // e.g. "gpt-5-nano-2025-08-07" That data.model field is what makes cost tracking and distribution analysis possible. Results: What the Data Shows We ran all 10 prompts through both Model Router (Balanced mode) and a fixed standard deployment. Note: Results vary by run, region, model versions, and Azure load. These numbers are from a representative sample run. Side-by-side comparison across all 10 prompts in Balanced mode Summary Metric Router (Balanced) Standard (GPT-5-nano) Avg Latency ~7,800 ms ~7,700 ms Total Cost (10 prompts) ~$0.029 ~$0.030 Cost Savings ~4.5% — Models Used 4 1 Model Distribution The router used 4 different models across 10 prompts: Model Requests Share Typical Use gpt-5-nano 5 50% Classification, summarisation, planning gpt-5-mini 2 20% FAQ answers, data extraction gpt-oss-120b 2 20% Long-context analysis, creative tasks gpt-4.1-mini 1 10% Complex debugging & reasoning Routing distribution chart — the router favours efficient models for simple prompts Across All Three Modes Metric Balanced Cost-Optimised Quality-Optimised Cost Savings ~4.5% ~4.7% ~14.2% Avg Latency (Router) ~7,800 ms ~7,800 ms ~6,800 ms Avg Latency (Standard) ~7,700 ms ~7,300 ms ~8,300 ms Primary Goal Balance cost + quality Minimise spend Maximise accuracy Model Selection Mixed (4 models) Prefers cheaper Prefers premium Cost-optimised mode — routes more aggressively to nano/mini models Quality-optimised mode — routes to larger models for complex tasks Analysis What Worked Well Intelligent distribution The router didn't just default to one model. It used 4 different models and mapped prompt complexity to model capability: simple classification → nano, FAQ answers → mini, long-context documents → oss-120b, complex debugging → 4.1-mini. Measurable cost savings across all modes 4.5% in Balanced, 4.7% in Cost, and 14.2% in Quality mode. Quality mode was the surprise winner by choosing faster, cheaper models for simple prompts, it actually saved the most while still routing complex requests to capable models. Zero routing logic in application code One endpoint, one deployment name. The complexity lives in Azure's infrastructure, not yours. Operational flexibility Switch between Balanced, Cost, and Quality modes in the Foundry Portal without redeploying your app. Need to cut costs for a high-traffic period? Switch to Cost mode. Need accuracy for a compliance run? Switch to Quality. Future-proofing As Azure adds new models to the routing pool, your deployment benefits automatically. No code changes needed. Trade-offs to Consider Latency is comparable, not always faster In Balanced mode, Router averaged ~7,800 ms vs Standard's ~7,700 ms nearly identical. In Quality mode, the Router was actually faster (~6,800 ms vs ~8,300 ms) because it chose more efficient models for simple prompts. The delta depends on which models the router selects. Savings scale with workload diversity Our 10-prompt test set showed 4.5–14.2% savings. Production workloads with a wider spread of simple vs complex prompts should see larger savings, since the router has more opportunity to route simple requests to cheaper models. Opaque routing decisions You can see which model was picked via response.model , but you can't see why. For most applications this is fine; for debugging edge cases you may want to test specific prompts in the demo first. Custom Prompt Testing One of the most practical features of the demo is testing your own prompts before committing to Model Router in production. Enter any prompt `the quantum computing example is a medium-complexity educational prompt` Benchmarks execute automatically, showing the selected model, latency, tokens, and cost Workflow: Click ✏️ Custom in the prompt selector Enter your production-representative prompt Click ✓ Use This Prompt — Router and Standard run automatically Compare results — repeat with different routing modes Use the data to inform your deployment strategy This lets you predict costs and validate routing behaviour with your actual workload before going to production. When to Use Model Router Great Fit Mixed-complexity workloads — chatbots, customer service, content pipelines Cost-sensitive deployments — where even single-digit percentage savings matter at scale Teams wanting simplicity — one endpoint beats managing multi-model routing logic Rapid experimentation — try new models without changing application code Consider Carefully Ultra-low-latency requirements — if you need sub-second responses, the routing overhead matters Single-task, single-model workloads — if one model is clearly optimal for 100% of your traffic, a router adds complexity without benefit Full control over model selection — if you need deterministic model choice per request Mode Selection Guide Is accuracy critical (compliance, legal, medical)? Is accuracy critical (compliance, legal, medical)? └─ YES → Quality-Optimised └─ NO → Strict budget constraints? └─ YES → Cost-Optimised └─ NO → Balanced (recommended) Best Practices Start with Balanced mode — measure actual results, then optimise Test with your real prompts — use the Custom Prompt feature to validate routing before production Monitor model distribution — track which models handle your traffic over time Compare against a baseline — always keep a standard deployment to measure savings Review regularly — as new models enter the routing pool, distributions shift Technical Stack Technology Purpose React 19 + TypeScript 5.9 UI and type safety Vite 7 Dev server and build tool Tailwind CSS 4 Styling Recharts 3 Distribution and comparison charts Azure OpenAI API (2024-10-21) Model Router and standard completions Security measures include an ErrorBoundary for crash resilience, sanitised API error messages, AbortController request timeouts, input length validation, and restrictive security headers. API keys are loaded from environment variables and gitignored. Source: leestott/router-demo-app: An interactive web application demonstrating the power of Microsoft Foundry Model Router - an intelligent routing system that automatically selects the optimal language model for each request based on complexity, reasoning requirements, and task type. ⚠️ This demo calls Azure OpenAI directly from the browser. This is fine for local development. For production, proxy through a backend and use Managed Identity. Try It Yourself Quick Start git clone https://github.com/leestott/router-demo-app/ cd router-demo-app # Option A: Use the setup script (recommended) # Windows: .\setup.ps1 -StartDev # macOS/Linux: chmod +x setup.sh && ./setup.sh --start-dev # Option B: Manual npm install cp .env.example .env.local # Edit .env.local with your Azure credentials npm run dev Open http://localhost:5173 , select a prompt, and click ⚡ Run Both. Get Your Credentials Go to ai.azure.com → open your project Copy the Project connection string (endpoint URL) Navigate to Deployments → confirm model-router is deployed Get your API key from Project Settings → Keys Configuration Edit .env.local : VITE_ROUTER_ENDPOINT=https://your-resource.cognitiveservices.azure.com VITE_ROUTER_API_KEY=your-api-key VITE_ROUTER_DEPLOYMENT=model-router VITE_STANDARD_ENDPOINT=https://your-resource.cognitiveservices.azure.com VITE_STANDARD_API_KEY=your-api-key VITE_STANDARD_DEPLOYMENT=gpt-5-nano Ideas for Enhancement Historical analysis — persist results to track routing trends over time Cost projections — estimate monthly spend based on prompt patterns and volume A/B testing framework — compare modes with statistical significance Streaming support — show model selection for streaming responses Export reports — download benchmark data as CSV/JSON for further analysis Conclusion Model Router addresses a real problem: most AI workloads have mixed complexity, but most deployments use a single model. By routing each request to the right model automatically, you get: Cost savings (~4.5–14.2% measured across modes, scaling with volume) Intelligent distribution (4 models used, zero routing code) Operational simplicity (one endpoint, mode changes via portal) Future-proofing (new models added to the pool automatically) The latency trade-off is minimal — in Quality mode, the Router was actually faster than the standard deployment. The real value is flexibility: tune for cost, quality, or balance without touching your code. Ready to try it? Clone the demo repository, plug in your Azure credentials, and test with your own prompts. Resources Model Router Benchmark Sample Sample App Model Router Concepts Official documentation Model Router How-To Deployment guide Microsoft Foundry Portal Deploy and manage Model Router in the Catalog Model listing Azure OpenAI Managed Identity Production auth Built to explore Model Router and share findings with the developer community. Feedback and contributions welcome, open an issue or PR on GitHub.Building a Privacy-First Hybrid AI Briefing Tool with Foundry Local and Azure OpenAI
Introduction Management consultants face a critical challenge: they need instant AI-powered insights from sensitive client documents, but traditional cloud-only AI solutions create unacceptable data privacy risks. Every document uploaded to a cloud API potentially exposes confidential client information, violates data residency requirements, and creates compliance headaches. The solution lies in a hybrid architecture that combines the speed and privacy of on-device AI with the sophistication of cloud models—but only when explicitly requested. This article walks through building a production-ready briefing assistant that runs AI inference locally first, then optionally refines outputs using Azure OpenAI for executive-quality presentations. We'll explore a sample implementation using FL-Client-Briefing-Assistant, built with Next.js 14, TypeScript, and Microsoft Foundry Local. You'll learn how to architect privacy-first AI applications, implement sub-second local inference, and design transparent hybrid workflows that give users complete control over their data. Why Hybrid AI Architecture Matters for Enterprise Applications Before diving into implementation details, let's understand why a hybrid approach is essential for enterprise AI applications, particularly in consulting and professional services. Cloud-only AI services like OpenAI's GPT-4 offer remarkable capabilities, but they introduce several critical challenges. First, every API call sends your data to external servers, creating audit trails and potential exposure points. For consultants handling merger documents, financial reports, or strategic plans, this is often a non-starter. Second, cloud APIs introduce latency, typically 2-5 seconds per request due to network round-trips and queue times. Third, costs scale linearly with usage, making high-volume document analysis expensive at scale. Local-only AI solves privacy and latency concerns but sacrifices quality. Small language models (SLMs) running on laptops produce quick summaries, but they lack the nuanced reasoning and polish needed for C-suite presentations. You get fast, private results that may require significant manual refinement. The hybrid approach gives you the best of both worlds: instant, private local processing as the default, with optional cloud refinement only when quality matters most. This architecture respects data privacy by default while maintaining the flexibility to produce executive-grade outputs when needed. Architecture Overview: Three-Layer Design for Privacy and Performance The FL-Client-Briefing-Assistant implements a clean three-layer architecture that separates concerns and ensures privacy at every level. At the frontend, a Next.js 14 application provides the user interface with strong TypeScript typing throughout. Users interact with four quick-action templates: document summarization, talking points generation, risk analysis, and executive summaries. The UI clearly indicates which model (local or cloud) processed each request, ensuring transparency. The middle tier consists of Next.js API routes that act as orchestration endpoints. These routes validate requests using Zod schemas, route to appropriate inference services, and enforce privacy settings. Critically, the API layer never persists user content unless explicitly opted in via privacy settings. The inference layer contains two distinct services. The local service uses Foundry Local SDK to communicate with a locally running Phi-4 model (or similar SLM). This provides sub-second inference, typical 500ms-1s response times, completely offline. The cloud service connects to Azure OpenAI using the official JavaScript SDK, accessed via Managed Identity or API keys, with proper timeout and retry logic. Setting Up Foundry Local for On-Device Inference Foundry Local is Microsoft's runtime for running AI models entirely on your device—no internet required, no data leaving your machine. Here's how to get it running for this application. First, install Foundry Local on Windows using Windows Package Manager: winget install Microsoft.FoundryLocal After installation, verify the service is ready: foundry service start foundry service status The status command will show you the service endpoint, typically running on a dynamic port like http://127.0.0.1:5272 . This port changes between restarts, so your application must query it programmatically. Next, load an appropriate model. For briefing tasks, Phi-4 Mini provides an excellent balance of quality and speed: foundry model load phi-4 The model downloads (approximately 3.6GB) and loads into memory. This takes 2-5 minutes on first run but persists between sessions. Once loaded, inference is nearly instant, most requests complete in under 1 second. In your application, configure the connection in .env.local : the port for foundry local is dynamic so please ensure you add the correct port. FOUNDRY_LOCAL_ENDPOINT=http://127.0.0.1:**** The application uses the Foundry Local SDK to query the running service: import { FoundryLocalClient } from 'foundry-local-sdk'; const client = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT }); const response = await client.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: 'You are a professional consultant assistant.' }, { role: 'user', content: 'Summarize this document: ...' } ], max_tokens: 500, temperature: 0.3 }); This code demonstrates several best practices: Explicit model specification: Always name the model to ensure consistency across environments System message framing: Set the appropriate professional context for consulting use cases Conservative temperature: Use 0.3 for factual summarization tasks to reduce hallucination Token limits: Cap outputs to prevent excessive generation times and costs Implementing Privacy-First API Routes The Next.js API routes form the security boundary of the application. Every request must be validated, sanitized, and routed according to privacy settings before reaching inference services. Here's the core local inference route ( app/api/briefing/local/route.ts ): import { NextRequest, NextResponse } from 'next/server'; import { z } from 'zod'; import { FoundryLocalClient } from 'foundry-local-sdk'; const RequestSchema = z.object({ prompt: z.string().min(10).max(5000), template: z.enum(['summary', 'talking-points', 'risk-analysis', 'executive']), context: z.string().optional() }); export async function POST(request: NextRequest) { try { // Validate and parse request body const body = await request.json(); const validated = RequestSchema.parse(body); // Initialize Foundry Local client const client = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT! }); // Build system prompt based on template const systemPrompts = { 'summary': 'You are a consultant creating concise document summaries.', 'talking-points': 'You are preparing structured talking points for meetings.', 'risk-analysis': 'You are analyzing risks and opportunities systematically.', 'executive': 'You are crafting executive-level briefing notes.' }; // Execute local inference const startTime = Date.now(); const completion = await client.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: systemPrompts[validated.template] }, { role: 'user', content: validated.prompt } ], temperature: 0.3, max_tokens: 500 }); const latency = Date.now() - startTime; // Return structured response with metadata return NextResponse.json({ content: completion.choices[0].message.content, model: 'phi-4 (local)', latency_ms: latency, tokens: completion.usage?.total_tokens, timestamp: new Date().toISOString() }); } catch (error) { if (error instanceof z.ZodError) { return NextResponse.json( { error: 'Invalid request format', details: error.errors }, { status: 400 } ); } console.error('Local inference error:', error); return NextResponse.json( { error: 'Inference failed', message: error.message }, { status: 500 } ); } } This implementation demonstrates several critical security and quality patterns: Request validation with Zod: Every field is type-checked and bounded before processing, preventing injection attacks and malformed inputs Template-based system prompts: Different use cases get optimized prompts, improving output quality and consistency Comprehensive error handling: Validation errors, inference failures, and network issues are caught and reported with appropriate HTTP status codes Performance tracking: Latency measurement enables monitoring and helps users understand response times Metadata enrichment: Responses include model attribution, token usage, and timestamps for auditing The cloud refinement route follows a similar pattern but adds privacy checks: export async function POST(request: NextRequest) { try { const body = await request.json(); const validated = RequestSchema.parse(body); // Check privacy settings from cookie/header const confidentialMode = request.cookies.get('confidential-mode')?.value === 'true'; if (confidentialMode) { return NextResponse.json( { error: 'Cloud refinement disabled in confidential mode' }, { status: 403 } ); } // Proceed with Azure OpenAI call only if privacy allows const client = new OpenAI({ apiKey: process.env.AZURE_OPENAI_KEY, baseURL: process.env.AZURE_OPENAI_ENDPOINT, defaultHeaders: { 'api-key': process.env.AZURE_OPENAI_KEY } }); const completion = await client.chat.completions.create({ model: process.env.AZURE_OPENAI_DEPLOYMENT!, messages: [/* ... */], temperature: 0.5, // Slightly higher for creative refinement max_tokens: 800 }); return NextResponse.json({ content: completion.choices[0].message.content, model: `${process.env.AZURE_OPENAI_DEPLOYMENT} (cloud)`, privacy_notice: 'Content processed by Azure OpenAI', // ... metadata }); } catch (error) { // Error handling } } The confidential mode check is crucial—it ensures that even if a user accidentally clicks the refinement button, no data leaves the device when privacy mode is enabled. This fail-safe design prevents data leakage through UI mistakes or automated workflows. Building the Frontend: Transparent Privacy Controls The user interface must make privacy decisions explicit and visible. Users need to understand which AI service processed their content and make informed choices about cloud refinement. The main briefing interface ( app/page.tsx ) implements this transparency through clear visual indicators: 'use client'; import { useState, useEffect } from 'react'; import { PrivacySettings } from '@/components/PrivacySettings'; export default function BriefingAssistant() { const [confidentialMode, setConfidentialMode] = useState(true); // Privacy by default const [content, setContent] = useState(''); const [result, setResult] = useState(null); const [loading, setLoading] = useState(false); // Load privacy preference from localStorage useEffect(() => { const saved = localStorage.getItem('confidential-mode'); if (saved !== null) { setConfidentialMode(saved === 'true'); } }, []); async function generateBriefing(template: string, useCloud: boolean = false) { if (useCloud && confidentialMode) { alert('Cloud refinement is disabled in confidential mode. Adjust settings to enable.'); return; } setLoading(true); const endpoint = useCloud ? '/api/briefing/cloud' : '/api/briefing/local'; try { const response = await fetch(endpoint, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ prompt: content, template }) }); const data = await response.json(); setResult({ ...data, processedBy: useCloud ? 'cloud' : 'local' }); } catch (error) { console.error('Briefing generation failed:', error); } finally { setLoading(false); } } return ( <div className="briefing-assistant"> <header> <h1>Client Briefing Assistant</h1> <div className="status-bar"> <span className={confidentialMode ? 'confidential' : 'standard'}> {confidentialMode ? '🔒 Confidential Mode' : '🌐 Standard Mode'} </span> <PrivacySettings confidentialMode={confidentialMode} onChange={setConfidentialMode} /> </div> </header> <div className="quick-actions"> <button onClick={() => generateBriefing('summary')}> 📄 Summarize Document </button> <button onClick={() => generateBriefing('talking-points')}> 💬 Generate Talking Points </button> <button onClick={() => generateBriefing('risk-analysis')}> 🎯 Risk Analysis </button> <button onClick={() => generateBriefing('executive')}> 📊 Executive Summary </button> </div> <textarea value={content} onChange={(e) => setContent(e.target.value)} placeholder="Paste client document or meeting notes here..." /> {result && ( <div className="result-card"> <div className="result-header"> <span className="model-badge">{result.model}</span> <span className="latency">{result.latency_ms}ms</span> </div> <div className="result-content">{result.content}</div> {result.processedBy === 'local' && !confidentialMode && ( <button onClick={() => generateBriefing(result.template, true)} className="refine-btn" > ✨ Refine for Executive Presentation </button> )} </div> )} </div> ); } This interface design embodies several principles of responsible AI UX: Privacy by default: Confidential mode is enabled unless explicitly changed, ensuring accidental cloud usage requires multiple intentional actions Clear attribution: Every result shows which model generated it and how long it took, building user trust through transparency Conditional refinement: The cloud refinement button only appears when privacy allows and local inference has completed, preventing premature cloud requests Persistent settings: Privacy preferences save to localStorage, respecting user choices across sessions Visual status indicators: The header always shows current privacy mode with recognizable icons (🔒 for confidential, 🌐 for standard) Testing Privacy and Performance Requirements A privacy-first application demands rigorous testing to ensure data never leaks unintentionally. The project includes comprehensive test suites using Vitest for unit tests and Playwright for end-to-end scenarios. Here's a critical privacy test ( tests/privacy.test.ts ): import { describe, it, expect, beforeEach } from 'vitest'; import { TestUtils } from './utils/test-helpers'; describe('Privacy Controls', () => { let testUtils: TestUtils; beforeEach(() => { testUtils = new TestUtils(); testUtils.enableConfidentialMode(); }); it('should prevent cloud API calls when confidential mode is enabled', async () => { const response = await testUtils.requestBriefing({ template: 'summary', prompt: 'Confidential merger document...', cloud: true }); expect(response.status).toBe(403); expect(response.error).toContain('disabled in confidential mode'); }); it('should allow local inference in confidential mode', async () => { const response = await testUtils.requestBriefing({ template: 'summary', prompt: 'Confidential merger document...', cloud: false }); expect(response.status).toBe(200); expect(response.model).toContain('local'); expect(response.content).toBeTruthy(); }); it('should not persist sensitive content without opt-in', async () => { await testUtils.requestBriefing({ template: 'executive', prompt: 'Strategic acquisition plan...', cloud: false }); const history = await testUtils.getConversationHistory(); expect(history).toHaveLength(0); // No storage by default }); it('should support opt-in history with explicit consent', async () => { testUtils.enableHistorySaving(); await testUtils.requestBriefing({ template: 'executive', prompt: 'Strategic acquisition plan...', cloud: false }); const history = await testUtils.getConversationHistory(); expect(history).toHaveLength(1); expect(history[0].prompt).toContain('acquisition'); }); }); Performance testing ensures local inference meets the sub-second requirement: describe('Performance SLA', () => { it('should complete local inference in under 1 second', async () => { const samples = []; for (let i = 0; i < 10; i++) { const start = Date.now(); await testUtils.requestBriefing({ template: 'summary', prompt: 'Standard 500-word document...', cloud: false }); samples.push(Date.now() - start); } const p95 = calculatePercentile(samples, 95); expect(p95).toBeLessThan(1000); // 95th percentile under 1s }); it('should handle 5 concurrent requests without degradation', async () => { const requests = Array(5).fill(null).map(() => testUtils.requestBriefing({ template: 'talking-points', prompt: 'Meeting agenda...', cloud: false }) ); const results = await Promise.all(requests); expect(results.every(r => r.status === 200)).toBe(true); expect(results.every(r => r.latency_ms < 2000)).toBe(true); }); }); These tests validate the core promise: local inference is fast, private, and reliable under realistic loads. Deployment Considerations and Production Readiness Moving from development to production requires addressing several operational concerns: model distribution, environment configuration, monitoring, and incident response. For Foundry Local deployment, ensure IT teams pre-install the runtime and required models on consultant laptops. Use MDM (Mobile Device Management) systems or Group Policy to automate model downloads during onboarding. Models can be cached in shared network locations to avoid redundant downloads across teams. Environment configuration should separate local and cloud credentials cleanly: # .env.local (local development) FOUNDRY_LOCAL_ENDPOINT=http://127.0.0.1:5272 AZURE_OPENAI_ENDPOINT=https://your-org.openai.azure.com AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini AZURE_OPENAI_KEY=your-key-here # For production, use Azure Managed Identity instead of API keys USE_MANAGED_IDENTITY=true Managed Identity eliminates API key management—the application authenticates using Azure AD, with permissions controlled via IAM policies. This prevents key leakage and simplifies rotation. Monitoring should track both local and cloud usage patterns. Implement structured logging with clear privacy labels: logger.info('Briefing generated', { model: 'local', template: 'summary', latency_ms: 847, tokens: 312, privacy_mode: 'confidential', user_id: hash(userId), // Never log raw user IDs timestamp: new Date().toISOString() }); This approach enables operational insights (average latency, most-used templates, error rates) without exposing sensitive content or user identities. For incident response, establish clear escalation paths. If Foundry Local fails, the application should gracefully degrade—inform users that local inference is unavailable and offer cloud-only mode (with explicit consent). If cloud services fail, local inference continues uninterrupted, ensuring the application remains useful even during Azure outages. Key Takeaways and Next Steps Building a privacy-first hybrid AI application requires careful architectural decisions that prioritize user data protection while maintaining high-quality outputs. The FL-Client-Briefing-Assistant demonstrates that you can achieve sub-second local inference, transparent privacy controls, and optional cloud refinement in a production-ready package. Key lessons from this implementation: Privacy must be the default, not an opt-in feature—confidential mode should require explicit action to disable Transparency builds trust—always show users which model processed their data and how long it took Fallback strategies ensure reliability—graceful degradation when services fail keeps the application useful Testing validates promises—comprehensive tests for privacy, performance, and functionality are non-negotiable Operational visibility without privacy leaks—structured logging enables monitoring without exposing sensitive content To extend this application, consider adding: Document parsing: Integrate PDF, DOCX, and PPTX extractors to analyze file uploads directly Multi-document synthesis: Combine insights from multiple client documents into unified briefings Custom templates: Allow consultants to define their own briefing formats and save them for reuse Offline mode indicators: Detect network connectivity and disable cloud features automatically Audit logging: For regulated industries, implement immutable audit trails showing when cloud refinement was used The full implementation, including all code, tests, and deployment guides, is available at github.com/leestott/FL-Client-Briefing-Assistant. Clone the repository, follow the setup guide, and experience privacy-first AI in action. Resources and Further Reading FL-Client-Briefing-Assistant Repository - Complete source code and documentation Microsoft Foundry Local Documentation - Official runtime documentation and API reference Azure OpenAI Service - Cloud refinement integration guide Project Specification - Detailed requirements and acceptance criteria Implementation Guide - Architecture decisions and design patterns Testing Guide - How to run and interpret comprehensive test suitesExploring Azure Face API: Facial Landmark Detection and Real-Time Analysis with C#
In today’s world, applications that understand and respond to human facial cues are no longer science fiction—they’re becoming a reality in domains like security, driver monitoring, gaming, and AR/VR. With Azure Face API, developers can leverage powerful cloud-based facial recognition and analysis tools without building complex machine learning models from scratch. In this blog, we’ll explore how to use C# to detect faces, identify key facial landmarks, estimate head pose, track eye and mouth movements, and process real-time video streams. Using OpenCV for visualization, we’ll show how to overlay landmarks, draw bounding boxes, and calculate metrics like Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR)—all in real time. You'll learn to: Set up Azure Face API Detect 27 facial landmarks Estimate head pose (yaw, pitch, roll) Calculate eye aspect ratio (EAR) and mouth openness Draw bounding boxes around features using OpenCV Process real-time video Prerequisites .NET 8 SDK installed Azure subscription with Face API resource Visual Studio 2022 or later Webcam for testing (optional) Basic understanding of C# and computer vision concepts Part 1: Azure Face API Setup 1.1 Install Required NuGet Packages dotnet add package Azure.AI.Vision.Face dotnet add package OpenCvSharp4 dotnet add package OpenCvSharp4.runtime.win 1.2 Create Azure Face API Resource Navigate to Azure Portal Search for "Face" and create a new Face API resource Choose your pricing tier (Free tier: 20 calls/min, 30K calls/month) Copy the Endpoint URL and API Key 1.3 Configure in .NET Application appsettings.json: { "Azure": { "FaceApi": { "Endpoint": "https://your-resource.cognitiveservices.azure.com/", "ApiKey": "your-api-key-here" } } } Initialize Face Client: using Azure; using Azure.AI.Vision.Face; using Microsoft.Extensions.Configuration; public class FaceAnalysisService { private readonly FaceClient _faceClient; private readonly ILogger<FaceAnalysisService> _logger; public FaceAnalysisService(ILogger<FaceAnalysisService> logger, IConfiguration configuration) { _logger = logger; string endpoint = configuration["Azure:FaceApi:Endpoint"]; string apiKey = configuration["Azure:FaceApi:ApiKey"]; _faceClient = new FaceClient(new Uri(endpoint), new AzureKeyCredential(apiKey)); _logger.LogInformation("FaceClient initialized with endpoint: {Endpoint}", endpoint); } } Part 2: Understanding Face Detection Models 2.1 Basic Face Detection public async Task<List<FaceDetectionResult>> DetectFacesAsync(byte[] imageBytes) { using var stream = new MemoryStream(imageBytes); var response = await _faceClient.DetectAsync( BinaryData.FromStream(stream), FaceDetectionModel.Detection03, FaceRecognitionModel.Recognition04, returnFaceId: false, returnFaceAttributes: new FaceAttributeType[] { FaceAttributeType.HeadPose }, returnFaceLandmarks: true, returnRecognitionModel: false ); _logger.LogInformation("Detected {Count} faces", response.Value.Count); return response.Value.ToList(); } Part 3: Facial Landmarks - The 27 Key Points 3.1 Understanding Facial Landmarks 3.2 Accessing Landmarks in Code public void PrintLandmarks(FaceDetectionResult face) { var landmarks = face.FaceLandmarks; if (landmarks == null) { _logger.LogWarning("No landmarks detected"); return; } // Eye landmarks Console.WriteLine($"Left Eye Outer: ({landmarks.EyeLeftOuter.X}, {landmarks.EyeLeftOuter.Y})"); Console.WriteLine($"Left Eye Inner: ({landmarks.EyeLeftInner.X}, {landmarks.EyeLeftInner.Y})"); Console.WriteLine($"Left Eye Top: ({landmarks.EyeLeftTop.X}, {landmarks.EyeLeftTop.Y})"); Console.WriteLine($"Left Eye Bottom: ({landmarks.EyeLeftBottom.X}, {landmarks.EyeLeftBottom.Y})"); // Mouth landmarks Console.WriteLine($"Upper Lip Top: ({landmarks.UpperLipTop.X}, {landmarks.UpperLipTop.Y})"); Console.WriteLine($"Under Lip Bottom: ({landmarks.UnderLipBottom.X}, {landmarks.UnderLipBottom.Y})"); // Nose landmarks Console.WriteLine($"Nose Tip: ({landmarks.NoseTip.X}, {landmarks.NoseTip.Y})"); } 3.3 Visualizing All Landmarks public void DrawAllLandmarks(FaceLandmarks landmarks, Mat frame) { void DrawPoint(FaceLandmarkCoordinate point, Scalar color) { if (point != null) { Cv2.Circle(frame, new Point((int)point.X, (int)point.Y), radius: 3, color: color, thickness: -1); } } // Eyes (Green) DrawPoint(landmarks.EyeLeftOuter, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeLeftInner, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeLeftTop, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeLeftBottom, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeRightOuter, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeRightInner, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeRightTop, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeRightBottom, new Scalar(0, 255, 0)); // Eyebrows (Cyan) DrawPoint(landmarks.EyebrowLeftOuter, new Scalar(255, 255, 0)); DrawPoint(landmarks.EyebrowLeftInner, new Scalar(255, 255, 0)); DrawPoint(landmarks.EyebrowRightOuter, new Scalar(255, 255, 0)); DrawPoint(landmarks.EyebrowRightInner, new Scalar(255, 255, 0)); // Nose (Yellow) DrawPoint(landmarks.NoseTip, new Scalar(0, 255, 255)); DrawPoint(landmarks.NoseRootLeft, new Scalar(0, 255, 255)); DrawPoint(landmarks.NoseRootRight, new Scalar(0, 255, 255)); DrawPoint(landmarks.NoseLeftAlarOutTip, new Scalar(0, 255, 255)); DrawPoint(landmarks.NoseRightAlarOutTip, new Scalar(0, 255, 255)); // Mouth (Blue) DrawPoint(landmarks.UpperLipTop, new Scalar(255, 0, 0)); DrawPoint(landmarks.UpperLipBottom, new Scalar(255, 0, 0)); DrawPoint(landmarks.UnderLipTop, new Scalar(255, 0, 0)); DrawPoint(landmarks.UnderLipBottom, new Scalar(255, 0, 0)); DrawPoint(landmarks.MouthLeft, new Scalar(255, 0, 0)); DrawPoint(landmarks.MouthRight, new Scalar(255, 0, 0)); // Pupils (Red) DrawPoint(landmarks.PupilLeft, new Scalar(0, 0, 255)); DrawPoint(landmarks.PupilRight, new Scalar(0, 0, 255)); } Part 4: Drawing Bounding Boxes Around Features 4.1 Eye Bounding Boxes /// <summary> /// Draws rectangles around eyes using OpenCV. /// </summary> public void DrawEyeBoxes(FaceLandmarks landmarks, Mat frame) { int boxWidth = 60; int boxHeight = 35; // Calculate Rectangles var leftEyeRect = new Rect((int)landmarks.EyeLeftOuter.X - boxWidth / 2, (int)landmarks.EyeLeftOuter.Y - boxHeight / 2, boxWidth, boxHeight); var rightEyeRect = new Rect((int)landmarks.EyeRightOuter.X - boxWidth / 2, (int)landmarks.EyeRightOuter.Y - boxHeight / 2, boxWidth, boxHeight); // Draw Rectangles (Green in BGR) Cv2.Rectangle(frame, leftEyeRect, new Scalar(0, 255, 0), 2); Cv2.Rectangle(frame, rightEyeRect, new Scalar(0, 255, 0), 2); // Add Labels Cv2.PutText(frame, "Left Eye", new Point(leftEyeRect.X, leftEyeRect.Y - 5), HersheyFonts.HersheySimplex, 0.4, new Scalar(0, 255, 0), 1); Cv2.PutText(frame, "Right Eye", new Point(rightEyeRect.X, rightEyeRect.Y - 5), HersheyFonts.HersheySimplex, 0.4, new Scalar(0, 255, 0), 1); } 4.2 Mouth Bounding Box /// <summary> /// Draws rectangle around mouth region. /// </summary> public void DrawMouthBox(FaceLandmarks landmarks, Mat frame) { int boxWidth = 80; int boxHeight = 50; // Calculate center based on the vertical lip landmarks int centerX = (int)((landmarks.UpperLipTop.X + landmarks.UnderLipBottom.X) / 2); int centerY = (int)((landmarks.UpperLipTop.Y + landmarks.UnderLipBottom.Y) / 2); var mouthRect = new Rect(centerX - boxWidth / 2, centerY - boxHeight / 2, boxWidth, boxHeight); // Draw Mouth Box (Blue in BGR) Cv2.Rectangle(frame, mouthRect, new Scalar(255, 0, 0), 2); // Add Label Cv2.PutText(frame, "Mouth", new Point(mouthRect.X, mouthRect.Y - 5), HersheyFonts.HersheySimplex, 0.4, new Scalar(255, 0, 0), 1); } 4.3 Face Bounding Box /// <summary> /// Draws rectangle around entire face using the face rectangle from API. /// </summary> public void DrawFaceBox(FaceDetectionResult face, Mat frame) { var faceRect = face.FaceRectangle; if (faceRect == null) { return; } var rect = new Rect( faceRect.Left, faceRect.Top, faceRect.Width, faceRect.Height ); // Draw Face Bounding Box (Red in BGR) Cv2.Rectangle(frame, rect, new Scalar(0, 0, 255), 2); // Add Label with dimensions Cv2.PutText(frame, $"Face {faceRect.Width}x{faceRect.Height}", new Point(rect.X, rect.Y - 10), HersheyFonts.HersheySimplex, 0.5, new Scalar(0, 0, 255), 2); } 4.4 Nose Bounding Box /// <summary> /// Draws bounding box around nose using nose landmarks. /// </summary> public void DrawNoseBox(FaceLandmarks landmarks, Mat frame) { // Calculate horizontal bounds from Alar tips int minX = (int)Math.Min(landmarks.NoseLeftAlarOutTip.X, landmarks.NoseRightAlarOutTip.X); int maxX = (int)Math.Max(landmarks.NoseLeftAlarOutTip.X, landmarks.NoseRightAlarOutTip.X); // Calculate vertical bounds from Root to Tip int minY = (int)Math.Min(landmarks.NoseRootLeft.Y, landmarks.NoseTip.Y); int maxY = (int)landmarks.NoseTip.Y; // Create Rect with a 10px padding buffer var noseRect = new Rect( minX - 10, minY - 10, (maxX - minX) + 20, (maxY - minY) + 20 ); // Draw Nose Box (Yellow in BGR) Cv2.Rectangle(frame, noseRect, new Scalar(0, 255, 255), 2); } Part 5: Geometric Calculations with Landmarks 5.1 Calculating Euclidean Distance /// <summary> /// Calculates distance between two landmark points. /// </summary> public static double CalculateDistance(dynamic point1, dynamic point2) { double dx = point1.X - point2.X; double dy = point1.Y - point2.Y; return Math.Sqrt(dx * dx + dy * dy); } 5.2 Eye Aspect Ratio (EAR) Formula /// <summary> /// Calculates the Eye Aspect Ratio (EAR) to detect eye closure. /// </summary> public double CalculateEAR( FaceLandmarkCoordinate top1, FaceLandmarkCoordinate top2, FaceLandmarkCoordinate bottom1, FaceLandmarkCoordinate bottom2, FaceLandmarkCoordinate inner, FaceLandmarkCoordinate outer) { // Vertical distances double v1 = CalculateDistance(top1, bottom1); double v2 = CalculateDistance(top2, bottom2); // Horizontal distance double h = CalculateDistance(inner, outer); // EAR formula: (||p2-p6|| + ||p3-p5||) / (2 * ||p1-p4||) return (v1 + v2) / (2.0 * h); } Simplified Implementation: /// <summary> /// Calculates Eye Aspect Ratio (EAR) for a single eye. /// Reference: "Real-Time Eye Blink Detection using Facial Landmarks" (Soukupová & Čech, 2016) /// </summary> public double ComputeEAR(FaceLandmarks landmarks, bool isLeftEye) { var top = isLeftEye ? landmarks.EyeLeftTop : landmarks.EyeRightTop; var bottom = isLeftEye ? landmarks.EyeLeftBottom : landmarks.EyeRightBottom; var inner = isLeftEye ? landmarks.EyeLeftInner : landmarks.EyeRightInner; var outer = isLeftEye ? landmarks.EyeLeftOuter : landmarks.EyeRightOuter; if (top == null || bottom == null || inner == null || outer == null) { _logger.LogWarning("Missing eye landmarks"); return 1.0; // Return 1.0 (open) to prevent false positives for drowsiness } double verticalDist = CalculateDistance(top, bottom); double horizontalDist = CalculateDistance(inner, outer); // Simplified EAR for Azure 27-point model double ear = verticalDist / horizontalDist; _logger.LogDebug( "EAR for {Eye}: {Value:F3}", isLeftEye ? "left" : "right", ear ); return ear; } Usage Example: var leftEAR = ComputeEAR(landmarks, isLeftEye: true); var rightEAR = ComputeEAR(landmarks, isLeftEye: false); var avgEAR = (leftEAR + rightEAR) / 2.0; Console.WriteLine($"Average EAR: {avgEAR:F3}"); // Open eyes: ~0.25-0.30 // Closed eyes: ~0.10-0.15 5.3 Mouth Aspect Ratio (MAR) /// <summary> /// Calculates Mouth Aspect Ratio relative to face height. /// </summary> public double CalculateMouthAspectRatio(FaceLandmarks landmarks, FaceRectangle faceRect) { double mouthHeight = landmarks.UnderLipBottom.Y - landmarks.UpperLipTop.Y; double mouthWidth = CalculateDistance(landmarks.MouthLeft, landmarks.MouthRight); double mouthOpenRatio = mouthHeight / faceRect.Height; double mouthWidthRatio = mouthWidth / faceRect.Width; _logger.LogDebug( "Mouth - Height ratio: {HeightRatio:F3}, Width ratio: {WidthRatio:F3}", mouthOpenRatio, mouthWidthRatio ); return mouthOpenRatio; } 5.4 Inter-Eye Distance /// <summary> /// Calculates the distance between pupils (inter-pupillary distance). /// </summary> public double CalculateInterEyeDistance(FaceLandmarks landmarks) { return CalculateDistance(landmarks.PupilLeft, landmarks.PupilRight); } /// <summary> /// Calculates distance between inner eye corners. /// </summary> public double CalculateInnerEyeDistance(FaceLandmarks landmarks) { return CalculateDistance(landmarks.EyeLeftInner, landmarks.EyeRightInner); } 5.5 Face Symmetry Analysis /// <summary> /// Analyzes facial symmetry by comparing left and right sides. /// </summary> public FaceSymmetryMetrics AnalyzeFaceSymmetry(FaceLandmarks landmarks) { double centerX = landmarks.NoseTip.X; double leftEyeDistance = CalculateDistance(landmarks.EyeLeftInner, new { X = centerX, Y = landmarks.EyeLeftInner.Y }); double leftMouthDistance = CalculateDistance(landmarks.MouthLeft, new { X = centerX, Y = landmarks.MouthLeft.Y }); double rightEyeDistance = CalculateDistance(landmarks.EyeRightInner, new { X = centerX, Y = landmarks.EyeRightInner.Y }); double rightMouthDistance = CalculateDistance(landmarks.MouthRight, new { X = centerX, Y = landmarks.MouthRight.Y }); return new FaceSymmetryMetrics { EyeSymmetryRatio = leftEyeDistance / rightEyeDistance, MouthSymmetryRatio = leftMouthDistance / rightMouthDistance, IsSymmetric = Math.Abs(leftEyeDistance - rightEyeDistance) < 5.0 }; } public class FaceSymmetryMetrics { public double EyeSymmetryRatio { get; set; } public double MouthSymmetryRatio { get; set; } public bool IsSymmetric { get; set; } } Part 6: Head Pose Estimation 6.1 Understanding Head Pose Angles Azure Face API provides three Euler angles for head orientation: 6.2 Accessing Head Pose Data public void AnalyzeHeadPose(FaceDetectionResult face) { var headPose = face.FaceAttributes?.HeadPose; if (headPose == null) { _logger.LogWarning("Head pose not available"); return; } double yaw = headPose.Yaw; double pitch = headPose.Pitch; double roll = headPose.Roll; Console.WriteLine("Head Pose:"); Console.WriteLine($" Yaw: {yaw:F2}° (Left/Right)"); Console.WriteLine($" Pitch: {pitch:F2}° (Up/Down)"); Console.WriteLine($" Roll: {roll:F2}° (Tilt)"); InterpretHeadPose(yaw, pitch, roll); } 6.3 Interpreting Head Pose public string InterpretHeadPose(double yaw, double pitch, double roll) { var directions = new List<string>(); // Interpret Yaw (horizontal) if (Math.Abs(yaw) < 10) directions.Add("Looking Forward"); else if (yaw < -20) directions.Add($"Turned Left ({Math.Abs(yaw):F0}°)"); else if (yaw > 20) directions.Add($"Turned Right ({yaw:F0}°)"); // Interpret Pitch (vertical) if (Math.Abs(pitch) < 10) directions.Add("Level"); else if (pitch < -15) directions.Add($"Looking Down ({Math.Abs(pitch):F0}°)"); else if (pitch > 15) directions.Add($"Looking Up ({pitch:F0}°)"); // Interpret Roll (tilt) if (Math.Abs(roll) > 15) { string side = roll < 0 ? "Left" : "Right"; directions.Add($"Tilted {side} ({Math.Abs(roll):F0}°)"); } return string.Join(", ", directions); } 6.4 Visualizing Head Pose on Frame /// <summary> /// Draws head pose information with color-coded indicators. /// </summary> public void DrawHeadPoseInfo(Mat frame, HeadPose headPose, FaceRectangle faceRect) { double yaw = headPose.Yaw; double pitch = headPose.Pitch; double roll = headPose.Roll; int centerX = faceRect.Left + faceRect.Width / 2; int centerY = faceRect.Top + faceRect.Height / 2; string poseText = $"Yaw: {yaw:F1}° Pitch: {pitch:F1}° Roll: {roll:F1}°"; Cv2.PutText(frame, poseText, new Point(faceRect.Left, faceRect.Top - 10), HersheyFonts.HersheySimplex, 0.5, new Scalar(255, 255, 255), 1); int arrowLength = 50; double yawRadians = yaw * Math.PI / 180.0; int arrowEndX = centerX + (int)(arrowLength * Math.Sin(yawRadians)); Cv2.ArrowedLine(frame, new Point(centerX, centerY), new Point(arrowEndX, centerY), new Scalar(0, 255, 0), 2, tipLength: 0.3); double pitchRadians = -pitch * Math.PI / 180.0; int arrowPitchEndY = centerY + (int)(arrowLength * Math.Sin(pitchRadians)); Cv2.ArrowedLine(frame, new Point(centerX, centerY), new Point(centerX, arrowPitchEndY), new Scalar(255, 0, 0), 2, tipLength: 0.3); } 6.5 Detecting Head Orientation States public enum HeadOrientation { Forward, Left, Right, Up, Down, TiltedLeft, TiltedRight, UpLeft, UpRight, DownLeft, DownRight } public List<HeadOrientation> DetectHeadOrientation(HeadPose headPose) { const double THRESHOLD = 15.0; bool lookingUp = headPose.Pitch > THRESHOLD; bool lookingDown = headPose.Pitch < -THRESHOLD; bool lookingLeft = headPose.Yaw < -THRESHOLD; bool lookingRight = headPose.Yaw > THRESHOLD; var orientations = new List<HeadOrientation>(); if (!lookingUp && !lookingDown && !lookingLeft && !lookingRight) orientations.Add(HeadOrientation.Forward); if (lookingUp && !lookingLeft && !lookingRight) orientations.Add(HeadOrientation.Up); if (lookingDown && !lookingLeft && !lookingRight) orientations.Add(HeadOrientation.Down); if (lookingLeft && !lookingUp && !lookingDown) orientations.Add(HeadOrientation.Left); if (lookingRight && !lookingUp && !lookingDown) orientations.Add(HeadOrientation.Right); if (lookingUp && lookingLeft) orientations.Add(HeadOrientation.UpLeft); if (lookingUp && lookingRight) orientations.Add(HeadOrientation.UpRight); if (lookingDown && lookingLeft) orientations.Add(HeadOrientation.DownLeft); if (lookingDown && lookingRight) orientations.Add(HeadOrientation.DownRight); return orientations; } Part 7: Real-Time Video Processing 7.1 Setting Up Video Capture using OpenCvSharp; public class RealTimeFaceAnalyzer : IDisposable { private VideoCapture? _capture; private Mat? _frame; private readonly FaceClient _faceClient; private bool _isRunning; public async Task StartAsync() { _capture = new VideoCapture(0); _frame = new Mat(); _isRunning = true; await Task.Run(() => ProcessVideoLoop()); } private async Task ProcessVideoLoop() { while (_isRunning) { if (_capture == null || !_capture.IsOpened()) break; _capture.Read(_frame); if (_frame == null || _frame.Empty()) { await Task.Delay(1); // Minimal delay to prevent CPU spiking continue; } Cv2.Resize(_frame, _frame, new Size(640, 480)); // Ensure we don't await indefinitely in the rendering loop _ = ProcessFrameAsync(_frame.Clone()); Cv2.ImShow("Face Analysis", _frame); if (Cv2.WaitKey(30) == 'q') break; } Dispose(); } private async Task ProcessFrameAsync(Mat frame) { // This is where your DrawFaceBox, DrawAllLandmarks, and EAR logic will sit. // Remember to use try-catch here to prevent API errors from crashing the loop. } public void Dispose() { _isRunning = false; _capture?.Dispose(); _frame?.Dispose(); Cv2.DestroyAllWindows(); } } 7.2 Optimizing API Calls Problem: Calling Azure Face API on every frame (30 fps) is expensive and slow. Solution: Call API once per second, cache results for 30 frames. private List<FaceDetectionResult> _cachedFaces = new(); private DateTime _lastDetectionTime = DateTime.MinValue; private readonly object _cacheLock = new(); private async Task ProcessFrameAsync(Mat frame) { if ((DateTime.Now - _lastDetectionTime).TotalSeconds >= 1.0) { _lastDetectionTime = DateTime.Now; byte[] imageBytes; Cv2.ImEncode(".jpg", frame, out imageBytes); var faces = await DetectFacesAsync(imageBytes); lock (_cacheLock) { _cachedFaces = faces; } } List<FaceDetectionResult> facesToProcess; lock (_cacheLock) { facesToProcess = _cachedFaces.ToList(); } foreach (var face in facesToProcess) { DrawFaceAnnotations(face, frame); } } Performance Improvement: 30x fewer API calls (1/sec instead of 30/sec) ~$0.02/hour instead of ~$0.60/hour Smooth 30 fps rendering < 100ms latency for visual updates 7.3 Drawing Complete Face Annotations private void DrawFaceAnnotations(FaceDetectionResult face, Mat frame) { DrawFaceBox(face, frame); if (face.FaceLandmarks != null) { DrawAllLandmarks(face.FaceLandmarks, frame); DrawEyeBoxes(face.FaceLandmarks, frame); DrawMouthBox(face.FaceLandmarks, frame); DrawNoseBox(face.FaceLandmarks, frame); double leftEAR = ComputeEAR(face.FaceLandmarks, isLeftEye: true); double rightEAR = ComputeEAR(face.FaceLandmarks, isLeftEye: false); double avgEAR = (leftEAR + rightEAR) / 2.0; Cv2.PutText(frame, $"EAR: {avgEAR:F3}", new Point(10, 30), HersheyFonts.HersheySimplex, 0.6, new Scalar(0, 255, 0), 2); } if (face.FaceAttributes?.HeadPose != null) { DrawHeadPoseInfo(frame, face.FaceAttributes.HeadPose, face.FaceRectangle); string orientation = InterpretHeadPose(face.FaceAttributes.HeadPose.Yaw, face.FaceAttributes.HeadPose.Pitch, face.FaceAttributes.HeadPose.Roll); Cv2.PutText(frame, orientation, new Point(10, 60), HersheyFonts.HersheySimplex, 0.6, new Scalar(255, 255, 0), 2); } } Part 8: Advanced Features and Use Cases 8.1 Face Tracking Across Frames public class FaceTracker { private class TrackedFace { public FaceRectangle Rectangle { get; set; } public DateTime LastSeen { get; set; } public int TrackId { get; set; } } private List<TrackedFace> _trackedFaces = new(); private int _nextTrackId = 1; public int TrackFace(FaceRectangle newFace) { const int MATCH_THRESHOLD = 50; var match = _trackedFaces.FirstOrDefault(tf => { double distance = Math.Sqrt(Math.Pow(tf.Rectangle.Left - newFace.Left, 2) + Math.Pow(tf.Rectangle.Top - newFace.Top, 2)); return distance < MATCH_THRESHOLD; }); if (match != null) { match.Rectangle = newFace; match.LastSeen = DateTime.Now; return match.TrackId; } var newTrack = new TrackedFace { Rectangle = newFace, LastSeen = DateTime.Now, TrackId = _nextTrackId++ }; _trackedFaces.Add(newTrack); return newTrack.TrackId; } public void RemoveOldTracks(TimeSpan maxAge) { _trackedFaces.RemoveAll(tf => DateTime.Now - tf.LastSeen > maxAge); } } 8.2 Multi-Face Detection and Analysis public async Task<FaceAnalysisReport> AnalyzeMultipleFacesAsync(byte[] imageBytes) { var faces = await DetectFacesAsync(imageBytes); var report = new FaceAnalysisReport { TotalFacesDetected = faces.Count, Timestamp = DateTime.Now, Faces = new List<SingleFaceAnalysis>() }; for (int i = 0; i < faces.Count; i++) { var face = faces[i]; var analysis = new SingleFaceAnalysis { FaceIndex = i, FaceLocation = face.FaceRectangle, FaceSize = face.FaceRectangle.Width * face.FaceRectangle.Height }; if (face.FaceLandmarks != null) { analysis.LeftEyeEAR = ComputeEAR(face.FaceLandmarks, true); analysis.RightEyeEAR = ComputeEAR(face.FaceLandmarks, false); analysis.InterPupillaryDistance = CalculateInterEyeDistance(face.FaceLandmarks); } if (face.FaceAttributes?.HeadPose != null) { analysis.HeadYaw = face.FaceAttributes.HeadPose.Yaw; analysis.HeadPitch = face.FaceAttributes.HeadPose.Pitch; analysis.HeadRoll = face.FaceAttributes.HeadPose.Roll; } report.Faces.Add(analysis); } report.Faces = report.Faces.OrderByDescending(f => f.FaceSize).ToList(); return report; } public class FaceAnalysisReport { public int TotalFacesDetected { get; set; } public DateTime Timestamp { get; set; } public List<SingleFaceAnalysis> Faces { get; set; } } public class SingleFaceAnalysis { public int FaceIndex { get; set; } public FaceRectangle FaceLocation { get; set; } public int FaceSize { get; set; } public double LeftEyeEAR { get; set; } public double RightEyeEAR { get; set; } public double InterPupillaryDistance { get; set; } public double HeadYaw { get; set; } public double HeadPitch { get; set; } public double HeadRoll { get; set; } } 8.3 Exporting Landmark Data to JSON using System.Text.Json; public string ExportLandmarksToJson(FaceDetectionResult face) { var landmarks = face.FaceLandmarks; var landmarkData = new { Face = new { Rectangle = new { face.FaceRectangle.Left, face.FaceRectangle.Top, face.FaceRectangle.Width, face.FaceRectangle.Height } }, Eyes = new { Left = new { Outer = new { landmarks.EyeLeftOuter.X, landmarks.EyeLeftOuter.Y }, Inner = new { landmarks.EyeLeftInner.X, landmarks.EyeLeftInner.Y }, Top = new { landmarks.EyeLeftTop.X, landmarks.EyeLeftTop.Y }, Bottom = new { landmarks.EyeLeftBottom.X, landmarks.EyeLeftBottom.Y } }, Right = new { Outer = new { landmarks.EyeRightOuter.X, landmarks.EyeRightOuter.Y }, Inner = new { landmarks.EyeRightInner.X, landmarks.EyeRightInner.Y }, Top = new { landmarks.EyeRightTop.X, landmarks.EyeRightTop.Y }, Bottom = new { landmarks.EyeRightBottom.X, landmarks.EyeRightBottom.Y } } }, Mouth = new { UpperLipTop = new { landmarks.UpperLipTop.X, landmarks.UpperLipTop.Y }, UnderLipBottom = new { landmarks.UnderLipBottom.X, landmarks.UnderLipBottom.Y }, Left = new { landmarks.MouthLeft.X, landmarks.MouthLeft.Y }, Right = new { landmarks.MouthRight.X, landmarks.MouthRight.Y } }, Nose = new { Tip = new { landmarks.NoseTip.X, landmarks.NoseTip.Y }, RootLeft = new { landmarks.NoseRootLeft.X, landmarks.NoseRootLeft.Y }, RootRight = new { landmarks.NoseRootRight.X, landmarks.NoseRootRight.Y } }, HeadPose = face.FaceAttributes?.HeadPose != null ? new { face.FaceAttributes.HeadPose.Yaw, face.FaceAttributes.HeadPose.Pitch, face.FaceAttributes.HeadPose.Roll } : null }; return JsonSerializer.Serialize(landmarkData, new JsonSerializerOptions { WriteIndented = true }); } Part 9: Practical Applications 9.1 Gaze Direction Estimation public enum GazeDirection { Center, Left, Right, Up, Down, UpLeft, UpRight, DownLeft, DownRight } public GazeDirection EstimateGazeDirection(HeadPose headPose) { const double THRESHOLD = 15.0; bool lookingUp = headPose.Pitch > THRESHOLD; bool lookingDown = headPose.Pitch < -THRESHOLD; bool lookingLeft = headPose.Yaw < -THRESHOLD; bool lookingRight = headPose.Yaw > THRESHOLD; if (lookingUp && lookingLeft) return GazeDirection.UpLeft; if (lookingUp && lookingRight) return GazeDirection.UpRight; if (lookingDown && lookingLeft) return GazeDirection.DownLeft; if (lookingDown && lookingRight) return GazeDirection.DownRight; if (lookingUp) return GazeDirection.Up; if (lookingDown) return GazeDirection.Down; if (lookingLeft) return GazeDirection.Left; if (lookingRight) return GazeDirection.Right; return GazeDirection.Center; } 9.2 Expression Analysis Using Landmarks public class ExpressionAnalyzer { public bool IsSmiling(FaceLandmarks landmarks) { double mouthCenterY = (landmarks.UpperLipTop.Y + landmarks.UnderLipBottom.Y) / 2; double leftCornerY = landmarks.MouthLeft.Y; double rightCornerY = landmarks.MouthRight.Y; return leftCornerY < mouthCenterY && rightCornerY < mouthCenterY; } public bool IsMouthOpen(FaceLandmarks landmarks, FaceRectangle faceRect) { double mouthHeight = landmarks.UnderLipBottom.Y - landmarks.UpperLipTop.Y; double mouthOpenRatio = mouthHeight / faceRect.Height; return mouthOpenRatio > 0.08; // 8% of face height } public bool AreEyesClosed(FaceLandmarks landmarks) { double leftEAR = ComputeEAR(landmarks, isLeftEye: true); double rightEAR = ComputeEAR(landmarks, isLeftEye: false); double avgEAR = (leftEAR + rightEAR) / 2.0; return avgEAR < 0.18; // Threshold for closed eyes } } 9.3 Face Orientation for AR/VR Applications public class FaceOrientationFor3D { public (Vector3 forward, Vector3 up, Vector3 right) GetFaceOrientation(HeadPose headPose) { double yawRad = headPose.Yaw * Math.PI / 180.0; double pitchRad = headPose.Pitch * Math.PI / 180.0; double rollRad = headPose.Roll * Math.PI / 180.0; var forward = new Vector3((float)(Math.Sin(yawRad) * Math.Cos(pitchRad)), (float)(-Math.Sin(pitchRad)), (float)(Math.Cos(yawRad) * Math.Cos(pitchRad))); var up = new Vector3((float)(Math.Sin(yawRad) * Math.Sin(pitchRad) * Math.Cos(rollRad) - Math.Cos(yawRad) * Math.Sin(rollRad)), (float)(Math.Cos(pitchRad) * Math.Cos(rollRad)), (float)(Math.Cos(yawRad) * Math.Sin(pitchRad) * Math.Cos(rollRad) + Math.Sin(yawRad) * Math.Sin(rollRad))); var right = Vector3.Cross(up, forward); return (forward, up, right); } } public struct Vector3 { public float X, Y, Z; public Vector3(float x, float y, float z) { X = x; Y = y; Z = z; } public static Vector3 Cross(Vector3 a, Vector3 b) => new Vector3(a.Y * b.Z - a.Z * b.Y, a.Z * b.X - a.X * b.Z, a.X * b.Y - a.Y * b.X); } Conclusion This technical guide has explored the capabilities of Azure Face API for facial analysis in C#. We've covered: Key Capabilities Demonstrated Facial Landmark Detection - Accessing 27 precise points on the face Head Pose Estimation - Tracking yaw, pitch, and roll angles Geometric Calculations - Computing EAR, distances, and ratios Visual Annotations - Drawing bounding boxes with OpenCV Real-Time Processing - Optimized video stream analysis Technical Achievements Computer Vision Math: Euclidean distance calculations Eye Aspect Ratio (EAR) formula Mouth aspect ratio measurements Face symmetry analysis OpenCV Integration: Drawing bounding boxes and landmarks Color-coded feature highlighting Real-time annotation overlays Video capture and processing Practical Applications This technology enables: 👁️ Gaze tracking for UI/UX studies 🎮 Head-controlled game interfaces 📸 Auto-focus camera systems 🎭 Expression analysis for feedback 🥽 AR/VR avatar control 📊 Attention analytics for presentations ♿ Accessibility features for disabled users Performance Metrics Detection Accuracy: 95%+ for frontal faces Landmark Precision: ±2-3 pixels Processing Latency: 200-500ms per API call Frame Rate: 30 fps with caching Further Exploration Advanced Topics to Explore: Face Recognition - Identify individuals Age/Gender Detection - Demographic analysis Emotion Detection - Facial expression classification Face Verification - 1:1 identity confirmation Similar Face Search - 1:N face matching Face Grouping - Cluster similar faces Call to Action 📌 Explore these resources to get started: Official Documentation Azure Face API Documentation Face API REST Reference Azure Face SDK for .NET Related Libraries OpenCVSharp - OpenCV wrapper for .NET System.Drawing - .NET image processing Source Code GitHub Repository: ravimodi_microsoft/SmartDriver Sample Code: Included in this articleUpcoming webinar: Maximize the Cost Efficiency of AI Agents on Azure
AI agents are quickly becoming central to how organizations automate work, engage customers, and unlock new insights. But as adoption accelerates, so do questions about cost, ROI, and long-term sustainability. That’s exactly what the Maximize the Cost Efficiency of AI Agents on Azure webinar is designed to address. The webinar will provide practical guidance on building and scaling AI agents on Azure with financial discipline in mind. Rather than focusing only on technology, the session helps learners connect AI design decisions to real business outcomes—covering everything from identifying high-impact use cases and understanding cost drivers to forecasting ROI. Whether you’re just starting your AI journey or expanding AI agents across the enterprise, the session will equip you with strategies to make informed, cost-conscious decisions at every stage—from architecture and model selection to ongoing optimization and governance. Who should attend? If you are in one of these roles and are a decision maker or can influence decision makers in AI decisions or need to show ROI metrics on AI, this session is for you. Developer Administrator Solution Architect AI Engineer Business Analyst Business User Technology Manager Why attending the webinar? In the webinar, you’ll hear how to translate theory into real-world scenarios, walk through common cost pitfalls, and show how organizations are applying these principles in practice. Most importantly, the webinar helps you connect the dots faster, turning what you’ve learned into actionable insights you can apply immediately, ask questions live, and gain clarity on how to maximize ROI while scaling AI responsibly. If you care about building AI agents that are not only innovative but also efficient, governable, and financially sustainable, this training—and this webinar that complements it—are well worth your time. Register for the free webinar today for the event on March 5, 2026, 8:00 AM - 9:00 AM (UTC-08:00) Pacific Time (US & Canada). Who will speak at the webinar? Your speakers will be: Carlotta Castelluccio: Carlotta is a Senior AI Advocate with the mission of helping every developer to succeed with AI, by building innovative solutions responsibly. To achieve this goal, she develops technical content, and she hosts skilling sessions, enabling her audience to take the most out of AI technologies and to have an impact on Microsoft AI products’ roadmap. Nitya Narasimhan: Nitya is a PhD and Polyglot with 25+ years of software research & development experience spanning mobile, web, cloud and AI. She is an innovator (12+ patents), a visual storyteller (@sketchtedocs), and an experienced community builder in the Greater New York area. As a senior AI Advocate on the Core AI Developer Relations team, she acts as "developer 0" for the Microsoft Foundry platform, providing product feedback and empowering AI developers to build trustworthy AI solutions with code samples, open-source curricula and content-initiatives like Model Mondays. Prior to joining Microsoft, she spent a decade in Motorola Labs working on ubiquitous & mobile computing research, founded Google Developer Groups in New York, and consulted for startups building real-time experiences for enterprise. Her current interests span Model understanding & customization, E2E Observability & Safety, and agentic AI workflows for maintainable software. Moderator Lee Stott is a Principal Cloud Advocate at Microsoft, working in the Core AI Developer Relations Team. He helps developers and organizations build responsibly with AI and cloud technologies through open-source projects, technical guidance, and global developer programs. Based in the UK, Lee brings deep hands-on experience across AI, Azure, and developer tooling. .Agents League: Join the Reasoning Agents Track
In a previous blog post, we introduced Agents League, a two‑week AI agent challenge running February 16–27, and gave an overview of the three available tracks. In this post, we’ll zoom in on one of them in particular:🧠 The Reasoning Agents track, built on Microsoft Foundry. If you’re interested in multi‑step reasoning, planning, verification, and multi‑agent collaboration, this is the track designed for you. What Do We Mean by “Reasoning Agents”? Reasoning agents go beyond simple prompt–response interactions. They are agents that can: Plan how to approach a task Break problems into steps Reason across intermediate results Verify or critique their own outputs Collaborate with other agents to solve more complex problems With Microsoft Foundry (via UI or SDK) and/or the Microsoft Agent Framework, you can design agent systems that reflect real‑world decision‑making patterns—closer to how teams of humans work together. Why Build Reasoning Agents on Microsoft Foundry? Microsoft Foundry provides production‑ready building blocks for agentic systems, without locking you into a single way of working. For the Reasoning Agents track, Foundry enables you to: Define agent roles (planner, executor, verifier, critic, etc.) Orchestrate multi‑agent workflows Integrate tools, APIs, and MCP servers Apply structured reasoning patterns Observe and debug agent behavior as it runs You can work visually in the Foundry UI, programmatically via the SDK, or mix both approaches depending on your project. How to get started? Your first step to enter the arena is registering to the Agents League challenge: https://aka.ms/agentsleague/register. After you registered, navigate to the Reasoning Agent Starter Kit, to get more context about the challenge scenario, an example of multi-agent architecture to address it, along with some guidelines on the tech stack to use and useful resources to get started. There’s no single “correct” project, feel free to unleash your creativity and leverage AI-assisted development tools to accelerate your build process (e.g. GitHub Copilot). 👉 View the Reasoning Agents starter kit: https://github.com/microsoft/agentsleague/starter-kits Live Coding Battle: Reasoning Agents 📽️ Wednesday, Feb 18 – 9:00 AM PT During Week 1, we’re hosting a live coding battle dedicated entirely to the Reasoning Agents track. You’ll watch experienced developers from the community: Design agent architectures live Explain reasoning strategies and trade‑offs Make real‑time decisions about agent roles, tools, and flows The session is streamed on Microsoft Reactor and recorded, so you can watch it live (highly recommended for the best experience!) or later at your convenience. AMA Session on Discord 💬 Wednesday, Feb 25 – 9:00 AM PT In Week 2, it’s your turn to build—and ask questions. Join the Reasoning Agents AMA on Discord to: Ask about agent architecture and reasoning patterns Get clarification on Foundry capabilities Discuss MCP integration and multi‑agent design Get unstuck when your agent doesn’t behave as expected Prizes, Badges, and Recognition 🏆 $500 for the Reasoning Agents track winner 🎖️ Digital badge for everyone who registers and submits a project Important reminder: 👉 You must register before submitting to be eligible for prizes and the badge. Beyond the rewards, every participant receives feedback from Microsoft product teams, which is often the most valuable prize of all. Ready to Build Agents That Reason? If you’ve been curious about: Agentic architectures Multi‑step reasoning Verification and self‑reflection Building AI systems that explain their thinking …then the Reasoning Agents track is your arena. 📝 Register here: https://aka.ms/agentsleague/register 💬 Join Discord: https://aka.ms/agentsleague/discord 📽️ Watch live battles: https://aka.ms/agentsleague/battles The league starts February 16. The reasoning begins now.Building Interactive Agent UIs with AG-UI and Microsoft Agent Framework
Introduction Picture this: You've built an AI agent that analyzes financial data. A user uploads a quarterly report and asks: "What are the top three expense categories?" Behind the scenes, your agent parses the spreadsheet, aggregates thousands of rows, and generates visualizations. All in 20 seconds. But the user? They see a loading spinner. Nothing else. No "reading file" message, no "analyzing data" indicator, no hint that progress is being made. They start wondering: Is it frozen? Should I refresh? The problem isn't the agent's capabilities - it's the communication gap between the agent running on the backend and the user interface. When agents perform multi-step reasoning, call external APIs, or execute complex tool chains, users deserve to see what's happening. They need streaming updates, intermediate results, and transparent progress indicators. Yet most agent frameworks force developers to choose between simple request/response patterns or building custom solutions to stream updates to their UIs. This is where AG-UI comes in. AG-UI is a fairly new event-based protocol that standardizes how agents communicate with user interfaces. Instead of every framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of structured events that work consistently across different agent implementations. When an agent starts processing, calls a tool, generates text, or encounters an error, the UI receives explicit, typed events in real time. The beauty of AG-UI is its framework-agnostic design. While this blog post demonstrates integration with Microsoft Agent Framework (MAF), the same AG-UI protocol works with LangGraph, CrewAI, or any other compliant framework. Write your UI code once, and it works with any AG-UI-compliant backend. (Note: MAF supports both Python and .NET - this blog post focuses on the Python implementation.) TL;DR The Problem: Users don't get real-time updates while AI agents work behind the scenes - no progress indicators, no transparency into tool calls, and no insight into what's happening. The Solution: AG-UI is an open, event-based protocol that standardizes real-time communication between AI agents and user interfaces. Instead of each development team and framework inventing custom streaming solutions, AG-UI provides a shared vocabulary of structured events (like TOOL_CALL_START, TEXT_MESSAGE_CONTENT, RUN_FINISHED) that work across any compliant framework. Key Benefits: Framework-agnostic - Write UI code once, works with LangGraph, Microsoft Agent Framework, CrewAI, and more Real-time observability - See exactly what your agent is doing as it happens Server-Sent Events - Built on standard HTTP for universal compatibility Protocol-managed state - No manual conversation history tracking In This Post: You'll learn why AG-UI exists, how it works, and build a complete working application using Microsoft Agent Framework with Python - from server setup to client implementation. What You'll Learn This blog post walks through: Why AG-UI exists - how agent-UI communication has evolved and what problems current approaches couldn't solve How the protocol works - the key design choices that make AG-UI simple, reliable, and framework-agnostic Protocol architecture - the generic components and how AG-UI integrates with agent frameworks Building an AG-UI application - a complete working example using Microsoft Agent Framework with server, client, and step-by-step setup Understanding events - what happens under the hood when your agent runs and how to observe it Thinking in events - how building with AG-UI differs from traditional APIs, and what benefits this brings Making the right choice - when AG-UI is the right fit for your project and when alternatives might be better Estimated reading time: 15 minutes Who this is for: Developers building AI agents who want to provide real-time feedback to users, and teams evaluating standardized approaches to agent-UI communication To appreciate why AG-UI matters, we need to understand the journey that led to its creation. Let's trace how agent-UI communication has evolved through three distinct phases. The Evolution of Agent-UI Communication AI agents have become more capable over time. As they evolved, the way they communicated with user interfaces had to evolve as well. Here's how this evolution unfolded. Phase 1: Simple Request/Response In the early days of AI agent development, the interaction model was straightforward: send a question, wait for an answer, display the result. This synchronous approach mirrored traditional API calls and worked fine for simple scenarios. # Simple, but limiting response = agent.run("What's the weather in Paris?") display(response) # User waits... and waits... Works for: Quick queries that complete in seconds, simple Q&A interactions where immediate feedback and interactivity aren't critical. Breaks down: When agents need to call multiple tools, perform multi-step reasoning, or process complex queries that take 30+ seconds. Users see nothing but a loading spinner, with no insight into what's happening or whether the agent is making progress. This creates a poor user experience and makes it impossible to show intermediate results or allow user intervention. Recognizing these limitations, development teams began experimenting with more sophisticated approaches. Phase 2: Custom Streaming Solutions As agents became more sophisticated, teams recognized the need for incremental feedback and interactivity. Rather than waiting for the complete response, they implemented custom streaming solutions to show partial results as they became available. # Every team invents their own format for chunk in agent.stream("What's the weather?"): display(chunk) # But what about tool calls? Errors? Progress? This was a step forward for building interactive agent UIs, but each team solved the problem differently. Also, different frameworks had incompatible approaches - some streamed only text tokens, others sent structured JSON, and most provided no visibility into critical events like tool calls or errors. The problem: No standardization across frameworks - client code that works with LangGraph won't work with Crew AI, requiring separate implementations for each agent backend Each implementation handles tool calls differently - some send nothing during tool execution, others send unstructured messages Complex state management - clients must track conversation history, manage reconnections, and handle edge cases manually The industry needed a better solution - a common protocol that could work across all frameworks while maintaining the benefits of streaming. Phase 3: Standardized Protocol (AG-UI) AG-UI emerged as a response to the fragmentation problem. Instead of each framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of events that work consistently across different agent implementations. # Standardized events everyone understands async for event in agent.run_stream("What's the weather?"): if event.type == "TEXT_MESSAGE_CONTENT": display_text(event.delta) elif event.type == "TOOL_CALL_START": show_tool_indicator(event.tool_name) elif event.type == "TOOL_CALL_RESULT": show_tool_result(event.result) The key difference is structured observability. Rather than guessing what the agent is doing from unstructured text, clients receive explicit events for every stage of execution: when the agent starts, when it generates text, when it calls a tool, when that tool completes, and when the entire run finishes. What's different: A standardized vocabulary of event types, complete observability into agent execution, and framework-agnostic clients that work with any AG-UI-compliant backend. You write your UI code once, and it works whether the backend uses Microsoft Agent Framework, LangGraph, or any other framework that speaks AG-UI. Now that we've seen why AG-UI emerged and what problems it solves, let's examine the specific design decisions that make the protocol work. These choices weren't arbitrary - each one addresses concrete challenges in building reliable, observable agent-UI communication. The Design Decisions Behind AG-UI Why Server-Sent Events (SSE)? Aspect WebSockets SSE (AG-UI) Complexity Bidirectional Unidirectional (simpler) Firewall/Proxy Sometimes blocked Standard HTTP Reconnection Manual implementation Built-in browser support Use case Real-time games, chat Agent responses (one-way) For agent interactions, you typically only need server→client communication, making SSE a simpler choice. SSE solves the transport problem - how events travel from server to client. But once connected, how does the protocol handle conversation state across multiple interactions? Why Protocol-Managed Threads? # Without protocol threads (client manages): conversation_history = [] conversation_history.append({"role": "user", "content": message}) response = agent.complete(conversation_history) conversation_history.append({"role": "assistant", "content": response}) # Complex, error-prone, doesn't work with multiple clients # With AG-UI (protocol manages): thread = agent.get_new_thread() # Server creates and manages thread agent.run_stream(message, thread=thread) # Server maintains context # Simple, reliable, shareable across clients With transport and state management handled, the final piece is the actual messages flowing through the connection. What information should the protocol communicate, and how should it be structured? Why Standardized Event Types? Instead of parsing unstructured text, clients get typed events: RUN_STARTED - Agent begins (start loading UI) TEXT_MESSAGE_CONTENT - Text chunk (stream to user) TOOL_CALL_START - Tool invoked (show "searching...", "calculating...") TOOL_CALL_RESULT - Tool finished (show result, update UI) RUN_FINISHED - Complete (hide loading) This lets UIs react intelligently without custom parsing logic. Now that we understand the protocol's design choices, let's see how these pieces fit together in a complete system. Architecture Overview Here's how the components interact: The communication between these layers relies on a well-defined set of event types. Here are the core events that flow through the SSE connection: Core Event Types AG-UI provides a standardized set of event types to describe what's happening during an agent's execution: RUN_STARTED - agent begins execution TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END - streaming segments of text TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END, TOOL_CALL_RESULT - tool execution events RUN_FINISHED - agent has finished execution RUN_ERROR - error information This model lets the UI update as the agent runs, rather than waiting for the final response. The generic architecture above applies to any AG-UI implementation. Now let's see how this translates to Microsoft Agent Framework. AG-UI with Microsoft Agent Framework While AG-UI is framework-agnostic, this blog post demonstrates integration with Microsoft Agent Framework (MAF) using Python. MAF is available in both Python and .NET, giving you flexibility to build AG-UI applications in your preferred language. Understanding how MAF implements the protocol will help you build your own applications or work with other compliant frameworks. Integration Architecture The Microsoft Agent Framework integration involves several specialized layers that handle protocol translation and execution orchestration: Understanding each layer: FastAPI Endpoint - Handles HTTP requests and establishes SSE connections for streaming AgentFrameworkAgent - Protocol wrapper that translates between AG-UI events and Agent Framework operations Orchestrators - Manage execution flow, coordinate tool calling sequences, and handle state transitions ChatAgent - Your agent implementation with instructions, tools, and business logic ChatClient - Interface to the underlying language model (Azure OpenAI, OpenAI, or other providers) The good news? When you call add_agent_framework_fastapi_endpoint, all the middleware layers are configured automatically. You simply provide your ChatAgent, and the integration handles protocol translation, event streaming, and state management behind the scenes. Now that we understand both the protocol architecture and the Microsoft Agent Framework integration, let's build a working application. Hands-On: Building Your First AG-UI Application This section demonstrates how to build an AG-UI server and client using Microsoft Agent Framework and FastAPI. Prerequisites Before building your first AG-UI application, ensure you have: Python 3.10 or later installed Basic understanding of async/await patterns in Python Azure CLI installed and authenticated (az login) Azure OpenAI service endpoint and deployment configured (setup guide) Cognitive Services OpenAI Contributor role for your Azure OpenAI resource You'll also need to install the AG-UI integration package: pip install agent-framework-ag-ui --pre This automatically installs agent-framework-core, fastapi, and uvicorn as dependencies. With your environment configured, let's create the server that will host your agent and expose it via the AG-UI protocol. Building the Server Let's create a FastAPI server that hosts an AI agent and exposes it via AG-UI: # server.py import os from typing import Annotated from dotenv import load_dotenv from fastapi import FastAPI from pydantic import Field from agent_framework import ChatAgent, ai_function from agent_framework.azure import AzureOpenAIChatClient from agent_framework_ag_ui import add_agent_framework_fastapi_endpoint from azure.identity import DefaultAzureCredential # Load environment variables from .env file load_dotenv() # Validate environment configuration openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") model_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME") if not openai_endpoint: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_ENDPOINT") if not model_deployment: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_DEPLOYMENT_NAME") # Define tools the agent can use @ai_function def get_order_status( order_id: Annotated[str, Field(description="The order ID to look up (e.g., ORD-001)")] ) -> dict: """Look up the status of a customer order. Returns order status, tracking number, and estimated delivery date. """ # Simulated order lookup orders = { "ORD-001": {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}, "ORD-002": {"status": "processing", "tracking": None, "eta": "Jan 23, 2026"}, "ORD-003": {"status": "delivered", "tracking": "1Z999AA3", "eta": "Delivered Jan 20"}, } return orders.get(order_id, {"status": "not_found", "message": "Order not found"}) # Initialize Azure OpenAI client chat_client = AzureOpenAIChatClient( credential=DefaultAzureCredential(), endpoint=openai_endpoint, deployment_name=model_deployment, ) # Configure the agent with custom instructions and tools agent = ChatAgent( name="CustomerSupportAgent", instructions="""You are a helpful customer support assistant. You have access to a get_order_status tool that can look up order information. IMPORTANT: When a user mentions an order ID (like ORD-001, ORD-002, etc.), you MUST call the get_order_status tool to retrieve the actual order details. Do NOT make up or guess order information. After calling get_order_status, provide the actual results to the user in a friendly format.""", chat_client=chat_client, tools=[get_order_status], ) # Initialize FastAPI application app = FastAPI( title="AG-UI Customer Support Server", description="Interactive AI agent server using AG-UI protocol with tool calling" ) # Mount the AG-UI endpoint add_agent_framework_fastapi_endpoint(app, agent, path="/chat") def main(): """Entry point for the AG-UI server.""" import uvicorn print("Starting AG-UI server on http://localhost:8000") uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info") # Run the application if __name__ == "__main__": main() What's happening here: We define a tool: get_order_status with the AI_function decorator Use Annotated and Field for parameter descriptions to help the agent understand when and how to use the tool We create an Azure OpenAI chat client with credential authentication The ChatAgent is configured with domain-specific instructions and the tools parameter add_agent_framework_fastapi_endpoint automatically handles SSE streaming and tool execution The server exposes the agent at the /chat endpoint Note: This example uses Azure OpenAI, but AG-UI works with any chat model. You can also integrate with Azure AI Foundry's model catalog or use other LLM providers. Tool calling is supported by most modern LLMs including GPT-4, GPT-4o, and Claude models. To run this server: # Set your Azure OpenAI credentials export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o" # Start the server python server.py With your server running and exposing the AG-UI endpoint, the next step is building a client that can connect and consume the event stream. Streaming Results to Clients With the server running, clients can connect and stream events as the agent processes requests. Here's a Python client that demonstrates the streaming capabilities: # client.py import asyncio import os from dotenv import load_dotenv from agent_framework import ChatAgent, FunctionCallContent, FunctionResultContent from agent_framework_ag_ui import AGUIChatClient # Load environment variables from .env file load_dotenv() async def interactive_chat(): """Interactive chat session with streaming responses.""" # Connect to the AG-UI server base_url = os.getenv("AGUI_SERVER_URL", "http://localhost:8000/chat") print(f"Connecting to: {base_url}\n") # Initialize the AG-UI client client = AGUIChatClient(endpoint=base_url) # Create a local agent representation agent = ChatAgent(chat_client=client) # Start a new conversation thread conversation_thread = agent.get_new_thread() print("Chat started! Type 'exit' or 'quit' to end the session.\n") try: while True: # Collect user input user_message = input("You: ") # Handle empty input if not user_message.strip(): print("Please enter a message.\n") continue # Check for exit commands if user_message.lower() in ["exit", "quit", "bye"]: print("\nGoodbye!") break # Stream the agent's response print("Agent: ", end="", flush=True) # Track tool calls to avoid duplicate prints seen_tools = set() async for update in agent.run_stream(user_message, thread=conversation_thread): # Display text content if update.text: print(update.text, end="", flush=True) # Display tool calls and results for content in update.contents: if isinstance(content, FunctionCallContent): # Only print each tool call once if content.call_id not in seen_tools: seen_tools.add(content.call_id) print(f"\n[Calling tool: {content.name}]", flush=True) elif isinstance(content, FunctionResultContent): # Only print each result once result_id = f"result_{content.call_id}" if result_id not in seen_tools: seen_tools.add(result_id) result_text = content.result if isinstance(content.result, str) else str(content.result) print(f"[Tool result: {result_text}]", flush=True) print("\n") # New line after response completes except KeyboardInterrupt: print("\n\nChat interrupted by user.") except ConnectionError as e: print(f"\nConnection error: {e}") print("Make sure the server is running.") except Exception as e: print(f"\nUnexpected error: {e}") def main(): """Entry point for the AG-UI client.""" asyncio.run(interactive_chat()) if __name__ == "__main__": main() Key features: The client connects to the AG-UI endpoint using AGUIChatClient with the endpoint parameter run_stream() yields updates containing text and content as they arrive Tool calls are detected using FunctionCallContent and displayed with [Calling tool: ...] Tool results are detected using FunctionResultContent and displayed with [Tool result: ...] Deduplication logic (seen_tools set) prevents printing the same tool call multiple times as it streams Thread management maintains conversation context across messages Graceful error handling for connection issues To use the client: # Optional: specify custom server URL export AGUI_SERVER_URL="http://localhost:8000/chat" # Start the interactive chat python client.py Example Session: Connecting to: http://localhost:8000/chat Chat started! Type 'exit' or 'quit' to end the session. You: What's the status of order ORD-001? Agent: [Calling tool: get_order_status] [Tool result: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}] Your order ORD-001 has been shipped! - Tracking Number: 1Z999AA1 - Estimated Delivery Date: January 25, 2026 You can use the tracking number to monitor the delivery progress. You: Can you check ORD-002? Agent: [Calling tool: get_order_status] [Tool result: {"status": "processing", "tracking": null, "eta": "Jan 23, 2026"}] Your order ORD-002 is currently being processed. - Status: Processing - Estimated Delivery: January 23, 2026 Your order should ship soon, and you'll receive a tracking number once it's on the way. You: exit Goodbye! The client we just built handles events at a high level, abstracting away the details. But what's actually flowing through that SSE connection? Let's peek under the hood. Event Types You'll See As the server streams back responses, clients receive a series of structured events. If you were to observe the raw SSE stream (e.g., using curl), you'd see events like: curl -N http://localhost:8000/chat \ -H "Content-Type: application/json" \ -H "Accept: text/event-stream" \ -d '{"messages": [{"role": "user", "content": "What'\''s the status of order ORD-001?"}]}' Sample event stream (with tool calling): data: {"type":"RUN_STARTED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} data: {"type":"TEXT_MESSAGE_START","messageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c","role":"assistant"} data: {"type":"TOOL_CALL_START","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","toolCallName":"get_order_status","parentMessageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"{\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"order"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"_id"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\":\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"ORD"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"-"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"001"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\"}"} data: {"type":"TOOL_CALL_END","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y"} data: {"type":"TOOL_CALL_RESULT","messageId":"f048cb0a-a049-4a51-9403-a05e4820438a","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","content":"{\"status\": \"shipped\", \"tracking\": \"1Z999AA1\", \"eta\": \"Jan 25, 2026\"}","role":"tool"} data: {"type":"TEXT_MESSAGE_START","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","role":"assistant"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"Your"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" order"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" ORD"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"-"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"001"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" has"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" been"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" shipped"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"!"} ... (additional TEXT_MESSAGE_CONTENT events streaming the response) ... data: {"type":"TEXT_MESSAGE_END","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf"} data: {"type":"RUN_FINISHED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} Understanding the flow: RUN_STARTED - Agent begins processing the request TEXT_MESSAGE_START - First message starts (will contain tool calls) TOOL_CALL_START - Agent invokes the get_order_status tool Multiple TOOL_CALL_ARGS events - Arguments stream incrementally as JSON chunks ({"order_id":"ORD-001"}) TOOL_CALL_END - Tool invocation structure complete TOOL_CALL_RESULT - Tool execution finished with result data TEXT_MESSAGE_START - Second message starts (the final response) Multiple TEXT_MESSAGE_CONTENT events - Response text streams word-by-word TEXT_MESSAGE_END - Response message complete RUN_FINISHED - Entire run completed successfully This granular event model enables rich UI experiences - showing tool execution indicators ("Searching...", "Calculating..."), displaying intermediate results, and providing complete transparency into the agent's reasoning process. Seeing the raw events helps, but truly working with AG-UI requires a shift in how you think about agent interactions. Let's explore this conceptual change. The Mental Model Shift Traditional API Thinking # Imperative: Call and wait response = agent.run("What's 2+2?") print(response) # "The answer is 4" Mental model: Function call with return value AG-UI Thinking # Reactive: Subscribe to events async for event in agent.run_stream("What's 2+2?"): match event.type: case "RUN_STARTED": show_loading() case "TEXT_MESSAGE_CONTENT": display_chunk(event.delta) case "RUN_FINISHED": hide_loading() Mental model: Observable stream of events This shift feels similar to: Moving from synchronous to async code Moving from REST to event-driven architecture Moving from polling to pub/sub This mental shift isn't just philosophical - it unlocks concrete benefits that weren't possible with request/response patterns. What You Gain Observability # You can SEE what the agent is doing TOOL_CALL_START: "get_order_status" TOOL_CALL_ARGS: {"order_id": "ORD-001"} TOOL_CALL_RESULT: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"} TEXT_MESSAGE_START: "Your order ORD-001 has been shipped..." Interruptibility # Future: Cancel long-running operations async for event in agent.run_stream(query): if user_clicked_cancel: await agent.cancel(thread_id, run_id) break Transparency # Users see the reasoning process "Looking up order ORD-001..." "Order found: Status is 'shipped'" "Retrieving tracking information..." "Your order has been shipped with tracking number 1Z999AA1..." To put these benefits in context, here's how AG-UI compares to traditional approaches across key dimensions: AG-UI vs. Traditional Approaches Aspect Traditional REST Custom Streaming AG-UI Connection Model Request/Response Varies Server-Sent Events State Management Manual Manual Protocol-managed Tool Calling Invisible Custom format Standardized events Framework Varies Framework-locked Framework-agnostic Browser Support Universal Varies Universal Implementation Simple Complex Moderate Ecosystem N/A Isolated Growing You've now seen AG-UI's design principles, implementation details, and conceptual foundations. But the most important question remains: should you actually use it? Conclusion: Is AG-UI Right for Your Project? AG-UI represents a shift toward standardized, observable agent interactions. Before adopting it, understand where the protocol stands and whether it fits your needs. Protocol Maturity The protocol is stable enough for production use but still evolving: Ready now: Core specification stable, Microsoft Agent Framework integration available, FastAPI/Python implementation mature, basic streaming and threading work reliably. Choose AG-UI If You Building new agent projects - No legacy API to maintain, want future compatibility with emerging ecosystem Need streaming observability - Multi-step workflows where users benefit from seeing each stage of execution Want framework flexibility - Same client code works with any AG-UI-compliant backend Comfortable with evolving standards - Can adapt to protocol changes as it matures Stick with Alternatives If You Have working solutions - Custom streaming working well, migration cost not justified Need guaranteed stability - Mission-critical systems where breaking changes are unacceptable Build simple agents - Single-step request/response without tool calling or streaming needs Risk-averse environment - Large existing implementations where proven approaches are required Beyond individual project decisions, it's worth considering AG-UI's role in the broader ecosystem. The Bigger Picture While this blog post focused on Microsoft Agent Framework, AG-UI's true power lies in its broader mission: creating a common language for agent-UI communication across the entire ecosystem. As more frameworks adopt it, the real value emerges: write your UI once, work with any compliant agent framework. Think of it like GraphQL for APIs or OpenAPI for REST - a standardization layer that benefits the entire ecosystem. The protocol is young, but the problem it solves is real. Whether you adopt it now or wait for broader adoption, understanding AG-UI helps you make informed architectural decisions for your agent applications. Ready to dive deeper? Here are the official resources to continue your AG-UI journey. Resources AG-UI & Microsoft Agent Framework Getting Started with AG-UI (Microsoft Learn) - Official tutorial AG-UI Integration Overview - Architecture and concepts AG-UI Protocol Specification - Official protocol documentation Backend Tool Rendering - Adding function tools Security Considerations - Production security guidance Microsoft Agent Framework Documentation - Framework overview AG-UI Dojo Examples - Live demonstrations UI Components & Integration CopilotKit for Microsoft Agent Framework - React component library Community & Support Microsoft Q&A - Community support Agent Framework GitHub - Source code and issues Related Technologies Azure AI Foundry Documentation - Azure AI platform FastAPI Documentation - Web framework Server-Sent Events (SSE) Specification - Protocol standard This blog post introduces AG-UI with Microsoft Agent Framework, focusing on fundamental concepts and building your first interactive agent application.Complete Guide to Deploying OpenClaw on Azure Windows 11 Virtual Machine
1. Introduction to OpenClaw OpenClaw is an open-source AI personal assistant platform that runs on your own devices and executes real-world tasks. Unlike traditional cloud-based AI assistants, OpenClaw emphasizes local deployment and privacy protection, giving you complete control over your data. Key Features of OpenClaw Cross-Platform Support: Runs on Windows, macOS, Linux, and other operating systems Multi-Channel Integration: Interact with AI through messaging platforms like WhatsApp, Telegram, and Discord Task Automation: Execute file operations, browser control, system commands, and more Persistent Memory: AI remembers your preferences and contextual information Flexible AI Backends: Supports multiple large language models including Anthropic Claude and OpenAI GPT OpenClaw is built on Node.js and can be quickly installed and deployed via npm. 2. Security Advantages of Running OpenClaw on Azure VM Deploying OpenClaw on an Azure virtual machine instead of your personal computer offers significant security benefits: 1. Environment Isolation Azure VMs provide a completely isolated runtime environment. Even if the AI agent exhibits abnormal behavior or is maliciously exploited, it won't affect your personal computer or local data. This isolation mechanism forms the foundation of a zero-trust security architecture. 2. Network Security Controls Through Azure Network Security Groups (NSGs), you can precisely control which IP addresses can access your virtual machine. The RDP rules configured in the deployment script allow you to securely connect to your Windows 11 VM via Remote Desktop while enabling further restrictions on access sources. 3. Data Persistence and Backup Azure VM managed disks support automatic snapshots and backups. Even if the virtual machine encounters issues, your OpenClaw configuration and data remain safe. 4. Elastic Resource Management You can adjust VM specifications (memory, CPU) at any time based on actual needs, or stop the VM when not in use to save costs, maintaining maximum flexibility. 5. Enterprise-Grade Authentication Azure supports integration with Azure Active Directory (Entra ID) for identity verification, allowing you to assign different access permissions to team members for granular access control. 6. Audit and Compliance Azure provides detailed activity logs and audit trails, making it easy to trace any suspicious activity and meet enterprise compliance requirements. 3. Deployment Steps Explained This deployment script uses Azure CLI to automate the installation of OpenClaw and its dependencies on a Windows 11 virtual machine. Here are the detailed execution steps: Prerequisites Before running the script, ensure you have: Install Azure CLI # Windows users can download the MSI installer https://aka.ms/installazurecliwindows # macOS users brew install azure-cli # Linux users curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash 2. Log in to Azure Account az login 3. Prepare Deployment Script Save the provided deploy-windows11-vm.sh script locally and grant execute permissions: chmod +x deploy-windows11-vm.sh Step 1: Configure Deployment Parameters The script begins by defining key configuration variables that you can modify as needed: RESOURCE_GROUP="Your Azure Resource Group Name" # Resource group name VM_NAME="win11-openclaw-vm" # Virtual machine name LOCATION="Your Azure Regison Name" # Azure region ADMIN_USERNAME="Your Azure VM Administrator Name" # Administrator username ADMIN_PASSWORD="our Azure VM Administrator Password" # Administrator password (change to a strong password) VM_SIZE="Your Azure VM Size" # VM size (4GB memory) Security Recommendations: Always change ADMIN_PASSWORD to your own strong password Passwords should contain uppercase and lowercase letters, numbers, and special characters Never commit scripts containing real passwords to code repositories Step 2: Check and Create Resource Group The script first checks if the specified resource group exists, and creates it automatically if it doesn't: echo "Checking resource group $RESOURCE_GROUP..." az group show --name $RESOURCE_GROUP &> /dev/null if [ $? -ne 0 ]; then echo "Creating resource group $RESOURCE_GROUP..." az group create --name $RESOURCE_GROUP --location $LOCATION fi A resource group is a logical container in Azure used to organize and manage related resources. All associated resources (VMs, networks, storage, etc.) will be created within this resource group. Step 3: Create Windows 11 Virtual Machine This is the core step, using the az vm create command to create a Windows 11 Pro virtual machine: az vm create \ --resource-group $RESOURCE_GROUP \ --name $VM_NAME \ --image MicrosoftWindowsDesktop:windows-11:win11-24h2-pro:latest \ --size $VM_SIZE \ --admin-username $ADMIN_USERNAME \ --admin-password $ADMIN_PASSWORD \ --public-ip-sku Standard \ --nsg-rule RDP Parameter Explanations: --image: Uses the latest Windows 11 24H2 Professional edition image --size: Standard_B2s provides 2 vCPUs and 4GB memory, suitable for running OpenClaw --public-ip-sku Standard: Assigns a standard public IP --nsg-rule RDP: Automatically creates network security group rules allowing RDP (port 3389) inbound traffic Step 4: Retrieve Virtual Machine Public IP After VM creation completes, the script retrieves its public IP address: PUBLIC_IP=$(az vm show -d -g $RESOURCE_GROUP -n $VM_NAME --query publicIps -o tsv) echo "VM Public IP: $PUBLIC_IP" This IP address will be used for subsequent RDP remote connections. Step 5: Install Chocolatey Package Manager Using az vm run-command to execute PowerShell scripts inside the VM, first installing Chocolatey: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString( 'https://community.chocolatey.org/install.ps1'))" Chocolatey is a package manager for Windows, similar to apt or yum on Linux, simplifying subsequent software installations. Step 6: Install Git Git is a dependency for many npm packages, especially those that need to download source code from GitHub for compilation: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "C:\ProgramData\chocolatey\bin\choco.exe install git -y" Step 7: Install CMake and Visual Studio Build Tools Some of OpenClaw's native modules require compilation, necessitating the installation of C++ build toolchain: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "C:\ProgramData\chocolatey\bin\choco.exe install cmake visualstudio2022buildtools visualstudio2022-workload-vctools -y" Component Descriptions: cmake: Cross-platform build system visualstudio2022buildtools: VS 2022 Build Tools visualstudio2022-workload-vctools: C++ development toolchain Step 8: Install Node.js LTS Install the Node.js Long Term Support version, which is the core runtime environment for OpenClaw: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); C:\ProgramData\chocolatey\bin\choco.exe install nodejs-lts -y" The script refreshes environment variables first to ensure Chocolatey is in the PATH, then installs Node.js LTS. Step 9: Globally Install OpenClaw Use npm to globally install OpenClaw: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); npm install -g openclaw" Global installation makes the openclaw command available from anywhere in the system. Step 10: Configure Environment Variables Add Node.js and npm global paths to the system PATH environment variable: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts " $npmGlobalPath = 'C:\Program Files\nodejs'; $npmUserPath = [System.Environment]::GetFolderPath('ApplicationData') + '\npm'; $currentPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine'); if ($currentPath -notlike \"*$npmGlobalPath*\") { $newPath = $currentPath + ';' + $npmGlobalPath; [System.Environment]::SetEnvironmentVariable('Path', $newPath, 'Machine'); Write-Host 'Added Node.js path to system PATH'; } if ($currentPath -notlike \"*$npmUserPath*\") { $newPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine') + ';' + $npmUserPath; [System.Environment]::SetEnvironmentVariable('Path', $newPath, 'Machine'); Write-Host 'Added npm global path to system PATH'; } Write-Host 'Environment variables updated successfully!'; " This ensures that node, npm, and openclaw commands can be used directly even in new terminal sessions. Step 11: Verify Installation The script finally verifies that all software is correctly installed: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); Write-Host 'Node.js version:'; node --version; Write-Host 'npm version:'; npm --version; Write-Host 'openclaw:'; npm list -g openclaw" Successful output should look similar to: Node.js version: v20.x.x npm version: 10.x.x openclaw: openclaw@x.x.x Step 12: Connect to Virtual Machine After deployment completes, the script outputs connection information: ============================================ Deployment completed! ============================================ Resource Group: Your Azure Resource Group Name VM Name: win11-openclaw-vm Public IP: xx.xx.xx.xx Admin Username: Your Administrator UserName VM Size: Your VM Size Connect via RDP: mstsc /v:xx.xx.xx.xx ============================================ Connection Methods: Windows Users: Press Win + R to open Run dialog Enter mstsc /v:public_ip and press Enter Log in using the username and password set in the script macOS Users: Download "Windows App" from the App Store Add PC connection with the public IP Log in using the username and password set in the script Linux Users: # Use Remmina or xfreerdp xfreerdp /u:username /v:public_ip Step 13: Initialize OpenClaw After connecting to the VM, run the following in PowerShell or Command Prompt # Initialize OpenClaw openclaw onboard # Configure AI model API key # Edit configuration file: C:\Users\username\.openclaw\openclaw.json notepad $env:USERPROFILE\.openclaw\openclaw.json Add your AI API key in the configuration file: { "agents": { "defaults": { "model": "Your Model Name", "apiKey": "your-api-key-here" } } } Step 14: Start OpenClaw # Start Gateway service openclaw gateway # In another terminal, connect messaging channels (e.g., WhatsApp) openclaw channels login Follow the prompts to scan the QR code and connect OpenClaw to your messaging app. 4. Summary Through this guide, we've successfully implemented the complete process of automatically deploying OpenClaw on an Azure Windows 11 virtual machine. The entire deployment process is highly automated, completing everything from VM creation to installing all dependencies and OpenClaw itself through a single script. Key Takeaways Automation Benefits: Using az vm run-command allows executing configuration scripts immediately after VM creation without manual RDP login Dependency Management: Chocolatey simplifies the Windows package installation workflow Environment Isolation: Running AI agents on cloud VMs protects local computers and data Scalability: Scripted deployment facilitates replication and team collaboration, easily deploying multiple instances Cost Optimization Tips Standard_B2s VMs cost approximately $0.05/hour (~$37/month) on pay-as-you-go pricing When not in use, stop the VM to only pay for storage costs Consider Azure Reserved Instances to save up to 72% Security Hardening Recommendations Change Default Port: Modify RDP port from 3389 to a custom port Enable JIT Access: Use Azure Security Center's just-in-time access feature Configure Firewall Rules: Only allow specific IP addresses to access Regular System Updates: Enable automatic Windows Updates Use Azure Key Vault: Store API keys in Key Vault instead of configuration files 5. Additional Resources Official Documentation OpenClaw Website: https://openclaw.ai OpenClaw GitHub: https://github.com/openclaw/openclaw OpenClaw Documentation: https://docs.openclaw.ai Azure CLI Documentation: https://docs.microsoft.com/cli/azure/ Azure Resources Azure VM Pricing Calculator: https://azure.microsoft.com/pricing/calculator/ Azure Free Account: https://azure.microsoft.com/free/ (new users receive $200 credit) Azure Security Center: https://azure.microsoft.com/services/security-center/ Azure Key Vault: https://azure.microsoft.com/services/key-vault/A Visual Introduction To Azure Fundamentals
Are you a visual learner? Do you like to see "the big picture" before you dive into details? Does seeing visual notes or metaphors help you understand new concepts better, and retain or recall them more effectively? Then this is for you. A Visual Introduction To Azure Fundamentals - the first in a series of visualized modules that I hope will be helpful for anyone exploring Azure Fundamentals, or preparing for the AZ-900 exam! Want to learn more? Check out this accompanying article at A Cloud Guru! Have questions, or want to see other modules visualized similarly? Leave me a comment on this post!12KViews7likes4CommentsHow to Build Safe Natural Language-Driven APIs
TL;DR Building production natural language APIs requires separating semantic parsing from execution. Use LLMs to translate user text into canonical structured requests (via schemas), then execute those requests deterministically. Key patterns: schema completion for clarification, confidence gates to prevent silent failures, code-based ontologies for normalization, and an orchestration layer. This keeps language as input, not as your API contract. Introduction APIs that accept natural language as input are quickly becoming the norm in the age of agentic AI apps and LLMs. From search and recommendations to workflows and automation, users increasingly expect to "just ask" and get results. But treating natural language as an API contract introduces serious risks in production systems: Nondeterministic behavior Prompt-driven business logic Difficult debugging and replay Silent failures that are hard to detect In this post, I'll describe a production-grade architecture for building safe, natural language-driven APIs: one that embraces LLMs for intent discovery and entity extraction while preserving the determinism, observability, and reliability that backend systems require. This approach is based on building real systems using Azure OpenAI and LangGraph, and on lessons learned the hard way. The Core Problem with Natural Language APIs Natural language is an excellent interface for humans. It is a poor interface for systems. When APIs accept raw text directly and execute logic based on it, several problems emerge: The API contract becomes implicit and unversioned Small prompt changes cause behavioral changes Business logic quietly migrates into prompts In short: language becomes the contract, and that's fragile. The solution is not to avoid natural language, but to contain it. A Key Principle: Natural Language Is Input, Not a Contract So how do we contain it? The answer lies in treating natural language fundamentally differently than we treat traditional API inputs. The most important design decision we made was this: Natural language should be translated into structure, not executed directly. That single principle drives the entire architecture. Instead of building "chatty APIs," we split responsibilities clearly: Natural language is used for intent discovery and entity extraction Structured data is used for execution Two Explicit API Layers This principle translates into a concrete architecture with two distinct API layers, each with a single, clear responsibility. 1. Semantic Parse API (Natural Language → Structure) This API: Accepts user text Extracts intent and entities using LLMs Completes a predefined schema Asks clarifying questions when required Returns a canonical, structured request Does not execute business logic Think of this as a compiler, not an engine. 2. Structured Execution API (Structure → Action) This API: Accepts only structured input Calls downstream systems to process the request and get results Is deterministic and versioned Contains no natural language handling Is fully testable and replayable This is where execution happens. Why This Separation Matters Separating these layers gives you: A stable, versionable API contract Freedom to improve NLP without breaking clients Clear ownership boundaries Deterministic execution paths Most importantly, it prevents LLM behavior from leaking into core business logic. Canonical Schemas Are the Backbone Now that we've established the two-layer architecture, let's dive into what makes it work: canonical schemas. Each supported intent is defined by a canonical schema that lives in code. Example (simplified): This schema is used when a user is looking for similar product recommendations. The entities capture which product to use as reference and how to bias the recommendations toward price or quality. { "intent": "recommend_similar", "entities": { "reference_product_id": "string", "price_bias": "number (-1 to 1)", "quality_bias": "number (-1 to 1)" } } Schemas define: Required vs optional fields Allowed ranges and types Validation rules They are the contract, not the prompt. When a user says "show me products like the blue backpack but cheaper", the LLM extracts: Intent: recommend_similar reference_product_id: "blue_backpack_123" price_bias: -0.8 (strongly prefer cheaper) quality_bias: 0.0 (neutral) The schema ensures that even if the user phrased it as "find alternatives to item 123 with better pricing" or "cheaper versions of that blue bag", the output is always the same structure. The natural language variation is absorbed at the semantic layer. The execution layer receives a consistent, validated request every time. This decoupling is what makes the system maintainable. Schema Completion, Not Free-Form Chat But what happens when the user's input doesn't contain all the information needed to complete the schema? This is where structured clarification comes in. A common misconception is that clarification means "chatting until it feels right." In production systems, clarification is schema completion. If required fields are missing or ambiguous, the semantic API responds with: What information is missing A targeted clarification question The current schema state Example response: { "status": "needs_clarification", "missing_fields": ["reference_product_id"], "question": "Which product should I compare against?", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } The state object is the memory. The API itself remains stateless. A Complete Conversation Flow To illustrate how schema completion works in practice, here's a full conversation flow where the user's initial request is missing required information: Initial Request: User: "Show me cheaper alternatives with good quality" API Response (needs clarification): { "status": "needs_clarification", "missing_fields": ["reference_product_id"], "question": "Which product should I compare against?", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } Follow-up Request: User: "The blue backpack" Client sends: { "user_input": "The blue backpack", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } API Response (complete): { "status": "complete", "canonical_request": { "intent": "recommend_similar", "entities": { "reference_product_id": "blue_backpack_123", "price_bias": -0.3, "quality_bias": 0.4 } } } The client passes the state back with each clarification. The API remains stateless, while the client manages the conversation context. Once complete, the canonical_request can be sent directly to the execution API. Why LangGraph Fits This Problem Perfectly With schemas and clarification flows defined, we need a way to orchestrate the semantic parsing workflow reliably. This is where LangGraph becomes valuable. LangGraph allows semantic parsing to be modeled as a structured, deterministic workflow with explicit decision points: Classify intent: Determine what the user wants to do from a predefined set of supported actions Extract candidate entities: Pull out relevant parameters from the natural language input using the LLM Merge into schema state: Map the extracted values into the canonical schema structure Validate required fields: Check if all mandatory fields are present and values are within acceptable ranges Either complete or request clarification: Return the canonical request if complete, or ask a targeted question if information is missing Each node has a single responsibility. Validation and routing are done in code, not by the LLM. LangGraph provides: Explicit state transitions Deterministic routing Observable execution Safe retries Used this way, it becomes a powerful orchestration tool, not a conversational agent. Confidence Gates Prevent Silent Failures Structured workflows handle the process, but there's another critical safety mechanism we need: knowing when the LLM isn't confident about its extraction. Even when outputs are structurally valid, they may not be reliable. We require the semantic layer to emit a confidence score. If confidence falls below a threshold, execution is blocked and clarification is requested. This simple rule eliminates an entire class of silent misinterpretations that are otherwise very hard to detect. Example: When a user says "Show me items similar to the bag", the LLM might extract: { "intent": "recommend_similar", "confidence": 0.55, "entities": { "reference_product_id": "generic_bag_001", "confidence_scores": { "reference_product_id": 0.4 } } } The overall confidence is low (0.55), and the entity confidence for reference_product_id is very low (0.4) because "the bag" is ambiguous. There might be hundreds of bags in the catalog. Instead of proceeding with a potentially wrong guess, the API responds: { "status": "needs_clarification", "reason": "low_confidence", "question": "I found multiple bags. Did you mean the blue backpack, the leather tote, or the travel duffel?", "confidence": 0.55 } This prevents the system from silently executing the wrong recommendation and provides a better user experience. Lightweight Ontologies (Keep Them in Code) Beyond confidence scoring, we need a way to normalize the variety of terms users might use into consistent canonical values. We also introduced lightweight, code-level ontologies: Allowed intents Required entities per intent Synonym-to-canonical mappings Cross-field validation rules These live in code and configuration, not in prompts. LLMs propose values. Code enforces meaning. Example: Consider these user inputs that all mean the same thing: "Show me cheaper options" "Find budget-friendly alternatives" "I want something more affordable" "Give me lower-priced items" The LLM might extract different values: "cheaper", "budget-friendly", "affordable", "lower-priced". The ontology maps all of these to a canonical value: PRICE_BIAS_SYNONYMS = { "cheaper": -0.7, "budget-friendly": -0.7, "affordable": -0.7, "lower-priced": -0.7, "expensive": 0.7, "premium": 0.7, "high-end": 0.7 } When the LLM extracts "budget-friendly", the code normalizes it to -0.7 for the price_bias field. Similarly, cross-field validation catches logical inconsistencies: if entities["price_bias"] < -0.5 and entities["quality_bias"] > 0.5: return clarification("You want cheaper items with higher quality. This might be difficult. Should I prioritize price or quality?") The LLM proposes. The ontology normalizes. The validation enforces business rules. What About Latency? A common concern with multi-step semantic parsing is performance. In practice, we observed: Intent classification: ~40 ms Entity extraction: ~200 ms Validation and routing: ~1 ms Total overhead: ~250–300 ms. For chat-driven user experiences, this is well within acceptable bounds and far cheaper than incorrect or inconsistent execution. Key Takeaways Let's bring it all together. If you're building APIs that accept natural language in production: Do not make language your API contract Translate language into canonical structure Own schema completion server-side Use LLMs for discovery and extraction, not execution Treat safety and determinism as first-class requirements Natural language is an input format. Structure is the contract. Closing Thoughts LLMs make it easy to build impressive demos. Building safe, reliable systems with them requires discipline. By separating semantic interpretation from execution, and by using tools like Azure OpenAI and LangGraph thoughtfully, you can build natural language-driven APIs that scale, evolve, and behave predictably in production. Hopefully, this architecture saves you a few painful iterations.The Perfect Fusion of GitHub Copilot SDK and Cloud Native
In today's rapidly evolving AI landscape, we've witnessed the transformation from simple chatbots to sophisticated agent systems. As a developer and technology evangelist, I've observed an emerging trend—it's not about making AI omnipotent, but about enabling each AI Agent to achieve excellence in specific domains. Today, I want to share an exciting technology stack: GitHub Copilot SDK (a development toolkit that embeds production-grade agent engines into any application) + Agent-to-Agent (A2A) Protocol (a communication standard enabling standardized agent collaboration) + Cloud Native Deployment (the infrastructure foundation for production systems). Together, these three components enable us to build truly collaborative multi-agent systems. 1. From AI Assistants to Agent Engines: Redefining Capability Boundaries Traditional AI assistants often pursue "omnipotence"—attempting to answer any question you throw at them. However, in real production environments, this approach faces serious challenges: Inconsistent Quality: A single model trying to write code, perform data analysis, and generate creative content struggles to achieve professional standards in each domain Context Pollution: Mixing prompts from different tasks leads to unstable model outputs Difficult Optimization: Adjusting prompts for one task type may negatively impact performance on others High Development Barrier: Building agents from scratch requires handling planning, tool orchestration, context management, and other complex logic GitHub proposed a revolutionary approach—instead of forcing developers to build agent frameworks from scratch, provide a production-tested, programmable agent engine. This is the core value of the GitHub Copilot SDK. Evolution from Copilot CLI to SDK GitHub Copilot CLI is a powerful command-line tool that can: Plan projects and features Modify files and execute commands Use custom agents Delegate tasks to cloud execution Integrate with MCP servers The GitHub Copilot SDK extracts the agentic core behind Copilot CLI and offers it as a programmable layer for any application. This means: You're no longer confined to terminal environments You can embed this agent engine into GUI applications, web services, and automation scripts You gain access to the same execution engine validated by millions of users Just like in the real world, we don't expect one person to be a doctor, lawyer, and engineer simultaneously. Instead, we provide professional tools and platforms that enable professionals to excel in their respective domains. 2. GitHub Copilot SDK: Embedding Copilot CLI's Agentic Core into Any App Before diving into multi-agent systems, we need to understand a key technology: GitHub Copilot SDK. What is GitHub Copilot SDK? GitHub Copilot SDK (now in technical preview) is a programmable agent execution platform. It allows developers to embed the production-tested agentic core from GitHub Copilot CLI directly into any application. Simply put, the SDK provides: Out-of-the-box Agent Loop: No need to build planners, tool orchestration, or context management from scratch Multi-model Support: Choose different AI models (like GPT-4, Claude Sonnet) for different task phases Tool and Command Integration: Built-in file editing, command execution, and MCP server integration capabilities Streaming Real-time Responses: Support for progress updates on long-running tasks Multi-language Support: SDKs available for Node.js, Python, Go, and .NET Why is the SDK Critical for Building Agents? Building an agentic workflow from scratch is extremely difficult. You need to handle: Context management across multiple conversation turns Orchestration of tools and commands Routing between different models MCP server integration Permission control, safety boundaries, and error handling GitHub Copilot SDK abstracts away all this underlying complexity. You only need to focus on: Defining agent professional capabilities (through Skill files) Providing domain-specific tools and constraints Implementing business logic SDK Usage Examples Python Example (from actual project implementation): from copilot import CopilotClient # Initialize client copilot_client = CopilotClient() await copilot_client.start() # Create session and load Skill session = await copilot_client.create_session({ "model": "claude-sonnet-4.5", "streaming": True, "skill_directories": ["/path/to/skills/blog/SKILL.md"] }) # Send task await session.send_and_wait({ "prompt": "Write a technical blog about multi-agent systems" }, timeout=600) Skill System: Professionalizing Agents While the SDK provides a powerful execution engine, how do we make agents perform professionally in specific domains? The answer is Skill files. A Skill file is a standardized capability definition containing: Capability Declaration: Explicitly tells the system "what I can do" (e.g., blog generation, PPT creation) Domain Knowledge: Preset best practices, standards, and terminology guidelines Workflow: Defines the complete execution path from input to output Output Standards: Ensures generated content meets format and quality requirements Through the combination of Skill files + SDK, we can build truly professional agents rather than generic "jack-of-all-trades assistants." 3. A2A Protocol: Enabling Seamless Agent Collaboration Once we have professional agents, the next challenge is: how do we make them work together? This is the core problem the Agent-to-Agent (A2A) Protocol aims to solve. Three Core Mechanisms of A2A Protocol 1. Agent Discovery (Service Discovery) Each agent exposes its capability card through the standardized /.well-known/agent-card.json endpoint, acting like a business card that tells other agents "what I can do": { "name": "blog_agent", "description": "Blog generation with DeepSearch", "primaryKeywords": ["blog", "article", "write"], "skills": [{ "id": "blog_generation", "tags": ["blog", "writing"], "examples": ["Write a blog about..."] }], "capabilities": { "streaming": true } } 2. Intelligent Routing The Orchestrator matches tasks with agent capabilities through scoring. The project's routing algorithm implements keyword matching and exclusion detection: Positive Matching: If a task contains an agent's primaryKeywords, score +0.5 Negative Exclusion: If a task contains other agents' keywords, score -0.3 This way, when users say "write a blog about cloud native," the system automatically selects the Blog Agent; when they say "create a tech presentation PPT," it routes to the PPT Agent. 3. SSE Streaming (Real-time Streaming) For time-consuming tasks (like generating a 5000-word blog), A2A uses Server-Sent Events to push real-time progress, allowing users to see the agent working instead of just waiting. This is crucial for user experience. 4. Cloud Native Deployment: Making Agent Systems Production-Ready Even the most powerful technology is just a toy if it can't be deployed to production environments. This project demonstrates a complete deployment of a multi-agent system to a cloud-native platform (Azure Container Apps). Why Choose Cloud Native? Elastic Scaling: When blog generation requests surge, the Blog Agent can auto-scale; it scales down to zero during idle times to save costs Independent Evolution: Each agent has its own Docker image and deployment pipeline; updating the Blog Agent doesn't affect the PPT Agent Fault Isolation: If one agent crashes, it won't bring down the entire system; the Orchestrator automatically degrades Global Distribution: Through Azure Container Apps, agents can be deployed across multiple global regions to reduce latency Container Deployment Essentials Each agent in the project has a standardized Dockerfile: FROM python:3.12-slim WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . EXPOSE 8001 CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"] Combined with the deploy-to-aca.sh script, one-click deployment to Azure: # Build and push image az acr build --registry myregistry --image blog-agent:latest . # Deploy to Container Apps az containerapp create \ --name blog-agent \ --resource-group my-rg \ --environment my-env \ --image myregistry.azurecr.io/blog-agent:latest \ --secrets github-token=$COPILOT_TOKEN \ --env-vars COPILOT_GITHUB_TOKEN=secretref:github-token 5. Real-World Results: From "Works" to "Works Well" Let's see how this system performs in real scenarios. Suppose a user initiates a request: "Write a technical blog about Kubernetes multi-tenancy security, including code examples and best practices" System Execution Flow: Orchestrator receives the request and scans all agents' capability cards Keyword matching: "write" + "blog" → Blog Agent scores 1.0, PPT Agent scores 0.0 Routes to Blog Agent, loads technical writing Skill Blog Agent initiates DeepSearch to collect latest K8s security materials SSE real-time push: "Collecting materials..." → "Generating outline..." → "Writing content..." Returns complete blog after 5 minutes, including code highlighting, citation sources, and best practices summary Compared to traditional "omnipotent" AI assistants, this system's advantages: ✅ Professionalism: Blog Agent trained with technical writing Skills produces content with clear structure, accurate terminology, and executable code ✅ Visibility: Users see progress throughout, knowing what the AI is doing ✅ Extensibility: Adding new agents (video script, data analysis) in the future requires no changes to existing architecture 6. Key Technical Challenges and Solutions Challenge 1: Inaccurate Agent Capability Descriptions Leading to Routing Errors Solution: Define clear primaryKeywords and examples in Agent Cards Implement exclusion detection mechanism to prevent tasks from being routed to unsuitable agents Challenge 2: Poor User Experience for Long-Running Tasks Solution: Fully adopt SSE streaming, pushing working/completed/error status in real-time Display progress hints in status messages so users know what the system is doing Challenge 3: Sensitive Information Leakage Risk Solution: Use Azure Key Vault or Container Apps Secrets to manage GitHub Tokens Inject via environment variables, never hardcode in code or images Check required environment variables in deployment scripts to prevent configuration errors 7. Future Outlook: SDK-Driven Multi-Agent Ecosystem This project is just the beginning. As GitHub Copilot SDK and A2A Protocol mature, we can build richer agent ecosystems: Actual SDK Application Scenarios According to GitHub's official blog, development teams have already used the Copilot SDK to build: YouTube Chapter Generator: Automatically generates timestamped chapter markers for videos Custom Agent GUIs: Visual agent interfaces for specific business scenarios Speech-to-Command Workflows: Control desktop applications through voice AI Battle Games: Interactive competitive experiences with AI Intelligent Summary Tools: Automatic extraction and summarization of key information Multi-Agent System Evolution Directions 🏪 Agent Marketplace: Developers can publish specialized agents (legal documents, medical reports, etc.) that plug-and-play via A2A protocol 🔗 Cascade Orchestration: Orchestrator automatically breaks down complex tasks, calling multiple agents collaboratively (e.g., "write blog + generate images + create PPT") 🌐 Cross-Platform Interoperability: Based on A2A standards, agents developed by different companies can call each other, breaking down data silos ⚙️ Automated Workflows: Delegate routine repetitive work to agent chains, letting humans focus on creative work 🎯 Vertical Domain Specialization: Combined with Skill files, build high-precision agents in professional fields like finance, healthcare, and legal Core Value of the SDK The significance of GitHub Copilot SDK lies in: it empowers every developer to become a builder of agent systems. You don't need deep learning experts, you don't need to implement agent frameworks yourself, and you don't even need to manage GPU clusters. You only need to: Install the SDK (npm install github/copilot-sdk) Define your business logic and tools Write Skill files describing professional capabilities Call the SDK's execution engine And you can build production-grade intelligent agent applications. Summary: From Demo to Production GitHub Copilot SDK + A2A + Cloud Native isn't three independent technology stacks, but a complete methodology: GitHub Copilot SDK provides an out-of-the-box agent execution engine—handling planning, tool orchestration, context management, and other underlying complexity Skill files enable agents with domain-specific professional capabilities—defining best practices, workflows, and output standards A2A Protocol enables standardized communication and collaboration between agents—implementing service discovery, intelligent routing, and streaming Cloud Native makes the entire system production-ready—containerization, elastic scaling, fault isolation For developers, this means we no longer need to build agent frameworks from scratch or struggle with the black magic of prompt engineering. We only need to: Use GitHub Copilot SDK to obtain a production-grade agent execution engine Write domain-specific Skill files to define professional capabilities Follow A2A protocol to implement standard interfaces between agents Deploy to cloud platforms through containerization And we can build AI Agent systems that are truly usable, well-designed, and production-ready. 🚀 Start Building Complete project code is open source: https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/tree/main/code/GitHubCopilotAgents_A2A Follow the README guide and deploy your first Multi-Agent system in 30 minutes! References GitHub Copilot SDK Official Announcement - Build an agent into any app with the GitHub Copilot SDK GitHub Copilot SDK Repository - github.com/github/copilot-sdk A2A Protocol Official Specification - a2a-protocol.org/latest/ Project Source Code - Multi-AI-Agents-Cloud-Native Azure Container Apps Documentation - learn.microsoft.com/azure/container-apps712Views0likes0Comments