tips and tricks

Optimising AI Costs with Microsoft Foundry Model Router
Microsoft Foundry Model Router analyses each prompt in real time and forwards it to the most appropriate LLM from a pool of underlying models. Simple requests go to fast, cheap models; complex requests go to premium ones, all automatically. I built an interactive demo app so you can see the routing decisions, measure latencies, and compare costs yourself. This post walks through how it works, what we measured, and when it makes sense to use.

The Problem: One Model for Everything Is Wasteful

Traditional deployments force a single choice:

| Strategy | Upside | Downside |
| --- | --- | --- |
| Use a small model | Fast, cheap | Struggles with complex tasks |
| Use a large model | Handles everything | Overpay for simple tasks |
| Build your own router | Full control | Maintenance burden; hard to optimise |

Most production workloads are mixed-complexity. Classification, FAQ look-ups, and data extraction sit alongside code analysis, multi-constraint planning, and long-document summarisation. Paying premium-model prices for the simple 40% is money left on the table.

The Solution: Model Router

Model Router is a trained language model deployed as a single Azure endpoint. For each incoming request it:

1. Analyses the prompt (complexity, task type, context length)
2. Selects an underlying model from the routing pool
3. Forwards the request and returns the response
4. Exposes the choice via the response.model field

You interact with one deployment. No if/else routing logic in your code.

Routing Modes

| Mode | Goal | Trade-off |
| --- | --- | --- |
| Balanced (default) | Best cost-quality ratio | General-purpose |
| Cost | Minimise spend | May use smaller models more aggressively |
| Quality | Maximise accuracy | Higher cost for complex tasks |

Modes are configured in the Foundry Portal; no code change is needed to switch.

Building the Demo

To make routing decisions tangible, we built a React + TypeScript app that sends the same prompt through both Model Router and a fixed standard deployment (e.g.
GPT-5-nano), then compares:

- Which model the router selected
- Latency (ms)
- Token usage (prompt + completion)
- Estimated cost (based on per-model pricing)

[Figure: Select a prompt, choose a routing mode, and hit Run Both to compare side-by-side]

What You Can Do

- 10 pre-built prompts spanning simple classification to complex multi-constraint planning
- Custom prompt input: enter any text and benchmarks run automatically
- Three routing modes: switch and re-run to see how distribution changes
- Batch mode: run all 10 prompts in one click to gather aggregate stats

API Integration

The integration is a standard Azure OpenAI chat completion call. The only difference is the deployment name (model-router instead of a specific model):

    const response = await fetch(
      `${endpoint}/openai/deployments/model-router/chat/completions?api-version=2024-10-21`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'api-key': apiKey,
        },
        body: JSON.stringify({
          messages: [{ role: 'user', content: prompt }],
          max_completion_tokens: 1024,
        }),
      }
    );

    const data = await response.json();

    // The key insight: response.model reveals the underlying model
    const selectedModel = data.model; // e.g. "gpt-5-nano-2025-08-07"

That data.model field is what makes cost tracking and distribution analysis possible.

Results: What the Data Shows

We ran all 10 prompts through both Model Router (Balanced mode) and a fixed standard deployment.

Note: Results vary by run, region, model versions, and Azure load. These numbers are from a representative sample run.
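For readers who want to reproduce the estimated-cost figures, here is a minimal sketch of turning `data.model` and the token counts into a per-request cost. It is not the demo's actual code: the price table holds placeholder values (not Azure's published rates), `resolvePrice` and `estimateCostUsd` are hypothetical helper names, and it assumes the response carries the usual `usage.prompt_tokens` and `usage.completion_tokens` fields of a chat completion.

```typescript
// Placeholder USD prices per 1M tokens. These are illustrative assumptions,
// NOT Azure's published rates; replace them with your own pricing data.
interface Price {
  input: number;
  output: number;
}

const PRICES: Record<string, Price> = {
  "gpt-5-nano": { input: 0.05, output: 0.4 },
  "gpt-5-mini": { input: 0.25, output: 2.0 },
};

// Only the fields this sketch needs from a chat completion response.
interface ChatCompletionLike {
  model: string;
  usage: { prompt_tokens: number; completion_tokens: number };
}

// response.model carries a version suffix (e.g. "gpt-5-nano-2025-08-07"),
// so match on the longest known prefix rather than on the exact string.
function resolvePrice(model: string): Price | undefined {
  const key = Object.keys(PRICES)
    .filter((k) => model.startsWith(k))
    .sort((a, b) => b.length - a.length)[0];
  return key ? PRICES[key] : undefined;
}

// Returns undefined when the routed model is not in the price table,
// so unknown models are surfaced instead of silently costed at zero.
function estimateCostUsd(res: ChatCompletionLike): number | undefined {
  const price = resolvePrice(res.model);
  if (!price) return undefined;
  const { prompt_tokens, completion_tokens } = res.usage;
  return (prompt_tokens * price.input + completion_tokens * price.output) / 1e6;
}
```

Returning `undefined` for unlisted models matters with a router, because the pool can change without any change on your side.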
[Figure: Side-by-side comparison across all 10 prompts in Balanced mode]

Summary

| Metric | Router (Balanced) | Standard (GPT-5-nano) |
| --- | --- | --- |
| Avg Latency | ~7,800 ms | ~7,700 ms |
| Total Cost (10 prompts) | ~$0.029 | ~$0.030 |
| Cost Savings | ~4.5% | n/a |
| Models Used | 4 | 1 |

Model Distribution

The router used 4 different models across 10 prompts:

| Model | Requests | Share | Typical Use |
| --- | --- | --- | --- |
| gpt-5-nano | 5 | 50% | Classification, summarisation, planning |
| gpt-5-mini | 2 | 20% | FAQ answers, data extraction |
| gpt-oss-120b | 2 | 20% | Long-context analysis, creative tasks |
| gpt-4.1-mini | 1 | 10% | Complex debugging & reasoning |

[Figure: Routing distribution chart; the router favours efficient models for simple prompts]

Across All Three Modes

| Metric | Balanced | Cost-Optimised | Quality-Optimised |
| --- | --- | --- | --- |
| Cost Savings | ~4.5% | ~4.7% | ~14.2% |
| Avg Latency (Router) | ~7,800 ms | ~7,800 ms | ~6,800 ms |
| Avg Latency (Standard) | ~7,700 ms | ~7,300 ms | ~8,300 ms |
| Primary Goal | Balance cost + quality | Minimise spend | Maximise accuracy |
| Model Selection | Mixed (4 models) | Prefers cheaper | Prefers premium |

[Figure: Cost-optimised mode; routes more aggressively to nano/mini models]
[Figure: Quality-optimised mode; routes to larger models for complex tasks]

Analysis

What Worked Well

Intelligent distribution
The router didn't just default to one model. It used 4 different models and mapped prompt complexity to model capability: simple classification → nano, FAQ answers → mini, long-context documents → oss-120b, complex debugging → 4.1-mini.

Measurable cost savings across all modes
4.5% in Balanced, 4.7% in Cost, and 14.2% in Quality mode. Quality mode was the surprise winner: by choosing faster, cheaper models for simple prompts, it actually saved the most while still routing complex requests to capable models.

Zero routing logic in application code
One endpoint, one deployment name. The complexity lives in Azure's infrastructure, not yours.

Operational flexibility
Switch between Balanced, Cost, and Quality modes in the Foundry Portal without redeploying your app. Need to cut costs for a high-traffic period? Switch to Cost mode.
Need accuracy for a compliance run? Switch to Quality.

Future-proofing
As Azure adds new models to the routing pool, your deployment benefits automatically. No code changes needed.

Trade-offs to Consider

Latency is comparable, not always faster
In Balanced mode, Router averaged ~7,800 ms vs Standard's ~7,700 ms, which is nearly identical. In Quality mode, the Router was actually faster (~6,800 ms vs ~8,300 ms) because it chose more efficient models for simple prompts. The delta depends on which models the router selects.

Savings scale with workload diversity
Our 10-prompt test set showed 4.5–14.2% savings. Production workloads with a wider spread of simple vs complex prompts should see larger savings, since the router has more opportunity to route simple requests to cheaper models.

Opaque routing decisions
You can see which model was picked via response.model, but you can't see why. For most applications this is fine; for debugging edge cases you may want to test specific prompts in the demo first.

Custom Prompt Testing

One of the most practical features of the demo is testing your own prompts before committing to Model Router in production.

[Figure: Enter any prompt; the quantum computing example is a medium-complexity educational prompt]
[Figure: Benchmarks execute automatically, showing the selected model, latency, tokens, and cost]

Workflow:

1. Click ✏️ Custom in the prompt selector
2. Enter your production-representative prompt
3. Click ✓ Use This Prompt; Router and Standard run automatically
4. Compare results, then repeat with different routing modes
5. Use the data to inform your deployment strategy

This lets you predict costs and validate routing behaviour with your actual workload before going to production.
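If you collect the per-prompt results from such a test run, the headline numbers (model distribution, cost savings, average latency) are a few lines of arithmetic. The sketch below is one way to compute them; `RunResult`, `Summary`, and `summarise` are hypothetical names for illustration and are not taken from the demo's source.

```typescript
// One measured call. `model` is the value of response.model for that call.
interface RunResult {
  model: string;
  costUsd: number;
  latencyMs: number;
}

interface Summary {
  distribution: Record<string, number>; // requests handled per model
  savingsPct: number;                   // router cost saving vs baseline, in percent
  avgRouterLatencyMs: number;
}

// Compares router runs against baseline (fixed-deployment) runs
// for the same set of prompts.
function summarise(router: RunResult[], baseline: RunResult[]): Summary {
  const distribution: Record<string, number> = {};
  let routerCost = 0;
  let routerLatency = 0;

  for (const r of router) {
    distribution[r.model] = (distribution[r.model] ?? 0) + 1;
    routerCost += r.costUsd;
    routerLatency += r.latencyMs;
  }

  const baselineCost = baseline.reduce((acc, r) => acc + r.costUsd, 0);

  // Guard against empty inputs so the summary never contains NaN.
  const savingsPct =
    baselineCost > 0 ? ((baselineCost - routerCost) / baselineCost) * 100 : 0;
  const avgRouterLatencyMs =
    router.length > 0 ? routerLatency / router.length : 0;

  return { distribution, savingsPct, avgRouterLatencyMs };
}
```

Running this once per routing mode over the same prompt set gives a like-for-like comparison, provided the baseline is re-measured in the same session, since latency varies between runs.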
When to Use Model Router

Great Fit

- Mixed-complexity workloads: chatbots, customer service, content pipelines
- Cost-sensitive deployments: where even single-digit percentage savings matter at scale
- Teams wanting simplicity: one endpoint beats managing multi-model routing logic
- Rapid experimentation: try new models without changing application code

Consider Carefully

- Ultra-low-latency requirements: if you need sub-second responses, the routing overhead matters
- Single-task, single-model workloads: if one model is clearly optimal for 100% of your traffic, a router adds complexity without benefit
- Full control over model selection: if you need deterministic model choice per request

Mode Selection Guide

    Is accuracy critical (compliance, legal, medical)?
    └─ YES → Quality-Optimised
    └─ NO → Strict budget constraints?
            └─ YES → Cost-Optimised
            └─ NO → Balanced (recommended)

Best Practices

1. Start with Balanced mode: measure actual results, then optimise
2. Test with your real prompts: use the Custom Prompt feature to validate routing before production
3. Monitor model distribution: track which models handle your traffic over time
4. Compare against a baseline: always keep a standard deployment to measure savings
5. Review regularly: as new models enter the routing pool, distributions shift

Technical Stack

| Technology | Purpose |
| --- | --- |
| React 19 + TypeScript 5.9 | UI and type safety |
| Vite 7 | Dev server and build tool |
| Tailwind CSS 4 | Styling |
| Recharts 3 | Distribution and comparison charts |
| Azure OpenAI API (2024-10-21) | Model Router and standard completions |

Security measures include an ErrorBoundary for crash resilience, sanitised API error messages, AbortController request timeouts, input length validation, and restrictive security headers. API keys are loaded from environment variables and gitignored.
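Two of those measures, input length validation and AbortController request timeouts, are easy to sketch. The following is an illustration of the general pattern rather than the demo's implementation: `validatePrompt`, `withTimeout`, and both limits are assumptions made for this example.

```typescript
// Hypothetical limits for illustration; tune them for your own workload.
const MAX_PROMPT_CHARS = 8000;
const REQUEST_TIMEOUT_MS = 30000;

// Rejects empty or oversized prompts before any network call is made.
// Returns the trimmed prompt on success.
function validatePrompt(prompt: string): string {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new Error("Prompt is empty");
  }
  if (trimmed.length > MAX_PROMPT_CHARS) {
    throw new Error(`Prompt exceeds ${MAX_PROMPT_CHARS} characters`);
  }
  return trimmed;
}

// Runs an abortable operation and aborts it once timeoutMs has elapsed.
// The caller passes the signal on to fetch (or any other abortable API).
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number = REQUEST_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    // Always clear the timer so a fast response does not leave it pending.
    clearTimeout(timer);
  }
}

// Usage (not executed here):
// const res = await withTimeout((signal) =>
//   fetch(url, { method: "POST", headers, body, signal }));
```

Passing the signal in through a callback keeps the helper independent of `fetch`, which makes it straightforward to exercise with a fake operation in tests.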
Source: leestott/router-demo-app: An interactive web application demonstrating the power of Microsoft Foundry Model Router - an intelligent routing system that automatically selects the optimal language model for each request based on complexity, reasoning requirements, and task type.

⚠️ This demo calls Azure OpenAI directly from the browser. This is fine for local development. For production, proxy through a backend and use Managed Identity.

Try It Yourself

Quick Start

    git clone https://github.com/leestott/router-demo-app/
    cd router-demo-app

    # Option A: Use the setup script (recommended)
    # Windows:
    .\setup.ps1 -StartDev
    # macOS/Linux:
    chmod +x setup.sh && ./setup.sh --start-dev

    # Option B: Manual
    npm install
    cp .env.example .env.local
    # Edit .env.local with your Azure credentials
    npm run dev

Open http://localhost:5173, select a prompt, and click ⚡ Run Both.

Get Your Credentials

1. Go to ai.azure.com → open your project
2. Copy the Project connection string (endpoint URL)
3. Navigate to Deployments → confirm model-router is deployed
4. Get your API key from Project Settings → Keys

Configuration

Edit .env.local:

    VITE_ROUTER_ENDPOINT=https://your-resource.cognitiveservices.azure.com
    VITE_ROUTER_API_KEY=your-api-key
    VITE_ROUTER_DEPLOYMENT=model-router
    VITE_STANDARD_ENDPOINT=https://your-resource.cognitiveservices.azure.com
    VITE_STANDARD_API_KEY=your-api-key
    VITE_STANDARD_DEPLOYMENT=gpt-5-nano

Ideas for Enhancement

- Historical analysis: persist results to track routing trends over time
- Cost projections: estimate monthly spend based on prompt patterns and volume
- A/B testing framework: compare modes with statistical significance
- Streaming support: show model selection for streaming responses
- Export reports: download benchmark data as CSV/JSON for further analysis

Conclusion

Model Router addresses a real problem: most AI workloads have mixed complexity, but most deployments use a single model.
By routing each request to the right model automatically, you get:

- Cost savings (~4.5–14.2% measured across modes, scaling with volume)
- Intelligent distribution (4 models used, zero routing code)
- Operational simplicity (one endpoint, mode changes via portal)
- Future-proofing (new models added to the pool automatically)

The latency trade-off is minimal: in Quality mode, the Router was actually faster than the standard deployment. The real value is flexibility: tune for cost, quality, or balance without touching your code.

Ready to try it? Clone the demo repository, plug in your Azure credentials, and test with your own prompts.

Resources

- Model Router Benchmark Sample: Sample App
- Model Router Concepts: Official documentation
- Model Router How-To: Deployment guide
- Microsoft Foundry Portal: Deploy and manage
- Model Router in the Catalog: Model listing
- Azure OpenAI Managed Identity: Production auth

Built to explore Model Router and share findings with the developer community. Feedback and contributions are welcome; open an issue or PR on GitHub.
Lee_Stott
Feb 27, 2026, Microsoft Developer Community Blog
Exploring Azure Face API: Facial Landmark Detection and Real-Time Analysis with C#
In today's world, applications that understand and respond to human facial cues are no longer science fiction; they're becoming a reality in domains like security, driver monitoring, gaming, and AR/VR. With Azure Face API, developers can leverage powerful cloud-based facial recognition and analysis tools without building complex machine learning models from scratch.

In this blog, we'll explore how to use C# to detect faces, identify key facial landmarks, estimate head pose, track eye and mouth movements, and process real-time video streams. Using OpenCV for visualization, we'll show how to overlay landmarks, draw bounding boxes, and calculate metrics like Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR), all in real time.

You'll learn to:

- Set up Azure Face API
- Detect 27 facial landmarks
- Estimate head pose (yaw, pitch, roll)
- Calculate eye aspect ratio (EAR) and mouth openness
- Draw bounding boxes around features using OpenCV
- Process real-time video

Prerequisites

- .NET 8 SDK installed
- Azure subscription with Face API resource
- Visual Studio 2022 or later
- Webcam for testing (optional)
- Basic understanding of C# and computer vision concepts

Part 1: Azure Face API Setup

1.1 Install Required NuGet Packages

    dotnet add package Azure.AI.Vision.Face
    dotnet add package OpenCvSharp4
    dotnet add package OpenCvSharp4.runtime.win

1.2 Create Azure Face API Resource

1. Navigate to Azure Portal
2. Search for "Face" and create a new Face API resource
3. Choose your pricing tier (Free tier: 20 calls/min, 30K calls/month)
4. Copy the Endpoint URL and API Key

1.3 Configure in .NET Application

appsettings.json:

    {
      "Azure": {
        "FaceApi": {
          "Endpoint": "https://your-resource.cognitiveservices.azure.com/",
          "ApiKey": "your-api-key-here"
        }
      }
    }

Initialize Face Client:

    using Azure;
    using Azure.AI.Vision.Face;
    using Microsoft.Extensions.Configuration;
    using Microsoft.Extensions.Logging; // needed for ILogger<T>

    public class FaceAnalysisService
    {
        private readonly FaceClient _faceClient;
        private readonly ILogger<FaceAnalysisService> _logger;

        public
        FaceAnalysisService(ILogger<FaceAnalysisService> logger, IConfiguration configuration)
        {
            _logger = logger;

            // Fail fast with a clear message if either setting is missing
            string endpoint = configuration["Azure:FaceApi:Endpoint"]
                ?? throw new InvalidOperationException("Azure:FaceApi:Endpoint is not configured");
            string apiKey = configuration["Azure:FaceApi:ApiKey"]
                ?? throw new InvalidOperationException("Azure:FaceApi:ApiKey is not configured");

            _faceClient = new FaceClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
            _logger.LogInformation("FaceClient initialized with endpoint: {Endpoint}", endpoint);
        }
    }

Part 2: Understanding Face Detection Models

2.1 Basic Face Detection

    public async Task<List<FaceDetectionResult>> DetectFacesAsync(byte[] imageBytes)
    {
        using var stream = new MemoryStream(imageBytes);

        var response = await _faceClient.DetectAsync(
            BinaryData.FromStream(stream),
            FaceDetectionModel.Detection03,
            FaceRecognitionModel.Recognition04,
            returnFaceId: false,
            returnFaceAttributes: new FaceAttributeType[] { FaceAttributeType.HeadPose },
            returnFaceLandmarks: true,
            returnRecognitionModel: false
        );

        _logger.LogInformation("Detected {Count} faces", response.Value.Count);
        return response.Value.ToList();
    }

Part 3: Facial Landmarks - The 27 Key Points

3.1 Understanding Facial Landmarks

3.2 Accessing Landmarks in Code

    public void PrintLandmarks(FaceDetectionResult face)
    {
        var landmarks = face.FaceLandmarks;
        if (landmarks == null)
        {
            _logger.LogWarning("No landmarks detected");
            return;
        }

        // Eye landmarks
        Console.WriteLine($"Left Eye Outer: ({landmarks.EyeLeftOuter.X}, {landmarks.EyeLeftOuter.Y})");
        Console.WriteLine($"Left Eye Inner: ({landmarks.EyeLeftInner.X}, {landmarks.EyeLeftInner.Y})");
        Console.WriteLine($"Left Eye Top: ({landmarks.EyeLeftTop.X}, {landmarks.EyeLeftTop.Y})");
        Console.WriteLine($"Left Eye Bottom: ({landmarks.EyeLeftBottom.X}, {landmarks.EyeLeftBottom.Y})");

        // Mouth landmarks
        Console.WriteLine($"Upper Lip Top: ({landmarks.UpperLipTop.X}, {landmarks.UpperLipTop.Y})");
        Console.WriteLine($"Under Lip Bottom: ({landmarks.UnderLipBottom.X}, {landmarks.UnderLipBottom.Y})");

        // Nose landmarks
        Console.WriteLine($"Nose Tip: ({landmarks.NoseTip.X}, {landmarks.NoseTip.Y})");
    }
3.3 Visualizing All Landmarks

    public void DrawAllLandmarks(FaceLandmarks landmarks, Mat frame)
    {
        void DrawPoint(FaceLandmarkCoordinate point, Scalar color)
        {
            if (point != null)
            {
                Cv2.Circle(frame, new Point((int)point.X, (int)point.Y), radius: 3, color: color, thickness: -1);
            }
        }

        // Eyes (Green)
        DrawPoint(landmarks.EyeLeftOuter, new Scalar(0, 255, 0));
        DrawPoint(landmarks.EyeLeftInner, new Scalar(0, 255, 0));
        DrawPoint(landmarks.EyeLeftTop, new Scalar(0, 255, 0));
        DrawPoint(landmarks.EyeLeftBottom, new Scalar(0, 255, 0));
        DrawPoint(landmarks.EyeRightOuter, new Scalar(0, 255, 0));
        DrawPoint(landmarks.EyeRightInner, new Scalar(0, 255, 0));
        DrawPoint(landmarks.EyeRightTop, new Scalar(0, 255, 0));
        DrawPoint(landmarks.EyeRightBottom, new Scalar(0, 255, 0));

        // Eyebrows (Cyan)
        DrawPoint(landmarks.EyebrowLeftOuter, new Scalar(255, 255, 0));
        DrawPoint(landmarks.EyebrowLeftInner, new Scalar(255, 255, 0));
        DrawPoint(landmarks.EyebrowRightOuter, new Scalar(255, 255, 0));
        DrawPoint(landmarks.EyebrowRightInner, new Scalar(255, 255, 0));

        // Nose (Yellow)
        DrawPoint(landmarks.NoseTip, new Scalar(0, 255, 255));
        DrawPoint(landmarks.NoseRootLeft, new Scalar(0, 255, 255));
        DrawPoint(landmarks.NoseRootRight, new Scalar(0, 255, 255));
        DrawPoint(landmarks.NoseLeftAlarOutTip, new Scalar(0, 255, 255));
        DrawPoint(landmarks.NoseRightAlarOutTip, new Scalar(0, 255, 255));

        // Mouth (Blue)
        DrawPoint(landmarks.UpperLipTop, new Scalar(255, 0, 0));
        DrawPoint(landmarks.UpperLipBottom, new Scalar(255, 0, 0));
        DrawPoint(landmarks.UnderLipTop, new Scalar(255, 0, 0));
        DrawPoint(landmarks.UnderLipBottom, new Scalar(255, 0, 0));
        DrawPoint(landmarks.MouthLeft, new Scalar(255, 0, 0));
        DrawPoint(landmarks.MouthRight, new Scalar(255, 0, 0));

        // Pupils (Red)
        DrawPoint(landmarks.PupilLeft, new Scalar(0, 0, 255));
        DrawPoint(landmarks.PupilRight, new Scalar(0, 0, 255));
    }

Part 4: Drawing Bounding Boxes Around Features

4.1 Eye Bounding Boxes

    /// <summary>
    /// Draws rectangles around eyes using OpenCV.
    /// </summary>
    public void DrawEyeBoxes(FaceLandmarks landmarks, Mat frame)
    {
        int boxWidth = 60;
        int boxHeight = 35;

        // Centre each box on the midpoint between the eye's outer and inner corners,
        // so the rectangle covers the whole eye rather than just the outer corner
        int leftCx = (int)((landmarks.EyeLeftOuter.X + landmarks.EyeLeftInner.X) / 2);
        int leftCy = (int)((landmarks.EyeLeftOuter.Y + landmarks.EyeLeftInner.Y) / 2);
        int rightCx = (int)((landmarks.EyeRightOuter.X + landmarks.EyeRightInner.X) / 2);
        int rightCy = (int)((landmarks.EyeRightOuter.Y + landmarks.EyeRightInner.Y) / 2);

        // Calculate Rectangles
        var leftEyeRect = new Rect(leftCx - boxWidth / 2, leftCy - boxHeight / 2, boxWidth, boxHeight);
        var rightEyeRect = new Rect(rightCx - boxWidth / 2, rightCy - boxHeight / 2, boxWidth, boxHeight);

        // Draw Rectangles (Green in BGR)
        Cv2.Rectangle(frame, leftEyeRect, new Scalar(0, 255, 0), 2);
        Cv2.Rectangle(frame, rightEyeRect, new Scalar(0, 255, 0), 2);

        // Add Labels
        Cv2.PutText(frame, "Left Eye", new Point(leftEyeRect.X, leftEyeRect.Y - 5),
            HersheyFonts.HersheySimplex, 0.4, new Scalar(0, 255, 0), 1);
        Cv2.PutText(frame, "Right Eye", new Point(rightEyeRect.X, rightEyeRect.Y - 5),
            HersheyFonts.HersheySimplex, 0.4, new Scalar(0, 255, 0), 1);
    }

4.2 Mouth Bounding Box

    /// <summary>
    /// Draws rectangle around mouth region.
    /// </summary>
    public void DrawMouthBox(FaceLandmarks landmarks, Mat frame)
    {
        int boxWidth = 80;
        int boxHeight = 50;

        // Calculate center based on the vertical lip landmarks
        int centerX = (int)((landmarks.UpperLipTop.X + landmarks.UnderLipBottom.X) / 2);
        int centerY = (int)((landmarks.UpperLipTop.Y + landmarks.UnderLipBottom.Y) / 2);

        var mouthRect = new Rect(centerX - boxWidth / 2, centerY - boxHeight / 2, boxWidth, boxHeight);

        // Draw Mouth Box (Blue in BGR)
        Cv2.Rectangle(frame, mouthRect, new Scalar(255, 0, 0), 2);

        // Add Label
        Cv2.PutText(frame, "Mouth", new Point(mouthRect.X, mouthRect.Y - 5),
            HersheyFonts.HersheySimplex, 0.4, new Scalar(255, 0, 0), 1);
    }

4.3 Face Bounding Box

    /// <summary>
    /// Draws rectangle around entire face using the face rectangle from API.
    /// </summary>
    public void DrawFaceBox(FaceDetectionResult face, Mat frame)
    {
        var faceRect = face.FaceRectangle;
        if (faceRect == null)
        {
            return;
        }

        var rect = new Rect(
            faceRect.Left,
            faceRect.Top,
            faceRect.Width,
            faceRect.Height
        );

        // Draw Face Bounding Box (Red in BGR)
        Cv2.Rectangle(frame, rect, new Scalar(0, 0, 255), 2);

        // Add Label with dimensions
        Cv2.PutText(frame, $"Face {faceRect.Width}x{faceRect.Height}",
            new Point(rect.X, rect.Y - 10),
            HersheyFonts.HersheySimplex, 0.5, new Scalar(0, 0, 255), 2);
    }

4.4 Nose Bounding Box

    /// <summary>
    /// Draws bounding box around nose using nose landmarks.
    /// </summary>
    public void DrawNoseBox(FaceLandmarks landmarks, Mat frame)
    {
        // Calculate horizontal bounds from Alar tips
        int minX = (int)Math.Min(landmarks.NoseLeftAlarOutTip.X, landmarks.NoseRightAlarOutTip.X);
        int maxX = (int)Math.Max(landmarks.NoseLeftAlarOutTip.X, landmarks.NoseRightAlarOutTip.X);

        // Calculate vertical bounds from Root to Tip
        int minY = (int)Math.Min(landmarks.NoseRootLeft.Y, landmarks.NoseTip.Y);
        int maxY = (int)landmarks.NoseTip.Y;

        // Create Rect with a 10px padding buffer
        var noseRect = new Rect(
            minX - 10,
            minY - 10,
            (maxX - minX) + 20,
            (maxY - minY) + 20
        );

        // Draw Nose Box (Yellow in BGR)
        Cv2.Rectangle(frame, noseRect, new Scalar(0, 255, 255), 2);
    }

Part 5: Geometric Calculations with Landmarks

5.1 Calculating Euclidean Distance

    /// <summary>
    /// Calculates distance between two landmark points.
    /// </summary>
    public static double CalculateDistance(dynamic point1, dynamic point2)
    {
        double dx = point1.X - point2.X;
        double dy = point1.Y - point2.Y;
        return Math.Sqrt(dx * dx + dy * dy);
    }

5.2 Eye Aspect Ratio (EAR) Formula

    /// <summary>
    /// Calculates the Eye Aspect Ratio (EAR) to detect eye closure.
    /// </summary>
    public double CalculateEAR(
        FaceLandmarkCoordinate top1, FaceLandmarkCoordinate top2,
        FaceLandmarkCoordinate bottom1, FaceLandmarkCoordinate bottom2,
        FaceLandmarkCoordinate inner, FaceLandmarkCoordinate outer)
    {
        // Vertical distances
        double v1 = CalculateDistance(top1, bottom1);
        double v2 = CalculateDistance(top2, bottom2);

        // Horizontal distance
        double h = CalculateDistance(inner, outer);

        // EAR formula: (||p2-p6|| + ||p3-p5||) / (2 * ||p1-p4||)
        return (v1 + v2) / (2.0 * h);
    }

Simplified Implementation:

    /// <summary>
    /// Calculates Eye Aspect Ratio (EAR) for a single eye.
    /// Reference: "Real-Time Eye Blink Detection using Facial Landmarks" (Soukupová & Čech, 2016)
    /// </summary>
    public double ComputeEAR(FaceLandmarks landmarks, bool isLeftEye)
    {
        var top = isLeftEye ? landmarks.EyeLeftTop : landmarks.EyeRightTop;
        var bottom = isLeftEye ? landmarks.EyeLeftBottom : landmarks.EyeRightBottom;
        var inner = isLeftEye ? landmarks.EyeLeftInner : landmarks.EyeRightInner;
        var outer = isLeftEye ? landmarks.EyeLeftOuter : landmarks.EyeRightOuter;

        if (top == null || bottom == null || inner == null || outer == null)
        {
            _logger.LogWarning("Missing eye landmarks");
            return 1.0; // Return 1.0 (open) to prevent false positives for drowsiness
        }

        double verticalDist = CalculateDistance(top, bottom);
        double horizontalDist = CalculateDistance(inner, outer);

        // Simplified EAR for Azure 27-point model
        double ear = verticalDist / horizontalDist;

        _logger.LogDebug(
            "EAR for {Eye}: {Value:F3}",
            isLeftEye ? "left" : "right",
            ear
        );

        return ear;
    }

Usage Example:

    var leftEAR = ComputeEAR(landmarks, isLeftEye: true);
    var rightEAR = ComputeEAR(landmarks, isLeftEye: false);
    var avgEAR = (leftEAR + rightEAR) / 2.0;

    Console.WriteLine($"Average EAR: {avgEAR:F3}");
    // Open eyes: ~0.25-0.30
    // Closed eyes: ~0.10-0.15

5.3 Mouth Aspect Ratio (MAR)

    /// <summary>
    /// Calculates Mouth Aspect Ratio relative to face height.
    /// </summary>
    public double CalculateMouthAspectRatio(FaceLandmarks landmarks, FaceRectangle faceRect)
    {
        double mouthHeight = landmarks.UnderLipBottom.Y - landmarks.UpperLipTop.Y;
        double mouthWidth = CalculateDistance(landmarks.MouthLeft, landmarks.MouthRight);

        double mouthOpenRatio = mouthHeight / faceRect.Height;
        double mouthWidthRatio = mouthWidth / faceRect.Width;

        _logger.LogDebug(
            "Mouth - Height ratio: {HeightRatio:F3}, Width ratio: {WidthRatio:F3}",
            mouthOpenRatio,
            mouthWidthRatio
        );

        return mouthOpenRatio;
    }

5.4 Inter-Eye Distance

    /// <summary>
    /// Calculates the distance between pupils (inter-pupillary distance).
    /// </summary>
    public double CalculateInterEyeDistance(FaceLandmarks landmarks)
    {
        return CalculateDistance(landmarks.PupilLeft, landmarks.PupilRight);
    }

    /// <summary>
    /// Calculates distance between inner eye corners.
    /// </summary>
    public double CalculateInnerEyeDistance(FaceLandmarks landmarks)
    {
        return CalculateDistance(landmarks.EyeLeftInner, landmarks.EyeRightInner);
    }

5.5 Face Symmetry Analysis

    /// <summary>
    /// Analyzes facial symmetry by comparing left and right sides.
    /// </summary>
    public FaceSymmetryMetrics AnalyzeFaceSymmetry(FaceLandmarks landmarks)
    {
        double centerX = landmarks.NoseTip.X;

        double leftEyeDistance = CalculateDistance(landmarks.EyeLeftInner, new { X = centerX, Y = landmarks.EyeLeftInner.Y });
        double leftMouthDistance = CalculateDistance(landmarks.MouthLeft, new { X = centerX, Y = landmarks.MouthLeft.Y });
        double rightEyeDistance = CalculateDistance(landmarks.EyeRightInner, new { X = centerX, Y = landmarks.EyeRightInner.Y });
        double rightMouthDistance = CalculateDistance(landmarks.MouthRight, new { X = centerX, Y = landmarks.MouthRight.Y });

        return new FaceSymmetryMetrics
        {
            EyeSymmetryRatio = leftEyeDistance / rightEyeDistance,
            MouthSymmetryRatio = leftMouthDistance / rightMouthDistance,
            IsSymmetric = Math.Abs(leftEyeDistance - rightEyeDistance) < 5.0
        };
    }

    public class FaceSymmetryMetrics
    {
        public double EyeSymmetryRatio { get; set; }
        public double MouthSymmetryRatio { get; set; }
        public bool IsSymmetric { get; set; }
    }

Part 6: Head Pose Estimation

6.1 Understanding Head Pose Angles

Azure Face API provides three Euler angles for head orientation:

- Yaw: left/right rotation
- Pitch: up/down rotation
- Roll: sideways tilt

6.2 Accessing Head Pose Data

    public void AnalyzeHeadPose(FaceDetectionResult face)
    {
        var headPose = face.FaceAttributes?.HeadPose;
        if (headPose == null)
        {
            _logger.LogWarning("Head pose not available");
            return;
        }

        double yaw = headPose.Yaw;
        double pitch = headPose.Pitch;
        double roll = headPose.Roll;

        Console.WriteLine("Head Pose:");
        Console.WriteLine($" Yaw: {yaw:F2}° (Left/Right)");
        Console.WriteLine($" Pitch: {pitch:F2}° (Up/Down)");
        Console.WriteLine($" Roll: {roll:F2}° (Tilt)");

        // Print the interpretation instead of discarding the returned string
        Console.WriteLine($" Summary: {InterpretHeadPose(yaw, pitch, roll)}");
    }

6.3 Interpreting Head Pose

    public string InterpretHeadPose(double yaw, double pitch, double roll)
    {
        var directions = new List<string>();

        // Interpret Yaw (horizontal).
        // Note: a yaw magnitude between 10° and 20° falls in a dead band and adds no label.
        if (Math.Abs(yaw) < 10)
            directions.Add("Looking Forward");
        else if (yaw < -20)
            directions.Add($"Turned Left ({Math.Abs(yaw):F0}°)");
        else if (yaw > 20)
            directions.Add($"Turned Right ({yaw:F0}°)");

        // Interpret Pitch (vertical).
        // Note: a pitch magnitude between 10° and 15° likewise adds no label.
        if (Math.Abs(pitch) < 10)
            directions.Add("Level");
        else if (pitch < -15)
            directions.Add($"Looking Down ({Math.Abs(pitch):F0}°)");
        else if (pitch > 15)
            directions.Add($"Looking Up ({pitch:F0}°)");

        // Interpret Roll (tilt)
        if (Math.Abs(roll) > 15)
        {
            string side = roll < 0 ? "Left" : "Right";
            directions.Add($"Tilted {side} ({Math.Abs(roll):F0}°)");
        }

        return string.Join(", ", directions);
    }

6.4 Visualizing Head Pose on Frame

    /// <summary>
    /// Draws head pose information with color-coded indicators.
    /// </summary>
    public void DrawHeadPoseInfo(Mat frame, HeadPose headPose, FaceRectangle faceRect)
    {
        double yaw = headPose.Yaw;
        double pitch = headPose.Pitch;
        double roll = headPose.Roll;

        int centerX = faceRect.Left + faceRect.Width / 2;
        int centerY = faceRect.Top + faceRect.Height / 2;

        string poseText = $"Yaw: {yaw:F1}° Pitch: {pitch:F1}° Roll: {roll:F1}°";
        Cv2.PutText(frame, poseText, new Point(faceRect.Left, faceRect.Top - 10),
            HersheyFonts.HersheySimplex, 0.5, new Scalar(255, 255, 255), 1);

        // Horizontal arrow (green) for yaw
        int arrowLength = 50;
        double yawRadians = yaw * Math.PI / 180.0;
        int arrowEndX = centerX + (int)(arrowLength * Math.Sin(yawRadians));
        Cv2.ArrowedLine(frame, new Point(centerX, centerY), new Point(arrowEndX, centerY),
            new Scalar(0, 255, 0), 2, tipLength: 0.3);

        // Vertical arrow (blue) for pitch; negated because image Y grows downward
        double pitchRadians = -pitch * Math.PI / 180.0;
        int arrowPitchEndY = centerY + (int)(arrowLength * Math.Sin(pitchRadians));
        Cv2.ArrowedLine(frame, new Point(centerX, centerY), new Point(centerX, arrowPitchEndY),
            new Scalar(255, 0, 0), 2, tipLength: 0.3);
    }

6.5 Detecting Head Orientation States

    public enum HeadOrientation
    {
        Forward, Left, Right, Up, Down,
        TiltedLeft, TiltedRight,
        UpLeft, UpRight, DownLeft, DownRight
    }

    public List<HeadOrientation> DetectHeadOrientation(HeadPose headPose)
    {
        const double THRESHOLD = 15.0;

        bool lookingUp = headPose.Pitch > THRESHOLD;
        bool lookingDown = headPose.Pitch < -THRESHOLD;
        bool lookingLeft = headPose.Yaw < -THRESHOLD;
        bool lookingRight = headPose.Yaw
            > THRESHOLD;

        var orientations = new List<HeadOrientation>();

        if (!lookingUp && !lookingDown && !lookingLeft && !lookingRight)
            orientations.Add(HeadOrientation.Forward);
        if (lookingUp && !lookingLeft && !lookingRight)
            orientations.Add(HeadOrientation.Up);
        if (lookingDown && !lookingLeft && !lookingRight)
            orientations.Add(HeadOrientation.Down);
        if (lookingLeft && !lookingUp && !lookingDown)
            orientations.Add(HeadOrientation.Left);
        if (lookingRight && !lookingUp && !lookingDown)
            orientations.Add(HeadOrientation.Right);
        if (lookingUp && lookingLeft)
            orientations.Add(HeadOrientation.UpLeft);
        if (lookingUp && lookingRight)
            orientations.Add(HeadOrientation.UpRight);
        if (lookingDown && lookingLeft)
            orientations.Add(HeadOrientation.DownLeft);
        if (lookingDown && lookingRight)
            orientations.Add(HeadOrientation.DownRight);

        return orientations;
    }

Part 7: Real-Time Video Processing

7.1 Setting Up Video Capture

    using Azure.AI.Vision.Face;
    using OpenCvSharp;

    public class RealTimeFaceAnalyzer : IDisposable
    {
        private VideoCapture? _capture;
        private Mat? _frame;
        private readonly FaceClient _faceClient;
        private bool _isRunning;

        // The client is injected so the readonly field is actually assigned
        public RealTimeFaceAnalyzer(FaceClient faceClient)
        {
            _faceClient = faceClient;
        }

        public async Task StartAsync()
        {
            _capture = new VideoCapture(0);
            _frame = new Mat();
            _isRunning = true;

            await Task.Run(() => ProcessVideoLoop());
        }

        private async Task ProcessVideoLoop()
        {
            while (_isRunning)
            {
                if (_capture == null || !_capture.IsOpened()) break;

                _capture.Read(_frame);
                if (_frame == null || _frame.Empty())
                {
                    await Task.Delay(1); // Minimal delay to prevent CPU spiking
                    continue;
                }

                Cv2.Resize(_frame, _frame, new Size(640, 480));

                // Ensure we don't await indefinitely in the rendering loop.
                // Note: the clone is owned by ProcessFrameAsync, which should dispose it;
                // anything drawn on the clone is not visible in the window shown below.
                _ = ProcessFrameAsync(_frame.Clone());

                Cv2.ImShow("Face Analysis", _frame);
                if (Cv2.WaitKey(30) == 'q') break;
            }

            Dispose();
        }

        private async Task ProcessFrameAsync(Mat frame)
        {
            // This is where your DrawFaceBox, DrawAllLandmarks, and EAR logic will sit.
            // Remember to use try-catch here to prevent API errors from crashing the loop.
        }

        public void Dispose()
        {
            _isRunning = false;
            _capture?.Dispose();
            _frame?.Dispose();
            Cv2.DestroyAllWindows();
        }
    }

7.2 Optimizing API Calls

Problem: Calling Azure Face API on every frame (30 fps) is expensive and slow.

Solution: Call the API once per second and reuse the cached result for the frames in between (about 30 frames at 30 fps).

    private List<FaceDetectionResult> _cachedFaces = new();
    private DateTime _lastDetectionTime = DateTime.MinValue;
    private readonly object _cacheLock = new();

    private async Task ProcessFrameAsync(Mat frame)
    {
        // UtcNow avoids jumps from daylight-saving or local clock changes
        if ((DateTime.UtcNow - _lastDetectionTime).TotalSeconds >= 1.0)
        {
            _lastDetectionTime = DateTime.UtcNow;

            byte[] imageBytes;
            Cv2.ImEncode(".jpg", frame, out imageBytes);

            var faces = await DetectFacesAsync(imageBytes);

            lock (_cacheLock)
            {
                _cachedFaces = faces;
            }
        }

        // Copy under the lock, then draw outside it
        List<FaceDetectionResult> facesToProcess;
        lock (_cacheLock)
        {
            facesToProcess = _cachedFaces.ToList();
        }

        foreach (var face in facesToProcess)
        {
            DrawFaceAnnotations(face, frame);
        }
    }

Performance Improvement:

- 30x fewer API calls (1/sec instead of 30/sec)
- ~$0.02/hour instead of ~$0.60/hour
- Smooth 30 fps rendering
- < 100ms latency for visual updates

7.3 Drawing Complete Face Annotations

    private void DrawFaceAnnotations(FaceDetectionResult face, Mat frame)
    {
        DrawFaceBox(face, frame);

        if (face.FaceLandmarks != null)
        {
            DrawAllLandmarks(face.FaceLandmarks, frame);
            DrawEyeBoxes(face.FaceLandmarks, frame);
            DrawMouthBox(face.FaceLandmarks, frame);
            DrawNoseBox(face.FaceLandmarks, frame);

            double leftEAR = ComputeEAR(face.FaceLandmarks, isLeftEye: true);
            double rightEAR = ComputeEAR(face.FaceLandmarks, isLeftEye: false);
            double avgEAR = (leftEAR + rightEAR) / 2.0;

            Cv2.PutText(frame, $"EAR: {avgEAR:F3}", new Point(10, 30),
                HersheyFonts.HersheySimplex, 0.6, new Scalar(0, 255, 0), 2);
        }

        if (face.FaceAttributes?.HeadPose != null)
        {
            DrawHeadPoseInfo(frame, face.FaceAttributes.HeadPose, face.FaceRectangle);

            string orientation = InterpretHeadPose(face.FaceAttributes.HeadPose.Yaw, face.FaceAttributes.HeadPose.Pitch, face.FaceAttributes.HeadPose.Roll);
Cv2.PutText(frame, orientation, new Point(10, 60), HersheyFonts.HersheySimplex, 0.6, new Scalar(255, 255, 0), 2); } } Part 8: Advanced Features and Use Cases 8.1 Face Tracking Across Frames public class FaceTracker { private class TrackedFace { public FaceRectangle Rectangle { get; set; } public DateTime LastSeen { get; set; } public int TrackId { get; set; } } private List<TrackedFace> _trackedFaces = new(); private int _nextTrackId = 1; public int TrackFace(FaceRectangle newFace) { const int MATCH_THRESHOLD = 50; var match = _trackedFaces.FirstOrDefault(tf => { double distance = Math.Sqrt(Math.Pow(tf.Rectangle.Left - newFace.Left, 2) + Math.Pow(tf.Rectangle.Top - newFace.Top, 2)); return distance < MATCH_THRESHOLD; }); if (match != null) { match.Rectangle = newFace; match.LastSeen = DateTime.Now; return match.TrackId; } var newTrack = new TrackedFace { Rectangle = newFace, LastSeen = DateTime.Now, TrackId = _nextTrackId++ }; _trackedFaces.Add(newTrack); return newTrack.TrackId; } public void RemoveOldTracks(TimeSpan maxAge) { _trackedFaces.RemoveAll(tf => DateTime.Now - tf.LastSeen > maxAge); } } 8.2 Multi-Face Detection and Analysis public async Task<FaceAnalysisReport> AnalyzeMultipleFacesAsync(byte[] imageBytes) { var faces = await DetectFacesAsync(imageBytes); var report = new FaceAnalysisReport { TotalFacesDetected = faces.Count, Timestamp = DateTime.Now, Faces = new List<SingleFaceAnalysis>() }; for (int i = 0; i < faces.Count; i++) { var face = faces[i]; var analysis = new SingleFaceAnalysis { FaceIndex = i, FaceLocation = face.FaceRectangle, FaceSize = face.FaceRectangle.Width * face.FaceRectangle.Height }; if (face.FaceLandmarks != null) { analysis.LeftEyeEAR = ComputeEAR(face.FaceLandmarks, true); analysis.RightEyeEAR = ComputeEAR(face.FaceLandmarks, false); analysis.InterPupillaryDistance = CalculateInterEyeDistance(face.FaceLandmarks); } if (face.FaceAttributes?.HeadPose != null) { analysis.HeadYaw = face.FaceAttributes.HeadPose.Yaw; 
analysis.HeadPitch = face.FaceAttributes.HeadPose.Pitch; analysis.HeadRoll = face.FaceAttributes.HeadPose.Roll; } report.Faces.Add(analysis); } report.Faces = report.Faces.OrderByDescending(f => f.FaceSize).ToList(); return report; } public class FaceAnalysisReport { public int TotalFacesDetected { get; set; } public DateTime Timestamp { get; set; } public List<SingleFaceAnalysis> Faces { get; set; } } public class SingleFaceAnalysis { public int FaceIndex { get; set; } public FaceRectangle FaceLocation { get; set; } public int FaceSize { get; set; } public double LeftEyeEAR { get; set; } public double RightEyeEAR { get; set; } public double InterPupillaryDistance { get; set; } public double HeadYaw { get; set; } public double HeadPitch { get; set; } public double HeadRoll { get; set; } } 8.3 Exporting Landmark Data to JSON using System.Text.Json; public string ExportLandmarksToJson(FaceDetectionResult face) { var landmarks = face.FaceLandmarks; var landmarkData = new { Face = new { Rectangle = new { face.FaceRectangle.Left, face.FaceRectangle.Top, face.FaceRectangle.Width, face.FaceRectangle.Height } }, Eyes = new { Left = new { Outer = new { landmarks.EyeLeftOuter.X, landmarks.EyeLeftOuter.Y }, Inner = new { landmarks.EyeLeftInner.X, landmarks.EyeLeftInner.Y }, Top = new { landmarks.EyeLeftTop.X, landmarks.EyeLeftTop.Y }, Bottom = new { landmarks.EyeLeftBottom.X, landmarks.EyeLeftBottom.Y } }, Right = new { Outer = new { landmarks.EyeRightOuter.X, landmarks.EyeRightOuter.Y }, Inner = new { landmarks.EyeRightInner.X, landmarks.EyeRightInner.Y }, Top = new { landmarks.EyeRightTop.X, landmarks.EyeRightTop.Y }, Bottom = new { landmarks.EyeRightBottom.X, landmarks.EyeRightBottom.Y } } }, Mouth = new { UpperLipTop = new { landmarks.UpperLipTop.X, landmarks.UpperLipTop.Y }, UnderLipBottom = new { landmarks.UnderLipBottom.X, landmarks.UnderLipBottom.Y }, Left = new { landmarks.MouthLeft.X, landmarks.MouthLeft.Y }, Right = new { landmarks.MouthRight.X, 
landmarks.MouthRight.Y } }, Nose = new { Tip = new { landmarks.NoseTip.X, landmarks.NoseTip.Y }, RootLeft = new { landmarks.NoseRootLeft.X, landmarks.NoseRootLeft.Y }, RootRight = new { landmarks.NoseRootRight.X, landmarks.NoseRootRight.Y } }, HeadPose = face.FaceAttributes?.HeadPose != null ? new { face.FaceAttributes.HeadPose.Yaw, face.FaceAttributes.HeadPose.Pitch, face.FaceAttributes.HeadPose.Roll } : null }; return JsonSerializer.Serialize(landmarkData, new JsonSerializerOptions { WriteIndented = true }); } Part 9: Practical Applications 9.1 Gaze Direction Estimation public enum GazeDirection { Center, Left, Right, Up, Down, UpLeft, UpRight, DownLeft, DownRight } public GazeDirection EstimateGazeDirection(HeadPose headPose) { const double THRESHOLD = 15.0; bool lookingUp = headPose.Pitch > THRESHOLD; bool lookingDown = headPose.Pitch < -THRESHOLD; bool lookingLeft = headPose.Yaw < -THRESHOLD; bool lookingRight = headPose.Yaw > THRESHOLD; if (lookingUp && lookingLeft) return GazeDirection.UpLeft; if (lookingUp && lookingRight) return GazeDirection.UpRight; if (lookingDown && lookingLeft) return GazeDirection.DownLeft; if (lookingDown && lookingRight) return GazeDirection.DownRight; if (lookingUp) return GazeDirection.Up; if (lookingDown) return GazeDirection.Down; if (lookingLeft) return GazeDirection.Left; if (lookingRight) return GazeDirection.Right; return GazeDirection.Center; } 9.2 Expression Analysis Using Landmarks public class ExpressionAnalyzer { public bool IsSmiling(FaceLandmarks landmarks) { double mouthCenterY = (landmarks.UpperLipTop.Y + landmarks.UnderLipBottom.Y) / 2; double leftCornerY = landmarks.MouthLeft.Y; double rightCornerY = landmarks.MouthRight.Y; return leftCornerY < mouthCenterY && rightCornerY < mouthCenterY; } public bool IsMouthOpen(FaceLandmarks landmarks, FaceRectangle faceRect) { double mouthHeight = landmarks.UnderLipBottom.Y - landmarks.UpperLipTop.Y; double mouthOpenRatio = mouthHeight / faceRect.Height; return mouthOpenRatio 
> 0.08; // 8% of face height } public bool AreEyesClosed(FaceLandmarks landmarks) { double leftEAR = ComputeEAR(landmarks, isLeftEye: true); double rightEAR = ComputeEAR(landmarks, isLeftEye: false); double avgEAR = (leftEAR + rightEAR) / 2.0; return avgEAR < 0.18; // Threshold for closed eyes } } 9.3 Face Orientation for AR/VR Applications public class FaceOrientationFor3D { public (Vector3 forward, Vector3 up, Vector3 right) GetFaceOrientation(HeadPose headPose) { double yawRad = headPose.Yaw * Math.PI / 180.0; double pitchRad = headPose.Pitch * Math.PI / 180.0; double rollRad = headPose.Roll * Math.PI / 180.0; var forward = new Vector3((float)(Math.Sin(yawRad) * Math.Cos(pitchRad)), (float)(-Math.Sin(pitchRad)), (float)(Math.Cos(yawRad) * Math.Cos(pitchRad))); var up = new Vector3((float)(Math.Sin(yawRad) * Math.Sin(pitchRad) * Math.Cos(rollRad) - Math.Cos(yawRad) * Math.Sin(rollRad)), (float)(Math.Cos(pitchRad) * Math.Cos(rollRad)), (float)(Math.Cos(yawRad) * Math.Sin(pitchRad) * Math.Cos(rollRad) + Math.Sin(yawRad) * Math.Sin(rollRad))); var right = Vector3.Cross(up, forward); return (forward, up, right); } } public struct Vector3 { public float X, Y, Z; public Vector3(float x, float y, float z) { X = x; Y = y; Z = z; } public static Vector3 Cross(Vector3 a, Vector3 b) => new Vector3(a.Y * b.Z - a.Z * b.Y, a.Z * b.X - a.X * b.Z, a.X * b.Y - a.Y * b.X); } Conclusion This technical guide has explored the capabilities of Azure Face API for facial analysis in C#. 
We've covered: Key Capabilities Demonstrated Facial Landmark Detection - Accessing 27 precise points on the face Head Pose Estimation - Tracking yaw, pitch, and roll angles Geometric Calculations - Computing EAR, distances, and ratios Visual Annotations - Drawing bounding boxes with OpenCV Real-Time Processing - Optimized video stream analysis Technical Achievements Computer Vision Math: Euclidean distance calculations Eye Aspect Ratio (EAR) formula Mouth aspect ratio measurements Face symmetry analysis OpenCV Integration: Drawing bounding boxes and landmarks Color-coded feature highlighting Real-time annotation overlays Video capture and processing Practical Applications This technology enables: 👁️ Gaze tracking for UI/UX studies 🎮 Head-controlled game interfaces 📸 Auto-focus camera systems 🎭 Expression analysis for feedback 🥽 AR/VR avatar control 📊 Attention analytics for presentations ♿ Accessibility features for disabled users Performance Metrics Detection Accuracy: 95%+ for frontal faces Landmark Precision: ±2-3 pixels Processing Latency: 200-500ms per API call Frame Rate: 30 fps with caching Further Exploration Advanced Topics to Explore: Face Recognition - Identify individuals Age/Gender Detection - Demographic analysis Emotion Detection - Facial expression classification Face Verification - 1:1 identity confirmation Similar Face Search - 1:N face matching Face Grouping - Cluster similar faces Call to Action 📌 Explore these resources to get started: Official Documentation Azure Face API Documentation Face API REST Reference Azure Face SDK for .NET Related Libraries OpenCVSharp - OpenCV wrapper for .NET System.Drawing - .NET image processing Source Code GitHub Repository: ravimodi_microsoft/SmartDriver Sample Code: Included in this article
ravimodi
Feb 26, 2026 Place Microsoft Developer Community Blog
Upcoming webinar: Maximize the Cost Efficiency of AI Agents on Azure
AI agents are quickly becoming central to how organizations automate work, engage customers, and unlock new insights. But as adoption accelerates, so do questions about cost, ROI, and long-term sustainability. That’s exactly what the Maximize the Cost Efficiency of AI Agents on Azure webinar is designed to address. The webinar will provide practical guidance on building and scaling AI agents on Azure with financial discipline in mind. Rather than focusing only on technology, the session helps learners connect AI design decisions to real business outcomes, covering everything from identifying high-impact use cases and understanding cost drivers to forecasting ROI. Whether you’re just starting your AI journey or expanding AI agents across the enterprise, the session will equip you with strategies to make informed, cost-conscious decisions at every stage, from architecture and model selection to ongoing optimization and governance. Who should attend? If you are in one of these roles and are a decision maker, can influence decision makers on AI, or need to show ROI metrics on AI, this session is for you: Developer, Administrator, Solution Architect, AI Engineer, Business Analyst, Business User, Technology Manager. Why attend the webinar? In the webinar, you’ll hear how to translate theory into real-world scenarios, walk through common cost pitfalls, and see how organizations are applying these principles in practice. Most importantly, the webinar helps you connect the dots faster, turning what you’ve learned into actionable insights you can apply immediately. You can also ask questions live and gain clarity on how to maximize ROI while scaling AI responsibly. If you care about building AI agents that are not only innovative but also efficient, governable, and financially sustainable, this training, and the webinar that complements it, are well worth your time.
Register today for the free webinar on March 5, 2026, 8:00 AM - 9:00 AM (UTC-08:00) Pacific Time (US & Canada). Who will speak at the webinar? Your speakers will be: Carlotta Castelluccio: Carlotta is a Senior AI Advocate with the mission of helping every developer succeed with AI by building innovative solutions responsibly. To achieve this goal, she develops technical content and hosts skilling sessions, enabling her audience to get the most out of AI technologies and to have an impact on Microsoft AI products’ roadmap. Nitya Narasimhan: Nitya is a PhD and Polyglot with 25+ years of software research & development experience spanning mobile, web, cloud and AI. She is an innovator (12+ patents), a visual storyteller (@sketchtedocs), and an experienced community builder in the Greater New York area. As a senior AI Advocate on the Core AI Developer Relations team, she acts as "developer 0" for the Microsoft Foundry platform, providing product feedback and empowering AI developers to build trustworthy AI solutions with code samples, open-source curricula and content-initiatives like Model Mondays. Prior to joining Microsoft, she spent a decade in Motorola Labs working on ubiquitous & mobile computing research, founded Google Developer Groups in New York, and consulted for startups building real-time experiences for enterprise. Her current interests span Model understanding & customization, E2E Observability & Safety, and agentic AI workflows for maintainable software. Moderator Lee Stott is a Principal Cloud Advocate at Microsoft, working in the Core AI Developer Relations Team. He helps developers and organizations build responsibly with AI and cloud technologies through open-source projects, technical guidance, and global developer programs. Based in the UK, Lee brings deep hands-on experience across AI, Azure, and developer tooling.
carlottacaste
Feb 18, 2026 Place Microsoft Developer Community Blog
Complete Guide to Deploying OpenClaw on Azure Windows 11 Virtual Machine
1. Introduction to OpenClaw OpenClaw is an open-source AI personal assistant platform that runs on your own devices and executes real-world tasks. Unlike traditional cloud-based AI assistants, OpenClaw emphasizes local deployment and privacy protection, giving you complete control over your data. Key Features of OpenClaw Cross-Platform Support: Runs on Windows, macOS, Linux, and other operating systems Multi-Channel Integration: Interact with AI through messaging platforms like WhatsApp, Telegram, and Discord Task Automation: Execute file operations, browser control, system commands, and more Persistent Memory: AI remembers your preferences and contextual information Flexible AI Backends: Supports multiple large language models including Anthropic Claude and OpenAI GPT OpenClaw is built on Node.js and can be quickly installed and deployed via npm. 2. Security Advantages of Running OpenClaw on Azure VM Deploying OpenClaw on an Azure virtual machine instead of your personal computer offers significant security benefits: 1. Environment Isolation Azure VMs provide a completely isolated runtime environment. Even if the AI agent exhibits abnormal behavior or is maliciously exploited, it won't affect your personal computer or local data. This isolation mechanism forms the foundation of a zero-trust security architecture. 2. Network Security Controls Through Azure Network Security Groups (NSGs), you can precisely control which IP addresses can access your virtual machine. The RDP rules configured in the deployment script allow you to securely connect to your Windows 11 VM via Remote Desktop while enabling further restrictions on access sources. 3. Data Persistence and Backup Azure VM managed disks support automatic snapshots and backups. Even if the virtual machine encounters issues, your OpenClaw configuration and data remain safe. 4. 
Elastic Resource Management You can adjust VM specifications (memory, CPU) at any time based on actual needs, or stop the VM when not in use to save costs, maintaining maximum flexibility. 5. Enterprise-Grade Authentication Azure supports integration with Microsoft Entra ID (formerly Azure Active Directory) for identity verification, allowing you to assign different access permissions to team members for granular access control. 6. Audit and Compliance Azure provides detailed activity logs and audit trails, making it easy to trace any suspicious activity and meet enterprise compliance requirements. 3. Deployment Steps Explained This deployment script uses Azure CLI to automate the installation of OpenClaw and its dependencies on a Windows 11 virtual machine. Here are the detailed execution steps: Prerequisites Before running the script, complete the following: 1. Install Azure CLI # Windows users can download the MSI installer https://aka.ms/installazurecliwindows # macOS users brew install azure-cli # Linux users (Debian/Ubuntu) curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash 2. Log in to Azure Account az login 3.
Prepare Deployment Script Save the provided deploy-windows11-vm.sh script locally and grant execute permissions: chmod +x deploy-windows11-vm.sh Step 1: Configure Deployment Parameters The script begins by defining key configuration variables that you can modify as needed: RESOURCE_GROUP="Your Azure Resource Group Name" # Resource group name VM_NAME="win11-openclaw-vm" # Virtual machine name LOCATION="Your Azure Region Name" # Azure region ADMIN_USERNAME="Your Azure VM Administrator Name" # Administrator username ADMIN_PASSWORD="Your Azure VM Administrator Password" # Administrator password (change to a strong password) VM_SIZE="Your Azure VM Size" # VM size (for example Standard_B2s, 4GB memory) Security Recommendations: Always change ADMIN_PASSWORD to your own strong password. Passwords should contain uppercase and lowercase letters, numbers, and special characters. Never commit scripts containing real passwords to code repositories. Step 2: Check and Create Resource Group The script first checks whether the specified resource group exists, and creates it automatically if it doesn't: echo "Checking resource group $RESOURCE_GROUP..." az group show --name "$RESOURCE_GROUP" &> /dev/null if [ $? -ne 0 ]; then echo "Creating resource group $RESOURCE_GROUP..." az group create --name "$RESOURCE_GROUP" --location "$LOCATION" fi A resource group is a logical container in Azure used to organize and manage related resources. All associated resources (VMs, networks, storage, etc.) will be created within this resource group.
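The resource group check in Step 2 can also be written as a small function that tests the exit status of az group show directly instead of inspecting $? afterwards. This is a minimal sketch rather than the script's actual code: ensure_resource_group is a hypothetical helper name, and it assumes the Azure CLI is installed and already logged in.

```shell
#!/usr/bin/env bash
# Sketch: create the resource group only when it does not exist yet.
# ensure_resource_group is a hypothetical helper, not part of the original script.
ensure_resource_group() {
  local group="$1" location="$2"
  # az group show exits non-zero when the group is missing.
  if az group show --name "$group" > /dev/null 2>&1; then
    echo "Resource group $group already exists."
  else
    echo "Creating resource group $group..."
    az group create --name "$group" --location "$location" > /dev/null
  fi
}

# Example call (commented out so that sourcing this file has no side effects):
# ensure_resource_group "$RESOURCE_GROUP" "$LOCATION"
```

Quoting the arguments and branching on the command itself keeps the step safe to rerun: a second run reports that the group exists instead of failing.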
Step 3: Create Windows 11 Virtual Machine This is the core step, using the az vm create command to create a Windows 11 Pro virtual machine: az vm create \ --resource-group $RESOURCE_GROUP \ --name $VM_NAME \ --image MicrosoftWindowsDesktop:windows-11:win11-24h2-pro:latest \ --size $VM_SIZE \ --admin-username $ADMIN_USERNAME \ --admin-password $ADMIN_PASSWORD \ --public-ip-sku Standard \ --nsg-rule RDP Parameter Explanations: --image: Uses the latest Windows 11 24H2 Professional edition image --size: Standard_B2s provides 2 vCPUs and 4GB memory, suitable for running OpenClaw --public-ip-sku Standard: Assigns a standard public IP --nsg-rule RDP: Automatically creates network security group rules allowing RDP (port 3389) inbound traffic Step 4: Retrieve Virtual Machine Public IP After VM creation completes, the script retrieves its public IP address: PUBLIC_IP=$(az vm show -d -g $RESOURCE_GROUP -n $VM_NAME --query publicIps -o tsv) echo "VM Public IP: $PUBLIC_IP" This IP address will be used for subsequent RDP remote connections. Step 5: Install Chocolatey Package Manager Using az vm run-command to execute PowerShell scripts inside the VM, first installing Chocolatey: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString( 'https://community.chocolatey.org/install.ps1'))" Chocolatey is a package manager for Windows, similar to apt or yum on Linux, simplifying subsequent software installations. 
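Since the same az vm run-command invocation is repeated for each install step, it may help to wrap it once. The following is a minimal sketch under stated assumptions: run_in_vm is a hypothetical helper name, and RESOURCE_GROUP and VM_NAME are the variables from Step 1.

```shell
#!/usr/bin/env bash
# Sketch: wrap the repeated run-command invocation in one function.
# run_in_vm is a hypothetical helper; RESOURCE_GROUP and VM_NAME come from Step 1.
run_in_vm() {
  local ps_script="$1"
  az vm run-command invoke \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --command-id RunPowerShellScript \
    --scripts "$ps_script"
}

# Example (single quotes keep bash from touching the PowerShell text):
# run_in_vm 'C:\ProgramData\chocolatey\bin\choco.exe install git -y'
```

Passing the PowerShell text as one quoted argument keeps each later step down to a single readable line.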
Step 6: Install Git Git is a dependency for many npm packages, especially those that need to download source code from GitHub for compilation: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "C:\ProgramData\chocolatey\bin\choco.exe install git -y" Step 7: Install CMake and Visual Studio Build Tools Some of OpenClaw's native modules require compilation, which means installing a C++ build toolchain: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "C:\ProgramData\chocolatey\bin\choco.exe install cmake visualstudio2022buildtools visualstudio2022-workload-vctools -y" Component Descriptions: cmake: Cross-platform build system visualstudio2022buildtools: VS 2022 Build Tools visualstudio2022-workload-vctools: C++ development toolchain Step 8: Install Node.js LTS Install the Node.js Long Term Support version, which is the core runtime environment for OpenClaw: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "\$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); C:\ProgramData\chocolatey\bin\choco.exe install nodejs-lts -y" The script first refreshes PATH from the machine and user environment, then installs Node.js LTS. The PowerShell variable is written as \$env:Path so that bash passes it through to the VM instead of expanding it locally. Step 9: Globally Install OpenClaw Use npm to globally install OpenClaw: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "\$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); npm install -g openclaw" Global installation makes the openclaw command available from anywhere in the system.
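PowerShell variables such as $env:Path need protecting from bash whenever the script text sits inside double quotes, because bash would otherwise expand $env itself (normally to an empty string). One alternative to escaping every dollar sign is a quoted heredoc, sketched below. REFRESH_PATH and with_fresh_path are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: keep PowerShell text away from bash expansion with a quoted heredoc.
# REFRESH_PATH and with_fresh_path are hypothetical names for illustration.
# The quoted delimiter ('PS') disables all bash expansion inside the heredoc.
REFRESH_PATH="$(cat <<'PS'
$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User')
PS
)"

# Prefix a PowerShell command with the PATH refresh, separated by ';'.
with_fresh_path() {
  printf '%s; %s' "$REFRESH_PATH" "$1"
}

# Example: print the combined script that would be passed to --scripts.
with_fresh_path 'npm install -g openclaw'
echo
```

The printed string starts with the literal $env:Path assignment, which is what PowerShell needs to receive on the VM.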
Step 10: Configure Environment Variables Add Node.js and npm global paths to the system PATH environment variable: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts " \$npmGlobalPath = 'C:\Program Files\nodejs'; \$npmUserPath = [System.Environment]::GetFolderPath('ApplicationData') + '\npm'; \$currentPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine'); if (\$currentPath -notlike \"*\$npmGlobalPath*\") { \$newPath = \$currentPath + ';' + \$npmGlobalPath; [System.Environment]::SetEnvironmentVariable('Path', \$newPath, 'Machine'); Write-Host 'Added Node.js path to system PATH'; } if (\$currentPath -notlike \"*\$npmUserPath*\") { \$newPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine') + ';' + \$npmUserPath; [System.Environment]::SetEnvironmentVariable('Path', \$newPath, 'Machine'); Write-Host 'Added npm global path to system PATH'; } Write-Host 'Environment variables updated successfully!'; " Because the PowerShell text sits inside bash double quotes, each PowerShell variable is escaped as \$name so that bash does not expand it before the script reaches the VM. This step ensures that node, npm, and openclaw commands can be used directly even in new terminal sessions. Step 11: Verify Installation Finally, the script verifies that all software is correctly installed: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "\$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); Write-Host 'Node.js version:'; node --version; Write-Host 'npm version:'; npm --version; Write-Host 'openclaw:'; npm list -g openclaw" Successful output should look similar to: Node.js version: v20.x.x npm version: 10.x.x openclaw: openclaw@x.x.x Step 12: Connect to Virtual Machine After deployment completes, the script outputs connection information: ============================================ Deployment completed!
============================================ Resource Group: Your Azure Resource Group Name VM Name: win11-openclaw-vm Public IP: xx.xx.xx.xx Admin Username: Your Administrator UserName VM Size: Your VM Size Connect via RDP: mstsc /v:xx.xx.xx.xx ============================================ Connection Methods: Windows Users: Press Win + R to open Run dialog Enter mstsc /v:public_ip and press Enter Log in using the username and password set in the script macOS Users: Download "Windows App" from the App Store Add PC connection with the public IP Log in using the username and password set in the script Linux Users: # Use Remmina or xfreerdp xfreerdp /u:username /v:public_ip Step 13: Initialize OpenClaw After connecting to the VM, run the following in PowerShell or Command Prompt # Initialize OpenClaw openclaw onboard # Configure AI model API key # Edit configuration file: C:\Users\username\.openclaw\openclaw.json notepad $env:USERPROFILE\.openclaw\openclaw.json Add your AI API key in the configuration file: { "agents": { "defaults": { "model": "Your Model Name", "apiKey": "your-api-key-here" } } } Step 14: Start OpenClaw # Start Gateway service openclaw gateway # In another terminal, connect messaging channels (e.g., WhatsApp) openclaw channels login Follow the prompts to scan the QR code and connect OpenClaw to your messaging app. 4. Summary Through this guide, we've successfully implemented the complete process of automatically deploying OpenClaw on an Azure Windows 11 virtual machine. The entire deployment process is highly automated, completing everything from VM creation to installing all dependencies and OpenClaw itself through a single script. 
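Because the whole deployment runs as one script, it can be useful to stop at the first failing step with a clear message. The following is a minimal sketch of that idea, not the original script's structure: run_step is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: run each deployment step through one wrapper so that a failure
# is reported by name. run_step is a hypothetical helper.
run_step() {
  local title="$1"
  shift
  echo "==> $title"
  # "$@" is the command for this step, executed with its arguments intact.
  if ! "$@"; then
    echo "Step failed: $title" >&2
    return 1
  fi
}

# Example with a harmless placeholder command:
run_step "Check shell" true
```

In a deployment script, a call might look like run_step "Create VM" az vm create ... || exit 1, so that later steps never run against a VM that was not created.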
Key Takeaways Automation Benefits: Using az vm run-command allows executing configuration scripts immediately after VM creation without manual RDP login Dependency Management: Chocolatey simplifies the Windows package installation workflow Environment Isolation: Running AI agents on cloud VMs protects local computers and data Scalability: Scripted deployment facilitates replication and team collaboration, easily deploying multiple instances Cost Optimization Tips Standard_B2s VMs cost approximately $0.05/hour (~$37/month) on pay-as-you-go pricing When not in use, stop the VM to only pay for storage costs Consider Azure Reserved Instances to save up to 72% Security Hardening Recommendations Change Default Port: Modify RDP port from 3389 to a custom port Enable JIT Access: Use Azure Security Center's just-in-time access feature Configure Firewall Rules: Only allow specific IP addresses to access Regular System Updates: Enable automatic Windows Updates Use Azure Key Vault: Store API keys in Key Vault instead of configuration files 5. Additional Resources Official Documentation OpenClaw Website: https://openclaw.ai OpenClaw GitHub: https://github.com/openclaw/openclaw OpenClaw Documentation: https://docs.openclaw.ai Azure CLI Documentation: https://docs.microsoft.com/cli/azure/ Azure Resources Azure VM Pricing Calculator: https://azure.microsoft.com/pricing/calculator/ Azure Free Account: https://azure.microsoft.com/free/ (new users receive $200 credit) Azure Security Center: https://azure.microsoft.com/services/security-center/ Azure Key Vault: https://azure.microsoft.com/services/key-vault/
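On the cost tip about stopping the VM: compute charges stop only once the VM is deallocated, whereas shutting Windows down from inside the guest leaves it allocated. A minimal sketch of the two commands follows. stop_vm and start_vm are hypothetical wrapper names, and both assume a logged-in Azure CLI.

```shell
#!/usr/bin/env bash
# Sketch: deallocate the VM when idle and start it again when needed.
# stop_vm and start_vm are hypothetical wrapper names.
stop_vm() {
  # Deallocating releases the compute resources; disks are still billed.
  az vm deallocate --resource-group "$1" --name "$2"
}

start_vm() {
  az vm start --resource-group "$1" --name "$2"
}

# Example calls (commented out to avoid side effects):
# stop_vm "$RESOURCE_GROUP" "$VM_NAME"
# start_vm "$RESOURCE_GROUP" "$VM_NAME"
```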
kinfey
Feb 12, 2026 Place Microsoft Developer Community Blog
Adding AI Personality to Browser Games
Introduction Browser games traditionally follow predictable patterns, fixed text messages, static tutorials, scripted NPC responses. Players see the same "Game Over" message whether they nearly won or failed spectacularly. Tutorial text remains identical regardless of player skill level. The game experience, while fun, lacks the dynamic reactivity of human-moderated gameplay. What if your Space Invaders game could comment on gameplay in real-time? Taunt players when they miss easy shots? Celebrate close victories with personalized messages? Adjust difficulty suggestions based on actual performance metrics? This article demonstrates exactly that: integrating AI-powered dynamic commentary into a browser game using Spaceinvaders-FoundryLocal, vanilla JavaScript, and Microsoft Foundry Local. You'll learn how to integrate local AI into client-side games, design AI personality systems that enhance rather than distract, implement context-aware commentary generation, and architect optional AI features that don't break core gameplay when unavailable. Whether you're building educational games, interactive training simulations, or simply adding personality to entertainment projects, this approach provides a blueprint for AI-enhanced gaming experiences. Why Local AI Transforms Browser Gaming Adding AI to games sounds expensive, cloud API costs scale with player counts, introducing per-gameplay pricing that makes free-to-play models challenging. Privacy concerns emerge when gameplay data leaves user devices. Latency affects real-time experiences, waiting 2 seconds for commentary after an action breaks immersion. Network requirements exclude offline play. Local AI solves all these challenges simultaneously. Foundry Local runs Small Language Models (SLMs) entirely on player devices, no API costs, no data leaving the machine, no network dependency. Inference happens in milliseconds, enabling truly real-time responses. 
Games work offline after initial load, perfect for mobile or low-connectivity scenarios. SLMs excel at personality-driven tasks like game commentary. They don't need perfect factual recall or complex reasoning, they generate entertaining, contextually relevant text based on game state. A 1.5B parameter model produces engaging taunts and celebration messages indistinguishable from hand-written content, while running easily on mid-range laptops. Integrating AI as an optional enhancement demonstrates good architecture. Core gameplay must function perfectly without AI, commentary enhances the experience but failure doesn't break the game. This graceful degradation pattern ensures maximum compatibility while offering AI features to capable devices. Architecture: Progressive Enhancement with AI The Spaceinvaders-FoundryLocal implementation uses progressive enhancement, the game fully works without AI, but adds dynamic personality when available: The base game implements classic Space Invaders mechanics entirely in vanilla JavaScript. Player ship movement, bullet physics, enemy patterns, collision detection, scoring, and power-up systems all operate independently of AI. This ensures universal compatibility across browsers, devices, and network conditions. The AI layer adds dynamic commentary through a backend Node.js proxy. The proxy runs locally, communicates with Foundry Local, and provides game context to the AI for generating personalized messages. The game polls the proxy periodically, sending current game state (score, accuracy, wave number, power-up usage) and receiving commentary responses. 
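To make the game-state payload concrete, here is a minimal sketch that builds the JSON context from a terminal. build_game_context is a hypothetical helper, the field names mirror the state listed above (event, score, accuracy, wave), and the commented curl call assumes a proxy listening on localhost port 3001 with an /api/comment route.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON game context that would be sent to the local proxy.
# build_game_context is a hypothetical helper; numeric fields are integers.
build_game_context() {
  local event="$1" score="$2" accuracy="$3" wave="$4"
  printf '{"event":"%s","score":%d,"accuracy":%d,"wave":%d}' \
    "$event" "$score" "$accuracy" "$wave"
}

# Example: print the payload for a completed wave.
build_game_context wave_complete 2500 78 3
echo

# With the proxy running, the payload could be posted like this
# (host, port, and route are assumptions about the local setup):
# curl -s -X POST http://localhost:3001/api/comment \
#   -H 'Content-Type: application/json' \
#   -d "$(build_game_context wave_complete 2500 78 3)"
```

Exercising the proxy this way, outside the browser, is a quick check that commentary generation works before wiring it into the game loop.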
The architecture flow for AI-enhanced gameplay: Player Action (e.g., destroys enemy) ↓ Game Updates State (score += 100, accuracy tracked) ↓ Game Checks AI Status (polling every 5 seconds) ↓ If AI Available: Send Game Context to Backend → { event: 'wave_complete', score: 2500, accuracy: 78%, wave: 3 } ↓ Backend builds prompt with context ↓ Foundry Local generates comment ↓ Return commentary to game → "Wave 3 conquered! Your 78% accuracy shows improving skills." ↓ Display in game UI (animated text bubble) This design demonstrates several key patterns: Zero-dependency core: Game playable immediately, AI adds value incrementally Graceful degradation: If AI unavailable, game shows generic messages Asynchronous enhancement: AI runs in background, never blocks gameplay Context-aware generation: Commentary reflects actual player performance Local-first architecture: Everything runs on player's machine—no servers, no tracking Implementing Context-Aware AI Commentary Effective game commentary requires understanding current gameplay context. 
The AI needs to know what just happened, how the player is performing, and what makes this moment interesting: // llm.js - AI integration module export class GameAI { constructor() { this.baseURL = 'http://localhost:3001'; // Local proxy server this.available = false; this.checkAvailability(); } async checkAvailability() { try { // fetch() has no timeout option, so abort via AbortSignal after 2 seconds const response = await fetch(`${this.baseURL}/health`, { method: 'GET', signal: AbortSignal.timeout(2000) }); this.available = response.ok; return this.available; } catch (error) { console.log('AI server not available (optional feature)'); this.available = false; return false; } } async generateComment(gameContext) { if (!this.available) { return this.getFallbackComment(gameContext.event); } try { const response = await fetch(`${this.baseURL}/api/comment`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(gameContext) }); if (!response.ok) { throw new Error('AI request failed'); } const data = await response.json(); return data.comment; } catch (error) { console.error('AI comment generation failed:', error); return this.getFallbackComment(gameContext.event); } } getFallbackComment(event) { // Static messages when AI unavailable const fallbacks = { 'wave_complete': 'Wave cleared!', 'player_hit': 'Shields damaged!', 'game_over': 'Game Over. Try again!', 'high_score': 'New high score!', 'power_up': 'Power-up collected!'
}; return fallbacks[event] || 'Good job!'; } } The backend processes game context and generates contextually relevant commentary: // server.js - Node.js backend proxy import express from 'express'; import cors from 'cors'; import { FoundryLocalClient } from 'foundry-local-sdk'; const app = express(); const foundry = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT || 'http://127.0.0.1:5272' }); app.use(express.json()); app.use(cors()); // Allow browser game to connect app.get('/health', (req, res) => { res.json({ status: 'AI available', model: 'phi-3.5-mini' }); }); app.post('/api/comment', async (req, res) => { const { event, score, accuracy, wave, lives, combo, power_up_type } = req.body; // Build context-rich prompt const prompt = buildCommentPrompt(event, { score, accuracy, wave, lives, combo, power_up_type }); try { const completion = await foundry.chat.completions.create({ model: 'phi-3.5-mini', messages: [ { role: 'system', content: `You are an AI commander providing brief, encouraging commentary for a Space Invaders game. Be energetic, supportive, and sometimes humorous. Keep responses to 1-2 sentences maximum. Reference specific game metrics when relevant.` }, { role: 'user', content: prompt } ], temperature: 0.9, // High temperature for creative variety max_tokens: 50 }); const comment = completion.choices[0].message.content.trim(); res.json({ comment, model: 'phi-3.5-mini', timestamp: new Date().toISOString() }); } catch (error) { console.error('AI generation error:', error); res.status(500).json({ error: 'Commentary generation failed' }); } }); function buildCommentPrompt(event, context) { switch(event) { case 'wave_complete': return `The player just completed wave ${context.wave} with score ${context.score}. Their shooting accuracy is ${context.accuracy}%. ${context.lives} lives remaining. Generate an encouraging comment about their progress.`; case 'player_hit': return `The player got hit by an enemy! They now have ${context.lives} lives left. Score: ${context.score}. 
Provide a brief motivational comment to keep them engaged.`; case 'game_over': if (context.accuracy > 70) { return `Game over at wave ${context.wave}, score ${context.score}. The player had ${context.accuracy}% accuracy - pretty good! Generate an encouraging comment acknowledging their skill.`; } else { return `Game over at wave ${context.wave}, score ${context.score}. Accuracy was ${context.accuracy}%. Provide a supportive comment with a tip for improvement.`; } case 'combo_streak': return `Player achieved a ${context.combo}x combo streak! Score: ${context.score}. Generate an excited celebration comment.`; case 'power_up_used': return `Player activated a ${context.power_up_type} power-up. Generate a brief tactical comment about using it effectively.`; default: return `General gameplay comment. Score: ${context.score}, Wave: ${context.wave}.`; } } const PORT = 3001; app.listen(PORT, () => { console.log(`✓ Game AI server running on http://localhost:${PORT}`); console.log(`✓ Foundry Local endpoint: ${process.env.FOUNDRY_LOCAL_ENDPOINT || 'http://127.0.0.1:5272'}`); }); This backend demonstrates several best practices: Context-sensitive prompting: Different events get different prompt templates with relevant metrics Personality consistency: The system message establishes tone and style guidelines Brevity constraints: max_tokens: 50 caps comment length so it doesn't overwhelm the UI Creative variety: High temperature (0.9) produces diverse commentary on repeated events Performance-aware feedback: Comments adapt based on accuracy, lives remaining, combo streaks Integrating AI into Game Loop Without Performance Impact Action games typically target 60 FPS, so any blocking operation in the frame loop shows up as stutter. AI integration must be completely asynchronous and non-blocking: // game.js - Main game loop class SpaceInvadersGame { constructor() { this.ai = new GameAI(); this.lastAIUpdate = 0; this.aiUpdateInterval = 5000; // Poll AI every 5 seconds this.pendingAIRequest = false; // ... 
other game state } update(deltaTime) { // Core game logic (always runs) this.updatePlayer(deltaTime); this.updateEnemies(deltaTime); this.updateBullets(deltaTime); this.checkCollisions(); this.updatePowerUps(deltaTime); // AI commentary (optional, async) this.updateAI(deltaTime); } updateAI(deltaTime) { this.lastAIUpdate += deltaTime; // Only check AI periodically, never block gameplay if (this.lastAIUpdate >= this.aiUpdateInterval && !this.pendingAIRequest) { this.requestAICommentary(); } } async requestAICommentary() { // Check if there's an interesting event to comment on const event = this.getSignificantEvent(); if (!event) return; this.pendingAIRequest = true; // Fire-and-forget async request this.ai.generateComment({ event: event.type, score: this.score, accuracy: this.calculateAccuracy(), wave: this.currentWave, lives: this.lives, combo: this.comboMultiplier }) .then(comment => { this.displayAIComment(comment); this.lastAIUpdate = 0; }) .catch(error => { console.log('AI comment failed (non-critical):', error); }) .finally(() => { this.pendingAIRequest = false; }); } getSignificantEvent() { // Determine what's worth commenting on if (this.justCompletedWave) { this.justCompletedWave = false; return { type: 'wave_complete' }; } if (this.justGotHit) { this.justGotHit = false; return { type: 'player_hit' }; } if (this.comboMultiplier >= 5) { return { type: 'combo_streak' }; } return null; // Nothing interesting right now } displayAIComment(comment) { // Show comment in animated text bubble const bubble = document.createElement('div'); bubble.className = 'ai-comment-bubble'; bubble.textContent = comment; document.getElementById('game-container').appendChild(bubble); // Animate in setTimeout(() => bubble.classList.add('show'), 50); // Remove after 4 seconds setTimeout(() => { bubble.classList.remove('show'); setTimeout(() => bubble.remove(), 500); }, 4000); } calculateAccuracy() { if (this.shotsFired === 0) return 0; return Math.round((this.shotsHit / 
this.shotsFired) * 100); } } This integration pattern ensures: Zero gameplay impact: AI runs completely asynchronously—game never waits for AI Periodic updates only: Check AI every 5 seconds, not every frame (60 FPS → minimal CPU overhead) Event-driven commentary: Only request comments for significant moments, not continuous chatter Non-blocking display: Comments appear as animated overlays that don't interrupt gameplay Graceful failure: AI errors logged but never shown to players—game continues normally Designing AI Personality Systems Effective game AI has consistent personality that enhances rather than distracts. The system message establishes tone, response templates ensure variety, and context awareness makes commentary relevant: // Enhanced system message for consistent personality const AI_COMMANDER_PERSONALITY = ` You are AEGIS, an AI defense commander providing tactical commentary for a Space Invaders-style game. Your personality traits: - Enthusiastic but professional military commander tone - Celebrate victories with tactical language ("Excellent flanking maneuver!") - Acknowledge defeats with constructive feedback ("Regroup and maintain formation!") - Reference specific metrics to show you're paying attention - Keep responses to 1-2 sentences maximum - Use occasional humor but stay in character - Be encouraging even when player struggles Examples of your style: - "Wave neutralized! Your 85% accuracy shows precision targeting." - "Shield integrity compromised! Fall back and reassess the battlefield." - "Impressive combo multiplier! Sustained fire superiority achieved." - "That power-up spread pattern cleared the sector perfectly." `; // Context-aware response variety const RESPONSE_TEMPLATES = { wave_complete: { high_performance: [ "Your {accuracy}% accuracy led to decisive victory, Commander!", "Wave {wave} eliminated with tactical excellence!", "Strategic brilliance! {accuracy}% hit rate maintained." ], medium_performance: [ "Wave {wave} cleared. 
Solid tactics, Commander.", "Sector secured. Your {accuracy}% accuracy shows improvement potential.", "Objective achieved. Recommend tightening shot discipline." ], low_performance: [ "Wave {wave} cleared, but {accuracy}% accuracy needs work.", "Victory secured. Focus on accuracy in next engagement.", "Mission accomplished, though your hit rate needs improvement." ] }, player_hit: { lives_critical: [ "Critical damage! Only {lives} lives remain - exercise extreme caution!", "Shields failing! {lives} backup systems active.", "Red alert! Hull integrity at {lives} units." ], lives_okay: [ "Shields damaged. {lives} lives remaining. Stay focused!", "Hit sustained. {lives} backup systems online.", "Damage taken. Maintain defensive posture." ] } }; function selectResponseTemplate(event, context) { const templates = RESPONSE_TEMPLATES[event]; if (!templates) return; // Choose template category based on context let category; if (event === 'wave_complete') { if (context.accuracy >= 75) category = templates.high_performance; else if (context.accuracy >= 50) category = templates.medium_performance; else category = templates.low_performance; } else if (event === 'player_hit') { category = context.lives <= 2 ? 
templates.lives_critical : templates.lives_okay; } // Randomly select from category for variety const template = category[Math.floor(Math.random() * category.length)]; // Fill in context variables return template .replace('{accuracy}', context.accuracy) .replace('{wave}', context.wave) .replace('{lives}', context.lives); } This personality system creates: Consistent character: AEGIS always sounds like a military commander, never breaks character Context-appropriate responses: Different situations trigger different tones (celebration vs concern) Natural variety: Template randomization prevents repetitive commentary Metric awareness: Specific references to accuracy, lives, waves show AI is "watching" Encouraging feedback: Even in failure scenarios, provides constructive guidance Key Takeaways and Game AI Design Patterns Integrating AI into browser games demonstrates that advanced features don't require cloud services or complex infrastructure. Local AI enables personality-driven enhancements that run entirely on player devices, cost nothing at scale, and work offline. 
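One detail of the template helper above is worth a small sketch. `String.prototype.replace` with a string pattern substitutes only the first match, so a template that mentions `{accuracy}` twice would keep its second placeholder. The version below is a minimal sketch, not code from the repository: `fillTemplate` and `pickWaveCategory` are hypothetical names, and the 75/50 accuracy thresholds are copied from the helper above.

```javascript
// Hypothetical helpers for illustration; not part of the original repository.

// Replace every {name} placeholder with the matching context value.
// Unknown placeholders are left as-is so missing data is easy to spot.
function fillTemplate(template, context) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(context, key)
      ? String(context[key])
      : match
  );
}

// Map accuracy to a template category using the thresholds shown above.
function pickWaveCategory(accuracy) {
  if (accuracy >= 75) return 'high_performance';
  if (accuracy >= 50) return 'medium_performance';
  return 'low_performance';
}

const line = fillTemplate(
  'Wave {wave} cleared at {accuracy}%. Keep {accuracy}% or better!',
  { wave: 3, accuracy: 78 }
);
console.log(pickWaveCategory(78), '-', line);
```

Keeping the category choice separate from the string filling also makes both pieces easy to unit test without a running game.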
Essential principles for game AI integration: Progressive enhancement architecture: Core gameplay must work perfectly without AI; commentary enhances but isn't required Asynchronous-only integration: Never block the game loop for AI; smooth 60 FPS gameplay is non-negotiable Context-aware generation: Commentary that reflects actual game state feels intelligent, while generic messages feel robotic Personality consistency: A well-defined character voice creates memorable experiences Graceful failure handling: AI errors should be invisible to players; fall back to static messages Performance-conscious polling: Check AI every few seconds, not every frame Event-driven commentary: Only generate responses for significant moments This pattern extends beyond games. Any interactive application can benefit from a context-aware AI personality: educational software providing personalized encouragement, fitness apps offering adaptive coaching, productivity tools giving motivational feedback. The complete implementation with game engine, AI integration, backend proxy, and deployment instructions is available at github.com/leestott/Spaceinvaders-FoundryLocal. Clone the repository to try it: open index.html to start playing immediately, then optionally enable AI features for dynamic commentary. Resources and Further Reading Space Invaders with AI Repository - Complete game with AI integration Quick Start Guide - Play immediately or enable AI features Microsoft Foundry Local Documentation - SDK and model reference MDN Game Development - Browser game development patterns HTML5 Game Devs Forum - Community discussions and techniques
Lee_Stott
Feb 10, 2026 Place Microsoft Developer Community Blog
How To: Send requests to Azure Storage from Azure API Management
In this How To, I will show a simple mechanism for writing a payload to Azure Blob Storage from Azure API Management. Some examples where this is useful is implementing a Claim-Check pattern for large messages or to support message logging when Application Insights is not suitable.
chilberto
Feb 05, 2026 Place Microsoft Developer Community Blog
How to Build Safe Natural Language-Driven APIs
TL;DR Building production natural language APIs requires separating semantic parsing from execution. Use LLMs to translate user text into canonical structured requests (via schemas), then execute those requests deterministically. Key patterns: schema completion for clarification, confidence gates to prevent silent failures, code-based ontologies for normalization, and an orchestration layer. This keeps language as input, not as your API contract. Introduction APIs that accept natural language as input are quickly becoming the norm in the age of agentic AI apps and LLMs. From search and recommendations to workflows and automation, users increasingly expect to "just ask" and get results. But treating natural language as an API contract introduces serious risks in production systems: Nondeterministic behavior Prompt-driven business logic Difficult debugging and replay Silent failures that are hard to detect In this post, I'll describe a production-grade architecture for building safe, natural language-driven APIs: one that embraces LLMs for intent discovery and entity extraction while preserving the determinism, observability, and reliability that backend systems require. This approach is based on building real systems using Azure OpenAI and LangGraph, and on lessons learned the hard way. The Core Problem with Natural Language APIs Natural language is an excellent interface for humans. It is a poor interface for systems. When APIs accept raw text directly and execute logic based on it, several problems emerge: The API contract becomes implicit and unversioned Small prompt changes cause behavioral changes Business logic quietly migrates into prompts In short: language becomes the contract, and that's fragile. The solution is not to avoid natural language, but to contain it. A Key Principle: Natural Language Is Input, Not a Contract So how do we contain it? The answer lies in treating natural language fundamentally differently than we treat traditional API inputs. 
The most important design decision we made was this: Natural language should be translated into structure, not executed directly. That single principle drives the entire architecture. Instead of building "chatty APIs," we split responsibilities clearly: Natural language is used for intent discovery and entity extraction Structured data is used for execution Two Explicit API Layers This principle translates into a concrete architecture with two distinct API layers, each with a single, clear responsibility. 1. Semantic Parse API (Natural Language → Structure) This API: Accepts user text Extracts intent and entities using LLMs Completes a predefined schema Asks clarifying questions when required Returns a canonical, structured request Does not execute business logic Think of this as a compiler, not an engine. 2. Structured Execution API (Structure → Action) This API: Accepts only structured input Calls downstream systems to process the request and get results Is deterministic and versioned Contains no natural language handling Is fully testable and replayable This is where execution happens. Why This Separation Matters Separating these layers gives you: A stable, versionable API contract Freedom to improve NLP without breaking clients Clear ownership boundaries Deterministic execution paths Most importantly, it prevents LLM behavior from leaking into core business logic. Canonical Schemas Are the Backbone Now that we've established the two-layer architecture, let's dive into what makes it work: canonical schemas. Each supported intent is defined by a canonical schema that lives in code. Example (simplified): This schema is used when a user is looking for similar product recommendations. The entities capture which product to use as reference and how to bias the recommendations toward price or quality. 
{ "intent": "recommend_similar", "entities": { "reference_product_id": "string", "price_bias": "number (-1 to 1)", "quality_bias": "number (-1 to 1)" } } Schemas define: Required vs optional fields Allowed ranges and types Validation rules They are the contract, not the prompt. When a user says "show me products like the blue backpack but cheaper", the LLM extracts: Intent: recommend_similar reference_product_id: "blue_backpack_123" price_bias: -0.8 (strongly prefer cheaper) quality_bias: 0.0 (neutral) The schema ensures that even if the user phrased it as "find alternatives to item 123 with better pricing" or "cheaper versions of that blue bag", the output is always the same structure. The natural language variation is absorbed at the semantic layer. The execution layer receives a consistent, validated request every time. This decoupling is what makes the system maintainable. Schema Completion, Not Free-Form Chat But what happens when the user's input doesn't contain all the information needed to complete the schema? This is where structured clarification comes in. A common misconception is that clarification means "chatting until it feels right." In production systems, clarification is schema completion. If required fields are missing or ambiguous, the semantic API responds with: What information is missing A targeted clarification question The current schema state Example response: { "status": "needs_clarification", "missing_fields": ["reference_product_id"], "question": "Which product should I compare against?", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } The state object is the memory. The API itself remains stateless. 
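The schema-completion step can be sketched in a few lines of plain JavaScript. This is a minimal sketch under assumptions: the `SCHEMAS` table, the `completeSchema` function, the `unsupported_intent` status, and the per-field `question` text are hypothetical names for illustration, and a real system would likely use a schema library rather than hand-rolled checks. The point it illustrates is that code, not the LLM, decides whether a request is complete.

```javascript
// Hypothetical schema registry: one entry per supported intent.
const SCHEMAS = {
  recommend_similar: {
    reference_product_id: {
      required: true,
      question: 'Which product should I compare against?',
    },
    price_bias: { required: false, min: -1, max: 1, default: 0 },
    quality_bias: { required: false, min: -1, max: 1, default: 0 },
  },
};

// Validate a partial state and return either a clarification request
// or a canonical request. Pure function: no I/O, no LLM call.
function completeSchema(state) {
  const schema = SCHEMAS[state.intent];
  if (!schema) {
    return { status: 'unsupported_intent', intent: state.intent };
  }

  const entities = {};
  const missing = [];
  for (const [field, rule] of Object.entries(schema)) {
    let value = state.entities?.[field] ?? null;
    // Optional fields fall back to their default when absent.
    if (value === null && !rule.required) value = rule.default;
    if (value === null) {
      missing.push(field);
    } else if (
      typeof rule.min === 'number' &&
      (typeof value !== 'number' || value < rule.min || value > rule.max)
    ) {
      // Out-of-range or non-numeric values are treated like missing ones.
      missing.push(field);
      value = null;
    }
    entities[field] = value;
  }

  const nextState = { intent: state.intent, entities };
  if (missing.length > 0) {
    return {
      status: 'needs_clarification',
      missing_fields: missing,
      question: schema[missing[0]].question ?? `Please provide ${missing[0]}.`,
      state: nextState,
    };
  }
  return { status: 'complete', canonical_request: nextState };
}
```

Because the function takes the state in and hands the updated state back, the server holds no session; the client resends `state` with the next user message.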
A Complete Conversation Flow To illustrate how schema completion works in practice, here's a full conversation flow where the user's initial request is missing required information: Initial Request: User: "Show me cheaper alternatives with good quality" API Response (needs clarification): { "status": "needs_clarification", "missing_fields": ["reference_product_id"], "question": "Which product should I compare against?", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } Follow-up Request: User: "The blue backpack" Client sends: { "user_input": "The blue backpack", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } API Response (complete): { "status": "complete", "canonical_request": { "intent": "recommend_similar", "entities": { "reference_product_id": "blue_backpack_123", "price_bias": -0.3, "quality_bias": 0.4 } } } The client passes the state back with each clarification. The API remains stateless, while the client manages the conversation context. Once complete, the canonical_request can be sent directly to the execution API. Why LangGraph Fits This Problem Perfectly With schemas and clarification flows defined, we need a way to orchestrate the semantic parsing workflow reliably. This is where LangGraph becomes valuable. 
LangGraph allows semantic parsing to be modeled as a structured, deterministic workflow with explicit decision points: Classify intent: Determine what the user wants to do from a predefined set of supported actions Extract candidate entities: Pull out relevant parameters from the natural language input using the LLM Merge into schema state: Map the extracted values into the canonical schema structure Validate required fields: Check if all mandatory fields are present and values are within acceptable ranges Either complete or request clarification: Return the canonical request if complete, or ask a targeted question if information is missing Each node has a single responsibility. Validation and routing are done in code, not by the LLM. LangGraph provides: Explicit state transitions Deterministic routing Observable execution Safe retries Used this way, it becomes a powerful orchestration tool, not a conversational agent. Confidence Gates Prevent Silent Failures Structured workflows handle the process, but there's another critical safety mechanism we need: knowing when the LLM isn't confident about its extraction. Even when outputs are structurally valid, they may not be reliable. We require the semantic layer to emit a confidence score. If confidence falls below a threshold, execution is blocked and clarification is requested. This simple rule eliminates an entire class of silent misinterpretations that are otherwise very hard to detect. Example: When a user says "Show me items similar to the bag", the LLM might extract: { "intent": "recommend_similar", "confidence": 0.55, "entities": { "reference_product_id": "generic_bag_001", "confidence_scores": { "reference_product_id": 0.4 } } } The overall confidence is low (0.55), and the entity confidence for reference_product_id is very low (0.4) because "the bag" is ambiguous. There might be hundreds of bags in the catalog. 
Instead of proceeding with a potentially wrong guess, the API responds: { "status": "needs_clarification", "reason": "low_confidence", "question": "I found multiple bags. Did you mean the blue backpack, the leather tote, or the travel duffel?", "confidence": 0.55 } This prevents the system from silently executing the wrong recommendation and provides a better user experience. Lightweight Ontologies (Keep Them in Code) Beyond confidence scoring, we need a way to normalize the variety of terms users might use into consistent canonical values. We also introduced lightweight, code-level ontologies: Allowed intents Required entities per intent Synonym-to-canonical mappings Cross-field validation rules These live in code and configuration, not in prompts. LLMs propose values. Code enforces meaning. Example: Consider these user inputs that all mean the same thing: "Show me cheaper options" "Find budget-friendly alternatives" "I want something more affordable" "Give me lower-priced items" The LLM might extract different values: "cheaper", "budget-friendly", "affordable", "lower-priced". The ontology maps all of these to a canonical value: PRICE_BIAS_SYNONYMS = { "cheaper": -0.7, "budget-friendly": -0.7, "affordable": -0.7, "lower-priced": -0.7, "expensive": 0.7, "premium": 0.7, "high-end": 0.7 } When the LLM extracts "budget-friendly", the code normalizes it to -0.7 for the price_bias field. Similarly, cross-field validation catches logical inconsistencies: if entities["price_bias"] < -0.5 and entities["quality_bias"] > 0.5: return clarification("You want cheaper items with higher quality. This might be difficult. Should I prioritize price or quality?") The LLM proposes. The ontology normalizes. The validation enforces business rules. What About Latency? A common concern with multi-step semantic parsing is performance. In practice, we observed: Intent classification: ~40 ms Entity extraction: ~200 ms Validation and routing: ~1 ms Total overhead: ~250–300 ms. 
For chat-driven user experiences, this is well within acceptable bounds and far cheaper than incorrect or inconsistent execution. Key Takeaways Let's bring it all together. If you're building APIs that accept natural language in production: Do not make language your API contract Translate language into canonical structure Own schema completion server-side Use LLMs for discovery and extraction, not execution Treat safety and determinism as first-class requirements Natural language is an input format. Structure is the contract. Closing Thoughts LLMs make it easy to build impressive demos. Building safe, reliable systems with them requires discipline. By separating semantic interpretation from execution, and by using tools like Azure OpenAI and LangGraph thoughtfully, you can build natural language-driven APIs that scale, evolve, and behave predictably in production. Hopefully, this architecture saves you a few painful iterations.
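To make the confidence gate and the synonym ontology described earlier concrete, here is a minimal JavaScript sketch. The threshold of 0.7, the function names, and the `low_confidence_fields` property are assumptions for illustration; the article does not state the threshold it used. The synonym values mirror the `PRICE_BIAS_SYNONYMS` example, and the parsed shape mirrors the low-confidence example.

```javascript
// Assumed threshold; a real system would tune this per intent.
const CONFIDENCE_THRESHOLD = 0.7;

// Synonym-to-canonical mapping, mirroring the article's example values.
const PRICE_BIAS_SYNONYMS = new Map([
  ['cheaper', -0.7],
  ['budget-friendly', -0.7],
  ['affordable', -0.7],
  ['lower-priced', -0.7],
  ['expensive', 0.7],
  ['premium', 0.7],
  ['high-end', 0.7],
]);

// Normalize an LLM-proposed term; returns null when the term is unknown
// so the caller can ask for clarification instead of guessing.
function normalizePriceBias(term) {
  const key = String(term).trim().toLowerCase();
  return PRICE_BIAS_SYNONYMS.has(key) ? PRICE_BIAS_SYNONYMS.get(key) : null;
}

// Block execution when overall or per-entity confidence is too low.
function confidenceGate(parsed, threshold = CONFIDENCE_THRESHOLD) {
  const scores = Object.entries(parsed.entities?.confidence_scores ?? {});
  const weak = scores
    .filter(([, score]) => score < threshold)
    .map(([name]) => name);
  if (parsed.confidence < threshold || weak.length > 0) {
    return {
      status: 'needs_clarification',
      reason: 'low_confidence',
      low_confidence_fields: weak,
      confidence: parsed.confidence,
    };
  }
  return { status: 'ok' };
}
```

Both functions are deterministic and free of I/O, so they can be covered by ordinary unit tests and replayed against logged requests.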
pratikpanda
Feb 03, 2026 Place Microsoft Developer Community Blog
Benchmarking Local AI Models
Introduction Selecting the right AI model for your application requires more than reading benchmark leaderboards. Published benchmarks measure academic capabilities (question answering, reasoning, coding), but your application has specific requirements: latency budgets, hardware constraints, quality thresholds. How do you know if Phi-4 provides acceptable quality for your document summarization use case? Will Qwen2.5-0.5B meet your 100 ms response time requirement? Does your edge device have sufficient memory for Phi-3.5 Mini? The answer lies in empirical testing: running actual models on your hardware with your workload patterns. This article walks through building a model benchmarking platform, FLPerformance, using Node.js, React, and Microsoft Foundry Local. You'll learn how to implement scientific performance measurement, design meaningful benchmark suites, visualize multi-dimensional comparisons, and make data-driven model selection decisions. Whether you're evaluating models for production deployment, optimizing inference costs, or validating hardware specifications, this platform provides the tools for rigorous performance analysis. Why Model Benchmarking Requires Purpose-Built Tools You cannot assess model performance by running a few manual tests and noting the results. Scientific benchmarking demands controlled conditions, statistically significant sample sizes, multi-dimensional metrics, and reproducible methodology. The points below explain why purpose-built tooling is essential. Performance is multi-dimensional. A model might excel at throughput (tokens per second) but suffer at latency (time to first token). Another might generate high-quality outputs slowly. Your application might prioritize consistency over average performance: a model with variable response times (high p95/p99 latency) creates poor user experiences even if averages look good. Measuring all dimensions simultaneously enables informed tradeoffs. Hardware matters enormously. 
Benchmark results from NVIDIA A100 GPUs don't predict performance on consumer laptops. NPU acceleration changes the picture again. Memory constraints affect which models can even load. Test on your actual deployment hardware or comparable specifications to get actionable results. Concurrency reveals bottlenecks. A model handling one request excellently might struggle with ten concurrent requests. Real applications experience variable load, so measuring only single-request performance misses critical scalability constraints. Controlled concurrency testing reveals these limits. Statistical rigor prevents false conclusions. Running a prompt once and noting the response time tells you nothing about performance distribution. Was this result typical? An outlier? You need dozens or hundreds of trials to establish p50/p95/p99 percentiles, understand variance, and detect stability issues. Comparison requires controlled experiments. Different prompts, times of day, and system loads all introduce confounding variables. Scientific comparison runs identical workloads across models sequentially, controlling for external factors. Architecture: Three-Layer Performance Testing Platform FLPerformance implements a clean separation between orchestration, measurement, and presentation: The frontend React application provides model management, benchmark configuration, test execution, and results visualization. Users add models from the Foundry Local catalog, configure benchmark parameters (iterations, concurrency, timeout values), launch test runs, and view real-time progress. The results dashboard displays comparison tables, latency distribution charts, throughput graphs, and "best model for..." recommendations. The backend Node.js/Express server orchestrates tests and captures metrics. 
It manages the single Foundry Local service instance, loads/unloads models as needed, executes benchmark suites with controlled concurrency, measures comprehensive metrics (TTFT, TPOT, total latency, throughput, error rates), and persists results to JSON storage. WebSocket connections provide real-time progress updates during long benchmark runs. Foundry Local SDK integration uses the official foundry-local-sdk npm package. The SDK manages the service lifecycle (starting, stopping, health checking) and handles model operations (downloading, loading into memory, unloading). It provides OpenAI-compatible inference APIs for consistent request formatting across models. The architecture supports comparing multiple models in a single run by loading them one at a time, running identical benchmarks, and aggregating results for comparison: User Initiates Benchmark Run ↓ Backend receives {models: [...], suite: "default", iterations: 10} ↓ For each model: 1. Load model into Foundry Local 2. Execute benchmark suite - For each prompt in suite: * Run N iterations * Measure TTFT, TPOT, total time * Track errors and timeouts * Calculate tokens/second 3. Aggregate statistics (mean, p50, p95, p99) 4. 
Unload model ↓ Store results with metadata ↓ Return comparison data to frontend ↓ Visualize performance metrics Implementing Scientific Measurement Infrastructure Accurate performance measurement requires instrumentation that captures multiple dimensions without introducing measurement overhead: // src/server/benchmark.js import { performance } from 'perf_hooks'; export class BenchmarkExecutor { constructor(foundryClient, options = {}) { this.client = foundryClient; this.options = { iterations: options.iterations || 10, concurrency: options.concurrency || 1, timeout_ms: options.timeout_ms || 30000, warmup_iterations: options.warmup_iterations || 2 }; } async runBenchmarkSuite(modelId, prompts) { const results = []; // Warmup phase (exclude from results) console.log(`Running ${this.options.warmup_iterations} warmup iterations...`); for (let i = 0; i < this.options.warmup_iterations; i++) { await this.executePrompt(modelId, prompts[0].text); } // Actual benchmark runs for (const prompt of prompts) { console.log(`Benchmarking prompt: ${prompt.id}`); const measurements = []; for (let i = 0; i < this.options.iterations; i++) { const measurement = await this.executeMeasuredPrompt( modelId, prompt.text ); measurements.push(measurement); // Small delay between iterations to stabilize await sleep(100); } results.push({ prompt_id: prompt.id, prompt_text: prompt.text, measurements, statistics: this.calculateStatistics(measurements) }); } return { model_id: modelId, timestamp: new Date().toISOString(), config: this.options, results }; } async executeMeasuredPrompt(modelId, promptText) { const measurement = { success: false, error: null, ttft_ms: null, // Time to first token tpot_ms: null, // Time per output token total_ms: null, tokens_generated: 0, tokens_per_second: 0 }; try { const startTime = performance.now(); let firstTokenTime = null; let tokenCount = 0; // Streaming completion to measure TTFT const stream = await this.client.chat.completions.create({ model: modelId, 
messages: [{ role: 'user', content: promptText }], max_tokens: 200, temperature: 0.7, stream: true }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { if (firstTokenTime === null) { firstTokenTime = performance.now(); measurement.ttft_ms = firstTokenTime - startTime; } tokenCount++; } } const endTime = performance.now(); measurement.total_ms = endTime - startTime; measurement.tokens_generated = tokenCount; if (tokenCount > 1 && firstTokenTime) { // TPOT = time after first token / (tokens - 1) const timeAfterFirstToken = endTime - firstTokenTime; measurement.tpot_ms = timeAfterFirstToken / (tokenCount - 1); measurement.tokens_per_second = 1000 / measurement.tpot_ms; } measurement.success = true; } catch (error) { measurement.error = error.message; measurement.success = false; } return measurement; } calculateStatistics(measurements) { const successful = measurements.filter(m => m.success); const total = measurements.length; if (successful.length === 0) { return { success_rate: 0, error_rate: 1.0, sample_size: total }; } const ttfts = successful.map(m => m.ttft_ms).filter(v => v !== null).sort((a, b) => a - b); const tpots = successful.map(m => m.tpot_ms).filter(v => v !== null).sort((a, b) => a - b); const totals = successful.map(m => m.total_ms).sort((a, b) => a - b); const throughputs = successful.map(m => m.tokens_per_second).filter(v => v > 0).sort((a, b) => a - b); return { success_rate: successful.length / total, error_rate: (total - successful.length) / total, sample_size: total, ttft: { mean: mean(ttfts), median: percentile(ttfts, 50), p95: percentile(ttfts, 95), p99: percentile(ttfts, 99), min: Math.min(...ttfts), max: Math.max(...ttfts) }, tpot: tpots.length > 0 ? 
{ mean: mean(tpots), median: percentile(tpots, 50), p95: percentile(tpots, 95) } : null, total_latency: { mean: mean(totals), median: percentile(totals, 50), p95: percentile(totals, 95), p99: percentile(totals, 99) }, throughput: { mean_tps: mean(throughputs), median_tps: percentile(throughputs, 50) } }; } } function mean(arr) { return arr.reduce((sum, val) => sum + val, 0) / arr.length; } function percentile(sortedArr, p) { const index = Math.ceil((sortedArr.length * p) / 100) - 1; return sortedArr[Math.max(0, index)]; } function sleep(ms) { return new Promise(resolve => setTimeout(resolve, ms)); } This measurement infrastructure captures: Time to First Token (TTFT): Critical for perceived responsiveness—users notice delays before output begins Time Per Output Token (TPOT): Determines generation speed after first token—affects throughput Total latency: End-to-end time—matters for batch processing and high-volume scenarios Tokens per second: Overall throughput metric—useful for capacity planning Statistical distributions: Mean alone masks variability—p95/p99 reveal tail latencies that impact user experience Success/error rates: Stability metrics—some models timeout or crash under load Designing Meaningful Benchmark Suites Benchmark quality depends on prompt selection. Generic prompts don't reflect real application behavior. Design suites that mirror actual use cases: // benchmarks/suites/default.json { "name": "default", "description": "General-purpose benchmark covering diverse scenarios", "prompts": [ { "id": "short-factual", "text": "What is the capital of France?", "category": "factual", "expected_tokens": 5 }, { "id": "medium-explanation", "text": "Explain how photosynthesis works in 3-4 sentences.", "category": "explanation", "expected_tokens": 80 }, { "id": "long-reasoning", "text": "Analyze the economic factors that led to the 2008 financial crisis. 
Discuss at least 5 major causes with supporting details.", "category": "reasoning", "expected_tokens": 250 }, { "id": "code-generation", "text": "Write a Python function that finds the longest palindrome in a string. Include docstring and example usage.", "category": "coding", "expected_tokens": 150 }, { "id": "creative-writing", "text": "Write a short story (3 paragraphs) about a robot learning to paint.", "category": "creative", "expected_tokens": 200 } ] } This suite covers multiple dimensions: Length variation: Short (5 tokens), medium (80), long (250)—tests models across output ranges Task diversity: Factual recall, explanation, reasoning, code, creative—reveals capability breadth Token predictability: Expected token counts enable throughput calculations For production applications, create custom suites matching your actual workload: { "name": "customer-support", "description": "Simulates actual customer support queries", "prompts": [ { "id": "product-question", "text": "How do I reset my password for the customer portal?" }, { "id": "troubleshooting", "text": "I'm getting error code 503 when trying to upload files. What should I do?" }, { "id": "policy-inquiry", "text": "What is your refund policy for annual subscriptions?" } ] } Visualizing Multi-Dimensional Performance Comparisons Raw numbers don't reveal insights—visualization makes patterns obvious. The frontend implements several comparison views: Comparison Table shows side-by-side metrics: // frontend/src/components/ResultsTable.jsx export function ResultsTable({ results }) { return ( {results.map(result => ( ))} Model TTFT (ms) TPOT (ms) Throughput (tok/s) P95 Latency Error Rate {result.model_id} {result.stats.ttft.median.toFixed(0)} (p95: {result.stats.ttft.p95.toFixed(0)}) {result.stats.tpot?.median.toFixed(1) || 'N/A'} {result.stats.throughput.median_tps.toFixed(1)} {result.stats.total_latency.p95.toFixed(0)} ms 0.05 ? 
'error' : 'success'}> {(result.stats.error_rate * 100).toFixed(1)}% ); } Latency Distribution Chart reveals performance consistency: // Using Chart.js for visualization export function LatencyChart({ results }) { const data = { labels: results.map(r => r.model_id), datasets: [ { label: 'Median (p50)', data: results.map(r => r.stats.total_latency.median), backgroundColor: 'rgba(75, 192, 192, 0.5)' }, { label: 'p95', data: results.map(r => r.stats.total_latency.p95), backgroundColor: 'rgba(255, 206, 86, 0.5)' }, { label: 'p99', data: results.map(r => r.stats.total_latency.p99), backgroundColor: 'rgba(255, 99, 132, 0.5)' } ] }; return ( ); } Recommendations Engine synthesizes multi-dimensional comparison: export function generateRecommendations(results) { const recommendations = []; // Find fastest TTFT (best perceived responsiveness) const fastestTTFT = results.reduce((best, r) => r.stats.ttft.median < best.stats.ttft.median ? r : best ); recommendations.push({ category: 'Fastest Response', model: fastestTTFT.model_id, reason: `Lowest median TTFT: ${fastestTTFT.stats.ttft.median.toFixed(0)}ms` }); // Find highest throughput const highestThroughput = results.reduce((best, r) => r.stats.throughput.median_tps > best.stats.throughput.median_tps ? r : best ); recommendations.push({ category: 'Best Throughput', model: highestThroughput.model_id, reason: `Highest tok/s: ${highestThroughput.stats.throughput.median_tps.toFixed(1)}` }); // Find most consistent (lowest p95-p50 spread) const mostConsistent = results.reduce((best, r) => { const spread = r.stats.total_latency.p95 - r.stats.total_latency.median; const bestSpread = best.stats.total_latency.p95 - best.stats.total_latency.median; return spread < bestSpread ? 
r : best; }); recommendations.push({ category: 'Most Consistent', model: mostConsistent.model_id, reason: 'Lowest latency variance (p95-p50 spread)' }); return recommendations; } Key Takeaways and Benchmarking Best Practices Effective model benchmarking requires scientific methodology, comprehensive metrics, and application-specific testing. FLPerformance demonstrates that rigorous performance measurement is accessible to any development team. Critical principles for model evaluation: Test on target hardware: Results from cloud GPUs don't predict laptop performance Measure multiple dimensions: TTFT, TPOT, throughput, consistency all matter Use statistical rigor: Single runs mislead—capture distributions with adequate sample sizes Design realistic workloads: Generic benchmarks don't predict your application's behavior Include warmup iterations: Model loading and JIT compilation affect early measurements Control concurrency: Real applications handle multiple requests—test at realistic loads Document methodology: Reproducible results require documented procedures and configurations The complete benchmarking platform with model management, measurement infrastructure, visualization dashboards, and comprehensive documentation is available at github.com/leestott/FLPerformance. Clone the repository and run the startup script to begin evaluating models on your hardware. Resources and Further Reading FLPerformance Repository - Complete benchmarking platform Quick Start Guide - Setup and first benchmark run Microsoft Foundry Local Documentation - SDK reference and model catalog Architecture Guide - System design and SDK integration Benchmarking Best Practices - Methodology and troubleshooting
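To make the "statistical rigor" and "most consistent" points concrete, here is a small worked sketch of how the mean can hide tail latency. It reuses the nearest-rank `mean` and `percentile` helpers from the benchmark code; the latency samples and the `summary` object are hypothetical values chosen for illustration, not measurements from FLPerformance.

```javascript
// mean() and percentile() mirror the helpers in the benchmark code.
function mean(arr) {
  return arr.reduce((sum, val) => sum + val, 0) / arr.length;
}

// Nearest-rank percentile; expects an array sorted in ascending order.
function percentile(sortedArr, p) {
  const index = Math.ceil((sortedArr.length * p) / 100) - 1;
  return sortedArr[Math.max(0, index)];
}

// Hypothetical sample: 18 fast responses and 2 slow ones, in milliseconds.
const latencies = [...Array(18).fill(100), 900, 1000];
const sorted = [...latencies].sort((a, b) => a - b);

const summary = {
  mean: mean(sorted),
  median: percentile(sorted, 50),
  p95: percentile(sorted, 95),
  p99: percentile(sorted, 99),
};

// The p95-p50 spread is the consistency signal used by the recommendations code.
summary.spread = summary.p95 - summary.median;

console.log(summary);
```

With these samples the mean is 185 ms, yet the median is 100 ms and p95 is 900 ms. A model judged on the mean alone would look nearly twice as slow as its typical response and far faster than its worst ones, which is why the comparison views report the median and the tail percentiles separately.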
Lee_Stott
Feb 02, 2026 · Microsoft Developer Community Blog
494 Views
1 like
0 Comments
Leverage AI for faster, more productive coding with GitHub Copilot
GitHub Copilot serves as an invaluable learning tool for developers, especially those who are still learning a particular programming language or framework. By providing context-aware suggestions, it helps learners understand the syntax, structure, and logic behind different code snippets. While fundamental knowledge of programming concepts and syntax should remain a prerequisite, a developer exploring a new framework can describe the functionality they want to implement, and GitHub Copilot will generate code that demonstrates how to achieve it. This accelerates the learning process and empowers developers to gain proficiency in new technologies more efficiently.
AaronStark
Dec 16, 2025 · Microsoft Developer Community Blog
9.1K Views
1 like
0 Comments
Exploring the Future of AI Agents with Microsoft Foundry
Why Agentic AI Matters

AI agents are no longer a distant vision; they’re here and transforming how businesses operate. According to industry analysts:

- Over 1 billion AI agents are expected to be in use by 2028.
- 80% of organisations plan to integrate agents within the next 2–3 years.
- By 2026, 40% of enterprise apps will include task-specific AI agents.

Why this surge? Agents address critical challenges such as inefficiencies in manual processes, human error, lack of visibility, and scalability issues. They enable autonomous decision-making, with projections suggesting that by 2028, half of day-to-day work decisions will be made autonomously.

From Chatbots to Intelligent Agents

As Mary Joe highlighted, early chatbots relied on rigid rules and regular expressions, often leading to frustrating user experiences. The introduction of large language models (LLMs) changed the game, making interactions more natural. But true autonomy, where systems act on our behalf, required more than conversational AI. Agentic AI combines:

- Reasoning and planning capabilities.
- Tools and APIs for real-world actions.
- Memory for learning and improving over time.

This evolution moves us beyond simple input-output interactions to intelligent systems that can execute workflows, validate data, and deliver outcomes.

Microsoft Foundry: Your Platform for Building Agents

Microsoft Foundry offers a Platform-as-a-Service (PaaS) approach for creating AI agents, striking a balance between control and ease of use. Key components include:

- Model Catalogue: Access models from OpenAI, Anthropic, Mistral, and more.
- Foundry Agent Service: Build and customise agents with integrated tools.
- Foundry IQ: Knowledge grounding for accurate responses.
- Control Plane: Ensures safety, trust, and observability in production.

Whether you need full control (Infrastructure-as-a-Service) or simplicity (Software-as-a-Service via Copilot Studio), Foundry provides flexibility for diverse scenarios.

What Makes an AI Solution Agentic?

Unlike traditional AI apps that perform narrow tasks (e.g., extracting text from receipts), agentic solutions:

- Analyse inputs using LLMs and system instructions.
- Integrate tools for actions like file search, code execution, or API calls.
- Retain memory for contextual learning.
- Operate autonomously across workflows.

Real-World Use Cases

Agentic AI unlocks new possibilities across industries:

- Expense Management: Automate claims and approvals.
- Employee Onboarding: Personalised learning paths and skills navigation.
- Customer Support: Intelligent assistants for FAQs and troubleshooting.
- Data Analytics: Interactive insights and reporting with Fabric agents.

Multi-agent systems can coordinate complex tasks, with specialised agents handling subtasks under a central orchestrator.

Getting Started with Microsoft Foundry

Creating your first agent is simple:

- Sign in at https://ai.azure.com and create a Foundry project.
- Select a model (e.g., GPT-4.1 mini) and configure deployment options.
- Customise instructions to define your agent’s persona and tasks.
- Add tools like file search or code interpreter for extended functionality.
- Test and iterate using the agent playground, then export code to Visual Studio Code for deployment.

For detailed guidance, explore Microsoft Learn training at https://learn.microsoft.com/training.

- Follow the skilling plan for this series: Plans | Microsoft Learn
- Get started with AI Agents: https://aka.ms/ai-agents-fundamentals

Join the Community

Stay connected and keep learning:

- Discord: Engage with developers building agents. https://aka.ms/foundry/discord
- GitHub Discussions: Share ideas and troubleshoot. https://aka.ms/foundrydevs
- Office Hours: Get direct support from product teams.

Final Thoughts

Agentic AI is reshaping the way we work, enabling systems to act, learn, and collaborate. With Microsoft Foundry, developers have the tools to build secure, scalable, and intelligent agents today, not tomorrow. Join the sessions at https://aka.ms/AzureSkilling-Ignite/25
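The multi-agent idea mentioned under the real-world use cases (specialised agents handling subtasks under a central orchestrator) can be pictured with a minimal sketch. Everything below is hypothetical and deliberately SDK-free: the agent names, the `orchestrate` function, and the task shape are illustrative assumptions, and plain functions stand in for model-backed agents. It does not use or represent the Foundry Agent Service API.

```javascript
// Hypothetical specialised agents. Plain functions stand in for real,
// model-backed agents; none of these names come from Microsoft Foundry.
const agents = new Map([
  ['expenses', (task) => `expense claim checked: ${task.detail}`],
  ['onboarding', (task) => `learning path drafted for: ${task.detail}`],
  ['support', (task) => `answer prepared for: ${task.detail}`],
]);

// Central orchestrator: routes each subtask to the agent registered for its
// type and records every outcome in a shared memory log.
function orchestrate(subtasks, registry) {
  const memory = [];
  for (const task of subtasks) {
    const agent = registry.get(task.type);
    const output = agent
      ? agent(task)
      : `no agent registered for "${task.type}"`;
    memory.push({ type: task.type, handled: Boolean(agent), output });
  }
  return memory;
}

const log = orchestrate(
  [
    { type: 'expenses', detail: 'taxi receipt' },
    { type: 'support', detail: 'password reset' },
    { type: 'payroll', detail: 'bonus query' }, // no matching agent
  ],
  agents
);

console.log(log);
```

In a real system each agent call would be asynchronous and backed by a model plus tools, and the orchestrator would typically plan the subtasks itself rather than receive them ready-made. The sketch only shows the routing and shared-memory shape of the pattern, including what happens when no specialised agent matches a subtask.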
Lee_Stott
Dec 10, 2025 · Microsoft Developer Community Blog
662 Views
0 likes
0 Comments