tips and tricks
764 TopicsMCP vs mcp-cli: Dynamic Tool Discovery for Token-Efficient AI Agents
Introduction The AI agent ecosystem is evolving rapidly, and with it comes a scaling challenge that many developers are hitting context window bloat. When building systems that integrate with multiple MCP (Model Context Protocol) servers, you're forced to load all tool definitions upfront—consuming thousands of tokens just to describe what tools could be available. mcp-cli: a lightweight tool that changes how we interact with MCP servers. But before diving into mcp-cli, it's essential to understand the foundational protocol itself, the design trade-offs between static and dynamic approaches, and how they differ fundamentally. Part 1: Understanding MCP (Model Context Protocol) What is MCP? The Model Context Protocol (MCP) is an open standard for connecting AI agents and applications to external tools, APIs, and data sources. Think of it as a universal interface that allows: AI Agents (Claude, Gemini, etc.) to discover and call tools Tool Providers to expose capabilities in a standardized way Seamless Integration between diverse systems without custom adapters New to MCP see https://aka.ms/mcp-for-beginners How MCP Works MCP operates on a simple premise: define tools with clear schemas and let clients discover and invoke them. Basic MCP Flow: Tool Provider (MCP Server) ↓ [Tool Definitions + Schemas] ↓ AI Agent / Client ↓ [Discover Tools] → [Invoke Tools] → [Get Results] Example: A GitHub MCP server exposes tools like: search_repositories - Search GitHub repos create_issue - Create a GitHub issue list_pull_requests - List open PRs Each tool comes with a JSON schema describing its parameters, types, and requirements. The Static Integration Problem Traditionally, MCP integration works like this: Startup: Load ALL tool definitions from all servers Context Window: Send every tool schema to the AI model Invocation: Model chooses which tool to call Execution: Tool is invoked and result returned The Problem: When you have multiple MCP servers, the token cost becomes substantial: Scenario Token Count 6 MCP Servers, 60 tools (static loading) ~47,000 tokens After dynamic discovery ~400 tokens Token Reduction 99% 🚀 For a production system with 10+ servers exposing 100+ tools, you're burning through thousands of tokens just describing capabilities, leaving less context for actual reasoning and problem-solving. Key Issues: ❌ Reduced effective context length for actual work ❌ More frequent context compactions ❌ Hard limits on simultaneous MCP servers ❌ Higher API costs Part 2: Enter mcp-cli – Dynamic Context Discovery What is mcp-cli? mcp-cli is a lightweight CLI tool (written in Bun, compiled to a single binary) that implements dynamic context discovery for MCP servers. Instead of loading everything upfront, it pulls in information only when needed. Static vs. Dynamic: The Paradigm Shift Traditional MCP (Static Context): AI Agent Says: "Load all tool definitions from all servers" ↓ Context Window Bloat ❌ ↓ Limited space for reasoning mcp-cli (Dynamic Discovery): AI Agent Says: "What servers exist?" ↓ mcp-cli responds AI Agent Says: "What are the params for tool X?" ↓ mcp-cli responds AI Agent Says: "Execute tool X" mcp-cli executes and responds Result: You only pay for information you actually use. ✅ Core Capabilities mcp-cli provides three primary commands: 1. Discover - What servers and tools exist? mcp-cli Lists all configured MCP servers and their tools. 2. Inspect - What does a specific tool do? mcp-cli info <server> <tool> Returns the full JSON schema for a tool (parameters, descriptions, types). 3. Execute - Run a tool mcp-cli call <server> <tool> '{"arg": "value"}' Executes the tool and returns results. Key Features of mcp-cli Feature Benefit Stdio & HTTP Support Works with both local and remote MCP servers Connection Pooling Lazy-spawn daemon avoids repeated startup overhead Tool Filtering Control which tools are available via allowedTools/disabledTools Glob Searching Find tools matching patterns: mcp-cli grep "*mail*" AI Agent Ready Designed for use in system instructions and agent skills Lightweight Single binary, minimal dependencies Part 3: Detailed Comparison Table Aspect Traditional MCP mcp-cli Protocol HTTP/REST or Stdio Stdio/HTTP (via CLI) Context Loading Static (upfront) Dynamic (on-demand) Tool Discovery All at once Lazy enumeration Schema Inspection Pre-loaded On-request Token Usage High (~47k for 60 tools) Low (~400 for 60 tools) Best For Direct server integration AI agent tool use Implementation Server-side focus CLI-side focus Complexity Medium Low (CLI handles it) Startup Time One call Multiple calls (optimized) Scaling Limited by context Unlimited (pay per use) Integration Custom implementation Pre-built mcp-cli Part 4: When to Use Each Approach Use Traditional MCP (HTTP Endpoints) when: ✅ Building a direct server integration ✅ You have few tools (< 10) and don't care about context waste ✅ You need full control over HTTP requests/responses ✅ You're building a specialized integration (not AI agents) ✅ Real-time synchronous calls are required Use mcp-cli when: ✅ Integrating with AI agents (Claude, Gemini, etc.) ✅ You have multiple MCP servers (> 2-3) ✅ Token efficiency is critical ✅ You want a standardized, battle-tested tool ✅ You prefer CLI-based automation ✅ Connection pooling and lazy loading are beneficial ✅ You're building agent skills or system instructions Conclusion MCP (Model Context Protocol) defines the standard for tool sharing and discovery. mcp-cli is the practical tool that makes MCP efficient for AI agents by implementing dynamic context discovery. The fundamental difference: MCP mcp-cli What The protocol standard The CLI tool Where Both server and client Client-side CLI Problem Solved Tool standardization Context bloat Architecture Protocol Implementation Think of it this way: MCP is the language, mcp-cli is the interpreter that speaks fluently. For AI agent systems, dynamic discovery via mcp-cli is becoming the standard. For direct integrations, traditional MCP HTTP endpoints work fine. The choice depends on your use case, but increasingly, the industry is trending toward mcp-cli for its efficiency and scalability. Resources MCP Specification mcp-cli GitHub New to MCP see https://aka.ms/mcp-for-beginners Practical demo: AnveshMS/mcp-cli-exampleComplete Guide to Deploying OpenClaw on Azure Windows 11 Virtual Machine
1. Introduction to OpenClaw OpenClaw is an open-source AI personal assistant platform that runs on your own devices and executes real-world tasks. Unlike traditional cloud-based AI assistants, OpenClaw emphasizes local deployment and privacy protection, giving you complete control over your data. Key Features of OpenClaw Cross-Platform Support: Runs on Windows, macOS, Linux, and other operating systems Multi-Channel Integration: Interact with AI through messaging platforms like WhatsApp, Telegram, and Discord Task Automation: Execute file operations, browser control, system commands, and more Persistent Memory: AI remembers your preferences and contextual information Flexible AI Backends: Supports multiple large language models including Anthropic Claude and OpenAI GPT OpenClaw is built on Node.js and can be quickly installed and deployed via npm. 2. Security Advantages of Running OpenClaw on Azure VM Deploying OpenClaw on an Azure virtual machine instead of your personal computer offers significant security benefits: 1. Environment Isolation Azure VMs provide a completely isolated runtime environment. Even if the AI agent exhibits abnormal behavior or is maliciously exploited, it won't affect your personal computer or local data. This isolation mechanism forms the foundation of a zero-trust security architecture. 2. Network Security Controls Through Azure Network Security Groups (NSGs), you can precisely control which IP addresses can access your virtual machine. The RDP rules configured in the deployment script allow you to securely connect to your Windows 11 VM via Remote Desktop while enabling further restrictions on access sources. 3. Data Persistence and Backup Azure VM managed disks support automatic snapshots and backups. Even if the virtual machine encounters issues, your OpenClaw configuration and data remain safe. 4. Elastic Resource Management You can adjust VM specifications (memory, CPU) at any time based on actual needs, or stop the VM when not in use to save costs, maintaining maximum flexibility. 5. Enterprise-Grade Authentication Azure supports integration with Azure Active Directory (Entra ID) for identity verification, allowing you to assign different access permissions to team members for granular access control. 6. Audit and Compliance Azure provides detailed activity logs and audit trails, making it easy to trace any suspicious activity and meet enterprise compliance requirements. 3. Deployment Steps Explained This deployment script uses Azure CLI to automate the installation of OpenClaw and its dependencies on a Windows 11 virtual machine. Here are the detailed execution steps: Prerequisites Before running the script, ensure you have: Install Azure CLI # Windows users can download the MSI installer https://aka.ms/installazurecliwindows # macOS users brew install azure-cli # Linux users curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash 2. Log in to Azure Account az login 3. Prepare Deployment Script Save the provided deploy-windows11-vm.sh script locally and grant execute permissions: chmod +x deploy-windows11-vm.sh Step 1: Configure Deployment Parameters The script begins by defining key configuration variables that you can modify as needed: RESOURCE_GROUP="Your Azure Resource Group Name" # Resource group name VM_NAME="win11-openclaw-vm" # Virtual machine name LOCATION="Your Azure Regison Name" # Azure region ADMIN_USERNAME="Your Azure VM Administrator Name" # Administrator username ADMIN_PASSWORD="our Azure VM Administrator Password" # Administrator password (change to a strong password) VM_SIZE="Your Azure VM Size" # VM size (4GB memory) Security Recommendations: Always change ADMIN_PASSWORD to your own strong password Passwords should contain uppercase and lowercase letters, numbers, and special characters Never commit scripts containing real passwords to code repositories Step 2: Check and Create Resource Group The script first checks if the specified resource group exists, and creates it automatically if it doesn't: echo "Checking resource group $RESOURCE_GROUP..." az group show --name $RESOURCE_GROUP &> /dev/null if [ $? -ne 0 ]; then echo "Creating resource group $RESOURCE_GROUP..." az group create --name $RESOURCE_GROUP --location $LOCATION fi A resource group is a logical container in Azure used to organize and manage related resources. All associated resources (VMs, networks, storage, etc.) will be created within this resource group. Step 3: Create Windows 11 Virtual Machine This is the core step, using the az vm create command to create a Windows 11 Pro virtual machine: az vm create \ --resource-group $RESOURCE_GROUP \ --name $VM_NAME \ --image MicrosoftWindowsDesktop:windows-11:win11-24h2-pro:latest \ --size $VM_SIZE \ --admin-username $ADMIN_USERNAME \ --admin-password $ADMIN_PASSWORD \ --public-ip-sku Standard \ --nsg-rule RDP Parameter Explanations: --image: Uses the latest Windows 11 24H2 Professional edition image --size: Standard_B2s provides 2 vCPUs and 4GB memory, suitable for running OpenClaw --public-ip-sku Standard: Assigns a standard public IP --nsg-rule RDP: Automatically creates network security group rules allowing RDP (port 3389) inbound traffic Step 4: Retrieve Virtual Machine Public IP After VM creation completes, the script retrieves its public IP address: PUBLIC_IP=$(az vm show -d -g $RESOURCE_GROUP -n $VM_NAME --query publicIps -o tsv) echo "VM Public IP: $PUBLIC_IP" This IP address will be used for subsequent RDP remote connections. Step 5: Install Chocolatey Package Manager Using az vm run-command to execute PowerShell scripts inside the VM, first installing Chocolatey: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString( 'https://community.chocolatey.org/install.ps1'))" Chocolatey is a package manager for Windows, similar to apt or yum on Linux, simplifying subsequent software installations. Step 6: Install Git Git is a dependency for many npm packages, especially those that need to download source code from GitHub for compilation: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "C:\ProgramData\chocolatey\bin\choco.exe install git -y" Step 7: Install CMake and Visual Studio Build Tools Some of OpenClaw's native modules require compilation, necessitating the installation of C++ build toolchain: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "C:\ProgramData\chocolatey\bin\choco.exe install cmake visualstudio2022buildtools visualstudio2022-workload-vctools -y" Component Descriptions: cmake: Cross-platform build system visualstudio2022buildtools: VS 2022 Build Tools visualstudio2022-workload-vctools: C++ development toolchain Step 8: Install Node.js LTS Install the Node.js Long Term Support version, which is the core runtime environment for OpenClaw: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); C:\ProgramData\chocolatey\bin\choco.exe install nodejs-lts -y" The script refreshes environment variables first to ensure Chocolatey is in the PATH, then installs Node.js LTS. Step 9: Globally Install OpenClaw Use npm to globally install OpenClaw: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); npm install -g openclaw" Global installation makes the openclaw command available from anywhere in the system. Step 10: Configure Environment Variables Add Node.js and npm global paths to the system PATH environment variable: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts " $npmGlobalPath = 'C:\Program Files\nodejs'; $npmUserPath = [System.Environment]::GetFolderPath('ApplicationData') + '\npm'; $currentPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine'); if ($currentPath -notlike \"*$npmGlobalPath*\") { $newPath = $currentPath + ';' + $npmGlobalPath; [System.Environment]::SetEnvironmentVariable('Path', $newPath, 'Machine'); Write-Host 'Added Node.js path to system PATH'; } if ($currentPath -notlike \"*$npmUserPath*\") { $newPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine') + ';' + $npmUserPath; [System.Environment]::SetEnvironmentVariable('Path', $newPath, 'Machine'); Write-Host 'Added npm global path to system PATH'; } Write-Host 'Environment variables updated successfully!'; " This ensures that node, npm, and openclaw commands can be used directly even in new terminal sessions. Step 11: Verify Installation The script finally verifies that all software is correctly installed: az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \ --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); Write-Host 'Node.js version:'; node --version; Write-Host 'npm version:'; npm --version; Write-Host 'openclaw:'; npm list -g openclaw" Successful output should look similar to: Node.js version: v20.x.x npm version: 10.x.x openclaw: openclaw@x.x.x Step 12: Connect to Virtual Machine After deployment completes, the script outputs connection information: ============================================ Deployment completed! ============================================ Resource Group: Your Azure Resource Group Name VM Name: win11-openclaw-vm Public IP: xx.xx.xx.xx Admin Username: Your Administrator UserName VM Size: Your VM Size Connect via RDP: mstsc /v:xx.xx.xx.xx ============================================ Connection Methods: Windows Users: Press Win + R to open Run dialog Enter mstsc /v:public_ip and press Enter Log in using the username and password set in the script macOS Users: Download "Windows App" from the App Store Add PC connection with the public IP Log in using the username and password set in the script Linux Users: # Use Remmina or xfreerdp xfreerdp /u:username /v:public_ip Step 13: Initialize OpenClaw After connecting to the VM, run the following in PowerShell or Command Prompt # Initialize OpenClaw openclaw onboard # Configure AI model API key # Edit configuration file: C:\Users\username\.openclaw\openclaw.json notepad $env:USERPROFILE\.openclaw\openclaw.json Add your AI API key in the configuration file: { "agents": { "defaults": { "model": "Your Model Name", "apiKey": "your-api-key-here" } } } Step 14: Start OpenClaw # Start Gateway service openclaw gateway # In another terminal, connect messaging channels (e.g., WhatsApp) openclaw channels login Follow the prompts to scan the QR code and connect OpenClaw to your messaging app. 4. Summary Through this guide, we've successfully implemented the complete process of automatically deploying OpenClaw on an Azure Windows 11 virtual machine. The entire deployment process is highly automated, completing everything from VM creation to installing all dependencies and OpenClaw itself through a single script. Key Takeaways Automation Benefits: Using az vm run-command allows executing configuration scripts immediately after VM creation without manual RDP login Dependency Management: Chocolatey simplifies the Windows package installation workflow Environment Isolation: Running AI agents on cloud VMs protects local computers and data Scalability: Scripted deployment facilitates replication and team collaboration, easily deploying multiple instances Cost Optimization Tips Standard_B2s VMs cost approximately $0.05/hour (~$37/month) on pay-as-you-go pricing When not in use, stop the VM to only pay for storage costs Consider Azure Reserved Instances to save up to 72% Security Hardening Recommendations Change Default Port: Modify RDP port from 3389 to a custom port Enable JIT Access: Use Azure Security Center's just-in-time access feature Configure Firewall Rules: Only allow specific IP addresses to access Regular System Updates: Enable automatic Windows Updates Use Azure Key Vault: Store API keys in Key Vault instead of configuration files 5. Additional Resources Official Documentation OpenClaw Website: https://openclaw.ai OpenClaw GitHub: https://github.com/openclaw/openclaw OpenClaw Documentation: https://docs.openclaw.ai Azure CLI Documentation: https://docs.microsoft.com/cli/azure/ Azure Resources Azure VM Pricing Calculator: https://azure.microsoft.com/pricing/calculator/ Azure Free Account: https://azure.microsoft.com/free/ (new users receive $200 credit) Azure Security Center: https://azure.microsoft.com/services/security-center/ Azure Key Vault: https://azure.microsoft.com/services/key-vault/7.6KViews3likes3CommentsPhi-4-Reasoning-Vision-15B: Use Cases In-Depth
Phi-4-Reasoning-vision-15B is Microsoft's latest vision reasoning model released on Microsoft Foundry. It combines high-resolution visual perception with selective, task-aware reasoning, making it the first model in the Phi-4 family to simultaneously achieve both "seeing clearly" and "thinking deeply" as a small language model (SLM). Traditional vision models only perform passive perception — recognizing "what's in" an image. Phi-4-Reasoning-Vision-15B goes further by performing structured, multi-step reasoning: understanding visual structure in images, connecting it with textual context, and reaching actionable conclusions. This enables developers to build intelligent applications ranging from chart analysis to GUI automation. Core Design Features 2.1 Selective Reasoning The model's most critical design feature is its hybrid reasoning behavior. It can switch between "reasoning mode" and "non-reasoning mode" based on the prompt: When deep reasoning is needed (e.g., math problems, logical analysis) → Multi-step reasoning chain is activated When fast perception is sufficient (e.g., OCR, element localization) → Direct output with reduced latency 2.2 Three Thinking Modes (from Notebook Examples) Developers can precisely control reasoning behavior via the thinking_mode parameter: Mode Trigger Description Best For hybrid (Mixed) Default Model autonomously decides whether deep reasoning is needed General use, balancing speed and accuracy think (Deep Thinking) Appends <think> token Forces full reasoning chain Complex math / science / logic problems nothink (Fast Response) Appends <nothink> token Skips reasoning chain, outputs directly Low-latency perception tasks, simple Q&A The corresponding code implementation: def run_inference(processor, model, prompt, image, thinking_mode="hybrid"): ## FORM MESSAGE AND LOAD IMAGE messages = [ { "role": "user", "content": prompt, } ] ## PROCESS INPUTS prompt = processor.tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True, return_dict=False, ) if thinking_mode == "think": prompt = str(prompt) + "<think>" elif thinking_mode == "nothink": prompt = str(prompt) + "<|dummy_84|>" print(f"Prompt: {prompt}") inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device) ## GENERATE RESPONSE output_ids = model.generate( **inputs, max_new_tokens=1024, temperature=None, top_p=None, do_sample=False, use_cache=False, ) ## DECODE RESPONSE sequence_length = inputs["input_ids"].shape[1] sequence_length -= 1 if thinking_mode == "think" else 0 # remove the extra token for nothink mode new_output_ids = output_ids[:, sequence_length:] model_output = processor.batch_decode( new_output_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False )[0] return model_output This design allows developers to dynamically balance latency and accuracy at runtime — essential for real-time interactive applications. Key Use Cases Use Case 1: GUI Agents (Computer Use Agents) This is one of the model's most important application areas.The model receives a screenshot and a natural language instruction, then outputs the normalized bounding box coordinates for the target UI element. The Notebook also provides a plot_boxes() visualization function that compares model predictions (red box) against ground truth annotations (green box). Real-World Example — E-Commerce Shopping Agent: As described in the official documentation, in retail scenarios the model serves as the perception layer for computer-use agents: Screen comprehension: Identifies products, prices, filters, promotions, buttons, and cart states Grounded output: Produces actionable coordinates for upstream agent models (e.g., Fara-7B) to execute clicks, scrolls, and other interactions Real-time decision support: Compact model size and low-latency inference, suitable for navigating dense product listings and comparing options Use Case 2: Mathematical and Scientific Visual Reasoning Typical applications: Interpreting geometric figures and function graphs for problem-solving Analyzing scientific experiment diagrams and data charts Education: Students photograph and upload problems; the model shows the complete reasoning process and solution steps Use Case 3: Document, Chart, and Table Understanding Typical applications: IT Operations: Interpreting monitoring dashboards, performance charts, and incident reports to assist diagnosis and decision-making Financial Analysis: Extracting metrics from report screenshots and interpreting trends Enterprise Report Automation: Processing scanned documents and tables to generate structured summaries Samples 1. Using Phi-4-Reasoning-Vision-15B to detect jaywalking Go to - Sample Code 2. Using Phi-4-Reasoning-Vision-15B to math Go to - Sample Code 3. Using Phi-4-Reasoning-Vision-15B for GUI Agent Go to - Sample Code Model Comparison at a Glance Below is a comparison of Phi-4-Reasoning-Vision-15B against comparable models on key tasks: No Thinking Mode Thinking Mode Phi-4-Reasoning-Vision-15B shows clear advantages in math reasoning and GUI grounding tasks while remaining competitive in general multimodal understanding. Summary Phi-4-Reasoning-Vision-15B represents a significant milestone for small vision reasoning models: Sees clearly: High-resolution visual perception supporting documents, charts, UI screenshots, and more Thinks deeply: Selective multi-step reasoning chains that rival larger models on complex tasks Runs fast: 15B parameters + NoThink mode, suitable for real-time interactive applications Adapts flexibly: Three thinking modes switchable on the fly, letting developers dynamically balance accuracy and latency at runtime Whether building e-commerce shopping agents, IT operations assistants, or educational tutoring tools, this model provides a complete capability chain from "seeing" to "understanding" to "acting." Resources 1. Read official Blog - Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model 2. Learn more about Phi-4-reasoning-vision in Huggingface - https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B 3. Learn more about Microsoft Phi Family - Microsoft Phi CookBook335Views0likes0CommentsCopilot, Excel and photos
We have a number of networking devices, all the same type, that we are deploying within an office. To speed up asset management, engineers are putting a label on the back under the MAC and serial numbers then taking a photo so it can be documented later by admin staff. Through Excel I've tried with a single photo and multiple photos to extract the MAC details successfully and put them in to cells at the same time. However, this doesn't tell us which device it is as it doesn't process the photos in any order. Therefore my next step is to be able to capture the label info we have put on and tie this info together with the serial number each time so its all from the same equipment. Is it possible to do this either one photo at a time or across multiple photos? TIA26Views0likes0CommentsCopilot Makes Branding Easy! New PowerPoint Brand Images Feature
🚀 New PowerPoint Feature: Brand Images + Copilot Enhances Brand Consistency! Microsoft has just released a powerful new capability in PowerPoint: Brand Images, allowing users to insert company‑approved logos, icons, and photos directly from within the app. No more searching through folders, Teams chats, or old emails to find the “right” logo. With the help of Microsoft 365 Copilot, users can now build professional, brand‑consistent presentations faster than ever. 📌 What this means for organizations: Centralized, controlled access to brand assets Ensures up‑to‑date logos and visuals Eliminates incorrect or outdated branding Seamless user experience inside PowerPoint To enable this feature, admins must configure the Organization Assets Library (OAL) in SharePoint and upload approved brand assets. Adding metadata and tags greatly improves search and filtering. I’ve created a full video breaking down everything you need to know: 👉 Feature overview 👉 Admin setup steps 👉 Best practices for asset organization 👉 How Copilot enhances the experience 🎥 Watch the full video here: https://youtu.be/Uv8JpHCvgg0 #Microsoft365 #PowerPoint #Copilot #BrandManagement #M365Admin #SharePoint #Productivity #ModernWorkplace #MicrosoftUpdates #GiulianoDeLuca280Views1like2CommentsMicrosoft 365, Copilot & Copilot Studio News
February News Roundup for Technical and Business Leaders (2026) February 2026 was a significant month for enterprise AI across Microsoft 365, Microsoft Copilot, and Microsoft Copilot Studio. From large-scale enterprise deployments to governance updates and expanded model flexibility, Microsoft continues transitioning from AI experimentation to enterprise AI at scale. Here’s your strategic and technical breakdown of what mattered most this month. https://dellenny.com/microsoft-365-copilot-copilot-studio-news/74Views0likes0CommentsOptimising AI Costs with Microsoft Foundry Model Router
Microsoft Foundry Model Router analyses each prompt in real-time and forwards it to the most appropriate LLM from a pool of underlying models. Simple requests go to fast, cheap models; complex requests go to premium ones, all automatically. I built an interactive demo app so you can see the routing decisions, measure latencies, and compare costs yourself. This post walks through how it works, what we measured, and when it makes sense to use. The Problem: One Model for Everything Is Wasteful Traditional deployments force a single choice: Strategy Upside Downside Use a small model Fast, cheap Struggles with complex tasks Use a large model Handles everything Overpay for simple tasks Build your own router Full control Maintenance burden; hard to optimise Most production workloads are mixed-complexity. Classification, FAQ look-ups, and data extraction sit alongside code analysis, multi-constraint planning, and long-document summarisation. Paying premium-model prices for the simple 40% is money left on the table. The Solution: Model Router Model Router is a trained language model deployed as a single Azure endpoint. For each incoming request it: Analyses the prompt — complexity, task type, context length Selects an underlying model from the routing pool Forwards the request and returns the response Exposes the choice via the response.model field You interact with one deployment. No if/else routing logic in your code. Routing Modes Mode Goal Trade-off Balanced (default) Best cost-quality ratio General-purpose Cost Minimise spend May use smaller models more aggressively Quality Maximise accuracy Higher cost for complex tasks Modes are configured in the Foundry Portal, no code change needed to switch. Building the Demo To make routing decisions tangible, we built a React + TypeScript app that sends the same prompt through both Model Router and a fixed standard deployment (e.g. GPT-5-nano), then compares: Which model the router selected Latency (ms) Token usage (prompt + completion) Estimated cost (based on per-model pricing) Select a prompt, choose a routing mode, and hit Run Both to compare side-by-side What You Can Do 10 pre-built prompts spanning simple classification to complex multi-constraint planning Custom prompt input enter any text and benchmarks run automatically Three routing modes switch and re-run to see how distribution changes Batch mode run all 10 prompts in one click to gather aggregate stats API Integration The integration is a standard Azure OpenAI chat completion call. The only difference is the deployment name ( model-router instead of a specific model): const response = await fetch( `${endpoint}/openai/deployments/model-router/chat/completions?api-version=2024-10-21`, { method: 'POST', headers: { 'Content-Type': 'application/json', 'api-key': apiKey, }, body: JSON.stringify({ messages: [{ role: 'user', content: prompt }], max_completion_tokens: 1024, }), } ); const data = await response.json(); // The key insight: response.model reveals the underlying model const selectedModel = data.model; // e.g. "gpt-5-nano-2025-08-07" That data.model field is what makes cost tracking and distribution analysis possible. Results: What the Data Shows We ran all 10 prompts through both Model Router (Balanced mode) and a fixed standard deployment. Note: Results vary by run, region, model versions, and Azure load. These numbers are from a representative sample run. Side-by-side comparison across all 10 prompts in Balanced mode Summary Metric Router (Balanced) Standard (GPT-5-nano) Avg Latency ~7,800 ms ~7,700 ms Total Cost (10 prompts) ~$0.029 ~$0.030 Cost Savings ~4.5% — Models Used 4 1 Model Distribution The router used 4 different models across 10 prompts: Model Requests Share Typical Use gpt-5-nano 5 50% Classification, summarisation, planning gpt-5-mini 2 20% FAQ answers, data extraction gpt-oss-120b 2 20% Long-context analysis, creative tasks gpt-4.1-mini 1 10% Complex debugging & reasoning Routing distribution chart — the router favours efficient models for simple prompts Across All Three Modes Metric Balanced Cost-Optimised Quality-Optimised Cost Savings ~4.5% ~4.7% ~14.2% Avg Latency (Router) ~7,800 ms ~7,800 ms ~6,800 ms Avg Latency (Standard) ~7,700 ms ~7,300 ms ~8,300 ms Primary Goal Balance cost + quality Minimise spend Maximise accuracy Model Selection Mixed (4 models) Prefers cheaper Prefers premium Cost-optimised mode — routes more aggressively to nano/mini models Quality-optimised mode — routes to larger models for complex tasks Analysis What Worked Well Intelligent distribution The router didn't just default to one model. It used 4 different models and mapped prompt complexity to model capability: simple classification → nano, FAQ answers → mini, long-context documents → oss-120b, complex debugging → 4.1-mini. Measurable cost savings across all modes 4.5% in Balanced, 4.7% in Cost, and 14.2% in Quality mode. Quality mode was the surprise winner by choosing faster, cheaper models for simple prompts, it actually saved the most while still routing complex requests to capable models. Zero routing logic in application code One endpoint, one deployment name. The complexity lives in Azure's infrastructure, not yours. Operational flexibility Switch between Balanced, Cost, and Quality modes in the Foundry Portal without redeploying your app. Need to cut costs for a high-traffic period? Switch to Cost mode. Need accuracy for a compliance run? Switch to Quality. Future-proofing As Azure adds new models to the routing pool, your deployment benefits automatically. No code changes needed. Trade-offs to Consider Latency is comparable, not always faster In Balanced mode, Router averaged ~7,800 ms vs Standard's ~7,700 ms nearly identical. In Quality mode, the Router was actually faster (~6,800 ms vs ~8,300 ms) because it chose more efficient models for simple prompts. The delta depends on which models the router selects. Savings scale with workload diversity Our 10-prompt test set showed 4.5–14.2% savings. Production workloads with a wider spread of simple vs complex prompts should see larger savings, since the router has more opportunity to route simple requests to cheaper models. Opaque routing decisions You can see which model was picked via response.model , but you can't see why. For most applications this is fine; for debugging edge cases you may want to test specific prompts in the demo first. Custom Prompt Testing One of the most practical features of the demo is testing your own prompts before committing to Model Router in production. Enter any prompt `the quantum computing example is a medium-complexity educational prompt` Benchmarks execute automatically, showing the selected model, latency, tokens, and cost Workflow: Click ✏️ Custom in the prompt selector Enter your production-representative prompt Click ✓ Use This Prompt — Router and Standard run automatically Compare results — repeat with different routing modes Use the data to inform your deployment strategy This lets you predict costs and validate routing behaviour with your actual workload before going to production. When to Use Model Router Great Fit Mixed-complexity workloads — chatbots, customer service, content pipelines Cost-sensitive deployments — where even single-digit percentage savings matter at scale Teams wanting simplicity — one endpoint beats managing multi-model routing logic Rapid experimentation — try new models without changing application code Consider Carefully Ultra-low-latency requirements — if you need sub-second responses, the routing overhead matters Single-task, single-model workloads — if one model is clearly optimal for 100% of your traffic, a router adds complexity without benefit Full control over model selection — if you need deterministic model choice per request Mode Selection Guide Is accuracy critical (compliance, legal, medical)? Is accuracy critical (compliance, legal, medical)? └─ YES → Quality-Optimised └─ NO → Strict budget constraints? └─ YES → Cost-Optimised └─ NO → Balanced (recommended) Best Practices Start with Balanced mode — measure actual results, then optimise Test with your real prompts — use the Custom Prompt feature to validate routing before production Monitor model distribution — track which models handle your traffic over time Compare against a baseline — always keep a standard deployment to measure savings Review regularly — as new models enter the routing pool, distributions shift Technical Stack Technology Purpose React 19 + TypeScript 5.9 UI and type safety Vite 7 Dev server and build tool Tailwind CSS 4 Styling Recharts 3 Distribution and comparison charts Azure OpenAI API (2024-10-21) Model Router and standard completions Security measures include an ErrorBoundary for crash resilience, sanitised API error messages, AbortController request timeouts, input length validation, and restrictive security headers. API keys are loaded from environment variables and gitignored. Source: leestott/router-demo-app: An interactive web application demonstrating the power of Microsoft Foundry Model Router - an intelligent routing system that automatically selects the optimal language model for each request based on complexity, reasoning requirements, and task type. ⚠️ This demo calls Azure OpenAI directly from the browser. This is fine for local development. For production, proxy through a backend and use Managed Identity. Try It Yourself Quick Start git clone https://github.com/leestott/router-demo-app/ cd router-demo-app # Option A: Use the setup script (recommended) # Windows: .\setup.ps1 -StartDev # macOS/Linux: chmod +x setup.sh && ./setup.sh --start-dev # Option B: Manual npm install cp .env.example .env.local # Edit .env.local with your Azure credentials npm run dev Open http://localhost:5173 , select a prompt, and click ⚡ Run Both. Get Your Credentials Go to ai.azure.com → open your project Copy the Project connection string (endpoint URL) Navigate to Deployments → confirm model-router is deployed Get your API key from Project Settings → Keys Configuration Edit .env.local : VITE_ROUTER_ENDPOINT=https://your-resource.cognitiveservices.azure.com VITE_ROUTER_API_KEY=your-api-key VITE_ROUTER_DEPLOYMENT=model-router VITE_STANDARD_ENDPOINT=https://your-resource.cognitiveservices.azure.com VITE_STANDARD_API_KEY=your-api-key VITE_STANDARD_DEPLOYMENT=gpt-5-nano Ideas for Enhancement Historical analysis — persist results to track routing trends over time Cost projections — estimate monthly spend based on prompt patterns and volume A/B testing framework — compare modes with statistical significance Streaming support — show model selection for streaming responses Export reports — download benchmark data as CSV/JSON for further analysis Conclusion Model Router addresses a real problem: most AI workloads have mixed complexity, but most deployments use a single model. By routing each request to the right model automatically, you get: Cost savings (~4.5–14.2% measured across modes, scaling with volume) Intelligent distribution (4 models used, zero routing code) Operational simplicity (one endpoint, mode changes via portal) Future-proofing (new models added to the pool automatically) The latency trade-off is minimal — in Quality mode, the Router was actually faster than the standard deployment. The real value is flexibility: tune for cost, quality, or balance without touching your code. Ready to try it? Clone the demo repository, plug in your Azure credentials, and test with your own prompts. Resources Model Router Benchmark Sample Sample App Model Router Concepts Official documentation Model Router How-To Deployment guide Microsoft Foundry Portal Deploy and manage Model Router in the Catalog Model listing Azure OpenAI Managed Identity Production auth Built to explore Model Router and share findings with the developer community. Feedback and contributions welcome, open an issue or PR on GitHub.Exploring Azure Face API: Facial Landmark Detection and Real-Time Analysis with C#
In today’s world, applications that understand and respond to human facial cues are no longer science fiction—they’re becoming a reality in domains like security, driver monitoring, gaming, and AR/VR. With Azure Face API, developers can leverage powerful cloud-based facial recognition and analysis tools without building complex machine learning models from scratch. In this blog, we’ll explore how to use C# to detect faces, identify key facial landmarks, estimate head pose, track eye and mouth movements, and process real-time video streams. Using OpenCV for visualization, we’ll show how to overlay landmarks, draw bounding boxes, and calculate metrics like Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR)—all in real time. You'll learn to: Set up Azure Face API Detect 27 facial landmarks Estimate head pose (yaw, pitch, roll) Calculate eye aspect ratio (EAR) and mouth openness Draw bounding boxes around features using OpenCV Process real-time video Prerequisites .NET 8 SDK installed Azure subscription with Face API resource Visual Studio 2022 or later Webcam for testing (optional) Basic understanding of C# and computer vision concepts Part 1: Azure Face API Setup 1.1 Install Required NuGet Packages dotnet add package Azure.AI.Vision.Face dotnet add package OpenCvSharp4 dotnet add package OpenCvSharp4.runtime.win 1.2 Create Azure Face API Resource Navigate to Azure Portal Search for "Face" and create a new Face API resource Choose your pricing tier (Free tier: 20 calls/min, 30K calls/month) Copy the Endpoint URL and API Key 1.3 Configure in .NET Application appsettings.json: { "Azure": { "FaceApi": { "Endpoint": "https://your-resource.cognitiveservices.azure.com/", "ApiKey": "your-api-key-here" } } } Initialize Face Client: using Azure; using Azure.AI.Vision.Face; using Microsoft.Extensions.Configuration; public class FaceAnalysisService { private readonly FaceClient _faceClient; private readonly ILogger<FaceAnalysisService> _logger; public FaceAnalysisService(ILogger<FaceAnalysisService> logger, IConfiguration configuration) { _logger = logger; string endpoint = configuration["Azure:FaceApi:Endpoint"]; string apiKey = configuration["Azure:FaceApi:ApiKey"]; _faceClient = new FaceClient(new Uri(endpoint), new AzureKeyCredential(apiKey)); _logger.LogInformation("FaceClient initialized with endpoint: {Endpoint}", endpoint); } } Part 2: Understanding Face Detection Models 2.1 Basic Face Detection public async Task<List<FaceDetectionResult>> DetectFacesAsync(byte[] imageBytes) { using var stream = new MemoryStream(imageBytes); var response = await _faceClient.DetectAsync( BinaryData.FromStream(stream), FaceDetectionModel.Detection03, FaceRecognitionModel.Recognition04, returnFaceId: false, returnFaceAttributes: new FaceAttributeType[] { FaceAttributeType.HeadPose }, returnFaceLandmarks: true, returnRecognitionModel: false ); _logger.LogInformation("Detected {Count} faces", response.Value.Count); return response.Value.ToList(); } Part 3: Facial Landmarks - The 27 Key Points 3.1 Understanding Facial Landmarks 3.2 Accessing Landmarks in Code public void PrintLandmarks(FaceDetectionResult face) { var landmarks = face.FaceLandmarks; if (landmarks == null) { _logger.LogWarning("No landmarks detected"); return; } // Eye landmarks Console.WriteLine($"Left Eye Outer: ({landmarks.EyeLeftOuter.X}, {landmarks.EyeLeftOuter.Y})"); Console.WriteLine($"Left Eye Inner: ({landmarks.EyeLeftInner.X}, {landmarks.EyeLeftInner.Y})"); Console.WriteLine($"Left Eye Top: ({landmarks.EyeLeftTop.X}, {landmarks.EyeLeftTop.Y})"); Console.WriteLine($"Left Eye Bottom: ({landmarks.EyeLeftBottom.X}, {landmarks.EyeLeftBottom.Y})"); // Mouth landmarks Console.WriteLine($"Upper Lip Top: ({landmarks.UpperLipTop.X}, {landmarks.UpperLipTop.Y})"); Console.WriteLine($"Under Lip Bottom: ({landmarks.UnderLipBottom.X}, {landmarks.UnderLipBottom.Y})"); // Nose landmarks Console.WriteLine($"Nose Tip: ({landmarks.NoseTip.X}, {landmarks.NoseTip.Y})"); } 3.3 Visualizing All Landmarks public void DrawAllLandmarks(FaceLandmarks landmarks, Mat frame) { void DrawPoint(FaceLandmarkCoordinate point, Scalar color) { if (point != null) { Cv2.Circle(frame, new Point((int)point.X, (int)point.Y), radius: 3, color: color, thickness: -1); } } // Eyes (Green) DrawPoint(landmarks.EyeLeftOuter, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeLeftInner, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeLeftTop, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeLeftBottom, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeRightOuter, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeRightInner, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeRightTop, new Scalar(0, 255, 0)); DrawPoint(landmarks.EyeRightBottom, new Scalar(0, 255, 0)); // Eyebrows (Cyan) DrawPoint(landmarks.EyebrowLeftOuter, new Scalar(255, 255, 0)); DrawPoint(landmarks.EyebrowLeftInner, new Scalar(255, 255, 0)); DrawPoint(landmarks.EyebrowRightOuter, new Scalar(255, 255, 0)); DrawPoint(landmarks.EyebrowRightInner, new Scalar(255, 255, 0)); // Nose (Yellow) DrawPoint(landmarks.NoseTip, new Scalar(0, 255, 255)); DrawPoint(landmarks.NoseRootLeft, new Scalar(0, 255, 255)); DrawPoint(landmarks.NoseRootRight, new Scalar(0, 255, 255)); DrawPoint(landmarks.NoseLeftAlarOutTip, new Scalar(0, 255, 255)); DrawPoint(landmarks.NoseRightAlarOutTip, new Scalar(0, 255, 255)); // Mouth (Blue) DrawPoint(landmarks.UpperLipTop, new Scalar(255, 0, 0)); DrawPoint(landmarks.UpperLipBottom, new Scalar(255, 0, 0)); DrawPoint(landmarks.UnderLipTop, new Scalar(255, 0, 0)); DrawPoint(landmarks.UnderLipBottom, new Scalar(255, 0, 0)); DrawPoint(landmarks.MouthLeft, new Scalar(255, 0, 0)); DrawPoint(landmarks.MouthRight, new Scalar(255, 0, 0)); // Pupils (Red) DrawPoint(landmarks.PupilLeft, new Scalar(0, 0, 255)); DrawPoint(landmarks.PupilRight, new Scalar(0, 0, 255)); } Part 4: Drawing Bounding Boxes Around Features 4.1 Eye Bounding Boxes /// <summary> /// Draws rectangles around eyes using OpenCV. /// </summary> public void DrawEyeBoxes(FaceLandmarks landmarks, Mat frame) { int boxWidth = 60; int boxHeight = 35; // Calculate Rectangles var leftEyeRect = new Rect((int)landmarks.EyeLeftOuter.X - boxWidth / 2, (int)landmarks.EyeLeftOuter.Y - boxHeight / 2, boxWidth, boxHeight); var rightEyeRect = new Rect((int)landmarks.EyeRightOuter.X - boxWidth / 2, (int)landmarks.EyeRightOuter.Y - boxHeight / 2, boxWidth, boxHeight); // Draw Rectangles (Green in BGR) Cv2.Rectangle(frame, leftEyeRect, new Scalar(0, 255, 0), 2); Cv2.Rectangle(frame, rightEyeRect, new Scalar(0, 255, 0), 2); // Add Labels Cv2.PutText(frame, "Left Eye", new Point(leftEyeRect.X, leftEyeRect.Y - 5), HersheyFonts.HersheySimplex, 0.4, new Scalar(0, 255, 0), 1); Cv2.PutText(frame, "Right Eye", new Point(rightEyeRect.X, rightEyeRect.Y - 5), HersheyFonts.HersheySimplex, 0.4, new Scalar(0, 255, 0), 1); } 4.2 Mouth Bounding Box /// <summary> /// Draws rectangle around mouth region. /// </summary> public void DrawMouthBox(FaceLandmarks landmarks, Mat frame) { int boxWidth = 80; int boxHeight = 50; // Calculate center based on the vertical lip landmarks int centerX = (int)((landmarks.UpperLipTop.X + landmarks.UnderLipBottom.X) / 2); int centerY = (int)((landmarks.UpperLipTop.Y + landmarks.UnderLipBottom.Y) / 2); var mouthRect = new Rect(centerX - boxWidth / 2, centerY - boxHeight / 2, boxWidth, boxHeight); // Draw Mouth Box (Blue in BGR) Cv2.Rectangle(frame, mouthRect, new Scalar(255, 0, 0), 2); // Add Label Cv2.PutText(frame, "Mouth", new Point(mouthRect.X, mouthRect.Y - 5), HersheyFonts.HersheySimplex, 0.4, new Scalar(255, 0, 0), 1); } 4.3 Face Bounding Box /// <summary> /// Draws rectangle around entire face using the face rectangle from API. /// </summary> public void DrawFaceBox(FaceDetectionResult face, Mat frame) { var faceRect = face.FaceRectangle; if (faceRect == null) { return; } var rect = new Rect( faceRect.Left, faceRect.Top, faceRect.Width, faceRect.Height ); // Draw Face Bounding Box (Red in BGR) Cv2.Rectangle(frame, rect, new Scalar(0, 0, 255), 2); // Add Label with dimensions Cv2.PutText(frame, $"Face {faceRect.Width}x{faceRect.Height}", new Point(rect.X, rect.Y - 10), HersheyFonts.HersheySimplex, 0.5, new Scalar(0, 0, 255), 2); } 4.4 Nose Bounding Box /// <summary> /// Draws bounding box around nose using nose landmarks. /// </summary> public void DrawNoseBox(FaceLandmarks landmarks, Mat frame) { // Calculate horizontal bounds from Alar tips int minX = (int)Math.Min(landmarks.NoseLeftAlarOutTip.X, landmarks.NoseRightAlarOutTip.X); int maxX = (int)Math.Max(landmarks.NoseLeftAlarOutTip.X, landmarks.NoseRightAlarOutTip.X); // Calculate vertical bounds from Root to Tip int minY = (int)Math.Min(landmarks.NoseRootLeft.Y, landmarks.NoseTip.Y); int maxY = (int)landmarks.NoseTip.Y; // Create Rect with a 10px padding buffer var noseRect = new Rect( minX - 10, minY - 10, (maxX - minX) + 20, (maxY - minY) + 20 ); // Draw Nose Box (Yellow in BGR) Cv2.Rectangle(frame, noseRect, new Scalar(0, 255, 255), 2); } Part 5: Geometric Calculations with Landmarks 5.1 Calculating Euclidean Distance /// <summary> /// Calculates distance between two landmark points. /// </summary> public static double CalculateDistance(dynamic point1, dynamic point2) { double dx = point1.X - point2.X; double dy = point1.Y - point2.Y; return Math.Sqrt(dx * dx + dy * dy); } 5.2 Eye Aspect Ratio (EAR) Formula /// <summary> /// Calculates the Eye Aspect Ratio (EAR) to detect eye closure. /// </summary> public double CalculateEAR( FaceLandmarkCoordinate top1, FaceLandmarkCoordinate top2, FaceLandmarkCoordinate bottom1, FaceLandmarkCoordinate bottom2, FaceLandmarkCoordinate inner, FaceLandmarkCoordinate outer) { // Vertical distances double v1 = CalculateDistance(top1, bottom1); double v2 = CalculateDistance(top2, bottom2); // Horizontal distance double h = CalculateDistance(inner, outer); // EAR formula: (||p2-p6|| + ||p3-p5||) / (2 * ||p1-p4||) return (v1 + v2) / (2.0 * h); } Simplified Implementation: /// <summary> /// Calculates Eye Aspect Ratio (EAR) for a single eye. /// Reference: "Real-Time Eye Blink Detection using Facial Landmarks" (Soukupová & Čech, 2016) /// </summary> public double ComputeEAR(FaceLandmarks landmarks, bool isLeftEye) { var top = isLeftEye ? landmarks.EyeLeftTop : landmarks.EyeRightTop; var bottom = isLeftEye ? landmarks.EyeLeftBottom : landmarks.EyeRightBottom; var inner = isLeftEye ? landmarks.EyeLeftInner : landmarks.EyeRightInner; var outer = isLeftEye ? landmarks.EyeLeftOuter : landmarks.EyeRightOuter; if (top == null || bottom == null || inner == null || outer == null) { _logger.LogWarning("Missing eye landmarks"); return 1.0; // Return 1.0 (open) to prevent false positives for drowsiness } double verticalDist = CalculateDistance(top, bottom); double horizontalDist = CalculateDistance(inner, outer); // Simplified EAR for Azure 27-point model double ear = verticalDist / horizontalDist; _logger.LogDebug( "EAR for {Eye}: {Value:F3}", isLeftEye ? "left" : "right", ear ); return ear; } Usage Example: var leftEAR = ComputeEAR(landmarks, isLeftEye: true); var rightEAR = ComputeEAR(landmarks, isLeftEye: false); var avgEAR = (leftEAR + rightEAR) / 2.0; Console.WriteLine($"Average EAR: {avgEAR:F3}"); // Open eyes: ~0.25-0.30 // Closed eyes: ~0.10-0.15 5.3 Mouth Aspect Ratio (MAR) /// <summary> /// Calculates Mouth Aspect Ratio relative to face height. /// </summary> public double CalculateMouthAspectRatio(FaceLandmarks landmarks, FaceRectangle faceRect) { double mouthHeight = landmarks.UnderLipBottom.Y - landmarks.UpperLipTop.Y; double mouthWidth = CalculateDistance(landmarks.MouthLeft, landmarks.MouthRight); double mouthOpenRatio = mouthHeight / faceRect.Height; double mouthWidthRatio = mouthWidth / faceRect.Width; _logger.LogDebug( "Mouth - Height ratio: {HeightRatio:F3}, Width ratio: {WidthRatio:F3}", mouthOpenRatio, mouthWidthRatio ); return mouthOpenRatio; } 5.4 Inter-Eye Distance /// <summary> /// Calculates the distance between pupils (inter-pupillary distance). /// </summary> public double CalculateInterEyeDistance(FaceLandmarks landmarks) { return CalculateDistance(landmarks.PupilLeft, landmarks.PupilRight); } /// <summary> /// Calculates distance between inner eye corners. /// </summary> public double CalculateInnerEyeDistance(FaceLandmarks landmarks) { return CalculateDistance(landmarks.EyeLeftInner, landmarks.EyeRightInner); } 5.5 Face Symmetry Analysis /// <summary> /// Analyzes facial symmetry by comparing left and right sides. /// </summary> public FaceSymmetryMetrics AnalyzeFaceSymmetry(FaceLandmarks landmarks) { double centerX = landmarks.NoseTip.X; double leftEyeDistance = CalculateDistance(landmarks.EyeLeftInner, new { X = centerX, Y = landmarks.EyeLeftInner.Y }); double leftMouthDistance = CalculateDistance(landmarks.MouthLeft, new { X = centerX, Y = landmarks.MouthLeft.Y }); double rightEyeDistance = CalculateDistance(landmarks.EyeRightInner, new { X = centerX, Y = landmarks.EyeRightInner.Y }); double rightMouthDistance = CalculateDistance(landmarks.MouthRight, new { X = centerX, Y = landmarks.MouthRight.Y }); return new FaceSymmetryMetrics { EyeSymmetryRatio = leftEyeDistance / rightEyeDistance, MouthSymmetryRatio = leftMouthDistance / rightMouthDistance, IsSymmetric = Math.Abs(leftEyeDistance - rightEyeDistance) < 5.0 }; } public class FaceSymmetryMetrics { public double EyeSymmetryRatio { get; set; } public double MouthSymmetryRatio { get; set; } public bool IsSymmetric { get; set; } } Part 6: Head Pose Estimation 6.1 Understanding Head Pose Angles Azure Face API provides three Euler angles for head orientation: 6.2 Accessing Head Pose Data public void AnalyzeHeadPose(FaceDetectionResult face) { var headPose = face.FaceAttributes?.HeadPose; if (headPose == null) { _logger.LogWarning("Head pose not available"); return; } double yaw = headPose.Yaw; double pitch = headPose.Pitch; double roll = headPose.Roll; Console.WriteLine("Head Pose:"); Console.WriteLine($" Yaw: {yaw:F2}° (Left/Right)"); Console.WriteLine($" Pitch: {pitch:F2}° (Up/Down)"); Console.WriteLine($" Roll: {roll:F2}° (Tilt)"); InterpretHeadPose(yaw, pitch, roll); } 6.3 Interpreting Head Pose public string InterpretHeadPose(double yaw, double pitch, double roll) { var directions = new List<string>(); // Interpret Yaw (horizontal) if (Math.Abs(yaw) < 10) directions.Add("Looking Forward"); else if (yaw < -20) directions.Add($"Turned Left ({Math.Abs(yaw):F0}°)"); else if (yaw > 20) directions.Add($"Turned Right ({yaw:F0}°)"); // Interpret Pitch (vertical) if (Math.Abs(pitch) < 10) directions.Add("Level"); else if (pitch < -15) directions.Add($"Looking Down ({Math.Abs(pitch):F0}°)"); else if (pitch > 15) directions.Add($"Looking Up ({pitch:F0}°)"); // Interpret Roll (tilt) if (Math.Abs(roll) > 15) { string side = roll < 0 ? "Left" : "Right"; directions.Add($"Tilted {side} ({Math.Abs(roll):F0}°)"); } return string.Join(", ", directions); } 6.4 Visualizing Head Pose on Frame /// <summary> /// Draws head pose information with color-coded indicators. /// </summary> public void DrawHeadPoseInfo(Mat frame, HeadPose headPose, FaceRectangle faceRect) { double yaw = headPose.Yaw; double pitch = headPose.Pitch; double roll = headPose.Roll; int centerX = faceRect.Left + faceRect.Width / 2; int centerY = faceRect.Top + faceRect.Height / 2; string poseText = $"Yaw: {yaw:F1}° Pitch: {pitch:F1}° Roll: {roll:F1}°"; Cv2.PutText(frame, poseText, new Point(faceRect.Left, faceRect.Top - 10), HersheyFonts.HersheySimplex, 0.5, new Scalar(255, 255, 255), 1); int arrowLength = 50; double yawRadians = yaw * Math.PI / 180.0; int arrowEndX = centerX + (int)(arrowLength * Math.Sin(yawRadians)); Cv2.ArrowedLine(frame, new Point(centerX, centerY), new Point(arrowEndX, centerY), new Scalar(0, 255, 0), 2, tipLength: 0.3); double pitchRadians = -pitch * Math.PI / 180.0; int arrowPitchEndY = centerY + (int)(arrowLength * Math.Sin(pitchRadians)); Cv2.ArrowedLine(frame, new Point(centerX, centerY), new Point(centerX, arrowPitchEndY), new Scalar(255, 0, 0), 2, tipLength: 0.3); } 6.5 Detecting Head Orientation States public enum HeadOrientation { Forward, Left, Right, Up, Down, TiltedLeft, TiltedRight, UpLeft, UpRight, DownLeft, DownRight } public List<HeadOrientation> DetectHeadOrientation(HeadPose headPose) { const double THRESHOLD = 15.0; bool lookingUp = headPose.Pitch > THRESHOLD; bool lookingDown = headPose.Pitch < -THRESHOLD; bool lookingLeft = headPose.Yaw < -THRESHOLD; bool lookingRight = headPose.Yaw > THRESHOLD; var orientations = new List<HeadOrientation>(); if (!lookingUp && !lookingDown && !lookingLeft && !lookingRight) orientations.Add(HeadOrientation.Forward); if (lookingUp && !lookingLeft && !lookingRight) orientations.Add(HeadOrientation.Up); if (lookingDown && !lookingLeft && !lookingRight) orientations.Add(HeadOrientation.Down); if (lookingLeft && !lookingUp && !lookingDown) orientations.Add(HeadOrientation.Left); if (lookingRight && !lookingUp && !lookingDown) orientations.Add(HeadOrientation.Right); if (lookingUp && lookingLeft) orientations.Add(HeadOrientation.UpLeft); if (lookingUp && lookingRight) orientations.Add(HeadOrientation.UpRight); if (lookingDown && lookingLeft) orientations.Add(HeadOrientation.DownLeft); if (lookingDown && lookingRight) orientations.Add(HeadOrientation.DownRight); return orientations; } Part 7: Real-Time Video Processing 7.1 Setting Up Video Capture using OpenCvSharp; public class RealTimeFaceAnalyzer : IDisposable { private VideoCapture? _capture; private Mat? _frame; private readonly FaceClient _faceClient; private bool _isRunning; public async Task StartAsync() { _capture = new VideoCapture(0); _frame = new Mat(); _isRunning = true; await Task.Run(() => ProcessVideoLoop()); } private async Task ProcessVideoLoop() { while (_isRunning) { if (_capture == null || !_capture.IsOpened()) break; _capture.Read(_frame); if (_frame == null || _frame.Empty()) { await Task.Delay(1); // Minimal delay to prevent CPU spiking continue; } Cv2.Resize(_frame, _frame, new Size(640, 480)); // Ensure we don't await indefinitely in the rendering loop _ = ProcessFrameAsync(_frame.Clone()); Cv2.ImShow("Face Analysis", _frame); if (Cv2.WaitKey(30) == 'q') break; } Dispose(); } private async Task ProcessFrameAsync(Mat frame) { // This is where your DrawFaceBox, DrawAllLandmarks, and EAR logic will sit. // Remember to use try-catch here to prevent API errors from crashing the loop. } public void Dispose() { _isRunning = false; _capture?.Dispose(); _frame?.Dispose(); Cv2.DestroyAllWindows(); } } 7.2 Optimizing API Calls Problem: Calling Azure Face API on every frame (30 fps) is expensive and slow. Solution: Call API once per second, cache results for 30 frames. private List<FaceDetectionResult> _cachedFaces = new(); private DateTime _lastDetectionTime = DateTime.MinValue; private readonly object _cacheLock = new(); private async Task ProcessFrameAsync(Mat frame) { if ((DateTime.Now - _lastDetectionTime).TotalSeconds >= 1.0) { _lastDetectionTime = DateTime.Now; byte[] imageBytes; Cv2.ImEncode(".jpg", frame, out imageBytes); var faces = await DetectFacesAsync(imageBytes); lock (_cacheLock) { _cachedFaces = faces; } } List<FaceDetectionResult> facesToProcess; lock (_cacheLock) { facesToProcess = _cachedFaces.ToList(); } foreach (var face in facesToProcess) { DrawFaceAnnotations(face, frame); } } Performance Improvement: 30x fewer API calls (1/sec instead of 30/sec) ~$0.02/hour instead of ~$0.60/hour Smooth 30 fps rendering < 100ms latency for visual updates 7.3 Drawing Complete Face Annotations private void DrawFaceAnnotations(FaceDetectionResult face, Mat frame) { DrawFaceBox(face, frame); if (face.FaceLandmarks != null) { DrawAllLandmarks(face.FaceLandmarks, frame); DrawEyeBoxes(face.FaceLandmarks, frame); DrawMouthBox(face.FaceLandmarks, frame); DrawNoseBox(face.FaceLandmarks, frame); double leftEAR = ComputeEAR(face.FaceLandmarks, isLeftEye: true); double rightEAR = ComputeEAR(face.FaceLandmarks, isLeftEye: false); double avgEAR = (leftEAR + rightEAR) / 2.0; Cv2.PutText(frame, $"EAR: {avgEAR:F3}", new Point(10, 30), HersheyFonts.HersheySimplex, 0.6, new Scalar(0, 255, 0), 2); } if (face.FaceAttributes?.HeadPose != null) { DrawHeadPoseInfo(frame, face.FaceAttributes.HeadPose, face.FaceRectangle); string orientation = InterpretHeadPose(face.FaceAttributes.HeadPose.Yaw, face.FaceAttributes.HeadPose.Pitch, face.FaceAttributes.HeadPose.Roll); Cv2.PutText(frame, orientation, new Point(10, 60), HersheyFonts.HersheySimplex, 0.6, new Scalar(255, 255, 0), 2); } } Part 8: Advanced Features and Use Cases 8.1 Face Tracking Across Frames public class FaceTracker { private class TrackedFace { public FaceRectangle Rectangle { get; set; } public DateTime LastSeen { get; set; } public int TrackId { get; set; } } private List<TrackedFace> _trackedFaces = new(); private int _nextTrackId = 1; public int TrackFace(FaceRectangle newFace) { const int MATCH_THRESHOLD = 50; var match = _trackedFaces.FirstOrDefault(tf => { double distance = Math.Sqrt(Math.Pow(tf.Rectangle.Left - newFace.Left, 2) + Math.Pow(tf.Rectangle.Top - newFace.Top, 2)); return distance < MATCH_THRESHOLD; }); if (match != null) { match.Rectangle = newFace; match.LastSeen = DateTime.Now; return match.TrackId; } var newTrack = new TrackedFace { Rectangle = newFace, LastSeen = DateTime.Now, TrackId = _nextTrackId++ }; _trackedFaces.Add(newTrack); return newTrack.TrackId; } public void RemoveOldTracks(TimeSpan maxAge) { _trackedFaces.RemoveAll(tf => DateTime.Now - tf.LastSeen > maxAge); } } 8.2 Multi-Face Detection and Analysis public async Task<FaceAnalysisReport> AnalyzeMultipleFacesAsync(byte[] imageBytes) { var faces = await DetectFacesAsync(imageBytes); var report = new FaceAnalysisReport { TotalFacesDetected = faces.Count, Timestamp = DateTime.Now, Faces = new List<SingleFaceAnalysis>() }; for (int i = 0; i < faces.Count; i++) { var face = faces[i]; var analysis = new SingleFaceAnalysis { FaceIndex = i, FaceLocation = face.FaceRectangle, FaceSize = face.FaceRectangle.Width * face.FaceRectangle.Height }; if (face.FaceLandmarks != null) { analysis.LeftEyeEAR = ComputeEAR(face.FaceLandmarks, true); analysis.RightEyeEAR = ComputeEAR(face.FaceLandmarks, false); analysis.InterPupillaryDistance = CalculateInterEyeDistance(face.FaceLandmarks); } if (face.FaceAttributes?.HeadPose != null) { analysis.HeadYaw = face.FaceAttributes.HeadPose.Yaw; analysis.HeadPitch = face.FaceAttributes.HeadPose.Pitch; analysis.HeadRoll = face.FaceAttributes.HeadPose.Roll; } report.Faces.Add(analysis); } report.Faces = report.Faces.OrderByDescending(f => f.FaceSize).ToList(); return report; } public class FaceAnalysisReport { public int TotalFacesDetected { get; set; } public DateTime Timestamp { get; set; } public List<SingleFaceAnalysis> Faces { get; set; } } public class SingleFaceAnalysis { public int FaceIndex { get; set; } public FaceRectangle FaceLocation { get; set; } public int FaceSize { get; set; } public double LeftEyeEAR { get; set; } public double RightEyeEAR { get; set; } public double InterPupillaryDistance { get; set; } public double HeadYaw { get; set; } public double HeadPitch { get; set; } public double HeadRoll { get; set; } } 8.3 Exporting Landmark Data to JSON using System.Text.Json; public string ExportLandmarksToJson(FaceDetectionResult face) { var landmarks = face.FaceLandmarks; var landmarkData = new { Face = new { Rectangle = new { face.FaceRectangle.Left, face.FaceRectangle.Top, face.FaceRectangle.Width, face.FaceRectangle.Height } }, Eyes = new { Left = new { Outer = new { landmarks.EyeLeftOuter.X, landmarks.EyeLeftOuter.Y }, Inner = new { landmarks.EyeLeftInner.X, landmarks.EyeLeftInner.Y }, Top = new { landmarks.EyeLeftTop.X, landmarks.EyeLeftTop.Y }, Bottom = new { landmarks.EyeLeftBottom.X, landmarks.EyeLeftBottom.Y } }, Right = new { Outer = new { landmarks.EyeRightOuter.X, landmarks.EyeRightOuter.Y }, Inner = new { landmarks.EyeRightInner.X, landmarks.EyeRightInner.Y }, Top = new { landmarks.EyeRightTop.X, landmarks.EyeRightTop.Y }, Bottom = new { landmarks.EyeRightBottom.X, landmarks.EyeRightBottom.Y } } }, Mouth = new { UpperLipTop = new { landmarks.UpperLipTop.X, landmarks.UpperLipTop.Y }, UnderLipBottom = new { landmarks.UnderLipBottom.X, landmarks.UnderLipBottom.Y }, Left = new { landmarks.MouthLeft.X, landmarks.MouthLeft.Y }, Right = new { landmarks.MouthRight.X, landmarks.MouthRight.Y } }, Nose = new { Tip = new { landmarks.NoseTip.X, landmarks.NoseTip.Y }, RootLeft = new { landmarks.NoseRootLeft.X, landmarks.NoseRootLeft.Y }, RootRight = new { landmarks.NoseRootRight.X, landmarks.NoseRootRight.Y } }, HeadPose = face.FaceAttributes?.HeadPose != null ? new { face.FaceAttributes.HeadPose.Yaw, face.FaceAttributes.HeadPose.Pitch, face.FaceAttributes.HeadPose.Roll } : null }; return JsonSerializer.Serialize(landmarkData, new JsonSerializerOptions { WriteIndented = true }); } Part 9: Practical Applications 9.1 Gaze Direction Estimation public enum GazeDirection { Center, Left, Right, Up, Down, UpLeft, UpRight, DownLeft, DownRight } public GazeDirection EstimateGazeDirection(HeadPose headPose) { const double THRESHOLD = 15.0; bool lookingUp = headPose.Pitch > THRESHOLD; bool lookingDown = headPose.Pitch < -THRESHOLD; bool lookingLeft = headPose.Yaw < -THRESHOLD; bool lookingRight = headPose.Yaw > THRESHOLD; if (lookingUp && lookingLeft) return GazeDirection.UpLeft; if (lookingUp && lookingRight) return GazeDirection.UpRight; if (lookingDown && lookingLeft) return GazeDirection.DownLeft; if (lookingDown && lookingRight) return GazeDirection.DownRight; if (lookingUp) return GazeDirection.Up; if (lookingDown) return GazeDirection.Down; if (lookingLeft) return GazeDirection.Left; if (lookingRight) return GazeDirection.Right; return GazeDirection.Center; } 9.2 Expression Analysis Using Landmarks public class ExpressionAnalyzer { public bool IsSmiling(FaceLandmarks landmarks) { double mouthCenterY = (landmarks.UpperLipTop.Y + landmarks.UnderLipBottom.Y) / 2; double leftCornerY = landmarks.MouthLeft.Y; double rightCornerY = landmarks.MouthRight.Y; return leftCornerY < mouthCenterY && rightCornerY < mouthCenterY; } public bool IsMouthOpen(FaceLandmarks landmarks, FaceRectangle faceRect) { double mouthHeight = landmarks.UnderLipBottom.Y - landmarks.UpperLipTop.Y; double mouthOpenRatio = mouthHeight / faceRect.Height; return mouthOpenRatio > 0.08; // 8% of face height } public bool AreEyesClosed(FaceLandmarks landmarks) { double leftEAR = ComputeEAR(landmarks, isLeftEye: true); double rightEAR = ComputeEAR(landmarks, isLeftEye: false); double avgEAR = (leftEAR + rightEAR) / 2.0; return avgEAR < 0.18; // Threshold for closed eyes } } 9.3 Face Orientation for AR/VR Applications public class FaceOrientationFor3D { public (Vector3 forward, Vector3 up, Vector3 right) GetFaceOrientation(HeadPose headPose) { double yawRad = headPose.Yaw * Math.PI / 180.0; double pitchRad = headPose.Pitch * Math.PI / 180.0; double rollRad = headPose.Roll * Math.PI / 180.0; var forward = new Vector3((float)(Math.Sin(yawRad) * Math.Cos(pitchRad)), (float)(-Math.Sin(pitchRad)), (float)(Math.Cos(yawRad) * Math.Cos(pitchRad))); var up = new Vector3((float)(Math.Sin(yawRad) * Math.Sin(pitchRad) * Math.Cos(rollRad) - Math.Cos(yawRad) * Math.Sin(rollRad)), (float)(Math.Cos(pitchRad) * Math.Cos(rollRad)), (float)(Math.Cos(yawRad) * Math.Sin(pitchRad) * Math.Cos(rollRad) + Math.Sin(yawRad) * Math.Sin(rollRad))); var right = Vector3.Cross(up, forward); return (forward, up, right); } } public struct Vector3 { public float X, Y, Z; public Vector3(float x, float y, float z) { X = x; Y = y; Z = z; } public static Vector3 Cross(Vector3 a, Vector3 b) => new Vector3(a.Y * b.Z - a.Z * b.Y, a.Z * b.X - a.X * b.Z, a.X * b.Y - a.Y * b.X); } Conclusion This technical guide has explored the capabilities of Azure Face API for facial analysis in C#. We've covered: Key Capabilities Demonstrated Facial Landmark Detection - Accessing 27 precise points on the face Head Pose Estimation - Tracking yaw, pitch, and roll angles Geometric Calculations - Computing EAR, distances, and ratios Visual Annotations - Drawing bounding boxes with OpenCV Real-Time Processing - Optimized video stream analysis Technical Achievements Computer Vision Math: Euclidean distance calculations Eye Aspect Ratio (EAR) formula Mouth aspect ratio measurements Face symmetry analysis OpenCV Integration: Drawing bounding boxes and landmarks Color-coded feature highlighting Real-time annotation overlays Video capture and processing Practical Applications This technology enables: 👁️ Gaze tracking for UI/UX studies 🎮 Head-controlled game interfaces 📸 Auto-focus camera systems 🎭 Expression analysis for feedback 🥽 AR/VR avatar control 📊 Attention analytics for presentations ♿ Accessibility features for disabled users Performance Metrics Detection Accuracy: 95%+ for frontal faces Landmark Precision: ±2-3 pixels Processing Latency: 200-500ms per API call Frame Rate: 30 fps with caching Further Exploration Advanced Topics to Explore: Face Recognition - Identify individuals Age/Gender Detection - Demographic analysis Emotion Detection - Facial expression classification Face Verification - 1:1 identity confirmation Similar Face Search - 1:N face matching Face Grouping - Cluster similar faces Call to Action 📌 Explore these resources to get started: Official Documentation Azure Face API Documentation Face API REST Reference Azure Face SDK for .NET Related Libraries OpenCVSharp - OpenCV wrapper for .NET System.Drawing - .NET image processing Source Code GitHub Repository: ravimodi_microsoft/SmartDriver Sample Code: Included in this article