Tips and Tricks
Complete Guide to Deploying OpenClaw on Azure Windows 11 Virtual Machine
1. Introduction to OpenClaw

OpenClaw is an open-source AI personal assistant platform that runs on your own devices and executes real-world tasks. Unlike traditional cloud-based AI assistants, OpenClaw emphasizes local deployment and privacy protection, giving you complete control over your data.

Key Features of OpenClaw

- Cross-Platform Support: Runs on Windows, macOS, Linux, and other operating systems
- Multi-Channel Integration: Interact with AI through messaging platforms like WhatsApp, Telegram, and Discord
- Task Automation: Execute file operations, browser control, system commands, and more
- Persistent Memory: AI remembers your preferences and contextual information
- Flexible AI Backends: Supports multiple large language models including Anthropic Claude and OpenAI GPT

OpenClaw is built on Node.js and can be quickly installed and deployed via npm.

2. Security Advantages of Running OpenClaw on an Azure VM

Deploying OpenClaw on an Azure virtual machine instead of your personal computer offers significant security benefits:

1. Environment Isolation

Azure VMs provide a completely isolated runtime environment. Even if the AI agent exhibits abnormal behavior or is maliciously exploited, it won't affect your personal computer or local data. This isolation mechanism forms the foundation of a zero-trust security architecture.

2. Network Security Controls

Through Azure Network Security Groups (NSGs), you can precisely control which IP addresses can access your virtual machine. The RDP rule configured in the deployment script allows you to securely connect to your Windows 11 VM via Remote Desktop while enabling further restrictions on access sources.

3. Data Persistence and Backup

Azure VM managed disks support automatic snapshots and backups. Even if the virtual machine encounters issues, your OpenClaw configuration and data remain safe.

4. Elastic Resource Management

You can adjust VM specifications (memory, CPU) at any time based on actual needs, or stop the VM when not in use to save costs, maintaining maximum flexibility.

5. Enterprise-Grade Authentication

Azure supports integration with Azure Active Directory (Entra ID) for identity verification, allowing you to assign different access permissions to team members for granular access control.

6. Audit and Compliance

Azure provides detailed activity logs and audit trails, making it easy to trace any suspicious activity and meet enterprise compliance requirements.

3. Deployment Steps Explained

This deployment script uses Azure CLI to automate the installation of OpenClaw and its dependencies on a Windows 11 virtual machine. Here are the detailed execution steps.

Prerequisites

Before running the script, ensure you have:

1. Installed Azure CLI

# Windows users can download the MSI installer
https://aka.ms/installazurecliwindows

# macOS users
brew install azure-cli

# Linux users
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

2. Logged in to your Azure account

az login

3. Prepared the deployment script

Save the provided deploy-windows11-vm.sh script locally and grant execute permissions:

chmod +x deploy-windows11-vm.sh
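If you work with more than one Azure subscription, it is also worth confirming which subscription the CLI is currently targeting before running the script. This quick check is not part of the original script, and the subscription name below is a placeholder:

az account show --output table

# Switch if needed (replace the placeholder with your subscription name or ID)
az account set --subscription "<subscription-name-or-id>"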
Step 1: Configure Deployment Parameters

The script begins by defining key configuration variables that you can modify as needed:

RESOURCE_GROUP="Your Azure Resource Group Name"        # Resource group name
VM_NAME="win11-openclaw-vm"                            # Virtual machine name
LOCATION="Your Azure Region Name"                      # Azure region
ADMIN_USERNAME="Your Azure VM Administrator Name"      # Administrator username
ADMIN_PASSWORD="Your Azure VM Administrator Password"  # Administrator password (change to a strong password)
VM_SIZE="Your Azure VM Size"                           # VM size, e.g., Standard_B2s (2 vCPUs, 4 GB memory)

Security recommendations:

- Always change ADMIN_PASSWORD to your own strong password
- Passwords should contain uppercase and lowercase letters, numbers, and special characters
- Never commit scripts containing real passwords to code repositories

Step 2: Check and Create Resource Group

The script first checks whether the specified resource group exists and creates it automatically if it doesn't:

echo "Checking resource group $RESOURCE_GROUP..."
az group show --name $RESOURCE_GROUP &> /dev/null
if [ $? -ne 0 ]; then
  echo "Creating resource group $RESOURCE_GROUP..."
  az group create --name $RESOURCE_GROUP --location $LOCATION
fi

A resource group is a logical container in Azure used to organize and manage related resources. All associated resources (VMs, networks, storage, etc.) will be created within this resource group.

Step 3: Create Windows 11 Virtual Machine

This is the core step, using the az vm create command to create a Windows 11 Pro virtual machine:

az vm create \
  --resource-group $RESOURCE_GROUP \
  --name $VM_NAME \
  --image MicrosoftWindowsDesktop:windows-11:win11-24h2-pro:latest \
  --size $VM_SIZE \
  --admin-username $ADMIN_USERNAME \
  --admin-password $ADMIN_PASSWORD \
  --public-ip-sku Standard \
  --nsg-rule RDP

Parameter explanations:

- --image: Uses the latest Windows 11 24H2 Professional edition image
- --size: Standard_B2s provides 2 vCPUs and 4 GB memory, suitable for running OpenClaw
- --public-ip-sku Standard: Assigns a Standard-SKU public IP
- --nsg-rule RDP: Automatically creates a network security group rule allowing RDP (port 3389) inbound traffic

Step 4: Retrieve Virtual Machine Public IP

After VM creation completes, the script retrieves its public IP address:

PUBLIC_IP=$(az vm show -d -g $RESOURCE_GROUP -n $VM_NAME --query publicIps -o tsv)
echo "VM Public IP: $PUBLIC_IP"

This IP address will be used for subsequent RDP remote connections.

Step 5: Install Chocolatey Package Manager

Using az vm run-command to execute PowerShell scripts inside the VM, first install Chocolatey:

az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \
  --scripts "Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))"

Chocolatey is a package manager for Windows, similar to apt or yum on Linux, simplifying subsequent software installations.
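Each az vm run-command invoke call returns a JSON payload whose value[0].message field contains the script's console output on the VM. If a step appears to fail silently, one way to inspect that output is to add a --query filter. The sketch below is not part of the original script; it simply re-runs a harmless command to confirm the Step 5 install:

az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \
  --scripts "C:\ProgramData\chocolatey\bin\choco.exe --version" \
  --query "value[0].message" -o tsv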
Step 6: Install Git

Git is a dependency for many npm packages, especially those that need to download source code from GitHub for compilation:

az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \
  --scripts "C:\ProgramData\chocolatey\bin\choco.exe install git -y"

Step 7: Install CMake and Visual Studio Build Tools

Some of OpenClaw's native modules require compilation, necessitating the installation of the C++ build toolchain:

az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \
  --scripts "C:\ProgramData\chocolatey\bin\choco.exe install cmake visualstudio2022buildtools visualstudio2022-workload-vctools -y"

Component descriptions:

- cmake: Cross-platform build system
- visualstudio2022buildtools: VS 2022 Build Tools
- visualstudio2022-workload-vctools: C++ development toolchain

Step 8: Install Node.js LTS

Install the Node.js Long Term Support version, which is the core runtime environment for OpenClaw:

az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \
  --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); C:\ProgramData\chocolatey\bin\choco.exe install nodejs-lts -y"

The script refreshes environment variables first to ensure Chocolatey is in the PATH, then installs Node.js LTS. (Note: if you run these commands from a Bash shell, wrap the --scripts argument in single quotes or escape the dollar signs so that PowerShell variables such as $env:Path are not expanded by Bash before they reach the VM.)

Step 9: Globally Install OpenClaw

Use npm to globally install OpenClaw:

az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \
  --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); npm install -g openclaw"

Global installation makes the openclaw command available from anywhere in the system.

Step 10: Configure Environment Variables

Add the Node.js and npm global paths to the system PATH environment variable:

az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \
  --scripts "
    $npmGlobalPath = 'C:\Program Files\nodejs';
    $npmUserPath = [System.Environment]::GetFolderPath('ApplicationData') + '\npm';
    $currentPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine');
    if ($currentPath -notlike \"*$npmGlobalPath*\") {
      $newPath = $currentPath + ';' + $npmGlobalPath;
      [System.Environment]::SetEnvironmentVariable('Path', $newPath, 'Machine');
      Write-Host 'Added Node.js path to system PATH';
    }
    if ($currentPath -notlike \"*$npmUserPath*\") {
      $newPath = [System.Environment]::GetEnvironmentVariable('Path', 'Machine') + ';' + $npmUserPath;
      [System.Environment]::SetEnvironmentVariable('Path', $newPath, 'Machine');
      Write-Host 'Added npm global path to system PATH';
    }
    Write-Host 'Environment variables updated successfully!';
  "

This ensures that the node, npm, and openclaw commands can be used directly even in new terminal sessions.
Step 11: Verify Installation

The script finally verifies that all software is correctly installed:

az vm run-command invoke -g $RESOURCE_GROUP -n $VM_NAME --command-id RunPowerShellScript \
  --scripts "$env:Path = [System.Environment]::GetEnvironmentVariable('Path','Machine') + ';' + [System.Environment]::GetEnvironmentVariable('Path','User'); Write-Host 'Node.js version:'; node --version; Write-Host 'npm version:'; npm --version; Write-Host 'openclaw:'; npm list -g openclaw"

Successful output should look similar to:

Node.js version: v20.x.x
npm version: 10.x.x
openclaw: openclaw@x.x.x

Step 12: Connect to the Virtual Machine

After deployment completes, the script outputs connection information:

============================================
Deployment completed!
============================================
Resource Group: Your Azure Resource Group Name
VM Name: win11-openclaw-vm
Public IP: xx.xx.xx.xx
Admin Username: Your Administrator UserName
VM Size: Your VM Size
Connect via RDP: mstsc /v:xx.xx.xx.xx
============================================

Connection methods:

Windows users:
- Press Win + R to open the Run dialog
- Enter mstsc /v:public_ip and press Enter
- Log in using the username and password set in the script

macOS users:
- Download "Windows App" from the App Store
- Add a PC connection with the public IP
- Log in using the username and password set in the script

Linux users:

# Use Remmina or xfreerdp
xfreerdp /u:username /v:public_ip

Step 13: Initialize OpenClaw

After connecting to the VM, run the following in PowerShell or Command Prompt:

# Initialize OpenClaw
openclaw onboard

# Configure the AI model API key
# Edit configuration file: C:\Users\username\.openclaw\openclaw.json
notepad $env:USERPROFILE\.openclaw\openclaw.json

Add your AI API key in the configuration file:

{
  "agents": {
    "defaults": {
      "model": "Your Model Name",
      "apiKey": "your-api-key-here"
    }
  }
}

Step 14: Start OpenClaw

# Start the Gateway service
openclaw gateway

# In another terminal, connect messaging channels (e.g., WhatsApp)
openclaw channels login

Follow the prompts to scan the QR code and connect OpenClaw to your messaging app.

4. Summary

Through this guide, we've implemented the complete process of automatically deploying OpenClaw on an Azure Windows 11 virtual machine. The deployment is highly automated: a single script handles everything from VM creation to installing all dependencies and OpenClaw itself.

Key takeaways:

- Automation benefits: Using az vm run-command allows executing configuration scripts immediately after VM creation without manual RDP login
- Dependency management: Chocolatey simplifies the Windows package installation workflow
- Environment isolation: Running AI agents on cloud VMs protects local computers and data
- Scalability: Scripted deployment facilitates replication and team collaboration, easily deploying multiple instances

Cost optimization tips:

- Standard_B2s VMs cost approximately $0.05/hour (~$37/month) on pay-as-you-go pricing
- When not in use, stop (deallocate) the VM so you only pay for storage (see the example below)
- Consider Azure Reserved Instances to save up to 72%

Security hardening recommendations:

- Change the default port: Move RDP from 3389 to a custom port
- Enable JIT access: Use Azure Security Center's just-in-time access feature
- Configure firewall rules: Only allow specific IP addresses to access the VM (see the example below)
- Regular system updates: Enable automatic Windows Updates
- Use Azure Key Vault: Store API keys in Key Vault instead of configuration files
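As an illustration of the cost and firewall points above, the commands below sketch how deallocating the VM and locking the RDP rule down to a single IP could look with Azure CLI. The NSG name (az vm create typically derives it from the VM name, for example win11-openclaw-vmNSG) and the rule name are assumptions for this example; list the rules first and substitute the names you actually see:

# Stop (deallocate) the VM when you are not using it, so you only pay for storage
az vm deallocate -g $RESOURCE_GROUP -n $VM_NAME

# Start it again when needed
az vm start -g $RESOURCE_GROUP -n $VM_NAME

# List the NSG rules created for the VM to find the RDP rule name
az network nsg rule list -g $RESOURCE_GROUP --nsg-name "${VM_NAME}NSG" -o table

# Restrict the RDP rule to your own public IP (replace the rule name and IP with your values)
az network nsg rule update -g $RESOURCE_GROUP --nsg-name "${VM_NAME}NSG" -n rdp \
  --source-address-prefixes "<your-public-ip>/32"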
5. Additional Resources

Official documentation:

- OpenClaw Website: https://openclaw.ai
- OpenClaw GitHub: https://github.com/openclaw/openclaw
- OpenClaw Documentation: https://docs.openclaw.ai
- Azure CLI Documentation: https://docs.microsoft.com/cli/azure/

Azure resources:

- Azure VM Pricing Calculator: https://azure.microsoft.com/pricing/calculator/
- Azure Free Account: https://azure.microsoft.com/free/ (new users receive $200 credit)
- Azure Security Center: https://azure.microsoft.com/services/security-center/
- Azure Key Vault: https://azure.microsoft.com/services/key-vault/

Adding AI Personality to Browser Games
Introduction

Browser games traditionally follow predictable patterns: fixed text messages, static tutorials, scripted NPC responses. Players see the same "Game Over" message whether they nearly won or failed spectacularly. Tutorial text remains identical regardless of player skill level. The game experience, while fun, lacks the dynamic reactivity of human-moderated gameplay.

What if your Space Invaders game could comment on gameplay in real time? Taunt players when they miss easy shots? Celebrate close victories with personalized messages? Adjust difficulty suggestions based on actual performance metrics?

This article demonstrates exactly that: integrating AI-powered dynamic commentary into a browser game using Spaceinvaders-FoundryLocal, vanilla JavaScript, and Microsoft Foundry Local. You'll learn how to integrate local AI into client-side games, design AI personality systems that enhance rather than distract, implement context-aware commentary generation, and architect optional AI features that don't break core gameplay when unavailable. Whether you're building educational games, interactive training simulations, or simply adding personality to entertainment projects, this approach provides a blueprint for AI-enhanced gaming experiences.

Why Local AI Transforms Browser Gaming

Adding AI to games sounds expensive: cloud API costs scale with player counts, introducing per-gameplay pricing that makes free-to-play models challenging. Privacy concerns emerge when gameplay data leaves user devices. Latency affects real-time experiences; waiting 2 seconds for commentary after an action breaks immersion. Network requirements exclude offline play.

Local AI solves all these challenges simultaneously. Foundry Local runs Small Language Models (SLMs) entirely on player devices: no API costs, no data leaving the machine, no network dependency. Inference happens in milliseconds, enabling truly real-time responses. Games work offline after the initial load, perfect for mobile or low-connectivity scenarios.

SLMs excel at personality-driven tasks like game commentary. They don't need perfect factual recall or complex reasoning; they generate entertaining, contextually relevant text based on game state. A 1.5B parameter model produces engaging taunts and celebration messages indistinguishable from hand-written content, while running easily on mid-range laptops.

Integrating AI as an optional enhancement demonstrates good architecture. Core gameplay must function perfectly without AI; commentary enhances the experience, but failure doesn't break the game. This graceful degradation pattern ensures maximum compatibility while offering AI features to capable devices.

Architecture: Progressive Enhancement with AI

The Spaceinvaders-FoundryLocal implementation uses progressive enhancement: the game fully works without AI, but adds dynamic personality when available.

The base game implements classic Space Invaders mechanics entirely in vanilla JavaScript. Player ship movement, bullet physics, enemy patterns, collision detection, scoring, and power-up systems all operate independently of AI. This ensures universal compatibility across browsers, devices, and network conditions.

The AI layer adds dynamic commentary through a backend Node.js proxy. The proxy runs locally, communicates with Foundry Local, and provides game context to the AI for generating personalized messages. The game polls the proxy periodically, sending current game state (score, accuracy, wave number, power-up usage) and receiving commentary responses.
The architecture flow for AI-enhanced gameplay: Player Action (e.g., destroys enemy) ↓ Game Updates State (score += 100, accuracy tracked) ↓ Game Checks AI Status (polling every 5 seconds) ↓ If AI Available: Send Game Context to Backend → { event: 'wave_complete', score: 2500, accuracy: 78%, wave: 3 } ↓ Backend builds prompt with context ↓ Foundry Local generates comment ↓ Return commentary to game → "Wave 3 conquered! Your 78% accuracy shows improving skills." ↓ Display in game UI (animated text bubble) This design demonstrates several key patterns: Zero-dependency core: Game playable immediately, AI adds value incrementally Graceful degradation: If AI unavailable, game shows generic messages Asynchronous enhancement: AI runs in background, never blocks gameplay Context-aware generation: Commentary reflects actual player performance Local-first architecture: Everything runs on player's machine—no servers, no tracking Implementing Context-Aware AI Commentary Effective game commentary requires understanding current gameplay context. The AI needs to know what just happened, how the player is performing, and what makes this moment interesting: // llm.js - AI integration module export class GameAI { constructor() { this.baseURL = 'http://localhost:3001'; // Local proxy server this.available = false; this.checkAvailability(); } async checkAvailability() { try { const response = await fetch(`${this.baseURL}/health`, { method: 'GET', timeout: 2000 }); this.available = response.ok; return this.available; } catch (error) { console.log('AI server not available (optional feature)'); this.available = false; return false; } } async generateComment(gameContext) { if (!this.available) { return this.getFallbackComment(gameContext.event); } try { const response = await fetch(`${this.baseURL}/api/comment`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(gameContext) }); if (!response.ok) { throw new Error('AI request failed'); } const data = await response.json(); return data.comment; } catch (error) { console.error('AI comment generation failed:', error); return this.getFallbackComment(gameContext.event); } } getFallbackComment(event) { // Static messages when AI unavailable const fallbacks = { 'wave_complete': 'Wave cleared!', 'player_hit': 'Shields damaged!', 'game_over': 'Game Over. Try again!', 'high_score': 'New high score!', 'power_up': 'Power-up collected!' }; return fallbacks[event] || 'Good job!'; } } The backend processes game context and generates contextually relevant commentary: // server.js - Node.js backend proxy import express from 'express'; import { FoundryLocalClient } from 'foundry-local-sdk'; const app = express(); const foundry = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT || 'http://127.0.0.1:5272' }); app.use(express.json()); app.use(express.cors()); // Allow browser game to connect app.get('/health', (req, res) => { res.json({ status: 'AI available', model: 'phi-3.5-mini' }); }); app.post('/api/comment', async (req, res) => { const { event, score, accuracy, wave, lives, combo } = req.body; // Build context-rich prompt const prompt = buildCommentPrompt(event, { score, accuracy, wave, lives, combo }); try { const completion = await foundry.chat.completions.create({ model: 'phi-3.5-mini', messages: [ { role: 'system', content: `You are an AI commander providing brief, encouraging commentary for a Space Invaders game. Be energetic, supportive, and sometimes humorous. Keep responses to 1-2 sentences maximum. 
Reference specific game metrics when relevant.` }, { role: 'user', content: prompt } ], temperature: 0.9, // High temperature for creative variety max_tokens: 50 }); const comment = completion.choices[0].message.content.trim(); res.json({ comment, model: 'phi-3.5-mini', timestamp: new Date().toISOString() }); } catch (error) { console.error('AI generation error:', error); res.status(500).json({ error: 'Commentary generation failed' }); } }); function buildCommentPrompt(event, context) { switch(event) { case 'wave_complete': return `The player just completed wave ${context.wave} with score ${context.score}. Their shooting accuracy is ${context.accuracy}%. ${context.lives} lives remaining. Generate an encouraging comment about their progress.`; case 'player_hit': return `The player got hit by an enemy! They now have ${context.lives} lives left. Score: ${context.score}. Provide a brief motivational comment to keep them engaged.`; case 'game_over': if (context.accuracy > 70) { return `Game over at wave ${context.wave}, score ${context.score}. The player had ${context.accuracy}% accuracy - pretty good! Generate an encouraging comment acknowledging their skill.`; } else { return `Game over at wave ${context.wave}, score ${context.score}. Accuracy was ${context.accuracy}%. Provide a supportive comment with a tip for improvement.`; } case 'combo_streak': return `Player achieved a ${context.combo}x combo streak! Score: ${context.score}. Generate an excited celebration comment.`; case 'power_up_used': return `Player activated a ${context.power_up_type} power-up. Generate a brief tactical comment about using it effectively.`; default: return `General gameplay comment. Score: ${context.score}, Wave: ${context.wave}.`; } } const PORT = 3001; app.listen(PORT, () => { console.log(`✓ Game AI server running on http://localhost:${PORT}`); console.log(`✓ Foundry Local endpoint: ${process.env.FOUNDRY_LOCAL_ENDPOINT || 'http://127.0.0.1:5272'}`); }); This backend demonstrates several best practices: Context-sensitive prompting: Different events get different prompt templates with relevant metrics Personality consistency: System message establishes tone and style guidelines Brevity constraints: max_tokens: 50 ensures comments don't overwhelm UI Creative variety: High temperature (0.9) produces diverse commentary on repeated events Performance-aware feedback: Comments adapt based on accuracy, lives remaining, combo streaks Integrating AI into Game Loop Without Performance Impact Games require 60 FPS to feel smooth, any blocking operation creates stutter. AI integration must be completely asynchronous and non-blocking: // game.js - Main game loop class SpaceInvadersGame { constructor() { this.ai = new GameAI(); this.lastAIUpdate = 0; this.aiUpdateInterval = 5000; // Poll AI every 5 seconds this.pendingAIRequest = false; // ... 
other game state } update(deltaTime) { // Core game logic (always runs) this.updatePlayer(deltaTime); this.updateEnemies(deltaTime); this.updateBullets(deltaTime); this.checkCollisions(); this.updatePowerUps(deltaTime); // AI commentary (optional, async) this.updateAI(deltaTime); } updateAI(deltaTime) { this.lastAIUpdate += deltaTime; // Only check AI periodically, never block gameplay if (this.lastAIUpdate >= this.aiUpdateInterval && !this.pendingAIRequest) { this.requestAICommentary(); } } async requestAICommentary() { // Check if there's an interesting event to comment on const event = this.getSignificantEvent(); if (!event) return; this.pendingAIRequest = true; // Fire-and-forget async request this.ai.generateComment({ event: event.type, score: this.score, accuracy: this.calculateAccuracy(), wave: this.currentWave, lives: this.lives, combo: this.comboMultiplier }) .then(comment => { this.displayAIComment(comment); this.lastAIUpdate = 0; }) .catch(error => { console.log('AI comment failed (non-critical):', error); }) .finally(() => { this.pendingAIRequest = false; }); } getSignificantEvent() { // Determine what's worth commenting on if (this.justCompletedWave) { this.justCompletedWave = false; return { type: 'wave_complete' }; } if (this.justGotHit) { this.justGotHit = false; return { type: 'player_hit' }; } if (this.comboMultiplier >= 5) { return { type: 'combo_streak' }; } return null; // Nothing interesting right now } displayAIComment(comment) { // Show comment in animated text bubble const bubble = document.createElement('div'); bubble.className = 'ai-comment-bubble'; bubble.textContent = comment; document.getElementById('game-container').appendChild(bubble); // Animate in setTimeout(() => bubble.classList.add('show'), 50); // Remove after 4 seconds setTimeout(() => { bubble.classList.remove('show'); setTimeout(() => bubble.remove(), 500); }, 4000); } calculateAccuracy() { if (this.shotsFired === 0) return 0; return Math.round((this.shotsHit / this.shotsFired) * 100); } } This integration pattern ensures: Zero gameplay impact: AI runs completely asynchronously—game never waits for AI Periodic updates only: Check AI every 5 seconds, not every frame (60 FPS → minimal CPU overhead) Event-driven commentary: Only request comments for significant moments, not continuous chatter Non-blocking display: Comments appear as animated overlays that don't interrupt gameplay Graceful failure: AI errors logged but never shown to players—game continues normally Designing AI Personality Systems Effective game AI has consistent personality that enhances rather than distracts. The system message establishes tone, response templates ensure variety, and context awareness makes commentary relevant: // Enhanced system message for consistent personality const AI_COMMANDER_PERSONALITY = ` You are AEGIS, an AI defense commander providing tactical commentary for a Space Invaders-style game. Your personality traits: - Enthusiastic but professional military commander tone - Celebrate victories with tactical language ("Excellent flanking maneuver!") - Acknowledge defeats with constructive feedback ("Regroup and maintain formation!") - Reference specific metrics to show you're paying attention - Keep responses to 1-2 sentences maximum - Use occasional humor but stay in character - Be encouraging even when player struggles Examples of your style: - "Wave neutralized! Your 85% accuracy shows precision targeting." - "Shield integrity compromised! Fall back and reassess the battlefield." 
- "Impressive combo multiplier! Sustained fire superiority achieved." - "That power-up spread pattern cleared the sector perfectly." `; // Context-aware response variety const RESPONSE_TEMPLATES = { wave_complete: { high_performance: [ "Your {accuracy}% accuracy led to decisive victory, Commander!", "Wave {wave} eliminated with tactical excellence!", "Strategic brilliance! {accuracy}% hit rate maintained." ], medium_performance: [ "Wave {wave} cleared. Solid tactics, Commander.", "Sector secured. Your {accuracy}% accuracy shows improvement potential.", "Objective achieved. Recommend tightening shot discipline." ], low_performance: [ "Wave {wave} cleared, but {accuracy}% accuracy needs work.", "Victory secured. Focus on accuracy in next engagement.", "Mission accomplished, though your hit rate needs improvement." ] }, player_hit: { lives_critical: [ "Critical damage! Only {lives} lives remain - exercise extreme caution!", "Shields failing! {lives} backup systems active.", "Red alert! Hull integrity at {lives} units." ], lives_okay: [ "Shields damaged. {lives} lives remaining. Stay focused!", "Hit sustained. {lives} backup systems online.", "Damage taken. Maintain defensive posture." ] } }; function selectResponseTemplate(event, context) { const templates = RESPONSE_TEMPLATES[event]; if (!templates) return; // Choose template category based on context let category; if (event === 'wave_complete') { if (context.accuracy >= 75) category = templates.high_performance; else if (context.accuracy >= 50) category = templates.medium_performance; else category = templates.low_performance; } else if (event === 'player_hit') { category = context.lives <= 2 ? templates.lives_critical : templates.lives_okay; } // Randomly select from category for variety const template = category[Math.floor(Math.random() * category.length)]; // Fill in context variables return template .replace('{accuracy}', context.accuracy) .replace('{wave}', context.wave) .replace('{lives}', context.lives); } This personality system creates: Consistent character: AEGIS always sounds like a military commander, never breaks character Context-appropriate responses: Different situations trigger different tones (celebration vs concern) Natural variety: Template randomization prevents repetitive commentary Metric awareness: Specific references to accuracy, lives, waves show AI is "watching" Encouraging feedback: Even in failure scenarios, provides constructive guidance Key Takeaways and Game AI Design Patterns Integrating AI into browser games demonstrates that advanced features don't require cloud services or complex infrastructure. Local AI enables personality-driven enhancements that run entirely on player devices, cost nothing at scale, and work offline. 
Essential principles for game AI integration:

- Progressive enhancement architecture: Core gameplay must work perfectly without AI; commentary enhances but isn't required
- Asynchronous-only integration: Never block the game loop for AI; 60 FPS gameplay is non-negotiable
- Context-aware generation: Commentary reflecting actual game state feels intelligent; generic messages feel robotic
- Personality consistency: A well-defined character voice creates memorable experiences
- Graceful failure handling: AI errors should be invisible to players; fall back to static messages
- Performance-conscious polling: Check AI every few seconds, not every frame
- Event-driven commentary: Only generate responses for significant moments

This pattern extends beyond games: any interactive application benefits from context-aware AI personality, whether that's educational software providing personalized encouragement, fitness apps offering adaptive coaching, or productivity tools giving motivational feedback.

The complete implementation with game engine, AI integration, backend proxy, and deployment instructions is available at github.com/leestott/Spaceinvaders-FoundryLocal. Clone the repository to experience AI-enhanced gaming: just open index.html and start playing immediately, then optionally enable AI features for dynamic commentary.

Resources and Further Reading

- Space Invaders with AI Repository - Complete game with AI integration
- Quick Start Guide - Play immediately or enable AI features
- Microsoft Foundry Local Documentation - SDK and model reference
- MDN Game Development - Browser game development patterns
- HTML5 Game Devs Forum - Community discussions and techniques

How To: Send requests to Azure Storage from Azure API Management
In this How To, I will show a simple mechanism for writing a payload to Azure Blob Storage from Azure API Management. This is useful, for example, for implementing a Claim-Check pattern for large messages or for supporting message logging when Application Insights is not suitable.
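As a rough illustration of the idea, the policy fragment below sketches how an inbound policy could write the request body to Blob Storage with a send-request call. The storage account name, the "claims" container, and the use of the API Management instance's managed identity (which would need a data-plane role such as Storage Blob Data Contributor on the account) are assumptions for this example, not part of the original post:

<inbound>
    <base />
    <!-- Illustrative sketch: persist the incoming payload as a blob before forwarding the request -->
    <send-request mode="new" response-variable-name="blobResponse" timeout="20" ignore-error="false">
        <!-- STORAGEACCOUNT and the "claims" container are placeholders -->
        <set-url>@("https://STORAGEACCOUNT.blob.core.windows.net/claims/" + context.RequestId + ".json")</set-url>
        <set-method>PUT</set-method>
        <set-header name="x-ms-blob-type" exists-action="override">
            <value>BlockBlob</value>
        </set-header>
        <set-header name="x-ms-version" exists-action="override">
            <value>2021-08-06</value>
        </set-header>
        <set-body>@(context.Request.Body.As<string>(preserveContent: true))</set-body>
        <!-- Assumes the APIM managed identity is authorized against the storage account -->
        <authentication-managed-identity resource="https://storage.azure.com/" />
    </send-request>
</inbound>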
How to Build Safe Natural Language-Driven APIs

TL;DR

Building production natural language APIs requires separating semantic parsing from execution. Use LLMs to translate user text into canonical structured requests (via schemas), then execute those requests deterministically. Key patterns: schema completion for clarification, confidence gates to prevent silent failures, code-based ontologies for normalization, and an orchestration layer. This keeps language as input, not as your API contract.

Introduction

APIs that accept natural language as input are quickly becoming the norm in the age of agentic AI apps and LLMs. From search and recommendations to workflows and automation, users increasingly expect to "just ask" and get results. But treating natural language as an API contract introduces serious risks in production systems:

- Nondeterministic behavior
- Prompt-driven business logic
- Difficult debugging and replay
- Silent failures that are hard to detect

In this post, I'll describe a production-grade architecture for building safe, natural language-driven APIs: one that embraces LLMs for intent discovery and entity extraction while preserving the determinism, observability, and reliability that backend systems require. This approach is based on building real systems using Azure OpenAI and LangGraph, and on lessons learned the hard way.

The Core Problem with Natural Language APIs

Natural language is an excellent interface for humans. It is a poor interface for systems. When APIs accept raw text directly and execute logic based on it, several problems emerge:

- The API contract becomes implicit and unversioned
- Small prompt changes cause behavioral changes
- Business logic quietly migrates into prompts

In short: language becomes the contract, and that's fragile. The solution is not to avoid natural language, but to contain it.

A Key Principle: Natural Language Is Input, Not a Contract

So how do we contain it? The answer lies in treating natural language fundamentally differently than we treat traditional API inputs. The most important design decision we made was this: natural language should be translated into structure, not executed directly.

That single principle drives the entire architecture. Instead of building "chatty APIs," we split responsibilities clearly:

- Natural language is used for intent discovery and entity extraction
- Structured data is used for execution

Two Explicit API Layers

This principle translates into a concrete architecture with two distinct API layers, each with a single, clear responsibility.

1. Semantic Parse API (Natural Language → Structure)

This API:

- Accepts user text
- Extracts intent and entities using LLMs
- Completes a predefined schema
- Asks clarifying questions when required
- Returns a canonical, structured request
- Does not execute business logic

Think of this as a compiler, not an engine.

2. Structured Execution API (Structure → Action)

This API:

- Accepts only structured input
- Calls downstream systems to process the request and get results
- Is deterministic and versioned
- Contains no natural language handling
- Is fully testable and replayable

This is where execution happens.

Why This Separation Matters

Separating these layers gives you:

- A stable, versionable API contract
- Freedom to improve NLP without breaking clients
- Clear ownership boundaries
- Deterministic execution paths

Most importantly, it prevents LLM behavior from leaking into core business logic.

Canonical Schemas Are the Backbone

Now that we've established the two-layer architecture, let's dive into what makes it work: canonical schemas.
Each supported intent is defined by a canonical schema that lives in code. Example (simplified): This schema is used when a user is looking for similar product recommendations. The entities capture which product to use as reference and how to bias the recommendations toward price or quality. { "intent": "recommend_similar", "entities": { "reference_product_id": "string", "price_bias": "number (-1 to 1)", "quality_bias": "number (-1 to 1)" } } Schemas define: Required vs optional fields Allowed ranges and types Validation rules They are the contract, not the prompt. When a user says "show me products like the blue backpack but cheaper", the LLM extracts: Intent: recommend_similar reference_product_id: "blue_backpack_123" price_bias: -0.8 (strongly prefer cheaper) quality_bias: 0.0 (neutral) The schema ensures that even if the user phrased it as "find alternatives to item 123 with better pricing" or "cheaper versions of that blue bag", the output is always the same structure. The natural language variation is absorbed at the semantic layer. The execution layer receives a consistent, validated request every time. This decoupling is what makes the system maintainable. Schema Completion, Not Free-Form Chat But what happens when the user's input doesn't contain all the information needed to complete the schema? This is where structured clarification comes in. A common misconception is that clarification means "chatting until it feels right." In production systems, clarification is schema completion. If required fields are missing or ambiguous, the semantic API responds with: What information is missing A targeted clarification question The current schema state Example response: { "status": "needs_clarification", "missing_fields": ["reference_product_id"], "question": "Which product should I compare against?", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } The state object is the memory. The API itself remains stateless. A Complete Conversation Flow To illustrate how schema completion works in practice, here's a full conversation flow where the user's initial request is missing required information: Initial Request: User: "Show me cheaper alternatives with good quality" API Response (needs clarification): { "status": "needs_clarification", "missing_fields": ["reference_product_id"], "question": "Which product should I compare against?", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } Follow-up Request: User: "The blue backpack" Client sends: { "user_input": "The blue backpack", "state": { "intent": "recommend_similar", "entities": { "reference_product_id": null, "price_bias": -0.3, "quality_bias": 0.4 } } } API Response (complete): { "status": "complete", "canonical_request": { "intent": "recommend_similar", "entities": { "reference_product_id": "blue_backpack_123", "price_bias": -0.3, "quality_bias": 0.4 } } } The client passes the state back with each clarification. The API remains stateless, while the client manages the conversation context. Once complete, the canonical_request can be sent directly to the execution API. Why LangGraph Fits This Problem Perfectly With schemas and clarification flows defined, we need a way to orchestrate the semantic parsing workflow reliably. This is where LangGraph becomes valuable. 
LangGraph allows semantic parsing to be modeled as a structured, deterministic workflow with explicit decision points: Classify intent: Determine what the user wants to do from a predefined set of supported actions Extract candidate entities: Pull out relevant parameters from the natural language input using the LLM Merge into schema state: Map the extracted values into the canonical schema structure Validate required fields: Check if all mandatory fields are present and values are within acceptable ranges Either complete or request clarification: Return the canonical request if complete, or ask a targeted question if information is missing Each node has a single responsibility. Validation and routing are done in code, not by the LLM. LangGraph provides: Explicit state transitions Deterministic routing Observable execution Safe retries Used this way, it becomes a powerful orchestration tool, not a conversational agent. Confidence Gates Prevent Silent Failures Structured workflows handle the process, but there's another critical safety mechanism we need: knowing when the LLM isn't confident about its extraction. Even when outputs are structurally valid, they may not be reliable. We require the semantic layer to emit a confidence score. If confidence falls below a threshold, execution is blocked and clarification is requested. This simple rule eliminates an entire class of silent misinterpretations that are otherwise very hard to detect. Example: When a user says "Show me items similar to the bag", the LLM might extract: { "intent": "recommend_similar", "confidence": 0.55, "entities": { "reference_product_id": "generic_bag_001", "confidence_scores": { "reference_product_id": 0.4 } } } The overall confidence is low (0.55), and the entity confidence for reference_product_id is very low (0.4) because "the bag" is ambiguous. There might be hundreds of bags in the catalog. Instead of proceeding with a potentially wrong guess, the API responds: { "status": "needs_clarification", "reason": "low_confidence", "question": "I found multiple bags. Did you mean the blue backpack, the leather tote, or the travel duffel?", "confidence": 0.55 } This prevents the system from silently executing the wrong recommendation and provides a better user experience. Lightweight Ontologies (Keep Them in Code) Beyond confidence scoring, we need a way to normalize the variety of terms users might use into consistent canonical values. We also introduced lightweight, code-level ontologies: Allowed intents Required entities per intent Synonym-to-canonical mappings Cross-field validation rules These live in code and configuration, not in prompts. LLMs propose values. Code enforces meaning. Example: Consider these user inputs that all mean the same thing: "Show me cheaper options" "Find budget-friendly alternatives" "I want something more affordable" "Give me lower-priced items" The LLM might extract different values: "cheaper", "budget-friendly", "affordable", "lower-priced". The ontology maps all of these to a canonical value: PRICE_BIAS_SYNONYMS = { "cheaper": -0.7, "budget-friendly": -0.7, "affordable": -0.7, "lower-priced": -0.7, "expensive": 0.7, "premium": 0.7, "high-end": 0.7 } When the LLM extracts "budget-friendly", the code normalizes it to -0.7 for the price_bias field. Similarly, cross-field validation catches logical inconsistencies: if entities["price_bias"] < -0.5 and entities["quality_bias"] > 0.5: return clarification("You want cheaper items with higher quality. This might be difficult. 
Should I prioritize price or quality?")

The LLM proposes. The ontology normalizes. The validation enforces business rules.

What About Latency?

A common concern with multi-step semantic parsing is performance. In practice, we observed:

- Intent classification: ~40 ms
- Entity extraction: ~200 ms
- Validation and routing: ~1 ms

Total overhead: ~250–300 ms. For chat-driven user experiences, this is well within acceptable bounds and far cheaper than incorrect or inconsistent execution.

Key Takeaways

Let's bring it all together. If you're building APIs that accept natural language in production:

- Do not make language your API contract
- Translate language into canonical structure
- Own schema completion server-side
- Use LLMs for discovery and extraction, not execution
- Treat safety and determinism as first-class requirements

Natural language is an input format. Structure is the contract.

Closing Thoughts

LLMs make it easy to build impressive demos. Building safe, reliable systems with them requires discipline. By separating semantic interpretation from execution, and by using tools like Azure OpenAI and LangGraph thoughtfully, you can build natural language-driven APIs that scale, evolve, and behave predictably in production. Hopefully, this architecture saves you a few painful iterations.

Benchmarking Local AI Models
Introduction Selecting the right AI model for your application requires more than reading benchmark leaderboards. Published benchmarks measure academic capabilities, question answering, reasoning, coding, but your application has specific requirements: latency budgets, hardware constraints, quality thresholds. How do you know if Phi-4 provides acceptable quality for your document summarization use case? Will Qwen2.5-0.5B meet your 100ms response time requirement? Does your edge device have sufficient memory for Phi-3.5 Mini? The answer lies in empirical testing: running actual models on your hardware with your workload patterns. This article demonstrates building a comprehensive model benchmarking platform using FLPerformance, Node.js, React, and Microsoft Foundry Local. You'll learn how to implement scientific performance measurement, design meaningful benchmark suites, visualize multi-dimensional comparisons, and make data-driven model selection decisions. Whether you're evaluating models for production deployment, optimizing inference costs, or validating hardware specifications, this platform provides the tools for rigorous performance analysis. Why Model Benchmarking Requires Purpose-Built Tools You cannot assess model performance by running a few manual tests and noting the results. Scientific benchmarking demands controlled conditions, statistically significant sample sizes, multi-dimensional metrics, and reproducible methodology. Understand why purpose-built tooling is essential. Performance is multi-dimensional. A model might excel at throughput (tokens per second) but suffer at latency (time to first token). Another might generate high-quality outputs slowly. Your application might prioritize consistency over average performance, a model with variable response times (high p95/p99 latency) creates poor user experiences even if averages look good. Measuring all dimensions simultaneously enables informed tradeoffs. Hardware matters enormously. Benchmark results from NVIDIA A100 GPUs don't predict performance on consumer laptops. NPU acceleration changes the picture again. Memory constraints affect which models can even load. Test on your actual deployment hardware or comparable specifications to get actionable results. Concurrency reveals bottlenecks. A model handling one request excellently might struggle with ten concurrent requests. Real applications experience variable load, measuring only single-threaded performance misses critical scalability constraints. Controlled concurrency testing reveals these limits. Statistical rigor prevents false conclusions. Running a prompt once and noting the response time tells you nothing about performance distribution. Was this result typical? An outlier? You need dozens or hundreds of trials to establish p50/p95/p99 percentiles, understand variance, and detect stability issues. Comparison requires controlled experiments. Different prompts, different times of day, different system loads, all introduce confounding variables. Scientific comparison runs identical workloads across models sequentially, controlling for external factors. Architecture: Three-Layer Performance Testing Platform FLPerformance implements a clean separation between orchestration, measurement, and presentation: The frontend React application provides model management, benchmark configuration, test execution, and results visualization. 
Users add models from the Foundry Local catalog, configure benchmark parameters (iterations, concurrency, timeout values), launch test runs, and view real-time progress. The results dashboard displays comparison tables, latency distribution charts, throughput graphs, and "best model for..." recommendations. The backend Node.js/Express server orchestrates tests and captures metrics. It manages the single Foundry Local service instance, loads/unloads models as needed, executes benchmark suites with controlled concurrency, measures comprehensive metrics (TTFT, TPOT, total latency, throughput, error rates), and persists results to JSON storage. WebSocket connections provide real-time progress updates during long benchmark runs. Foundry Local SDK integration uses the official foundry-local-sdk npm package. The SDK manages service lifecycle, starting, stopping, health checkin, and handles model operations, downloading, loading into memory, unloading. It provides OpenAI-compatible inference APIs for consistent request formatting across models. The architecture supports simultaneous testing of multiple models by loading them one at a time, running identical benchmarks, and aggregating results for comparison: User Initiates Benchmark Run ↓ Backend receives {models: [...], suite: "default", iterations: 10} ↓ For each model: 1. Load model into Foundry Local 2. Execute benchmark suite - For each prompt in suite: * Run N iterations * Measure TTFT, TPOT, total time * Track errors and timeouts * Calculate tokens/second 3. Aggregate statistics (mean, p50, p95, p99) 4. Unload model ↓ Store results with metadata ↓ Return comparison data to frontend ↓ Visualize performance metrics Implementing Scientific Measurement Infrastructure Accurate performance measurement requires instrumentation that captures multiple dimensions without introducing measurement overhead: // src/server/benchmark.js import { performance } from 'perf_hooks'; export class BenchmarkExecutor { constructor(foundryClient, options = {}) { this.client = foundryClient; this.options = { iterations: options.iterations || 10, concurrency: options.concurrency || 1, timeout_ms: options.timeout_ms || 30000, warmup_iterations: options.warmup_iterations || 2 }; } async runBenchmarkSuite(modelId, prompts) { const results = []; // Warmup phase (exclude from results) console.log(`Running ${this.options.warmup_iterations} warmup iterations...`); for (let i = 0; i < this.options.warmup_iterations; i++) { await this.executePrompt(modelId, prompts[0].text); } // Actual benchmark runs for (const prompt of prompts) { console.log(`Benchmarking prompt: ${prompt.id}`); const measurements = []; for (let i = 0; i < this.options.iterations; i++) { const measurement = await this.executeMeasuredPrompt( modelId, prompt.text ); measurements.push(measurement); // Small delay between iterations to stabilize await sleep(100); } results.push({ prompt_id: prompt.id, prompt_text: prompt.text, measurements, statistics: this.calculateStatistics(measurements) }); } return { model_id: modelId, timestamp: new Date().toISOString(), config: this.options, results }; } async executeMeasuredPrompt(modelId, promptText) { const measurement = { success: false, error: null, ttft_ms: null, // Time to first token tpot_ms: null, // Time per output token total_ms: null, tokens_generated: 0, tokens_per_second: 0 }; try { const startTime = performance.now(); let firstTokenTime = null; let tokenCount = 0; // Streaming completion to measure TTFT const stream = await 
this.client.chat.completions.create({ model: modelId, messages: [{ role: 'user', content: promptText }], max_tokens: 200, temperature: 0.7, stream: true }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { if (firstTokenTime === null) { firstTokenTime = performance.now(); measurement.ttft_ms = firstTokenTime - startTime; } tokenCount++; } } const endTime = performance.now(); measurement.total_ms = endTime - startTime; measurement.tokens_generated = tokenCount; if (tokenCount > 1 && firstTokenTime) { // TPOT = time after first token / (tokens - 1) const timeAfterFirstToken = endTime - firstTokenTime; measurement.tpot_ms = timeAfterFirstToken / (tokenCount - 1); measurement.tokens_per_second = 1000 / measurement.tpot_ms; } measurement.success = true; } catch (error) { measurement.error = error.message; measurement.success = false; } return measurement; } calculateStatistics(measurements) { const successful = measurements.filter(m => m.success); const total = measurements.length; if (successful.length === 0) { return { success_rate: 0, error_rate: 1.0, sample_size: total }; } const ttfts = successful.map(m => m.ttft_ms).sort((a, b) => a - b); const tpots = successful.map(m => m.tpot_ms).filter(v => v !== null).sort((a, b) => a - b); const totals = successful.map(m => m.total_ms).sort((a, b) => a - b); const throughputs = successful.map(m => m.tokens_per_second).filter(v => v > 0); return { success_rate: successful.length / total, error_rate: (total - successful.length) / total, sample_size: total, ttft: { mean: mean(ttfts), median: percentile(ttfts, 50), p95: percentile(ttfts, 95), p99: percentile(ttfts, 99), min: Math.min(...ttfts), max: Math.max(...ttfts) }, tpot: tpots.length > 0 ? { mean: mean(tpots), median: percentile(tpots, 50), p95: percentile(tpots, 95) } : null, total_latency: { mean: mean(totals), median: percentile(totals, 50), p95: percentile(totals, 95), p99: percentile(totals, 99) }, throughput: { mean_tps: mean(throughputs), median_tps: percentile(throughputs, 50) } }; } } function mean(arr) { return arr.reduce((sum, val) => sum + val, 0) / arr.length; } function percentile(sortedArr, p) { const index = Math.ceil((sortedArr.length * p) / 100) - 1; return sortedArr[Math.max(0, index)]; } function sleep(ms) { return new Promise(resolve => setTimeout(resolve, ms)); } This measurement infrastructure captures: Time to First Token (TTFT): Critical for perceived responsiveness—users notice delays before output begins Time Per Output Token (TPOT): Determines generation speed after first token—affects throughput Total latency: End-to-end time—matters for batch processing and high-volume scenarios Tokens per second: Overall throughput metric—useful for capacity planning Statistical distributions: Mean alone masks variability—p95/p99 reveal tail latencies that impact user experience Success/error rates: Stability metrics—some models timeout or crash under load Designing Meaningful Benchmark Suites Benchmark quality depends on prompt selection. Generic prompts don't reflect real application behavior. 
Design suites that mirror actual use cases: // benchmarks/suites/default.json { "name": "default", "description": "General-purpose benchmark covering diverse scenarios", "prompts": [ { "id": "short-factual", "text": "What is the capital of France?", "category": "factual", "expected_tokens": 5 }, { "id": "medium-explanation", "text": "Explain how photosynthesis works in 3-4 sentences.", "category": "explanation", "expected_tokens": 80 }, { "id": "long-reasoning", "text": "Analyze the economic factors that led to the 2008 financial crisis. Discuss at least 5 major causes with supporting details.", "category": "reasoning", "expected_tokens": 250 }, { "id": "code-generation", "text": "Write a Python function that finds the longest palindrome in a string. Include docstring and example usage.", "category": "coding", "expected_tokens": 150 }, { "id": "creative-writing", "text": "Write a short story (3 paragraphs) about a robot learning to paint.", "category": "creative", "expected_tokens": 200 } ] } This suite covers multiple dimensions: Length variation: Short (5 tokens), medium (80), long (250)—tests models across output ranges Task diversity: Factual recall, explanation, reasoning, code, creative—reveals capability breadth Token predictability: Expected token counts enable throughput calculations For production applications, create custom suites matching your actual workload: { "name": "customer-support", "description": "Simulates actual customer support queries", "prompts": [ { "id": "product-question", "text": "How do I reset my password for the customer portal?" }, { "id": "troubleshooting", "text": "I'm getting error code 503 when trying to upload files. What should I do?" }, { "id": "policy-inquiry", "text": "What is your refund policy for annual subscriptions?" } ] } Visualizing Multi-Dimensional Performance Comparisons Raw numbers don't reveal insights—visualization makes patterns obvious. The frontend implements several comparison views: Comparison Table shows side-by-side metrics: // frontend/src/components/ResultsTable.jsx export function ResultsTable({ results }) { return ( {results.map(result => ( ))} Model TTFT (ms) TPOT (ms) Throughput (tok/s) P95 Latency Error Rate {result.model_id} {result.stats.ttft.median.toFixed(0)} (p95: {result.stats.ttft.p95.toFixed(0)}) {result.stats.tpot?.median.toFixed(1) || 'N/A'} {result.stats.throughput.median_tps.toFixed(1)} {result.stats.total_latency.p95.toFixed(0)} ms 0.05 ? 'error' : 'success'}> {(result.stats.error_rate * 100).toFixed(1)}% ); } Latency Distribution Chart reveals performance consistency: // Using Chart.js for visualization export function LatencyChart({ results }) { const data = { labels: results.map(r => r.model_id), datasets: [ { label: 'Median (p50)', data: results.map(r => r.stats.total_latency.median), backgroundColor: 'rgba(75, 192, 192, 0.5)' }, { label: 'p95', data: results.map(r => r.stats.total_latency.p95), backgroundColor: 'rgba(255, 206, 86, 0.5)' }, { label: 'p99', data: results.map(r => r.stats.total_latency.p99), backgroundColor: 'rgba(255, 99, 132, 0.5)' } ] }; return ( ); } Recommendations Engine synthesizes multi-dimensional comparison: export function generateRecommendations(results) { const recommendations = []; // Find fastest TTFT (best perceived responsiveness) const fastestTTFT = results.reduce((best, r) => r.stats.ttft.median < best.stats.ttft.median ? 
    r : best
  );

  recommendations.push({
    category: 'Fastest Response',
    model: fastestTTFT.model_id,
    reason: `Lowest median TTFT: ${fastestTTFT.stats.ttft.median.toFixed(0)}ms`
  });

  // Find highest throughput
  const highestThroughput = results.reduce((best, r) =>
    r.stats.throughput.median_tps > best.stats.throughput.median_tps ? r : best
  );

  recommendations.push({
    category: 'Best Throughput',
    model: highestThroughput.model_id,
    reason: `Highest tok/s: ${highestThroughput.stats.throughput.median_tps.toFixed(1)}`
  });

  // Find most consistent (lowest p95-p50 spread)
  const mostConsistent = results.reduce((best, r) => {
    const spread = r.stats.total_latency.p95 - r.stats.total_latency.median;
    const bestSpread = best.stats.total_latency.p95 - best.stats.total_latency.median;
    return spread < bestSpread ? r : best;
  });

  recommendations.push({
    category: 'Most Consistent',
    model: mostConsistent.model_id,
    reason: 'Lowest latency variance (p95-p50 spread)'
  });

  return recommendations;
}

Key Takeaways and Benchmarking Best Practices

Effective model benchmarking requires scientific methodology, comprehensive metrics, and application-specific testing. FLPerformance demonstrates that rigorous performance measurement is accessible to any development team.

Critical principles for model evaluation:

- Test on target hardware: Results from cloud GPUs don't predict laptop performance
- Measure multiple dimensions: TTFT, TPOT, throughput, and consistency all matter
- Use statistical rigor: Single runs mislead; capture distributions with adequate sample sizes
- Design realistic workloads: Generic benchmarks don't predict your application's behavior
- Include warmup iterations: Model loading and JIT compilation affect early measurements
- Control concurrency: Real applications handle multiple requests; test at realistic loads
- Document methodology: Reproducible results require documented procedures and configurations

The complete benchmarking platform with model management, measurement infrastructure, visualization dashboards, and comprehensive documentation is available at github.com/leestott/FLPerformance. Clone the repository and run the startup script to begin evaluating models on your hardware.

Resources and Further Reading

- FLPerformance Repository - Complete benchmarking platform
- Quick Start Guide - Setup and first benchmark run
- Microsoft Foundry Local Documentation - SDK reference and model catalog
- Architecture Guide - System design and SDK integration
- Benchmarking Best Practices - Methodology and troubleshooting

Leverage AI for faster, more productive coding with GitHub Copilot
Leverage AI for faster, more productive coding with GitHub Copilot
GitHub Copilot serves as an invaluable learning tool for developers, especially those who are still learning a particular programming language or framework. By providing context-aware suggestions, it helps learners understand the syntax, structure, and logic behind different code snippets. While fundamental knowledge of programming concepts and syntax should remain a prerequisite, a developer exploring a new framework can describe the functionality they want to implement, and GitHub Copilot will generate code that demonstrates how to achieve it, as illustrated below. This accelerates the learning process and empowers developers to gain proficiency in new technologies more efficiently.
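To make that workflow concrete, here is an illustrative sketch: the developer writes only a comment describing the behaviour they want, and Copilot proposes an implementation they can study and adapt. The function below is the kind of completion Copilot might offer for such a prompt; it is written here for illustration and is not captured Copilot output.

// Prompt written by the developer:
// "Fetch JSON from a URL, retrying up to 3 times with exponential backoff."

// The kind of completion Copilot might suggest (illustrative):
async function fetchJsonWithRetry(url, retries = 3, baseDelayMs = 500) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const response = await fetch(url); // Node 18+ provides a global fetch
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return await response.json();
    } catch (err) {
      if (attempt === retries) throw err;
      // Wait 500ms, 1s, 2s, ... before the next attempt
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}

Reading a suggestion like this teaches the surrounding idioms (async/await, status checking, exponential backoff) as a side effect of getting working code.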
Exploring the Future of AI Agents with Microsoft Foundry
Why Agentic AI Matters
AI agents are no longer a distant vision—they're here and transforming how businesses operate. According to industry analysts:
Over 1 billion AI agents are expected to be in use by 2028.
80% of organisations plan to integrate agents within the next 2–3 years.
By 2026, 40% of enterprise apps will include task-specific AI agents.
Why this surge? Agents address critical challenges such as inefficiencies in manual processes, human error, lack of visibility, and scalability issues. They enable autonomous decision-making, with projections suggesting that by 2028, half of day-to-day work decisions will be made autonomously.

From Chatbots to Intelligent Agents
As Mary Joe highlighted, early chatbots relied on rigid rules and regular expressions, often leading to frustrating user experiences. The introduction of large language models (LLMs) changed the game, making interactions more natural. But true autonomy, where systems act on our behalf, required more than conversational AI. Agentic AI combines:
Reasoning and planning capabilities.
Tools and APIs for real-world actions.
Memory for learning and improving over time.
This evolution moves us beyond simple input-output interactions to intelligent systems that can execute workflows, validate data, and deliver outcomes.

Microsoft Foundry: Your Platform for Building Agents
Microsoft Foundry offers a Platform-as-a-Service (PaaS) approach for creating AI agents, striking a balance between control and ease of use. Key components include:
Model Catalogue: Access models from OpenAI, Anthropic, Mistral, and more.
Foundry Agent Service: Build and customise agents with integrated tools.
Foundry IQ: Knowledge grounding for accurate responses.
Control Plane: Ensures safety, trust, and observability in production.
Whether you need full control (Infrastructure-as-a-Service) or simplicity (Software-as-a-Service via Copilot Studio), Foundry provides flexibility for diverse scenarios.

What Makes an AI Solution Agentic?
Unlike traditional AI apps that perform narrow tasks (e.g., extracting text from receipts), agentic solutions:
Analyse inputs using LLMs and system instructions.
Integrate tools for actions like file search, code execution, or API calls.
Retain memory for contextual learning.
Operate autonomously across workflows.
The sketch after this list shows, in miniature, how those pieces fit together.
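To make those traits concrete, here is a minimal, SDK-agnostic sketch of the loop an agentic solution runs: the model reasons over the conversation, optionally requests a tool, the host executes it and feeds the observation back, and the cycle repeats until the model returns a final answer. Everything here (callModel, the tool names, the decision shape) is a placeholder for illustration, not a Microsoft Foundry API.

// Minimal agent-loop sketch. callModel() stands in for whatever LLM endpoint you use;
// it is a placeholder, not a Microsoft Foundry API.
const tools = {
  // A "tool" is just a named function the model is allowed to request.
  searchFiles: async (query) => `3 documents matched "${query}"`,
  runCode: async (code) => `executed: ${code.slice(0, 40)}...`
};

const memory = []; // Conversation history the agent carries between turns

async function runAgent(userInput, callModel) {
  memory.push({ role: 'user', content: userInput });

  // Loop: reason -> (optionally) act with a tool -> observe -> repeat, then answer
  for (let step = 0; step < 5; step++) {
    const decision = await callModel(memory, Object.keys(tools));
    if (decision.tool) {
      const observation = await tools[decision.tool](decision.input);
      memory.push({ role: 'tool', content: observation });
      continue; // Feed the observation back and let the model reason again
    }
    memory.push({ role: 'assistant', content: decision.answer });
    return decision.answer; // No tool requested: the agent is done
  }
  throw new Error('Agent exceeded its step budget');
}

Foundry Agent Service provides a managed version of this pattern, with integrated tools and memory, so you configure the pieces rather than hand-rolling the loop.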
Real-World Use Cases
Agentic AI unlocks new possibilities across industries:
Expense Management: Automate claims and approvals.
Employee Onboarding: Personalised learning paths and skills navigation.
Customer Support: Intelligent assistants for FAQs and troubleshooting.
Data Analytics: Interactive insights and reporting with Fabric agents.
Multi-agent systems can coordinate complex tasks, with specialised agents handling subtasks under a central orchestrator.

Getting Started with Microsoft Foundry
Creating your first agent is simple:
1. Sign in at https://ai.azure.com and create a Foundry project.
2. Select a model (e.g., GPT-4.1 mini) and configure deployment options.
3. Customise instructions to define your agent's persona and tasks.
4. Add tools like file search or code interpreter for extended functionality.
5. Test and iterate in the agent playground, then export code to Visual Studio Code for deployment.
For detailed guidance, explore the training on Microsoft Learn at https://learn.microsoft.com/training, follow the skilling plan for this series (Plans | Microsoft Learn), and get started with AI Agents at https://aka.ms/ai-agents-fundamentals.

Join the Community
Stay connected and keep learning:
Discord: Engage with developers building agents. https://aka.ms/foundry/discord
GitHub Discussions: Share ideas and troubleshoot. https://aka.ms/foundrydevs
Office Hours: Get direct support from product teams.

Final Thoughts
Agentic AI is reshaping the way we work, enabling systems to act, learn, and collaborate. With Microsoft Foundry, developers have the tools to build secure, scalable, and intelligent agents today, not tomorrow.
Join the sessions at https://aka.ms/AzureSkilling-Ignite/25
AI Dev Days 2025: Your Gateway to the Future of AI Development
What's in Store?
Day 1 – 10 December (Video Link): Building AI Applications with Azure, GitHub, and Foundry
Explore cutting-edge topics like:
Agentic DevOps
Azure SRE Agent
Microsoft Foundry
MCP
Models for AI innovation

Day 2 – 11 December (Agenda: Video Link): Using AI to Boost Developer Productivity
Get hands-on with:
Agent HQ
VS Code & Visual Studio 2026
GitHub Copilot Coding Agent
App Modernisation Strategies

Why Join?
Hands-on Labs: Apply the latest product features immediately.
Highlights from Microsoft Ignite & GitHub Universe 2025: Stay ahead of the curve.
Global Reach: Local-language workshops for LATAM and EMEA coming soon.

You'll recognise plenty of familiar faces in the lineup – don't miss the chance to connect and learn from the best!
👉 Register now and share widely across your networks – there's truly something for everyone! https://aka.ms/ai-dev-days
Azure Skilling at Microsoft Ignite 2025
The energy at Microsoft Ignite was unmistakable. Developers, architects, and technical decision-makers converged in San Francisco to explore the latest innovations in cloud technology, AI applications, and data platforms. Beyond the keynotes and product announcements was something even more valuable: an integrated skilling ecosystem designed to transform how you build with Azure. This year, Azure Skilling at Microsoft Ignite 2025 brought together distinct learning experiences, more than 150 hands-on labs, and multiple pathways to industry-recognized credentials—all designed to help you master the skills that matter most in today's AI-driven cloud landscape.

Just Launched at Ignite
Microsoft Ignite 2025 offered an exceptional array of learning opportunities, each designed to meet developers anywhere on their skilling journey. Whether you joined us in person or on demand in the virtual experience, multiple touchpoints are available to deepen your Azure expertise. Ignite 2025 is in the books, but you can still engage with the latest Microsoft skilling opportunities, including:

The Azure Skills Challenge provides a gamified learning experience that lets you compete while completing task-based achievements across Azure's most critical technologies. These challenges aren't just about badges and bragging rights—they're carefully designed to help you advance technical skills and prepare for Microsoft role-based certifications. The competitive element adds urgency and motivation, turning learning into an engaging race against the clock and your peers.

For those seeking structured guidance, Plans on Learn offer curated sets of content designed to help you achieve specific learning outcomes. These carefully assembled learning journeys include built-in milestones, progress tracking, and optional email reminders to keep you on track. Each plan represents 12-15 hours of focused learning, taking you from concept to capability in areas like AI application development, data platform modernization, or infrastructure optimization.

The Microsoft Reactor Azure Skilling Series, running December 3-11, brings skilling to life through engaging video content, mixing regular programming with special Ignite-specific episodes. The series delivers technical readiness and programming guidance in a livestream format that's more digestible than traditional documentation. Whether you're catching episodes live with interactive Q&A or watching on demand later, you'll get world-class instruction that makes complex topics approachable.

Beyond Ignite: Your Continuous Learning Journey
Here's the critical insight that separates Ignite attendees who transform their careers from those who simply collect swag: the real learning begins after the event ends. Microsoft Ignite is your launchpad, not your destination. Every module you start, every lab you complete, and every challenge you tackle connects to a comprehensive learning ecosystem on Microsoft Learn that's available 24/7, 365 days a year. Think of Ignite as your intensive immersion experience—the moment when you gain context, build momentum, and identify the skills that will have the biggest impact on your work. What you do in the weeks and months that follow determines whether that momentum compounds into career-defining expertise or dissipates into business as usual.

For those targeting career advancement through formal credentials, Microsoft Certifications, Applied Skills, and the AI Skills Navigator provide globally recognized validation of your expertise.
Applied Skills focus on scenario-based competencies, demonstrating that you can build and deploy solutions, not simply answer theoretical questions. Certifications cover role-based scenarios for developers, data engineers, AI engineers, and solution architects. The assessment experiences include performance-based testing in dedicated Azure tenants where you complete real configuration and development tasks. And finally, the new AI Skills Navigator is an agentic learning space that brings AI-powered skilling experiences and credentials from Microsoft, LinkedIn Learning, and GitHub together in a single, unified experience.

Why This Matters: The Competitive Context
The cloud skills race is intensifying. While our competitors offer robust training and content, Microsoft's differentiation comes not from having more content—though our 1.4 million module completions last fiscal year and 35,000+ certifications awarded speak to scale—but from the integration of services to orchestrate workflows. Only Microsoft offers a truly unified ecosystem where GitHub Copilot accelerates your development, Azure AI services power your applications, and Azure platform services deploy and scale your solutions—all backed by integrated skilling content that teaches you to maximize this connected experience.

When you continue your learning journey after Ignite, you're not just accumulating technical knowledge. You're developing fluency in an integrated development environment that no competitor can replicate. You're learning to leverage AI-powered development tools, cloud-native architectures, and enterprise-grade security in ways that compound each other's value. This unified expertise is what transforms individual developers into force multipliers for their organizations.

Start Now, Build Momentum, Never Stop
Microsoft Ignite 2025 offered the chance to compress months of learning into days of intensive, hands-on experience, but you can still take part: the on-demand videos, the Global Ignite Skills Challenge, the GitHub repos for the /Ignite25 labs, the Reactor Azure Skilling Series, and the curated Plans on Learn provide multiple entry points regardless of your current skill level or preferred learning style. But remember: the developers who extract the most value from Ignite are those who treat the event as the beginning, not the culmination, of their learning journey. They join hackathons, contribute to GitHub repositories, and engage with the Azure community on Discord and technical forums.

The question isn't whether you'll learn something valuable from Microsoft Ignite 2025; that's guaranteed. The question is whether you'll convert that learning into sustained momentum that compounds over months and years into career-defining expertise. The ecosystem is here. The content is ready. Your skilling journey doesn't end when Ignite does—it accelerates.
AMA Spotlight: Build Smarter with Azure Developer CLI 'AZD'
Weekly Ask Me Anything (AMA): Build Smarter with Azure Developer CLI
Calling all AI engineers, developers, and builders of the future: this is your backstage pass to the tools shaping scalable, agentic AI deployments. Join Kristen Womack, Product Manager for the Azure Developer CLI (azd), and the engineering team behind azd for a live Ask Me Anything session every Thursday at 12:30pm PT in the Azure AI Foundry Discord.

Whether you're:
🧠 Orchestrating multi-agent systems
📦 Deploying LLM-powered apps with Azure AI Foundry
🔐 Navigating least-privilege infrastructure setups
🛠️ Debugging and optimizing reproducible workflows
…this AMA is your chance to connect directly with the team building the CLI that powers it all.

💡 Why Join?
Real-time answers from the azd engineers and product team
Deployment walkthroughs for Foundry templates, from chatbots to document processors
Tips for CI/CD, debugging, and reproducibility in enterprise environments
Community-first mindset: bring your feedback, challenges, and ideas
Kristen Womack brings deep insight into developer experience and product strategy; this is a rare opportunity to learn from the source and shape the future of AI tooling.

🔧 Get Ready
Before you join:
Install azd 👉 Install Guide
Explore Kristen's work 👉 www.kristenwomack.io
Join the Discord 👉 Azure AI Foundry Community

🗓️ Weekly Schedule
🕧 Thursdays at 12:30pm PT
📍 Azure AI Foundry Discord
Bring your questions. Bring your curiosity. Build with the best.

Additional resources: check out the AZD for Beginners course at https://aka.ms/azd-for-beginners