Teaching AI Development Through Gamification
Introduction Learning AI development can feel overwhelming. Developers face abstract concepts like embeddings, prompt engineering, and workflow orchestration topics that traditional tutorials struggle to make tangible. How do you teach someone what an embedding "feels like" or why prompt engineering matters beyond theoretical examples? The answer lies in experiential learning through gamification. Instead of reading about AI concepts, what if developers could play a game that teaches these ideas through progressively challenging levels, immediate feedback, and real AI interactions? This article explores exactly that: building an educational adventure game that transforms AI learning from abstract theory into hands-on exploration. We'll dive into Foundry Local Learning Adventure, a JavaScript-based game that teaches AI fundamentals through five interactive levels. You'll learn how to create engaging educational experiences, integrate local AI models using Foundry Local, design progressive difficulty curves, and build cross-platform applications that run both in browsers and terminals. Whether you're an educator designing technical curriculum or a developer building learning tools, this architecture provides a proven blueprint for gamified technical education. Why Gamification Works for Technical Learning Traditional technical education follows a predictable pattern: read documentation, watch tutorials, attempt exercises, struggle with setup, eventually give up. The problem isn't content quality, it's engagement and friction. Gamification addresses both issues simultaneously. By framing learning as progression through levels, you create intrinsic motivation. Each completed challenge feels like unlocking a new ability in a game, triggering the same dopamine response that keeps players engaged in entertainment experiences. Progress is visible, achievements are celebrated, and setbacks feel like natural parts of the journey rather than personal failures. More importantly, gamification reduces friction. Instead of "install dependencies, configure API keys, read documentation, write code, debug errors," learners simply start the game and begin playing. The game handles setup, provides guardrails, and offers immediate feedback. When a concept clicks, the game celebrates it. When learners struggle, hints appear automatically. For AI development specifically, gamification solves a unique challenge: making probabilistic, non-deterministic systems feel approachable. Traditional programming has clear right and wrong answers, but AI outputs vary. A game can frame this variability as exploration rather than failure, teaching developers to evaluate AI responses critically while maintaining confidence. Architecture Overview: Dual-Platform Design for Maximum Reach The Foundry Local Learning Adventure implements a dual-platform architecture with separate but consistent implementations for web browsers and command-line terminals. This design maximizes accessibility, learners can start playing instantly in a browser, then graduate to CLI mode for the full terminal experience when they're ready to go deeper. The web version prioritizes zero-friction onboarding. It's deployed to GitHub Pages and can also be opened locally via a simple HTTP server, no build step, no package managers. The game starts with simulated AI responses in demo mode, but crucially, it also supports real AI responses when Foundry Local is installed. 
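The simulated replies themselves aren't listed in this article, but the idea is simple: when no Foundry Local endpoint is reachable, the game maps the learner's prompt to a canned response. Here is a minimal sketch of what a getDemoResponse helper could look like; the DEMO_RESPONSES table, its patterns, and the matching logic are illustrative assumptions rather than the repository's actual code.

```javascript
// Sketch of a demo-mode fallback: map the last user message to a canned reply
// so the game stays playable without Foundry Local. (Illustrative only.)
const DEMO_RESPONSES = [
  { match: /hello|hi/i, reply: "Hello, adventurer! I'm Sage. Ready for Level 1?" },
  { match: /embedding/i, reply: 'Embeddings turn text into vectors so we can compare meaning.' },
];

function getDemoResponse(messages) {
  const lastUser = messages.filter(m => m.role === 'user').pop();
  const text = lastUser ? lastUser.content : '';
  const hit = DEMO_RESPONSES.find(r => r.match.test(text));
  // Generic fallback when nothing matches.
  return hit ? hit.reply : 'Demo mode: imagine a helpful AI answer here!';
}
```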
The web version auto-discovers Foundry Local's dynamic port through a foundry-port.json file (written by the startup scripts) or by scanning common ports. Progress saves to localStorage, badges unlock as you complete challenges, and an AI-powered mentor named Sage guides you through a chat widget in the corner. This version is perfect for classrooms, conference demos, and learners who want to try before committing to a full CLI setup. The CLI version provides the full terminal experience with real AI interactions. Built on Node.js with ES modules, this version features a custom FoundryLocalClient class that connects to Foundry Local's OpenAI-compatible REST API. Instead of relying on an external SDK, the game implements its own API client with automatic port discovery, model selection, and graceful fallback to demo mode. The terminal interface includes a rich command system ( play , hint , ask , explain , progress , badges ) and the Sage mentor provides contextual guidance throughout. Both versions implement the same five levels and learning objectives independently. The CLI uses game/src/game.js , levels.js , and mentor.js as ES modules, while the web version uses game/web/game-web.js and game-data.js . A key innovation is the automatic port discovery system, which eliminates manual configuration: // 3-tier port discovery strategy (game/src/game.js) class FoundryLocalClient { constructor() { this.commonPorts = [61341, 5272, 51319, 5000, 8080]; this.mode = 'demo'; // 'local', 'azure', or 'demo' } async initialize() { // Tier 1: CLI discovery - parse 'foundry service status' output const cliPort = await this.discoverPortViaCLI(); if (cliPort) { this.baseUrl = cliPort; this.mode = 'local'; return; } // Tier 2: Try configured URL from config.json if (await this.tryFoundryUrl(config.foundryLocal.baseUrl)) { this.mode = 'local'; return; } // Tier 3: Scan common ports for (const port of this.commonPorts) { if (await this.tryFoundryUrl(`http://127.0.0.1:${port}`)) { this.mode = 'local'; return; } } // Fallback: demo mode with simulated responses console.log('💡 Running in demo mode (no Foundry Local detected)'); this.mode = 'demo'; } async chat(messages, options = {}) { if (this.mode === 'demo') return this.getDemoResponse(messages); const response = await fetch(`${this.baseUrl}/v1/chat/completions`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: this.selectedModel, messages, temperature: options.temperature || 0.7, max_tokens: options.max_tokens || 300 }) }); const data = await response.json(); return data.choices[0].message.content; } } This architecture demonstrates several key principles for educational software: Progressive disclosure: Start simple (web demo mode), add complexity optionally (real AI via Foundry Local or Azure) Consistent learning outcomes: Both platforms teach the same five concepts through independently implemented but equivalent experiences Zero barriers to entry: No installation required for the web version eliminates the #1 reason learners abandon technical tutorials Automatic service discovery: The 3-tier port discovery strategy means no manual configuration, just install Foundry Local and play Graceful degradation: Three connection modes (local, Azure, demo) ensure the game always works regardless of setup Level Design: Teaching AI Concepts Through Progressive Challenges The game's five levels form a carefully designed curriculum that builds AI understanding incrementally. 
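The level definitions live in game/src/levels.js. That file isn't reproduced in this article, but based on how game.js reads it (title, objective, description, hints, successCriteria), a single entry plausibly looks like the following sketch; the concrete values are assumptions for illustration.

```javascript
// Hypothetical shape of one entry in game/src/levels.js, inferred from how
// game.js accesses level.title, level.objective, level.hints, and successCriteria.
export const LEVELS = [
  {
    id: 1,
    title: 'Meet the Model',
    objective: 'Send your first prompt to a local AI model.',
    description: 'Say hello to Sage and watch the model respond.',
    hints: ['Try a simple greeting like "Hello, Sage!"'],
    // Level 1 passes when the response looks like a real answer (see executeLevel1).
    successCriteria: (response, userPrompt) =>
      Boolean(response) && response.length > 10 && userPrompt.length > 0,
  },
  // ...levels 2-5 follow the same shape
];
```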
Each level introduces one core concept, provides hands-on practice, and validates learning before proceeding. Level 1: Meet the Model teaches the fundamental request-response pattern. Learners send their first message to an AI and see it respond. The challenge is deliberately trivial, just say hello, because the goal is building confidence. The level succeeds when the learner realizes "I can talk to an AI and it understands me." This moment of agency sets the foundation for everything else. The implementation focuses on positive reinforcement. In the CLI version, the Sage mentor celebrates each completion with contextual messages, while the web version displays inline celebration banners with badge animations: // Level 1 execution (game/src/game.js) async executeLevel1() { const level = this.levels.getLevel(1); this.displayLevelHeader(level); // Sage introduces the level const intro = await this.mentor.introduceLevel(1); console.log(`\n🧙 Sage: ${intro}`); const userPrompt = await this.askQuestion('\nYour prompt: '); console.log('\n🤖 AI is thinking...'); const response = await this.client.chat([ { role: 'system', content: 'You are Sage, a friendly AI mentor.' }, { role: 'user', content: userPrompt } ]); console.log(`\n📨 AI Response:\n${response}`); if (response && response.length > 10) { // Sage celebrates const celebration = await this.mentor.celebrateLevelComplete(1); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n🎯 You earned the Prompt Apprentice badge!'); console.log('🏆 +100 points'); this.progress.completeLevel(1, 100, '🎯 Prompt Apprentice'); } } This celebration pattern repeats throughout, explicit acknowledgment of success via the Sage mentor, explanation of what was learned, and a preview of what's next. The mentor system ( game/src/mentor.js ) provides contextual encouragement using AI-generated or pre-written fallback messages, transforming abstract concepts into concrete achievements. Level 2: Prompt Mastery introduces prompt quality through comparison. The game presents a deliberately poor prompt: "tell me stuff about coding." Learners must rewrite it to be specific, contextual, and actionable. The game runs both prompts, displays results side-by-side, and asks learners to evaluate the difference. 
// Level 2: Prompt Improvement (game/src/game.js) async executeLevel2() { const level = this.levels.getLevel(2); this.displayLevelHeader(level); const intro = await this.mentor.introduceLevel(2); console.log(`\n🧙 Sage: ${intro}`); // Show the bad prompt const badPrompt = "tell me stuff about coding"; console.log(`\n❌ Poor prompt: "${badPrompt}"`); console.log('\n🤖 Getting response to bad prompt...'); const badResponse = await this.client.chat([ { role: 'user', content: badPrompt } ]); console.log(`\n📊 Bad prompt result:\n${badResponse}`); // Get the learner's improved version console.log('\n✍️ Now write a BETTER prompt about the same topic:'); const goodPrompt = await this.askQuestion('Your improved prompt: '); console.log('\n🤖 Getting response to your prompt...'); const goodResponse = await this.client.chat([ { role: 'user', content: goodPrompt } ]); console.log(`\n📊 Your prompt result:\n${goodResponse}`); // Evaluate: improved prompt should be longer and more specific const isImproved = goodPrompt.length > badPrompt.length && goodResponse.length > 0; if (isImproved) { const celebration = await this.mentor.celebrateLevelComplete(2); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n✨ You earned the Prompt Engineer badge!'); console.log('🏆 +150 points'); this.progress.completeLevel(2, 150, '✨ Prompt Engineer'); } else { const hint = await this.mentor.provideHint(2); console.log(`\n💡 Sage: ${hint}`); } } This comparative approach is powerful, learners don't just read about prompt engineering, they experience its impact directly. The before/after comparison makes quality differences undeniable. Level 3: Embeddings Explorer demystifies semantic search through practical demonstration. Learners search a knowledge base about Foundry Local using natural language queries. The game shows how embedding similarity works by returning relevant content even when exact keywords don't match. // Level 3: Embedding Search (game/src/game.js) async executeLevel3() { const level = this.levels.getLevel(3); this.displayLevelHeader(level); // Knowledge base loaded from game/data/knowledge-base.json const knowledgeBase = [ { id: 1, content: "Foundry Local runs AI models entirely on your device" }, { id: 2, content: "Embeddings convert text into numerical vectors" }, { id: 3, content: "Cosine similarity measures how related two texts are" }, // ... more entries about AI and Foundry Local ]; const query = await this.askQuestion('\n🔍 Search query: '); // Get embedding for user's query const queryEmbedding = await this.client.getEmbedding(query); // Get embeddings for all knowledge base entries const results = []; for (const item of knowledgeBase) { const itemEmbedding = await this.client.getEmbedding(item.content); const similarity = this.cosineSimilarity(queryEmbedding, itemEmbedding); results.push({ ...item, similarity }); } // Sort by similarity and show top matches results.sort((a, b) => b.similarity - a.similarity); console.log('\n📑 Top matches:'); results.slice(0, 3).forEach((r, i) => { console.log(` ${i + 1}. 
(${(r.similarity * 100).toFixed(1)}%) ${r.content}`); }); } // Cosine similarity calculation (also in TaskHandler) cosineSimilarity(a, b) { const dot = a.reduce((sum, val, i) => sum + val * b[i], 0); const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0)); const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0)); return dot / (magA * magB); } // Demo mode generates pseudo-embeddings when Foundry isn't available getPseudoEmbedding(text) { // 128-dimension hash-based vector for offline demonstration const embedding = new Array(128).fill(0); for (let i = 0; i < text.length; i++) { embedding[i % 128] += text.charCodeAt(i) / 1000; } return embedding; } Learners query things like "How do I run AI offline?" and discover content about Foundry Local's offline capabilities—even though the word "offline" appears nowhere in the result. When Foundry Local is running, the game calls the /v1/embeddings endpoint for real vector representations. In demo mode, a pseudo-embedding function generates 128-dimension hash-based vectors that still demonstrate the concept of similarity search. This concrete demonstration of semantic understanding beats any theoretical explanation. Level 4: Workflow Wizard teaches AI pipeline composition. Learners build a three-step workflow: summarize text → extract keywords → generate questions. Each step uses the previous output as input, demonstrating how complex AI tasks decompose into chains of simpler operations. // Level 4: Workflow Builder (game/src/game.js) async executeLevel4() { const level = this.levels.getLevel(4); this.displayLevelHeader(level); const intro = await this.mentor.introduceLevel(4); console.log(`\n🧙 Sage: ${intro}`); console.log('\n📝 Enter text for the 3-step AI pipeline:'); const inputText = await this.askQuestion('Input text: '); // Step 1: Summarize console.log('\n⚙️ Step 1: Summarizing...'); const summary = await this.client.chat([ { role: 'system', content: 'Summarize this in 2 sentences.' }, { role: 'user', content: inputText } ]); console.log(` Result: ${summary}`); // Step 2: Extract keywords (chained from Step 1 output) console.log('\n🔑 Step 2: Extracting keywords...'); const keywords = await this.client.chat([ { role: 'system', content: 'Extract 5 important keywords.' }, { role: 'user', content: summary } ]); console.log(` Keywords: ${keywords}`); // Step 3: Generate questions (chained from Step 2 output) console.log('\n❓ Step 3: Generating study questions...'); const questions = await this.client.chat([ { role: 'system', content: 'Create 3 quiz questions about these topics.' }, { role: 'user', content: keywords } ]); console.log(` Questions:\n${questions}`); console.log('\n✅ Workflow complete!'); const celebration = await this.mentor.celebrateLevelComplete(4); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n⚡ You earned the Workflow Wizard badge!'); console.log('🏆 +250 points'); this.progress.completeLevel(4, 250, '⚡ Workflow Wizard'); } This level bridges the gap between "toy examples" and real applications. Learners see firsthand how combining simple AI operations creates sophisticated functionality. Level 5: Build Your Own Tool challenges learners to create a custom AI-powered tool by selecting from pre-built templates and configuring them. 
Rather than asking learners to write arbitrary code, the game provides four structured templates that demonstrate how AI tools work in practice: // Level 5: Tool Builder templates (game/web/game-web.js) const TOOL_TEMPLATES = [ { id: 'summarizer', name: '📝 Text Summarizer', description: 'Summarizes long text into key points', systemPrompt: 'You are a text summarization tool. Provide concise summaries.', exampleInput: 'Paste any long article or document...' }, { id: 'translator', name: '🌐 Code Translator', description: 'Translates code between programming languages', systemPrompt: 'You are a code translation tool. Convert code accurately.', exampleInput: 'function hello() { console.log("Hello!"); }' }, { id: 'reviewer', name: '🔍 Code Reviewer', description: 'Reviews code for bugs, style, and improvements', systemPrompt: 'You are a code review tool. Identify issues and suggest fixes.', exampleInput: 'Paste code to review...' }, { id: 'custom', name: '✨ Custom Tool', description: 'Design your own AI tool with a custom system prompt', systemPrompt: '', // Learner provides this exampleInput: '' } ]; // Tool testing sends the configured system prompt + user input to Foundry Local async function testTool(template, userInput) { const response = await callFoundryAPI([ { role: 'system', content: template.systemPrompt }, { role: 'user', content: userInput } ]); console.log(`🔧 Tool output: ${response}`); return response; } This template-based approach is safer and more educational than arbitrary code execution. Learners select a template, customize its system prompt, test it with sample input, and see how the AI responds differently based on the tool's configuration. The "Custom Tool" option lets advanced learners design their own system prompts from scratch. Completing this level marks true understanding—learners aren't just using AI, they're shaping what it can do through prompt design and tool composition. Building the Web Version: Zero-Install Educational Experience The web version demonstrates how to create educational software that requires absolutely zero setup. This is critical for workshops, classroom settings, and casual learners who won't commit to installation until they see value. The architecture is deliberately simple, vanilla JavaScript with ES6 modules, no build tools, no package managers. 
The HTML includes a multi-screen layout with a welcome screen, level selection grid, game area, and modals for progress, badges, help, and game completion: <!-- game/web/index.html --> <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Foundry Local Learning Adventure</title> <link rel="stylesheet" href="styles.css"> </head> <body> <!-- Welcome Screen with name input --> <div id="welcome-screen" class="screen active"> <h1>🎮 Foundry Local Learning Adventure</h1> <p>Master Microsoft Foundry AI - One Level at a Time!</p> <input type="text" id="player-name" placeholder="Enter your name"> <button id="start-btn">Start Adventure</button> <div id="foundry-status"><!-- Auto-detected connection status --></div> </div> <!-- Menu Screen with level grid --> <div id="menu-screen" class="screen"> <div class="level-grid"> <!-- 5 level cards with lock/unlock states --> </div> <div class="stats-bar"> <span id="points-display">0 points</span> <span id="badges-count">0/5 badges</span> </div> </div> <!-- Level Screen with task area --> <div id="level-screen" class="screen"> <div id="level-header"></div> <div id="task-area"><!-- Level-specific UI loads here --></div> <div id="response-area"></div> <div id="hint-area"></div> </div> <!-- Sage Mentor Chat Widget (fixed bottom-right) --> <div id="mentor-chat" class="mentor-widget"> <div class="mentor-header">🧙 Sage (AI Mentor)</div> <div id="mentor-messages"></div> <input type="text" id="mentor-input" placeholder="Ask Sage anything..."> </div> <script type="module" src="game-data.js"></script> <script type="module" src="game-web.js"></script> </body> </html> A critical feature of the web version is its ability to connect to a real Foundry Local instance. On startup, the game checks for a foundry-port.json file (written by the cross-platform start scripts) and falls back to scanning common ports: // game/web/game-web.js - Foundry Local auto-discovery let foundryConnection = { connected: false, baseUrl: null }; async function checkFoundryConnection() { // Try reading port from discovery file (written by start scripts) const discoveredPort = await readDiscoveredPort(); if (discoveredPort) { try { const resp = await fetch(`${discoveredPort}/v1/models`); if (resp.ok) { foundryConnection = { connected: true, baseUrl: discoveredPort }; updateStatusBadge('🟢 Foundry Local Connected'); return; } } catch (e) { /* continue to port scan */ } } // Scan common Foundry Local ports const ports = [61341, 5272, 51319, 5000, 8080]; for (const port of ports) { try { const resp = await fetch(`http://127.0.0.1:${port}/v1/models`); if (resp.ok) { foundryConnection = { connected: true, baseUrl: `http://127.0.0.1:${port}` }; updateStatusBadge('🟢 Foundry Local Connected'); return; } } catch (e) { continue; } } // Demo mode - use simulated responses from DEMO_RESPONSES updateStatusBadge('🟡 Demo Mode (install Foundry Local for real AI)'); } async function callFoundryAPI(messages) { if (!foundryConnection.connected) { return getDemoResponse(messages); // Simulated responses } const resp = await fetch(`${foundryConnection.baseUrl}/v1/chat/completions`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: 'auto', messages, temperature: 0.7 }) }); const data = await resp.json(); return data.choices[0].message.content; } The web version also includes level-specific UIs: each level type has its own builder function that constructs the appropriate interface. 
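The individual builder functions aren't shown in this article. As a rough sketch of the pattern, a Level 2 builder might render the split-view and wire it to the callFoundryAPI helper above; the element IDs and function name here are assumptions, not the repository's actual code.

```javascript
// Illustrative per-level UI builder for game-web.js (names and markup assumed).
function buildLevel2UI(taskArea) {
  taskArea.innerHTML = `
    <p>❌ Poor prompt: "tell me stuff about coding"</p>
    <textarea id="improved-prompt" placeholder="Write a better prompt..."></textarea>
    <button id="run-comparison">Compare</button>
    <div class="split-view">
      <div id="bad-result"></div>
      <div id="good-result"></div>
    </div>`;

  document.getElementById('run-comparison').addEventListener('click', async () => {
    const improved = document.getElementById('improved-prompt').value;
    const [bad, good] = await Promise.all([
      callFoundryAPI([{ role: 'user', content: 'tell me stuff about coding' }]),
      callFoundryAPI([{ role: 'user', content: improved }]),
    ]);
    document.getElementById('bad-result').textContent = bad;
    document.getElementById('good-result').textContent = good;
  });
}
```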
For example, Level 2 (Prompt Improvement) shows a split-view with the bad prompt result on one side and the learner's improved prompt on the other. Level 3 (Embeddings) presents a search interface with similarity scores. Level 5 (Tool Builder) offers a template selector with four options (Text Summarizer, Code Translator, Code Reviewer, and Custom). This architecture teaches several patterns for web-based educational tools: LocalStorage for persistence: Progress survives page refreshes without requiring accounts or databases ES6 modules for organization: Clean separation between game data ( game-data.js ) and engine ( game-web.js ) Hybrid AI mode: Real AI when Foundry Local is available, simulated responses when it's not—same code path for both Multi-screen navigation: Welcome, menu, level, and completion screens provide clear progression Always-available mentor: The Sage chat widget in the corner lets learners ask questions at any point Implementing the CLI Version with Real AI Integration The CLI version provides the authentic AI development experience. This version requires Node.js and Foundry Local, but rewards setup effort with genuine model interactions. Installation uses a startup script that handles prerequisites: #!/bin/bash # scripts/start-game.sh echo "🎮 Starting Foundry Local Learning Adventure..." # Check Node.js if ! command -v node &> /dev/null; then echo "❌ Node.js not found. Install from https://nodejs.org/" exit 1 fi # Check Foundry Local if ! command -v foundry &> /dev/null; then echo "❌ Foundry Local not found." echo " Install: winget install Microsoft.FoundryLocal" exit 1 fi # Start Foundry service echo "🚀 Starting Foundry Local service..." foundry service start # Wait for service sleep 2 # Load model echo "📦 Loading Phi-4 model..." foundry model load phi-4 # Install dependencies echo "📥 Installing game dependencies..." npm install # Start game echo "✅ Launching game..." 
npm start The game logic integrates with Foundry Local using the official SDK: // game/src/game.js import { FoundryLocalClient } from 'foundry-local-sdk'; import readline from 'readline/promises'; const client = new FoundryLocalClient({ endpoint: 'http://127.0.0.1:5272' // Default Foundry Local port }); async function getAIResponse(prompt, level) { try { const startTime = Date.now(); const completion = await client.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: `You are Sage, a friendly AI mentor teaching ${LEVELS[level-1].title}.` }, { role: 'user', content: prompt } ], temperature: 0.7, max_tokens: 300 }); const latency = Date.now() - startTime; console.log(`\n⏱️ AI responded in ${latency}ms`); return completion.choices[0].message.content; } catch (error) { console.error('❌ AI error:', error.message); console.log('💡 Falling back to demo mode...'); return getDemoResponse(prompt, level); } } async function playLevel(levelNumber) { const level = LEVELS[levelNumber - 1]; console.clear(); console.log(`\n${'='.repeat(60)}`); console.log(` Level ${levelNumber}: ${level.title}`); console.log(`${'='.repeat(60)}\n`); console.log(`🎯 ${level.objective}\n`); console.log(`📚 ${level.description}\n`); const rl = readline.createInterface({ input: process.stdin, output: process.stdout }); const userPrompt = await rl.question('Your prompt: '); rl.close(); console.log('\n🤖 AI is thinking...'); const response = await getAIResponse(userPrompt, levelNumber); console.log(`\n📨 AI Response:\n${response}\n`); // Evaluate success if (level.successCriteria(response, userPrompt)) { celebrateSuccess(level); updateProgress(levelNumber); if (levelNumber < 5) { const playNext = await askYesNo('Play next level?'); if (playNext) { await playLevel(levelNumber + 1); } } else { showGameComplete(); } } else { console.log(`\n💡 Hint: ${level.hints[0]}\n`); const retry = await askYesNo('Try again?'); if (retry) { await playLevel(levelNumber); } } } The CLI version adds several enhancements that deepen learning: Latency visibility: Display response times so learners understand local vs cloud performance differences Graceful fallback: If Foundry Local fails, switch to demo mode automatically rather than crashing Interactive prompts: Use readline for natural command-line interaction patterns Progress persistence: Save to JSON files so learners can pause and resume Command history: Log all prompts and responses for learners to review their progression Key Takeaways and Educational Design Principles Building effective educational software for technical audiences requires balancing several competing concerns: accessibility vs authenticity, simplicity vs depth, guidance vs exploration. The Foundry Local Learning Adventure succeeds by making deliberate architectural choices that prioritize learner experience. 
Key principles demonstrated:
Zero-friction starts win: The web version eliminates all setup barriers, maximizing the chance learners will actually begin
Automatic service discovery: The 3-tier port discovery strategy means no manual configuration, just install Foundry Local and play
Progressive challenge curves build confidence: Each level introduces exactly one new concept, building on previous knowledge
Immediate feedback accelerates learning: Learners know instantly if they succeeded, with Sage providing contextual explanations
Real tools create transferable skills: The CLI version uses professional developer patterns (OpenAI-compatible REST APIs, ES modules, readline) that apply beyond the game
Celebration creates emotional investment: Badges, points, and Sage's encouragement transform learning into achievement
Dual platforms expand reach: Web attracts casual learners, CLI converts them to serious practitioners, and both support real AI
Graceful degradation ensures reliability: Three connection modes (local, Azure, demo) mean the game always works regardless of setup
To extend this approach for your own educational projects, consider:
Domain-specific challenges: Adapt level structure to your technical domain (e.g., API design, database optimization, security practices)
Multiplayer competitions: Add leaderboards and time trials to introduce social motivation
Adaptive difficulty: Track learner performance and adjust challenge difficulty dynamically
Sandbox modes: After completing the curriculum, provide free-play areas for experimentation
Community sharing: Let learners share custom levels or challenges they've created
The complete implementation with all levels, both web and CLI versions, comprehensive tests, and deployment guides is available at github.com/leestott/FoundryLocal-LearningAdventure. You can play the web version immediately at leestott.github.io/FoundryLocal-LearningAdventure or clone the repository to experience the full CLI version with real AI.
Resources and Further Reading
Foundry Local Learning Adventure Repository - Complete source code for both web and CLI versions
Play Online Now - Try the web version instantly in your browser (supports real AI with Foundry Local installed)
Microsoft Foundry Local Documentation - Official SDK and CLI reference
Contributing Guide - How to contribute new levels or improvements
Benchmarking Local AI Models
Introduction Selecting the right AI model for your application requires more than reading benchmark leaderboards. Published benchmarks measure academic capabilities, question answering, reasoning, coding, but your application has specific requirements: latency budgets, hardware constraints, quality thresholds. How do you know if Phi-4 provides acceptable quality for your document summarization use case? Will Qwen2.5-0.5B meet your 100ms response time requirement? Does your edge device have sufficient memory for Phi-3.5 Mini? The answer lies in empirical testing: running actual models on your hardware with your workload patterns. This article demonstrates building a comprehensive model benchmarking platform using FLPerformance, Node.js, React, and Microsoft Foundry Local. You'll learn how to implement scientific performance measurement, design meaningful benchmark suites, visualize multi-dimensional comparisons, and make data-driven model selection decisions. Whether you're evaluating models for production deployment, optimizing inference costs, or validating hardware specifications, this platform provides the tools for rigorous performance analysis. Why Model Benchmarking Requires Purpose-Built Tools You cannot assess model performance by running a few manual tests and noting the results. Scientific benchmarking demands controlled conditions, statistically significant sample sizes, multi-dimensional metrics, and reproducible methodology. Understand why purpose-built tooling is essential. Performance is multi-dimensional. A model might excel at throughput (tokens per second) but suffer at latency (time to first token). Another might generate high-quality outputs slowly. Your application might prioritize consistency over average performance, a model with variable response times (high p95/p99 latency) creates poor user experiences even if averages look good. Measuring all dimensions simultaneously enables informed tradeoffs. Hardware matters enormously. Benchmark results from NVIDIA A100 GPUs don't predict performance on consumer laptops. NPU acceleration changes the picture again. Memory constraints affect which models can even load. Test on your actual deployment hardware or comparable specifications to get actionable results. Concurrency reveals bottlenecks. A model handling one request excellently might struggle with ten concurrent requests. Real applications experience variable load, measuring only single-threaded performance misses critical scalability constraints. Controlled concurrency testing reveals these limits. Statistical rigor prevents false conclusions. Running a prompt once and noting the response time tells you nothing about performance distribution. Was this result typical? An outlier? You need dozens or hundreds of trials to establish p50/p95/p99 percentiles, understand variance, and detect stability issues. Comparison requires controlled experiments. Different prompts, different times of day, different system loads, all introduce confounding variables. Scientific comparison runs identical workloads across models sequentially, controlling for external factors. Architecture: Three-Layer Performance Testing Platform FLPerformance implements a clean separation between orchestration, measurement, and presentation: The frontend React application provides model management, benchmark configuration, test execution, and results visualization. 
Users add models from the Foundry Local catalog, configure benchmark parameters (iterations, concurrency, timeout values), launch test runs, and view real-time progress. The results dashboard displays comparison tables, latency distribution charts, throughput graphs, and "best model for..." recommendations. The backend Node.js/Express server orchestrates tests and captures metrics. It manages the single Foundry Local service instance, loads/unloads models as needed, executes benchmark suites with controlled concurrency, measures comprehensive metrics (TTFT, TPOT, total latency, throughput, error rates), and persists results to JSON storage. WebSocket connections provide real-time progress updates during long benchmark runs. Foundry Local SDK integration uses the official foundry-local-sdk npm package. The SDK manages service lifecycle, starting, stopping, health checkin, and handles model operations, downloading, loading into memory, unloading. It provides OpenAI-compatible inference APIs for consistent request formatting across models. The architecture supports simultaneous testing of multiple models by loading them one at a time, running identical benchmarks, and aggregating results for comparison: User Initiates Benchmark Run ↓ Backend receives {models: [...], suite: "default", iterations: 10} ↓ For each model: 1. Load model into Foundry Local 2. Execute benchmark suite - For each prompt in suite: * Run N iterations * Measure TTFT, TPOT, total time * Track errors and timeouts * Calculate tokens/second 3. Aggregate statistics (mean, p50, p95, p99) 4. Unload model ↓ Store results with metadata ↓ Return comparison data to frontend ↓ Visualize performance metrics Implementing Scientific Measurement Infrastructure Accurate performance measurement requires instrumentation that captures multiple dimensions without introducing measurement overhead: // src/server/benchmark.js import { performance } from 'perf_hooks'; export class BenchmarkExecutor { constructor(foundryClient, options = {}) { this.client = foundryClient; this.options = { iterations: options.iterations || 10, concurrency: options.concurrency || 1, timeout_ms: options.timeout_ms || 30000, warmup_iterations: options.warmup_iterations || 2 }; } async runBenchmarkSuite(modelId, prompts) { const results = []; // Warmup phase (exclude from results) console.log(`Running ${this.options.warmup_iterations} warmup iterations...`); for (let i = 0; i < this.options.warmup_iterations; i++) { await this.executePrompt(modelId, prompts[0].text); } // Actual benchmark runs for (const prompt of prompts) { console.log(`Benchmarking prompt: ${prompt.id}`); const measurements = []; for (let i = 0; i < this.options.iterations; i++) { const measurement = await this.executeMeasuredPrompt( modelId, prompt.text ); measurements.push(measurement); // Small delay between iterations to stabilize await sleep(100); } results.push({ prompt_id: prompt.id, prompt_text: prompt.text, measurements, statistics: this.calculateStatistics(measurements) }); } return { model_id: modelId, timestamp: new Date().toISOString(), config: this.options, results }; } async executeMeasuredPrompt(modelId, promptText) { const measurement = { success: false, error: null, ttft_ms: null, // Time to first token tpot_ms: null, // Time per output token total_ms: null, tokens_generated: 0, tokens_per_second: 0 }; try { const startTime = performance.now(); let firstTokenTime = null; let tokenCount = 0; // Streaming completion to measure TTFT const stream = await 
this.client.chat.completions.create({ model: modelId, messages: [{ role: 'user', content: promptText }], max_tokens: 200, temperature: 0.7, stream: true }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { if (firstTokenTime === null) { firstTokenTime = performance.now(); measurement.ttft_ms = firstTokenTime - startTime; } tokenCount++; } } const endTime = performance.now(); measurement.total_ms = endTime - startTime; measurement.tokens_generated = tokenCount; if (tokenCount > 1 && firstTokenTime) { // TPOT = time after first token / (tokens - 1) const timeAfterFirstToken = endTime - firstTokenTime; measurement.tpot_ms = timeAfterFirstToken / (tokenCount - 1); measurement.tokens_per_second = 1000 / measurement.tpot_ms; } measurement.success = true; } catch (error) { measurement.error = error.message; measurement.success = false; } return measurement; } calculateStatistics(measurements) { const successful = measurements.filter(m => m.success); const total = measurements.length; if (successful.length === 0) { return { success_rate: 0, error_rate: 1.0, sample_size: total }; } const ttfts = successful.map(m => m.ttft_ms).sort((a, b) => a - b); const tpots = successful.map(m => m.tpot_ms).filter(v => v !== null).sort((a, b) => a - b); const totals = successful.map(m => m.total_ms).sort((a, b) => a - b); const throughputs = successful.map(m => m.tokens_per_second).filter(v => v > 0); return { success_rate: successful.length / total, error_rate: (total - successful.length) / total, sample_size: total, ttft: { mean: mean(ttfts), median: percentile(ttfts, 50), p95: percentile(ttfts, 95), p99: percentile(ttfts, 99), min: Math.min(...ttfts), max: Math.max(...ttfts) }, tpot: tpots.length > 0 ? { mean: mean(tpots), median: percentile(tpots, 50), p95: percentile(tpots, 95) } : null, total_latency: { mean: mean(totals), median: percentile(totals, 50), p95: percentile(totals, 95), p99: percentile(totals, 99) }, throughput: { mean_tps: mean(throughputs), median_tps: percentile(throughputs, 50) } }; } } function mean(arr) { return arr.reduce((sum, val) => sum + val, 0) / arr.length; } function percentile(sortedArr, p) { const index = Math.ceil((sortedArr.length * p) / 100) - 1; return sortedArr[Math.max(0, index)]; } function sleep(ms) { return new Promise(resolve => setTimeout(resolve, ms)); } This measurement infrastructure captures: Time to First Token (TTFT): Critical for perceived responsiveness—users notice delays before output begins Time Per Output Token (TPOT): Determines generation speed after first token—affects throughput Total latency: End-to-end time—matters for batch processing and high-volume scenarios Tokens per second: Overall throughput metric—useful for capacity planning Statistical distributions: Mean alone masks variability—p95/p99 reveal tail latencies that impact user experience Success/error rates: Stability metrics—some models timeout or crash under load Designing Meaningful Benchmark Suites Benchmark quality depends on prompt selection. Generic prompts don't reflect real application behavior. 
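Here is a brief usage sketch of how the executor above consumes a suite file. The project wires its client up through foundry-local-sdk; this sketch instead uses the generic openai npm client against Foundry Local's OpenAI-compatible endpoint, and the file path, port, and model ID are illustrative assumptions.

```javascript
// Usage sketch for BenchmarkExecutor (paths, port, and model ID are illustrative).
import { readFile } from 'fs/promises';
import OpenAI from 'openai';
import { BenchmarkExecutor } from './benchmark.js';

// Any OpenAI-compatible client works here, because the executor only calls
// client.chat.completions.create(...).
const client = new OpenAI({
  baseURL: 'http://localhost:5272/v1', // default Foundry Local port
  apiKey: 'not-needed-locally',
});

const suite = JSON.parse(await readFile('./benchmarks/suites/default.json', 'utf8'));
const executor = new BenchmarkExecutor(client, { iterations: 10, warmup_iterations: 2 });

const report = await executor.runBenchmarkSuite('phi-4', suite.prompts);
for (const r of report.results) {
  console.log(r.prompt_id, r.statistics.ttft?.median, r.statistics.throughput?.mean_tps);
}
```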
Design suites that mirror actual use cases: // benchmarks/suites/default.json { "name": "default", "description": "General-purpose benchmark covering diverse scenarios", "prompts": [ { "id": "short-factual", "text": "What is the capital of France?", "category": "factual", "expected_tokens": 5 }, { "id": "medium-explanation", "text": "Explain how photosynthesis works in 3-4 sentences.", "category": "explanation", "expected_tokens": 80 }, { "id": "long-reasoning", "text": "Analyze the economic factors that led to the 2008 financial crisis. Discuss at least 5 major causes with supporting details.", "category": "reasoning", "expected_tokens": 250 }, { "id": "code-generation", "text": "Write a Python function that finds the longest palindrome in a string. Include docstring and example usage.", "category": "coding", "expected_tokens": 150 }, { "id": "creative-writing", "text": "Write a short story (3 paragraphs) about a robot learning to paint.", "category": "creative", "expected_tokens": 200 } ] } This suite covers multiple dimensions: Length variation: Short (5 tokens), medium (80), long (250)—tests models across output ranges Task diversity: Factual recall, explanation, reasoning, code, creative—reveals capability breadth Token predictability: Expected token counts enable throughput calculations For production applications, create custom suites matching your actual workload: { "name": "customer-support", "description": "Simulates actual customer support queries", "prompts": [ { "id": "product-question", "text": "How do I reset my password for the customer portal?" }, { "id": "troubleshooting", "text": "I'm getting error code 503 when trying to upload files. What should I do?" }, { "id": "policy-inquiry", "text": "What is your refund policy for annual subscriptions?" } ] } Visualizing Multi-Dimensional Performance Comparisons Raw numbers don't reveal insights—visualization makes patterns obvious. The frontend implements several comparison views: Comparison Table shows side-by-side metrics: // frontend/src/components/ResultsTable.jsx export function ResultsTable({ results }) { return ( {results.map(result => ( ))} Model TTFT (ms) TPOT (ms) Throughput (tok/s) P95 Latency Error Rate {result.model_id} {result.stats.ttft.median.toFixed(0)} (p95: {result.stats.ttft.p95.toFixed(0)}) {result.stats.tpot?.median.toFixed(1) || 'N/A'} {result.stats.throughput.median_tps.toFixed(1)} {result.stats.total_latency.p95.toFixed(0)} ms 0.05 ? 'error' : 'success'}> {(result.stats.error_rate * 100).toFixed(1)}% ); } Latency Distribution Chart reveals performance consistency: // Using Chart.js for visualization export function LatencyChart({ results }) { const data = { labels: results.map(r => r.model_id), datasets: [ { label: 'Median (p50)', data: results.map(r => r.stats.total_latency.median), backgroundColor: 'rgba(75, 192, 192, 0.5)' }, { label: 'p95', data: results.map(r => r.stats.total_latency.p95), backgroundColor: 'rgba(255, 206, 86, 0.5)' }, { label: 'p99', data: results.map(r => r.stats.total_latency.p99), backgroundColor: 'rgba(255, 99, 132, 0.5)' } ] }; return ( ); } Recommendations Engine synthesizes multi-dimensional comparison: export function generateRecommendations(results) { const recommendations = []; // Find fastest TTFT (best perceived responsiveness) const fastestTTFT = results.reduce((best, r) => r.stats.ttft.median < best.stats.ttft.median ? 
r : best ); recommendations.push({ category: 'Fastest Response', model: fastestTTFT.model_id, reason: `Lowest median TTFT: ${fastestTTFT.stats.ttft.median.toFixed(0)}ms` }); // Find highest throughput const highestThroughput = results.reduce((best, r) => r.stats.throughput.median_tps > best.stats.throughput.median_tps ? r : best ); recommendations.push({ category: 'Best Throughput', model: highestThroughput.model_id, reason: `Highest tok/s: ${highestThroughput.stats.throughput.median_tps.toFixed(1)}` }); // Find most consistent (lowest p95-p50 spread) const mostConsistent = results.reduce((best, r) => { const spread = r.stats.total_latency.p95 - r.stats.total_latency.median; const bestSpread = best.stats.total_latency.p95 - best.stats.total_latency.median; return spread < bestSpread ? r : best; }); recommendations.push({ category: 'Most Consistent', model: mostConsistent.model_id, reason: 'Lowest latency variance (p95-p50 spread)' }); return recommendations; } Key Takeaways and Benchmarking Best Practices Effective model benchmarking requires scientific methodology, comprehensive metrics, and application-specific testing. FLPerformance demonstrates that rigorous performance measurement is accessible to any development team. Critical principles for model evaluation: Test on target hardware: Results from cloud GPUs don't predict laptop performance Measure multiple dimensions: TTFT, TPOT, throughput, consistency all matter Use statistical rigor: Single runs mislead—capture distributions with adequate sample sizes Design realistic workloads: Generic benchmarks don't predict your application's behavior Include warmup iterations: Model loading and JIT compilation affect early measurements Control concurrency: Real applications handle multiple requests—test at realistic loads Document methodology: Reproducible results require documented procedures and configurations The complete benchmarking platform with model management, measurement infrastructure, visualization dashboards, and comprehensive documentation is available at github.com/leestott/FLPerformance. Clone the repository and run the startup script to begin evaluating models on your hardware. Resources and Further Reading FLPerformance Repository - Complete benchmarking platform Quick Start Guide - Setup and first benchmark run Microsoft Foundry Local Documentation - SDK reference and model catalog Architecture Guide - System design and SDK integration Benchmarking Best Practices - Methodology and troubleshootingHow to Integrate Playwright MCP for AI-Driven Test Automation
Test automation has come a long way, from scripted flows to self-healing and now AI-driven testing. With the introduction of Model Context Protocol (MCP), Playwright can now interact with AI models and external tools to make smarter testing decisions. This guide walks you through integrating MCP with Playwright in VSCode, starting from the basics, enabling you to build smarter, adaptive tests today. What Is Playwright MCP? Playwright: An open-source framework for web testing and automation. It supports multiple browsers (Chromium, Firefox, and WebKit) and offers robust features like auto-wait, capturing screenshots, along with some great tooling like Codegen, Trace Viewer. MCP (Model Context Protocol): A protocol that enables external tools to communicate with AI models or services in a structured, secure way. By combining Playwright with MCP, you unlock: AI-assisted test generation. Dynamic test data. Smarter debugging and adaptive workflows. Why Integrate MCP with Playwright? AI-powered test generation: Reduce manual scripting. Dynamic context awareness: Tests adapt to real-time data. Improved debugging: AI can suggest fixes for failing tests. Smarter locator selection: AI helps pick stable, reliable selectors to reduce flaky tests. Natural language instructions: Write or trigger tests using plain English prompts. Getting Started in VS Code Prerequisites Node.js Download: nodejs.org Minimum version: v18.0.0 or higher (recommended: latest LTS) Check version: node --version Playwright Install Playwright: npm install @playwright/test Step 1: Create Project Folder mkdir playwrightMCP-demo cd playwrightMCP-demo Step 2: Initialize Project npm init playwright@latest Step 3: Install MCP Server for VS Code Navigate to GitHub - microsoft/playwright-mcp: Playwright MCP server and click install server for VS Code Search for 'MCP: Open user configuration' (type ‘>mcp’ in the search box) You will see a file mcp.json is created in your user -> app data folder, which is having the server details. { "servers": { "playwright": { "command": "npx", "args": [ "@playwright/mcp@latest" ], "type": "stdio" } }, "inputs": [] } Alternatively, install an MCP server directly GitHub MCP server registry using the Extensions view in VS Code. From GitHub MCP server registry Verify installation: Open Copilot Chat → select Agent Mode → click Configure Tools → confirm microsoft/playwright-mcp appears in the list. Step 4: Create a Simple Test Using MCP Once your project and MCP setup are ready in VS Code, you can create a simple test that demonstrates MCP’s capabilities. MCP can help in multiple scenarios, below is the example for Test Generation using AI: Scenario: AI-Assisted Test Generation- Use natural language prompts to generate Playwright tests automatically. Test Scenario - Validate that a user can switch the Playwright documentation language dropdown to Python, search for “Frames,” and navigate to the Frames documentation page. Confirm that the page heading correctly displays “Frames.” Sample Prompt to Use in VS Code (Copilot Agent Mode):Create a Playwright automated test in JavaScript that verifies navigation to the 'Frames' documentation page following below steps and be more specific about locators to avoid strict mode violation error Navigate to : Playwright documentation select “Python” from the dropdown options, labelled “Node.js” Type the keyword “Frames” into the search box. Click the search result for the Frames documentation page Verify that the page header reads “Frames”. 
Log success or provide a failure message with details.
Copilot will generate the test automatically in your tests folder.
Step 5: Run Test
npx playwright test
Conclusion
Integrating Playwright with MCP in VS Code helps you build smarter, adaptive tests without adding complexity. Start small, follow best practices, and scale as you grow.
Note - Installation steps may vary depending on your environment. Refer to MCP Registry · GitHub for the latest instructions.
Real‑Time AI Streaming with Azure OpenAI and SignalR
TL;DR We’ll build a real-time AI app where Azure OpenAI streams responses and SignalR broadcasts them live to an Angular client. Users see answers appear incrementally just like ChatGPT while Azure SignalR Service handles scale. You’ll learn the architecture, streaming code, Angular integration, and optional enhancements like typing indicators and multi-agent scenarios. Why This Matters Modern users expect instant feedback. Waiting for a full AI response feels slow and breaks engagement. Streaming responses: Reduces perceived latency: Users see content as it’s generated. Improves UX: Mimics ChatGPT’s typing effect. Keeps users engaged: Especially for long-form answers. Scales for enterprise: Azure SignalR Service handles thousands of concurrent connections. What you’ll build A SignalR Hub that calls Azure OpenAI with streaming enabled and forwards partial output to clients as it arrives. An Angular client that connects over WebSockets/SSE to the hub and renders partial content with a typing indicator. An optional Azure SignalR Service layer for scalable connection management (thousands to millions of long‑lived connections). References: SignalR hosting & scale; Azure SignalR Service concepts. Architecture The hub calls Azure OpenAI with streaming enabled (await foreach over updates) and broadcasts partials to clients. Azure SignalR Service (optional) offloads connection scale and removes sticky‑session complexity in multi‑node deployments. References: Streaming code pattern; scale/ARR affinity; Azure SignalR integration. Prerequisites Azure OpenAI resource with a deployed model (e.g., gpt-4o or gpt-4o-mini) .NET 8 API + ASP.NET Core SignalR backend Angular 16+ frontend (using microsoft/signalr) Step‑by‑Step Implementation 1) Backend: ASP.NET Core + SignalR Install packages dotnet add package Microsoft.AspNetCore.SignalR dotnet add package Azure.AI.OpenAI --prerelease dotnet add package Azure.Identity dotnet add package Microsoft.Extensions.AI dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease # Optional (managed scale): Azure SignalR Service dotnet add package Microsoft.Azure.SignalR Using DefaultAzureCredential (Entra ID) avoids storing raw keys in code and is the recommended auth model for Azure services. 
Program.cs var builder = WebApplication.CreateBuilder(args); builder.Services.AddSignalR(); // To offload connection management to Azure SignalR Service, uncomment: // builder.Services.AddSignalR().AddAzureSignalR(); builder.Services.AddSingleton<AiStreamingService>(); var app = builder.Build(); app.MapHub<ChatHub>("/chat"); app.Run(); AiStreamingService.cs - streams content from Azure OpenAI using Microsoft.Extensions.AI; using Azure.AI.OpenAI; using Azure.Identity; public class AiStreamingService { private readonly IChatClient _chatClient; public AiStreamingService(IConfiguration config) { var endpoint = new Uri(config["AZURE_OPENAI_ENDPOINT"]!); var deployment = config["AZURE_OPENAI_DEPLOYMENT"]!; // e.g., "gpt-4o-mini" var azureClient = new AzureOpenAIClient(endpoint, new DefaultAzureCredential()); _chatClient = azureClient.GetChatClient(deployment).AsIChatClient(); } public async IAsyncEnumerable<string> StreamReplyAsync(string userMessage) { var messages = new List<ChatMessage> { ChatMessage.CreateSystemMessage("You are a helpful assistant."), ChatMessage.CreateUserMessage(userMessage) }; await foreach (var update in _chatClient.CompleteChatStreamingAsync(messages)) { // Only text parts; ignore tool calls/annotations var chunk = string.Join("", update.Content .Where(p => p.Kind == ChatMessageContentPartKind.Text) .Select(p => ((TextContent)p).Text)); if (!string.IsNullOrEmpty(chunk)) yield return chunk; } } } Modern .NET AI extensions (Microsoft.Extensions.AI) expose a unified streaming pattern via CompleteChatStreamingAsync. ChatHub.cs - pushes partials to the caller using Microsoft.AspNetCore.SignalR; public class ChatHub : Hub { private readonly AiStreamingService _ai; public ChatHub(AiStreamingService ai) => _ai = ai; // Client calls: connection.invoke("AskAi", prompt) public async Task AskAi(string prompt) { var messageId = Guid.NewGuid().ToString("N"); await Clients.Caller.SendAsync("typing", messageId, true); await foreach (var partial in _ai.StreamReplyAsync(prompt)) { await Clients.Caller.SendAsync("partial", messageId, partial); } await Clients.Caller.SendAsync("typing", messageId, false); await Clients.Caller.SendAsync("completed", messageId); } } 2) Frontend: Angular client with microsoft/signalr Install the SignalR client npm i microsoft/signalr Create a SignalR service (Angular) // src/app/services/ai-stream.service.ts import { Injectable } from '@angular/core'; import * as signalR from '@microsoft/signalr'; import { BehaviorSubject, Observable } from 'rxjs'; @Injectable({ providedIn: 'root' }) export class AiStreamService { private connection?: signalR.HubConnection; private typing$ = new BehaviorSubject<boolean>(false); private partial$ = new BehaviorSubject<string>(''); private completed$ = new BehaviorSubject<boolean>(false); get typing(): Observable<boolean> { return this.typing$.asObservable(); } get partial(): Observable<string> { return this.partial$.asObservable(); } get completed(): Observable<boolean> { return this.completed$.asObservable(); } async start(): Promise<void> { this.connection = new signalR.HubConnectionBuilder() .withUrl('/chat') // same origin; use absolute URL if CORS .withAutomaticReconnect() .configureLogging(signalR.LogLevel.Information) .build(); this.connection.on('typing', (_id: string, on: boolean) => this.typing$.next(on)); this.connection.on('partial', (_id: string, text: string) => { // Append incremental content this.partial$.next((this.partial$.value || '') + text); }); this.connection.on('completed', (_id: string) => 
this.completed$.next(true)); await this.connection.start(); } async ask(prompt: string): Promise<void> { // Reset state per request this.partial$.next(''); this.completed$.next(false); await this.connection?.invoke('AskAi', prompt); } } Angular component // src/app/components/ai-chat/ai-chat.component.ts import { Component, OnInit } from '@angular/core'; import { AiStreamService } from '../../services/ai-stream.service'; @Component({ selector: 'app-ai-chat', templateUrl: './ai-chat.component.html', styleUrls: ['./ai-chat.component.css'] }) export class AiChatComponent implements OnInit { prompt = ''; output = ''; typing = false; done = false; constructor(private ai: AiStreamService) {} async ngOnInit() { await this.ai.start(); this.ai.typing.subscribe(on => this.typing = on); this.ai.partial.subscribe(text => this.output = text); this.ai.completed.subscribe(done => this.done = done); } async send() { this.output = ''; this.done = false; await this.ai.ask(this.prompt); } } HTML Template <!-- src/app/components/ai-chat/ai-chat.component.html --> <div class="chat"> <div class="prompt"> <input [(ngModel)]="prompt" placeholder="Ask me anything…" /> <button (click)="send()">Send</button> </div> <div class="response"> <pre>{{ output }}</pre> <div class="typing" *ngIf="typing">Assistant is typing…</div> <div class="done" *ngIf="done">✓ Completed</div> </div> </div> Streaming modes, content filters, and UX Azure OpenAI streaming interacts with content filtering in two ways: Default streaming: The service buffers output into content chunks and runs content filters before each chunk is emitted; you still stream, but not necessarily token‑by‑token. Asynchronous Filter (optional): The service returns token‑level updates immediately and runs filters asynchronously. You get ultra‑smooth streaming but must handle delayed moderation signals (e.g., redaction or halting the stream). Best practices Append partials in small batches client‑side to avoid DOM thrash; finalize formatting on "completed". Log full messages server‑side only after completion to keep histories consistent (mirrors agent frameworks). Security & compliance Auth: Prefer Microsoft Entra ID (DefaultAzureCredential) to avoid key sprawl; use RBAC and Managed Identities where possible. Secrets: Store Azure SignalR connection strings in Key Vault and rotate periodically; never hardcode. CORS & cross‑domain: When hosting frontend and hub on different origins, configure CORS and use absolute URLs in withUrl(...). Connection management & scaling tips Persistent connection load: SignalR consumes TCP resources; separate heavy real‑time workloads or use Azure SignalR to protect other apps. Sticky sessions (self‑hosted): Required in most multi‑server scenarios unless WebSockets‑only + SkipNegotiation applies; Azure SignalR removes this requirement. Learn more AI‑Powered Group Chat sample (ASP.NET Core): Azure OpenAI .NET client (auth & streaming): SignalR JavaScript ClientServerless MCP Agent with LangChain.js v1 — Burgers, Tools, and Traces 🍔
Serverless MCP Agent with LangChain.js v1 — Burgers, Tools, and Traces 🍔

AI agents that can actually do stuff (not just chat) are the fun part nowadays, but wiring them cleanly into real APIs, keeping things observable, and shipping them to the cloud can get... messy. So we built a fresh end‑to‑end sample to show how to do it right with the brand new LangChain.js v1 and Model Context Protocol (MCP). In case you missed it, MCP is a recent open standard that makes it easy for LLM agents to consume tools and APIs, and LangChain.js, a great framework for building GenAI apps and agents, has first-class support for it. You can quickly get up to speed with the MCP for Beginners course and the AI Agents for Beginners course.

This new sample gives you:

A LangChain.js v1 agent that streams its result, along with reasoning + tool steps
An MCP server exposing real tools (burger menu + ordering) from a business API
A web interface with authentication, session history, and a debug panel (for developers)
A production-ready multi-service architecture
Serverless deployment on Azure in one command (azd up)

Yes, it's a burger ordering system. Who doesn't like burgers? Grab your favorite beverage ☕, and let's dive in for a quick tour!

TL;DR key takeaways

New sample: full-stack Node.js AI agent using LangChain.js v1 + MCP tools
Architecture: web app → agent API → MCP server → burger API
Runs locally with a single npm start, deploys with azd up
Uses streaming (NDJSON) with intermediate tool + LLM steps surfaced to the UI
Ready to fork, extend, and plug into your own domain / tools

What will you learn here?

What this sample is about and its high-level architecture
What LangChain.js v1 brings to the table for agents
How to deploy and run the sample
How MCP tools can expose real-world APIs
Reference links for everything we use

GitHub repo
LangChain.js docs
Model Context Protocol
Azure Developer CLI
MCP Inspector

Use case

You want an AI assistant that can take a natural language request like "Order two spicy burgers and show me my pending orders" and:

Understand intent (query the menu, then place an order)
Call the right MCP tools in sequence, which in turn call the necessary APIs
Stream progress (LLM tokens + tool steps)
Return a clean final answer

Swap "burgers" for "inventory", "bookings", "support tickets", or "IoT devices" and you've got a reusable pattern!

Sample overview

Before we play a bit with the sample, let's have a look at the main services implemented here:

Service | Role | Tech
Agent Web App (agent-webapp) | Chat UI + streaming + session history | Azure Static Web Apps, Lit web components
Agent API (agent-api) | LangChain.js v1 agent orchestration + auth + history | Azure Functions, Node.js
Burger MCP Server (burger-mcp) | Exposes the burger API as tools over MCP (Streamable HTTP + SSE) | Azure Functions, Express, MCP SDK
Burger API (burger-api) | Business logic: burgers, toppings, orders lifecycle | Azure Functions, Cosmos DB

Here's a simplified view of how they interact. There are also other supporting components like databases and storage, not shown here for clarity. For this quickstart we'll only interact with the Agent Web App and the Burger MCP Server, as they are the main stars of the show here.

LangChain.js v1 agent features

The recent release of LangChain.js v1 is a huge milestone for the JavaScript AI community! It marks a significant shift from experimental tools to a production-ready framework. The new version doubles down on what's needed to build robust AI applications, with a strong focus on agents. This includes first-class support for streaming not just the final output, but also intermediate steps like tool calls and agent reasoning. This makes building transparent and interactive agent experiences (like the one in this sample) much more straightforward.
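On the wire, the agent API surfaces these intermediate steps to the web app as NDJSON (one JSON object per line). The endpoint path and event shape below are assumptions made for illustration only; check the agent-api and agent-webapp code in the repo for the real contract. A minimal sketch of consuming such a stream in the browser:

// stream-agent.ts (sketch; endpoint and event shape are hypothetical)
type AgentEvent = { type: 'token' | 'tool' | 'final'; data: string };

export async function streamAgent(prompt: string, onEvent: (e: AgentEvent) => void): Promise<void> {
  const response = await fetch('/api/chats/stream', {   // hypothetical endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: prompt })
  });
  if (!response.ok || !response.body) throw new Error(`Stream failed: ${response.status}`);

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffered = '';

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split('\n');
    buffered = lines.pop() ?? '';   // keep any incomplete trailing line for the next chunk
    for (const line of lines) {
      if (line.trim()) onEvent(JSON.parse(line) as AgentEvent);
    }
  }
}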
Quickstart

Requirements

GitHub account
Azure account (free signup, or if you're a student, get free credits here)
Azure Developer CLI

Deploy and run the sample

We'll use GitHub Codespaces for a quick zero-install setup here, but if you prefer to run it locally, check the README. Click on the following link or open it in a new tab to launch a Codespace: Create Codespace. This will open a VS Code environment in your browser with the repo already cloned and all the tools installed and ready to go.

Provision and deploy to Azure

Open a terminal and run these commands:

# Install dependencies
npm install

# Login to Azure
azd auth login

# Provision and deploy all resources
azd up

Follow the prompts to select your Azure subscription and region. If you're unsure which one to pick, choose East US 2. The deployment will take about 15 minutes the first time, as it creates all the necessary resources (Functions, Static Web Apps, Cosmos DB, AI models). If you're curious about what happens under the hood, take a look at the main.bicep file in the infra folder, which defines the infrastructure as code for this sample.

Test the MCP server

While the deployment is running, you can run the MCP server and API locally (even in Codespaces) to see how they work. Open another terminal and run:

npm start

This will start all services locally, including the Burger API and the MCP server, which will be available at http://localhost:3000/mcp. This may take a few seconds; wait until you see this message in the terminal:

🚀 All services ready 🚀

When these services run without Azure resources provisioned, they use in-memory data instead of Cosmos DB, so you can experiment freely with the API and MCP server, though the agent won't be functional as it requires an LLM resource.

MCP tools

The MCP server exposes the following tools, which the agent can use to interact with the burger ordering system:

get_burgers: Get a list of all burgers in the menu
get_burger_by_id: Get a specific burger by its ID
get_toppings: Get a list of all toppings in the menu
get_topping_by_id: Get a specific topping by its ID
get_topping_categories: Get a list of all topping categories
get_orders: Get a list of all orders in the system
get_order_by_id: Get a specific order by its ID
place_order: Place a new order with burgers (requires userId, optional nickname)
delete_order_by_id: Cancel an order if it has not yet been started (status must be pending, requires userId)

You can test these tools using the MCP Inspector. Open another terminal and run:

npx -y @modelcontextprotocol/inspector

Then open the URL printed in the terminal in your browser and connect using these settings:

Transport: Streamable HTTP
URL: http://localhost:3000/mcp
Connection Type: Via Proxy (should be the default)

Click on Connect, then try listing the tools first, and run the get_burgers tool to get the menu info.
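If you'd rather script this than click through the Inspector, the MCP TypeScript SDK also ships a client. The snippet below is only a sketch based on the SDK's client API; module paths, option names, and result shapes may differ between @modelcontextprotocol/sdk versions, so treat it as a starting point rather than the sample's own code:

// mcp-client-demo.ts (sketch; verify imports against your @modelcontextprotocol/sdk version)
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

async function main() {
  // Connect to the locally running burger MCP server over Streamable HTTP
  const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3000/mcp'));
  const client = new Client({ name: 'burger-demo-client', version: '1.0.0' });
  await client.connect(transport);

  // List the tools exposed by the server
  const { tools } = await client.listTools();
  console.log(tools.map(t => t.name));

  // Call one of them, e.g. get_burgers (no arguments required)
  const menu = await client.callTool({ name: 'get_burgers', arguments: {} });
  console.log(JSON.stringify(menu.content, null, 2));

  await client.close();
}

main().catch(console.error);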
You'll first be greeted by an authentication page, you can sign in either with your GitHub or Microsoft account and then you should be able to access the chat interface. From there, you can start asking any question or use one of the suggested prompts, for example try asking: Recommend me an extra spicy burger . As the agent processes your request, you'll see the response streaming in real-time, along with the intermediate steps and tool calls. Once the response is complete, you can also unfold the debug panel to see the full reasoning chain and the tools that were invoked: Tip: Our agent service also sends detailed tracing data using OpenTelemetry. You can explore these either in Azure Monitor for the deployed service, or locally using an OpenTelemetry collector. We'll cover this in more detail in a future post. Wrap it up Congratulations, you just finished spinning up a full-stack serverless AI agent using LangChain.js v1, MCP tools, and Azure’s serverless platform. Now it's your turn to dive in the code and extend it for your use cases! 😎 And don't forget to azd down once you're done to avoid any unwanted costs. Going further This was just a quick introduction to this sample, and you can expect more in-depth posts and tutorials soon. Since we're in the era of AI agents, we've also made sure that this sample can be explored and extended easily with code agents like GitHub Copilot. We even built a custom chat mode to help you discover and understand the codebase faster! Check out the Copilot setup guide in the repo to get started. You can quickly get up speed with the MCP for Beginners course and AI Agents for Beginners course. If you like this sample, don't forget to star the repo ⭐️! You can also join us in the Azure AI community Discord to chat and ask any questions. Happy coding and burger ordering! 🍔AI Career Navigator — Empowering Job Seekers with Azure OpenAI
AI Career Navigator — Empowering Job Seekers with Azure OpenAI

AI Career Navigator is more than just a project — it's a mission to make career growth accessible, intelligent, and human. Powered by Azure OpenAI, it transforms uncertainty into direction and effort into achievement.

Author: Aryan Jaiswal — Gold Microsoft Learn Student Ambassador
Reviewer: Julia Muiruri (Microsoft)
Intl.DateTimeFormat missing Welsh (cy / cy-GB) support

Hi! I'm a developer at a company that is currently working with the Welsh government, and I have come across an issue with Microsoft Edge. When using the Intl.DateTimeFormat JavaScript API in Microsoft Edge, there is no support for cy / cy-GB.

Intl.DateTimeFormat.supportedLocalesOf("cy-GB"); // Outputs []
Intl.DateTimeFormat.supportedLocalesOf("cy"); // Outputs []

If a developer tries to do any date formatting with the locale set to cy or cy-GB, only English will be returned. Out of curiosity I tried this with and without the Windows language being set to Welsh / Cymraeg. The result was the same (no support). Mozilla Firefox and Apple Safari have full support. Interestingly, Google Chrome does not (perhaps a shared issue / limitation of Chromium).

Unfortunately this causes a significant challenge. Any website / web app developed for the Welsh government or related bodies is legally required to be available in Welsh. Due to the popularity of Microsoft Edge, particularly with governments and institutions, developers cannot reliably use the Intl-based APIs. The only known workarounds are:

To use a formatting library instead of Intl (see the sketch below). This avoids the ECMAScript APIs designed to solve these problems for developers and adds client-side and developer maintenance costs.
To include very bulky polyfills. A popular polyfill with locale and timezone information exceeds 2 MB of data for the client to load per website / web app.

I'm unsure if this is a bug or a locale which is yet to be supported. Adding support for Welsh would be a huge step forward and greatly appreciated by developers working with government bodies. Even better would be adding support for the same languages Windows supports!
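For reference, here is a minimal feature-detection sketch of the first workaround. The fallback below (hand-rolled Welsh month names) is purely illustrative; a proper i18n library, polyfill, or server-side formatting would be the real fix:

// cy-date.ts (illustrative sketch only)
const welshSupported = Intl.DateTimeFormat.supportedLocalesOf('cy-GB').length > 0;

export function formatDateWelsh(date: Date): string {
  if (welshSupported) {
    // Engines with Welsh locale data (e.g. Firefox, Safari) take this path
    return new Intl.DateTimeFormat('cy-GB', { dateStyle: 'long' }).format(date);
  }
  // Hypothetical fallback for engines without cy / cy-GB data (Edge/Chromium today)
  const months = ['Ionawr', 'Chwefror', 'Mawrth', 'Ebrill', 'Mai', 'Mehefin',
    'Gorffennaf', 'Awst', 'Medi', 'Hydref', 'Tachwedd', 'Rhagfyr'];
  return `${date.getDate()} ${months[date.getMonth()]} ${date.getFullYear()}`;
}

console.log(formatDateWelsh(new Date()));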
Web Development for Beginners: Learners Matchmaking!

Hey Everyone, thanks for joining the Web Development for Beginners course! Learning and building together is the goal of this course. Post the information below and I will make sure to connect you with similar students so you can build together during these 8 weeks!

Name
Experience Level
Location / Timezone
Where can we find you? LinkedIn / Twitter / Telegram / etc

Example:
Korey Stegared-Pace
I love JavaScript
Europe/Sweden
Twitter - koreyspace / LinkedIn: Korey Stegared-Pace
What is GitHub Codespaces and how can Students access it for free?

GitHub Codespaces is a new service that anyone can use for free to develop in powerful environments with Visual Studio Code. In this post, we'll cover how you can make use of this new technology and take advantage of its most powerful features.