ai foundry
61 TopicsOn-Premises Manufacturing Intelligence
Manufacturing facilities face a fundamental dilemma in the AI era: how to harness artificial intelligence for predictive maintenance, equipment diagnostics, and operational insights while keeping sensitive production data entirely on-premises. Industrial environments generate proprietary information, CNC machining parameters, quality control thresholds, equipment performance signatures, maintenance histories, that represents competitive advantage accumulated over decades of process optimization. Sending this data to cloud APIs risks intellectual property exposure, regulatory non-compliance, and operational dependencies that manufacturing operations cannot accept. Traditional cloud-based AI introduces unacceptable vulnerabilities. Network latency of 100-500ms makes real-time decision support impossible for time-sensitive manufacturing processes. Internet dependency creates single points of failure in environments where connectivity is unreliable or deliberately restricted for security. API pricing models become prohibitively expensive when analyzing thousands of sensor readings and maintenance logs continuously. Most critically, data residency requirements for aerospace, defense, pharmaceutical, and automotive industries make cloud AI architectures non-compliant by design ITAR, FDA 21 CFR Part 11, and customer-specific mandates require data never leaves facility boundaries. This article demonstrates a sample solution for manufacturing asset intelligence that runs entirely on-premises using Microsoft Foundry Local, Node.js, and JavaScript. The FoundryLocal-IndJSsample repository provides production-ready implementation with Express backend, HTML/JavaScript frontend, and comprehensive Foundry Local SDK integration. Facilities can deploy sophisticated AI-powered monitoring without external dependencies, cloud costs, data exposure risks, or network requirements. Every inference happens locally on facility hardware with predictable performance and zero data egress. Why On-Premises AI Matters for Industrial Operations The case for local AI inference in manufacturing extends beyond simple preference, it addresses fundamental operational, security, and compliance requirements that cloud solutions cannot satisfy. Understanding these constraints shapes architectural decisions that prioritize reliability, data sovereignty, and cost predictability. Data Sovereignty and Intellectual Property Protection Manufacturing processes represent years of proprietary research, optimization, and competitive advantage. Equipment configurations, cycle times, quality thresholds, and maintenance schedules contain intelligence that competitors would value highly. Sending this data to third-party cloud services, even with contractual protections, introduces risks that manufacturing operations cannot accept. On-premises AI ensures that production data never leaves the facility network perimeter. Telemetry from CNC machines, hydraulic systems, conveyor networks, and control systems remains within air-gapped environments where physical access controls and network isolation provide demonstrable data protection. This architectural guarantee of data locality satisfies both internal security policies and external audit requirements without relying on contractual assurances or encryption alone. Operational Resilience and Network Independence Factory floors frequently operate in environments with limited, unreliable, or intentionally restricted internet connectivity. Remote facilities, secure manufacturing zones, and legacy industrial networks cannot depend on continuous cloud access for critical monitoring functions. When network failures occur, whether from ISP outages, DDoS attacks, or infrastructure damage, AI capabilities must continue operating to prevent production losses. Local inference provides true operational independence. Equipment health monitoring, anomaly detection, and maintenance prioritization continue functioning during network disruptions. This resilience is essential for 24/7 manufacturing operations where downtime costs can exceed tens of thousands of dollars per hour. By eliminating external dependencies, on-premises AI becomes as reliable as the local power supply and computing infrastructure. Latency Requirements for Real-Time Decision Making Manufacturing processes involve precise timing where milliseconds determine quality outcomes. Automated inspection systems must classify defects before products leave the production line. Safety interlocks must respond to hazardous conditions before injuries occur. Predictive maintenance alerts must trigger before catastrophic equipment failures cascade through production lines. Cloud-based AI introduces latency that incompatible with these requirements. Network round-trips to cloud endpoints typically require 100-500 milliseconds, in some case latency is unacceptable for real-time applications. Local inference with Foundry Local delivers sub-50ms response times by eliminating network hops, enabling true real-time AI integration with SCADA systems, PLCs, and manufacturing execution systems. Cost Predictability at Industrial Scale Manufacturing facilities generate enormous volumes of time-series data from thousands of sensors, producing millions of data points daily. Cloud AI services charge per API call or per token processed, creating unpredictable costs that scale linearly with data volume. High-throughput industrial applications can quickly accumulate tens of thousands of dollars in monthly API fees. On-premises AI transforms this variable operational expense into fixed capital infrastructure costs. After initial hardware investment, inference costs remain constant regardless of query volume. For facilities analyzing equipment telemetry, maintenance logs, and operator notes continuously, this economic model provides cost certainty and eliminates budget surprises. Regulatory Compliance and Audit Requirements Regulated industries face strict data handling requirements. Aerospace manufacturers must comply with ITAR controls on technical data. Pharmaceutical facilities must satisfy FDA 21 CFR Part 11 requirements for electronic records. Automotive suppliers must meet customer-specific data residency mandates. Cloud AI services complicate compliance by introducing third-party data processors, cross-border data transfers, and shared infrastructure concerns. Local AI simplifies regulatory compliance by eliminating external data flows. Audit trails remain within the facility. Data handling procedures avoid third-party agreements. Compliance demonstrations become straightforward when AI infrastructure resides entirely within auditable physical and network boundaries. Architecture: Manufacturing Intelligence Without Cloud Dependencies The manufacturing asset intelligence system demonstrates a practical architecture for deploying AI capabilities entirely on-premises. The design prioritizes operational reliability, straightforward integration patterns, and maintainable code structure that facilities can adapt to their specific requirements. System Components and Technology Stack The implementation consists of three primary layers that separate concerns and enable independent scaling: Foundry Local Layer: Provides the local AI inference runtime. Foundry Local manages model loading, execution, and resource allocation. It supports multiple model families (Phi-3.5, Phi-4, Qwen2.5) with automatic hardware acceleration detection for NVIDIA GPUs (CUDA), Intel GPUs (OpenVINO), ARM Qualcomm (QNN) and optimized CPU inference. The service exposes a REST API on localhost that the backend layer consumes for completions. Backend Service Layer: An Express Node.js application that serves as the integration point between the AI runtime and the manufacturing data systems. This layer implements business logic for equipment monitoring, maintenance log classification, and conversational interfaces. It formats prompts with equipment context, calls Foundry Local for inference, and structures responses for the frontend. The backend persists chat history and provides RESTful endpoints for all AI operations. Frontend Interface Layer: A standalone HTML/JavaScript application that provides operator interfaces for equipment monitoring, maintenance management, and AI assistant interactions. The UI fetches data from the backend service and renders dashboards, equipment status views, and chat interfaces. No framework dependencies or build steps are required, the frontend operates as static files that any web server or file system can serve. Data Flow for Equipment Analysis Understanding how data moves through the system clarifies integration points and extension opportunities. When an operator requests AI analysis of equipment status, the following sequence occurs: The frontend collects equipment context including asset ID, current telemetry values, alert status, and recent maintenance history. It constructs an HTTP request to the backend's equipment summary endpoint, passing this context as query parameters or request body. The backend retrieves additional context from the equipment database, including specifications, normal operating ranges, and historical performance patterns. The backend constructs a detailed prompt that provides the AI model with comprehensive context: equipment specifications, current telemetry with alarming conditions highlighted, recent maintenance notes, and specific questions about operational status. This prompt engineering is critical, the model's accuracy depends entirely on the context provided. Generic prompts produce generic responses; detailed, structured prompts yield actionable insights. The backend calls Foundry Local's completion API with the formatted prompt, specifying temperature, max tokens, and other generation parameters. Foundry Local loads the configured model (if not already in memory) and generates a response analyzing the equipment's condition. The inference occurs locally with no network traffic leaving the facility. Response time typically ranges from 500ms to 3 seconds depending on prompt complexity and model size. Foundry Local returns the generated text to the backend, which parses the response for structured information if required (equipment health classifications, priority levels, recommended actions). The backend formats this analysis as JSON and returns it to the frontend. The frontend renders the AI-generated summary in the equipment health dashboard, highlighting critical findings and recommended operator actions. Prompt Engineering for Maintenance Log Classification The maintenance log classification feature demonstrates effective prompt engineering for extracting structured decisions from language models. Manufacturing facilities accumulate thousands of maintenance notes, operator observations, technician reports, and automated system logs. Automatically classifying these entries by severity enables priority-based work scheduling without manual review of every log entry. The classification prompt provides the model with clear instructions, classification categories with definitions, and the maintenance note text to analyze: const classificationPrompt = `You are a manufacturing maintenance expert analyzing equipment log entries. Classify the following maintenance note into one of these categories: CRITICAL: Immediate safety hazard, equipment failure, or production stoppage HIGH: Degraded performance, abnormal readings requiring same-shift attention MEDIUM: Scheduled maintenance items or routine inspections LOW: Informational notes, normal operations logs Provide your response in JSON format: { "classification": "CRITICAL|HIGH|MEDIUM|LOW", "reasoning": "Brief explanation of classification decision", "recommended_action": "Specific next steps for maintenance team" } Maintenance Note: ${maintenanceNote} Classification:`; const response = await foundryClient.chat.completions.create({ model: currentModelAlias, messages: [{ role: 'user', content: classificationPrompt }], temperature: 0.1, // Low temperature for consistent classification max_tokens: 300 }); Key aspects of this prompt design: Role definition: Establishing the model as a "manufacturing maintenance expert" activates relevant knowledge and reasoning patterns in the model's training data. Clear categories: Explicit classification options with definitions prevent ambiguous outputs and enable consistent decision-making across thousands of logs. Structured output format: Requesting JSON responses with specific fields enables automated parsing and integration with maintenance management systems without fragile text parsing. Temperature control: Setting temperature to 0.1 reduces randomness in classifications, ensuring consistent severity assessments for similar maintenance conditions. Context isolation: Separating the maintenance note text from the instructions with clear delimiters prevents prompt injection attacks where malicious log entries might attempt to manipulate classification logic. This classification runs locally for every maintenance log entry without API costs or network delays. Facilities processing hundreds of maintenance notes daily benefit from immediate, consistent classification that routes critical issues to technicians automatically while filtering routine informational logs. Model Selection and Performance Trade-offs Foundry Local supports multiple model families with different memory requirements, inference speeds, and accuracy characteristics. Choosing appropriate models for manufacturing environments requires balancing these trade-offs against hardware constraints and operational requirements: Qwen2.5-0.5b (500MB memory): The smallest available model provides extremely fast inference (100-200ms responses) on limited hardware. Suitable for simple classification tasks, keyword extraction, and high-throughput scenarios where response speed matters more than nuanced understanding. Works well on older servers or edge devices with constrained resources. Phi-3.5-mini (2.1GB memory): The recommended default model balances accuracy with reasonable memory requirements. Provides strong reasoning capabilities for equipment analysis, maintenance prioritization, and conversational assistance. Response times of 1-3 seconds on modern CPUs are acceptable for interactive dashboards. This model handles complex prompts with detailed equipment context effectively. Phi-4-mini (3.6GB memory): Increased model capacity improves understanding of technical terminology and complex equipment relationships. Best choice when analyzing detailed maintenance histories, interpreting sensor correlation patterns, or providing nuanced operational recommendations. Requires more memory but delivers noticeably improved analysis quality for complex scenarios. Qwen2.5-7b (4.7GB memory): The largest supported model provides maximum accuracy and sophisticated reasoning. Ideal for facilities with modern server hardware where best-possible analysis quality justifies longer inference times (3-5 seconds). Consider this model for critical applications where operator decisions depend heavily on AI recommendations. Facilities can download all models during initial setup and switch between them based on specific use cases. Use faster models for real-time dashboard updates and automated classification. Deploy larger models for detailed equipment analysis and maintenance planning where operators can wait several seconds for comprehensive insights. Implementation: Equipment Monitoring and AI Analysis The practical implementation reveals how straightforward on-premises AI integration can be with modern JavaScript tooling and proper architectural separation. The backend service manages all AI interactions, shielding the frontend from inference complexity and providing clean REST interfaces. Backend Service Architecture with Express The Node.js backend initializes the Foundry Local SDK client and exposes endpoints for equipment operations: const express = require('express'); const { FoundryLocalClient } = require('foundry-local-sdk'); const cors = require('cors'); const app = express(); const PORT = process.env.PORT || 3000; // Initialize Foundry Local client const foundryClient = new FoundryLocalClient({ baseURL: 'http://localhost:8008', // Default Foundry Local endpoint timeout: 30000 }); // Middleware configuration app.use(cors()); // Enable cross-origin requests from frontend app.use(express.json()); // Parse JSON request bodies // Health check endpoint for monitoring app.get('/api/health', (req, res) => { res.json({ ok: true, service: 'manufacturing-ai-backend' }); }); // Start server app.listen(PORT, () => { console.log(`Manufacturing AI backend running on port ${PORT}`); console.log(`Foundry Local endpoint: http://localhost:8008`); }); This foundational structure establishes the Express application with CORS support for browser-based frontends and JSON request handling. The Foundry Local client connects to the local inference service running on port 8008, no external network configuration required. Equipment Summary Generation with Context-Rich Prompts The equipment summary endpoint demonstrates effective context injection for accurate AI analysis: app.get('/api/assets/:id/summary', async (req, res) => { try { const assetId = req.params.id; const asset = equipmentDatabase.find(a => a.id === assetId); if (!asset) { return res.status(404).json({ error: 'Asset not found' }); } // Construct detailed equipment context const contextPrompt = buildEquipmentContext(asset); // Generate AI analysis const completion = await foundryClient.chat.completions.create({ model: 'phi-3.5-mini', messages: [{ role: 'user', content: contextPrompt }], temperature: 0.3, max_tokens: 500 }); const analysis = completion.choices[0].message.content; res.json({ assetId: asset.id, assetName: asset.name, analysis: analysis, generatedAt: new Date().toISOString() }); } catch (error) { console.error('Equipment summary error:', error); res.status(500).json({ error: 'AI analysis failed', details: error.message }); } }); The equipment context builder assembles comprehensive information for accurate analysis: function buildEquipmentContext(asset) { const alerts = asset.alerts.filter(a => a.severity !== 'INFO'); const telemetry = asset.currentTelemetry; return `Analyze the following manufacturing equipment status: Equipment: ${asset.name} (${asset.id}) Type: ${asset.type} Location: ${asset.location} Current Telemetry: - Temperature: ${telemetry.temperature}°C (Normal range: ${asset.specs.tempRange}) - Vibration: ${telemetry.vibration} mm/s (Threshold: ${asset.specs.vibrationThreshold}) - Pressure: ${telemetry.pressure} PSI (Normal: ${asset.specs.pressureRange}) - Runtime: ${telemetry.runHours} hours (Next maintenance due: ${asset.nextMaintenance}) Active Alerts: ${alerts.map(a => `- ${a.severity}: ${a.message}`).join('\n')} Recent Maintenance History: ${asset.recentMaintenance.slice(0, 3).map(m => `- ${m.date}: ${m.description}`).join('\n')} Provide a concise operational summary focusing on: 1. Current equipment health status 2. Any concerning trends or anomalies 3. Recommended operator actions if applicable 4. Maintenance priority level Summary:`; } This context-rich approach produces accurate, actionable analysis because the model receives equipment specifications, current telemetry with context, alert history, maintenance patterns, and structured output guidance. The model can identify abnormal conditions accurately rather than guessing what values seem unusual. Conversational AI Assistant with Manufacturing Context The chat endpoint enables natural language queries about equipment status and operational questions: app.post('/api/chat', async (req, res) => { try { const { message, conversationId } = req.body; // Retrieve conversation history for context const history = conversationStore.get(conversationId) || []; // Build plant-wide context for the query const plantContext = buildPlantOperationsContext(); // Construct system message with domain knowledge const systemMessage = { role: 'system', content: `You are an AI assistant for a manufacturing facility's operations team. You have access to real-time equipment data and maintenance records. Current Plant Status: ${plantContext} Provide specific, actionable responses based on actual equipment data. If you don't have information to answer a query, clearly state that. Never speculate about equipment conditions beyond available data.` }; // Include conversation history for multi-turn context const messages = [ systemMessage, ...history, { role: 'user', content: message } ]; const completion = await foundryClient.chat.completions.create({ model: 'phi-3.5-mini', messages: messages, temperature: 0.4, max_tokens: 600 }); const assistantResponse = completion.choices[0].message.content; // Update conversation history history.push( { role: 'user', content: message }, { role: 'assistant', content: assistantResponse } ); conversationStore.set(conversationId, history); res.json({ response: assistantResponse, conversationId: conversationId, timestamp: new Date().toISOString() }); } catch (error) { console.error('Chat error:', error); res.status(500).json({ error: 'Chat request failed', details: error.message }); } }); The conversational interface enables operators to ask natural language questions and receive grounded responses based on actual equipment data, citing specific asset IDs, current metric values, and alert statuses rather than speculating. Deployment and Production Operations Deploying on-premises AI in industrial settings requires consideration of hardware placement, network architecture, integration patterns, and operational procedures that differ from typical web application deployments. Hardware and Infrastructure Requirements The system runs on standard server hardware without specialized AI accelerators, though GPU availability improves performance significantly. Minimum requirements include 8GB RAM for the Phi-3.5-mini model, 4-core CPU, and 50GB storage for model files and application data. Production deployments benefit from 16GB+ RAM to support larger models and concurrent analysis requests. For facilities with NVIDIA GPUs, Foundry Local automatically utilizes CUDA acceleration, reducing inference times by 3-5x compared to CPU-only execution. Deploy the backend service on dedicated server hardware within the factory network. Avoid running AI workloads on the same systems that host critical SCADA or MES applications due to resource contention concerns. Network Architecture and SCADA Integration The AI backend should reside on the manufacturing operations network with firewall rules permitting connections from operator workstations and monitoring systems. Do not expose the backend service directly to the internet, all access should occur through the facility's internal network with authentication via existing directory services. Integrate with SCADA systems through standard industrial protocols. Configure OPC-UA clients to subscribe to equipment telemetry topics and forward readings to the AI backend via REST API calls. Modbus TCP gateways can bridge legacy PLCs to modern APIs by polling register values and POSTing updates to the backend's telemetry ingestion endpoints. Security and Compliance Considerations Many manufacturing facilities operate air-gapped networks where physical separation prevents internet connectivity entirely. Deploy Foundry Local and the AI application in these environments by transferring model files and application packages via removable media during controlled maintenance windows. Implement role-based access control (RBAC) using Active Directory integration. Configure the backend to validate user credentials against LDAP before serving AI analysis requests. Maintain detailed audit logs of all AI invocations including user identity, timestamp, equipment queried, and model version used. Store these logs in immutable append-only databases for compliance audits. Key Takeaways Building production-ready AI systems for industrial environments requires architectural decisions that prioritize operational reliability, data sovereignty, and integration simplicity: Data locality by architectural design: On-premises AI ensures proprietary production data never leaves facility networks through fundamental architectural guarantees rather than configuration options Model selection impacts deployment feasibility: Smaller models (0.5B-2B parameters) enable deployment on commodity hardware without specialized accelerators while maintaining acceptable accuracy Fallback logic preserves operational continuity: AI capabilities enhance but don't replace core monitoring functions, ensuring equipment dashboards display raw telemetry even when AI analysis is unavailable Context-rich prompts determine accuracy: Effective prompts include equipment specifications, normal operating ranges, alert thresholds, and maintenance history to enable grounded recommendations Structured outputs enable automation: JSON response formats allow automated systems to parse classifications and route work orders without fragile text parsing Integration patterns bridge legacy systems: OPC-UA and Modbus TCP gateways connect decades-old PLCs and SCADA systems to modern AI without replacing functional control infrastructure Resources and Further Exploration The complete implementation with extensive comments and documentation is available in the GitHub repository. Additional resources help facilities customize and extend the system for their specific requirements. FoundryLocal-IndJSsample GitHub Repository – Full source code with JavaScript backend, HTML frontend, and sample data files Quick Start Guide and Documentation – Installation instructions, API documentation, and troubleshooting guidance Microsoft Foundry Local Documentation – Official SDK reference, model catalog, and deployment guidance Sample Manufacturing Data – Example equipment telemetry, maintenance logs, and alert structures Backend Implementation Reference – Express server code with Foundry Local SDK integration patterns OPC Foundation – Industrial communication standards for SCADA and PLC integration Edge AI for Beginners - Online FREE course and resources for learning more about using AI on Edge Devices Why On-Premises AI Cloud AI services offer convenience, but they fundamentally conflict with manufacturing operational requirements. Understanding these conflicts explains why local AI isn't just preferable, it's mandatory for production environments. Data privacy and intellectual property protection stand paramount. A CNC machining program represents years of optimization, feed rates, tool paths, thermal compensation algorithms. Quality control measurements reveal product specifications competitors would pay millions to access. Sending this data to external APIs, even with encryption, creates unacceptable exposure risk. Every API call generates logs on third-party servers, potentially subject to subpoenas, data breaches, or regulatory compliance failures. Latency requirements eliminate cloud viability for real-time decisions. When a thermal sensor detects bearing temperature exceeding safe thresholds, the control system needs AI analysis in under 50 milliseconds to prevent catastrophic failure. Cloud APIs introduce 100-500ms baseline latency from network round-trips alone, before queue times and processing. For safety systems, quality inspection, and process control, this latency is operationally unacceptable. Network dependency creates operational fragility. Factory floors frequently have limited connectivity, legacy equipment, RF interference, isolated production cells. Critical AI capabilities cannot fail because internet service drops. Moreover, many defense, aerospace, and pharmaceutical facilities operate air-gapped networks for security compliance. Cloud AI is simply non-operational in these environments. Regulatory requirements mandate data residency. ITAR (International Traffic in Arms Regulations) prohibits certain manufacturing data from leaving approved facilities. FDA 21 CFR Part 11 requires strict data handling controls for pharmaceutical manufacturing. GDPR demands data residency in approved jurisdictions. On-premises AI simplifies compliance by eliminating cross-border data transfers. Cost predictability at scale favors local deployment. A high-volume facility generating 10,000 equipment events per day, each requiring AI analysis, would incur significant cloud API costs. Local models have fixed infrastructure costs that scale economically with usage, making AI economically viable for continuous monitoring. Application Architecture: Web UI + Local AI Backend The FoundryLocal-IndJSsample implements a clean separation between data presentation and AI inference. This architecture ensures the UI remains responsive while AI operations run independently, enabling real-time dashboard updates without blocking user interactions. The web frontend serves a single-page application with vanilla HTML, CSS, and JavaScript, no frameworks, no build tools. This simplicity is intentional: factory IT teams need to audit code, customize interfaces, and deploy on legacy systems. The UI presents four main interfaces: Plant Asset Overview (real-time health cards for all equipment), Asset Health (AI-generated summaries and trend analysis), Maintenance Logs (classification and priority routing), and AI Assistant (natural language interface for operations queries). The Node.js backend runs Express as the HTTP server, handling static file serving, API routing, and WebSocket connections for real-time updates. It loads sample manufacturing data from JSON files, equipment telemetry, maintenance logs, historical events, simulating the data streams that would come from SCADA systems, PLCs, and MES platforms in production. Foundry Local provides the AI inference layer. The backend uses foundry-local-sdk to communicate with the locally running service. All model loading, prompt processing, and response generation happens on-device. The application detects Foundry Local automatically and falls back to rule-based analysis if unavailable, ensuring core functionality persists even when AI is offline. Here's the architectural flow for asset health analysis: User Request (Web UI) ↓ Express API Route (/api/assets/:id/summary) ↓ Load Equipment Data (from JSON/database) ↓ Build Analysis Prompt (Equipment ID, telemetry, alerts) ↓ Foundry Local SDK Call (local AI inference) ↓ Parse AI Response (structured insights) ↓ Return JSON Result (with metadata: model, latency, confidence) ↓ Display in UI (formatted health summary) This architecture demonstrates several industrial system design principles: Offline-first operation: Core functionality works without internet connectivity, with AI as an enhancement rather than dependency Graceful degradation: If AI fails, fall back to rule-based logic rather than crashing operations Minimal external dependencies: Simple stack reduces attack surface and simplifies air-gapped deployment Data locality: All processing happens on-premises, no external API calls Real-time updates: WebSocket connections enable push-based event streaming for dashboard updates Setting Up Foundry Local for Industrial Applications Industrial deployments require careful model selection that balances accuracy, speed, and hardware constraints. Factory edge devices often run on limited hardware—industrial PCs with modest GPUs or CPU-only configurations. Model choice significantly impacts deployment feasibility. Install Foundry Local on the industrial edge device: # Windows (most common for industrial PCs) winget install Microsoft.FoundryLocal # Verify installation foundry --version For manufacturing asset intelligence, model selection trades off speed versus quality: # Fast option: Qwen 0.5B (500MB, <100ms inference) foundry model load qwen2.5-0.5b # Balanced option: Phi-3.5 Mini (2.1GB, ~200ms inference) foundry model load phi-3.5-mini # High quality option: Phi-4 Mini (3.6GB, ~500ms inference) foundry model load phi-4 # Check which model is currently loaded foundry model list For real-time monitoring dashboards where hundreds of assets update continuously, qwen2.5-0.5b provides sufficient quality at speeds that don't bottleneck refresh cycles. For detailed root cause analysis or maintenance report generation where quality matters most, phi-4 justifies the slightly longer inference time. Industrial systems benefit from proactive model caching during downtime: # During maintenance windows, pre-download models foundry model download phi-3.5-mini foundry model download qwen2.5-0.5b # Models cache locally, eliminating runtime downloads The backend automatically detects Foundry Local and selects the loaded model: // backend/services/foundry-service.js import { FoundryLocalClient } from 'foundry-local-sdk'; class FoundryService { constructor() { this.client = null; this.modelAlias = null; this.initializeClient(); } async initializeClient() { try { // Detect Foundry Local endpoint const endpoint = process.env.FOUNDRY_LOCAL_ENDPOINT || 'http://127.0.0.1:5272'; this.client = new FoundryLocalClient({ endpoint }); // Query which model is currently loaded const models = await this.client.models.list(); this.modelAlias = models.data[0]?.id || 'phi-3.5-mini'; console.log(`✅ Foundry Local connected: ${this.modelAlias}`); } catch (error) { console.warn('⚠️ Foundry Local not available, using rule-based fallback'); this.client = null; } } async generateCompletion(prompt, options = {}) { if (!this.client) { // Fallback to rule-based analysis return this.ruleBasedAnalysis(prompt); } try { const startTime = Date.now(); const completion = await this.client.chat.completions.create({ model: this.modelAlias, messages: [ { role: 'system', content: 'You are an industrial asset intelligence assistant analyzing manufacturing equipment.' }, { role: 'user', content: prompt } ], temperature: 0.3, // Low temperature for factual analysis max_tokens: 400, ...options }); const latency = Date.now() - startTime; return { content: completion.choices[0].message.content, model: this.modelAlias, latency_ms: latency, tokens: completion.usage?.total_tokens }; } catch (error) { console.error('Foundry inference error:', error); return this.ruleBasedAnalysis(prompt); } } ruleBasedAnalysis(prompt) { // Fallback logic for when AI is unavailable // Pattern matching and heuristics return { content: '(Rule-based analysis) Equipment status: Monitoring...', model: 'rule-based-fallback', latency_ms: 5, tokens: 0 }; } } export default new FoundryService(); This service layer demonstrates critical production patterns: Automatic endpoint detection: Tries environment variable first, falls back to default Model auto-discovery: Queries Foundry Local for currently loaded model rather than hardcoding Robust error handling: Every API call wrapped in try-catch with fallback logic Performance tracking: Latency measurement enables monitoring and capacity planning Conservative temperature: 0.3 temperature reduces hallucination for factual equipment analysis Implementing AI-Powered Asset Health Analysis Equipment health monitoring forms the core use case, synthesizing telemetry from multiple sources into actionable insights. Traditional monitoring systems show raw metrics (temperature, vibration, pressure) but require expert interpretation. AI transforms this into natural language summaries that any operator can understand and act upon. Here's the API endpoint that generates asset health summaries: // backend/routes/assets.js import express from 'express'; import foundryService from '../services/foundry-service.js'; import { getAssetData } from '../data/asset-loader.js'; const router = express.Router(); router.get('/api/assets/:id/summary', async (req, res) => { try { const assetId = req.params.id; // Load equipment data const asset = await getAssetData(assetId); if (!asset) { return res.status(404).json({ error: 'Asset not found' }); } // Build analysis prompt with context const prompt = buildHealthAnalysisPrompt(asset); // Generate AI summary const analysis = await foundryService.generateCompletion(prompt); // Structure response res.json({ asset_id: assetId, asset_name: asset.name, summary: analysis.content, model_used: analysis.model, latency_ms: analysis.latency_ms, timestamp: new Date().toISOString(), telemetry_snapshot: { temperature: asset.telemetry.temperature, vibration: asset.telemetry.vibration, runtime_hours: asset.telemetry.runtime_hours }, active_alerts: asset.alerts.filter(a => a.active).length }); } catch (error) { console.error('Asset summary error:', error); res.status(500).json({ error: 'Analysis failed' }); } }); function buildHealthAnalysisPrompt(asset) { return ` Analyze the health of this manufacturing equipment and provide a concise summary: Equipment: ${asset.name} (${asset.id}) Type: ${asset.type} Location: ${asset.location} Current Telemetry: - Temperature: ${asset.telemetry.temperature}°C (Normal: ${asset.specs.normal_temp_range}) - Vibration: ${asset.telemetry.vibration} mm/s (Threshold: ${asset.specs.vibration_threshold}) - Operating Pressure: ${asset.telemetry.pressure} PSI - Runtime: ${asset.telemetry.runtime_hours} hours - Last Maintenance: ${asset.maintenance.last_service_date} Active Alerts: ${asset.alerts.map(a => `- ${a.severity}: ${a.message}`).join('\n')} Recent Events: ${asset.recent_events.slice(0, 3).map(e => `- ${e.timestamp}: ${e.description}`).join('\n')} Provide a 3-4 sentence summary covering: 1. Overall equipment health status 2. Any concerning trends or anomalies 3. Recommended actions or monitoring focus Be factual and specific. Do not speculate beyond the provided data. `.trim(); } export default router; This prompt construction demonstrates several best practices for industrial AI: Structured data presentation: Organize telemetry, specs, and alerts in clear sections with labels Context enrichment: Include normal operating ranges so AI can assess abnormality Explicit constraints: Instruction to avoid speculation reduces hallucination risk Output formatting guidance: Request specific structure (3-4 sentences, covering key points) Temporal context: Include recent events so AI understands trend direction Example AI-generated asset summary: { "asset_id": "CNC-L2-M03", "asset_name": "CNC Mill #3", "summary": "Equipment is operating outside normal parameters with elevated temperature at 92°C, significantly above the 75-80°C normal range. Thermal Alert indicates possible coolant flow issue. Vibration levels remain acceptable at 2.8 mm/s. Recommend immediate inspection of coolant system and thermal throttling may impact throughput until resolved.", "model_used": "phi-3.5-mini", "latency_ms": 243, "timestamp": "2026-01-30T14:23:18Z", "telemetry_snapshot": { "temperature": 92, "vibration": 2.8, "runtime_hours": 12847 }, "active_alerts": 2 } This summary transforms raw telemetry into actionable intelligence—operations staff immediately understand the problem, its severity, and the appropriate response, without requiring deep equipment expertise. Maintenance Log Classification with AI Maintenance departments generate hundreds of logs daily, technician notes, operator observations, inspection reports. Manually categorizing and prioritizing these logs consumes significant time. AI classification automatically routes logs to appropriate teams, identifies urgent issues, and extracts key information. The classification endpoint processes maintenance notes: // backend/routes/maintenance.js router.post('/api/logs/classify', async (req, res) => { try { const { log_text, equipment_id } = req.body; if (!log_text || log_text.length < 10) { return res.status(400).json({ error: 'Log text required (min 10 chars)' }); } const classificationPrompt = ` Classify this maintenance log entry into appropriate categories and priority: Equipment: ${equipment_id || 'Unknown'} Log Text: "${log_text}" Classify into EXACTLY ONE primary category: - MECHANICAL: Physical components, bearings, belts, motors - ELECTRICAL: Power systems, sensors, controllers, wiring - HYDRAULIC: Pumps, fluid systems, pressure issues - THERMAL: Cooling, heating, temperature control - SOFTWARE: PLC programming, HMI issues, control logic - ROUTINE: Scheduled maintenance, inspections, calibration Assign priority level: - CRITICAL: Immediate action required, safety or production impact - HIGH: Resolve within 24 hours, performance degradation - MEDIUM: Schedule within 1 week, minor issues - LOW: Routine maintenance, cosmetic issues Extract key details: - Symptoms described - Suspected root cause (if mentioned) - Recommended actions Return ONLY a JSON object with this exact structure: { "category": "MECHANICAL", "priority": "HIGH", "symptoms": ["grinding noise", "vibration above 5mm/s"], "suspected_cause": "bearing wear", "recommended_actions": ["inspect bearings", "order replacement parts"] } `.trim(); const analysis = await foundryService.generateCompletion(classificationPrompt); // Parse AI response as JSON let classification; try { // Extract JSON from response (AI might add explanation text) const jsonMatch = analysis.content.match(/\{[\s\S]*\}/); classification = JSON.parse(jsonMatch[0]); } catch (parseError) { // Fallback parsing if JSON extraction fails classification = parseClassificationText(analysis.content); } // Validate classification const validCategories = ['MECHANICAL', 'ELECTRICAL', 'HYDRAULIC', 'THERMAL', 'SOFTWARE', 'ROUTINE']; const validPriorities = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW']; if (!validCategories.includes(classification.category)) { classification.category = 'ROUTINE'; } if (!validPriorities.includes(classification.priority)) { classification.priority = 'MEDIUM'; } res.json({ original_log: log_text, classification, model_used: analysis.model, latency_ms: analysis.latency_ms, timestamp: new Date().toISOString() }); } catch (error) { console.error('Classification error:', error); res.status(500).json({ error: 'Classification failed' }); } }); function parseClassificationText(text) { // Fallback parser for when AI doesn't return valid JSON // Extract category, priority, and details using regex patterns const categoryMatch = text.match(/category[":]\s*(MECHANICAL|ELECTRICAL|HYDRAULIC|THERMAL|SOFTWARE|ROUTINE)/i); const priorityMatch = text.match(/priority[":]\s*(CRITICAL|HIGH|MEDIUM|LOW)/i); return { category: categoryMatch ? categoryMatch[1].toUpperCase() : 'ROUTINE', priority: priorityMatch ? priorityMatch[1].toUpperCase() : 'MEDIUM', symptoms: [], suspected_cause: 'Unknown', recommended_actions: [] }; } This implementation demonstrates several critical patterns for structured AI outputs: Explicit output format requirements: Prompt specifies exact JSON structure to encourage parseable responses Defensive parsing: Try JSON extraction first, fall back to text parsing if that fails Validation with sensible defaults: Validate categories and priorities against allowed values, default to safe values on mismatch Constrained classification vocabulary: Limit categories to predefined set rather than open-ended categories Priority inference rules: Guide AI to assess urgency based on safety, production impact, and timeline Example classification output: POST /api/logs/classify { "log_text": "Hydraulic pump PUMP-L1-H01 making grinding noise during startup. Vibration readings spiked to 5.2 mm/s this morning. Possible bearing wear. Recommend inspection.", "equipment_id": "PUMP-L1-H01" } Response: { "original_log": "Hydraulic pump PUMP-L1-H01 making grinding noise...", "classification": { "category": "MECHANICAL", "priority": "HIGH", "symptoms": ["grinding noise during startup", "vibration spike to 5.2 mm/s"], "suspected_cause": "bearing wear", "recommended_actions": ["inspect bearings", "schedule replacement if confirmed worn"] }, "model_used": "phi-3.5-mini", "latency_ms": 187, "timestamp": "2026-01-30T14:35:22Z" } This classification automatically routes the log to the mechanical maintenance team, marks it high priority for same-day attention, and extracts actionable details, all without human intervention. Building the Natural Language Operations Assistant The AI Assistant interface enables operations staff to query equipment status, ask diagnostic questions, and get contextual guidance using natural language. This interface bridges the gap between complex SCADA systems and operators who need quick answers without navigating multiple screens. The chat endpoint implements contextual conversation: // backend/routes/chat.js router.post('/api/chat', async (req, res) => { try { const { message, conversation_id } = req.body; if (!message || message.length < 3) { return res.status(400).json({ error: 'Message required (min 3 chars)' }); } // Load conversation history if exists const history = conversation_id ? await loadConversationHistory(conversation_id) : []; // Build context from current plant state const plantContext = await buildPlantContext(); // Construct system prompt with operational context const systemPrompt = ` You are an operations assistant for a manufacturing facility. Answer questions about equipment status, maintenance, and operational issues. Current Plant Status: ${plantContext} Guidelines: - Provide specific, actionable answers based on current data - Reference specific equipment IDs when relevant - Suggest appropriate next steps for issues - If information is unavailable, say so clearly - Use concise language suitable for busy operators Do not speculate about issues without data to support it. `.trim(); // Build message chain with history const messages = [ { role: 'system', content: systemPrompt }, ...history.map(h => ({ role: h.role, content: h.content })), { role: 'user', content: message } ]; // Generate response const response = await foundryService.generateCompletion( message, { messages } // Pass full conversation context ); // Save conversation const newConversationId = conversation_id || generateConversationId(); await saveConversationTurn(newConversationId, message, response.content); res.json({ response: response.content, conversation_id: newConversationId, model_used: response.model, latency_ms: response.latency_ms, timestamp: new Date().toISOString() }); } catch (error) { console.error('Chat error:', error); res.status(500).json({ error: 'Chat failed' }); } }); async function buildPlantContext() { const assets = await getAllAssets(); // Summarize plant-wide status const criticalAlerts = assets.flatMap(a => a.alerts.filter(alert => alert.severity === 'CRITICAL') ); const equipmentStatus = assets.map(a => `${a.id}: ${a.status} (Temp: ${a.telemetry.temperature}°C, Alerts: ${a.alerts.length})` ); return ` Total Equipment: ${assets.length} Critical Alerts: ${criticalAlerts.length} Equipment Status Summary: ${equipmentStatus.join('\n')} `.trim(); } Example conversational interactions: User: "What's wrong with Line 2?" AI: "Line 2 has elevated issues. CNC-L2-M03 is running 12°C above normal temperature at 92°C with a thermal alert active. This has caused 28% below-target throughput as the equipment throttles to prevent damage. The coolant system requires immediate inspection—flow rate dropped to 3.2 L/min from the normal 5.5 L/min, indicating a potential blockage or pump failure. Other Line 2 equipment (CONV-L2-B01, PLC-L2-MAIN) is operating normally." User: "How's the hydraulic pump performing?" AI: "PUMP-L1-H01 has a high-priority maintenance alert. Grinding noise was reported during startup with vibration at 5.2 mm/s, exceeding the 4.0 mm/s threshold. This pattern suggests bearing wear. Current operational status shows the pump is running but degraded. Recommend inspection today to prevent catastrophic failure. Bearing replacement typically requires 2-4 hours downtime." User: "What should I prioritize this shift?" AI: "Priority order for this shift: 1) CNC-L2-M03 coolant inspection (CRITICAL - impacting production, safety risk), 2) PUMP-L1-H01 bearing check (HIGH - prevent failure progression), 3) Routine inspections on CONV-L1-B02 (scheduled maintenance). Focus immediate attention on Line 2 thermal issues. Line 1 is stable." The assistant provides grounded, specific responses citing actual equipment IDs, current metrics, and actionable priorities. This data-grounded approach maintains operational credibility, operators quickly learn to trust responses that reference specific equipment conditions they can independently verify on SCADA systems. Real-World Industrial Scenario: Thermal Event Response To understand the system's practical value, consider how it supports operators during an actual equipment incident. This scenario simulates realistic manufacturing conditions where multiple issues develop simultaneously and operators need rapid situational awareness. Scenario Setup: Assembly Line Degradation During Shift B operations on a Tuesday evening at a precision manufacturing facility, multiple equipment anomalies develop on production lines. The plant overview dashboard begins displaying elevated alert counts. Telemetry monitoring detects concerning patterns across Assembly Line 2 equipment. CNC mill CNC-L2-M03 reports sustained high temperature at 92°C compared to normal operating range of 65-75°C. Coolant flow sensors indicate 40% reduction from baseline levels—dropping from 5.5 L/min to 3.2 L/min. Equipment throughput has decreased 28% as thermal management systems automatically throttle spindle speeds to prevent damage. The mill has accumulated 12,847 runtime hours with next scheduled maintenance in 153 hours. Simultaneously, hydraulic pump PUMP-L1-H01 on Line 1 logged a maintenance note during startup: "Grinding noise observed during pump startup, vibration spike detected." Automated vibration sensors registered 5.2 mm/s compared to the 4.0 mm/s alarm threshold. The pump's condition monitoring system flagged bearing wear as probable cause based on frequency analysis of vibration signatures. The shift supervisor, monitoring the plant overview dashboard, sees both alerts flagged as high priority. Traditional response would require accessing SCADA screens, checking telemetry history, reviewing maintenance logs across multiple systems, correlating patterns, and consulting equipment experts, consuming 15-20 minutes of investigation time while equipment conditions potentially worsen. AI-Assisted Response Workflow Instead, the supervisor uses the AI Assistant interface to rapidly assess the situation. The workflow demonstrates how local AI transforms incident response: Step 1 - Initial Assessment Query: Supervisor types "What's the current plant status?" The AI assistant analyzes the complete equipment database and returns: "Plant status as of 22:14: 14 assets monitored, 2 high-priority alerts active. Critical issues: CNC-L2-M03 thermal event (92°C, coolant flow reduced 40%, throughput down 28%). High-priority: PUMP-L1-H01 mechanical degradation (bearing wear suspected, 5.2 mm/s vibration). All other equipment operating within normal parameters. Recommend immediate attention to Line 2 thermal issue—production impact and potential equipment damage risk." Step 2 - Detailed Equipment Analysis: Supervisor navigates to Asset Health tab and selects CNC-L2-M03 for detailed analysis. Within 1.8 seconds, the AI summary generates: "CNC-L2-M03 Health Summary: DEGRADED - IMMEDIATE ACTION REQUIRED Current Status: Elevated thermal condition with active coolant flow issue. Operating temperature 92°C exceeds normal range (65-75°C) by 22%. Coolant flow rate 3.2 L/min indicates 42% reduction from baseline 5.5 L/min. Impact Assessment: Throughput reduced 28% due to automatic thermal throttling. Spindle speed limited to 65% capacity to prevent thermal damage. Current production rate: 18 parts/hour vs. target 25 parts/hour. Root Cause Analysis: Coolant flow reduction suggests blockage in cooling circuit or coolant pump degradation. Temperature rise pattern correlates with gradual flow reduction observed over past 6 hours. Recommended Actions: 1. Immediate: Verify coolant reservoir level and pump operation 2. Inspect: Check cooling circuit for blockages or debris 3. Monitor: Temperature trend every 15 minutes until resolved 4. Consider: Production pause if temperature exceeds 95°C Maintenance Priority: HIGH - Production impact and equipment damage risk if unresolved. Estimated repair time: 1-2 hours for coolant system service." Step 3 - Maintenance Log Classification: When the pump maintenance note is entered, the AI classification system analyzes it and returns structured classification within 240ms: { "classification": { "category": "MECHANICAL", "priority": "HIGH", "symptoms": ["grinding noise during startup", "vibration spike to 5.2 mm/s", "frequency pattern indicates bearing wear"], "suspected_cause": "bearing wear - inner race degradation probable", "recommended_actions": [ "schedule bearing inspection within 8 hours", "prepare replacement bearings (6208-2RS)", "plan maintenance window: 2-4 hours", "order backup pump if repair extended" ], "estimated_downtime": "2-4 hours for bearing replacement" } } Step 4 - Decision Support and Prioritization: Supervisor asks "What should I prioritize tonight?" The AI synthesizes all context: "Priority recommendations for Shift B: IMMEDIATE (next 30 minutes): - CNC-L2-M03: Dispatch maintenance to coolant system NOW. Production impacted, equipment damage risk if temperature continues rising. Impact: $2,400/hour revenue loss at current throughput reduction. HIGH (within 4 hours): - PUMP-L1-H01: Schedule bearing inspection for line changeover at 02:00. Pump operational but degraded—plan bearing replacement parts and backup pump. Risk: Catastrophic failure if bearing fully degrades. STANDARD (this week): - CONV-L1-B02: Scheduled inspection per maintenance calendar - Routine: Daily checklist completion for all lines Resource allocation: Focus maintenance team lead on CNC thermal issue. Assign technician to prepare pump bearing replacement kit during current production run." Outcome: Rapid Response Enabled by Local AI The AI-assisted workflow reduced incident assessment time from 15-20 minutes to under 3 minutes. The supervisor immediately dispatched maintenance to investigate CNC-L2-M03's coolant system, identified as root cause. Technicians found debris blocking a cooling circuit junction, 5-minute clearance procedure restored coolant flow to 5.3 L/min. Equipment temperature dropped to 73°C within 10 minutes, and full production throughput resumed. For the hydraulic pump, bearing inspection was scheduled during planned line changeover at 02:00, preventing emergency production stoppage. Bearings were replaced preemptively, avoiding the catastrophic pump failure that would have caused 6-8 hours of unplanned downtime. Total downtime avoided: 8+ hours. Revenue protection: ~$48,000 based on facility's production value. All decisions made with AI running entirely on local edge device, no cloud dependency, no data exposure, no network latency impact. The complete incident response workflow operated on facility-controlled infrastructure with full data sovereignty. Key Takeaways for Manufacturing AI Deployment Building production-ready AI systems for industrial environments requires architectural decisions that prioritize operational reliability, data sovereignty, and integration pragmatism over cutting-edge model sophistication. Several critical lessons emerge from implementing on-premises manufacturing intelligence: Data locality through architectural guarantee: On-premises AI ensures proprietary production data never leaves facility networks not through configuration but through fundamental architecture. There are no cloud API calls to misconfigure, no data upload features to accidentally enable, no external endpoints to compromise. This physical data boundary satisfies security audits and competitive protection requirements with demonstrable certainty rather than contractual assurance. Model selection determines deployment feasibility: Smaller models (0.5B-2B parameters) enable deployment on commodity server hardware without specialized AI accelerators. These models provide sufficient accuracy for industrial classification, summarization, and conversational assistance while maintaining sub-3-second response times essential for operator acceptance. Larger models improve nuance but require GPU infrastructure and longer inference times that may not justify marginal accuracy gains for operational decision-making. Graceful degradation preserves operations: AI capabilities enhance but never replace core monitoring functions. Equipment dashboards must display raw telemetry, alert states, and historical trends even when AI analysis is unavailable. This architectural separation ensures operations continue during AI service maintenance, model updates, or system failures. AI becomes value-add intelligence rather than critical dependency. Context-rich prompts determine accuracy: Generic prompts produce generic responses unsuitable for operational decisions. Effective industrial prompts include equipment specifications, normal operating ranges, alert thresholds, maintenance history, and temporal context. This structured context enables models to provide grounded, specific recommendations citing actual equipment conditions rather than hallucinated speculation. Prompt engineering matters more than model size for operational accuracy. Structured outputs enable automation: JSON response formats with predefined fields allow automated systems to parse classifications, severity levels, and recommended actions without fragile natural language parsing. Maintenance management systems can automatically route work orders, trigger alerts, and update dashboards based on AI classification results. This structured integration scales AI beyond human-read summaries into automated workflow systems. Integration patterns bridge legacy and modern: OPC-UA clients and Modbus TCP gateways connect decades-old PLCs and SCADA systems to modern AI backends without replacing functional control infrastructure. This evolutionary approach enables AI adoption without massive capital equipment replacement. Manufacturing facilities can augment existing investments rather than ripping and replacing proven systems. Responsible AI through grounding and constraints: Industrial AI must acknowledge limits and avoid speculation beyond available data. System prompts should explicitly instruct models: "If you don't have information to answer, clearly state that" and "Do not speculate about equipment conditions beyond provided data." This reduces hallucination risk and maintains operator trust. Operators must verify AI recommendations against domain expertise, position AI as decision support augmenting human judgment, not replacing it. Getting Started: Installation and Deployment Implementing the manufacturing intelligence system requires Foundry Local installation, Node.js backend deployment, and frontend hosting, achievable within a few hours for facilities with existing IT infrastructure and server hardware. Prerequisites and System Requirements Hardware requirements depend on selected AI models. Minimum configuration supports Phi-3.5-mini model (2.1GB): 8GB RAM, 4-core CPU (Intel Core i5/AMD Ryzen 5 or better) 50GB available storage for model files and application data Windows 11/Server 2025 distribution. Recommended production configuration: 16GB+ RAM (supports larger models and concurrent requests), 8-core CPU or NVIDIA GPU (RTX 3060/4060 or better for 3-5x inference acceleration), 100GB SSD storage, gigabit network interface for intra-facility communication. Software prerequisites: Node.js 18 or newer (download from nodejs.org or install via system package manager), Git for repository cloning, modern web browser (Chrome, Edge, Firefox) for frontend access, Windows: PowerShell 5.1+. Foundry Local Installation and Model Setup Install Foundry Local using system-appropriate package manager: # Windows installation via winget winget install Microsoft.FoundryLocal # Verify installation foundry --version # macOS installation via Homebrew brew install microsoft/foundrylocal/foundrylocal Download AI models based on hardware capabilities and accuracy requirements: # Fast option: Qwen 0.5B (500MB, 100-200ms inference) foundry model download qwen2.5-0.5b # Balanced option: Phi-3.5 Mini (2.1GB, 1-3 second inference) foundry model download phi-3.5-mini # High quality option: Phi-4 Mini (3.6GB, 2-5 second inference) foundry model download phi-4-mini # Check downloaded models foundry model list Load a model into the Foundry Local service: # Load default recommended model foundry model run phi-3.5-mini # Verify service is running and model is loaded foundry service status The Foundry Local service will start automatically and expose a REST API on localhost:8008 (default port). The backend application connects to this endpoint for all AI inference operations. Backend Service Deployment Clone the repository and install dependencies: # Clone from GitHub git clone https://github.com/leestott/FoundryLocal-IndJSsample.git cd FoundryLocal-IndJSsample # Navigate to backend directory cd backend # Install Node.js dependencies npm install # Start the backend service npm start The backend server will initialize and display startup messages: Manufacturing AI Backend Starting... ✓ Foundry Local client initialized: http://localhost:8008 ✓ Model detected: phi-3.5-mini ✓ Sample data loaded: 6 assets, 12 maintenance logs ✓ Server running on port 3000 ✓ Frontend accessible at: http://localhost:3000 Health check: http://localhost:3000/api/health Verify backend health: # Test backend API curl http://localhost:3000/api/health # Expected response: {"ok":true,"service":"manufacturing-ai-backend"} # Test Foundry Local integration curl http://localhost:3000/api/models/status # Expected response: {"serviceRunning":true,"model":"phi-3.5-mini"} Frontend Access and Validation Open the web interface by navigating to web/index.html in a browser or starting from the backend URL: # Windows: Open frontend directly start http://localhost:3000 # macOS/Linux: Open frontend open http://localhost:3000 # or xdg-open http://localhost:3000 The web interface displays a navigation bar with four main sections: Overview: Plant-wide dashboard showing all equipment with health status cards, alert counts, and "Load Scenario" button to populate sample data Asset Health: Equipment selector dropdown, telemetry display, active alerts list, and "Generate AI Summary" button for detailed analysis Maintenance: Text area for maintenance log entry, "Classify Log" button, and classification result display showing category, priority, and recommendations AI Assistant: Chat interface with message input, conversation history, and natural language query capabilities Running the Sample Scenario Test the complete system with included sample data: Load scenario data: Click "Load Scenario Inputs" button in Overview tab. This populates equipment database with CNC-L2-M03 thermal event, PUMP-L1-H01 vibration alert, and baseline telemetry for all assets. Generate asset summary: Navigate to Asset Health tab, select "CNC-L2-M03" from dropdown, click "Generate AI Analysis". Within 2-3 seconds, detailed health summary appears explaining thermal condition, coolant flow issue, impacts, and recommended actions. Classify maintenance note: Go to Maintenance tab, enter text: "Grinding noise on startup, vibration 5.2 mm/s, suspect bearing wear". Click "Classify Log". AI categorizes as MECHANICAL/HIGH priority with specific repair recommendations. Ask operational questions: Open AI Assistant tab, type "What's wrong with Line 2?" or "Which equipment needs attention?" AI responds with specific equipment IDs, current conditions, and prioritized action list. Production Deployment Considerations For actual manufacturing facility deployment, several additional configurations apply: Hardware placement: Deploy backend service on dedicated server within manufacturing network zone. Avoid co-locating AI workloads with critical SCADA/MES systems due to resource contention. Use physical server or VM with direct hardware access for GPU acceleration. Network configuration: Backend should reside behind facility firewall with access restricted to internal networks. Do not expose AI service directly to internetm use VPN for remote access if required. Implement authentication via Active Directory/LDAP integration. Configure firewall rules permitting connections from operator workstations and monitoring systems only. Data integration: Replace sample JSON data with connections to actual data sources. Implement OPC-UA client for SCADA integration, connect to MES database for production schedules, integrate with CMMS for maintenance history. Code includes placeholder functions for external data source integration, customize for facility-specific systems. Model selection: Choose appropriate model based on hardware and accuracy requirements. Start with phi-3.5-mini for production deployment. Upgrade to phi-4-mini if analysis quality needs improvement and hardware supports it. Use qwen2.5-0.5b for high-throughput scenarios where speed matters more than nuanced understanding. Test all models against validation scenarios before production promotion. Monitoring and maintenance: Implement health checks monitoring Foundry Local service status, backend API responsiveness, model inference latency, and error rates. Set up alerting when inference latency exceeds thresholds or service unavailable. Establish procedures for model updates during planned maintenance windows. Keep audit logs of all AI invocations for compliance and troubleshooting. Resources and Further Learning The complete implementation with detailed comments, sample data, and documentation provides a foundation for building custom manufacturing intelligence systems. Additional resources support extension and adaptation to specific facility requirements. FoundryLocal-IndJSsample GitHub Repository – Complete source code with JavaScript backend, HTML/CSS/JS frontend, sample manufacturing data, and comprehensive README Installation and Configuration Guide – Detailed setup instructions, API documentation, troubleshooting procedures, and deployment guidance Microsoft Foundry Local Documentation – Official SDK reference, model catalog, hardware requirements, and performance tuning guidance Sample Manufacturing Data Format – JSON structure examples for equipment telemetry, maintenance logs, alert definitions, and operational events Backend Implementation Reference – Express server architecture, Foundry Local SDK integration patterns, API endpoint implementations, and error handling OPC Foundation – Industrial communication standards (OPC-UA, OPC DA) for SCADA system integration and PLC connectivity ISA Standards – International Society of Automation standards for industrial systems, SCADA architecture, and manufacturing execution systems EdgeAI for Beginner - Learn more about Edge AI using these course materials The manufacturing intelligence implementation demonstrates that sophisticated AI capabilities can run entirely on-premises without compromising operational requirements. Facilities gain predictive maintenance insights, natural language operational support, and automated equipment analysis while maintaining complete data sovereignty, zero network dependency, and deterministic performance characteristics essential for production environments.Building HIPAA-Compliant Medical Transcription with Local AI
Building HIPAA-Compliant Medical Transcription with Local AI Introduction Healthcare organizations generate vast amounts of spoken content, patient consultations, research interviews, clinical notes, medical conferences. Transcribing these recordings traditionally requires either manual typing (time-consuming and expensive) or cloud transcription services (creating immediate HIPAA compliance concerns). Every audio file sent to external APIs exposes Protected Health Information (PHI), requires Business Associate Agreements, creates audit trails on third-party servers, and introduces potential breach vectors. This sample solution lies in on-premises voice-to-text systems that process audio entirely locally, never sending PHI beyond organizational boundaries. This article demonstrates building a sample medical transcription application using FLWhisper, ASP.NET Core, C#, and Microsoft Foundry Local with OpenAI Whisper models. You'll learn how to build sample HIPAA-compliant audio processing, integrate Whisper models for medical terminology accuracy, design privacy-first API patterns, and build responsive web UIs for healthcare workflows. Whether you're developing electronic health record (EHR) integrations, building clinical research platforms, or implementing dictation systems for medical practices, this sample could be a great starting point for privacy-first speech recognition. Why Local Transcription Is Critical for Healthcare Healthcare data handling is fundamentally different from general business data due to HIPAA regulations, state privacy laws, and professional ethics obligations. Understanding these requirements explains why cloud transcription services, despite their convenience, create unacceptable risks for medical applications. HIPAA compliance mandates strict controls over PHI. Every system that touches patient data must implement administrative, physical, and technical safeguards. Cloud transcription APIs require Business Associate Agreements (BAAs), but even with paperwork, you're entrusting PHI to external systems. Every API call creates logs on vendor servers, potentially in multiple jurisdictions. Data breaches at transcription vendors expose patient information, creating liability for healthcare organizations. On-premises processing eliminates these third-party risks entirely, PHI never leaves your controlled environment. US State laws increasingly add requirements beyond HIPAA. California's CCPA, New York's SHIELD Act, and similar legislation create additional compliance obligations. International regulations like GDPR prohibit transferring health data outside approved jurisdictions. Local processing simplifies compliance by keeping data within organizational boundaries. Research applications face even stricter requirements. Institutional Review Boards (IRBs) often require explicit consent for data sharing with external parties. Cloud transcription may violate study protocols that promise "no third-party data sharing." Clinical trials in pharmaceutical development handle proprietary information alongside PHI, double jeopardy for data exposure. Local transcription maintains research integrity while enabling audio analysis. Cost considerations favor local deployment at scale. Medical organizations generate substantial audio, thousands of patient encounters monthly. Cloud APIs charge per minute of audio, creating significant recurring costs. Local models have fixed infrastructure costs that scale economically. A modest GPU server can process hundreds of hours monthly at predictable expense. Latency matters for clinical workflows. Doctors and nurses need transcriptions available immediately after patient encounters to review and edit while details are fresh. Cloud APIs introduce network delays, especially problematic in rural health facilities with limited connectivity. Local inference provides <1 second turnaround for typical consultation lengths. Application Architecture: ASP.NET Core with Foundry Local The sample FLWhisper application implements clean separation between audio handling, AI inference, and state management using modern .NET patterns: The ASP.NET Core 10 minimal API provides HTTP endpoints for health checks, audio transcription, and sample file streaming. Minimal APIs reduce boilerplate while maintaining full middleware support for error handling, authentication, and CORS. The API design follows OpenAI's transcription endpoint specification, enabling drop-in replacement for existing integrations. The service layer encapsulates business logic: FoundryModelService manages model loading and lifetime, TranscriptionService handles audio processing and AI inference, and SampleAudioService provides demonstration files for testing. This separation enables easy testing, dependency injection, and service swapping. Foundry Local integration uses the Microsoft.AI.Foundry.Local.WinML SDK. Unlike cloud APIs requiring authentication and network calls, this SDK communicates directly with the local Foundry service via in-process calls. Models load once at startup, remaining resident in memory for sub-second inference on subsequent requests. The static file frontend delivers vanilla HTML/CSS/JavaScript, no framework overhead. This simplicity aids healthcare IT security audits and enables deployment on locked-down hospital networks. The UI provides file upload, sample selection, audio preview, transcription requests, and result display with copy-to-clipboard functionality. Here's the architectural flow for transcription requests: Web UI (Upload Audio File) ↓ POST /v1/audio/transcriptions (Multipart Form Data) ↓ ASP.NET Core API Route ↓ TranscriptionService.TranscribeAudio(audioStream) ↓ Foundry Local Model (Whisper Medium locally) ↓ Text Result + Metadata (language, duration) ↓ Return JSON/Text Response ↓ Display in UI This architecture embodies several healthcare system design principles: Data never leaves the device: All processing occurs on-premises, no external API calls No data persistence by default: Audio and transcripts are session-only, never saved unless explicitly configured Comprehensive health checks: System readiness verification before accepting PHI Audit logging support: Structured logging for compliance documentation Graceful degradation: Clear error messages when models unavailable rather than silent failures Setting Up Foundry Local with Whisper Models Foundry Local supports multiple Whisper model sizes, each with different accuracy/speed tradeoffs. For medical transcription, accuracy is paramount—misheard drug names or dosages create patient safety risks: # Install Foundry Local (Windows) winget install Microsoft.FoundryLocal # Verify installation foundry --version # Download Whisper Medium model (optimal for medical accuracy) foundry model add openai-whisper-medium-generic-cpu:1 # Check model availability foundry model list Whisper Medium (769M parameters) provides the best balance for medical use. Smaller models (Tiny, Base) miss medical terminology frequently. Larger models (Large) offer marginal accuracy gains at 3x inference time. Medium handles medical vocabulary well, drug names, anatomical terms, procedure names, while processing typical consultation audio (5-10 minutes) in under 30 seconds. The application detects and loads the model automatically: // Services/FoundryModelService.cs using Microsoft.AI.Foundry.Local.WinML; public class FoundryModelService { private readonly ILogger _logger; private readonly FoundryOptions _options; private ILocalAIModel? _loadedModel; public FoundryModelService( ILogger logger, IOptions options) { _logger = logger; _options = options.Value; } public async Task InitializeModelAsync() { try { _logger.LogInformation( "Loading Foundry model: {ModelAlias}", _options.ModelAlias ); // Load model from Foundry Local _loadedModel = await FoundryClient.LoadModelAsync( modelAlias: _options.ModelAlias, cancellationToken: CancellationToken.None ); if (_loadedModel == null) { _logger.LogWarning("Model loaded but returned null instance"); return false; } _logger.LogInformation( "Successfully loaded model: {ModelAlias}", _options.ModelAlias ); return true; } catch (Exception ex) { _logger.LogError( ex, "Failed to load Foundry model: {ModelAlias}", _options.ModelAlias ); return false; } } public ILocalAIModel? GetLoadedModel() => _loadedModel; public async Task UnloadModelAsync() { if (_loadedModel != null) { await FoundryClient.UnloadModelAsync(_loadedModel); _loadedModel = null; _logger.LogInformation("Model unloaded"); } } } Configuration lives in appsettings.json , enabling easy customization without code changes: { "Foundry": { "ModelAlias": "whisper-medium", "LogLevel": "Information" }, "Transcription": { "MaxAudioDurationSeconds": 300, "SupportedFormats": ["wav", "mp3", "m4a", "flac"], "DefaultLanguage": "en" } } Implementing Privacy-First Transcription Service The transcription service handles audio processing while maintaining strict privacy controls. No audio or transcript persists beyond the HTTP request lifecycle unless explicitly configured: // Services/TranscriptionService.cs public class TranscriptionService { private readonly FoundryModelService _modelService; private readonly ILogger _logger; public async Task TranscribeAudioAsync( Stream audioStream, string originalFileName, TranscriptionOptions? options = null) { options ??= new TranscriptionOptions(); var startTime = DateTime.UtcNow; try { // Validate audio format ValidateAudioFormat(originalFileName); // Get loaded model var model = _modelService.GetLoadedModel(); if (model == null) { throw new InvalidOperationException("Whisper model not loaded"); } // Create temporary file (automatically deleted after transcription) using var tempFile = new TempAudioFile(audioStream); // Execute transcription _logger.LogInformation( "Starting transcription for file: {FileName}", originalFileName ); var transcription = await model.TranscribeAsync( audioFilePath: tempFile.Path, language: options.Language, cancellationToken: CancellationToken.None ); var duration = (DateTime.UtcNow - startTime).TotalSeconds; _logger.LogInformation( "Transcription completed in {Duration:F2}s", duration ); return new TranscriptionResult { Text = transcription.Text, Language = transcription.Language ?? options.Language, Duration = transcription.AudioDuration, ProcessingTimeSeconds = duration, FileName = originalFileName, Timestamp = DateTime.UtcNow }; } catch (Exception ex) { _logger.LogError( ex, "Transcription failed for file: {FileName}", originalFileName ); throw; } } private void ValidateAudioFormat(string fileName) { var extension = Path.GetExtension(fileName).TrimStart('.'); var supportedFormats = new[] { "wav", "mp3", "m4a", "flac", "ogg" }; if (!supportedFormats.Contains(extension.ToLowerInvariant())) { throw new ArgumentException( $"Unsupported audio format: {extension}. " + $"Supported: {string.Join(", ", supportedFormats)}" ); } } } // Temporary file wrapper that auto-deletes internal class TempAudioFile : IDisposable { public string Path { get; } public TempAudioFile(Stream sourceStream) { Path = System.IO.Path.GetTempFileName(); using var fileStream = File.OpenWrite(Path); sourceStream.CopyTo(fileStream); } public void Dispose() { try { if (File.Exists(Path)) { File.Delete(Path); } } catch { // Ignore deletion errors in temp folder } } } This service demonstrates several privacy-first patterns: Temporary file lifecycle management: Audio written to temp storage, automatically deleted after transcription No implicit persistence: Results returned to caller, not saved by service Format validation: Accept only supported audio formats to prevent processing errors Comprehensive logging: Audit trail for compliance without logging PHI content Error isolation: Exceptions contain diagnostic info but no patient data Building the OpenAI-Compatible REST API The API endpoint mirrors OpenAI's transcription API specification, enabling existing integrations to work without modifications: // Program.cs var builder = WebApplication.CreateBuilder(args); // Configure services builder.Services.Configure( builder.Configuration.GetSection("Foundry") ); builder.Services.AddSingleton(); builder.Services.AddScoped(); builder.Services.AddHealthChecks() .AddCheck("foundry-health"); var app = builder.Build(); // Load model at startup var modelService = app.Services.GetRequiredService(); await modelService.InitializeModelAsync(); app.UseHealthChecks("/health"); app.MapHealthChecks("/api/health/status"); // OpenAI-compatible transcription endpoint app.MapPost("/v1/audio/transcriptions", async ( HttpRequest request, TranscriptionService transcriptionService, ILogger logger) => { if (!request.HasFormContentType) { return Results.BadRequest(new { error = "Content-Type must be multipart/form-data" }); } var form = await request.ReadFormAsync(); // Extract audio file var audioFile = form.Files.GetFile("file"); if (audioFile == null || audioFile.Length == 0) { return Results.BadRequest(new { error = "Audio file required in 'file' field" }); } // Parse options var format = form["format"].ToString() ?? "text"; var language = form["language"].ToString() ?? "en"; try { // Process transcription using var stream = audioFile.OpenReadStream(); var result = await transcriptionService.TranscribeAudioAsync( audioStream: stream, originalFileName: audioFile.FileName, options: new TranscriptionOptions { Language = language } ); // Return in requested format if (format == "json") { return Results.Json(new { text = result.Text, language = result.Language, duration = result.Duration }); } else { // Default: plain text return Results.Text(result.Text); } } catch (Exception ex) { logger.LogError(ex, "Transcription request failed"); return Results.StatusCode(500); } }) .DisableAntiforgery() // File uploads need CSRF exemption .WithName("TranscribeAudio") .WithOpenApi(); app.Run(); Example API usage: # PowerShell $audioFile = Get-Item "consultation-recording.wav" $response = Invoke-RestMethod ` -Uri "http://localhost:5192/v1/audio/transcriptions" ` -Method Post ` -Form @{ file = $audioFile; format = "json" } Write-Output $response.text # cURL curl -X POST http://localhost:5192/v1/audio/transcriptions \ -F "file=@consultation-recording.wav" \ -F "format=json" Building the Interactive Web Frontend The web UI provides a user-friendly interface for non-technical medical staff to transcribe recordings: SarahCare Medical Transcription The JavaScript handles file uploads and API interactions: // wwwroot/app.js let selectedFile = null; async function checkHealth() { try { const response = await fetch('/health'); const statusEl = document.getElementById('status'); if (response.ok) { statusEl.className = 'status-badge online'; statusEl.textContent = '✓ System Ready'; } else { statusEl.className = 'status-badge offline'; statusEl.textContent = '✗ System Unavailable'; } } catch (error) { console.error('Health check failed:', error); } } function handleFileSelect(event) { const file = event.target.files[0]; if (!file) return; selectedFile = file; // Show file info const fileInfo = document.getElementById('fileInfo'); fileInfo.textContent = `Selected: ${file.name} (${formatFileSize(file.size)})`; fileInfo.classList.remove('hidden'); // Enable audio preview const preview = document.getElementById('audioPreview'); preview.src = URL.createObjectURL(file); preview.classList.remove('hidden'); // Enable transcribe button document.getElementById('transcribeBtn').disabled = false; } async function transcribeAudio() { if (!selectedFile) return; const loadingEl = document.getElementById('loadingIndicator'); const resultEl = document.getElementById('resultSection'); const transcribeBtn = document.getElementById('transcribeBtn'); // Show loading state loadingEl.classList.remove('hidden'); resultEl.classList.add('hidden'); transcribeBtn.disabled = true; try { const formData = new FormData(); formData.append('file', selectedFile); formData.append('format', 'json'); const startTime = Date.now(); const response = await fetch('/v1/audio/transcriptions', { method: 'POST', body: formData }); if (!response.ok) { throw new Error(`HTTP ${response.status}: ${response.statusText}`); } const result = await response.json(); const processingTime = ((Date.now() - startTime) / 1000).toFixed(1); // Display results document.getElementById('transcriptionText').value = result.text; document.getElementById('resultDuration').textContent = `Duration: ${result.duration.toFixed(1)}s`; document.getElementById('resultLanguage').textContent = `Language: ${result.language}`; resultEl.classList.remove('hidden'); console.log(`Transcription completed in ${processingTime}s`); } catch (error) { console.error('Transcription failed:', error); alert(`Transcription failed: ${error.message}`); } finally { loadingEl.classList.add('hidden'); transcribeBtn.disabled = false; } } function copyToClipboard() { const text = document.getElementById('transcriptionText').value; navigator.clipboard.writeText(text) .then(() => alert('Copied to clipboard')) .catch(err => console.error('Copy failed:', err)); } // Initialize window.addEventListener('load', () => { checkHealth(); loadSamplesList(); }); Key Takeaways and Production Considerations Building HIPAA-compliant voice-to-text systems requires architectural decisions that prioritize data privacy over convenience. The FLWhisper application demonstrates that you can achieve accurate medical transcription, fast processing times, and intuitive user experiences entirely on-premises. Critical lessons for healthcare AI: Privacy by architecture: Design systems where PHI never exists outside controlled environments, not as a configuration option No persistence by default: Audio and transcripts should be ephemeral unless explicitly saved with proper access controls Model selection matters: Whisper Medium provides medical terminology accuracy that smaller models miss Health checks enable reliability: Systems should verify model availability before accepting PHI Audit logging without content logging: Track operations for compliance without storing sensitive data in logs For production deployment in clinical settings, integrate with EHR systems via HL7/FHIR interfaces. Implement role-based access control with Active Directory integration. Add digital signatures for transcript authentication. Configure automatic PHI redaction using clinical NLP models. Deploy on HIPAA-compliant infrastructure with proper physical security. Implement comprehensive audit logging meeting compliance requirements. The complete implementation with ASP.NET Core API, Foundry Local integration, sample audio files, and comprehensive tests is available at github.com/leestott/FLWhisper. Clone the repository and follow the setup guide to experience privacy-first medical transcription. Resources and Further Reading FLWhisper Repository - Complete C# implementation with .NET 10 Quick Start Guide - Installation and usage instructions Microsoft Foundry Local Documentation - SDK reference and model catalog OpenAI Whisper Documentation - Model architecture and capabilities HIPAA Compliance Guidelines - HHS official guidance Testing Guide - Comprehensive test suite documentation🚀 AI Toolkit for VS Code — February 2026 Update
February brings a major milestone for AI Toolkit. Version 0.30.0 is packed with new capabilities that make agent development more discoverable, debuggable, and production-ready—from a brand-new Tool Catalog, to an end-to-end Agent Inspector, to treating evaluations as first-class tests. 🔧 New in v0.30.0 🧰 Tool Catalog: One place to discover and manage agent tools The new Tool Catalog is a centralized hub for discovering, configuring, and integrating tools into your AI agents. Instead of juggling scattered configs and definitions, you now get a unified experience for tool management: Browse, search, and filter tools from the public Foundry catalog and local stdio MCP servers Configure connection settings for each tool directly in VS Code Add tools to agents seamlessly via Agent Builder Manage the full tool lifecycle: add, update, or remove tools with confidence Why it matters: expanding your agent’s capabilities is now a few clicks away—and stays manageable as your agent grows. 🕵️ Agent Inspector: Debug agents like real software The new Agent Inspector turns agent debugging into a first-class experience inside VS Code. Just press F5 and launch your agent with full debugger support. Key highlights: One-click F5 debugging with breakpoints, variable inspection, and step-through execution Copilot auto-configuration that scaffolds agent code, endpoints, and debugging setup Production-ready code generated using the Hosted Agent SDK, ready for Microsoft Foundry Real-time visualization of streaming responses, tool calls, and multi-agent workflows Quick code navigation—double-click workflow nodes to jump straight to source Unified experience combining chat and workflow visualization in one view Why it matters: agents are no longer black boxes—you can see exactly what’s happening, when, and why. 🧪 Evaluation as Tests: Treat quality like code With Evaluation as Tests, agent quality checks now fit naturally into existing developer workflows. What’s new: Define evaluations as test cases using familiar pytest syntax and Eval Runner SDK annotations Run evaluations directly from VS Code Test Explorer, mixing and matching test cases Analyze results in a tabular view with Data Wrangler integration Submit evaluation definitions to run at scale in Microsoft Foundry Why it matters: evaluations are no longer ad-hoc scripts—they’re versioned, repeatable, and CI-friendly. 🔄 Improvements across the Toolkit 🧱 Agent Builder Agent Builder received a major usability refresh: Redesigned layout for better navigation and focus Quick switcher to move between agents effortlessly Support for authoring, running, and saving Foundry prompt agents Add tools to Foundry prompt agents directly from the Tool Catalog or built-in tools New Inspire Me feature to help you get started when drafting agent instructions Numerous performance and stability improvements 🤖 Model Catalog Added support for models using the OpenAI Response API, including gpt-5.2-codex General performance and reliability improvements 🧠 Build Agent with GitHub Copilot New Workflow entry point to quickly generate multi-agent workflows with Copilot Ability to orchestrate workflows by selecting prompt agents from Foundry 🔁 Conversion & Profiling Generate interactive playgrounds for history models Added Qualcomm GPU recipes Show resource usage for Phi Silica directly in Model Playground ✨ Wrapping up Version 0.30.0 is a big step forward for AI Toolkit. With better discoverability, real debugging, structured evaluation, and deeper Foundry integration, building AI agents in VS Code now feels much closer to building production software. As always, we’d love your feedback—keep it coming, and happy agent building! 🚀Teaching AI Development Through Gamification:
Introduction Learning AI development can feel overwhelming. Developers face abstract concepts like embeddings, prompt engineering, and workflow orchestration topics that traditional tutorials struggle to make tangible. How do you teach someone what an embedding "feels like" or why prompt engineering matters beyond theoretical examples? The answer lies in experiential learning through gamification. Instead of reading about AI concepts, what if developers could play a game that teaches these ideas through progressively challenging levels, immediate feedback, and real AI interactions? This article explores exactly that: building an educational adventure game that transforms AI learning from abstract theory into hands-on exploration. We'll dive into Foundry Local Learning Adventure, a JavaScript-based game that teaches AI fundamentals through five interactive levels. You'll learn how to create engaging educational experiences, integrate local AI models using Foundry Local, design progressive difficulty curves, and build cross-platform applications that run both in browsers and terminals. Whether you're an educator designing technical curriculum or a developer building learning tools, this architecture provides a proven blueprint for gamified technical education. Why Gamification Works for Technical Learning Traditional technical education follows a predictable pattern: read documentation, watch tutorials, attempt exercises, struggle with setup, eventually give up. The problem isn't content quality, it's engagement and friction. Gamification addresses both issues simultaneously. By framing learning as progression through levels, you create intrinsic motivation. Each completed challenge feels like unlocking a new ability in a game, triggering the same dopamine response that keeps players engaged in entertainment experiences. Progress is visible, achievements are celebrated, and setbacks feel like natural parts of the journey rather than personal failures. More importantly, gamification reduces friction. Instead of "install dependencies, configure API keys, read documentation, write code, debug errors," learners simply start the game and begin playing. The game handles setup, provides guardrails, and offers immediate feedback. When a concept clicks, the game celebrates it. When learners struggle, hints appear automatically. For AI development specifically, gamification solves a unique challenge: making probabilistic, non-deterministic systems feel approachable. Traditional programming has clear right and wrong answers, but AI outputs vary. A game can frame this variability as exploration rather than failure, teaching developers to evaluate AI responses critically while maintaining confidence. Architecture Overview: Dual-Platform Design for Maximum Reach The Foundry Local Learning Adventure implements a dual-platform architecture with separate but consistent implementations for web browsers and command-line terminals. This design maximizes accessibility, learners can start playing instantly in a browser, then graduate to CLI mode for the full terminal experience when they're ready to go deeper. The web version prioritizes zero-friction onboarding. It's deployed to GitHub Pages and can also be opened locally via a simple HTTP server, no build step, no package managers. The game starts with simulated AI responses in demo mode, but crucially, it also supports real AI responses when Foundry Local is installed. The web version auto-discovers Foundry Local's dynamic port through a foundry-port.json file (written by the startup scripts) or by scanning common ports. Progress saves to localStorage, badges unlock as you complete challenges, and an AI-powered mentor named Sage guides you through a chat widget in the corner. This version is perfect for classrooms, conference demos, and learners who want to try before committing to a full CLI setup. The CLI version provides the full terminal experience with real AI interactions. Built on Node.js with ES modules, this version features a custom FoundryLocalClient class that connects to Foundry Local's OpenAI-compatible REST API. Instead of relying on an external SDK, the game implements its own API client with automatic port discovery, model selection, and graceful fallback to demo mode. The terminal interface includes a rich command system ( play , hint , ask , explain , progress , badges ) and the Sage mentor provides contextual guidance throughout. Both versions implement the same five levels and learning objectives independently. The CLI uses game/src/game.js , levels.js , and mentor.js as ES modules, while the web version uses game/web/game-web.js and game-data.js . A key innovation is the automatic port discovery system, which eliminates manual configuration: // 3-tier port discovery strategy (game/src/game.js) class FoundryLocalClient { constructor() { this.commonPorts = [61341, 5272, 51319, 5000, 8080]; this.mode = 'demo'; // 'local', 'azure', or 'demo' } async initialize() { // Tier 1: CLI discovery - parse 'foundry service status' output const cliPort = await this.discoverPortViaCLI(); if (cliPort) { this.baseUrl = cliPort; this.mode = 'local'; return; } // Tier 2: Try configured URL from config.json if (await this.tryFoundryUrl(config.foundryLocal.baseUrl)) { this.mode = 'local'; return; } // Tier 3: Scan common ports for (const port of this.commonPorts) { if (await this.tryFoundryUrl(`http://127.0.0.1:${port}`)) { this.mode = 'local'; return; } } // Fallback: demo mode with simulated responses console.log('💡 Running in demo mode (no Foundry Local detected)'); this.mode = 'demo'; } async chat(messages, options = {}) { if (this.mode === 'demo') return this.getDemoResponse(messages); const response = await fetch(`${this.baseUrl}/v1/chat/completions`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: this.selectedModel, messages, temperature: options.temperature || 0.7, max_tokens: options.max_tokens || 300 }) }); const data = await response.json(); return data.choices[0].message.content; } } This architecture demonstrates several key principles for educational software: Progressive disclosure: Start simple (web demo mode), add complexity optionally (real AI via Foundry Local or Azure) Consistent learning outcomes: Both platforms teach the same five concepts through independently implemented but equivalent experiences Zero barriers to entry: No installation required for the web version eliminates the #1 reason learners abandon technical tutorials Automatic service discovery: The 3-tier port discovery strategy means no manual configuration, just install Foundry Local and play Graceful degradation: Three connection modes (local, Azure, demo) ensure the game always works regardless of setup Level Design: Teaching AI Concepts Through Progressive Challenges The game's five levels form a carefully designed curriculum that builds AI understanding incrementally. Each level introduces one core concept, provides hands-on practice, and validates learning before proceeding. Level 1: Meet the Model teaches the fundamental request-response pattern. Learners send their first message to an AI and see it respond. The challenge is deliberately trivial, just say hello, because the goal is building confidence. The level succeeds when the learner realizes "I can talk to an AI and it understands me." This moment of agency sets the foundation for everything else. The implementation focuses on positive reinforcement. In the CLI version, the Sage mentor celebrates each completion with contextual messages, while the web version displays inline celebration banners with badge animations: // Level 1 execution (game/src/game.js) async executeLevel1() { const level = this.levels.getLevel(1); this.displayLevelHeader(level); // Sage introduces the level const intro = await this.mentor.introduceLevel(1); console.log(`\n🧙 Sage: ${intro}`); const userPrompt = await this.askQuestion('\nYour prompt: '); console.log('\n🤖 AI is thinking...'); const response = await this.client.chat([ { role: 'system', content: 'You are Sage, a friendly AI mentor.' }, { role: 'user', content: userPrompt } ]); console.log(`\n📨 AI Response:\n${response}`); if (response && response.length > 10) { // Sage celebrates const celebration = await this.mentor.celebrateLevelComplete(1); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n🎯 You earned the Prompt Apprentice badge!'); console.log('🏆 +100 points'); this.progress.completeLevel(1, 100, '🎯 Prompt Apprentice'); } } This celebration pattern repeats throughout, explicit acknowledgment of success via the Sage mentor, explanation of what was learned, and a preview of what's next. The mentor system ( game/src/mentor.js ) provides contextual encouragement using AI-generated or pre-written fallback messages, transforming abstract concepts into concrete achievements. Level 2: Prompt Mastery introduces prompt quality through comparison. The game presents a deliberately poor prompt: "tell me stuff about coding." Learners must rewrite it to be specific, contextual, and actionable. The game runs both prompts, displays results side-by-side, and asks learners to evaluate the difference. // Level 2: Prompt Improvement (game/src/game.js) async executeLevel2() { const level = this.levels.getLevel(2); this.displayLevelHeader(level); const intro = await this.mentor.introduceLevel(2); console.log(`\n🧙 Sage: ${intro}`); // Show the bad prompt const badPrompt = "tell me stuff about coding"; console.log(`\n❌ Poor prompt: "${badPrompt}"`); console.log('\n🤖 Getting response to bad prompt...'); const badResponse = await this.client.chat([ { role: 'user', content: badPrompt } ]); console.log(`\n📊 Bad prompt result:\n${badResponse}`); // Get the learner's improved version console.log('\n✍️ Now write a BETTER prompt about the same topic:'); const goodPrompt = await this.askQuestion('Your improved prompt: '); console.log('\n🤖 Getting response to your prompt...'); const goodResponse = await this.client.chat([ { role: 'user', content: goodPrompt } ]); console.log(`\n📊 Your prompt result:\n${goodResponse}`); // Evaluate: improved prompt should be longer and more specific const isImproved = goodPrompt.length > badPrompt.length && goodResponse.length > 0; if (isImproved) { const celebration = await this.mentor.celebrateLevelComplete(2); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n✨ You earned the Prompt Engineer badge!'); console.log('🏆 +150 points'); this.progress.completeLevel(2, 150, '✨ Prompt Engineer'); } else { const hint = await this.mentor.provideHint(2); console.log(`\n💡 Sage: ${hint}`); } } This comparative approach is powerful, learners don't just read about prompt engineering, they experience its impact directly. The before/after comparison makes quality differences undeniable. Level 3: Embeddings Explorer demystifies semantic search through practical demonstration. Learners search a knowledge base about Foundry Local using natural language queries. The game shows how embedding similarity works by returning relevant content even when exact keywords don't match. // Level 3: Embedding Search (game/src/game.js) async executeLevel3() { const level = this.levels.getLevel(3); this.displayLevelHeader(level); // Knowledge base loaded from game/data/knowledge-base.json const knowledgeBase = [ { id: 1, content: "Foundry Local runs AI models entirely on your device" }, { id: 2, content: "Embeddings convert text into numerical vectors" }, { id: 3, content: "Cosine similarity measures how related two texts are" }, // ... more entries about AI and Foundry Local ]; const query = await this.askQuestion('\n🔍 Search query: '); // Get embedding for user's query const queryEmbedding = await this.client.getEmbedding(query); // Get embeddings for all knowledge base entries const results = []; for (const item of knowledgeBase) { const itemEmbedding = await this.client.getEmbedding(item.content); const similarity = this.cosineSimilarity(queryEmbedding, itemEmbedding); results.push({ ...item, similarity }); } // Sort by similarity and show top matches results.sort((a, b) => b.similarity - a.similarity); console.log('\n📑 Top matches:'); results.slice(0, 3).forEach((r, i) => { console.log(` ${i + 1}. (${(r.similarity * 100).toFixed(1)}%) ${r.content}`); }); } // Cosine similarity calculation (also in TaskHandler) cosineSimilarity(a, b) { const dot = a.reduce((sum, val, i) => sum + val * b[i], 0); const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0)); const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0)); return dot / (magA * magB); } // Demo mode generates pseudo-embeddings when Foundry isn't available getPseudoEmbedding(text) { // 128-dimension hash-based vector for offline demonstration const embedding = new Array(128).fill(0); for (let i = 0; i < text.length; i++) { embedding[i % 128] += text.charCodeAt(i) / 1000; } return embedding; } Learners query things like "How do I run AI offline?" and discover content about Foundry Local's offline capabilities—even though the word "offline" appears nowhere in the result. When Foundry Local is running, the game calls the /v1/embeddings endpoint for real vector representations. In demo mode, a pseudo-embedding function generates 128-dimension hash-based vectors that still demonstrate the concept of similarity search. This concrete demonstration of semantic understanding beats any theoretical explanation. Level 4: Workflow Wizard teaches AI pipeline composition. Learners build a three-step workflow: summarize text → extract keywords → generate questions. Each step uses the previous output as input, demonstrating how complex AI tasks decompose into chains of simpler operations. // Level 4: Workflow Builder (game/src/game.js) async executeLevel4() { const level = this.levels.getLevel(4); this.displayLevelHeader(level); const intro = await this.mentor.introduceLevel(4); console.log(`\n🧙 Sage: ${intro}`); console.log('\n📝 Enter text for the 3-step AI pipeline:'); const inputText = await this.askQuestion('Input text: '); // Step 1: Summarize console.log('\n⚙️ Step 1: Summarizing...'); const summary = await this.client.chat([ { role: 'system', content: 'Summarize this in 2 sentences.' }, { role: 'user', content: inputText } ]); console.log(` Result: ${summary}`); // Step 2: Extract keywords (chained from Step 1 output) console.log('\n🔑 Step 2: Extracting keywords...'); const keywords = await this.client.chat([ { role: 'system', content: 'Extract 5 important keywords.' }, { role: 'user', content: summary } ]); console.log(` Keywords: ${keywords}`); // Step 3: Generate questions (chained from Step 2 output) console.log('\n❓ Step 3: Generating study questions...'); const questions = await this.client.chat([ { role: 'system', content: 'Create 3 quiz questions about these topics.' }, { role: 'user', content: keywords } ]); console.log(` Questions:\n${questions}`); console.log('\n✅ Workflow complete!'); const celebration = await this.mentor.celebrateLevelComplete(4); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n⚡ You earned the Workflow Wizard badge!'); console.log('🏆 +250 points'); this.progress.completeLevel(4, 250, '⚡ Workflow Wizard'); } This level bridges the gap between "toy examples" and real applications. Learners see firsthand how combining simple AI operations creates sophisticated functionality. Level 5: Build Your Own Tool challenges learners to create a custom AI-powered tool by selecting from pre-built templates and configuring them. Rather than asking learners to write arbitrary code, the game provides four structured templates that demonstrate how AI tools work in practice: // Level 5: Tool Builder templates (game/web/game-web.js) const TOOL_TEMPLATES = [ { id: 'summarizer', name: '📝 Text Summarizer', description: 'Summarizes long text into key points', systemPrompt: 'You are a text summarization tool. Provide concise summaries.', exampleInput: 'Paste any long article or document...' }, { id: 'translator', name: '🌐 Code Translator', description: 'Translates code between programming languages', systemPrompt: 'You are a code translation tool. Convert code accurately.', exampleInput: 'function hello() { console.log("Hello!"); }' }, { id: 'reviewer', name: '🔍 Code Reviewer', description: 'Reviews code for bugs, style, and improvements', systemPrompt: 'You are a code review tool. Identify issues and suggest fixes.', exampleInput: 'Paste code to review...' }, { id: 'custom', name: '✨ Custom Tool', description: 'Design your own AI tool with a custom system prompt', systemPrompt: '', // Learner provides this exampleInput: '' } ]; // Tool testing sends the configured system prompt + user input to Foundry Local async function testTool(template, userInput) { const response = await callFoundryAPI([ { role: 'system', content: template.systemPrompt }, { role: 'user', content: userInput } ]); console.log(`🔧 Tool output: ${response}`); return response; } This template-based approach is safer and more educational than arbitrary code execution. Learners select a template, customize its system prompt, test it with sample input, and see how the AI responds differently based on the tool's configuration. The "Custom Tool" option lets advanced learners design their own system prompts from scratch. Completing this level marks true understanding—learners aren't just using AI, they're shaping what it can do through prompt design and tool composition. Building the Web Version: Zero-Install Educational Experience The web version demonstrates how to create educational software that requires absolutely zero setup. This is critical for workshops, classroom settings, and casual learners who won't commit to installation until they see value. The architecture is deliberately simple, vanilla JavaScript with ES6 modules, no build tools, no package managers. The HTML includes a multi-screen layout with a welcome screen, level selection grid, game area, and modals for progress, badges, help, and game completion: <!-- game/web/index.html --> <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Foundry Local Learning Adventure</title> <link rel="stylesheet" href="styles.css"> </head> <body> <!-- Welcome Screen with name input --> <div id="welcome-screen" class="screen active"> <h1>🎮 Foundry Local Learning Adventure</h1> <p>Master Microsoft Foundry AI - One Level at a Time!</p> <input type="text" id="player-name" placeholder="Enter your name"> <button id="start-btn">Start Adventure</button> <div id="foundry-status"><!-- Auto-detected connection status --></div> </div> <!-- Menu Screen with level grid --> <div id="menu-screen" class="screen"> <div class="level-grid"> <!-- 5 level cards with lock/unlock states --> </div> <div class="stats-bar"> <span id="points-display">0 points</span> <span id="badges-count">0/5 badges</span> </div> </div> <!-- Level Screen with task area --> <div id="level-screen" class="screen"> <div id="level-header"></div> <div id="task-area"><!-- Level-specific UI loads here --></div> <div id="response-area"></div> <div id="hint-area"></div> </div> <!-- Sage Mentor Chat Widget (fixed bottom-right) --> <div id="mentor-chat" class="mentor-widget"> <div class="mentor-header">🧙 Sage (AI Mentor)</div> <div id="mentor-messages"></div> <input type="text" id="mentor-input" placeholder="Ask Sage anything..."> </div> <script type="module" src="game-data.js"></script> <script type="module" src="game-web.js"></script> </body> </html> A critical feature of the web version is its ability to connect to a real Foundry Local instance. On startup, the game checks for a foundry-port.json file (written by the cross-platform start scripts) and falls back to scanning common ports: // game/web/game-web.js - Foundry Local auto-discovery let foundryConnection = { connected: false, baseUrl: null }; async function checkFoundryConnection() { // Try reading port from discovery file (written by start scripts) const discoveredPort = await readDiscoveredPort(); if (discoveredPort) { try { const resp = await fetch(`${discoveredPort}/v1/models`); if (resp.ok) { foundryConnection = { connected: true, baseUrl: discoveredPort }; updateStatusBadge('🟢 Foundry Local Connected'); return; } } catch (e) { /* continue to port scan */ } } // Scan common Foundry Local ports const ports = [61341, 5272, 51319, 5000, 8080]; for (const port of ports) { try { const resp = await fetch(`http://127.0.0.1:${port}/v1/models`); if (resp.ok) { foundryConnection = { connected: true, baseUrl: `http://127.0.0.1:${port}` }; updateStatusBadge('🟢 Foundry Local Connected'); return; } } catch (e) { continue; } } // Demo mode - use simulated responses from DEMO_RESPONSES updateStatusBadge('🟡 Demo Mode (install Foundry Local for real AI)'); } async function callFoundryAPI(messages) { if (!foundryConnection.connected) { return getDemoResponse(messages); // Simulated responses } const resp = await fetch(`${foundryConnection.baseUrl}/v1/chat/completions`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: 'auto', messages, temperature: 0.7 }) }); const data = await resp.json(); return data.choices[0].message.content; } The web version also includes level-specific UIs: each level type has its own builder function that constructs the appropriate interface. For example, Level 2 (Prompt Improvement) shows a split-view with the bad prompt result on one side and the learner's improved prompt on the other. Level 3 (Embeddings) presents a search interface with similarity scores. Level 5 (Tool Builder) offers a template selector with four options (Text Summarizer, Code Translator, Code Reviewer, and Custom). This architecture teaches several patterns for web-based educational tools: LocalStorage for persistence: Progress survives page refreshes without requiring accounts or databases ES6 modules for organization: Clean separation between game data ( game-data.js ) and engine ( game-web.js ) Hybrid AI mode: Real AI when Foundry Local is available, simulated responses when it's not—same code path for both Multi-screen navigation: Welcome, menu, level, and completion screens provide clear progression Always-available mentor: The Sage chat widget in the corner lets learners ask questions at any point Implementing the CLI Version with Real AI Integration The CLI version provides the authentic AI development experience. This version requires Node.js and Foundry Local, but rewards setup effort with genuine model interactions. Installation uses a startup script that handles prerequisites: #!/bin/bash # scripts/start-game.sh echo "🎮 Starting Foundry Local Learning Adventure..." # Check Node.js if ! command -v node &> /dev/null; then echo "❌ Node.js not found. Install from https://nodejs.org/" exit 1 fi # Check Foundry Local if ! command -v foundry &> /dev/null; then echo "❌ Foundry Local not found." echo " Install: winget install Microsoft.FoundryLocal" exit 1 fi # Start Foundry service echo "🚀 Starting Foundry Local service..." foundry service start # Wait for service sleep 2 # Load model echo "📦 Loading Phi-4 model..." foundry model load phi-4 # Install dependencies echo "📥 Installing game dependencies..." npm install # Start game echo "✅ Launching game..." npm start The game logic integrates with Foundry Local using the official SDK: // game/src/game.js import { FoundryLocalClient } from 'foundry-local-sdk'; import readline from 'readline/promises'; const client = new FoundryLocalClient({ endpoint: 'http://127.0.0.1:5272' // Default Foundry Local port }); async function getAIResponse(prompt, level) { try { const startTime = Date.now(); const completion = await client.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: `You are Sage, a friendly AI mentor teaching ${LEVELS[level-1].title}.` }, { role: 'user', content: prompt } ], temperature: 0.7, max_tokens: 300 }); const latency = Date.now() - startTime; console.log(`\n⏱️ AI responded in ${latency}ms`); return completion.choices[0].message.content; } catch (error) { console.error('❌ AI error:', error.message); console.log('💡 Falling back to demo mode...'); return getDemoResponse(prompt, level); } } async function playLevel(levelNumber) { const level = LEVELS[levelNumber - 1]; console.clear(); console.log(`\n${'='.repeat(60)}`); console.log(` Level ${levelNumber}: ${level.title}`); console.log(`${'='.repeat(60)}\n`); console.log(`🎯 ${level.objective}\n`); console.log(`📚 ${level.description}\n`); const rl = readline.createInterface({ input: process.stdin, output: process.stdout }); const userPrompt = await rl.question('Your prompt: '); rl.close(); console.log('\n🤖 AI is thinking...'); const response = await getAIResponse(userPrompt, levelNumber); console.log(`\n📨 AI Response:\n${response}\n`); // Evaluate success if (level.successCriteria(response, userPrompt)) { celebrateSuccess(level); updateProgress(levelNumber); if (levelNumber < 5) { const playNext = await askYesNo('Play next level?'); if (playNext) { await playLevel(levelNumber + 1); } } else { showGameComplete(); } } else { console.log(`\n💡 Hint: ${level.hints[0]}\n`); const retry = await askYesNo('Try again?'); if (retry) { await playLevel(levelNumber); } } } The CLI version adds several enhancements that deepen learning: Latency visibility: Display response times so learners understand local vs cloud performance differences Graceful fallback: If Foundry Local fails, switch to demo mode automatically rather than crashing Interactive prompts: Use readline for natural command-line interaction patterns Progress persistence: Save to JSON files so learners can pause and resume Command history: Log all prompts and responses for learners to review their progression Key Takeaways and Educational Design Principles Building effective educational software for technical audiences requires balancing several competing concerns: accessibility vs authenticity, simplicity vs depth, guidance vs exploration. The Foundry Local Learning Adventure succeeds by making deliberate architectural choices that prioritize learner experience. Key principles demonstrated: Zero-friction starts win: The web version eliminates all setup barriers, maximizing the chance learners will actually begin Automatic service discovery: The 3-tier port discovery strategy means no manual configuration, just install Foundry Local and play Progressive challenge curves build confidence: Each level introduces exactly one new concept, building on previous knowledge Immediate feedback accelerates learning: Learners know instantly if they succeeded, with Sage providing contextual explanations Real tools create transferable skills: The CLI version uses professional developer patterns (OpenAI-compatible REST APIs, ES modules, readline) that apply beyond the game Celebration creates emotional investment: Badges, points, and Sage's encouragement transform learning into achievement Dual platforms expand reach: Web attracts casual learners, CLI converts them to serious practitioners—and both support real AI Graceful degradation ensures reliability: Three connection modes (local, Azure, demo) mean the game always works regardless of setup To extend this approach for your own educational projects, consider: Domain-specific challenges: Adapt level structure to your technical domain (e.g., API design, database optimization, security practices) Multiplayer competitions: Add leaderboards and time trials to introduce social motivation Adaptive difficulty: Track learner performance and adjust challenge difficulty dynamically Sandbox modes: After completing the curriculum, provide free-play areas for experimentation Community sharing: Let learners share custom levels or challenges they've created The complete implementation with all levels, both web and CLI versions, comprehensive tests, and deployment guides is available at github.com/leestott/FoundryLocal-LearningAdventure. You can play the web version immediately at leestott.github.io/FoundryLocal-LearningAdventure or clone the repository to experience the full CLI version with real AI. Resources and Further Reading Foundry Local Learning Adventure Repository - Complete source code for both web and CLI versions Play Online Now - Try the web version instantly in your browser (supports real AI with Foundry Local installed) Microsoft Foundry Local Documentation - Official SDK and CLI reference Contributing Guide - How to contribute new levels or improvementsBuilding a Smart Building HVAC Digital Twin with AI Copilot Using Foundry Local
Introduction Building operations teams face a constant challenge: optimizing HVAC systems for energy efficiency while maintaining occupant comfort and air quality. Traditional building management systems display raw sensor data, temperatures, pressures, CO₂ levels—but translating this into actionable insights requires deep HVAC expertise. What if operators could simply ask "Why is the third floor so warm?" and get an intelligent answer grounded in real building state? This article demonstrates building a sample smart building digital twin with an AI-powered operations copilot, implemented using DigitalTwin, React, Three.js, and Microsoft Foundry Local. You'll learn how to architect physics-based simulators that model thermal dynamics, implement 3D visualizations of building systems, integrate natural language AI control, and design fault injection systems for testing and training. Whether you're building IoT platforms for commercial real estate, designing energy management systems, or implementing predictive maintenance for building automation, this sample provides proven patterns for intelligent facility operations. Why Digital Twins Matter for Building Operations Physical buildings generate enormous operational data but lack intelligent interpretation layers. A 50,000 square foot office building might have 500+ sensors streaming metrics every minute, zone temperatures, humidity levels, equipment runtimes, energy consumption. Traditional BMS (Building Management Systems) visualize this data as charts and gauges, but operators must manually correlate patterns, diagnose issues, and predict failures. Digital twins solve this through physics-based simulation coupled with AI interpretation. Instead of just displaying current temperature readings, a digital twin models thermal dynamics, heat transfer rates, HVAC response characteristics, occupancy impacts. When conditions deviate from expectations, the twin compares observed versus predicted states, identifying root causes. Layer AI on top, and operators get natural language explanations: "The conference room is 3 degrees too warm because the VAV damper is stuck at 40% open, reducing airflow by 60%." This application focuses on HVAC, the largest building energy consumer, typically 40-50% of total usage. Optimizing HVAC by just 10% through better controls can save thousands of dollars monthly while improving occupant satisfaction. The digital twin enables "what-if" scenarios before making changes: "What happens to energy consumption and comfort if we raise the cooling setpoint by 2 degrees during peak demand response events?" Architecture: Three-Tier Digital Twin System The application implements a clean three-tier architecture separating visualization, simulation, and state management: The frontend uses React with Three.js for 3D visualization. Users see an interactive 3D model of the three-floor building with color-coded zones indicating temperature and CO₂ levels. Click any equipment, AHUs, VAVs, chillers, to see detailed telemetry. The control panel enables adjusting setpoints, running simulation steps, and activating demand response scenarios. Real-time charts display KPIs: energy consumption, comfort compliance, air quality levels. The backend Node.js/Express server orchestrates simulation and state management. It maintains the digital twin state as JSON, the single source of truth for all equipment, zones, and telemetry. REST API endpoints handle control requests, simulation steps, and AI copilot queries. WebSocket connections push real-time updates to the frontend for live monitoring. The HVAC simulator implements physics-based models: 1R1C thermal models for zones, affinity laws for fan power, chiller COP calculations, CO₂ mass balance equations. Foundry Local provides AI copilot capabilities. The backend uses foundry-local-sdk to query locally running models. Natural language queries ("How's the lobby temperature?") get answered with building state context. The copilot can explain anomalies, suggest optimizations, and even execute commands when explicitly requested. Implementing Physics-Based HVAC Simulation Accurate simulation requires modeling actual HVAC physics. The simulator implements several established building energy models: // backend/src/simulator/thermal-model.js class ZoneThermalModel { // 1R1C (one resistance, one capacitance) thermal model static calculateTemperatureChange(zone, delta_t_seconds) { const C_thermal = zone.volume * 1.2 * 1000; // Heat capacity (J/K) const R_thermal = zone.r_value * zone.envelope_area; // Thermal resistance // Internal heat gains (occupancy, equipment, lighting) const Q_internal = zone.occupancy * 100 + // 100W per person zone.equipment_load + zone.lighting_load; // Cooling/heating from HVAC const airflow_kg_s = zone.vav.airflow_cfm * 0.0004719; // CFM to kg/s const c_p_air = 1006; // Specific heat of air (J/kg·K) const Q_hvac = airflow_kg_s * c_p_air * (zone.vav.supply_temp - zone.temperature); // Envelope losses const Q_envelope = (zone.outdoor_temp - zone.temperature) / R_thermal; // Net energy balance const Q_net = Q_internal + Q_hvac + Q_envelope; // Temperature change: Q = C * dT/dt const dT = (Q_net / C_thermal) * delta_t_seconds; return zone.temperature + dT; } } This model captures essential thermal dynamics while remaining computationally fast enough for real-time simulation. It accounts for internal heat generation from occupants and equipment, HVAC cooling/heating contributions, and heat loss through the building envelope. The CO₂ model uses mass balance equations: class AirQualityModel { static calculateCO2Change(zone, delta_t_seconds) { // CO₂ generation from occupants const G_co2 = zone.occupancy * 0.0052; // L/s per person at rest // Outdoor air ventilation rate const V_oa = zone.vav.outdoor_air_cfm * 0.000471947; // CFM to m³/s // CO₂ concentration difference (indoor - outdoor) const delta_CO2 = zone.co2_ppm - 400; // Outdoor ~400ppm // Mass balance: dC/dt = (G - V*ΔC) / Volume const dCO2_dt = (G_co2 - V_oa * delta_CO2) / zone.volume; return zone.co2_ppm + (dCO2_dt * delta_t_seconds); } } These models execute every simulation step, updating the entire building state: async function simulateStep(twin, timestep_minutes) { const delta_t = timestep_minutes * 60; // Convert to seconds // Update each zone for (const zone of twin.zones) { zone.temperature = ZoneThermalModel.calculateTemperatureChange(zone, delta_t); zone.co2_ppm = AirQualityModel.calculateCO2Change(zone, delta_t); } // Update equipment based on zone demands for (const vav of twin.vavs) { updateVAVOperation(vav, twin.zones); } for (const ahu of twin.ahus) { updateAHUOperation(ahu, twin.vavs); } updateChillerOperation(twin.chiller, twin.ahus); updateBoilerOperation(twin.boiler, twin.ahus); // Calculate system KPIs twin.kpis = calculateSystemKPIs(twin); // Detect alerts twin.alerts = detectAnomalies(twin); // Persist updated state await saveTwinState(twin); return twin; } 3D Visualization with React and Three.js The frontend renders an interactive 3D building view that updates in real-time as conditions change. Using React Three Fiber simplifies Three.js integration with React's component model: // frontend/src/components/BuildingView3D.jsx import { Canvas } from '@react-three/fiber'; import { OrbitControls } from '@react-three/drei'; export function BuildingView3D({ twinState }) { return ( {/* Render building floors */} {twinState.zones.map(zone => ( selectZone(zone.id)} /> ))} {/* Render equipment */} {twinState.ahus.map(ahu => ( ))} ); } function ZoneMesh({ zone, onClick }) { const color = getTemperatureColor(zone.temperature, zone.setpoint); return ( ); } function getTemperatureColor(current, setpoint) { const deviation = current - setpoint; if (Math.abs(deviation) < 1) return '#00ff00'; // Green: comfortable if (Math.abs(deviation) < 3) return '#ffff00'; // Yellow: acceptable return '#ff0000'; // Red: uncomfortable } This visualization immediately shows building state at a glance, operators see "hot spots" in red, comfortable zones in green, and can click any area for detailed metrics. Integrating AI Copilot for Natural Language Control The AI copilot transforms building data into conversational insights. Instead of navigating multiple screens, operators simply ask questions: // backend/src/routes/copilot.js import { FoundryLocalClient } from 'foundry-local-sdk'; const foundry = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT }); router.post('/api/copilot/chat', async (req, res) => { const { message } = req.body; // Load current building state const twin = await loadTwinState(); // Build context for AI const context = buildBuildingContext(twin); const completion = await foundry.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: `You are an HVAC operations assistant for a 3-floor office building. Current Building State: ${context} Answer questions about equipment status, comfort conditions, and energy usage. Provide specific, actionable information based on the current data. Do not speculate beyond provided information.` }, { role: 'user', content: message } ], temperature: 0.3, max_tokens: 300 }); res.json({ response: completion.choices[0].message.content, model: 'phi-4', timestamp: new Date().toISOString() }); }); function buildBuildingContext(twin) { const alerts = twin.alerts.filter(a => a.active); const zoneStatus = twin.zones.map(z => `${z.name}: ${z.temperature.toFixed(1)}°F (setpoint: ${z.setpoint}°F), CO₂: ${z.co2_ppm}ppm, ${z.occupancy} people` ); return ` Zones: ${zoneStatus.join('\n')} Active Alerts: ${alerts.length} ${alerts.map(a => `- ${a.severity}: ${a.message}`).join('\n')} Energy: ${twin.kpis.total_power_kw.toFixed(1)} kW Chiller: ${twin.chiller.percent_load}% load, ${twin.chiller.cop.toFixed(2)} COP `.trim(); } Example interactions: User: "Why is the third floor conference room warm?" AI: "The Executive Suite on Floor 3 is currently 75.2°F, 3 degrees above the 72°F setpoint. This is due to high occupancy (8 people) and VAV-F3-EXEC operating at reduced airflow (285 CFM vs 450 CFM capacity). AHU-2 is supplying adequate cool air at 55°F. Recommend checking VAV damper position—it may be partially stuck." User: "What's our current energy usage?" AI: "Total building power consumption is 47.3 kW. Breakdown: Chiller (24.1 kW, 51%), AHU fans (12.8 kW, 27%), Pumps (7.2 kW, 15%), Other (3.2 kW, 7%). Chiller is operating at 62% load with a COP of 4.8, which is efficient for current conditions." User: "Activate demand response level 2" AI: "I cannot directly control building systems for safety reasons. To activate DR Level 2, use the Control Panel and click 'Demand Response' → 'Level 2'. This will raise cooling setpoints by 3°F and reduce auxiliary loads, targeting 15% energy reduction." The AI provides grounded, specific answers citing actual equipment IDs and metrics. It refuses to directly execute control commands, instead guiding operators to explicit control interfaces, a critical safety pattern for building systems. Fault Injection for Testing and Training Real building operations experience equipment failures, stuck dampers, sensor drift, communication losses. The digital twin includes comprehensive fault injection capabilities to train operators and test control logic: // backend/src/simulator/fault-injector.js const FAULT_CATALOG = { chillerFailure: { description: 'Chiller compressor failure', apply: (twin) => { twin.chiller.status = 'FAULT'; twin.chiller.cooling_output = 0; twin.alerts.push({ id: 'chiller-fault', severity: 'CRITICAL', message: 'Chiller compressor failure - no cooling available', equipment: 'CHILLER-01' }); } }, stuckVAVDamper: { description: 'VAV damper stuck at current position', apply: (twin, vavId) => { const vav = twin.vavs.find(v => v.id === vavId); vav.damper_stuck = true; vav.damper_position_fixed = vav.damper_position; twin.alerts.push({ id: `vav-stuck-${vavId}`, severity: 'HIGH', message: `VAV ${vavId} damper stuck at ${vav.damper_position}%`, equipment: vavId }); } }, sensorDrift: { description: 'Temperature sensor reading 5°F high', apply: (twin, zoneId) => { const zone = twin.zones.find(z => z.id === zoneId); zone.sensor_drift = 5.0; zone.temperature_measured = zone.temperature_actual + 5.0; } }, communicationLoss: { description: 'Equipment communication timeout', apply: (twin, equipmentId) => { const equipment = findEquipmentById(twin, equipmentId); equipment.comm_status = 'OFFLINE'; equipment.stale_data = true; twin.alerts.push({ id: `comm-loss-${equipmentId}`, severity: 'MEDIUM', message: `Lost communication with ${equipmentId}`, equipment: equipmentId }); } } }; router.post('/api/twin/fault', async (req, res) => { const { faultType, targetEquipment } = req.body; const twin = await loadTwinState(); const fault = FAULT_CATALOG[faultType]; if (!fault) { return res.status(400).json({ error: 'Unknown fault type' }); } fault.apply(twin, targetEquipment); await saveTwinState(twin); res.json({ message: `Applied fault: ${fault.description}`, affectedEquipment: targetEquipment, timestamp: new Date().toISOString() }); }); Operators can inject faults to practice diagnosis and response. Training scenarios might include: "The chiller just failed during a heat wave, how do you maintain comfort?" or "Multiple VAV dampers are stuck, which zones need immediate attention?" Key Takeaways and Production Deployment Building a physics-based digital twin with AI capabilities requires balancing simulation accuracy with computational performance, providing intuitive visualization while maintaining technical depth, and enabling AI assistance without compromising safety. Key architectural lessons: Physics models enable prediction: Comparing predicted vs observed behavior identifies anomalies that simple thresholds miss 3D visualization improves spatial understanding: Operators immediately see which floors or zones need attention AI copilots accelerate diagnosis: Natural language queries get answers in seconds vs. minutes of manual data examination Fault injection validates readiness: Testing failure scenarios prepares operators for real incidents JSON state enables integration: Simple file-based state makes connecting to real BMS systems straightforward For production deployment, connect the twin to actual building systems via BACnet, Modbus, or MQTT integrations. Replace simulated telemetry with real sensor streams. Calibrate model parameters against historical building performance. Implement continuous learning where the twin's predictions improve as it observes actual building behavior. The complete implementation with simulation engine, 3D visualization, AI copilot, and fault injection system is available at github.com/leestott/DigitalTwin. Clone the repository and run the startup scripts to explore the digital twin, no building hardware required. Resources and Further Reading Smart Building HVAC Digital Twin Repository - Complete source code and simulation engine Setup and Quick Start Guide - Installation instructions and usage examples Microsoft Foundry Local Documentation - AI integration reference HVAC Simulation Documentation - Physics model details and calibration Three.js Documentation - 3D visualization framework ASHRAE Standards - Building energy modeling standardsAgents League: Join the Reasoning Agents Track
In a previous blog post, we introduced Agents League, a two‑week AI agent challenge running February 16–27, and gave an overview of the three available tracks. In this post, we’ll zoom in on one of them in particular:🧠 The Reasoning Agents track, built on Microsoft Foundry. If you’re interested in multi‑step reasoning, planning, verification, and multi‑agent collaboration, this is the track designed for you. What Do We Mean by “Reasoning Agents”? Reasoning agents go beyond simple prompt–response interactions. They are agents that can: Plan how to approach a task Break problems into steps Reason across intermediate results Verify or critique their own outputs Collaborate with other agents to solve more complex problems With Microsoft Foundry (via UI or SDK) and/or the Microsoft Agent Framework, you can design agent systems that reflect real‑world decision‑making patterns—closer to how teams of humans work together. Why Build Reasoning Agents on Microsoft Foundry? Microsoft Foundry provides production‑ready building blocks for agentic systems, without locking you into a single way of working. For the Reasoning Agents track, Foundry enables you to: Define agent roles (planner, executor, verifier, critic, etc.) Orchestrate multi‑agent workflows Integrate tools, APIs, and MCP servers Apply structured reasoning patterns Observe and debug agent behavior as it runs You can work visually in the Foundry UI, programmatically via the SDK, or mix both approaches depending on your project. How to get started? Your first step to enter the arena is registering to the Agents League challenge: https://aka.ms/agentsleague/register. After you registered, navigate to the Reasoning Agent Starter Kit, to get more context about the challenge scenario, an example of multi-agent architecture to address it, along with some guidelines on the tech stack to use and useful resources to get started. There’s no single “correct” project, feel free to unleash your creativity and leverage AI-assisted development tools to accelerate your build process (e.g. GitHub Copilot). 👉 View the Reasoning Agents starter kit: https://github.com/microsoft/agentsleague/starter-kits Live Coding Battle: Reasoning Agents 📽️ Wednesday, Feb 18 – 9:00 AM PT During Week 1, we’re hosting a live coding battle dedicated entirely to the Reasoning Agents track. You’ll watch experienced developers from the community: Design agent architectures live Explain reasoning strategies and trade‑offs Make real‑time decisions about agent roles, tools, and flows The session is streamed on Microsoft Reactor and recorded, so you can watch it live (highly recommended for the best experience!) or later at your convenience. AMA Session on Discord 💬 Wednesday, Feb 25 – 9:00 AM PT In Week 2, it’s your turn to build—and ask questions. Join the Reasoning Agents AMA on Discord to: Ask about agent architecture and reasoning patterns Get clarification on Foundry capabilities Discuss MCP integration and multi‑agent design Get unstuck when your agent doesn’t behave as expected Prizes, Badges, and Recognition 🏆 $500 for the Reasoning Agents track winner 🎖️ Digital badge for everyone who registers and submits a project Important reminder: 👉 You must register before submitting to be eligible for prizes and the badge. Beyond the rewards, every participant receives feedback from Microsoft product teams, which is often the most valuable prize of all. Ready to Build Agents That Reason? If you’ve been curious about: Agentic architectures Multi‑step reasoning Verification and self‑reflection Building AI systems that explain their thinking …then the Reasoning Agents track is your arena. 📝 Register here: https://aka.ms/agentsleague/register 💬 Join Discord: https://aka.ms/agentsleague/discord 📽️ Watch live battles: https://aka.ms/agentsleague/battles The league starts February 16. The reasoning begins now.Title: Synthetic Dataset Format from AI Foundry Not Compatible with Evaluation Schema
Current Situation The synthetic dataset created from AI Foundry Data Synthetic Data is generated in the following messages format { "messages": [ { "role": "system", "content": "You are a helpful assistant" }, { "role": "user", "content": "What is the primary purpose?" }, { "role": "assistant", "content": "The primary purpose is..." } ] } Challenge When attempting evaluation, especially RAG evaluation, the documentation indicates that the dataset must contain structured fields such as question - The query being asked ground_truth - The expected answer Recommended additional fields reference_context metadata Example required format { "question": "", "ground_truth": "", "reference_context": "", "metadata": { "document": "" } } Because the synthetic dataset is in messages format, I am unable to directly map it to the required evaluation schema. Question Is there a recommended or supported way to convert the synthetic dataset generated in AI Foundry messages format into the structured format required for evaluation? Can the user role be mapped to question? Can the assistant role be mapped to ground_truth? Is there any built in transformation option within AI Foundry?37Views0likes0CommentsBuilding Interactive Agent UIs with AG-UI and Microsoft Agent Framework
Introduction Picture this: You've built an AI agent that analyzes financial data. A user uploads a quarterly report and asks: "What are the top three expense categories?" Behind the scenes, your agent parses the spreadsheet, aggregates thousands of rows, and generates visualizations. All in 20 seconds. But the user? They see a loading spinner. Nothing else. No "reading file" message, no "analyzing data" indicator, no hint that progress is being made. They start wondering: Is it frozen? Should I refresh? The problem isn't the agent's capabilities - it's the communication gap between the agent running on the backend and the user interface. When agents perform multi-step reasoning, call external APIs, or execute complex tool chains, users deserve to see what's happening. They need streaming updates, intermediate results, and transparent progress indicators. Yet most agent frameworks force developers to choose between simple request/response patterns or building custom solutions to stream updates to their UIs. This is where AG-UI comes in. AG-UI is a fairly new event-based protocol that standardizes how agents communicate with user interfaces. Instead of every framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of structured events that work consistently across different agent implementations. When an agent starts processing, calls a tool, generates text, or encounters an error, the UI receives explicit, typed events in real time. The beauty of AG-UI is its framework-agnostic design. While this blog post demonstrates integration with Microsoft Agent Framework (MAF), the same AG-UI protocol works with LangGraph, CrewAI, or any other compliant framework. Write your UI code once, and it works with any AG-UI-compliant backend. (Note: MAF supports both Python and .NET - this blog post focuses on the Python implementation.) TL;DR The Problem: Users don't get real-time updates while AI agents work behind the scenes - no progress indicators, no transparency into tool calls, and no insight into what's happening. The Solution: AG-UI is an open, event-based protocol that standardizes real-time communication between AI agents and user interfaces. Instead of each development team and framework inventing custom streaming solutions, AG-UI provides a shared vocabulary of structured events (like TOOL_CALL_START, TEXT_MESSAGE_CONTENT, RUN_FINISHED) that work across any compliant framework. Key Benefits: Framework-agnostic - Write UI code once, works with LangGraph, Microsoft Agent Framework, CrewAI, and more Real-time observability - See exactly what your agent is doing as it happens Server-Sent Events - Built on standard HTTP for universal compatibility Protocol-managed state - No manual conversation history tracking In This Post: You'll learn why AG-UI exists, how it works, and build a complete working application using Microsoft Agent Framework with Python - from server setup to client implementation. What You'll Learn This blog post walks through: Why AG-UI exists - how agent-UI communication has evolved and what problems current approaches couldn't solve How the protocol works - the key design choices that make AG-UI simple, reliable, and framework-agnostic Protocol architecture - the generic components and how AG-UI integrates with agent frameworks Building an AG-UI application - a complete working example using Microsoft Agent Framework with server, client, and step-by-step setup Understanding events - what happens under the hood when your agent runs and how to observe it Thinking in events - how building with AG-UI differs from traditional APIs, and what benefits this brings Making the right choice - when AG-UI is the right fit for your project and when alternatives might be better Estimated reading time: 15 minutes Who this is for: Developers building AI agents who want to provide real-time feedback to users, and teams evaluating standardized approaches to agent-UI communication To appreciate why AG-UI matters, we need to understand the journey that led to its creation. Let's trace how agent-UI communication has evolved through three distinct phases. The Evolution of Agent-UI Communication AI agents have become more capable over time. As they evolved, the way they communicated with user interfaces had to evolve as well. Here's how this evolution unfolded. Phase 1: Simple Request/Response In the early days of AI agent development, the interaction model was straightforward: send a question, wait for an answer, display the result. This synchronous approach mirrored traditional API calls and worked fine for simple scenarios. # Simple, but limiting response = agent.run("What's the weather in Paris?") display(response) # User waits... and waits... Works for: Quick queries that complete in seconds, simple Q&A interactions where immediate feedback and interactivity aren't critical. Breaks down: When agents need to call multiple tools, perform multi-step reasoning, or process complex queries that take 30+ seconds. Users see nothing but a loading spinner, with no insight into what's happening or whether the agent is making progress. This creates a poor user experience and makes it impossible to show intermediate results or allow user intervention. Recognizing these limitations, development teams began experimenting with more sophisticated approaches. Phase 2: Custom Streaming Solutions As agents became more sophisticated, teams recognized the need for incremental feedback and interactivity. Rather than waiting for the complete response, they implemented custom streaming solutions to show partial results as they became available. # Every team invents their own format for chunk in agent.stream("What's the weather?"): display(chunk) # But what about tool calls? Errors? Progress? This was a step forward for building interactive agent UIs, but each team solved the problem differently. Also, different frameworks had incompatible approaches - some streamed only text tokens, others sent structured JSON, and most provided no visibility into critical events like tool calls or errors. The problem: No standardization across frameworks - client code that works with LangGraph won't work with Crew AI, requiring separate implementations for each agent backend Each implementation handles tool calls differently - some send nothing during tool execution, others send unstructured messages Complex state management - clients must track conversation history, manage reconnections, and handle edge cases manually The industry needed a better solution - a common protocol that could work across all frameworks while maintaining the benefits of streaming. Phase 3: Standardized Protocol (AG-UI) AG-UI emerged as a response to the fragmentation problem. Instead of each framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of events that work consistently across different agent implementations. # Standardized events everyone understands async for event in agent.run_stream("What's the weather?"): if event.type == "TEXT_MESSAGE_CONTENT": display_text(event.delta) elif event.type == "TOOL_CALL_START": show_tool_indicator(event.tool_name) elif event.type == "TOOL_CALL_RESULT": show_tool_result(event.result) The key difference is structured observability. Rather than guessing what the agent is doing from unstructured text, clients receive explicit events for every stage of execution: when the agent starts, when it generates text, when it calls a tool, when that tool completes, and when the entire run finishes. What's different: A standardized vocabulary of event types, complete observability into agent execution, and framework-agnostic clients that work with any AG-UI-compliant backend. You write your UI code once, and it works whether the backend uses Microsoft Agent Framework, LangGraph, or any other framework that speaks AG-UI. Now that we've seen why AG-UI emerged and what problems it solves, let's examine the specific design decisions that make the protocol work. These choices weren't arbitrary - each one addresses concrete challenges in building reliable, observable agent-UI communication. The Design Decisions Behind AG-UI Why Server-Sent Events (SSE)? Aspect WebSockets SSE (AG-UI) Complexity Bidirectional Unidirectional (simpler) Firewall/Proxy Sometimes blocked Standard HTTP Reconnection Manual implementation Built-in browser support Use case Real-time games, chat Agent responses (one-way) For agent interactions, you typically only need server→client communication, making SSE a simpler choice. SSE solves the transport problem - how events travel from server to client. But once connected, how does the protocol handle conversation state across multiple interactions? Why Protocol-Managed Threads? # Without protocol threads (client manages): conversation_history = [] conversation_history.append({"role": "user", "content": message}) response = agent.complete(conversation_history) conversation_history.append({"role": "assistant", "content": response}) # Complex, error-prone, doesn't work with multiple clients # With AG-UI (protocol manages): thread = agent.get_new_thread() # Server creates and manages thread agent.run_stream(message, thread=thread) # Server maintains context # Simple, reliable, shareable across clients With transport and state management handled, the final piece is the actual messages flowing through the connection. What information should the protocol communicate, and how should it be structured? Why Standardized Event Types? Instead of parsing unstructured text, clients get typed events: RUN_STARTED - Agent begins (start loading UI) TEXT_MESSAGE_CONTENT - Text chunk (stream to user) TOOL_CALL_START - Tool invoked (show "searching...", "calculating...") TOOL_CALL_RESULT - Tool finished (show result, update UI) RUN_FINISHED - Complete (hide loading) This lets UIs react intelligently without custom parsing logic. Now that we understand the protocol's design choices, let's see how these pieces fit together in a complete system. Architecture Overview Here's how the components interact: The communication between these layers relies on a well-defined set of event types. Here are the core events that flow through the SSE connection: Core Event Types AG-UI provides a standardized set of event types to describe what's happening during an agent's execution: RUN_STARTED - agent begins execution TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END - streaming segments of text TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END, TOOL_CALL_RESULT - tool execution events RUN_FINISHED - agent has finished execution RUN_ERROR - error information This model lets the UI update as the agent runs, rather than waiting for the final response. The generic architecture above applies to any AG-UI implementation. Now let's see how this translates to Microsoft Agent Framework. AG-UI with Microsoft Agent Framework While AG-UI is framework-agnostic, this blog post demonstrates integration with Microsoft Agent Framework (MAF) using Python. MAF is available in both Python and .NET, giving you flexibility to build AG-UI applications in your preferred language. Understanding how MAF implements the protocol will help you build your own applications or work with other compliant frameworks. Integration Architecture The Microsoft Agent Framework integration involves several specialized layers that handle protocol translation and execution orchestration: Understanding each layer: FastAPI Endpoint - Handles HTTP requests and establishes SSE connections for streaming AgentFrameworkAgent - Protocol wrapper that translates between AG-UI events and Agent Framework operations Orchestrators - Manage execution flow, coordinate tool calling sequences, and handle state transitions ChatAgent - Your agent implementation with instructions, tools, and business logic ChatClient - Interface to the underlying language model (Azure OpenAI, OpenAI, or other providers) The good news? When you call add_agent_framework_fastapi_endpoint, all the middleware layers are configured automatically. You simply provide your ChatAgent, and the integration handles protocol translation, event streaming, and state management behind the scenes. Now that we understand both the protocol architecture and the Microsoft Agent Framework integration, let's build a working application. Hands-On: Building Your First AG-UI Application This section demonstrates how to build an AG-UI server and client using Microsoft Agent Framework and FastAPI. Prerequisites Before building your first AG-UI application, ensure you have: Python 3.10 or later installed Basic understanding of async/await patterns in Python Azure CLI installed and authenticated (az login) Azure OpenAI service endpoint and deployment configured (setup guide) Cognitive Services OpenAI Contributor role for your Azure OpenAI resource You'll also need to install the AG-UI integration package: pip install agent-framework-ag-ui --pre This automatically installs agent-framework-core, fastapi, and uvicorn as dependencies. With your environment configured, let's create the server that will host your agent and expose it via the AG-UI protocol. Building the Server Let's create a FastAPI server that hosts an AI agent and exposes it via AG-UI: # server.py import os from typing import Annotated from dotenv import load_dotenv from fastapi import FastAPI from pydantic import Field from agent_framework import ChatAgent, ai_function from agent_framework.azure import AzureOpenAIChatClient from agent_framework_ag_ui import add_agent_framework_fastapi_endpoint from azure.identity import DefaultAzureCredential # Load environment variables from .env file load_dotenv() # Validate environment configuration openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") model_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME") if not openai_endpoint: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_ENDPOINT") if not model_deployment: raise RuntimeError("Missing required environment variable: AZURE_OPENAI_DEPLOYMENT_NAME") # Define tools the agent can use @ai_function def get_order_status( order_id: Annotated[str, Field(description="The order ID to look up (e.g., ORD-001)")] ) -> dict: """Look up the status of a customer order. Returns order status, tracking number, and estimated delivery date. """ # Simulated order lookup orders = { "ORD-001": {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}, "ORD-002": {"status": "processing", "tracking": None, "eta": "Jan 23, 2026"}, "ORD-003": {"status": "delivered", "tracking": "1Z999AA3", "eta": "Delivered Jan 20"}, } return orders.get(order_id, {"status": "not_found", "message": "Order not found"}) # Initialize Azure OpenAI client chat_client = AzureOpenAIChatClient( credential=DefaultAzureCredential(), endpoint=openai_endpoint, deployment_name=model_deployment, ) # Configure the agent with custom instructions and tools agent = ChatAgent( name="CustomerSupportAgent", instructions="""You are a helpful customer support assistant. You have access to a get_order_status tool that can look up order information. IMPORTANT: When a user mentions an order ID (like ORD-001, ORD-002, etc.), you MUST call the get_order_status tool to retrieve the actual order details. Do NOT make up or guess order information. After calling get_order_status, provide the actual results to the user in a friendly format.""", chat_client=chat_client, tools=[get_order_status], ) # Initialize FastAPI application app = FastAPI( title="AG-UI Customer Support Server", description="Interactive AI agent server using AG-UI protocol with tool calling" ) # Mount the AG-UI endpoint add_agent_framework_fastapi_endpoint(app, agent, path="/chat") def main(): """Entry point for the AG-UI server.""" import uvicorn print("Starting AG-UI server on http://localhost:8000") uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info") # Run the application if __name__ == "__main__": main() What's happening here: We define a tool: get_order_status with the AI_function decorator Use Annotated and Field for parameter descriptions to help the agent understand when and how to use the tool We create an Azure OpenAI chat client with credential authentication The ChatAgent is configured with domain-specific instructions and the tools parameter add_agent_framework_fastapi_endpoint automatically handles SSE streaming and tool execution The server exposes the agent at the /chat endpoint Note: This example uses Azure OpenAI, but AG-UI works with any chat model. You can also integrate with Azure AI Foundry's model catalog or use other LLM providers. Tool calling is supported by most modern LLMs including GPT-4, GPT-4o, and Claude models. To run this server: # Set your Azure OpenAI credentials export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o" # Start the server python server.py With your server running and exposing the AG-UI endpoint, the next step is building a client that can connect and consume the event stream. Streaming Results to Clients With the server running, clients can connect and stream events as the agent processes requests. Here's a Python client that demonstrates the streaming capabilities: # client.py import asyncio import os from dotenv import load_dotenv from agent_framework import ChatAgent, FunctionCallContent, FunctionResultContent from agent_framework_ag_ui import AGUIChatClient # Load environment variables from .env file load_dotenv() async def interactive_chat(): """Interactive chat session with streaming responses.""" # Connect to the AG-UI server base_url = os.getenv("AGUI_SERVER_URL", "http://localhost:8000/chat") print(f"Connecting to: {base_url}\n") # Initialize the AG-UI client client = AGUIChatClient(endpoint=base_url) # Create a local agent representation agent = ChatAgent(chat_client=client) # Start a new conversation thread conversation_thread = agent.get_new_thread() print("Chat started! Type 'exit' or 'quit' to end the session.\n") try: while True: # Collect user input user_message = input("You: ") # Handle empty input if not user_message.strip(): print("Please enter a message.\n") continue # Check for exit commands if user_message.lower() in ["exit", "quit", "bye"]: print("\nGoodbye!") break # Stream the agent's response print("Agent: ", end="", flush=True) # Track tool calls to avoid duplicate prints seen_tools = set() async for update in agent.run_stream(user_message, thread=conversation_thread): # Display text content if update.text: print(update.text, end="", flush=True) # Display tool calls and results for content in update.contents: if isinstance(content, FunctionCallContent): # Only print each tool call once if content.call_id not in seen_tools: seen_tools.add(content.call_id) print(f"\n[Calling tool: {content.name}]", flush=True) elif isinstance(content, FunctionResultContent): # Only print each result once result_id = f"result_{content.call_id}" if result_id not in seen_tools: seen_tools.add(result_id) result_text = content.result if isinstance(content.result, str) else str(content.result) print(f"[Tool result: {result_text}]", flush=True) print("\n") # New line after response completes except KeyboardInterrupt: print("\n\nChat interrupted by user.") except ConnectionError as e: print(f"\nConnection error: {e}") print("Make sure the server is running.") except Exception as e: print(f"\nUnexpected error: {e}") def main(): """Entry point for the AG-UI client.""" asyncio.run(interactive_chat()) if __name__ == "__main__": main() Key features: The client connects to the AG-UI endpoint using AGUIChatClient with the endpoint parameter run_stream() yields updates containing text and content as they arrive Tool calls are detected using FunctionCallContent and displayed with [Calling tool: ...] Tool results are detected using FunctionResultContent and displayed with [Tool result: ...] Deduplication logic (seen_tools set) prevents printing the same tool call multiple times as it streams Thread management maintains conversation context across messages Graceful error handling for connection issues To use the client: # Optional: specify custom server URL export AGUI_SERVER_URL="http://localhost:8000/chat" # Start the interactive chat python client.py Example Session: Connecting to: http://localhost:8000/chat Chat started! Type 'exit' or 'quit' to end the session. You: What's the status of order ORD-001? Agent: [Calling tool: get_order_status] [Tool result: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}] Your order ORD-001 has been shipped! - Tracking Number: 1Z999AA1 - Estimated Delivery Date: January 25, 2026 You can use the tracking number to monitor the delivery progress. You: Can you check ORD-002? Agent: [Calling tool: get_order_status] [Tool result: {"status": "processing", "tracking": null, "eta": "Jan 23, 2026"}] Your order ORD-002 is currently being processed. - Status: Processing - Estimated Delivery: January 23, 2026 Your order should ship soon, and you'll receive a tracking number once it's on the way. You: exit Goodbye! The client we just built handles events at a high level, abstracting away the details. But what's actually flowing through that SSE connection? Let's peek under the hood. Event Types You'll See As the server streams back responses, clients receive a series of structured events. If you were to observe the raw SSE stream (e.g., using curl), you'd see events like: curl -N http://localhost:8000/chat \ -H "Content-Type: application/json" \ -H "Accept: text/event-stream" \ -d '{"messages": [{"role": "user", "content": "What'\''s the status of order ORD-001?"}]}' Sample event stream (with tool calling): data: {"type":"RUN_STARTED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} data: {"type":"TEXT_MESSAGE_START","messageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c","role":"assistant"} data: {"type":"TOOL_CALL_START","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","toolCallName":"get_order_status","parentMessageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"{\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"order"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"_id"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\":\""} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"ORD"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"-"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"001"} data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\"}"} data: {"type":"TOOL_CALL_END","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y"} data: {"type":"TOOL_CALL_RESULT","messageId":"f048cb0a-a049-4a51-9403-a05e4820438a","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","content":"{\"status\": \"shipped\", \"tracking\": \"1Z999AA1\", \"eta\": \"Jan 25, 2026\"}","role":"tool"} data: {"type":"TEXT_MESSAGE_START","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","role":"assistant"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"Your"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" order"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" ORD"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"-"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"001"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" has"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" been"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" shipped"} data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"!"} ... (additional TEXT_MESSAGE_CONTENT events streaming the response) ... data: {"type":"TEXT_MESSAGE_END","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf"} data: {"type":"RUN_FINISHED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"} Understanding the flow: RUN_STARTED - Agent begins processing the request TEXT_MESSAGE_START - First message starts (will contain tool calls) TOOL_CALL_START - Agent invokes the get_order_status tool Multiple TOOL_CALL_ARGS events - Arguments stream incrementally as JSON chunks ({"order_id":"ORD-001"}) TOOL_CALL_END - Tool invocation structure complete TOOL_CALL_RESULT - Tool execution finished with result data TEXT_MESSAGE_START - Second message starts (the final response) Multiple TEXT_MESSAGE_CONTENT events - Response text streams word-by-word TEXT_MESSAGE_END - Response message complete RUN_FINISHED - Entire run completed successfully This granular event model enables rich UI experiences - showing tool execution indicators ("Searching...", "Calculating..."), displaying intermediate results, and providing complete transparency into the agent's reasoning process. Seeing the raw events helps, but truly working with AG-UI requires a shift in how you think about agent interactions. Let's explore this conceptual change. The Mental Model Shift Traditional API Thinking # Imperative: Call and wait response = agent.run("What's 2+2?") print(response) # "The answer is 4" Mental model: Function call with return value AG-UI Thinking # Reactive: Subscribe to events async for event in agent.run_stream("What's 2+2?"): match event.type: case "RUN_STARTED": show_loading() case "TEXT_MESSAGE_CONTENT": display_chunk(event.delta) case "RUN_FINISHED": hide_loading() Mental model: Observable stream of events This shift feels similar to: Moving from synchronous to async code Moving from REST to event-driven architecture Moving from polling to pub/sub This mental shift isn't just philosophical - it unlocks concrete benefits that weren't possible with request/response patterns. What You Gain Observability # You can SEE what the agent is doing TOOL_CALL_START: "get_order_status" TOOL_CALL_ARGS: {"order_id": "ORD-001"} TOOL_CALL_RESULT: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"} TEXT_MESSAGE_START: "Your order ORD-001 has been shipped..." Interruptibility # Future: Cancel long-running operations async for event in agent.run_stream(query): if user_clicked_cancel: await agent.cancel(thread_id, run_id) break Transparency # Users see the reasoning process "Looking up order ORD-001..." "Order found: Status is 'shipped'" "Retrieving tracking information..." "Your order has been shipped with tracking number 1Z999AA1..." To put these benefits in context, here's how AG-UI compares to traditional approaches across key dimensions: AG-UI vs. Traditional Approaches Aspect Traditional REST Custom Streaming AG-UI Connection Model Request/Response Varies Server-Sent Events State Management Manual Manual Protocol-managed Tool Calling Invisible Custom format Standardized events Framework Varies Framework-locked Framework-agnostic Browser Support Universal Varies Universal Implementation Simple Complex Moderate Ecosystem N/A Isolated Growing You've now seen AG-UI's design principles, implementation details, and conceptual foundations. But the most important question remains: should you actually use it? Conclusion: Is AG-UI Right for Your Project? AG-UI represents a shift toward standardized, observable agent interactions. Before adopting it, understand where the protocol stands and whether it fits your needs. Protocol Maturity The protocol is stable enough for production use but still evolving: Ready now: Core specification stable, Microsoft Agent Framework integration available, FastAPI/Python implementation mature, basic streaming and threading work reliably. Choose AG-UI If You Building new agent projects - No legacy API to maintain, want future compatibility with emerging ecosystem Need streaming observability - Multi-step workflows where users benefit from seeing each stage of execution Want framework flexibility - Same client code works with any AG-UI-compliant backend Comfortable with evolving standards - Can adapt to protocol changes as it matures Stick with Alternatives If You Have working solutions - Custom streaming working well, migration cost not justified Need guaranteed stability - Mission-critical systems where breaking changes are unacceptable Build simple agents - Single-step request/response without tool calling or streaming needs Risk-averse environment - Large existing implementations where proven approaches are required Beyond individual project decisions, it's worth considering AG-UI's role in the broader ecosystem. The Bigger Picture While this blog post focused on Microsoft Agent Framework, AG-UI's true power lies in its broader mission: creating a common language for agent-UI communication across the entire ecosystem. As more frameworks adopt it, the real value emerges: write your UI once, work with any compliant agent framework. Think of it like GraphQL for APIs or OpenAPI for REST - a standardization layer that benefits the entire ecosystem. The protocol is young, but the problem it solves is real. Whether you adopt it now or wait for broader adoption, understanding AG-UI helps you make informed architectural decisions for your agent applications. Ready to dive deeper? Here are the official resources to continue your AG-UI journey. Resources AG-UI & Microsoft Agent Framework Getting Started with AG-UI (Microsoft Learn) - Official tutorial AG-UI Integration Overview - Architecture and concepts AG-UI Protocol Specification - Official protocol documentation Backend Tool Rendering - Adding function tools Security Considerations - Production security guidance Microsoft Agent Framework Documentation - Framework overview AG-UI Dojo Examples - Live demonstrations UI Components & Integration CopilotKit for Microsoft Agent Framework - React component library Community & Support Microsoft Q&A - Community support Agent Framework GitHub - Source code and issues Related Technologies Azure AI Foundry Documentation - Azure AI platform FastAPI Documentation - Web framework Server-Sent Events (SSE) Specification - Protocol standard This blog post introduces AG-UI with Microsoft Agent Framework, focusing on fundamental concepts and building your first interactive agent application.Is there a way to connect 2 Ai foundry to the same cosmos containers?
I defined Azure AI Foundry Connection for Azure Cosmos DB and BYO Thread Storage in Azure AI Agent Service by using these instructions: Integration with Azure AI Agent Service - Azure Cosmos DB for NoSQL | Microsoft Learn I see that it created 3 containers under the cosmos I provided: <guid>-agent-entity-store v-system-thread-message-store <guid>-thread-message-store Now I created another AI foundry and added a connection for the same AI foundry, and it created 3 different containers under the same DB. Is there a way that they'll use the same exact containers? I want to use multiple AI foundries, and they will use the same Cosmos containers to manage the data.41Views0likes0CommentsAgents League: Two Weeks, Three Tracks, One Challenge
We're inviting all developers to join Agents League, running February 16-27. It's a two-week challenge where you'll build AI agents using production-ready tools, learn from live coding sessions, and get feedback directly from Microsoft product teams. We've put together starter kits for each track to help you get up and running quickly that also includes requirements and guidelines. Whether you want to explore what GitHub Copilot can do beyond autocomplete, build reasoning agents on Microsoft Foundry, or create enterprise integrations for Microsoft 365 Copilot, we have a track for you. Important: Register first to be eligible for prizes and your digital badge. Without registration, you won't qualify for awards or receive a badge when you submit. What Is Agents League? It's a 2-week competition that combines learning with building: 📽️ Live coding battles – Watch Product teams, MVPs and community members tackle challenges in real-time on Microsoft Reactor 💻 Async challenges – Build at your own pace, on your schedule 💬 Discord community – Connect with other participants, join AMAs, and get help when you need it 🏆 Prizes – $500 per track winner, plus GitHub Copilot Pro subscriptions for top picks The Three Tracks 🎨 Creative Apps — Build with GitHub Copilot (Chat, CLI, or SDK) 🧠 Reasoning Agents — Build with Microsoft Foundry 💼 Enterprise Agents — Build with M365 Agents Toolkit (or Copilot Studio) More details on each track below, or jump straight to the starter kits. The Schedule Agents League starts on February 16th and runs through Feburary 27th. Within 2 weeks, we host live battles on Reactor and AMA sessions on Discord. Week 1: Live Battles (Feb 17-19) We're kicking off with live coding battles streamed on Microsoft Reactor. Watch experienced developers compete in real-time, explaining their approach and architectural decisions as they go. Tue Feb 17, 9 AM PT — 🎨 Creative Apps battle Wed Feb 18, 9 AM PT — 🧠 Reasoning Agents battle Thu Feb 19, 9 AM PT — 💼 Enterprise Agents battle All sessions are recorded, so you can watch on your own schedule. Week 2: Build + AMAs (Feb 24-26) This is your time to build and ask questions on Discord. The async format means you work when it suits you, evenings, weekends, whatever fits your schedule. We're also hosting AMAs on Discord where you can ask questions directly to Microsoft experts and product teams: Tue Feb 24, 9 AM PT — 🎨 Creative Apps AMA Wed Feb 25, 9 AM PT — 🧠 Reasoning Agents AMA Thu Feb 26, 9 AM PT — 💼 Enterprise Agents AMA Bring your questions, get help when you're stuck, and share what you're building with the community. Pick Your Track We've created a starter kit for each track with setup guides, project ideas, and example scenarios to help you get started quickly. 🎨 Creative Apps Tool: GitHub Copilot (Chat, CLI, or SDK) Build innovative, imaginative applications that showcase the potential of AI-assisted development. All application types are welcome, web apps, CLI tools, games, mobile apps, desktop applications, and more. The starter kit walks you through GitHub Copilot's different modes and provides prompting tips to get the best results. View the Creative Apps starter kit. 🧠 Reasoning Agents Tool: Microsoft Foundry (UI or SDK) and/or Microsoft Agent Framework Build a multi-agent system that leverages advanced reasoning capabilities to solve complex problems. This track focuses on agents that can plan, reason through multi-step problems, and collaborate. The starter kit includes architecture patterns, reasoning strategies (planner-executor, critic/verifier, self-reflection), and integration guides for tools and MCP servers. View the Reasoning Agents starter kit. 💼 Enterprise Agents Tool: M365 Agents Toolkit or Copilot Studio Create intelligent agents that extend Microsoft 365 Copilot to address real-world enterprise scenarios. Your agent must work on Microsoft 365 Copilot Chat. Bonus points for: MCP server integration, OAuth security, Adaptive Cards UI, connected agents (multi-agent architecture). View the Enterprise Agents starter kit. Prizes & Recognition To be eligible for prizes and your digital badge, you must register before submitting your project. Category Winners ($500 each): 🎨 Creative Apps winner 🧠 Reasoning Agents winner 💼 Enterprise Agents winner GitHub Copilot Pro subscriptions: Community Favorite (voted by participants on Discord) Product Team Picks (selected by Microsoft product teams) Everyone who registers and submits a project wins: A digital badge to showcase their participation. Beyond the prizes, every participant gets feedback from the teams who built these tools, a valuable opportunity to learn and improve your approach to AI agent development. How to Get Started Register first — This is required to be eligible for prizes and to receive your digital badge. Without registration, your submission won't qualify for awards or a badge. Pick a track — Choose one track. Explore the starter kits to help you decide. Watch the battles — See how experienced developers approach these challenges. Great for learning even if you're still deciding whether to compete. Build your project — You have until Feb 27. Work on your own schedule. Submit via GitHub — Open an issue using the project submission template. Join us on Discord — Get help, share your progress, and vote for your favorite projects on Discord. Links Register: https://aka.ms/agentsleague/register Starter Kits: https://github.com/microsoft/agentsleague/starter-kits Discord: https://aka.ms/agentsleague/discord Live Battles: https://aka.ms/agentsleague/battles Submit Project: Project submission template