Building HIPAA-Compliant Medical Transcription with Local AI Introduction Healthcare organizations generate vast amounts of spoken content, patient consultations, research interviews, clinical notes, medical conferences. Transcribing these recordings traditionally requires either manual typing (time-consuming and expensive) or cloud transcription services (creating immediate HIPAA compliance concerns). Every audio file sent to external APIs exposes Protected Health Information (PHI), requires Business Associate Agreements, creates audit trails on third-party servers, and introduces potential breach vectors. This sample solution lies in on-premises voice-to-text systems that process audio entirely locally, never sending PHI beyond organizational boundaries. This article demonstrates building a sample medical transcription application using FLWhisper, ASP.NET Core, C#, and Microsoft Foundry Local with OpenAI Whisper models. You'll learn how to build sample HIPAA-compliant audio processing, integrate Whisper models for medical terminology accuracy, design privacy-first API patterns, and build responsive web UIs for healthcare workflows. Whether you're developing electronic health record (EHR) integrations, building clinical research platforms, or implementing dictation systems for medical practices, this sample could be a great starting point for privacy-first speech recognition. Why Local Transcription Is Critical for Healthcare Healthcare data handling is fundamentally different from general business data due to HIPAA regulations, state privacy laws, and professional ethics obligations. Understanding these requirements explains why cloud transcription services, despite their convenience, create unacceptable risks for medical applications. HIPAA compliance mandates strict controls over PHI. Every system that touches patient data must implement administrative, physical, and technical safeguards. Cloud transcription APIs require Business Associate Agreements (BAAs), but even with paperwork, you're entrusting PHI to external systems. Every API call creates logs on vendor servers, potentially in multiple jurisdictions. Data breaches at transcription vendors expose patient information, creating liability for healthcare organizations. On-premises processing eliminates these third-party risks entirely, PHI never leaves your controlled environment. US State laws increasingly add requirements beyond HIPAA. California's CCPA, New York's SHIELD Act, and similar legislation create additional compliance obligations. International regulations like GDPR prohibit transferring health data outside approved jurisdictions. Local processing simplifies compliance by keeping data within organizational boundaries. Research applications face even stricter requirements. Institutional Review Boards (IRBs) often require explicit consent for data sharing with external parties. Cloud transcription may violate study protocols that promise "no third-party data sharing." Clinical trials in pharmaceutical development handle proprietary information alongside PHI, double jeopardy for data exposure. Local transcription maintains research integrity while enabling audio analysis. Cost considerations favor local deployment at scale. Medical organizations generate substantial audio, thousands of patient encounters monthly. Cloud APIs charge per minute of audio, creating significant recurring costs. Local models have fixed infrastructure costs that scale economically. 
A modest GPU server can process hundreds of hours monthly at predictable expense.

Latency matters for clinical workflows. Doctors and nurses need transcriptions available immediately after patient encounters so they can review and edit while details are fresh. Cloud APIs introduce network delays, especially problematic in rural health facilities with limited connectivity. Local inference provides <1 second turnaround for typical consultation lengths.

Application Architecture: ASP.NET Core with Foundry Local

The sample FLWhisper application implements a clean separation between audio handling, AI inference, and state management using modern .NET patterns.

The ASP.NET Core 10 minimal API provides HTTP endpoints for health checks, audio transcription, and sample file streaming. Minimal APIs reduce boilerplate while maintaining full middleware support for error handling, authentication, and CORS. The API design follows OpenAI's transcription endpoint specification, enabling drop-in replacement for existing integrations.

The service layer encapsulates business logic: FoundryModelService manages model loading and lifetime, TranscriptionService handles audio processing and AI inference, and SampleAudioService provides demonstration files for testing. This separation enables easy testing, dependency injection, and service swapping.

Foundry Local integration uses the Microsoft.AI.Foundry.Local.WinML SDK. Unlike cloud APIs requiring authentication and network calls, this SDK communicates directly with the local Foundry service via in-process calls. Models load once at startup, remaining resident in memory for sub-second inference on subsequent requests.

The static file frontend delivers vanilla HTML/CSS/JavaScript with no framework overhead. This simplicity aids healthcare IT security audits and enables deployment on locked-down hospital networks. The UI provides file upload, sample selection, audio preview, transcription requests, and result display with copy-to-clipboard functionality.

Here's the architectural flow for transcription requests:

```
Web UI (Upload Audio File)
  ↓ POST /v1/audio/transcriptions (Multipart Form Data)
  ↓ ASP.NET Core API Route
  ↓ TranscriptionService.TranscribeAudio(audioStream)
  ↓ Foundry Local Model (Whisper Medium locally)
  ↓ Text Result + Metadata (language, duration)
  ↓ Return JSON/Text Response
  ↓ Display in UI
```

This architecture embodies several healthcare system design principles:

- Data never leaves the device: all processing occurs on-premises, with no external API calls
- No data persistence by default: audio and transcripts are session-only, never saved unless explicitly configured
- Comprehensive health checks: system readiness is verified before accepting PHI
- Audit logging support: structured logging for compliance documentation
- Graceful degradation: clear error messages when models are unavailable rather than silent failures

Setting Up Foundry Local with Whisper Models

Foundry Local supports multiple Whisper model sizes, each with different accuracy/speed tradeoffs. For medical transcription, accuracy is paramount: misheard drug names or dosages create patient safety risks.

```bash
# Install Foundry Local (Windows)
winget install Microsoft.FoundryLocal

# Verify installation
foundry --version

# Download Whisper Medium model (optimal for medical accuracy)
foundry model add openai-whisper-medium-generic-cpu:1

# Check model availability
foundry model list
```

Whisper Medium (769M parameters) provides the best balance for medical use. Smaller models (Tiny, Base) miss medical terminology frequently.
Larger models (Large) offer marginal accuracy gains at 3x the inference time. Medium handles medical vocabulary well (drug names, anatomical terms, procedure names) while processing typical consultation audio (5-10 minutes) in under 30 seconds.

The application detects and loads the model automatically:

```csharp
// Services/FoundryModelService.cs
using Microsoft.AI.Foundry.Local.WinML;

public class FoundryModelService
{
    private readonly ILogger<FoundryModelService> _logger;
    private readonly FoundryOptions _options;
    private ILocalAIModel? _loadedModel;

    public FoundryModelService(
        ILogger<FoundryModelService> logger,
        IOptions<FoundryOptions> options)
    {
        _logger = logger;
        _options = options.Value;
    }

    public async Task<bool> InitializeModelAsync()
    {
        try
        {
            _logger.LogInformation(
                "Loading Foundry model: {ModelAlias}",
                _options.ModelAlias
            );

            // Load model from Foundry Local
            _loadedModel = await FoundryClient.LoadModelAsync(
                modelAlias: _options.ModelAlias,
                cancellationToken: CancellationToken.None
            );

            if (_loadedModel == null)
            {
                _logger.LogWarning("Model loaded but returned null instance");
                return false;
            }

            _logger.LogInformation(
                "Successfully loaded model: {ModelAlias}",
                _options.ModelAlias
            );
            return true;
        }
        catch (Exception ex)
        {
            _logger.LogError(
                ex,
                "Failed to load Foundry model: {ModelAlias}",
                _options.ModelAlias
            );
            return false;
        }
    }

    public ILocalAIModel? GetLoadedModel() => _loadedModel;

    public async Task UnloadModelAsync()
    {
        if (_loadedModel != null)
        {
            await FoundryClient.UnloadModelAsync(_loadedModel);
            _loadedModel = null;
            _logger.LogInformation("Model unloaded");
        }
    }
}
```

Configuration lives in appsettings.json, enabling easy customization without code changes:

```json
{
  "Foundry": {
    "ModelAlias": "whisper-medium",
    "LogLevel": "Information"
  },
  "Transcription": {
    "MaxAudioDurationSeconds": 300,
    "SupportedFormats": ["wav", "mp3", "m4a", "flac"],
    "DefaultLanguage": "en"
  }
}
```

Implementing Privacy-First Transcription Service

The transcription service handles audio processing while maintaining strict privacy controls. No audio or transcript persists beyond the HTTP request lifecycle unless explicitly configured:

```csharp
// Services/TranscriptionService.cs
public class TranscriptionService
{
    private readonly FoundryModelService _modelService;
    private readonly ILogger<TranscriptionService> _logger;

    public async Task<TranscriptionResult> TranscribeAudioAsync(
        Stream audioStream,
        string originalFileName,
        TranscriptionOptions? options = null)
    {
        options ??= new TranscriptionOptions();
        var startTime = DateTime.UtcNow;

        try
        {
            // Validate audio format
            ValidateAudioFormat(originalFileName);

            // Get loaded model
            var model = _modelService.GetLoadedModel();
            if (model == null)
            {
                throw new InvalidOperationException("Whisper model not loaded");
            }

            // Create temporary file (automatically deleted after transcription)
            using var tempFile = new TempAudioFile(audioStream);

            // Execute transcription
            _logger.LogInformation(
                "Starting transcription for file: {FileName}",
                originalFileName
            );

            var transcription = await model.TranscribeAsync(
                audioFilePath: tempFile.Path,
                language: options.Language,
                cancellationToken: CancellationToken.None
            );

            var duration = (DateTime.UtcNow - startTime).TotalSeconds;

            _logger.LogInformation(
                "Transcription completed in {Duration:F2}s",
                duration
            );

            return new TranscriptionResult
            {
                Text = transcription.Text,
                Language = transcription.Language ?? options.Language,
                Duration = transcription.AudioDuration,
                ProcessingTimeSeconds = duration,
                FileName = originalFileName,
                Timestamp = DateTime.UtcNow
            };
        }
        catch (Exception ex)
        {
            _logger.LogError(
                ex,
                "Transcription failed for file: {FileName}",
                originalFileName
            );
            throw;
        }
    }

    private void ValidateAudioFormat(string fileName)
    {
        var extension = Path.GetExtension(fileName).TrimStart('.');
        var supportedFormats = new[] { "wav", "mp3", "m4a", "flac", "ogg" };

        if (!supportedFormats.Contains(extension.ToLowerInvariant()))
        {
            throw new ArgumentException(
                $"Unsupported audio format: {extension}. " +
                $"Supported: {string.Join(", ", supportedFormats)}"
            );
        }
    }
}

// Temporary file wrapper that auto-deletes
internal class TempAudioFile : IDisposable
{
    public string Path { get; }

    public TempAudioFile(Stream sourceStream)
    {
        Path = System.IO.Path.GetTempFileName();
        using var fileStream = File.OpenWrite(Path);
        sourceStream.CopyTo(fileStream);
    }

    public void Dispose()
    {
        try
        {
            if (File.Exists(Path))
            {
                File.Delete(Path);
            }
        }
        catch
        {
            // Ignore deletion errors in temp folder
        }
    }
}
```

This service demonstrates several privacy-first patterns:

- Temporary file lifecycle management: audio is written to temp storage and automatically deleted after transcription
- No implicit persistence: results are returned to the caller, not saved by the service
- Format validation: only supported audio formats are accepted, preventing processing errors
- Comprehensive logging: an audit trail for compliance without logging PHI content
- Error isolation: exceptions contain diagnostic info but no patient data
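The TranscriptionOptions and TranscriptionResult types used above are defined in the repository but not reproduced in these excerpts. A minimal sketch, inferred from how the service and the API endpoint use them (property types and defaults are assumptions), might look like this:

```csharp
// Sketch only: shapes inferred from usage in TranscriptionService and the API endpoint.
public class TranscriptionOptions
{
    // Language hint passed to Whisper; "en" mirrors DefaultLanguage in appsettings.json.
    public string Language { get; set; } = "en";
}

public class TranscriptionResult
{
    public string Text { get; set; } = string.Empty;   // transcribed text
    public string? Language { get; set; }              // detected or requested language
    public double Duration { get; set; }               // audio duration reported by the model
    public double ProcessingTimeSeconds { get; set; }  // wall-clock inference time
    public string FileName { get; set; } = string.Empty;
    public DateTime Timestamp { get; set; }
}
```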
"en"; try { // Process transcription using var stream = audioFile.OpenReadStream(); var result = await transcriptionService.TranscribeAudioAsync( audioStream: stream, originalFileName: audioFile.FileName, options: new TranscriptionOptions { Language = language } ); // Return in requested format if (format == "json") { return Results.Json(new { text = result.Text, language = result.Language, duration = result.Duration }); } else { // Default: plain text return Results.Text(result.Text); } } catch (Exception ex) { logger.LogError(ex, "Transcription request failed"); return Results.StatusCode(500); } }) .DisableAntiforgery() // File uploads need CSRF exemption .WithName("TranscribeAudio") .WithOpenApi(); app.Run(); Example API usage: # PowerShell $audioFile = Get-Item "consultation-recording.wav" $response = Invoke-RestMethod ` -Uri "http://localhost:5192/v1/audio/transcriptions" ` -Method Post ` -Form @{ file = $audioFile; format = "json" } Write-Output $response.text # cURL curl -X POST http://localhost:5192/v1/audio/transcriptions \ -F "file=@consultation-recording.wav" \ -F "format=json" Building the Interactive Web Frontend The web UI provides a user-friendly interface for non-technical medical staff to transcribe recordings: SarahCare Medical Transcription The JavaScript handles file uploads and API interactions: // wwwroot/app.js let selectedFile = null; async function checkHealth() { try { const response = await fetch('/health'); const statusEl = document.getElementById('status'); if (response.ok) { statusEl.className = 'status-badge online'; statusEl.textContent = '✓ System Ready'; } else { statusEl.className = 'status-badge offline'; statusEl.textContent = '✗ System Unavailable'; } } catch (error) { console.error('Health check failed:', error); } } function handleFileSelect(event) { const file = event.target.files[0]; if (!file) return; selectedFile = file; // Show file info const fileInfo = document.getElementById('fileInfo'); fileInfo.textContent = `Selected: ${file.name} (${formatFileSize(file.size)})`; fileInfo.classList.remove('hidden'); // Enable audio preview const preview = document.getElementById('audioPreview'); preview.src = URL.createObjectURL(file); preview.classList.remove('hidden'); // Enable transcribe button document.getElementById('transcribeBtn').disabled = false; } async function transcribeAudio() { if (!selectedFile) return; const loadingEl = document.getElementById('loadingIndicator'); const resultEl = document.getElementById('resultSection'); const transcribeBtn = document.getElementById('transcribeBtn'); // Show loading state loadingEl.classList.remove('hidden'); resultEl.classList.add('hidden'); transcribeBtn.disabled = true; try { const formData = new FormData(); formData.append('file', selectedFile); formData.append('format', 'json'); const startTime = Date.now(); const response = await fetch('/v1/audio/transcriptions', { method: 'POST', body: formData }); if (!response.ok) { throw new Error(`HTTP ${response.status}: ${response.statusText}`); } const result = await response.json(); const processingTime = ((Date.now() - startTime) / 1000).toFixed(1); // Display results document.getElementById('transcriptionText').value = result.text; document.getElementById('resultDuration').textContent = `Duration: ${result.duration.toFixed(1)}s`; document.getElementById('resultLanguage').textContent = `Language: ${result.language}`; resultEl.classList.remove('hidden'); console.log(`Transcription completed in ${processingTime}s`); } catch (error) { 
Building the Interactive Web Frontend

The web UI provides a user-friendly interface for non-technical medical staff to transcribe recordings. The page markup (titled "SarahCare Medical Transcription") is available in the repository; the JavaScript below handles file uploads and API interactions:

```javascript
// wwwroot/app.js
let selectedFile = null;

async function checkHealth() {
    try {
        const response = await fetch('/health');
        const statusEl = document.getElementById('status');
        if (response.ok) {
            statusEl.className = 'status-badge online';
            statusEl.textContent = '✓ System Ready';
        } else {
            statusEl.className = 'status-badge offline';
            statusEl.textContent = '✗ System Unavailable';
        }
    } catch (error) {
        console.error('Health check failed:', error);
    }
}

function handleFileSelect(event) {
    const file = event.target.files[0];
    if (!file) return;

    selectedFile = file;

    // Show file info
    const fileInfo = document.getElementById('fileInfo');
    fileInfo.textContent = `Selected: ${file.name} (${formatFileSize(file.size)})`;
    fileInfo.classList.remove('hidden');

    // Enable audio preview
    const preview = document.getElementById('audioPreview');
    preview.src = URL.createObjectURL(file);
    preview.classList.remove('hidden');

    // Enable transcribe button
    document.getElementById('transcribeBtn').disabled = false;
}

async function transcribeAudio() {
    if (!selectedFile) return;

    const loadingEl = document.getElementById('loadingIndicator');
    const resultEl = document.getElementById('resultSection');
    const transcribeBtn = document.getElementById('transcribeBtn');

    // Show loading state
    loadingEl.classList.remove('hidden');
    resultEl.classList.add('hidden');
    transcribeBtn.disabled = true;

    try {
        const formData = new FormData();
        formData.append('file', selectedFile);
        formData.append('format', 'json');

        const startTime = Date.now();
        const response = await fetch('/v1/audio/transcriptions', {
            method: 'POST',
            body: formData
        });

        if (!response.ok) {
            throw new Error(`HTTP ${response.status}: ${response.statusText}`);
        }

        const result = await response.json();
        const processingTime = ((Date.now() - startTime) / 1000).toFixed(1);

        // Display results
        document.getElementById('transcriptionText').value = result.text;
        document.getElementById('resultDuration').textContent =
            `Duration: ${result.duration.toFixed(1)}s`;
        document.getElementById('resultLanguage').textContent =
            `Language: ${result.language}`;
        resultEl.classList.remove('hidden');

        console.log(`Transcription completed in ${processingTime}s`);
    } catch (error) {
        console.error('Transcription failed:', error);
        alert(`Transcription failed: ${error.message}`);
    } finally {
        loadingEl.classList.add('hidden');
        transcribeBtn.disabled = false;
    }
}

function copyToClipboard() {
    const text = document.getElementById('transcriptionText').value;
    navigator.clipboard.writeText(text)
        .then(() => alert('Copied to clipboard'))
        .catch(err => console.error('Copy failed:', err));
}

// Initialize
window.addEventListener('load', () => {
    checkHealth();
    loadSamplesList();
});
```

Key Takeaways and Production Considerations

Building HIPAA-compliant voice-to-text systems requires architectural decisions that prioritize data privacy over convenience. The FLWhisper application demonstrates that you can achieve accurate medical transcription, fast processing times, and intuitive user experiences entirely on-premises.

Critical lessons for healthcare AI:

- Privacy by architecture: design systems where PHI never exists outside controlled environments, not as a configuration option
- No persistence by default: audio and transcripts should be ephemeral unless explicitly saved with proper access controls
- Model selection matters: Whisper Medium provides medical terminology accuracy that smaller models miss
- Health checks enable reliability: systems should verify model availability before accepting PHI
- Audit logging without content logging: track operations for compliance without storing sensitive data in logs

For production deployment in clinical settings, integrate with EHR systems via HL7/FHIR interfaces. Implement role-based access control with Active Directory integration. Add digital signatures for transcript authentication. Configure automatic PHI redaction using clinical NLP models. Deploy on HIPAA-compliant infrastructure with proper physical security. Implement comprehensive audit logging that meets compliance requirements.

The complete implementation, with the ASP.NET Core API, Foundry Local integration, sample audio files, and comprehensive tests, is available at github.com/leestott/FLWhisper. Clone the repository and follow the setup guide to experience privacy-first medical transcription.

Resources and Further Reading

- FLWhisper Repository - Complete C# implementation with .NET 10
- Quick Start Guide - Installation and usage instructions
- Microsoft Foundry Local Documentation - SDK reference and model catalog
- OpenAI Whisper Documentation - Model architecture and capabilities
- HIPAA Compliance Guidelines - HHS official guidance
- Testing Guide - Comprehensive test suite documentation

Upcoming webinar: Maximize the Cost Efficiency of AI Agents on Azure
AI agents are quickly becoming central to how organizations automate work, engage customers, and unlock new insights. But as adoption accelerates, so do questions about cost, ROI, and long-term sustainability. That's exactly what the Maximize the Cost Efficiency of AI Agents on Azure webinar is designed to address.

The webinar will provide practical guidance on building and scaling AI agents on Azure with financial discipline in mind. Rather than focusing only on technology, the session helps learners connect AI design decisions to real business outcomes, covering everything from identifying high-impact use cases and understanding cost drivers to forecasting ROI. Whether you're just starting your AI journey or expanding AI agents across the enterprise, the session will equip you with strategies to make informed, cost-conscious decisions at every stage, from architecture and model selection to ongoing optimization and governance.

Who should attend?

If you are a decision maker, can influence decision makers on AI choices, or need to show ROI metrics on AI, and you work in one of these roles, this session is for you:

- Developer
- Administrator
- Solution Architect
- AI Engineer
- Business Analyst
- Business User
- Technology Manager

Why attend the webinar?

In the webinar, you'll hear how to translate theory into real-world scenarios, walk through common cost pitfalls, and see how organizations are applying these principles in practice. Most importantly, the webinar helps you connect the dots faster: turn what you've learned into actionable insights you can apply immediately, ask questions live, and gain clarity on how to maximize ROI while scaling AI responsibly. If you care about building AI agents that are not only innovative but also efficient, governable, and financially sustainable, this training—and the webinar that complements it—are well worth your time.

Register for the free webinar today for the event on March 5, 2026, 8:00 AM - 9:00 AM (UTC-08:00) Pacific Time (US & Canada).

Who will speak at the webinar?

Your speakers will be:

Carlotta Castelluccio: Carlotta is a Senior AI Advocate with the mission of helping every developer succeed with AI by building innovative solutions responsibly. To achieve this goal, she develops technical content and hosts skilling sessions, enabling her audience to get the most out of AI technologies and to have an impact on Microsoft AI products' roadmap.

Nitya Narasimhan: Nitya is a PhD and polyglot with 25+ years of software research & development experience spanning mobile, web, cloud, and AI. She is an innovator (12+ patents), a visual storyteller (@sketchtedocs), and an experienced community builder in the Greater New York area. As a Senior AI Advocate on the Core AI Developer Relations team, she acts as "developer 0" for the Microsoft Foundry platform, providing product feedback and empowering AI developers to build trustworthy AI solutions with code samples, open-source curricula, and content initiatives like Model Mondays. Prior to joining Microsoft, she spent a decade in Motorola Labs working on ubiquitous & mobile computing research, founded Google Developer Groups in New York, and consulted for startups building real-time experiences for enterprise. Her current interests span model understanding & customization, E2E observability & safety, and agentic AI workflows for maintainable software.

Moderator: Lee Stott is a Principal Cloud Advocate at Microsoft, working in the Core AI Developer Relations team.
He helps developers and organizations build responsibly with AI and cloud technologies through open-source projects, technical guidance, and global developer programs. Based in the UK, Lee brings deep hands-on experience across AI, Azure, and developer tooling.

Logic Apps Agentic Workflows with SAP - Part 2: AI Agents
Part 2 focuses on the AI-shaped portion of the destination workflows: how the Logic Apps Agent is configured, how it pulls business rules from SharePoint, and how its outputs are converted into concrete workflow artifacts. In Destination workflow #1, the agent produces three structured outputs—an HTML validation summary, a CSV list of InvalidOrderIds, and an Invalid CSV payload—which drive (1) a verification email, (2) an optional RFC call to persist failed rows as IDocs, and (3) a filtered dataset used for the separate analysis step that returns only analysis (or errors) back to SAP. In Destination workflow #2, the same approach is applied to inbound IDocs: the workflow reconstructs CSV from the custom segment, runs AI validation against the same SharePoint rules, and safely appends results to an append blob using a lease-based write pattern for concurrency.

1. Introduction

In Part 1, the goal was to make the integration deterministic: stable payload shapes, stable response shapes, and predictable error propagation across SAP and Logic Apps. Concretely, Part 1 established:

- how SAP reaches Logic Apps (Gateway/Program ID plumbing)
- the RFC contracts (IT_CSV, response envelope, RETURN / MESSAGE, EXCEPTIONMSG)
- how the source workflow interprets RFC responses (success vs error)
- how invalid rows can be persisted into SAP as custom IDocs (Z_CREATE_ONLINEORDER_IDOC)
- and how the second destination workflow receives those IDocs asynchronously

With that foundation in place, Part 2 narrows in on the part that is not just plumbing: the agent loop, the tool boundaries, and the output schemas that make AI results usable inside a workflow rather than "generated text you still need to interpret."

The diagram below highlights the portion of the destination workflow where AI is doing real work. The red-circled section is the validation agent loop (rules in, structured validation outputs out), which then fans out into operational actions like email notification, optional IDoc persistence, and filtering for the analysis step. What matters here is the shape of the agent outputs and how they are consumed by the rest of the workflow. The agent is not treated as a black box; it is forced to emit typed, workflow-friendly artifacts (summary + invalid IDs + filtered CSV). Those artifacts are then used deterministically: invalid rows are reported (and optionally persisted as IDocs), while valid rows flow into the analysis stage and ultimately back to SAP.

What this post covers

In this post, I focus on five practical topics:

- Agent loop design in Logic Apps: tools, message design, and output schemas that make the agent's results deterministic enough to automate.
- External rule retrieval: pulling validation rules from SharePoint and applying them consistently to incoming payloads.
- Structured validation outputs → workflow actions: producing InvalidOrderIds and a filtered CSV payload that directly drive notifications and SAP remediation.
- Two-model pattern: a specialized model for validation (agent) and a separate model call for analysis, with a clean handoff between the two.
- Output shaping for consumption: converting AI output into HTML for email and into the SAP response envelope (analysis/errors only).

(Everything else—SAP plumbing, RFC wiring, and response/exception patterns—was covered in Part 1 and is assumed here.)

Next, I'll break down the agent loop itself—the tool sequence, the required output fields, and the exact points where the workflow turns AI output into variables, emails, and SAP actions.
Huge thanks to KentWeareMSFT for helping me understand agent loops and design the validation agent structure. And thanks to everyone in 🤖 Agent Loop Demos 🤖 | Microsoft Community Hub for making such great material available.

Note: For the full set of assets used here, see the companion GitHub repository (workflows, schemas, SAP ABAP code, and sample files).

2. Validation Agent Loop

In this solution, the Data Validation Agent runs inside the destination workflow after the inbound SAP payload has been normalized into a single CSV string. The agent is invoked as a single Logic Apps Agent action, configured with an Azure OpenAI deployment and a short set of instructions. Its inputs are deliberately simple at this stage:

- the CSV payload (the dataset to validate), and
- the ValidationRules reference (where the rule document lives), shown in the instructions as a parameter token.

The figure below shows the validation agent configuration used in the destination workflow. The top half is the Agent action configuration (model + instructions), and the bottom half shows the toolset that the agent is allowed to use. The key design choice is that the agent is not "free-form chat": it's constrained by a small number of tools and a workflow-friendly output contract.

What matters most in this configuration is the separation between instructions and tools. The instructions tell the agent what to do ("follow business process steps 1–3"), while the tools define how the agent can interact with external systems and workflow state. This keeps the agent modular: you can change rules in SharePoint or refine summarization expectations without rewriting the overall SAP integration mechanics.

Purpose

This agent's job is narrowly scoped: validate the CSV payload from SAP against externally stored business rules and produce outputs that the workflow can use deterministically. In other words, it turns "validation as reasoning" into workflow artifacts (summary + invalid IDs + invalid payload), instead of leaving validation as unstructured prose. In Azure Logic Apps terms, this is an agent loop: an iterative process where an LLM follows instructions and selects from available tools to complete a multi-step task. Logic Apps agent workflows explicitly support this "agent chooses tools to complete tasks" model (see Agent Workflows Concepts).

Tools

In Logic Apps agent workflows, a tool is a named sequence that contains one or more actions the agent can invoke to accomplish part of its task (see Agent Workflows Concepts). In the screenshot, the agent is configured with three tools, explicitly labeled Get validation rules, Get CSV payload, and Summarize CSV payload review. These tool names match the business process in the "Instructions for agent" box (steps 1–3). The next sections of the post go deeper into what each tool does internally; at this level, the important point is simply that the agent is constrained to a small, explicit toolset.

Agent execution

The screenshot shows the agent configured with:

- AI model: gpt-5-3 (gpt-5)
- A connection line: "Connected to … (Azure OpenAI)"
- Instructions for agent that define the agent's role and a 3-step business process:
  1. Get validation rules (via the ValidationRules reference)
  2. Get CSV payload
  3. Summarize the CSV payload review, using the validation document

This pattern is intentional:

- The instructions provide the agent's "operating procedure" in plain language.
- The tools give the agent controlled ways to fetch the rule document, access the CSV input, and return structured results.
Because the workflow consumes the agent's outputs downstream, the instruction text is effectively part of your workflow contract (it must remain stable enough that later actions can trust the output shape).

Note: If a reader wants to recreate this pattern, the fastest path is:

1. Start with the official overview of agent workflows (Workflows with AI Agents and Models - Azure Logic Apps).
2. Follow a hands-on walkthrough for building an agent workflow and connecting it to an Azure OpenAI deployment (Logic Apps Labs is a good step-by-step reference). [azure.github.io]
3. Use the Azure OpenAI connector reference to understand authentication options and operations available in Logic Apps Standard (see Built-in OpenAI Connector).
4. If you're using Foundry for resource management, review how Foundry connections are created and used, especially when multiple resources/tools are involved (see How to connect to AI foundry).

2.1 Tool 1: Get validation rules

The first tool in the validation agent loop is Get validation rules. Its job is to load the business validation rules that will be applied to the incoming CSV payload from SAP. I keep these rules outside the workflow (in a document) so they can be updated without redeploying the Logic App. In this example, the rules are stored in SharePoint, and the tool simply retrieves the document content at runtime.

Get validation rules is implemented as a single action called Get validation document. In the designer, you can see it:

- uses a SharePoint Online connection (SharePoint icon and connector action)
- calls GetFileContentByPath (shown by the "File Path" input)
- reads the rule file from the configured Site Address
- uses the workflow parameter token ValidationRules for the File Path (so the exact rule file location is configurable per environment)

The output of this tool is the raw rule document content, which the Data Validation Agent uses in the next steps to validate the CSV payload.

The bottom half of the figure shows an excerpt of the rules document. The format is simple and intentionally human-editable: each rule is expressed as FieldName : condition. For example, the visible rules include:

- PaymentMethod : value must exist
- PaymentMethod : value cannot be "Cash"
- OrderStatus : value must be different from "Cancelled"
- CouponCode : value must have at least 1 character
- OrderID : value must be unique in the CSV array
- A scope note: "Do not validate the Date field."

These rules are the "source of truth" for validation. The workflow does not hardcode them into expressions; instead, it retrieves them from SharePoint and passes them into the agent loop so the validation logic remains configurable and auditable (you can always point to the exact rule document used for a given run).

A small but intentional rule in the document is "Do not validate the Date field." That line is there for a practical reason: in an early version of the source workflow, the date column was being corrupted during CSV generation. The validation agent still tried to validate dates (even though date validation wasn't part of the original intent), and the result was predictable: every row failed validation, leaving nothing to analyze. The upstream issue is fixed now, but I kept this rule in the demo to illustrate an important point: validation is only useful when it's aligned with the data contract you can actually guarantee at that point in the pipeline.
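Put together from the rules quoted above, the document the tool retrieves is roughly a plain-text list along these lines (an approximation of the excerpt shown in the figure, not the full file):

```
PaymentMethod : value must exist
PaymentMethod : value cannot be "Cash"
OrderStatus   : value must be different from "Cancelled"
CouponCode    : value must have at least 1 character
OrderID       : value must be unique in the CSV array
Do not validate the Date field.
```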
Note: The rules shown here assume the CSV includes a header row (field names in the first line) so the agent can interpret each column by name. If you want the agent to be schema-agnostic, you can extend the rules with an explicit column mapping, for example:

- Column 1: Order ID
- Column 2: Date
- Column 3: Customer ID
- …

This makes the contract explicit even when headers are missing or unreliable.

With the rules loaded, the next tool provides the second input the agent needs: the CSV payload that will be validated against this document.

2.2 Tool 2: Get CSV payload

The second tool in the validation agent loop is Get CSV payload. Its purpose is to make the dataset-to-validate explicit: it defines exactly what the agent should treat as "the CSV payload," rather than relying on implicit workflow context. In this workflow, the CSV is already constructed earlier (as Create_CSV_payload), and this tool acts as the narrow bridge between that prepared string and the agent's validation step.

Figure: Tool #2 ("Get CSV payload") defines a single agent parameter and binds it to the workflow's generated CSV.

The figure shows two important pieces:

- The tool parameter contract ("Agent Parameters"). On the right, the tool defines an agent parameter named CSV Payload with type String, and the description (highlighted in yellow) makes the intent explicit: "The CSV payload received from SAP and that we validate based on the validation rules." This parameter is the tool's interface: it documents what the agent is supposed to provide/consume when using this tool, and it anchors the rest of the validation process to a single, well-defined input. Tools in Logic Apps agent workflows exist specifically to constrain and structure what an agent can do and what data it operates on (see Agent Workflows Concepts).

- Why there is an explicit Compose action ("CSV payload"). In the lower-right "Code view," the tool's internal action is shown as a standard Compose:

```json
{
  "type": "Compose",
  "inputs": "@outputs('Create_CSV_payload')"
}
```

This is intentional. Even though the CSV already exists in the workflow, the tool still needs a concrete action that produces the value it returns to the agent. The Compose step:

- pins the tool output to a single source of truth (Create_CSV_payload), and
- creates a stable boundary: "this is the exact CSV string the agent validates," independent of other workflow state.

Put simply: the Compose action isn't there because Logic Apps can't access the CSV—it's there to make the agent/tool interface explicit, repeatable, and easy to troubleshoot.

What "tool parameters" are (in practical terms)

In Logic Apps agent workflows, a tool is a named sequence of one or more actions that the agent can invoke while executing its instructions. A tool parameter is the tool's input/output contract exposed to the agent. In this screenshot, that contract is defined under Agent Parameters, where you specify:

- Name: CSV Payload
- Type: String
- Description: "The CSV payload received from SAP…"

This matters because it clarifies (for both the model and the human reader) what the tool represents and what data it is responsible for supplying.

With Tool #1 providing the rules document and Tool #2 providing the CSV dataset, Tool #3 is where the agent produces workflow-ready outputs (summary + invalid IDs + filtered payload) that the downstream steps can act on.

2.3 Tool 3: Summarize CSV payload review

The third tool, Summarize CSV payload review, is where the agent stops being "an evaluator" and becomes a producer of workflow-ready outputs. It does most of the heavy lifting, so let's go into the details.
Instead of returning one blob of prose, the tool defines three explicit agent parameters—each with a specific format and purpose—so the workflow can reliably consume the results in downstream actions. In Logic Apps agent workflows, tools are explicitly defined tasks the agent can invoke, and each tool can be structured around actions and schemas that keep the loop predictable (see Agent Workflows Concepts).

Figure: Tool #3 ("Summarize CSV payload review") defines three structured agent outputs.

The Description is not just documentation—it's the contract the model is expected to satisfy, and it strongly shapes what the agent considers "relevant" when generating outputs. The parameters are:

Validation summary (String)
Goal: a human-readable summary that can be dropped straight into email. In the screenshot, the description is very explicit about shape and content:
- "expected format is an HTML table"
- "create a list of all orderids that have failed"
- "create a CSV document… only for the orderid values that failed… each row on a separate line"
- "include title row only in the email body"
This parameter is designed for presentation: it's the thing you want humans to read first.

InvalidOrderIds (String, CSV format)
Goal: a machine-friendly list of identifiers the workflow can use deterministically. The key part of the description (highlighted in the image) is: "The format is CSV." That single sentence is doing a lot of work: it tells the model to emit a comma-separated list, which you then convert into an array in the workflow using split(...).

Invalid CSV payload (String, one row per line)
Goal: the failed rows extracted from the original dataset, in a form that downstream steps can reuse. The description constrains the output tightly:
- "original CSV rows… for the orderid values that failed validation"
- "each row must be on a separate line"
- "keep the title row only for the email body and remove it otherwise"
This parameter is designed for automation: it becomes input to remediation steps (like transforming rows to XML and creating IDocs), not just a report.

What "agent parameters" do here (and why they matter)

A useful way to think about agent parameters is: they are the "typed return values" of a tool. Tools in agent workflows exist to structure work into bounded tasks the agent can perform, and a schema/parameter contract makes the results consumable by the rest of the workflow (see Agent Workflows Concepts). In this tool, the parameters serve two purposes at once:

- They guide the agent toward salient outputs. The descriptions explicitly name what matters: "failed orderids," "HTML table," "CSV format," "one row per line," "header row rules." That phrasing makes it much harder for the model to "wander" into irrelevant commentary.
- They align with how the workflow will parse and use the results. By stating "InvalidOrderIds is CSV," you make it trivially parseable (split), and by stating "Invalid CSV payload is one row per line," you make it easy to feed into later transformations.

Why the wording works (and what wording tends to work best)

What's interesting about the parameter descriptions is that they combine three kinds of constraints:

Output format constraints (make parsing deterministic)
- "expected format is an HTML table"
- "The format is CSV."
- "each row must be on a separate line"
These format cues help the agent decide what to emit and help you avoid brittle parsing later.
Output selection constraints (force relevance)
- "only for the orderid values that failed validation"
- "Create a list of all orderids that have failed"
This tells the agent what to keep and what to ignore.

Output operational constraints (tie outputs to downstream actions)
- "Include title row only in the email body"
- "remove it otherwise"
This explicitly anticipates downstream usage (email vs remediation), which is exactly the kind of detail models often miss unless you state it.

Rule of thumb: wording works best when it describes what to produce, in what format, with what filtering rules, and why the workflow needs it.

How these parameters tie directly to the downstream actions

The next picture makes the design intent very clear: each parameter is immediately "bound" to a normal workflow value via Compose actions and then used by later steps. This is the pattern we want: agent output → Compose → (optional) normalization → reused by deterministic workflow actions. It's the opposite of "read the model output and hope."

This is the reusable pattern:

1. Decide the minimal set of outputs the workflow needs.
2. Specify formats that are easy to parse.
3. Write parameter descriptions that encode both selection and formatting constraints.
4. Immediately bind outputs to workflow variables via Compose/SetVariable actions.

The main takeaway from this tool is that the agent is being forced into a structured contract: three outputs with explicit formats and clear intent. That contract is what makes the rest of the workflow deterministic—Compose actions can safely read @agentParameters(...), the workflow can safely split(...) the invalid IDs, and downstream actions can treat the "invalid payload" as real data rather than narrative. I'll show later how this same "parameter-first" design scales to other agent tools.

2.4 Turning agent outputs into a verification email

Once the agent has produced structured outputs (Validation summary, InvalidOrderIds, and Invalid CSV payload), the next goal is to make those outputs operational: humans need a quick summary of what failed, and the workflow needs machine-friendly values it can reuse downstream. The design here is intentionally straightforward: the workflow converts each agent parameter into a first-class workflow output (via Compose actions and one variable assignment), then binds those values directly into the Office 365 email body. The result is an email that is both readable and actionable—without anyone needing to open run history.

The figure below shows how the outputs of Summarize CSV payload review are mapped into the verification email. On the left, the tool produces three values via subsequent actions (Summary, Invalid order ids, and Invalid CSV payload), and the workflow also normalizes the invalid IDs into an array (Save invalid order ids). On the right, the Send verification summary action composes the email body using those same values as dynamic content tokens.

Figure: Mapping agent outputs to the verification email.

The important point is that the email is not constructed by "re-prompting" or "re-summarizing." It is assembled from already-structured outputs. This mapping is intentionally direct: each piece of the email corresponds to one explicit output from the agent tool. The workflow doesn't interpret or transform the summary beyond basic formatting—its job is to preserve the agent's structured outputs and present them consistently. The only normalization step happens for InvalidOrderIds, where the workflow also converts the CSV string into an array (ArrayOfInvalidOrderIDs) for later filtering and analysis steps.
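A minimal sketch of that normalization as it might appear in the workflow definition is shown below. The action and output names (Save invalid order ids, Invalid order ids, ArrayOfInvalidOrderIDs) follow the designer labels in the figure, but the exact internal names are assumptions, so adjust them to your own workflow:

```json
"Save_invalid_order_ids": {
  "type": "SetVariable",
  "inputs": {
    "name": "ArrayOfInvalidOrderIDs",
    "value": "@split(outputs('Invalid_order_ids'), ',')"
  },
  "runAfter": { "Invalid_order_ids": [ "Succeeded" ] }
}
```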
The next figure shows a sample verification email produced by this pipeline. It illustrates the three-part structure: an HTML validation summary table, the raw invalid order ID list, and the extracted invalid CSV rows.

Figure: Sample verification email — validation summary table + invalid order IDs + invalid CSV rows.

The extracted artifacts InvalidOrderIds and Invalid CSV payload are used in the downstream actions that persist failed rows as IDocs for later processing, which were presented in Part 1. I will get back to this later to talk about reusing the validation agent. Next, however, I will go over the data analysis part of the AI integration.

3. Analysis Phase: from validated dataset to HTML output

After the validation agent loop finishes, the workflow enters a second AI phase: analysis. The validation phase is deliberately about correctness (what to exclude and why). The analysis phase is about insight, and it runs on the remaining dataset after invalid rows are filtered out. At a high level, this phase has three steps:

1. Call Azure OpenAI to analyze the CSV dataset while explicitly excluding invalid OrderIDs.
2. Extract the model's text output from the OpenAI response object.
3. Convert the model's markdown output into HTML so it renders cleanly in email (and in the SAP response envelope).

3.1 OpenAI component: the "Analyze data" call

The figure below shows the Analyze data action that drives the analysis phase. This action is executed after the Data Validation Agent completes, and it uses three messages: a system instruction that defines the task, the CSV dataset as input, and a second user message that enumerates the OrderIDs to exclude (the invalid IDs produced by validation).

Figure: Azure OpenAI analysis call.

The analysis call is structured as:

- system: define the task and constraints
- user: provide the dataset
- user: provide exclusions derived from validation

```
system: Analyze dataset; provide trends/predictions; exclude specified orderids.
user:   <CSV payload>
user:   Excluded orderids: <comma-separated invalid ids>
```

Two design choices are doing most of the work here:

- The model is given the dataset and the exclusions separately. This avoids ambiguity: the dataset is one message, and the "do not include these OrderIDs" constraint is another.
- The exclusion list is derived from validation output, not re-discovered during analysis. The analysis step doesn't re-validate; it consumes the validation phase's results and focuses purely on trends/predictions.

3.2 Processing the response

The next figure shows how the workflow turns the Azure OpenAI response into a single string that can be reused for email and for the SAP response. The workflow does three things in sequence: it parses the response JSON, extracts the model's text content, and then passes that text into an HTML formatter.

Figure: Processing the OpenAI response.

This is the only part of the OpenAI response you need to understand for this workflow:

```
Analyze_data response
└─ choices[] (array)
   └─ [0] (object)
      └─ message (object)
         └─ content (string)  <-- analysis text
```

Everything else in the OpenAI response (filters, indexes, metadata) is useful for auditing but not required to build the final user-facing output.
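In expression terms, extracting that text typically boils down to a single line, for example as the input to a Compose action. The action name 'Analyze_data' is an assumption about how the designer names the underlying action, so adjust it to your workflow:

```
@{body('Analyze_data')['choices'][0]['message']['content']}
```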
3.3 Crafting the output to HTML

The model's output is plain text and often includes lightweight markdown structures (headings, lists, separators). To make the analysis readable in email (and safe to embed in the SAP response envelope), the workflow converts the markdown into HTML. The script was generated with Copilot; the source code snippet may be found in Part 1.

The next figure shows what the formatted analysis looks like when rendered. Note the explicit reference to the excluded OrderIDs and the summary of the remaining dataset before the trend observations are listed.

Figure: Example analysis output after formatting.

4. Closing the loop: persisting invalid rows as IDocs

In Part 1, I introduced an optional remediation branch: when validation finds bad rows, the workflow can persist them into SAP as custom IDocs for later handling. In Part 2, after unpacking the agent loop, I want to reconnect those pieces and show the "end of the story": the destination workflow creates IDocs for invalid data, and a second destination workflow receives those IDocs and produces a consolidated audit trail in Blob Storage.

This final section is intentionally pragmatic. It shows:

- where the IDoc creation call happens,
- how the created IDocs arrive downstream, and
- how to safely handle many concurrent workflow instances writing to the same storage artifact (one instance per IDoc).

4.1 From "verification summary" to "Create all IDocs"

The figure below shows the tail end of the verification summary flow. Once the agent produces the structured validation outputs, the workflow first emails the human-readable summary, then converts the invalid CSV rows into an SAP-friendly XML shape, and finally calls the RFC that creates IDocs from those rows.

Figure: End of the validation/remediation branch.

This is deliberately a "handoff point." After this step, the invalid rows are no longer just text in an email—they become durable SAP artifacts (IDocs) that can be routed, retried, and processed independently of the original workflow run.

4.2 Z_CREATE_ONLINEORDER_IDOC and the downstream receiver

The next figure is the same overview from Part 1. I'm reusing it here because it captures the full loop: the workflow calls Z_CREATE_ONLINEORDER_IDOC, SAP converts the invalid rows into custom IDocs, and Destination workflow #2 receives those IDocs asynchronously (one workflow run per IDoc).

Figure 2: Invalid rows persisted as custom IDocs.

This pattern is intentionally modular:

- Destination workflow #1 decides which rows are invalid and optionally persists them.
- SAP encapsulates the IDoc creation mechanics behind a stable RFC (Z_CREATE_ONLINEORDER_IDOC).
- Destination workflow #2 processes each incoming IDoc independently, which matches how IDoc-driven integrations typically behave in production.

4.3 Two phases in Destination workflow #2: AI agent + Blob Storage logging

In the receiver workflow, there are two distinct phases:

- AI agent phase (per-IDoc): reconstruct a CSV view from the incoming IDoc payload and (optionally) run the same validation logic.
- Blob storage phase (shared output): append a normalized "verification line" into a shared blob in a concurrency-safe way.

It's worth calling out: in this demo, the IDocs being received were created from already-validated outputs upstream, so you could argue the second validation is redundant.
I keep it anyway for two reasons: it demonstrates that the agent tooling is reusable with minimal changes, and in a general integration, Destination workflow #2 may receive IDocs from multiple sources, not only from this pipeline—so "validate on receipt" can still be valuable.

4.3.1 AI agent phase

The figure below shows the validation agent used in Destination workflow #2. The key difference from the earlier agent loop is the output format: instead of producing an HTML summary + invalid lists, this agent writes a single "audit line" that includes the IDoc correlation key (DOCNUM) along with the order ID and the failed rules.

Figure: Destination workflow #2 agent configuration.

The reusable part here is the tooling structure: rules still come from the same validation document, the dataset is still supplied as CSV, and the summarization tool outputs a structured value the workflow can consume deterministically. The only meaningful change is "what shape do I want the output to take," which is exactly what the agent parameter descriptions control.

The next figure zooms in on the summarization tool parameter in Destination workflow #2. Instead of three outputs, this tool uses a single parameter (VerificationInfo) whose description forces a consistent line format anchored on DOCNUM.

Figure 4: VerificationInfo parameter.

This is the same design principle as Tool #3 in the first destination workflow: describe the output as a contract, not as a vague request. The parameter description tells the agent exactly what must be present (DOCNUM + OrderId + failed rules) and therefore makes it straightforward to append the output to a shared log without additional parsing.

Interesting snippets

Extracting DOCNUM from the IDoc control record and carrying it through the run:

```
xpath(xml(triggerBody()?['content']),
  'string(/*[local-name()="Receive"]
          /*[local-name()="idocData"]
          /*[local-name()="EDI_DC40"]
          /*[local-name()="DOCNUM"])')
```

4.3.2 Blob Storage phase

Destination workflow #2 runs one instance per inbound IDoc, so multiple runs can execute at the same time, all trying to write to the same daily "ValidationErrorsYYYYMMDD.txt" append blob. The first figure below shows the resulting appended output: one line per IDoc, each line beginning with DOCNUM, which becomes the stable correlation key. The second figure shows the concurrency control pattern I used to make those writes safe: a short lease acquisition loop that retries until it owns the blob lease, then appends the verification line(s), and finally releases the lease.

Figure: Concurrency-safe append pattern.

Reading the diagram top-to-bottom, the workflow uses a simple lease → append → release pattern to make concurrent writes safe. Each instance waits briefly (Delay), attempts to acquire a blob lease (Acquire validation errors blob lease), and loops until it succeeds (Set status code → Until lease is acquired). Once a lease is obtained, the workflow stores the lease ID (Save lease id), appends its verification output under that lease (Append verification results), and then releases the lease (Release the lease) so the next workflow instance can write.

Implementation note: the complete configuration for this concurrency pattern (including the HTTP actions, headers, retries, and loop conditions) is included in the attached artifacts, in the workflow JSON for Destination workflow #2.
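For orientation, the three storage calls behind that pattern map to the Blob Storage REST API roughly as sketched below. Account, container, and blob names are placeholders, and the authentication and version headers are omitted; the exact HTTP action configuration is in the workflow JSON mentioned above:

```
# 1. Acquire a lease on the daily append blob (the response carries x-ms-lease-id)
PUT https://<account>.blob.core.windows.net/<container>/ValidationErrors<YYYYMMDD>.txt?comp=lease
    x-ms-lease-action: acquire
    x-ms-lease-duration: 15

# 2. Append the verification line(s) while holding the lease
PUT https://<account>.blob.core.windows.net/<container>/ValidationErrors<YYYYMMDD>.txt?comp=appendblock
    x-ms-lease-id: <lease id from step 1>
    body: <VerificationInfo line(s)>

# 3. Release the lease so the next workflow instance can write
PUT https://<account>.blob.core.windows.net/<container>/ValidationErrors<YYYYMMDD>.txt?comp=lease
    x-ms-lease-action: release
    x-ms-lease-id: <lease id from step 1>
```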
5. Concluding remarks

Part 2 zoomed in on the AI boundary inside the destination workflows and made it concrete: what the agent sees, what it is allowed to do, what it must return, and how those outputs drive deterministic workflow actions. The practical outcomes of Part 2 are:

- A tool-driven validation agent that produces workflow artifacts, not prose. The validation loop is constrained by tools and parameter schemas so its outputs are immediately consumable: an email-friendly validation summary, a machine-friendly InvalidOrderIds list, and an invalid-row payload that can be remediated.
- A clean separation between validation and analysis. Validation decides what not to trust (invalid IDs / rows) and analysis focuses on what is interesting in the remaining dataset. The analysis prompt makes the exclusion rule explicit by passing the dataset and excluded IDs as separate messages.
- A repeatable response-processing pipeline. You extract the model's text from a stable response path (choices[0].message.content), then shape it into HTML once (markdown → HTML) so the same formatted output can be reused for email and the SAP response envelope.
- A "reuse with minimal changes" pattern across workflows. Destination workflow #2 shows the same agent principles applied to IDoc reception, but with a different output contract optimized for logging: DOCNUM + OrderId + FailedRules. This demonstrates that the real reusable asset is the tool + parameter contract design.

Putting It All Together

We have a full integration story where SAP, Logic Apps, AI, and IDocs are connected with explicit contracts and predictable behavior.

Part 1 established the deterministic integration foundation:

- SAP ↔ Logic Apps connectivity (gateway/program wiring)
- RFC payload/response contracts (IT_CSV, response envelope, error semantics)
- predictable exception propagation back into SAP
- an optional remediation branch that persists invalid rows as IDocs via a custom RFC (Z_CREATE_ONLINEORDER_IDOC)
- and the end-to-end response handling pattern in the caller workflow.

Part 2 layered AI on top without destabilizing the contracts:

- Agent loop + tools for rule retrieval and validation
- output schemas that convert "reasoning" into workflow artifacts
- a separate analysis step that consumes validated data and produces formatted results
- and an asynchronous IDoc receiver that logs outcomes safely under concurrency.

The reason it works as a two-part series is that the two layers evolve at different speeds:

- The integration layer (Part 1) should change slowly. It defines interoperability: payload shapes, RFC names, error contracts, and IDoc interfaces.
- The AI layer (Part 2) is expected to iterate. Prompts, rule documents, output formatting, and agent tool design will evolve as you tune behavior and edge cases.
References

Logic Apps Agentic Workflows with SAP - Part 1: Infrastructure
🤖 Agent Loop Demos 🤖 | Microsoft Community Hub
Agent Workflows Concepts
Workflows with AI Agents and Models - Azure Logic Apps
Built-in OpenAI Connector
How to connect to AI foundry
Create Autonomous AI Agent Workflows - Azure Logic Apps
Handling Errors in SAP BAPI Transactions
Access SAP from workflows
Create common SAP workflows
Generate Schemas for SAP Artifacts via Workflows
Exception Handling | ABAP Keyword Documentation
Handling and Propagating Exceptions - ABAP Keyword Documentation
SAP .NET Connector 3.1 Overview
SAP .NET Connector 3.1 Programming Guide
Connect to Azure AI services from Workflows

All supporting content for this post may be found in the companion GitHub repository.

Logic Apps Agentic Workflows with SAP - Part 1: Infrastructure
When you integrate Azure Logic Apps with SAP, the “hello world” part is usually easy. The part that bites you later is data quality. In SAP-heavy flows, validation isn’t a nice-to-have — it’s what makes the downstream results meaningful. If invalid data slips through, it can get expensive fast: you may create incorrect business documents, trigger follow-up processes, and end up in a cleanup path that’s harder (and more manual) than building validation upfront. And in “all-or-nothing” transactional patterns, things get even more interesting: one bad record can force a rollback strategy, compensating actions, or a whole replay/reconciliation story you didn’t want to own. See for instance Handling Errors in SAP BAPI Transactions | Microsoft Community Hub to get an idea of the complexity in a BizTalk context. That’s the motivation for this post: a practical starter pattern that you can adapt to many data shapes and domains for validating data in a Logic Apps + SAP integration. Note: For the full set of assets used here, see the companion GitHub repository (workflows, schemas, SAP ABAP code, and sample files). 1. Introduction Scenario overview The scenario is intentionally simple, but it mirrors what shows up in real systems: A Logic App workflow sends CSV documents to an SAP endpoint. SAP forwards the payload to a second Logic App workflow that performs: rule-based validation (based on pre-defined rules) analysis/enrichment (market trends, predictions, recommendations) The workflow either: returns validated results (or validation errors) to the initiating workflow, or persists outputs for later use For illustration, I’m using fictitious retail data. The content is made up, but the mechanics are generic: the same approach works for orders, inventory, pricing, master data feeds, or any “file in → decision out” integration. You’ll see sample inputs and outputs below to keep the transformations concrete. What this post covers This walkthrough focuses on the integration building blocks that tend to matter in production: Calling SAP RFCs from Logic App workflows, and invoking Logic App workflows from SAP function modules Using the Logic Apps SAP built-in trigger Receiving and processing IDocs Returning responses and exceptions back to SAP in a structured, actionable way Data manipulation patterns in Logic Apps, including: parsing and formatting inline scripts XPath (where it fits, and where it becomes painful). Overall Implementation A high-level view of the implementation is shown below. The source workflow handles end-to-end ingestion—file intake, transformation, SAP integration, error handling, and notifications—using Azure Logic Apps. The destination workflows focus on validation and downstream processing, including AI-assisted analysis and reporting, with robust exception handling across multiple technologies. I’ll cover the AI portion in a follow-up post. Note on AI-assisted development Most of the workflow “glue” in this post—XPath, JavaScript snippets, and Logic Apps expressions—was built with help from Copilot and the AI assistant in the designer (see Get AI-assisted help for Standard workflows - Azure Logic Apps | Microsoft Learn). In my experience, this is exactly where AI assistance pays off: generating correct scaffolding quickly, then iterating based on runtime behavior. I’ve also included SAP ABAP snippets for the SAP-side counterpart. You don’t need advanced ABAP knowledge to follow along; the snippets are deliberately narrow and integration-focused. 
I include them because it’s hard to design robust integrations if you only understand one side of the contract. When you understand how SAP expects to receive data, how it signals errors, and where transactional boundaries actually are, you end up with cleaner workflows and fewer surprises. 2. Source Workflow This workflow is a small, end‑to‑end “sender” pipeline: it reads a CSV file from Azure Blob Storage, converts the rows into the SAP table‑of‑lines XML shape expected by an RFC, calls Z_GET_ORDERS_ANALYSIS via the SAP connector, then extracts analysis or error details from the RFC response and emails a single consolidated result. At a high level: Input: an HTTP request (used to kick off the run) + a blob name. Processing: CSV → array of rows → XML (…) → RFC call Output: one email containing either: the analysis (success path), or a composed error summary (failure path). The diagram below summarizes the sender pipeline: HTTP trigger → Blob CSV (header included) → rows → SAP RFC → parse response → email. Two design choices are doing most of the work here. First, the workflow keeps the CSV transport contract stable by sending the file as a verbatim list of lines—including the header—wrapped into … elements under IT_CSV . Second, it treats the RFC response as the source of truth: EXCEPTIONMSG and RETURN/MESSAGE drive a single Has errors gate, which determines whether the email contains the analysis or a consolidated failure summary. Step-by-step description Phase 0 — Trigger Trigger — When_an_HTTP_request_is_received The workflow is invoked via an HTTP request trigger (stateful workflow). Phase 1 — Load and split the CSV Read file — Read_CSV_orders_from_blob Reads the CSV from container onlinestoreorders using the blob name from @parameters('DataFileName'). Split into rows — Extract_rows Splits the blob content on \r\n, producing an array of CSV lines. Design note: Keeping the header row is useful when downstream validation or analysis wants column names, and it avoids implicit assumptions in the sender workflow. Phase 2 — Shape the RFC payload Convert CSV rows to SAP XML — Transform_CSV_to_XML Uses JavaScript to wrap each CSV row (including the header line) into the SAP line structure and XML‑escape special characters. The output is an XML fragment representing a table of ZTY_CSV_LINE rows. Phase 3 — Call SAP and extract response fields Call the RFC — [RFC]_ Z_GET_ORDERS_ANALYSIS Invokes Z_GET_ORDERS_ANALYSIS with an XML body containing … built from the transformed rows. Extract error/status — Save_EXCEPTION_message and Save_RETURN_message Uses XPath to pull: EXCEPTIONMSG from the RFC response, and the structured RETURN / MESSAGE field. Phase 4 — Decide success vs failure and notify Initialize output buffer — Initialize_email_body Creates the EmailBody variable used by both success and failure cases. Gate — Has_errors Determines whether to treat the run as failed based on: EXCEPTIONMSG being different from "ok", or RETURN / MESSAGE being non‑empty. Send result — Send_an_email_(V2) Emails either: the extracted ANALYSIS (success), or a concatenated error summary including RETURN / MESSAGE plus message details ( MESSAGE_V1 … MESSAGE_V4 ) and EXCEPTIONMSG . Note: Because the header row is included in IT_CSV , the SAP-side parsing/validation treats the first line as column titles (or simply ignores it). The sender workflow stays “schema-agnostic” by design. 
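Before the snippets, a quick note on the Has_errors gate from Phase 4: it boils down to a two-part predicate. The actual workflow implements it as a Logic Apps condition, not code; the JavaScript below is only an illustration of the decision logic, with hypothetical parameter names.

// Illustration only: in the workflow this is a Logic Apps condition, not an inline script.
function hasErrors(exceptionMsg, returnMessage) {
  // Failure if SAP did not report the normalized success status...
  const exceptionRaised = (exceptionMsg || '').trim() !== 'ok';
  // ...or if RETURN/MESSAGE carries any text at all.
  const sapReportedError = (returnMessage || '').trim().length > 0;
  return exceptionRaised || sapReportedError;
}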
Useful snippets

Snippet 1 — Split the CSV into rows

split(string(body('Read_CSV_orders_from_blob')?['content']), '\r\n')

Tip: If your CSV has a header row you don’t want to send to SAP, switch back to:

@skip(split(string(body('Read_CSV_orders_from_blob')?['content']), '\r\n'), 1)

Snippet 2 — JavaScript transform: “rows → SAP table‑of‑lines XML”

const lines = workflowContext.actions.Extract_rows.outputs;

function xmlEscape(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// NOTE: we don't want to keep empty lines (which can be produced by reading the blobs)
// the reason being that if the recipient uses a schema to validate the xml,
// it may reject it if it does not allow empty nodes.
const xml = lines
  .filter(line => line && line.trim() !== '') // keep only non-empty lines
  .map(line => `<zty_csv_line><line>${xmlEscape(line)}</line></zty_csv_line>`)
  .join('');

return { xml };

Snippet 3 — XPath extraction of response fields (namespace-robust)

EXCEPTIONMSG:

@xpath(body('[RFC]_Call_Z_GET_ORDERS_ANALYSIS')?['content'],
  'string(/*[local-name()="Z_GET_ORDERS_ANALYSISResponse"]
          /*[local-name()="EXCEPTIONMSG"])')

RETURN/MESSAGE:

@xpath(body('[RFC]_Call_Z_GET_ORDERS_ANALYSIS')?['content'],
  'string(/*[local-name()="Z_GET_ORDERS_ANALYSISResponse"]
          /*[local-name()="RETURN"]
          /*[local-name()="MESSAGE"])')

Snippet 4 — Failure email body composition

concat(
  'Error message: ', outputs('Save_RETURN_message'),
  ', details: ',
  xpath(body('[RFC]_Call_Z_GET_ORDERS_ANALYSIS')?['content'], 'string(//*[local-name()=\"MESSAGE_V1\"])'),
  xpath(body('[RFC]_Call_Z_GET_ORDERS_ANALYSIS')?['content'], 'string(//*[local-name()=\"MESSAGE_V2\"])'),
  xpath(body('[RFC]_Call_Z_GET_ORDERS_ANALYSIS')?['content'], 'string(//*[local-name()=\"MESSAGE_V3\"])'),
  xpath(body('[RFC]_Call_Z_GET_ORDERS_ANALYSIS')?['content'], 'string(//*[local-name()=\"MESSAGE_V4\"])'),
  '; ',
  'Exception message: ', outputs('Save_EXCEPTION_message'), '.')

3. SAP Support

To make the SAP/Logic Apps boundary simple, I model the incoming CSV as a table of “raw lines” on the SAP side. The function module Z_GET_ORDERS_ANALYSIS exposes a single table parameter, IT_CSV , typed using a custom line structure.

Figure: IT_CSV is a table of CSV lines ( ZTY_CSV_LINE ), with a single LINE field ( CHAR2048 ).

IT_CSV uses the custom structure ZTY_CSV_LINE , which contains a single component LINE ( CHAR2048 ). This keeps the SAP interface stable: the workflow can send CSV lines without SAP having to know the schema up front, and the parsing/validation logic can evolve independently.

The diagram below shows the plumbing that connects SAP to Azure Logic Apps in two common patterns: SAP sending IDocs to a workflow and SAP calling a remote-enabled endpoint via an RFC destination. I’m showing all three pieces together—the ABAP call site, the SM59 RFC destination, and the Logic Apps SAP built-in trigger—because most “it doesn’t work” problems come down to a small set of mismatched configuration values rather than workflow logic.

The key takeaway is that both patterns hinge on the same contract: Program ID plus the SAP Gateway host/service. In SAP, those live in SM59 (TCP/IP destination, registered server program). In Logic Apps, the SAP built-in trigger listens using the same Program ID and gateway settings, while the trigger configuration (for example, IDoc format and degree of parallelism) controls how messages are interpreted and processed.
Once these values line up, the rest of the implementation becomes “normal workflow engineering”: validation, predictable error propagation, and response shaping.

Before diving into workflow internals, I make the SAP-side contract explicit. The function module interface below shows the integration boundary: CSV lines come in as IT_CSV , results come back as ANALYSIS , and status/error information is surfaced both as a human-readable EXCEPTIONMSG and as a structured RETURN ( BAPIRET2 ). I also use a dedicated exception ( SENDEXCEPTIONTOSAPSERVER ) to signal workflow-raised failures cleanly.

Contract (what goes over RFC):
Input: IT_CSV (CSV lines)
Outputs: ANALYSIS (analysis payload), EXCEPTIONMSG (human-readable status)
Return structure: RETURN ( BAPIRET2 ) for structured SAP-style success/error
Custom exception: SENDEXCEPTIONTOSAPSERVER for workflow-raised failures

Here is the ABAP wrapper that calls the remote implementation and normalizes the result.

FUNCTION z_get_orders_analysis.
*"----------------------------------------------------------------------
*" This module acts as a caller wrapper.
*" Important: the remote execution is determined by DESTINATION.
*" Even though the function name is the same, this is not recursion:
*" the call runs in the remote RFC server registered under DESTINATION "DEST".
*"----------------------------------------------------------------------
*" Contract:
*"   TABLES     it_csv        "CSV lines
*"   IMPORTING  analysis      "Result payload
*"   EXPORTING  exceptionmsg  "Human-readable status / error
*"   CHANGING   return        "BAPIRET2 return structure
*"   EXCEPTIONS sendexceptiontosapserver
*"----------------------------------------------------------------------

  CALL FUNCTION 'Z_GET_ORDERS_ANALYSIS'
    DESTINATION dest
    IMPORTING
      analysis = analysis
    TABLES
      it_csv = it_csv
    CHANGING
      return = return
    EXCEPTIONS
      sendexceptiontosapserver = 1
      system_failure           = 2 MESSAGE exceptionmsg
      communication_failure    = 3 MESSAGE exceptionmsg
      OTHERS                   = 4.

  CASE sy-subrc.
    WHEN 0.
      exceptionmsg = 'ok'.
      "Optional: normalize success into RETURN for callers that ignore EXCEPTIONMSG
      IF return-type IS INITIAL.
        return-type = 'S'.
        return-message = 'OK'.
      ENDIF.
    WHEN 1.
      exceptionmsg = |Exception from workflow: SENDEXCEPTIONTOSAPSERVER { sy-msgv1 }{ sy-msgv2 }{ sy-msgv3 }{ sy-msgv4 }|.
      return-type = 'E'.
      return-message = exceptionmsg.
    WHEN 2 OR 3.
      "system_failure / communication_failure usually already populate exceptionmsg
      IF exceptionmsg IS INITIAL.
        exceptionmsg = |RFC system/communication failure.|.
      ENDIF.
      return-type = 'E'.
      return-message = exceptionmsg.
    WHEN OTHERS.
      exceptionmsg = |Error in workflow: { sy-msgv1 }{ sy-msgv2 }{ sy-msgv3 }{ sy-msgv4 }|.
      return-type = 'E'.
      return-message = exceptionmsg.
  ENDCASE.

ENDFUNCTION.

The wrapper is intentionally small: it forwards the payload to the remote implementation via the RFC destination and then normalizes the outcome into a predictable shape. The point isn’t fancy ABAP — it’s reliability. With a stable contract ( IT_CSV, ANALYSIS, RETURN, EXCEPTIONMSG ) the Logic Apps side can evolve independently while SAP callers still get consistent success/error semantics.

Important: in CALL FUNCTION 'Z_GET_ORDERS_ANALYSIS' DESTINATION dest, the name of the called function should be the same as the name of the ABAP wrapper function module, the reason being that the SAP built-in trigger in the Logic App uses the function module signature of Z_GET_ORDERS_ANALYSIS as the contract (i.e., its metadata).
To sum up, the integration is intentionally shaped around three outputs: the raw input table ( IT_CSV ), a standardized SAP return structure ( RETURN / BAPIRET2 ), and a readable status string ( EXCEPTIONMSG ). The custom exception ( SENDEXCEPTIONTOSAPSERVER ) gives me a clean way to surface workflow failures back into SAP without burying them inside connector-specific error payloads. This is depicted in the figure below. 4. Destination Workflow The diagram below shows the destination workflow at a high level. I designed it as a staged pipeline: guard early, normalize input, validate, and then split the workload into two paths—operational handling of invalid records (notifications and optional IDoc remediation) and analysis of the validated dataset. Importantly, the SAP response is intentionally narrow: SAP receives only the final analysis (or a structured error), while validation details are delivered out-of-band via email. How to read this diagram Guardrail: Validate requested action ensures the workflow only handles the expected request. Normalize: Create CSV payload converts the inbound content into a consistent CSV representation. Validate: Data Validation Agent identifies invalid records (and produces a summary). Operational handling (invalid data): invalid rows are reported by email and may optionally be turned into IDocs (right-hand block). Analyze (valid data): Analyze data runs only on the validated dataset (invalid IDs excluded). Outputs: users receive Email analysis, while SAP receives only the analysis (or a structured error) via Respond to SAP server. Figure: Destination workflow with staged validation, optional IDoc remediation, and an SAP response . Reading the workflow top-to-bottom, the main design choice is separation of concerns. Validation is used to filter and operationalize bad records (notify humans, optionally create IDocs), while the SAP-facing response stays clean and predictable: SAP receives the final analysis for the validated dataset, or an error if the run can’t complete. This keeps the SAP contract stable even as validation rules and reporting details evolve. Step‑by‑step walkthrough Phase 0 — Entry and routing Trigger — When a message is received The workflow starts when an inbound SAP message is delivered to the Logic Apps SAP built‑in trigger. Guardrail — Validate requested action (2 cases) The workflow immediately checks whether the inbound request is the operation it expects (for example, the function/action name equals Z_GET_ORDERS_ANALYSIS ). If the action does not match: the workflow sends an exception back to SAP describing the unexpected action and terminates early (fail fast). If the action matches: processing continues. Phase 1 — Normalize input into a workflow‑friendly payload Prepare input — Create CSV payload The workflow extracts CSV lines from the inbound (XML) SAP payload and normalizes them into a consistent CSV text payload that downstream steps can process reliably. Initialize validation state — Initialize array of invalid order ids The workflow creates an empty array variable to capture order IDs that fail validation. This becomes the “validation output channel” used later for reporting, filtering, and optional remediation. Phase 2 — Validate the dataset (AI agent loop) Validate — Data Validation Agent (3 cases) This stage performs rule‑based validation using an agent pattern (backed by Azure OpenAI). 
Conceptually, it does three things (as shown in the diagram’s expanded block): Get validation rules: retrieves business rules from a SharePoint‑hosted validation document. Get CSV payload: loads the normalized CSV created earlier. Summarize CSV payload review: evaluates the CSV against the rules and produces structured validation outputs. Outputs produced by validation: A list of invalid order IDs The corresponding invalid CSV rows A human‑readable validation summary Note: The detailed AI prompt/agent mechanics are covered in Part 2. In Part 1, the focus is on the integration flow and how data moves. Phase 3 — Operational handling of invalid records (email + optional SAP remediation) After validation, the workflow treats invalid records as an operational concern: they are reported to humans and can optionally be routed into an SAP remediation path. This is shown in the right‑hand “Create IDocs” block. Notify — Send verification summary The workflow sends an email report (Office 365) to configured recipients containing: the validation summary the invalid order IDs the invalid CSV payload (or the subset of invalid rows) Transform — Transform CSV to XML The workflow converts the invalid CSV lines into an XML format that is suitable for SAP processing. Optional remediation — [RFC] Create all IDocs (conditional) If the workflow parameter (for example, CreateIDocs) is enabled, the workflow calls an SAP RFC (e.g., Z_CREATE_ONLINEORDER_IDOC ) to create IDocs from the transformed invalid data. Why this matters: Validation results are made visible (email) and optionally actionable (IDocs), without polluting the primary analysis response that SAP receives. Phase 4 — Analyze only the validated dataset (AI analysis) The workflow runs AI analysis on the validated dataset, explicitly excluding invalid order IDs discovered during the validation phase. The analysis prompt instructs the model to produce outputs such as trends, predictions, and recommendations. Note: The AI analysis prompt design and output shaping are covered in Part 2. Phase 5 — Post‑process the AI response and publish outputs Package results — Process analysis results (Scope) The workflow converts the AI response into a format suitable for email and for SAP consumption: Parse the OpenAI JSON response Extract the analysis content Convert markdown → HTML using custom JavaScript formatting Outputs Email analysis: sends the formatted analysis to recipients. Respond to SAP server: returns only the analysis (and errors) to SAP. Key design choice: SAP receives a clean, stable contract—analysis on success, structured error on failure. Validation details are handled out‑of‑band via email (and optionally via IDoc creation). Note: the analysis email sent by the destination workflow is there for testing purposes, to verify that the html content remains the same as it is sent back to the source workflow. Useful snippets Snippet 1 - Join each CSV line in the XML to make a CSV table: join( xpath( xml(triggerBody()?['content']), '/*[local-name()=\"Z_GET_ORDERS_ANALYSIS\"] /*[local-name()=\"IT_CSV\"] /*[local-name()=\"ZTY_CSV_LINE\"] /*[local-name()=\"LINE\"]/text()' ), '\r\n') Note: For the sake of simplicity, XPath is used here and throughout all places where XML is parsed. In the general case however, the Parse XML with schema action is the better and recommended way to strictly enforce the data contract between senders and receivers. More information about Parse XML with schema is provided in Appendix 1. 
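Phase 4 above notes that the analysis runs only on the validated dataset; in this implementation the exclusion is expressed in the analysis prompt (the invalid IDs travel alongside the data, as covered in Part 2). If you would rather remove the invalid rows before they ever reach the model, a small inline-script action could do the filtering. The sketch below is illustrative only: Create_CSV_payload is the normalization step described earlier, but the action name Save_invalid_order_ids and the assumption that the order ID is the first CSV column are hypothetical.

// Hypothetical inline-script sketch: drop invalid rows before the analysis step.
// Assumes the order ID is the first CSV column and that the header row is kept.
const csv = workflowContext.actions.Create_CSV_payload.outputs;
const invalidIds = workflowContext.actions.Save_invalid_order_ids.outputs; // e.g. ["1007", "1031"]

const lines = String(csv).split('\r\n').filter(l => l.trim() !== '');
const header = lines[0];
const validRows = lines
  .slice(1)
  .filter(row => !invalidIds.includes(row.split(',')[0].trim()));

return { csv: [header, ...validRows].join('\r\n') };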
Snippet 2 - Format markdown to HTML (simplified):

const raw = workflowContext.actions.Extract_analysis.outputs;

// Basic HTML escaping for safety (keeps <code> blocks clean)
const escapeHtml = s => s.replace(/[&<>"]/g, c => ({'&':'&amp;','<':'&lt;','>':'&gt;','"':'&quot;'}[c]));

// Normalize line endings
let md = raw; // raw.replace(/\r\n/g, '\n').trim();

// Convert code blocks (``` ... ```)
md = md.replace(/```([\s\S]*?)```/g, (m, p1) => `<pre><code>${escapeHtml(p1)}</code></pre>`);

// Horizontal rules --- or ***
md = md.replace(/(?:^|\n)---+(?:\n|$)/g, '<hr/>');

// Headings ###### to #
for (let i = 6; i >= 1; i--) {
  const re = new RegExp(`(?:^|\\n)${'#'.repeat(i)}\\s+(.+?)\\s*(?=\\n|$)`, 'g');
  md = md.replace(re, (m, p1) => `<h${i}>${p1.trim()}</h${i}>`);
}

// Bold and italic
md = md.replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>');
md = md.replace(/\*([^*]+)\*/g, '<em>$1</em>');

// Unordered lists (lines starting with -, *, +)
md = md.replace(/(?:^|\n)([-*+]\s.+(?:\n[-*+]\s.+)*)/g, (m) => {
  const items = m.trim().split(/\n/).map(l => l.replace(/^[-*+]\s+/, '').trim());
  return '\n<ul>' + items.map(i => `<li>${i}</li>`).join('\n') + '</ul>';
});

// Paragraphs: wrap remaining text blocks in <p>...</p>
const blocks = md.split(/\n{2,}/).map(b => {
  if (/^<h\d>|^<ul>|^<pre>|^<hr\/>/.test(b.trim())) return b;
  return `<p>${b.replace(/\n/g, '<br/>')}</p>`;
});

const html = blocks.join('');

return { html };

5. Exception Handling

To illustrate exception handling, we assume that multiple workflows may listen on the same program ID (by design or unexpectedly) and could therefore receive messages that were meant for others. So the first thing that happens is to validate that the function name is as expected, as shown below.

In this section I show three practical ways to surface workflow failures back to SAP using the Logic Apps action “Send exception to SAP server”, and the corresponding ABAP patterns used to handle them. The core idea is the same in all three: Logic Apps raises an exception on the SAP side, SAP receives it as an RFC exception, and your ABAP wrapper converts that into something predictable (for example, a readable EXCEPTIONMSG , a populated RETURN , or both). The differences are in how much control you want over the exception identity and whether you want to leverage SAP message classes for consistent, localized messages.

5.1 Default exception

This first example shows the default behavior of Send exception to SAP server. When the action runs without a custom exception name configuration, the connector raises a pre-defined exception that can be handled explicitly in ABAP. On the Logic Apps side, the action card “Send exception to SAP server” sends an Exception Error Message (for example, “Unexpected action in request: …”). On the ABAP side, the RFC call lists SENDEXCEPTIONTOSAPSERVER = 1 under EXCEPTIONS , and the code uses CASE sy-subrc to map that exception to a readable message.

The key takeaway is that you get a reliable “out-of-the-box” exception path: ABAP can treat sy-subrc = 1 as the workflow‑raised failure and generate a consistent EXCEPTIONMSG . This is the simplest option and works well when you don’t need multiple exception names—just one clear “workflow failed” signal.

5.2 Message exception

If you want more control than the default, you can configure the action to raise a named exception declared in your ABAP function module interface. This makes it easier to route different failure types without parsing free-form text.
The picture shows Advanced parameters under the Logic Apps action, including “Exception Name” with helper text indicating it must match an exception declared in the ABAP function module definition. This option is useful when you want to distinguish workflow error categories (e.g., validation vs. routing vs. downstream failures) using exception identity, not just message text. The contract stays explicit: Logic Apps raises a named exception, and ABAP can branch on that name (or on sy-subrc mapping) with minimal ambiguity. 5.3 Message class exception The third approach uses SAP’s built-in message class mechanism so that the exception raised by the workflow can map cleanly into SAP’s message catalog ( T100 ). This is helpful when you want consistent formatting and localization aligned with standard SAP patterns. On the Logic Apps side, the action shows advanced fields including Message Class, Message Number, and an Is ABAP Message toggle, with helper text stating the message class can come from message maintenance ( SE91 ) or be custom. On the ABAP side, the code highlights an error-handling block that calls using sy-msgid , sy-msgno , and variables sy-msgv1 … sy-msgv4 , then stores the resulting text in EXCEPTIONMSG . This pattern is ideal when you want workflow exceptions to look and behave like “native” SAP messages. Instead of hard-coding strings, you rely on the message catalog and let ABAP produce a consistent final message via FORMAT_MESSAGE . The result is easier to standardize across teams and environments—especially if you already manage message classes as part of your SAP development process. Refer to Appendix 2 for further information on FORMAT_MESSAGE . 5.4 Choosing an exception strategy that SAP can act on Across these examples, the goal is consistent: treat workflow failures as first‑class outcomes in SAP, not as connector noise buried in run history. The Logic Apps action Send exception to SAP server gives you three increasingly structured ways to do that, and the “right” choice depends on how much semantics you want SAP to understand. Default exception (lowest ceremony): Use this when you just need a reliable “workflow failed” signal. The connector raises a pre-defined exception name (for example, SENDEXCEPTIONTOSAPSERVER ), and ABAP can handle it with a simple EXCEPTIONS … = 1 mapping and a sy-subrc check. This is the fastest way to make failures visible and deterministic. Named exception(s) (more routing control): Use this when you want SAP to distinguish failure types without parsing message text. By raising an exception name declared in the ABAP function module interface, you can branch cleanly in ABAP (or map to different return handling) and keep the contract explicit and maintainable. Message class + number (most SAP-native): Use this when you want errors to look and behave like standard SAP messages—consistent wording, centralized maintenance, and better alignment with SAP operational practices. In this mode, ABAP can render the final localized string using FORMAT_MESSAGE and return it as EXCEPTIONMSG (and optionally BAPIRET2 - MESSAGE ), which makes the failure both human-friendly and SAP-friendly. A practical rule of thumb: start with the default exception while you stabilize the integration, move to named exceptions when you need clearer routing semantics, and adopt message classes when you want SAP-native error governance (standardization, maintainability, and localization). 
Regardless of the option, the key is to end with a predictable SAP-side contract: a clear success path, and a failure path that produces a structured return and a readable message. 6. Response Handling This section shows how the destination workflow returns either a successful analysis response or a workflow exception back to SAP, and how the source (caller) workflow interprets the RFC response structure to produce a single, human‑readable outcome (an email body). The key idea is to keep the SAP-facing contract stable: SAP always returns a Z_GET_ORDERS_ANALYSISResponse envelope, and the caller workflow decides between success and error using just two fields: EXCEPTIONMSG and RETURN / MESSAGE . To summarize the steps: Destination workflow either: sends a normal response via Respond to SAP server, or raises an exception via Send exception to SAP server (with an error message). SAP server exposes those outcomes through the RFC wrapper: sy-subrc = 0 → success ( EXCEPTIONMSG = 'ok') sy-subrc = 1 → workflow exception ( SENDEXCEPTIONTOSAPSERVER ) sy-subrc = 2/3 → system/communication failures Source workflow calls the RFC, extracts: EXCEPTIONMSG RETURN / MESSAGE and uses an Has errors gate to choose between a success email body (analysis) or a failure email body (error summary). The figure below shows the full return path for results and failures. On the right, the destination workflow either responds normally (Respond to SAP server) or raises a workflow exception (Send exception to SAP server). SAP then maps that into the RFC outcome ( sy-subrc and message fields). On the left, the source workflow parses the RFC response structure and populates a single EmailBody variable using two cases: failure (error details) or success (analysis text). Figure: Response/exception flow Two things make this pattern easy to operationalize. First, the caller workflow does not need to understand every SAP field—only EXCEPTIONMSG and RETURN / MESSAGE are required to decide success vs failure. Second, the failure path intentionally aggregates details ( MESSAGE_V1 … MESSAGE_V4 plus the exception text) into a single readable string so errors don’t get trapped in run history. Callout: The caller workflow deliberately treats EXCEPTIONMSG != "ok" or RETURN / MESSAGE present as the single source of truth for failure, which keeps the decision logic stable even if the response schema grows. Detailed description Phase 1 — Destination workflow: choose “response” vs “exception” Respond to SAP server returns the normal response payload back to SAP. Send exception to SAP server raises a workflow failure with an Exception Error Message (the screenshot shows an example beginning with “Unexpected action in request:” and a token for Function Name). Outcome: SAP receives either a normal response or a raised exception for the RFC call. Phase 2 — SAP server: map workflow outcomes to RFC results The SAP-side wrapper code shown in the figure calls: CALL FUNCTION ' Z_GET_ORDERS_ANALYSIS ' DESTINATION DEST ... It declares exception mappings including: SENDEXCEPTIONTOSAPSERVER = 1 system_failure = 2 MESSAGE EXCEPTIONMSG communication_failure = 3 MESSAGE EXCEPTIONMSG OTHERS = 4 Then it uses CASE sy-subrc . to normalize outcomes (the figure shows WHEN 0. setting EXCEPTIONMSG = 'ok'., and WHEN 1. building a readable message for the workflow exception). Outcome: regardless of why it failed, SAP can provide a consistent set of fields back to the caller: a return structure and an exception/status message. 
Phase 3 — Source workflow: parse response and build one “email body” After the RFC action ([RFC] Call Z GET ORDERS ANALYSIS ) the source workflow performs: Save EXCEPTION message Extracts EXCEPTIONMSG from the response XML using XPath. Save RETURN message Extracts RETURN / MESSAGE from the response XML using XPath. Initialize email body Creates EmailBody once, then sets it in exactly one of two cases. Has errors (two cases) The condition treats the run as “error” if either: EXCEPTIONMSG is not equal to "ok", or RETURN / MESSAGE is not empty. Set email body (failure) / Set email body (success) Failure: builds a consolidated string containing RETURN / MESSAGE , message details ( MESSAGE_V1 ..V4), and EXCEPTIONMSG . Success: sets EmailBody to the ANALYSIS field extracted from the response. Outcome: the caller produces a single artifact (EmailBody) that is readable and actionable, without requiring anyone to inspect the raw RFC response. 7. Destination Workflow #2: Persisting failed rows as custom IDocs In this section I zoom in on the optional “IDoc persistence” branch at the end of the destination workflow. After the workflow identifies invalid rows (via the Data Validation Agent) and emails a verification summary, it can optionally call a second SAP RFC to save the failed rows as IDocs for later processing. This is mainly included to showcase another common SAP integration scenario—creating/handling IDocs—and to highlight that you can combine “AI-driven validation” with traditional enterprise workflows. The deeper motivation for invoking this as part of the agent tooling is covered in Part 2; here, the goal is to show the connector pattern and the custom RFC used to create IDocs from CSV input. The figure below shows the destination workflow at two levels: a high-level overview at the top, and a zoomed view of the post-validation remediation steps at the bottom. The zoom starts from Data Validation Agent → Summarize CSV payload review and then expands the sequence that runs after Send verification summary: Transform CSV to XML followed by an SAP RFC call that creates IDocs from the failed data. The key point is that this branch is not the main “analysis response” path. It’s a practical remediation option: once invalid rows are identified and reported, the workflow can persist them into SAP using a dedicated RFC ( Z_CREATE_ONLINEORDER_IDOC ) and a simple IT_CSV payload. This keeps the end-to-end flow modular: analysis can remain focused on validated data, while failed records can be routed to SAP for follow-up processing on their own timeline. Callout: This branch exists to showcase an IDoc-oriented connector scenario. The “why this is invoked from the agent tooling” context is covered in Part 2; here the focus is the mechanics of calling Z_CREATE_ONLINEORDER_IDOC with IT_CSV and receiving ET_RETURN / ET_DOCNUMS . The screenshot shows an XML body with the RFC root element and an SAP namespace: <z_create_onlineorder_idoc xmlns="http://Microsoft.LobServices.Sap/2007/03/Rfc/"> <iv_direction>...</iv_direction> <iv_sndptr>...</iv_sndptr> <iv_sndprn>...</iv_sndprn> <iv_rcvptr>...</iv_rcvptr> <iv_rcvprn>...</iv_rcvprn> <it_csv> @{ ...Outputs... } </it_csv> <et_return></et_return> <et_docnums></et_docnums> </z_create_onlineorder_idoc> What to notice: the workflow passes invalid CSV rows in IT_CSV , and SAP returns a status table ( ET_RETURN ) and created document numbers ( ET_DOCNUMS ) for traceability. 
The payload includes standard-looking control fields ( IV_DIRECTION , IV_SNDPTR , IV_SNDPRN , IV_RCVPTR , IV_RCVPRN ) and the actual failed-row payload as IT_CSV . IT_CSV is populated via a Logic Apps expression (shown as @{ ...Outputs... } in the screenshot), which is the bridge between the prior transform step and the RFC call. The response side indicates table-like outputs: ET_RETURN and ET_DOCNUMS . 7.1 From CSV to IDocs I’ll cover the details of Destination workflow #2 in Part 2. In this post (Part 1), I focus on the contract and the end-to-end mechanics: what the RFC expects, what it returns, and how the created IDocs show up in the receiving workflow. Before looking at the RFC itself, it helps to understand the payload we’re building inside the IDoc. The screenshot below shows the custom segment definition used by the custom IDoc type. This segment is intentionally shaped to mirror the columns of the CSV input so the mapping stays direct and easy to reason about. Figure: Custom segment ZONLINEORDER000 (segment type ZONLINEORDER ) This segment definition is the contract anchor: it makes the CSV-to-IDoc mapping explicit and stable. Each CSV record becomes one segment instance with the same 14 business fields. That keeps the integration “boringly predictable,” which is exactly what you want when you’re persisting rejected records for later processing. The figure below shows the full loop for persisting failed rows as IDocs. The source workflow calls the custom RFC and sends the invalid CSV rows as XML. SAP converts each row into the custom segment and creates outbound IDocs. Those outbound IDocs are then received by Destination workflow #2, which processes them asynchronously (one workflow instance per IDoc) and appends results into shared storage for reporting. This pattern deliberately separates concerns: the first destination workflow identifies invalid rows and decides whether to persist them, SAP encapsulates the mechanics of IDoc creation behind a stable RFC interface, and a second destination workflow processes those IDocs asynchronously (one per IDoc), which is closer to how IDoc-driven integrations typically operate in production. Destination workflow #2 is included here to show the end-to-end contract and the “receipt” side of the connector scenario: Triggered by the SAP built-in trigger and checks FunctionName = IDOC_INBOUND_ASYNCHRONOUS extracts DOCNUM from the IDoc control record ( EDI_DC40 / DOCNUM ) reconstructs a CSV payload from the IDoc data segment (the fields shown match the segment definition) appends a “verification info” line to shared storage for reporting The implementation details of that workflow (including why it is invoked from the agent tooling) are covered in Part 2. 7.2 Z_CREATE_ONLINEORDER_IDOC - Contract overview The full source code for Z_CREATE_ONLINEORDER_IDOC is included in the supporting material. It’s too long to reproduce inline, so this post focuses on the contract—the part you need to call the RFC correctly and interpret its results. A quick note on authorship: most of the implementation was generated with Copilot, with manual review and fixes to resolve build errors and align the behavior with the intended integration pattern. The contract is deliberately generic because the goal was to produce an RFC that’s reusable across more than one scenario, rather than tightly coupled to a single workflow. 
At a high level, the RFC is designed to support: Both inbound and outbound IDoc creation It can either write IDocs to the SAP database (inbound-style persistence) or create/distribute IDocs outbound. Multiple IDoc/message/segment combinations IDoc type ( IDOCTYP ), message type ( MESTYP ), and segment type ( SEGTP ) are configurable so the same RFC can be reused. Explicit partner/port routing control Optional sender/receiver partner/port fields can be supplied when routing matters. Traceability of created artifacts The RFC returns created IDoc numbers so the caller can correlate “these failed rows” to “these IDocs.” Contract: Inputs (import parameters) IV_DIRECTION (default: 'O') — 'I' for inbound write-to-db, 'O' for outbound distribute/dispatch IV_IDOCTYP (default: ZONLINEORDERIDOC ) IV_MESTYP (default: ZONLINEORDER ) IV_SEGTP (default: ZONLINEORDER ) Optional partner/port routing fields: IV_SNDPRT , IV_SNDPRN , IV_RCVPRT , IV_RCVPRN , IV_RCVPOR Tables IT_CSV (structure ZTY_CSV_LINE ) — each row is one CSV line (the “table-of-lines” pattern) ET_RETURN (structure BAPIRET2 ) — success/warning/error messages (per-row and/or aggregate) ET_DOCNUMS (type ZTY_DOCNUM_TT ) — list of created IDoc numbers for correlation/traceability Outputs EV_DOCNUM — a convenience “primary / last created” DOCNUM value returned by the RFC 8. Concluding Remarks Part 1 established a stable SAP ↔ Logic Apps integration baseline: CSV moves end‑to‑end using explicit contracts, and failures are surfaced predictably. The source workflow reads CSV from Blob, wraps rows into the IT_CSV table‑of‑lines payload, calls Z_GET_ORDERS_ANALYSIS , and builds one outcome using two fields from the RFC response: EXCEPTIONMSG and RETURN / MESSAGE . The destination workflow gates requests, validates input, and returns only analysis (or errors) back to SAP while handling invalid rows operationally (notification + optional persistence). On the error path, we covered three concrete patterns to raise workflow failures back into SAP: the default connector exception ( SENDEXCEPTIONTOSAPSERVER ), named exceptions (explicit ABAP contract), and message‑class‑based errors (SAP‑native formatting via FORMAT_MESSAGE ). On the remediation side, we added a realistic enterprise pattern: persist rejected rows as custom IDocs via Z_CREATE_ONLINEORDER_IDOC ( IT_CSV in, ET_RETURN + ET_DOCNUMS out), using the custom segment ZONLINEORDER000 as the schema anchor and enabling downstream receipt in Destination workflow #2 (one run per IDoc, correlated via DOCNUM ). Part 2 is separate because it tackles a different problem: the AI layer. With contracts and error semantics now fixed, Part 2 can focus on the agent/tooling details that tend to iterate—rule retrieval, structured validation outputs, prompt constraints, token/history controls, and how the analysis output is generated and shaped—without muddying the transport story. Appendix 1: Parse XML with schema In this section I consider the CSV payload creation as an example, but parsing XML with schema applies in every place where we get an XML input to process, such as when receiving SAP responses, exceptions, or request/responses from other RFCs. 
Strong contract The Create_CSV_payload step in the shown implementation uses an xpath() + join() expression to extract LINE values from the incoming XML: join( xpath( xml(triggerBody()?['content']), '/*[local-name()="Z_GET_ORDERS_ANALYSIS"] /*[local-name()="IT_CSV"] /*[local-name()="ZTY_CSV_LINE"] /*[local-name()="LINE"]/text()' ), '\r\n' ) That approach works, but it’s essentially a “weak contract”: it assumes the message shape stays stable and that your XPath continues to match. By contrast, the Parse XML with schema action turns the XML payload into structured data based on an XSD, which gives you a “strong contract” and enables downstream steps to bind to known fields instead of re-parsing XML strings. The figure below compares two equivalent ways to build the CSV payload from the RFC input. On the left is the direct xpath() compose (labeled “weak contract”). On the right is the schema-based approach (labeled “strong contract”), where the workflow parses the request first and then builds the CSV payload by iterating over typed rows. What’s visible in the diagram is the key tradeoff: XPath compose path (left): the workflow creates the CSV payload directly using join(xpath(...), '\r\n') , with the XPath written using local-name() selectors. This is fast to prototype, but the contract is implicit—your workflow “trusts” the XML shape and your XPath accuracy. Parse XML with schema path (right): the workflow inserts a Parse XML with schema step (“ Parse Z GET ORDERS ANALYSIS request ”), initializes variables, loops For each CSV row, and Appends to CSV payload, then performs join(variables('CSVPayload'), '\r\n') . Here, the contract is explicit—your XSD defines what IT_CSV and LINE mean, and downstream steps bind to those fields rather than re-parsing XML. A good rule of thumb is: XPath is great for lightweight extraction, while Parse XML with schema is better when you want contract enforcement and long-term maintainability, especially in enterprise integration / BizTalk migration scenarios where schemas are already part of the integration culture. Implementation details The next figure shows the concrete configuration for Parse XML with schema and how its outputs flow into the “For each CSV row” loop. This is the “strong contract” version of the earlier XPath compose. This screenshot highlights three practical implementation details: The Parse action is schema-backed. In the Parameters pane, the action uses: Content: the incoming XML Response Schema source: LogicApp Schema name: Z_GET_ORDERS_ANALYSIS The code view snippet shows the same idea: type: "XmlParse" with content: " @triggerBody()?['content'] " and schema: { source: "LogicApp", name: "Z_GET_ORDERS_ANALYSIS.xsd" }. The parsed output becomes typed “dynamic content.” The loop input is shown as “ JSON Schema for element 'Z_GET_ORDERS_ANALYSIS: IT_CSV' ”. This is the key benefit: you are no longer scraping strings—you are iterating over a structured collection that was produced by schema-based parsing. The LINE extraction becomes trivial and readable. The “Append to CSV payload” step appends @item()?['LINE'] to the CSVpayload variable (as shown in the code snippet). Then the final Create CSV payload becomes a simple join(variables('CSVPayload'), '\r\n') . This is exactly the kind of “workflow readability” benefit you get once XML parsing is schema-backed. Schema generation The Parse action requires XSD schemas, which can be stored in the Logic App (or via a linked Integration Account). 
The final figure shows a few practical ways to obtain and manage those XSDs: Generate Schema (SAP connector): a “Generate Schema” action with Operation Type = RFC and an RFC Name field, which is a practical way to bootstrap schema artifacts when you already know the RFC you’re calling. Run Diagnostics / Fetch RFC Metadata: a “Run Diagnostics” action showing Operation type = Fetch RFC Metadata and RFC Name, which is useful to confirm the shape of the RFC interface and reconcile it with your XSD/contract. If you don’t want to rely solely on connector-side schema generation, there are also classic “developer tools” approaches: Infer XSD from a sample XML using .NET’s XmlSchemaInference (good for quick starting points). Generate XSD from an XML instance using xsd.exe (handy when you already have representative sample payloads) or by asking your favorite AI prompt. When to choose XPath vs Parse XML with schema (practical guidance) Generally speaking, choose XPath when… You need a quick extraction and you’re comfortable maintaining a single XPath. You don’t want to manage schema artifacts yet (early prototypes). Choose Parse XML with schema when… You want a stronger, explicit contract (XSD defines what the payload is). You want the designer to expose structured outputs (“JSON Schema for element …”) so downstream steps are readable and less brittle. You expect the message shape to evolve over time and prefer schema-driven changes over XPath surgery. Appendix 2: Using FORMAT_MESSAGE to produce SAP‑native error text When propagating failures from Logic Apps back into SAP (for example via Send exception to SAP server), I want the SAP side to produce a predictable, human‑readable message without forcing callers to parse connector‑specific payloads. ABAP’s FORMAT_MESSAGE is ideal for this because it converts SAP’s message context—message class, message number, and up to four variables—into the final message text that SAP would normally display, but without raising a UI message. What FORMAT_MESSAGE does FORMAT_MESSAGE formats a message defined in SAP’s message catalog ( T100 / maintained via SE91 ) using the values in sy-msgid , sy-msgno , and sy-msgv1 … sy-msgv4 . Conceptually, it answers the question: “Given message class + number + variables, what is the rendered message string?” This is particularly useful after an RFC call fails, where ABAP may have message context available even if the exception itself is not a clean string. Why this matters in an RFC wrapper In the message class–based exception configuration, the workflow can provide message metadata (class/number/type) so that SAP can behave “natively”: ABAP receives a failure ( sy-subrc <> 0), formats the message using FORMAT_MESSAGE , and returns the final text in a field like EXCEPTIONMSG (and/or in BAPIRET2 - MESSAGE ). The result is: consistent wording across systems and environments easier localization (SAP selects language-dependent text) separation of concerns: code supplies variables; message content lives in message maintenance A robust pattern After the RFC call, I use this order of precedence: Use any explicit text already provided (for example via system_failure … MESSAGE exceptionmsg), because it’s already formatted. If that’s empty but SAP message context exists ( sy-msgid / sy-msgno ), call FORMAT_MESSAGE to produce the final string. If neither is available, fall back to a generic message that includes sy-subrc . Here is a compact version of that pattern: DATA: lv_text TYPE string. 
CALL FUNCTION 'Z_GET_ORDERS_ANALYSIS'
  DESTINATION dest
  IMPORTING
    analysis = analysis
  TABLES
    it_csv = it_csv
  CHANGING
    return = return
  EXCEPTIONS
    sendexceptiontosapserver = 1
    system_failure           = 2 MESSAGE exceptionmsg
    communication_failure    = 3 MESSAGE exceptionmsg
    OTHERS                   = 4.

IF sy-subrc <> 0.
  "Prefer explicit message text if it already exists
  IF exceptionmsg IS INITIAL.
    "Otherwise format SAP message context into a string
    IF sy-msgid IS NOT INITIAL AND sy-msgno IS NOT INITIAL.
      CALL FUNCTION 'FORMAT_MESSAGE'
        EXPORTING
          id = sy-msgid
          no = sy-msgno
          v1 = sy-msgv1
          v2 = sy-msgv2
          v3 = sy-msgv3
          v4 = sy-msgv4
        IMPORTING
          msg = lv_text.
      exceptionmsg = lv_text.
    ELSE.
      exceptionmsg = |RFC failed (sy-subrc={ sy-subrc }).|.
    ENDIF.
  ENDIF.

  "Optionally normalize into BAPIRET2 for structured consumption
  return-type = 'E'.
  return-message = exceptionmsg.
ENDIF.

Common gotchas

FORMAT_MESSAGE only helps if sy-msgid and sy-msgno are set. If the failure did not originate from an SAP message (or message mapping is disabled), these fields may be empty—so keep a fallback.
Message numbers are typically 3-digit strings (e.g., 001, 012), matching how messages are stored in the catalog.
FORMAT_MESSAGE formats text; it does not raise or display a message. That makes it safe to use in RFC wrappers and background processing.

Bottom line: FORMAT_MESSAGE is a simple tool that helps workflow‑originated failures “land” in SAP as clean, SAP‑native messages—especially when using message classes to standardize and localize error text.

References

Logic Apps Agentic Workflows with SAP - Part 2: AI Agents
Handling Errors in SAP BAPI Transactions | Microsoft Community Hub
Access SAP from workflows | Microsoft Learn
Create common SAP workflows | Microsoft Learn
Generate Schemas for SAP Artifacts via Workflows | Microsoft Learn
Parse XML using Schemas in Standard workflows - Azure Logic Apps | Microsoft Learn
Announcing XML Parse and Compose for Azure Logic Apps GA
Exception Handling | ABAP Keyword Documentation
Handling and Propagating Exceptions - ABAP Keyword Documentation
SAP .NET Connector 3.1 Overview
SAP .NET Connector 3.1 Programming Guide

All supporting content for this post may be found in the companion GitHub repository.

🚀 AI Toolkit for VS Code — February 2026 Update
February brings a major milestone for AI Toolkit. Version 0.30.0 is packed with new capabilities that make agent development more discoverable, debuggable, and production-ready—from a brand-new Tool Catalog, to an end-to-end Agent Inspector, to treating evaluations as first-class tests. 🔧 New in v0.30.0 🧰 Tool Catalog: One place to discover and manage agent tools The new Tool Catalog is a centralized hub for discovering, configuring, and integrating tools into your AI agents. Instead of juggling scattered configs and definitions, you now get a unified experience for tool management: Browse, search, and filter tools from the public Foundry catalog and local stdio MCP servers Configure connection settings for each tool directly in VS Code Add tools to agents seamlessly via Agent Builder Manage the full tool lifecycle: add, update, or remove tools with confidence Why it matters: expanding your agent’s capabilities is now a few clicks away—and stays manageable as your agent grows. 🕵️ Agent Inspector: Debug agents like real software The new Agent Inspector turns agent debugging into a first-class experience inside VS Code. Just press F5 and launch your agent with full debugger support. Key highlights: One-click F5 debugging with breakpoints, variable inspection, and step-through execution Copilot auto-configuration that scaffolds agent code, endpoints, and debugging setup Production-ready code generated using the Hosted Agent SDK, ready for Microsoft Foundry Real-time visualization of streaming responses, tool calls, and multi-agent workflows Quick code navigation—double-click workflow nodes to jump straight to source Unified experience combining chat and workflow visualization in one view Why it matters: agents are no longer black boxes—you can see exactly what’s happening, when, and why. 🧪 Evaluation as Tests: Treat quality like code With Evaluation as Tests, agent quality checks now fit naturally into existing developer workflows. What’s new: Define evaluations as test cases using familiar pytest syntax and Eval Runner SDK annotations Run evaluations directly from VS Code Test Explorer, mixing and matching test cases Analyze results in a tabular view with Data Wrangler integration Submit evaluation definitions to run at scale in Microsoft Foundry Why it matters: evaluations are no longer ad-hoc scripts—they’re versioned, repeatable, and CI-friendly. 🔄 Improvements across the Toolkit 🧱 Agent Builder Agent Builder received a major usability refresh: Redesigned layout for better navigation and focus Quick switcher to move between agents effortlessly Support for authoring, running, and saving Foundry prompt agents Add tools to Foundry prompt agents directly from the Tool Catalog or built-in tools New Inspire Me feature to help you get started when drafting agent instructions Numerous performance and stability improvements 🤖 Model Catalog Added support for models using the OpenAI Response API, including gpt-5.2-codex General performance and reliability improvements 🧠 Build Agent with GitHub Copilot New Workflow entry point to quickly generate multi-agent workflows with Copilot Ability to orchestrate workflows by selecting prompt agents from Foundry 🔁 Conversion & Profiling Generate interactive playgrounds for history models Added Qualcomm GPU recipes Show resource usage for Phi Silica directly in Model Playground ✨ Wrapping up Version 0.30.0 is a big step forward for AI Toolkit. 
With better discoverability, real debugging, structured evaluation, and deeper Foundry integration, building AI agents in VS Code now feels much closer to building production software. As always, we’d love your feedback—keep it coming, and happy agent building! 🚀

Teaching AI Development Through Gamification:
Introduction Learning AI development can feel overwhelming. Developers face abstract concepts like embeddings, prompt engineering, and workflow orchestration topics that traditional tutorials struggle to make tangible. How do you teach someone what an embedding "feels like" or why prompt engineering matters beyond theoretical examples? The answer lies in experiential learning through gamification. Instead of reading about AI concepts, what if developers could play a game that teaches these ideas through progressively challenging levels, immediate feedback, and real AI interactions? This article explores exactly that: building an educational adventure game that transforms AI learning from abstract theory into hands-on exploration. We'll dive into Foundry Local Learning Adventure, a JavaScript-based game that teaches AI fundamentals through five interactive levels. You'll learn how to create engaging educational experiences, integrate local AI models using Foundry Local, design progressive difficulty curves, and build cross-platform applications that run both in browsers and terminals. Whether you're an educator designing technical curriculum or a developer building learning tools, this architecture provides a proven blueprint for gamified technical education. Why Gamification Works for Technical Learning Traditional technical education follows a predictable pattern: read documentation, watch tutorials, attempt exercises, struggle with setup, eventually give up. The problem isn't content quality, it's engagement and friction. Gamification addresses both issues simultaneously. By framing learning as progression through levels, you create intrinsic motivation. Each completed challenge feels like unlocking a new ability in a game, triggering the same dopamine response that keeps players engaged in entertainment experiences. Progress is visible, achievements are celebrated, and setbacks feel like natural parts of the journey rather than personal failures. More importantly, gamification reduces friction. Instead of "install dependencies, configure API keys, read documentation, write code, debug errors," learners simply start the game and begin playing. The game handles setup, provides guardrails, and offers immediate feedback. When a concept clicks, the game celebrates it. When learners struggle, hints appear automatically. For AI development specifically, gamification solves a unique challenge: making probabilistic, non-deterministic systems feel approachable. Traditional programming has clear right and wrong answers, but AI outputs vary. A game can frame this variability as exploration rather than failure, teaching developers to evaluate AI responses critically while maintaining confidence. Architecture Overview: Dual-Platform Design for Maximum Reach The Foundry Local Learning Adventure implements a dual-platform architecture with separate but consistent implementations for web browsers and command-line terminals. This design maximizes accessibility, learners can start playing instantly in a browser, then graduate to CLI mode for the full terminal experience when they're ready to go deeper. The web version prioritizes zero-friction onboarding. It's deployed to GitHub Pages and can also be opened locally via a simple HTTP server, no build step, no package managers. The game starts with simulated AI responses in demo mode, but crucially, it also supports real AI responses when Foundry Local is installed. 
The web version auto-discovers Foundry Local's dynamic port through a foundry-port.json file (written by the startup scripts) or by scanning common ports. Progress saves to localStorage, badges unlock as you complete challenges, and an AI-powered mentor named Sage guides you through a chat widget in the corner. This version is perfect for classrooms, conference demos, and learners who want to try before committing to a full CLI setup. The CLI version provides the full terminal experience with real AI interactions. Built on Node.js with ES modules, this version features a custom FoundryLocalClient class that connects to Foundry Local's OpenAI-compatible REST API. Instead of relying on an external SDK, the game implements its own API client with automatic port discovery, model selection, and graceful fallback to demo mode. The terminal interface includes a rich command system ( play , hint , ask , explain , progress , badges ) and the Sage mentor provides contextual guidance throughout. Both versions implement the same five levels and learning objectives independently. The CLI uses game/src/game.js , levels.js , and mentor.js as ES modules, while the web version uses game/web/game-web.js and game-data.js . A key innovation is the automatic port discovery system, which eliminates manual configuration: // 3-tier port discovery strategy (game/src/game.js) class FoundryLocalClient { constructor() { this.commonPorts = [61341, 5272, 51319, 5000, 8080]; this.mode = 'demo'; // 'local', 'azure', or 'demo' } async initialize() { // Tier 1: CLI discovery - parse 'foundry service status' output const cliPort = await this.discoverPortViaCLI(); if (cliPort) { this.baseUrl = cliPort; this.mode = 'local'; return; } // Tier 2: Try configured URL from config.json if (await this.tryFoundryUrl(config.foundryLocal.baseUrl)) { this.mode = 'local'; return; } // Tier 3: Scan common ports for (const port of this.commonPorts) { if (await this.tryFoundryUrl(`http://127.0.0.1:${port}`)) { this.mode = 'local'; return; } } // Fallback: demo mode with simulated responses console.log('💡 Running in demo mode (no Foundry Local detected)'); this.mode = 'demo'; } async chat(messages, options = {}) { if (this.mode === 'demo') return this.getDemoResponse(messages); const response = await fetch(`${this.baseUrl}/v1/chat/completions`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: this.selectedModel, messages, temperature: options.temperature || 0.7, max_tokens: options.max_tokens || 300 }) }); const data = await response.json(); return data.choices[0].message.content; } } This architecture demonstrates several key principles for educational software: Progressive disclosure: Start simple (web demo mode), add complexity optionally (real AI via Foundry Local or Azure) Consistent learning outcomes: Both platforms teach the same five concepts through independently implemented but equivalent experiences Zero barriers to entry: No installation required for the web version eliminates the #1 reason learners abandon technical tutorials Automatic service discovery: The 3-tier port discovery strategy means no manual configuration, just install Foundry Local and play Graceful degradation: Three connection modes (local, Azure, demo) ensure the game always works regardless of setup Level Design: Teaching AI Concepts Through Progressive Challenges The game's five levels form a carefully designed curriculum that builds AI understanding incrementally. 
Each level introduces one core concept, provides hands-on practice, and validates learning before proceeding. Level 1: Meet the Model teaches the fundamental request-response pattern. Learners send their first message to an AI and see it respond. The challenge is deliberately trivial, just say hello, because the goal is building confidence. The level succeeds when the learner realizes "I can talk to an AI and it understands me." This moment of agency sets the foundation for everything else. The implementation focuses on positive reinforcement. In the CLI version, the Sage mentor celebrates each completion with contextual messages, while the web version displays inline celebration banners with badge animations: // Level 1 execution (game/src/game.js) async executeLevel1() { const level = this.levels.getLevel(1); this.displayLevelHeader(level); // Sage introduces the level const intro = await this.mentor.introduceLevel(1); console.log(`\n🧙 Sage: ${intro}`); const userPrompt = await this.askQuestion('\nYour prompt: '); console.log('\n🤖 AI is thinking...'); const response = await this.client.chat([ { role: 'system', content: 'You are Sage, a friendly AI mentor.' }, { role: 'user', content: userPrompt } ]); console.log(`\n📨 AI Response:\n${response}`); if (response && response.length > 10) { // Sage celebrates const celebration = await this.mentor.celebrateLevelComplete(1); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n🎯 You earned the Prompt Apprentice badge!'); console.log('🏆 +100 points'); this.progress.completeLevel(1, 100, '🎯 Prompt Apprentice'); } } This celebration pattern repeats throughout, explicit acknowledgment of success via the Sage mentor, explanation of what was learned, and a preview of what's next. The mentor system ( game/src/mentor.js ) provides contextual encouragement using AI-generated or pre-written fallback messages, transforming abstract concepts into concrete achievements. Level 2: Prompt Mastery introduces prompt quality through comparison. The game presents a deliberately poor prompt: "tell me stuff about coding." Learners must rewrite it to be specific, contextual, and actionable. The game runs both prompts, displays results side-by-side, and asks learners to evaluate the difference. 
// Level 2: Prompt Improvement (game/src/game.js) async executeLevel2() { const level = this.levels.getLevel(2); this.displayLevelHeader(level); const intro = await this.mentor.introduceLevel(2); console.log(`\n🧙 Sage: ${intro}`); // Show the bad prompt const badPrompt = "tell me stuff about coding"; console.log(`\n❌ Poor prompt: "${badPrompt}"`); console.log('\n🤖 Getting response to bad prompt...'); const badResponse = await this.client.chat([ { role: 'user', content: badPrompt } ]); console.log(`\n📊 Bad prompt result:\n${badResponse}`); // Get the learner's improved version console.log('\n✍️ Now write a BETTER prompt about the same topic:'); const goodPrompt = await this.askQuestion('Your improved prompt: '); console.log('\n🤖 Getting response to your prompt...'); const goodResponse = await this.client.chat([ { role: 'user', content: goodPrompt } ]); console.log(`\n📊 Your prompt result:\n${goodResponse}`); // Evaluate: improved prompt should be longer and more specific const isImproved = goodPrompt.length > badPrompt.length && goodResponse.length > 0; if (isImproved) { const celebration = await this.mentor.celebrateLevelComplete(2); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n✨ You earned the Prompt Engineer badge!'); console.log('🏆 +150 points'); this.progress.completeLevel(2, 150, '✨ Prompt Engineer'); } else { const hint = await this.mentor.provideHint(2); console.log(`\n💡 Sage: ${hint}`); } } This comparative approach is powerful, learners don't just read about prompt engineering, they experience its impact directly. The before/after comparison makes quality differences undeniable. Level 3: Embeddings Explorer demystifies semantic search through practical demonstration. Learners search a knowledge base about Foundry Local using natural language queries. The game shows how embedding similarity works by returning relevant content even when exact keywords don't match. // Level 3: Embedding Search (game/src/game.js) async executeLevel3() { const level = this.levels.getLevel(3); this.displayLevelHeader(level); // Knowledge base loaded from game/data/knowledge-base.json const knowledgeBase = [ { id: 1, content: "Foundry Local runs AI models entirely on your device" }, { id: 2, content: "Embeddings convert text into numerical vectors" }, { id: 3, content: "Cosine similarity measures how related two texts are" }, // ... more entries about AI and Foundry Local ]; const query = await this.askQuestion('\n🔍 Search query: '); // Get embedding for user's query const queryEmbedding = await this.client.getEmbedding(query); // Get embeddings for all knowledge base entries const results = []; for (const item of knowledgeBase) { const itemEmbedding = await this.client.getEmbedding(item.content); const similarity = this.cosineSimilarity(queryEmbedding, itemEmbedding); results.push({ ...item, similarity }); } // Sort by similarity and show top matches results.sort((a, b) => b.similarity - a.similarity); console.log('\n📑 Top matches:'); results.slice(0, 3).forEach((r, i) => { console.log(` ${i + 1}. 
(${(r.similarity * 100).toFixed(1)}%) ${r.content}`); }); } // Cosine similarity calculation (also in TaskHandler) cosineSimilarity(a, b) { const dot = a.reduce((sum, val, i) => sum + val * b[i], 0); const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0)); const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0)); return dot / (magA * magB); } // Demo mode generates pseudo-embeddings when Foundry isn't available getPseudoEmbedding(text) { // 128-dimension hash-based vector for offline demonstration const embedding = new Array(128).fill(0); for (let i = 0; i < text.length; i++) { embedding[i % 128] += text.charCodeAt(i) / 1000; } return embedding; } Learners query things like "How do I run AI offline?" and discover content about Foundry Local's offline capabilities—even though the word "offline" appears nowhere in the result. When Foundry Local is running, the game calls the /v1/embeddings endpoint for real vector representations. In demo mode, a pseudo-embedding function generates 128-dimension hash-based vectors that still demonstrate the concept of similarity search. This concrete demonstration of semantic understanding beats any theoretical explanation. Level 4: Workflow Wizard teaches AI pipeline composition. Learners build a three-step workflow: summarize text → extract keywords → generate questions. Each step uses the previous output as input, demonstrating how complex AI tasks decompose into chains of simpler operations. // Level 4: Workflow Builder (game/src/game.js) async executeLevel4() { const level = this.levels.getLevel(4); this.displayLevelHeader(level); const intro = await this.mentor.introduceLevel(4); console.log(`\n🧙 Sage: ${intro}`); console.log('\n📝 Enter text for the 3-step AI pipeline:'); const inputText = await this.askQuestion('Input text: '); // Step 1: Summarize console.log('\n⚙️ Step 1: Summarizing...'); const summary = await this.client.chat([ { role: 'system', content: 'Summarize this in 2 sentences.' }, { role: 'user', content: inputText } ]); console.log(` Result: ${summary}`); // Step 2: Extract keywords (chained from Step 1 output) console.log('\n🔑 Step 2: Extracting keywords...'); const keywords = await this.client.chat([ { role: 'system', content: 'Extract 5 important keywords.' }, { role: 'user', content: summary } ]); console.log(` Keywords: ${keywords}`); // Step 3: Generate questions (chained from Step 2 output) console.log('\n❓ Step 3: Generating study questions...'); const questions = await this.client.chat([ { role: 'system', content: 'Create 3 quiz questions about these topics.' }, { role: 'user', content: keywords } ]); console.log(` Questions:\n${questions}`); console.log('\n✅ Workflow complete!'); const celebration = await this.mentor.celebrateLevelComplete(4); console.log(`\n🧙 Sage: ${celebration}`); console.log('\n⚡ You earned the Workflow Wizard badge!'); console.log('🏆 +250 points'); this.progress.completeLevel(4, 250, '⚡ Workflow Wizard'); } This level bridges the gap between "toy examples" and real applications. Learners see firsthand how combining simple AI operations creates sophisticated functionality. Level 5: Build Your Own Tool challenges learners to create a custom AI-powered tool by selecting from pre-built templates and configuring them. 
Rather than asking learners to write arbitrary code, the game provides four structured templates that demonstrate how AI tools work in practice: // Level 5: Tool Builder templates (game/web/game-web.js) const TOOL_TEMPLATES = [ { id: 'summarizer', name: '📝 Text Summarizer', description: 'Summarizes long text into key points', systemPrompt: 'You are a text summarization tool. Provide concise summaries.', exampleInput: 'Paste any long article or document...' }, { id: 'translator', name: '🌐 Code Translator', description: 'Translates code between programming languages', systemPrompt: 'You are a code translation tool. Convert code accurately.', exampleInput: 'function hello() { console.log("Hello!"); }' }, { id: 'reviewer', name: '🔍 Code Reviewer', description: 'Reviews code for bugs, style, and improvements', systemPrompt: 'You are a code review tool. Identify issues and suggest fixes.', exampleInput: 'Paste code to review...' }, { id: 'custom', name: '✨ Custom Tool', description: 'Design your own AI tool with a custom system prompt', systemPrompt: '', // Learner provides this exampleInput: '' } ]; // Tool testing sends the configured system prompt + user input to Foundry Local async function testTool(template, userInput) { const response = await callFoundryAPI([ { role: 'system', content: template.systemPrompt }, { role: 'user', content: userInput } ]); console.log(`🔧 Tool output: ${response}`); return response; } This template-based approach is safer and more educational than arbitrary code execution. Learners select a template, customize its system prompt, test it with sample input, and see how the AI responds differently based on the tool's configuration. The "Custom Tool" option lets advanced learners design their own system prompts from scratch. Completing this level marks true understanding—learners aren't just using AI, they're shaping what it can do through prompt design and tool composition. Building the Web Version: Zero-Install Educational Experience The web version demonstrates how to create educational software that requires absolutely zero setup. This is critical for workshops, classroom settings, and casual learners who won't commit to installation until they see value. The architecture is deliberately simple, vanilla JavaScript with ES6 modules, no build tools, no package managers. 
The HTML includes a multi-screen layout with a welcome screen, level selection grid, game area, and modals for progress, badges, help, and game completion: <!-- game/web/index.html --> <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Foundry Local Learning Adventure</title> <link rel="stylesheet" href="styles.css"> </head> <body> <!-- Welcome Screen with name input --> <div id="welcome-screen" class="screen active"> <h1>🎮 Foundry Local Learning Adventure</h1> <p>Master Microsoft Foundry AI - One Level at a Time!</p> <input type="text" id="player-name" placeholder="Enter your name"> <button id="start-btn">Start Adventure</button> <div id="foundry-status"><!-- Auto-detected connection status --></div> </div> <!-- Menu Screen with level grid --> <div id="menu-screen" class="screen"> <div class="level-grid"> <!-- 5 level cards with lock/unlock states --> </div> <div class="stats-bar"> <span id="points-display">0 points</span> <span id="badges-count">0/5 badges</span> </div> </div> <!-- Level Screen with task area --> <div id="level-screen" class="screen"> <div id="level-header"></div> <div id="task-area"><!-- Level-specific UI loads here --></div> <div id="response-area"></div> <div id="hint-area"></div> </div> <!-- Sage Mentor Chat Widget (fixed bottom-right) --> <div id="mentor-chat" class="mentor-widget"> <div class="mentor-header">🧙 Sage (AI Mentor)</div> <div id="mentor-messages"></div> <input type="text" id="mentor-input" placeholder="Ask Sage anything..."> </div> <script type="module" src="game-data.js"></script> <script type="module" src="game-web.js"></script> </body> </html> A critical feature of the web version is its ability to connect to a real Foundry Local instance. On startup, the game checks for a foundry-port.json file (written by the cross-platform start scripts) and falls back to scanning common ports: // game/web/game-web.js - Foundry Local auto-discovery let foundryConnection = { connected: false, baseUrl: null }; async function checkFoundryConnection() { // Try reading port from discovery file (written by start scripts) const discoveredPort = await readDiscoveredPort(); if (discoveredPort) { try { const resp = await fetch(`${discoveredPort}/v1/models`); if (resp.ok) { foundryConnection = { connected: true, baseUrl: discoveredPort }; updateStatusBadge('🟢 Foundry Local Connected'); return; } } catch (e) { /* continue to port scan */ } } // Scan common Foundry Local ports const ports = [61341, 5272, 51319, 5000, 8080]; for (const port of ports) { try { const resp = await fetch(`http://127.0.0.1:${port}/v1/models`); if (resp.ok) { foundryConnection = { connected: true, baseUrl: `http://127.0.0.1:${port}` }; updateStatusBadge('🟢 Foundry Local Connected'); return; } } catch (e) { continue; } } // Demo mode - use simulated responses from DEMO_RESPONSES updateStatusBadge('🟡 Demo Mode (install Foundry Local for real AI)'); } async function callFoundryAPI(messages) { if (!foundryConnection.connected) { return getDemoResponse(messages); // Simulated responses } const resp = await fetch(`${foundryConnection.baseUrl}/v1/chat/completions`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: 'auto', messages, temperature: 0.7 }) }); const data = await resp.json(); return data.choices[0].message.content; } The web version also includes level-specific UIs: each level type has its own builder function that constructs the appropriate interface. 
For example, Level 2 (Prompt Improvement) shows a split-view with the bad prompt result on one side and the learner's improved prompt on the other. Level 3 (Embeddings) presents a search interface with similarity scores. Level 5 (Tool Builder) offers a template selector with four options (Text Summarizer, Code Translator, Code Reviewer, and Custom). This architecture teaches several patterns for web-based educational tools: LocalStorage for persistence: Progress survives page refreshes without requiring accounts or databases ES6 modules for organization: Clean separation between game data ( game-data.js ) and engine ( game-web.js ) Hybrid AI mode: Real AI when Foundry Local is available, simulated responses when it's not—same code path for both Multi-screen navigation: Welcome, menu, level, and completion screens provide clear progression Always-available mentor: The Sage chat widget in the corner lets learners ask questions at any point Implementing the CLI Version with Real AI Integration The CLI version provides the authentic AI development experience. This version requires Node.js and Foundry Local, but rewards setup effort with genuine model interactions. Installation uses a startup script that handles prerequisites: #!/bin/bash # scripts/start-game.sh echo "🎮 Starting Foundry Local Learning Adventure..." # Check Node.js if ! command -v node &> /dev/null; then echo "❌ Node.js not found. Install from https://nodejs.org/" exit 1 fi # Check Foundry Local if ! command -v foundry &> /dev/null; then echo "❌ Foundry Local not found." echo " Install: winget install Microsoft.FoundryLocal" exit 1 fi # Start Foundry service echo "🚀 Starting Foundry Local service..." foundry service start # Wait for service sleep 2 # Load model echo "📦 Loading Phi-4 model..." foundry model load phi-4 # Install dependencies echo "📥 Installing game dependencies..." npm install # Start game echo "✅ Launching game..." 
npm start The game logic integrates with Foundry Local using the official SDK: // game/src/game.js import { FoundryLocalClient } from 'foundry-local-sdk'; import readline from 'readline/promises'; const client = new FoundryLocalClient({ endpoint: 'http://127.0.0.1:5272' // Default Foundry Local port }); async function getAIResponse(prompt, level) { try { const startTime = Date.now(); const completion = await client.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: `You are Sage, a friendly AI mentor teaching ${LEVELS[level-1].title}.` }, { role: 'user', content: prompt } ], temperature: 0.7, max_tokens: 300 }); const latency = Date.now() - startTime; console.log(`\n⏱️ AI responded in ${latency}ms`); return completion.choices[0].message.content; } catch (error) { console.error('❌ AI error:', error.message); console.log('💡 Falling back to demo mode...'); return getDemoResponse(prompt, level); } } async function playLevel(levelNumber) { const level = LEVELS[levelNumber - 1]; console.clear(); console.log(`\n${'='.repeat(60)}`); console.log(` Level ${levelNumber}: ${level.title}`); console.log(`${'='.repeat(60)}\n`); console.log(`🎯 ${level.objective}\n`); console.log(`📚 ${level.description}\n`); const rl = readline.createInterface({ input: process.stdin, output: process.stdout }); const userPrompt = await rl.question('Your prompt: '); rl.close(); console.log('\n🤖 AI is thinking...'); const response = await getAIResponse(userPrompt, levelNumber); console.log(`\n📨 AI Response:\n${response}\n`); // Evaluate success if (level.successCriteria(response, userPrompt)) { celebrateSuccess(level); updateProgress(levelNumber); if (levelNumber < 5) { const playNext = await askYesNo('Play next level?'); if (playNext) { await playLevel(levelNumber + 1); } } else { showGameComplete(); } } else { console.log(`\n💡 Hint: ${level.hints[0]}\n`); const retry = await askYesNo('Try again?'); if (retry) { await playLevel(levelNumber); } } } The CLI version adds several enhancements that deepen learning: Latency visibility: Display response times so learners understand local vs cloud performance differences Graceful fallback: If Foundry Local fails, switch to demo mode automatically rather than crashing Interactive prompts: Use readline for natural command-line interaction patterns Progress persistence: Save to JSON files so learners can pause and resume Command history: Log all prompts and responses for learners to review their progression Key Takeaways and Educational Design Principles Building effective educational software for technical audiences requires balancing several competing concerns: accessibility vs authenticity, simplicity vs depth, guidance vs exploration. The Foundry Local Learning Adventure succeeds by making deliberate architectural choices that prioritize learner experience. 
Key principles demonstrated:

Zero-friction starts win: The web version eliminates all setup barriers, maximizing the chance learners will actually begin
Automatic service discovery: The 3-tier port discovery strategy means no manual configuration, just install Foundry Local and play
Progressive challenge curves build confidence: Each level introduces exactly one new concept, building on previous knowledge
Immediate feedback accelerates learning: Learners know instantly if they succeeded, with Sage providing contextual explanations
Real tools create transferable skills: The CLI version uses professional developer patterns (OpenAI-compatible REST APIs, ES modules, readline) that apply beyond the game
Celebration creates emotional investment: Badges, points, and Sage's encouragement transform learning into achievement
Dual platforms expand reach: Web attracts casual learners, CLI converts them to serious practitioners—and both support real AI
Graceful degradation ensures reliability: Three connection modes (local, Azure, demo) mean the game always works regardless of setup

To extend this approach for your own educational projects, consider:

Domain-specific challenges: Adapt level structure to your technical domain (e.g., API design, database optimization, security practices)
Multiplayer competitions: Add leaderboards and time trials to introduce social motivation
Adaptive difficulty: Track learner performance and adjust challenge difficulty dynamically
Sandbox modes: After completing the curriculum, provide free-play areas for experimentation
Community sharing: Let learners share custom levels or challenges they've created

The complete implementation with all levels, both web and CLI versions, comprehensive tests, and deployment guides is available at github.com/leestott/FoundryLocal-LearningAdventure. You can play the web version immediately at leestott.github.io/FoundryLocal-LearningAdventure or clone the repository to experience the full CLI version with real AI.

Resources and Further Reading

Foundry Local Learning Adventure Repository - Complete source code for both web and CLI versions
Play Online Now - Try the web version instantly in your browser (supports real AI with Foundry Local installed)
Microsoft Foundry Local Documentation - Official SDK and CLI reference
Contributing Guide - How to contribute new levels or improvements

Building a Smart Building HVAC Digital Twin with AI Copilot Using Foundry Local
Introduction Building operations teams face a constant challenge: optimizing HVAC systems for energy efficiency while maintaining occupant comfort and air quality. Traditional building management systems display raw sensor data, temperatures, pressures, CO₂ levels—but translating this into actionable insights requires deep HVAC expertise. What if operators could simply ask "Why is the third floor so warm?" and get an intelligent answer grounded in real building state? This article demonstrates building a sample smart building digital twin with an AI-powered operations copilot, implemented using DigitalTwin, React, Three.js, and Microsoft Foundry Local. You'll learn how to architect physics-based simulators that model thermal dynamics, implement 3D visualizations of building systems, integrate natural language AI control, and design fault injection systems for testing and training. Whether you're building IoT platforms for commercial real estate, designing energy management systems, or implementing predictive maintenance for building automation, this sample provides proven patterns for intelligent facility operations. Why Digital Twins Matter for Building Operations Physical buildings generate enormous operational data but lack intelligent interpretation layers. A 50,000 square foot office building might have 500+ sensors streaming metrics every minute, zone temperatures, humidity levels, equipment runtimes, energy consumption. Traditional BMS (Building Management Systems) visualize this data as charts and gauges, but operators must manually correlate patterns, diagnose issues, and predict failures. Digital twins solve this through physics-based simulation coupled with AI interpretation. Instead of just displaying current temperature readings, a digital twin models thermal dynamics, heat transfer rates, HVAC response characteristics, occupancy impacts. When conditions deviate from expectations, the twin compares observed versus predicted states, identifying root causes. Layer AI on top, and operators get natural language explanations: "The conference room is 3 degrees too warm because the VAV damper is stuck at 40% open, reducing airflow by 60%." This application focuses on HVAC, the largest building energy consumer, typically 40-50% of total usage. Optimizing HVAC by just 10% through better controls can save thousands of dollars monthly while improving occupant satisfaction. The digital twin enables "what-if" scenarios before making changes: "What happens to energy consumption and comfort if we raise the cooling setpoint by 2 degrees during peak demand response events?" Architecture: Three-Tier Digital Twin System The application implements a clean three-tier architecture separating visualization, simulation, and state management: The frontend uses React with Three.js for 3D visualization. Users see an interactive 3D model of the three-floor building with color-coded zones indicating temperature and CO₂ levels. Click any equipment, AHUs, VAVs, chillers, to see detailed telemetry. The control panel enables adjusting setpoints, running simulation steps, and activating demand response scenarios. Real-time charts display KPIs: energy consumption, comfort compliance, air quality levels. The backend Node.js/Express server orchestrates simulation and state management. It maintains the digital twin state as JSON, the single source of truth for all equipment, zones, and telemetry. REST API endpoints handle control requests, simulation steps, and AI copilot queries. 
WebSocket connections push real-time updates to the frontend for live monitoring. The HVAC simulator implements physics-based models: 1R1C thermal models for zones, affinity laws for fan power, chiller COP calculations, CO₂ mass balance equations. Foundry Local provides AI copilot capabilities. The backend uses foundry-local-sdk to query locally running models. Natural language queries ("How's the lobby temperature?") get answered with building state context. The copilot can explain anomalies, suggest optimizations, and even execute commands when explicitly requested. Implementing Physics-Based HVAC Simulation Accurate simulation requires modeling actual HVAC physics. The simulator implements several established building energy models: // backend/src/simulator/thermal-model.js class ZoneThermalModel { // 1R1C (one resistance, one capacitance) thermal model static calculateTemperatureChange(zone, delta_t_seconds) { const C_thermal = zone.volume * 1.2 * 1000; // Heat capacity (J/K) const R_thermal = zone.r_value * zone.envelope_area; // Thermal resistance // Internal heat gains (occupancy, equipment, lighting) const Q_internal = zone.occupancy * 100 + // 100W per person zone.equipment_load + zone.lighting_load; // Cooling/heating from HVAC const airflow_kg_s = zone.vav.airflow_cfm * 0.0004719; // CFM to kg/s const c_p_air = 1006; // Specific heat of air (J/kg·K) const Q_hvac = airflow_kg_s * c_p_air * (zone.vav.supply_temp - zone.temperature); // Envelope losses const Q_envelope = (zone.outdoor_temp - zone.temperature) / R_thermal; // Net energy balance const Q_net = Q_internal + Q_hvac + Q_envelope; // Temperature change: Q = C * dT/dt const dT = (Q_net / C_thermal) * delta_t_seconds; return zone.temperature + dT; } } This model captures essential thermal dynamics while remaining computationally fast enough for real-time simulation. It accounts for internal heat generation from occupants and equipment, HVAC cooling/heating contributions, and heat loss through the building envelope. 
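To make the energy balance concrete, here is a minimal usage sketch of the thermal model above, assuming the ZoneThermalModel class is in scope. The zone object and its values are hypothetical and simplified, chosen only to illustrate the expected input shape; the repository's actual zone schema may differ.

// Hypothetical zone snapshot - field names mirror ZoneThermalModel above, values are illustrative
const zone = {
  volume: 500,            // zone air volume
  r_value: 2.0,           // envelope thermal resistance per unit area
  envelope_area: 300,     // exterior envelope area
  occupancy: 6,           // people currently in the zone
  equipment_load: 800,    // plug loads (W)
  lighting_load: 400,     // lighting loads (W)
  outdoor_temp: 33,       // current outdoor temperature
  temperature: 23,        // current zone temperature
  vav: { airflow_cfm: 450, supply_temp: 13 } // supply air delivered by the VAV box
};

// Advance the zone by one 5-minute simulation step (300 seconds)
const nextTemp = ZoneThermalModel.calculateTemperatureChange(zone, 300);
console.log(`Zone temperature after 5 minutes: ${nextTemp.toFixed(2)}°`);

Because the model returns the new temperature rather than mutating the zone in place, callers decide when and how the value is written back into the twin state.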
The CO₂ model uses mass balance equations: class AirQualityModel { static calculateCO2Change(zone, delta_t_seconds) { // CO₂ generation from occupants const G_co2 = zone.occupancy * 0.0052; // L/s per person at rest // Outdoor air ventilation rate const V_oa = zone.vav.outdoor_air_cfm * 0.000471947; // CFM to m³/s // CO₂ concentration difference (indoor - outdoor) const delta_CO2 = zone.co2_ppm - 400; // Outdoor ~400ppm // Mass balance: dC/dt = (G - V*ΔC) / Volume const dCO2_dt = (G_co2 - V_oa * delta_CO2) / zone.volume; return zone.co2_ppm + (dCO2_dt * delta_t_seconds); } } These models execute every simulation step, updating the entire building state: async function simulateStep(twin, timestep_minutes) { const delta_t = timestep_minutes * 60; // Convert to seconds // Update each zone for (const zone of twin.zones) { zone.temperature = ZoneThermalModel.calculateTemperatureChange(zone, delta_t); zone.co2_ppm = AirQualityModel.calculateCO2Change(zone, delta_t); } // Update equipment based on zone demands for (const vav of twin.vavs) { updateVAVOperation(vav, twin.zones); } for (const ahu of twin.ahus) { updateAHUOperation(ahu, twin.vavs); } updateChillerOperation(twin.chiller, twin.ahus); updateBoilerOperation(twin.boiler, twin.ahus); // Calculate system KPIs twin.kpis = calculateSystemKPIs(twin); // Detect alerts twin.alerts = detectAnomalies(twin); // Persist updated state await saveTwinState(twin); return twin; } 3D Visualization with React and Three.js The frontend renders an interactive 3D building view that updates in real-time as conditions change. Using React Three Fiber simplifies Three.js integration with React's component model: // frontend/src/components/BuildingView3D.jsx import { Canvas } from '@react-three/fiber'; import { OrbitControls } from '@react-three/drei'; export function BuildingView3D({ twinState }) { return ( {/* Render building floors */} {twinState.zones.map(zone => ( selectZone(zone.id)} /> ))} {/* Render equipment */} {twinState.ahus.map(ahu => ( ))} ); } function ZoneMesh({ zone, onClick }) { const color = getTemperatureColor(zone.temperature, zone.setpoint); return ( ); } function getTemperatureColor(current, setpoint) { const deviation = current - setpoint; if (Math.abs(deviation) < 1) return '#00ff00'; // Green: comfortable if (Math.abs(deviation) < 3) return '#ffff00'; // Yellow: acceptable return '#ff0000'; // Red: uncomfortable } This visualization immediately shows building state at a glance, operators see "hot spots" in red, comfortable zones in green, and can click any area for detailed metrics. Integrating AI Copilot for Natural Language Control The AI copilot transforms building data into conversational insights. Instead of navigating multiple screens, operators simply ask questions: // backend/src/routes/copilot.js import { FoundryLocalClient } from 'foundry-local-sdk'; const foundry = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT }); router.post('/api/copilot/chat', async (req, res) => { const { message } = req.body; // Load current building state const twin = await loadTwinState(); // Build context for AI const context = buildBuildingContext(twin); const completion = await foundry.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: `You are an HVAC operations assistant for a 3-floor office building. Current Building State: ${context} Answer questions about equipment status, comfort conditions, and energy usage. Provide specific, actionable information based on the current data. 
Do not speculate beyond provided information.` }, { role: 'user', content: message } ], temperature: 0.3, max_tokens: 300 }); res.json({ response: completion.choices[0].message.content, model: 'phi-4', timestamp: new Date().toISOString() }); }); function buildBuildingContext(twin) { const alerts = twin.alerts.filter(a => a.active); const zoneStatus = twin.zones.map(z => `${z.name}: ${z.temperature.toFixed(1)}°F (setpoint: ${z.setpoint}°F), CO₂: ${z.co2_ppm}ppm, ${z.occupancy} people` ); return ` Zones: ${zoneStatus.join('\n')} Active Alerts: ${alerts.length} ${alerts.map(a => `- ${a.severity}: ${a.message}`).join('\n')} Energy: ${twin.kpis.total_power_kw.toFixed(1)} kW Chiller: ${twin.chiller.percent_load}% load, ${twin.chiller.cop.toFixed(2)} COP `.trim(); } Example interactions: User: "Why is the third floor conference room warm?" AI: "The Executive Suite on Floor 3 is currently 75.2°F, 3 degrees above the 72°F setpoint. This is due to high occupancy (8 people) and VAV-F3-EXEC operating at reduced airflow (285 CFM vs 450 CFM capacity). AHU-2 is supplying adequate cool air at 55°F. Recommend checking VAV damper position—it may be partially stuck." User: "What's our current energy usage?" AI: "Total building power consumption is 47.3 kW. Breakdown: Chiller (24.1 kW, 51%), AHU fans (12.8 kW, 27%), Pumps (7.2 kW, 15%), Other (3.2 kW, 7%). Chiller is operating at 62% load with a COP of 4.8, which is efficient for current conditions." User: "Activate demand response level 2" AI: "I cannot directly control building systems for safety reasons. To activate DR Level 2, use the Control Panel and click 'Demand Response' → 'Level 2'. This will raise cooling setpoints by 3°F and reduce auxiliary loads, targeting 15% energy reduction." The AI provides grounded, specific answers citing actual equipment IDs and metrics. It refuses to directly execute control commands, instead guiding operators to explicit control interfaces, a critical safety pattern for building systems. Fault Injection for Testing and Training Real building operations experience equipment failures, stuck dampers, sensor drift, communication losses. 
The digital twin includes comprehensive fault injection capabilities to train operators and test control logic: // backend/src/simulator/fault-injector.js const FAULT_CATALOG = { chillerFailure: { description: 'Chiller compressor failure', apply: (twin) => { twin.chiller.status = 'FAULT'; twin.chiller.cooling_output = 0; twin.alerts.push({ id: 'chiller-fault', severity: 'CRITICAL', message: 'Chiller compressor failure - no cooling available', equipment: 'CHILLER-01' }); } }, stuckVAVDamper: { description: 'VAV damper stuck at current position', apply: (twin, vavId) => { const vav = twin.vavs.find(v => v.id === vavId); vav.damper_stuck = true; vav.damper_position_fixed = vav.damper_position; twin.alerts.push({ id: `vav-stuck-${vavId}`, severity: 'HIGH', message: `VAV ${vavId} damper stuck at ${vav.damper_position}%`, equipment: vavId }); } }, sensorDrift: { description: 'Temperature sensor reading 5°F high', apply: (twin, zoneId) => { const zone = twin.zones.find(z => z.id === zoneId); zone.sensor_drift = 5.0; zone.temperature_measured = zone.temperature_actual + 5.0; } }, communicationLoss: { description: 'Equipment communication timeout', apply: (twin, equipmentId) => { const equipment = findEquipmentById(twin, equipmentId); equipment.comm_status = 'OFFLINE'; equipment.stale_data = true; twin.alerts.push({ id: `comm-loss-${equipmentId}`, severity: 'MEDIUM', message: `Lost communication with ${equipmentId}`, equipment: equipmentId }); } } }; router.post('/api/twin/fault', async (req, res) => { const { faultType, targetEquipment } = req.body; const twin = await loadTwinState(); const fault = FAULT_CATALOG[faultType]; if (!fault) { return res.status(400).json({ error: 'Unknown fault type' }); } fault.apply(twin, targetEquipment); await saveTwinState(twin); res.json({ message: `Applied fault: ${fault.description}`, affectedEquipment: targetEquipment, timestamp: new Date().toISOString() }); }); Operators can inject faults to practice diagnosis and response. Training scenarios might include: "The chiller just failed during a heat wave, how do you maintain comfort?" or "Multiple VAV dampers are stuck, which zones need immediate attention?" Key Takeaways and Production Deployment Building a physics-based digital twin with AI capabilities requires balancing simulation accuracy with computational performance, providing intuitive visualization while maintaining technical depth, and enabling AI assistance without compromising safety. Key architectural lessons: Physics models enable prediction: Comparing predicted vs observed behavior identifies anomalies that simple thresholds miss 3D visualization improves spatial understanding: Operators immediately see which floors or zones need attention AI copilots accelerate diagnosis: Natural language queries get answers in seconds vs. minutes of manual data examination Fault injection validates readiness: Testing failure scenarios prepares operators for real incidents JSON state enables integration: Simple file-based state makes connecting to real BMS systems straightforward For production deployment, connect the twin to actual building systems via BACnet, Modbus, or MQTT integrations. Replace simulated telemetry with real sensor streams. Calibrate model parameters against historical building performance. Implement continuous learning where the twin's predictions improve as it observes actual building behavior. 
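As one example of what such an integration could look like, here is a minimal sketch that feeds real zone telemetry into the twin over MQTT, reusing the loadTwinState and saveTwinState helpers shown earlier. The broker address, topic layout, and payload fields are assumptions made for illustration rather than part of the repository, and the sketch relies on the mqtt npm package.

// Hypothetical MQTT bridge: measured readings overwrite simulated zone state
import mqtt from 'mqtt';

const client = mqtt.connect('mqtt://bms-broker.local:1883'); // assumed broker address

client.on('connect', () => {
  // Assumed topic convention: building/<floor>/zones/<zoneId>/telemetry
  client.subscribe('building/+/zones/+/telemetry');
});

client.on('message', async (topic, payload) => {
  const zoneId = topic.split('/')[3];
  const reading = JSON.parse(payload.toString()); // e.g. { "temperature": 74.1, "co2_ppm": 820 }

  const twin = await loadTwinState();
  const zone = twin.zones.find(z => z.id === zoneId);
  if (!zone) return;

  // Replace simulated values with measured telemetry
  zone.temperature = reading.temperature;
  zone.co2_ppm = reading.co2_ppm;
  await saveTwinState(twin);
});

In a production setup this bridge would also handle unit conversion, sensor validation, and debouncing so that noisy readings do not thrash the twin state.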
The complete implementation with simulation engine, 3D visualization, AI copilot, and fault injection system is available at github.com/leestott/DigitalTwin. Clone the repository and run the startup scripts to explore the digital twin, no building hardware required.

Resources and Further Reading

Smart Building HVAC Digital Twin Repository - Complete source code and simulation engine
Setup and Quick Start Guide - Installation instructions and usage examples
Microsoft Foundry Local Documentation - AI integration reference
HVAC Simulation Documentation - Physics model details and calibration
Three.js Documentation - 3D visualization framework
ASHRAE Standards - Building energy modeling standards

Agents League: Join the Reasoning Agents Track
In a previous blog post, we introduced Agents League, a two‑week AI agent challenge running February 16–27, and gave an overview of the three available tracks. In this post, we’ll zoom in on one of them in particular: 🧠 The Reasoning Agents track, built on Microsoft Foundry. If you’re interested in multi‑step reasoning, planning, verification, and multi‑agent collaboration, this is the track designed for you.

What Do We Mean by “Reasoning Agents”?

Reasoning agents go beyond simple prompt–response interactions. They are agents that can:

Plan how to approach a task
Break problems into steps
Reason across intermediate results
Verify or critique their own outputs
Collaborate with other agents to solve more complex problems

With Microsoft Foundry (via UI or SDK) and/or the Microsoft Agent Framework, you can design agent systems that reflect real‑world decision‑making patterns—closer to how teams of humans work together.

Why Build Reasoning Agents on Microsoft Foundry?

Microsoft Foundry provides production‑ready building blocks for agentic systems without locking you into a single way of working. For the Reasoning Agents track, Foundry enables you to:

Define agent roles (planner, executor, verifier, critic, etc.)
Orchestrate multi‑agent workflows
Integrate tools, APIs, and MCP servers
Apply structured reasoning patterns
Observe and debug agent behavior as it runs

You can work visually in the Foundry UI, programmatically via the SDK, or mix both approaches depending on your project.

How to get started?

Your first step into the arena is registering for the Agents League challenge: https://aka.ms/agentsleague/register. Once you’ve registered, navigate to the Reasoning Agent Starter Kit to get more context about the challenge scenario, an example multi-agent architecture to address it, and guidelines on the tech stack to use, along with useful resources to get started. There’s no single “correct” project; feel free to unleash your creativity and leverage AI-assisted development tools such as GitHub Copilot to accelerate your build process.

👉 View the Reasoning Agents starter kit: https://github.com/microsoft/agentsleague/starter-kits

Live Coding Battle: Reasoning Agents 📽️ Wednesday, Feb 18 – 9:00 AM PT

During Week 1, we’re hosting a live coding battle dedicated entirely to the Reasoning Agents track. You’ll watch experienced developers from the community:

Design agent architectures live
Explain reasoning strategies and trade‑offs
Make real‑time decisions about agent roles, tools, and flows

The session is streamed on Microsoft Reactor and recorded, so you can watch it live (highly recommended for the best experience!) or later at your convenience.

AMA Session on Discord 💬 Wednesday, Feb 25 – 9:00 AM PT

In Week 2, it’s your turn to build—and ask questions. Join the Reasoning Agents AMA on Discord to:

Ask about agent architecture and reasoning patterns
Get clarification on Foundry capabilities
Discuss MCP integration and multi‑agent design
Get unstuck when your agent doesn’t behave as expected

Prizes, Badges, and Recognition

🏆 $500 for the Reasoning Agents track winner
🎖️ Digital badge for everyone who registers and submits a project

Important reminder: 👉 You must register before submitting to be eligible for prizes and the badge. Beyond the rewards, every participant receives feedback from Microsoft product teams, which is often the most valuable prize of all.

Ready to Build Agents That Reason?
If you’ve been curious about:

Agentic architectures
Multi‑step reasoning
Verification and self‑reflection
Building AI systems that explain their thinking

…then the Reasoning Agents track is your arena.

📝 Register here: https://aka.ms/agentsleague/register
💬 Join Discord: https://aka.ms/agentsleague/discord
📽️ Watch live battles: https://aka.ms/agentsleague/battles

The league starts February 16. The reasoning begins now.

Building Interactive Agent UIs with AG-UI and Microsoft Agent Framework
Introduction Picture this: You've built an AI agent that analyzes financial data. A user uploads a quarterly report and asks: "What are the top three expense categories?" Behind the scenes, your agent parses the spreadsheet, aggregates thousands of rows, and generates visualizations. All in 20 seconds. But the user? They see a loading spinner. Nothing else. No "reading file" message, no "analyzing data" indicator, no hint that progress is being made. They start wondering: Is it frozen? Should I refresh? The problem isn't the agent's capabilities - it's the communication gap between the agent running on the backend and the user interface. When agents perform multi-step reasoning, call external APIs, or execute complex tool chains, users deserve to see what's happening. They need streaming updates, intermediate results, and transparent progress indicators. Yet most agent frameworks force developers to choose between simple request/response patterns or building custom solutions to stream updates to their UIs. This is where AG-UI comes in. AG-UI is a fairly new event-based protocol that standardizes how agents communicate with user interfaces. Instead of every framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of structured events that work consistently across different agent implementations. When an agent starts processing, calls a tool, generates text, or encounters an error, the UI receives explicit, typed events in real time. The beauty of AG-UI is its framework-agnostic design. While this blog post demonstrates integration with Microsoft Agent Framework (MAF), the same AG-UI protocol works with LangGraph, CrewAI, or any other compliant framework. Write your UI code once, and it works with any AG-UI-compliant backend. (Note: MAF supports both Python and .NET - this blog post focuses on the Python implementation.) TL;DR The Problem: Users don't get real-time updates while AI agents work behind the scenes - no progress indicators, no transparency into tool calls, and no insight into what's happening. The Solution: AG-UI is an open, event-based protocol that standardizes real-time communication between AI agents and user interfaces. Instead of each development team and framework inventing custom streaming solutions, AG-UI provides a shared vocabulary of structured events (like TOOL_CALL_START, TEXT_MESSAGE_CONTENT, RUN_FINISHED) that work across any compliant framework. Key Benefits: Framework-agnostic - Write UI code once, works with LangGraph, Microsoft Agent Framework, CrewAI, and more Real-time observability - See exactly what your agent is doing as it happens Server-Sent Events - Built on standard HTTP for universal compatibility Protocol-managed state - No manual conversation history tracking In This Post: You'll learn why AG-UI exists, how it works, and build a complete working application using Microsoft Agent Framework with Python - from server setup to client implementation. 
What You'll Learn This blog post walks through: Why AG-UI exists - how agent-UI communication has evolved and what problems current approaches couldn't solve How the protocol works - the key design choices that make AG-UI simple, reliable, and framework-agnostic Protocol architecture - the generic components and how AG-UI integrates with agent frameworks Building an AG-UI application - a complete working example using Microsoft Agent Framework with server, client, and step-by-step setup Understanding events - what happens under the hood when your agent runs and how to observe it Thinking in events - how building with AG-UI differs from traditional APIs, and what benefits this brings Making the right choice - when AG-UI is the right fit for your project and when alternatives might be better Estimated reading time: 15 minutes Who this is for: Developers building AI agents who want to provide real-time feedback to users, and teams evaluating standardized approaches to agent-UI communication To appreciate why AG-UI matters, we need to understand the journey that led to its creation. Let's trace how agent-UI communication has evolved through three distinct phases. The Evolution of Agent-UI Communication AI agents have become more capable over time. As they evolved, the way they communicated with user interfaces had to evolve as well. Here's how this evolution unfolded. Phase 1: Simple Request/Response In the early days of AI agent development, the interaction model was straightforward: send a question, wait for an answer, display the result. This synchronous approach mirrored traditional API calls and worked fine for simple scenarios. # Simple, but limiting response = agent.run("What's the weather in Paris?") display(response) # User waits... and waits... Works for: Quick queries that complete in seconds, simple Q&A interactions where immediate feedback and interactivity aren't critical. Breaks down: When agents need to call multiple tools, perform multi-step reasoning, or process complex queries that take 30+ seconds. Users see nothing but a loading spinner, with no insight into what's happening or whether the agent is making progress. This creates a poor user experience and makes it impossible to show intermediate results or allow user intervention. Recognizing these limitations, development teams began experimenting with more sophisticated approaches. Phase 2: Custom Streaming Solutions As agents became more sophisticated, teams recognized the need for incremental feedback and interactivity. Rather than waiting for the complete response, they implemented custom streaming solutions to show partial results as they became available. # Every team invents their own format for chunk in agent.stream("What's the weather?"): display(chunk) # But what about tool calls? Errors? Progress? This was a step forward for building interactive agent UIs, but each team solved the problem differently. Also, different frameworks had incompatible approaches - some streamed only text tokens, others sent structured JSON, and most provided no visibility into critical events like tool calls or errors. 
The problem: No standardization across frameworks - client code that works with LangGraph won't work with Crew AI, requiring separate implementations for each agent backend Each implementation handles tool calls differently - some send nothing during tool execution, others send unstructured messages Complex state management - clients must track conversation history, manage reconnections, and handle edge cases manually The industry needed a better solution - a common protocol that could work across all frameworks while maintaining the benefits of streaming. Phase 3: Standardized Protocol (AG-UI) AG-UI emerged as a response to the fragmentation problem. Instead of each framework and development team inventing their own streaming solution, AG-UI provides a shared vocabulary of events that work consistently across different agent implementations. # Standardized events everyone understands async for event in agent.run_stream("What's the weather?"): if event.type == "TEXT_MESSAGE_CONTENT": display_text(event.delta) elif event.type == "TOOL_CALL_START": show_tool_indicator(event.tool_name) elif event.type == "TOOL_CALL_RESULT": show_tool_result(event.result) The key difference is structured observability. Rather than guessing what the agent is doing from unstructured text, clients receive explicit events for every stage of execution: when the agent starts, when it generates text, when it calls a tool, when that tool completes, and when the entire run finishes. What's different: A standardized vocabulary of event types, complete observability into agent execution, and framework-agnostic clients that work with any AG-UI-compliant backend. You write your UI code once, and it works whether the backend uses Microsoft Agent Framework, LangGraph, or any other framework that speaks AG-UI. Now that we've seen why AG-UI emerged and what problems it solves, let's examine the specific design decisions that make the protocol work. These choices weren't arbitrary - each one addresses concrete challenges in building reliable, observable agent-UI communication. The Design Decisions Behind AG-UI Why Server-Sent Events (SSE)? Aspect WebSockets SSE (AG-UI) Complexity Bidirectional Unidirectional (simpler) Firewall/Proxy Sometimes blocked Standard HTTP Reconnection Manual implementation Built-in browser support Use case Real-time games, chat Agent responses (one-way) For agent interactions, you typically only need server→client communication, making SSE a simpler choice. SSE solves the transport problem - how events travel from server to client. But once connected, how does the protocol handle conversation state across multiple interactions? Why Protocol-Managed Threads? # Without protocol threads (client manages): conversation_history = [] conversation_history.append({"role": "user", "content": message}) response = agent.complete(conversation_history) conversation_history.append({"role": "assistant", "content": response}) # Complex, error-prone, doesn't work with multiple clients # With AG-UI (protocol manages): thread = agent.get_new_thread() # Server creates and manages thread agent.run_stream(message, thread=thread) # Server maintains context # Simple, reliable, shareable across clients With transport and state management handled, the final piece is the actual messages flowing through the connection. What information should the protocol communicate, and how should it be structured? Why Standardized Event Types? 
Instead of parsing unstructured text, clients get typed events: RUN_STARTED - Agent begins (start loading UI) TEXT_MESSAGE_CONTENT - Text chunk (stream to user) TOOL_CALL_START - Tool invoked (show "searching...", "calculating...") TOOL_CALL_RESULT - Tool finished (show result, update UI) RUN_FINISHED - Complete (hide loading) This lets UIs react intelligently without custom parsing logic. Now that we understand the protocol's design choices, let's see how these pieces fit together in a complete system. Architecture Overview Here's how the components interact: The communication between these layers relies on a well-defined set of event types. Here are the core events that flow through the SSE connection: Core Event Types AG-UI provides a standardized set of event types to describe what's happening during an agent's execution: RUN_STARTED - agent begins execution TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END - streaming segments of text TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END, TOOL_CALL_RESULT - tool execution events RUN_FINISHED - agent has finished execution RUN_ERROR - error information This model lets the UI update as the agent runs, rather than waiting for the final response. The generic architecture above applies to any AG-UI implementation. Now let's see how this translates to Microsoft Agent Framework. AG-UI with Microsoft Agent Framework While AG-UI is framework-agnostic, this blog post demonstrates integration with Microsoft Agent Framework (MAF) using Python. MAF is available in both Python and .NET, giving you flexibility to build AG-UI applications in your preferred language. Understanding how MAF implements the protocol will help you build your own applications or work with other compliant frameworks. Integration Architecture The Microsoft Agent Framework integration involves several specialized layers that handle protocol translation and execution orchestration: Understanding each layer: FastAPI Endpoint - Handles HTTP requests and establishes SSE connections for streaming AgentFrameworkAgent - Protocol wrapper that translates between AG-UI events and Agent Framework operations Orchestrators - Manage execution flow, coordinate tool calling sequences, and handle state transitions ChatAgent - Your agent implementation with instructions, tools, and business logic ChatClient - Interface to the underlying language model (Azure OpenAI, OpenAI, or other providers) The good news? When you call add_agent_framework_fastapi_endpoint, all the middleware layers are configured automatically. You simply provide your ChatAgent, and the integration handles protocol translation, event streaming, and state management behind the scenes. Now that we understand both the protocol architecture and the Microsoft Agent Framework integration, let's build a working application. Hands-On: Building Your First AG-UI Application This section demonstrates how to build an AG-UI server and client using Microsoft Agent Framework and FastAPI. Prerequisites Before building your first AG-UI application, ensure you have: Python 3.10 or later installed Basic understanding of async/await patterns in Python Azure CLI installed and authenticated (az login) Azure OpenAI service endpoint and deployment configured (setup guide) Cognitive Services OpenAI Contributor role for your Azure OpenAI resource You'll also need to install the AG-UI integration package: pip install agent-framework-ag-ui --pre This automatically installs agent-framework-core, fastapi, and uvicorn as dependencies. 
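One small setup detail worth calling out: the server in the next section loads its settings with python-dotenv (load_dotenv()), so you can keep your Azure OpenAI configuration in a .env file next to server.py. The values below are placeholders for your own resource, and if python-dotenv is not already present in your environment you may need to pip install python-dotenv as well.

# .env - placeholder values, substitute your own Azure OpenAI resource details
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o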
With your environment configured, let's create the server that will host your agent and expose it via the AG-UI protocol.

Building the Server

Let's create a FastAPI server that hosts an AI agent and exposes it via AG-UI:

```python
# server.py
import os
from typing import Annotated

from dotenv import load_dotenv
from fastapi import FastAPI
from pydantic import Field

from agent_framework import ChatAgent, ai_function
from agent_framework.azure import AzureOpenAIChatClient
from agent_framework_ag_ui import add_agent_framework_fastapi_endpoint
from azure.identity import DefaultAzureCredential

# Load environment variables from .env file
load_dotenv()

# Validate environment configuration
openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
model_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")

if not openai_endpoint:
    raise RuntimeError("Missing required environment variable: AZURE_OPENAI_ENDPOINT")
if not model_deployment:
    raise RuntimeError("Missing required environment variable: AZURE_OPENAI_DEPLOYMENT_NAME")


# Define tools the agent can use
@ai_function
def get_order_status(
    order_id: Annotated[str, Field(description="The order ID to look up (e.g., ORD-001)")]
) -> dict:
    """Look up the status of a customer order.

    Returns order status, tracking number, and estimated delivery date.
    """
    # Simulated order lookup
    orders = {
        "ORD-001": {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"},
        "ORD-002": {"status": "processing", "tracking": None, "eta": "Jan 23, 2026"},
        "ORD-003": {"status": "delivered", "tracking": "1Z999AA3", "eta": "Delivered Jan 20"},
    }
    return orders.get(order_id, {"status": "not_found", "message": "Order not found"})


# Initialize Azure OpenAI client
chat_client = AzureOpenAIChatClient(
    credential=DefaultAzureCredential(),
    endpoint=openai_endpoint,
    deployment_name=model_deployment,
)

# Configure the agent with custom instructions and tools
agent = ChatAgent(
    name="CustomerSupportAgent",
    instructions="""You are a helpful customer support assistant.

You have access to a get_order_status tool that can look up order information.

IMPORTANT: When a user mentions an order ID (like ORD-001, ORD-002, etc.), you MUST call
the get_order_status tool to retrieve the actual order details. Do NOT make up or guess
order information.

After calling get_order_status, provide the actual results to the user in a friendly format.""",
    chat_client=chat_client,
    tools=[get_order_status],
)

# Initialize FastAPI application
app = FastAPI(
    title="AG-UI Customer Support Server",
    description="Interactive AI agent server using AG-UI protocol with tool calling",
)

# Mount the AG-UI endpoint
add_agent_framework_fastapi_endpoint(app, agent, path="/chat")


def main():
    """Entry point for the AG-UI server."""
    import uvicorn

    print("Starting AG-UI server on http://localhost:8000")
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info")


# Run the application
if __name__ == "__main__":
    main()
```

What's happening here:

- We define a get_order_status tool with the @ai_function decorator
- Annotated and Field provide parameter descriptions that help the agent understand when and how to use the tool
- We create an Azure OpenAI chat client with credential-based authentication
- The ChatAgent is configured with domain-specific instructions and the tools parameter
- add_agent_framework_fastapi_endpoint automatically handles SSE streaming and tool execution
- The server exposes the agent at the /chat endpoint
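Because the tools parameter takes a list, extending the agent should follow the same pattern as the tool above. As a sketch only - cancel_order is a hypothetical tool invented for illustration, and the snippet continues server.py (it reuses ai_function, Annotated, Field, and chat_client from that file):

```python
# Hypothetical second tool, following the same @ai_function pattern as get_order_status.
@ai_function
def cancel_order(
    order_id: Annotated[str, Field(description="The order ID to cancel (e.g., ORD-002)")]
) -> dict:
    """Request cancellation of an order that has not shipped yet."""
    # Simulated cancellation - a real implementation would call your order system
    return {"order_id": order_id, "status": "cancellation_requested"}


# Register both tools on the agent
agent = ChatAgent(
    name="CustomerSupportAgent",
    instructions="...",  # same instructions as above
    chat_client=chat_client,
    tools=[get_order_status, cancel_order],
)
```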
Note: This example uses Azure OpenAI, but AG-UI works with any chat model. You can also integrate with Azure AI Foundry's model catalog or use other LLM providers. Tool calling is supported by most modern LLMs, including GPT-4, GPT-4o, and Claude models.

To run this server:

```bash
# Set your Azure OpenAI credentials
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o"

# Start the server
python server.py
```

With your server running and exposing the AG-UI endpoint, the next step is building a client that can connect and consume the event stream.

Streaming Results to Clients

With the server running, clients can connect and stream events as the agent processes requests. Here's a Python client that demonstrates the streaming capabilities:

```python
# client.py
import asyncio
import os

from dotenv import load_dotenv

from agent_framework import ChatAgent, FunctionCallContent, FunctionResultContent
from agent_framework_ag_ui import AGUIChatClient

# Load environment variables from .env file
load_dotenv()


async def interactive_chat():
    """Interactive chat session with streaming responses."""
    # Connect to the AG-UI server
    base_url = os.getenv("AGUI_SERVER_URL", "http://localhost:8000/chat")
    print(f"Connecting to: {base_url}\n")

    # Initialize the AG-UI client
    client = AGUIChatClient(endpoint=base_url)

    # Create a local agent representation
    agent = ChatAgent(chat_client=client)

    # Start a new conversation thread
    conversation_thread = agent.get_new_thread()

    print("Chat started! Type 'exit' or 'quit' to end the session.\n")

    try:
        while True:
            # Collect user input
            user_message = input("You: ")

            # Handle empty input
            if not user_message.strip():
                print("Please enter a message.\n")
                continue

            # Check for exit commands
            if user_message.lower() in ["exit", "quit", "bye"]:
                print("\nGoodbye!")
                break

            # Stream the agent's response
            print("Agent: ", end="", flush=True)

            # Track tool calls to avoid duplicate prints
            seen_tools = set()

            async for update in agent.run_stream(user_message, thread=conversation_thread):
                # Display text content
                if update.text:
                    print(update.text, end="", flush=True)

                # Display tool calls and results
                for content in update.contents:
                    if isinstance(content, FunctionCallContent):
                        # Only print each tool call once
                        if content.call_id not in seen_tools:
                            seen_tools.add(content.call_id)
                            print(f"\n[Calling tool: {content.name}]", flush=True)
                    elif isinstance(content, FunctionResultContent):
                        # Only print each result once
                        result_id = f"result_{content.call_id}"
                        if result_id not in seen_tools:
                            seen_tools.add(result_id)
                            result_text = content.result if isinstance(content.result, str) else str(content.result)
                            print(f"[Tool result: {result_text}]", flush=True)

            print("\n")  # New line after response completes

    except KeyboardInterrupt:
        print("\n\nChat interrupted by user.")
    except ConnectionError as e:
        print(f"\nConnection error: {e}")
        print("Make sure the server is running.")
    except Exception as e:
        print(f"\nUnexpected error: {e}")


def main():
    """Entry point for the AG-UI client."""
    asyncio.run(interactive_chat())


if __name__ == "__main__":
    main()
```

Key features:

- The client connects to the AG-UI endpoint using AGUIChatClient with the endpoint parameter
- run_stream() yields updates containing text and content as they arrive
- Tool calls are detected using FunctionCallContent and displayed with [Calling tool: ...]
- Tool results are detected using FunctionResultContent and displayed with [Tool result: ...]
- Deduplication logic (the seen_tools set) prevents printing the same tool call multiple times as it streams
- Thread management maintains conversation context across messages
- Graceful error handling for connection issues
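If you just want a quick smoke test against the server without the interactive loop, the same client-side pieces can be used non-interactively. A minimal sketch, reusing only the AGUIChatClient, ChatAgent, and run_stream APIs shown above:

```python
# smoke_test.py - minimal, non-interactive check against the running AG-UI server.
import asyncio

from agent_framework import ChatAgent
from agent_framework_ag_ui import AGUIChatClient


async def ask_once(question: str) -> str:
    """Send one question and return the fully streamed answer as a string."""
    client = AGUIChatClient(endpoint="http://localhost:8000/chat")
    agent = ChatAgent(chat_client=client)
    thread = agent.get_new_thread()

    chunks = []
    async for update in agent.run_stream(question, thread=thread):
        if update.text:
            chunks.append(update.text)
    return "".join(chunks)


if __name__ == "__main__":
    print(asyncio.run(ask_once("What's the status of order ORD-001?")))
```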
To use the client:

```bash
# Optional: specify custom server URL
export AGUI_SERVER_URL="http://localhost:8000/chat"

# Start the interactive chat
python client.py
```

Example Session:

```
Connecting to: http://localhost:8000/chat

Chat started! Type 'exit' or 'quit' to end the session.

You: What's the status of order ORD-001?
Agent:
[Calling tool: get_order_status]
[Tool result: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}]
Your order ORD-001 has been shipped!
- Tracking Number: 1Z999AA1
- Estimated Delivery Date: January 25, 2026
You can use the tracking number to monitor the delivery progress.

You: Can you check ORD-002?
Agent:
[Calling tool: get_order_status]
[Tool result: {"status": "processing", "tracking": null, "eta": "Jan 23, 2026"}]
Your order ORD-002 is currently being processed.
- Status: Processing
- Estimated Delivery: January 23, 2026
Your order should ship soon, and you'll receive a tracking number once it's on the way.

You: exit

Goodbye!
```

The client we just built handles events at a high level, abstracting away the details. But what's actually flowing through that SSE connection? Let's peek under the hood.

Event Types You'll See

As the server streams back responses, clients receive a series of structured events. If you were to observe the raw SSE stream (e.g., using curl), you'd see events like:

```bash
curl -N http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"messages": [{"role": "user", "content": "What'\''s the status of order ORD-001?"}]}'
```

Sample event stream (with tool calling):

```
data: {"type":"RUN_STARTED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"}
data: {"type":"TEXT_MESSAGE_START","messageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c","role":"assistant"}
data: {"type":"TOOL_CALL_START","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","toolCallName":"get_order_status","parentMessageId":"e8648880-a9ff-4178-a17d-4a6d3ec3d39c"}
data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"{\""}
data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"order"}
data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"_id"}
data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\":\""}
data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"ORD"}
data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"-"}
data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"001"}
data: {"type":"TOOL_CALL_ARGS","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","delta":"\"}"}
data: {"type":"TOOL_CALL_END","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y"}
data: {"type":"TOOL_CALL_RESULT","messageId":"f048cb0a-a049-4a51-9403-a05e4820438a","toolCallId":"call_GTWj2N3ZyYiiQIjg3fwmiQ8y","content":"{\"status\": \"shipped\", \"tracking\": \"1Z999AA1\", \"eta\": \"Jan 25, 2026\"}","role":"tool"}
data: {"type":"TEXT_MESSAGE_START","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","role":"assistant"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"Your"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" order"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" ORD"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"-"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"001"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" has"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" been"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":" shipped"}
data: {"type":"TEXT_MESSAGE_CONTENT","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf","delta":"!"}
... (additional TEXT_MESSAGE_CONTENT events streaming the response) ...
data: {"type":"TEXT_MESSAGE_END","messageId":"8215fc88-8cb6-4ce4-8bdb-a8715dcd26cf"}
data: {"type":"RUN_FINISHED","threadId":"eb4d9850-14ef-446c-af4b-23037acda9e8","runId":"chatcmpl-xyz"}
```

Understanding the flow:

1. RUN_STARTED - Agent begins processing the request
2. TEXT_MESSAGE_START - First message starts (will contain tool calls)
3. TOOL_CALL_START - Agent invokes the get_order_status tool
4. Multiple TOOL_CALL_ARGS events - Arguments stream incrementally as JSON chunks ({"order_id":"ORD-001"})
5. TOOL_CALL_END - Tool invocation structure complete
6. TOOL_CALL_RESULT - Tool execution finished with result data
7. TEXT_MESSAGE_START - Second message starts (the final response)
8. Multiple TEXT_MESSAGE_CONTENT events - Response text streams word by word
9. TEXT_MESSAGE_END - Response message complete
10. RUN_FINISHED - Entire run completed successfully

This granular event model enables rich UI experiences - showing tool execution indicators ("Searching...", "Calculating..."), displaying intermediate results, and providing complete transparency into the agent's reasoning process.

Seeing the raw events helps, but truly working with AG-UI requires a shift in how you think about agent interactions. Let's explore this conceptual change.

The Mental Model Shift

Traditional API Thinking

```python
# Imperative: Call and wait
response = agent.run("What's 2+2?")
print(response)  # "The answer is 4"
```

Mental model: Function call with return value

AG-UI Thinking

```python
# Reactive: Subscribe to events
async for event in agent.run_stream("What's 2+2?"):
    match event.type:
        case "RUN_STARTED":
            show_loading()
        case "TEXT_MESSAGE_CONTENT":
            display_chunk(event.delta)
        case "RUN_FINISHED":
            hide_loading()
```

Mental model: Observable stream of events

This shift feels similar to:

- Moving from synchronous to async code
- Moving from REST to event-driven architecture
- Moving from polling to pub/sub

This mental shift isn't just philosophical - it unlocks concrete benefits that weren't possible with request/response patterns.

What You Gain

Observability

```
# You can SEE what the agent is doing
TOOL_CALL_START: "get_order_status"
TOOL_CALL_ARGS: {"order_id": "ORD-001"}
TOOL_CALL_RESULT: {"status": "shipped", "tracking": "1Z999AA1", "eta": "Jan 25, 2026"}
TEXT_MESSAGE_START: "Your order ORD-001 has been shipped..."
```

Interruptibility

```python
# Future: Cancel long-running operations
async for event in agent.run_stream(query):
    if user_clicked_cancel:
        await agent.cancel(thread_id, run_id)
        break
```

Transparency

```
# Users see the reasoning process
"Looking up order ORD-001..."
"Order found: Status is 'shipped'"
"Retrieving tracking information..."
"Your order has been shipped with tracking number 1Z999AA1..."
```
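As a small illustration of the observability point, here's one way a client could translate raw event types into user-facing status text. The status strings are arbitrary placeholders; only the event type names come from the protocol samples above.

```python
# Sketch: map AG-UI event types to short status lines a UI could display.
STATUS_BY_EVENT_TYPE = {
    "RUN_STARTED": "Thinking...",
    "TOOL_CALL_START": "Looking that up...",
    "TOOL_CALL_RESULT": "Got the data, composing an answer...",
    "RUN_FINISHED": "Done.",
}


def status_for(event: dict) -> str | None:
    """Return a status message for events worth surfacing, or None to stay quiet."""
    return STATUS_BY_EVENT_TYPE.get(event.get("type"))


# Example: status_for({"type": "TOOL_CALL_START", "toolCallName": "get_order_status"})
# returns "Looking that up..."
```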
To put these benefits in context, here's how AG-UI compares to traditional approaches across key dimensions:

AG-UI vs. Traditional Approaches

| Aspect | Traditional REST | Custom Streaming | AG-UI |
|---|---|---|---|
| Connection model | Request/Response | Varies | Server-Sent Events |
| State management | Manual | Manual | Protocol-managed |
| Tool calling | Invisible | Custom format | Standardized events |
| Framework coupling | Varies | Framework-locked | Framework-agnostic |
| Browser support | Universal | Varies | Universal |
| Implementation effort | Simple | Complex | Moderate |
| Ecosystem | N/A | Isolated | Growing |

You've now seen AG-UI's design principles, implementation details, and conceptual foundations. But the most important question remains: should you actually use it?

Conclusion: Is AG-UI Right for Your Project?

AG-UI represents a shift toward standardized, observable agent interactions. Before adopting it, understand where the protocol stands and whether it fits your needs.

Protocol Maturity

The protocol is stable enough for production use but still evolving. Ready now: the core specification is stable, the Microsoft Agent Framework integration is available, the FastAPI/Python implementation is mature, and basic streaming and threading work reliably.

Choose AG-UI If You

- Are building new agent projects - no legacy API to maintain, and you want future compatibility with the emerging ecosystem
- Need streaming observability - multi-step workflows where users benefit from seeing each stage of execution
- Want framework flexibility - the same client code works with any AG-UI-compliant backend
- Are comfortable with evolving standards - you can adapt to protocol changes as it matures

Stick with Alternatives If You

- Have working solutions - custom streaming is working well and the migration cost isn't justified
- Need guaranteed stability - mission-critical systems where breaking changes are unacceptable
- Build simple agents - single-step request/response without tool calling or streaming needs
- Work in a risk-averse environment - large existing implementations where proven approaches are required

Beyond individual project decisions, it's worth considering AG-UI's role in the broader ecosystem.

The Bigger Picture

While this blog post focused on Microsoft Agent Framework, AG-UI's true power lies in its broader mission: creating a common language for agent-UI communication across the entire ecosystem. As more frameworks adopt it, the real value emerges: write your UI once, and it works with any compliant agent framework. Think of it like GraphQL for APIs or OpenAPI for REST - a standardization layer that benefits the entire ecosystem.

The protocol is young, but the problem it solves is real. Whether you adopt it now or wait for broader adoption, understanding AG-UI helps you make informed architectural decisions for your agent applications.

Ready to dive deeper? Here are the official resources to continue your AG-UI journey.
Resources

AG-UI & Microsoft Agent Framework

- Getting Started with AG-UI (Microsoft Learn) - Official tutorial
- AG-UI Integration Overview - Architecture and concepts
- AG-UI Protocol Specification - Official protocol documentation
- Backend Tool Rendering - Adding function tools
- Security Considerations - Production security guidance
- Microsoft Agent Framework Documentation - Framework overview
- AG-UI Dojo Examples - Live demonstrations

UI Components & Integration

- CopilotKit for Microsoft Agent Framework - React component library

Community & Support

- Microsoft Q&A - Community support
- Agent Framework GitHub - Source code and issues

Related Technologies

- Azure AI Foundry Documentation - Azure AI platform
- FastAPI Documentation - Web framework
- Server-Sent Events (SSE) Specification - Protocol standard

This blog post introduces AG-UI with Microsoft Agent Framework, focusing on fundamental concepts and building your first interactive agent application.