<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>rss.livelink.threads-in-node</title>
    <link>https://techcommunity.microsoft.com/t5/education-sector/ct-p/EducationSector</link>
    <description>rss.livelink.threads-in-node</description>
    <pubDate>Thu, 18 Jun 2026 02:43:06 GMT</pubDate>
    <dc:creator>EducationSector</dc:creator>
    <dc:date>2026-06-18T02:43:06Z</dc:date>
    <item>
      <title>What's new in Assignments: interactive lessons, localized content, and smarter AI tools</title>
      <link>https://techcommunity.microsoft.com/t5/education-blog/what-s-new-in-assignments-interactive-lessons-localized-content/ba-p/4528326</link>
      <description>&lt;P data-line="2"&gt;Assignments in Microsoft Teams and through the Microsoft 365 LTI app has shipped a set of updates that connect interactive learning content directly into the assignment workflow, improve content quality across English-speaking regions, and give educators more control over AI-generated output.&lt;/P&gt;
&lt;P data-line="4"&gt;All of these features are available now for every Microsoft Education license (A1, A3, A5) at no extra cost.&lt;/P&gt;
&lt;P data-line="6"&gt;Watch a quick walkthrough of all the updates:&amp;nbsp;&lt;A href="https://www.youtube.com/watch?v=nlR8pwKar90" target="_blank" rel="noopener" data-href="https://www.youtube.com/watch?v=nlR8pwKar90"&gt;What's new in Assignments&lt;/A&gt;&lt;/P&gt;
&lt;H2 data-line="6"&gt;Learning Zone interactive lessons in Assignments&lt;/H2&gt;
&lt;P data-line="12"&gt;Educators using a Copilot+ PC can turn any teaching idea into an interactive lesson — with slides, quizzes, and activities — using the Learning Zone app. Those lessons now connect directly into the assignment workflow.&lt;/P&gt;
&lt;P data-line="14"&gt;When creating an assignment, educators see a new Learning Zone resource option. Pick a lesson, and it's attached to the assignment. Students complete the interactive lesson right inside their assignment in Teams or through the LMS, without leaving. When they finish, scores sync automatically — the student's completion percentage becomes their assignment grade.&lt;/P&gt;
&lt;P data-line="16"&gt;A few things to know:&lt;/P&gt;
&lt;UL data-line="18"&gt;
&lt;LI data-line="18"&gt;You need a Copilot+ PC to create lessons. Students can complete them on any device.&lt;/LI&gt;
&lt;LI data-line="19"&gt;If you don't have any lessons yet, the picker will link you to download the Learning Zone app.&lt;/LI&gt;
&lt;LI data-line="20"&gt;Students can also open the lesson in the Learning Zone app on Windows if they prefer.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="22"&gt;&lt;A class="lia-external-url" href="https://support.microsoft.com/topic/-add-a-learning-zone-lesson-to-an-assignment-334b6013-f9e1-4eef-b209-362487543e36" target="_blank" rel="noopener" data-href="https://support.microsoft.com/en-us/topic/add-a-learning-zone-lesson-to-an-assignment"&gt;Learn more: Add a Learning Zone lesson to an assignment&lt;/A&gt;&lt;/P&gt;
&lt;H2 data-line="22"&gt;Student feedback view update&lt;/H2&gt;
&lt;P data-line="28"&gt;We've updated how students see feedback on returned assignments. The view now reorganizes to put what matters most front and center:&lt;/P&gt;
&lt;UL data-line="30"&gt;
&lt;LI data-line="30"&gt;&lt;STRONG&gt;Feedback, feedback resources, rubric, and grades&lt;/STRONG&gt;&amp;nbsp;take the main space.&lt;/LI&gt;
&lt;LI data-line="31"&gt;&lt;STRONG&gt;Instructions and resources&lt;/STRONG&gt;&amp;nbsp;collapse into side sections — still accessible, just not competing for attention.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="33"&gt;Before this change, students saw everything at once — feedback mixed in with instructions, attachments, and rubrics. Many students struggled to find the feedback you spent time writing. Now it's the first thing they see when they open a returned assignment.&lt;/P&gt;
&lt;P data-line="35"&gt;This is already active for everyone. Nothing to turn on.&lt;/P&gt;
&lt;H2 data-line="35"&gt;Standards alignment across Instructions and Rubrics&lt;/H2&gt;
&lt;P data-line="41"&gt;Educators can now add educational standards directly to their assignment. When you use AI to enhance your instructions, it takes those standards into account. And when you generate a Rubric for the same assignment, it pulls in those same standards as input — so your instructions and rubric stay aligned without extra work.&lt;/P&gt;
&lt;P data-line="43"&gt;We've also made a few other improvements to AI Instructions:&lt;/P&gt;
&lt;UL data-line="45"&gt;
&lt;LI data-line="45"&gt;&lt;STRONG&gt;Describe changes to tweak the output.&lt;/STRONG&gt;&amp;nbsp;After the AI generates a suggestion, you can describe what you want changed in a text field and re-generate. Want it shorter? More scaffolded? Focused on a specific concept? Type what you'd like changed and get an updated version — no need to start over.&lt;/LI&gt;
&lt;LI data-line="46"&gt;&lt;STRONG&gt;Shorter, more focused suggestions.&lt;/STRONG&gt;&amp;nbsp;When you use Add Detail, Add Sparkle, Add Hints, or other enhancements, the output stays closer to what you actually need rather than over-expanding your instructions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="48"&gt;&lt;A href="https://techcommunity.microsoft.com/blog/educationblog/more-standards-are-coming-to-the-teach-module-and-teams-for-education/4504916" target="_blank" rel="noopener" data-href="https://techcommunity.microsoft.com/blog/educationblog/more-standards-are-coming-to-the-teach-module-and-teams-for-education/4504916"&gt;Learn more: Standards in Assignments and the Teach module&lt;/A&gt;&lt;/P&gt;
&lt;H2 data-line="48"&gt;English locale support across Assignments and the Teach module in Microsoft 365 Copilot&lt;/H2&gt;
&lt;P data-line="54"&gt;AI features across both Assignments and the Teach module in Microsoft 365 Copilot now support English (UK), English (Canada), and English (Australia) in addition to English (US).&lt;/P&gt;
&lt;P data-line="56"&gt;Language is automatically selected based on your browser language settings. If your browser is set to en-GB, the AI generates content using British English. If you've used Teach before, it defaults to your previously used language. You can switch anytime from the Language dropdown when creating content.&lt;/P&gt;
&lt;P data-line="58"&gt;We've also updated the age/year selection to match what's appropriate for each locale — so you'll see "Year 9" instead of "Grade 9" when using English (UK), for example.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 data-line="58"&gt;Resources&lt;/H2&gt;
&lt;UL data-line="62"&gt;
&lt;LI data-line="62"&gt;&lt;A href="https://www.youtube.com/watch?v=nlR8pwKar90" target="_blank" rel="noopener" data-href="https://www.youtube.com/watch?v=nlR8pwKar90"&gt;Video walkthrough: What's new in Assignments&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="63"&gt;&lt;A href="https://support.microsoft.com/en-us/topic/getting-started-with-microsoft-learning-zone-ff2fc08f-b3a0-43b7-823c-5d04516baa5e" target="_blank" rel="noopener" data-href="https://support.microsoft.com/en-us/topic/getting-started-with-microsoft-learning-zone-ff2fc08f-b3a0-43b7-823c-5d04516baa5e"&gt;Getting started with Learning Zone&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="64"&gt;&lt;A class="lia-external-url" href="https://support.microsoft.com/topic/-add-a-learning-zone-lesson-to-an-assignment-334b6013-f9e1-4eef-b209-362487543e36" target="_blank" rel="noopener" data-href="https://support.microsoft.com/en-us/topic/add-a-learning-zone-lesson-to-an-assignment"&gt;Add a Learning Zone lesson to an assignment&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="65"&gt;&lt;A href="https://techcommunity.microsoft.com/blog/educationblog/more-standards-are-coming-to-the-teach-module-and-teams-for-education/4504916" target="_blank" rel="noopener" data-href="https://techcommunity.microsoft.com/blog/educationblog/more-standards-are-coming-to-the-teach-module-and-teams-for-education/4504916"&gt;Standards in Assignments and Teach&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="66"&gt;&lt;A href="https://support.microsoft.com/education/assignments/set-ai-guidelines-assignment-microsoft-teams" target="_blank" rel="noopener" data-href="https://support.microsoft.com/education/assignments/set-ai-guidelines-assignment-microsoft-teams"&gt;Set Student AI Guidelines on an assignment&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="67"&gt;&lt;A href="https://support.microsoft.com/en-us/topic/create-an-assignment-in-microsoft-teams-23c128d0-ec34-4691-9511-661fba8599be" target="_blank" rel="noopener" data-href="https://support.microsoft.com/en-us/topic/create-an-assignment-in-microsoft-teams-23c128d0-ec34-4691-9511-661fba8599be"&gt;Create an assignment in Microsoft Teams&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 17 Jun 2026 14:59:25 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/education-blog/what-s-new-in-assignments-interactive-lessons-localized-content/ba-p/4528326</guid>
      <dc:creator>Leif Brenne</dc:creator>
      <dc:date>2026-06-17T14:59:25Z</dc:date>
    </item>
    <item>
      <title>Detecting Python Vulnerabilities with GraphCodeBERT</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/detecting-python-vulnerabilities-with-graphcodebert/ba-p/4517909</link>
      <description>&lt;P&gt;We are nine software engineering students at the Egyptian Chinese University in Cairo. When we got our project brief, we noticed a gap that bothered us: Python is the most widely used language in AI development, yet almost every security tool out there was built for C and C++. The tools that do exist for Python rely on regex pattern matching — a technique that has not changed meaningfully in years.&lt;/P&gt;
&lt;P&gt;So we built one ourselves.&lt;/P&gt;
&lt;P&gt;We called it Code Security Identifier — CSI. Instead of matching patterns like existing tools, CSI understands code structure. We split the work across nine people, each owning a specific piece of the system: dataset engineering, model architecture, loss function design, adversarial training, hyperparameter optimization, and deployment. None of us had built a production security tool before. By the end, we had one running.&lt;/P&gt;
&lt;P&gt;This post documents what we built, the decisions we made, the things that did not work, and what we learned.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;div data-video-id="https://www.youtube.com/watch?v=eL-wnn-UGhY/1781530172939" data-video-remote-vid="https://www.youtube.com/watch?v=eL-wnn-UGhY/1781530172939" class="lia-video-container lia-media-is-center lia-media-size-large"&gt;&lt;iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FeL-wnn-UGhY%3Ffeature%3Doembed&amp;amp;display_name=YouTube&amp;amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DeL-wnn-UGhY&amp;amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FeL-wnn-UGhY%2Fhqdefault.jpg&amp;amp;type=text%2Fhtml&amp;amp;schema=youtube" allowfullscreen="" style="max-width: 100%"&gt;&lt;/iframe&gt;&lt;/div&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://github.com/a-elhaag/code-security-identifier" target="_blank" rel="noopener"&gt;GitHub Repo&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The Problem: Python Security Is Underserved&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Python powers 70% of AI workloads and 45% of enterprise backends. As AI-assisted code generation becomes standard practice, the volume of Python code being written and deployed is growing faster than any team can manually audit. GitHub Copilot, ChatGPT, and similar tools generate thousands of lines of Python daily. Much of it is never reviewed for security.&lt;/P&gt;
&lt;P&gt;The tools that exist were not built for this reality. Bandit, the industry standard for Python static analysis, uses regex pattern matching. Its F1 score on vulnerability detection is approximately 0.62. F1 is the harmonic mean of precision and recall, so it does not translate directly into a miss rate, but if recall sits near that figure, Bandit misses roughly 38 of every 100 real vulnerabilities in a Python codebase. In a production system handling user data, financial transactions, or infrastructure commands, every missed vulnerability is potentially exploitable.&lt;/P&gt;
&lt;P&gt;The deeper problem is architectural. Regex-based tools flag code that looks suspicious based on token patterns. They cannot trace how data flows through a program. They cannot reason about whether an untrusted input reaches a dangerous execution point. They catch obvious cases and miss subtle ones — which are exactly the cases that matter most in real-world exploits.&lt;/P&gt;
&lt;P&gt;We set out to build something better.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why Token-Only Models Also Fail&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The first generation of deep learning approaches to vulnerability detection treated source code the same way NLP models treat text: as a sequence of tokens. Models like CodeBERT learn statistical co-occurrences. They learn that SELECT, WHERE, and execute appear together. They learn that os.system and subprocess appear near command-like strings. These patterns are suspicious. But suspicion is not detection.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;What "Token Co-Occurrence" Actually Means&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;To be concrete: a token model doesn't read code, it reads a flat sequence of sub-word units, the same way it would read a sentence. It has no built-in notion of "this token is a function argument," "this token is the return value of that call," or "this variable was assigned three lines up and is now being used here." Everything is positional and statistical. During pretraining, the model learns that certain tokens tend to appear near other tokens — execute tends to follow strings that look like SQL, eval tends to appear near user-controlled-looking variable names, os.system tends to sit close to subprocess or shell=True. These are real correlations in code, and they give the model &lt;EM&gt;some&lt;/EM&gt; signal. But a correlation between tokens is not the same as understanding what the code actually does with those tokens.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;The Causal Chain a Vulnerability Actually Is&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A real SQL injection vulnerability is not a collection of SQL-adjacent tokens. It is a causal chain: an untrusted value enters through a function parameter, passes through one or more assignments and string operations, and reaches a database execution call without sanitization. Concretely, that chain might look like: user_id arrives as a request parameter → it gets assigned to a local variable → that variable is interpolated into an f-string → the f-string is passed as the query argument to cursor.execute(). Each of these steps, on its own, is completely unremarkable Python. Assigning a variable is not dangerous. Building an f-string is not dangerous. Calling execute() is not dangerous. The vulnerability exists &lt;EM&gt;only&lt;/EM&gt; in the connection between these steps: specifically, in the fact that an untrusted value reaches a sensitive sink without anything sanitizing it along the way.&lt;/P&gt;
&lt;P&gt;Token models cannot see this chain. They see the tokens at each step — user_id, f"...", cursor.execute — and they may even have learned that this combination of tokens is statistically associated with vulnerable code. But "statistically associated with" is not the same as "I can trace that this specific value, from this specific source, reaches this specific sink." The model has no mechanism for following a variable across lines, across function boundaries, or through transformations. It is reasoning about &lt;EM&gt;which words appear&lt;/EM&gt;, not &lt;EM&gt;what happens to the data&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Two Failure Modes, Same Root Cause&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This single limitation — no data-flow reasoning — produces two distinct failure modes, and both are costly in a real security context.&lt;/P&gt;
&lt;P&gt;The first is &lt;STRONG&gt;false positives on safe code&lt;/STRONG&gt;. Plenty of legitimate, secure code uses tokens that a token model has learned to associate with danger. A function that builds a SQL query using parameterized queries (the &lt;EM&gt;correct&lt;/EM&gt;, safe way to do it) still contains tokens like query, execute, and variable names that look like user input — because they often &lt;EM&gt;are&lt;/EM&gt; user input, just handled safely via placeholders and bound parameters instead of string concatenation. A token model, lacking the ability to distinguish "this value is interpolated directly into the query string" from "this value is passed as a separate, escaped parameter," may flag both patterns identically. In practice, this is exactly the kind of false positive that erodes trust in a security tool — if a scanner flags safe, well-written code as vulnerable often enough, developers start ignoring its output entirely, which defeats the purpose of having it.&lt;/P&gt;
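&lt;P&gt;To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the function names, and the payload are illustrative assumptions, not code from CSI. Both functions contain nearly the same tokens (query, execute, user_id), yet only the first lets the untrusted value become part of the SQL text.&lt;/P&gt;

```python
import sqlite3


def find_user_unsafe(conn, user_id):
    # The untrusted value is interpolated into the SQL text itself.
    query = f"SELECT name FROM users WHERE id = '{user_id}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, user_id):
    # Same surface tokens, but the value travels as a bound parameter.
    query = "SELECT name FROM users WHERE id = ?"
    return conn.execute(query, (user_id,)).fetchall()


def demo():
    # Hypothetical in-memory table with two rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("1", "ada"), ("2", "lin")])
    payload = "x' OR '1'='1"  # classic injection string
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)
```

&lt;P&gt;Calling demo() returns every row from the unsafe function and an empty list from the safe one. A detector that only counts tokens has little to separate the two; a detector that follows the value can see that it enters the query string in one case and stays outside it in the other.&lt;/P&gt;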
&lt;P&gt;The second, more dangerous failure mode is &lt;STRONG&gt;false negatives on real injections that don't match the training distribution&lt;/STRONG&gt;. The token-level patterns a model learns during pretraining are necessarily a reflection of the &lt;EM&gt;kinds&lt;/EM&gt; of vulnerable code that were common in its training data — typical variable names like user_input, query, cmd, typical function calls like os.system, eval, cursor.execute. But real-world code, especially AI-generated code, doesn't always follow these conventions. A variable might be named x, payload, data_5, or something entirely project-specific. A dangerous sink might be wrapped in a thin custom helper function with an unfamiliar name that itself calls subprocess.run three layers down. If the &lt;EM&gt;surface tokens&lt;/EM&gt; don't match what the model has seen before, but the &lt;EM&gt;underlying data-flow path&lt;/EM&gt; — untrusted input to dangerous sink — is identical to a thousand vulnerabilities the model has seen, a token model has no way to recognize that. It missed the pattern not because the vulnerability is novel, but because it was only ever looking at the wrong thing: the words, not the wiring between them.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Why This Matters for CSI's Design&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Both failure modes point to the same underlying gap: vulnerability detection is fundamentally a question about the &lt;EM&gt;paths data takes through a program&lt;/EM&gt;, not about which words appear in the program's source. A model that wants to close this gap needs access to something a flat token sequence cannot provide — an explicit representation of how data flows from one point in the code to another, independent of what the variables along that path happen to be named. This is exactly the gap GraphCodeBERT's data-flow graph is designed to fill, and it's the reason we built CSI around it rather than around a purely token-based model like CodeBERT.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Our Foundation: GraphCodeBERT&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Microsoft's GraphCodeBERT addresses the structural blindness of token models by feeding the model a data-flow graph alongside the token sequence during pretraining. Three program representations help explain what that adds and where its limits are.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Abstract Syntax Tree (AST)&lt;/EM&gt; The AST captures the syntactic structure of code: how functions are defined, how expressions compose, how variables are declared relative to their scope. GraphCodeBERT does not attend over the AST itself; the AST is the intermediate parse from which the data-flow edges are extracted.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Data Flow Graph (DFG)&lt;/EM&gt; The DFG is the critical representation for vulnerability detection. It traces how values propagate through a program: which variables receive which values, how those values are transformed, and where they ultimately flow. For an injection vulnerability, the DFG makes the taint path explicit: user_id → query → db.execute(). This is the path that token models cannot see.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Control Flow Graph (CFG)&lt;/EM&gt; The CFG maps which code paths execute under which conditions. It captures branch logic, loops, and exception handlers, the execution context that determines whether a tainted value can actually reach a dangerous sink in practice. GraphCodeBERT's pretraining input carries no explicit CFG, so the model sees this context only through the token sequence.&lt;/P&gt;
&lt;P&gt;Together, the token sequence and the data-flow graph give GraphCodeBERT a structural understanding of code that enables meaningful vulnerability detection. For a SQL injection, it can follow the chain from untrusted input through concatenation to database execution. Token models see three words at that point. GraphCodeBERT sees a taint flow.&lt;/P&gt;
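&lt;P&gt;The taint path user_id → query → db.execute() can be illustrated with a toy tracer built on Python's standard ast module. This is a deliberately simplified sketch, not GraphCodeBERT's data-flow extractor: it assumes straight-line function bodies, treats only the first argument of a sink call as the query text, and the function name tainted_sink_calls and both sample snippets are hypothetical.&lt;/P&gt;

```python
import ast


def tainted_sink_calls(source, untrusted, sinks=("execute",)):
    """Return line numbers where an untrusted value reaches a sink's first argument."""
    func = ast.parse(source).body[0]
    tainted = set(untrusted)
    hits = []
    for stmt in func.body:  # straight-line code only; branches and loops are ignored
        if isinstance(stmt, ast.Assign):
            used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            if used.intersection(tainted):
                # Taint propagates from the right-hand side to the assigned names.
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        tainted.add(target.id)
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
                if node.func.attr in sinks and node.args:
                    first = {n.id for n in ast.walk(node.args[0]) if isinstance(n, ast.Name)}
                    if first.intersection(tainted):
                        hits.append(node.lineno)
    return hits


VULNERABLE = '''
def get_user(db, user_id):
    uid = user_id
    query = f"SELECT * FROM users WHERE id = {uid}"
    return db.execute(query)
'''

SAFE = '''
def get_user(db, user_id):
    query = "SELECT * FROM users WHERE id = ?"
    return db.execute(query, (user_id,))
'''
```

&lt;P&gt;With user_id marked as untrusted, the tracer reports the execute call in the first snippet and nothing in the second, regardless of what the intermediate variables are called. A learned model replaces these hand-written rules with attention over data-flow edges, but the question it answers is the same.&lt;/P&gt;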
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Making It Trainable: LoRA Parameter Efficiency&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;GraphCodeBERT has 125 million parameters. Full fine-tuning on a domain-specific dataset at this scale requires significant GPU memory, long training times, and a dataset large enough to update 125M parameters meaningfully without overfitting. We had approximately 4,000 training samples and access to Google Colab.&lt;/P&gt;
&lt;P&gt;We applied LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning technique that injects small trainable adapter matrices into the query, key, and value projection layers of each transformer attention block while keeping all backbone weights frozen. The adapter for a weight matrix W is parameterized as two low-rank matrices B and A, so the weight update is (α/r) × BA and the effective weight becomes W + (α/r) × BA. With rank r=16 and scaling factor α=32, the number of trainable parameters drops from 124M to 2.07M (0.24% of the full model).&lt;/P&gt;
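&lt;P&gt;The arithmetic behind the adapter is small enough to show in plain Python. The sketch below uses toy 2x2 matrices and hypothetical values; it is not our training code, which relies on a LoRA library. It shows two things: with B initialised to zero (the usual LoRA convention) the adapted layer starts out identical to the frozen one, and a rank-r adapter on a single d_out x d_in matrix trains only r × (d_in + d_out) values.&lt;/P&gt;

```python
def matmul(x, y):
    # Plain nested-list matrix product, enough for a toy example.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, b, a, alpha, r):
    """Return W + (alpha / r) * B A; W stays frozen, only B and A would be trained."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(d_in, d_out, r):
    # B is d_out x r and A is r x d_in.
    return r * (d_in + d_out)


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight (toy size)
A = [[0.5, -0.5]]              # r x d_in, with r = 1
B_ZERO = [[0.0], [0.0]]        # d_out x r, zero-initialised
B_TRAINED = [[1.0], [2.0]]     # hypothetical values after some training
```

&lt;P&gt;For one 768 x 768 projection at r=16, lora_param_count gives 24,576 trainable values against 589,824 in the full matrix. That per-matrix saving is what makes fine-tuning feasible on a Colab GPU.&lt;/P&gt;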
&lt;P&gt;This is not a compromise on performance. The LoRA constraint actively prevents overfitting on small datasets by limiting the effective model capacity. Our CSI-GCB model achieved F1 = 0.7012 after 30 training epochs, with validation F1 improving monotonically across all epochs — no overfitting, no degradation. The parameter-efficient constraint was a feature, not a limitation.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 55.555556%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Base model parameters&lt;/td&gt;&lt;td&gt;124M&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Trainable parameters (LoRA)&lt;/td&gt;&lt;td&gt;2.07M (0.24%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;LoRA rank (r)&lt;/td&gt;&lt;td&gt;16&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;LoRA scaling factor (α)&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Training epochs&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Optimizer&lt;/td&gt;&lt;td&gt;AdamW, lr=2e-5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Best validation F1&lt;/td&gt;&lt;td&gt;0.7012&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Building the Dataset: Three Real-World Sources&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We could not use existing C/C++ vulnerability datasets. Cross-language transfer from C/C++ to Python is problematic: graph structures differ, tokenization differs, and the vulnerability patterns that dominate C/C++ (buffer overflows, memory corruption) are largely irrelevant in Python. We needed Python-native training data.&lt;/P&gt;
&lt;P&gt;We unified approximately 4,000 deduplicated Python functions from three complementary real-world sources.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Source 1: AI-Generated Vulnerable Code (121 records)&lt;/EM&gt; A curated dataset of AI-generated Python functions, each labeled with its CWE identifier. Every record pairs a natural-language prompt with the insecure Python function produced by the AI model, covering 68 unique CWE types. This source directly targets the threat model motivating CSI: AI-assisted code generation introducing security vulnerabilities that no one audits.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Source 2: GitHub Security Commits (2,173 records)&lt;/EM&gt; Commit-level vulnerability pairs extracted from real GitHub security fix commits. Pre-patch function = vulnerable, post-patch function = safe. Labels verified using GPT-4 at approximately 94% accuracy — compared to 40–51% accuracy for automated commit-only labeling strategies. Our GPT-4 verification step was essential for training signal quality.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Source 3: Raw GitHub Diff Files (~300 records)&lt;/EM&gt; Approximately 300 raw GitHub diff records across seven vulnerability types: XSS, SQL injection, command injection, open redirect, path disclosure, RCE, and XSRF. Incorporated as augmentation for underrepresented CWE categories.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;One Critical Insight: Commit-Stratified Splitting&lt;/EM&gt; Random train/test splits leak information when applied to commit-level data. Functions from the same Git commit share context: the same bug fix, the same coding style, the same changeset patterns. Published research shows this inflates F1 scores by up to 40 percentage points. Our solution: entire commits assigned to a single partition. No commit ever split across train and test.&lt;/P&gt;
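&lt;P&gt;A minimal sketch of the idea, assuming each sample is a dict with a "commit" key (a hypothetical schema, not our exact one). Commits are shuffled and whole commits are handed to a partition, so the 70/15/15 ratios apply to commit counts and the function counts per partition are only approximately in that ratio.&lt;/P&gt;

```python
import random
from collections import defaultdict


def commit_stratified_split(samples, ratios=(0.70, 0.15, 0.15), seed=13):
    """Split samples so that every commit lands in exactly one partition."""
    by_commit = defaultdict(list)
    for sample in samples:
        by_commit[sample["commit"]].append(sample)

    commits = sorted(by_commit)            # deterministic order before shuffling
    random.Random(seed).shuffle(commits)

    n_train = int(len(commits) * ratios[0])
    n_val = int(len(commits) * ratios[1])
    groups = {
        "train": commits[:n_train],
        "val": commits[n_train:n_train + n_val],
        "test": commits[n_train + n_val:],  # remainder goes to test
    }
    return {name: [s for c in ids for s in by_commit[c]] for name, ids in groups.items()}
```

&lt;P&gt;The property worth checking after any split like this is that the sets of commit IDs in train, validation, and test are pairwise disjoint.&lt;/P&gt;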
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The Preprocessing Pipeline&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Six stages transform raw data into model-ready tensors.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Stage 1 — Parse and Unify:&lt;/STRONG&gt; Each source normalized into a unified schema: source code, binary label, CWE identifier, provenance tag.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Stage 2 — Label Encoding:&lt;/STRONG&gt; CWE-to-integer mapping constructed. Categories with fewer than 5 samples discarded.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Stage 3 — Negative Sampling:&lt;/STRONG&gt; Safe samples drawn from post-patch functions and CodeSearchNet. Target ratio: 1:1 vulnerable-to-safe, correcting the natural 8:1 imbalance.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Stage 4 — Class Weighting:&lt;/STRONG&gt; Per-class weights via scikit-learn compute_class_weight. Positive weight pos_weight = n_neg/n_pos for the binary head.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Stage 5 — Mid-Truncation Tokenization:&lt;/STRONG&gt; Max 512 tokens. First 128 (function signature, entry logic) + last 384 (return statements, taint sinks) retained. Standard head truncation discards function tails — exactly where SQL and command injection sinks most commonly appear.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Stage 6 — Commit-Stratified Split:&lt;/STRONG&gt; 70/15/15 train/validation/test. All functions from the same commit in the same partition.&lt;/LI&gt;
&lt;/UL&gt;
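&lt;P&gt;Stages 4 and 5 are simple enough to sketch directly. The helper names below are hypothetical, and the sketch assumes the 128 + 384 budget counts ordinary tokens only, leaving any special tokens to the tokenizer.&lt;/P&gt;

```python
def mid_truncate(tokens, head=128, tail=384):
    """Keep the first `head` and last `tail` tokens; drop the middle if too long."""
    if len(tokens) > head + tail:
        return tokens[:head] + tokens[-tail:]
    return list(tokens)


def binary_pos_weight(labels):
    """pos_weight = n_neg / n_pos, with 1 marking a vulnerable sample."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos
```

&lt;P&gt;For a 1,000-token function, mid_truncate keeps tokens 0 to 127 and 616 to 999, so the signature and the tail of the body both survive. Head truncation would have kept tokens 0 to 511 and lost the final sink calls.&lt;/P&gt;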
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Data Augmentation&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We applied two augmentation techniques to the training set.&lt;/P&gt;
&lt;P&gt;Variable and function name normalization replaces all identifier tokens with abstract symbolic tokens (VAR_1, FUNC_1, etc.), adopted from the DetectVul preprocessing strategy. A SQL injection through a variable named user_input and one through a variable named x are the same vulnerability. The model should treat them identically.&lt;/P&gt;
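&lt;P&gt;One way to sketch this normalization is with Python's ast module (ast.unparse needs Python 3.9 or later). The class below is an illustrative assumption, not the DetectVul implementation or our exact preprocessing: it renames function names, parameters, and assigned variables in order of first appearance and leaves attribute names such as execute untouched.&lt;/P&gt;

```python
import ast


class IdentifierNormalizer(ast.NodeTransformer):
    def __init__(self):
        self.var_names = {}
        self.func_names = {}

    def _var(self, name):
        if name not in self.var_names:
            self.var_names[name] = f"VAR_{len(self.var_names) + 1}"
        return self.var_names[name]

    def visit_FunctionDef(self, node):
        if node.name not in self.func_names:
            self.func_names[node.name] = f"FUNC_{len(self.func_names) + 1}"
        node.name = self.func_names[node.name]
        self.generic_visit(node)  # parameters are visited before the body
        return node

    def visit_arg(self, node):
        node.arg = self._var(node.arg)
        return node

    def visit_Name(self, node):
        # Rename names that are assigned here or already known; leave builtins alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.var_names:
            node.id = self._var(node.id)
        return node


def normalize_identifiers(source):
    tree = IdentifierNormalizer().visit(ast.parse(source))
    return ast.unparse(tree)


SNIPPET_A = '''
def lookup(cur, user_input):
    query = f"SELECT * FROM t WHERE id = {user_input}"
    return cur.execute(query)
'''

SNIPPET_B = '''
def fetch(c, x):
    q = f"SELECT * FROM t WHERE id = {x}"
    return c.execute(q)
'''
```

&lt;P&gt;Both snippets normalize to the same text, which is the point: after this step the model can no longer lean on a name like user_input as a shortcut.&lt;/P&gt;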
&lt;P&gt;Dead-code insertion and minor refactoring variants of each vulnerable function were generated to increase intra-class diversity. This was motivated by a known failure mode in GNN-based detectors: models trained to distinguish vulnerable code from its fixed version perform near-randomly because security patches introduce minimal code differences. Increasing intra-class diversity forces the model to learn structural patterns rather than diff signatures.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The Architecture: Dual-Encoder Fusion&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The architecture decision was one of the first major forks in the project, and it shaped almost everything that came after it. Early on, we had to decide: do we build one encoder that does everything, or do we combine two encoders that each bring something the other lacks? We went with the second option, but getting there — and then getting the two halves to actually work together — took several iterations.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Why Two Encoders At All&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The case for a single encoder is simplicity: one model, one set of weights to fine-tune, fewer moving parts to debug. But GraphCodeBERT and VulBERTa are good at fundamentally different things. GraphCodeBERT understands &lt;EM&gt;structure&lt;/EM&gt; — how data moves through a function, how control flow branches, how an AST is shaped. VulBERTa understands &lt;EM&gt;vulnerability semantics&lt;/EM&gt; — it was pre-trained exclusively on NVD entries and CVE-linked code, so it has effectively memorized what dangerous code idioms look like: unsanitized input patterns, risky function calls, structures that resemble known CVEs.&lt;/P&gt;
&lt;P&gt;A function can be structurally unremarkable — a simple, shallow control flow, nothing exotic in its data flow graph — and still be dangerous because of &lt;EM&gt;what&lt;/EM&gt; it does with a specific input. Conversely, a function can have a complex data flow graph and still be perfectly safe. Structure alone doesn't tell you "this looks like a CVE I've seen before," and vulnerability-pattern memorization alone doesn't tell you "this input actually reaches this sink." We wanted both signals available to the classification heads simultaneously, which meant running both encoders on every input and combining their outputs — rather than picking one.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Encoder 1: GraphCodeBERT + LoRA&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The first encoder is GraphCodeBERT, adapted with LoRA as described earlier. For every input function, GraphCodeBERT processes two things at once: the tokenized source code itself, and the data-flow graph edges extracted from the function's AST. Internally, its attention layers attend across both — a token can attend not just to nearby tokens in the sequence, but to other tokens it has a data-flow relationship with, even if they're far apart in the raw text. This is what lets the model "see" that a variable assigned on line 3 flows into a database call on line 40, even though those two lines are nowhere near each other as tokens.&lt;/P&gt;
&lt;P&gt;The LoRA adapters sit on the query, key, and value projection matrices of every attention layer in this encoder. Everything else in GraphCodeBERT is frozen. After the full forward pass, we take the per-token output and apply mean pooling across the sequence dimension — averaging every token's final representation into a single vector. The result is a 768-dimensional embedding that we think of as the &lt;EM&gt;structural&lt;/EM&gt; signal: it encodes how this specific function is built, how its data moves, and how its control flow is organized.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Encoder 2: VulBERTa (Frozen)&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The second encoder is VulBERTa, used as a fixed feature extractor. Unlike GraphCodeBERT, VulBERTa receives no adapters and no gradient updates at all — its weights are exactly as they came from pretraining on NVD and CVE-linked code. We made this choice deliberately: VulBERTa's value to us is precisely the vulnerability-domain knowledge baked into its pretraining, and fine-tuning it on our comparatively small dataset risked overwriting that knowledge faster than it could learn anything useful from 4,000 samples — a classic catastrophic forgetting problem.&lt;/P&gt;
&lt;P&gt;For every input function, VulBERTa runs its own tokenization (it uses RoBERTa's BPE tokenizer, separate from GraphCodeBERT's tokenizer — these are two different views of the same source code, tokenized differently) and produces a sequence of hidden states. Rather than mean pooling, we take the CLS token's final representation — the standard approach for classification-style embeddings in BERT-family models — giving us a second 768-dimensional embedding. We think of this as the &lt;EM&gt;vulnerability-domain&lt;/EM&gt; signal: it encodes how similar this function "feels" to the vulnerable and CVE-linked code VulBERTa was trained on, independent of the function's own internal structure.&lt;/P&gt;
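&lt;P&gt;The two pooling strategies can be contrasted on a toy example. This is a minimal sketch in plain Python, not the project's code: the function names are hypothetical, the hidden size is 2 instead of 768, and skipping padding tokens through an attention mask is an assumption the text does not confirm.&lt;/P&gt;

```python
def mean_pool(hidden_states, attention_mask):
    """Average the final-layer vectors of the real (non-padding) tokens."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:  # assumption: padding positions are excluded from the mean
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]


def cls_pool(hidden_states):
    """Use the first (CLS) token's final-layer vector as the embedding."""
    return list(hidden_states[0])


# Toy sequence: three real tokens plus one padding token, hidden size 2.
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

structural = mean_pool(states, mask)  # [3.0, 4.0]
domain = cls_pool(states)             # [1.0, 2.0]
```

&lt;P&gt;Mean pooling lets every token contribute to the embedding, while CLS pooling relies on one position that the model has learned to use as a summary.&lt;/P&gt;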
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Fusion Layer: Combining Two 768-Dimensional Views&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;At this point we have two 768-dimensional vectors describing the same input function from two different angles. The fusion layer's job is to combine them into a single representation that downstream classification heads can use.&lt;/P&gt;
&lt;P&gt;The simplest possible approach, and the one we settled on, is concatenation: stack the two 768-dimensional vectors end to end into a single 1,536-dimensional vector. We considered alternatives (element-wise addition, learned gating, cross-attention between the two embeddings), but concatenation has one major advantage: it loses no information. Addition and gating both require the two vectors to already be in a compatible space, which they aren't, because they come from different models with different pretraining objectives. Concatenation defers that reconciliation to a layer that's actually trained for it.&lt;/P&gt;
&lt;P&gt;That reconciliation happens in the FusionProjectionLayer immediately after concatenation: a Linear layer projects the 1,536-dimensional concatenated vector back down to 768 dimensions, followed by LayerNorm, a GELU activation, and Dropout. This is the layer that actually learns how to weigh and combine the structural signal from GraphCodeBERT against the vulnerability-domain signal from VulBERTa — effectively learning, per-feature, how much to trust each encoder's contribution. The output is a single 768-dimensional fused representation that both downstream heads consume.&lt;/P&gt;
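&lt;P&gt;The concatenate-then-project step can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the FusionProjectionLayer itself: the helper names are hypothetical, the vectors are 2-dimensional instead of 768-dimensional, the weights are made up, LayerNorm's learnable scale and shift are omitted, and dropout is treated as the identity, as it would be at inference time.&lt;/P&gt;

```python
import math


def linear(x, weight, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance (no learnable scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def gelu(x):
    """Exact GELU using the Gaussian error function."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]


def fuse(structural, domain, weight, bias):
    """Concatenate both views, then apply Linear, LayerNorm and GELU."""
    concatenated = structural + domain  # list concatenation: 2 + 2 = 4 values
    projected = linear(concatenated, weight, bias)
    return gelu(layer_norm(projected))  # dropout omitted: identity at inference


# Toy inputs standing in for the two encoder embeddings (values are made up).
structural = [0.5, -1.0]
domain = [2.0, 0.0]
weight = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
bias = [0.0, 0.0]
fused = fuse(structural, domain, weight, bias)
```

&lt;P&gt;Each row of the weight matrix mixes values from both halves of the concatenated vector, which is where the per-feature weighting between the two encoders is learned.&lt;/P&gt;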
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Classification Heads: Two Tasks, One Shared Representation&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The fused representation feeds two separate heads, trained jointly.&lt;/P&gt;
&lt;P&gt;The &lt;STRONG&gt;Binary Head&lt;/STRONG&gt; is intentionally minimal: a single Linear(1536 → 1) layer followed by a sigmoid, producing a vulnerable/safe probability. We kept this head simple because the binary task is, relatively speaking, the easier of the two — most of the discriminative work needed for "is this vulnerable at all" is already present in the fused representation, and adding more layers here mainly risked overfitting on a task that didn't need it.&lt;/P&gt;
&lt;P&gt;The &lt;STRONG&gt;CWE Head&lt;/STRONG&gt; is deliberately deeper: Linear(1536 → 384) → GELU → Dropout → Linear(384 → 8). Classifying &lt;EM&gt;which&lt;/EM&gt; of seven CWE categories a vulnerability belongs to (plus an eighth "unknown" class) is a harder, more fine-grained task than the binary one — it requires distinguishing between vulnerability types that can share a lot of surface-level structure (an XSS and a command injection can look superficially similar in terms of "untrusted input flows somewhere dangerous," but the &lt;EM&gt;kind&lt;/EM&gt; of dangerous matters for classification). The extra hidden layer gives the head room to learn these finer distinctions from the same shared representation, without needing a separate encoder pass.&lt;/P&gt;
&lt;P&gt;One detail that mattered in practice: for safe samples, the CWE label is set to −1 and masked out of the CWE loss entirely. A safe function has no CWE to predict, and including it in the CWE loss with some placeholder label would inject noise into a head that's already working with a smaller, more imbalanced label space than the binary head. Masking keeps the CWE head's gradient signal coming only from samples where a CWE label is actually meaningful.&lt;/P&gt;
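&lt;P&gt;The masking can be illustrated with a small cross-entropy function that skips any sample labeled −1. This is a minimal sketch in plain Python with hypothetical names, not the training code. In PyTorch, the same effect is commonly obtained through the ignore_index argument of torch.nn.CrossEntropyLoss, though the text does not say which mechanism the project used.&lt;/P&gt;

```python
import math

IGNORE_LABEL = -1  # CWE label assigned to safe samples


def log_softmax(logits):
    """Numerically stable log-softmax."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_total for v in logits]


def masked_cwe_loss(batch_logits, batch_labels):
    """Mean cross-entropy over samples that carry a real CWE label."""
    losses = [
        -log_softmax(logits)[label]
        for logits, label in zip(batch_logits, batch_labels)
        if label != IGNORE_LABEL
    ]
    if not losses:
        return 0.0  # a batch of only safe samples adds nothing to the CWE loss
    return sum(losses) / len(losses)


# Two samples: one vulnerable (CWE class 0) and one safe (masked out).
logits = [[0.0, 0.0], [4.0, -4.0]]
labels = [0, IGNORE_LABEL]
loss = masked_cwe_loss(logits, labels)  # depends only on the first sample
```

&lt;P&gt;Averaging over the unmasked samples only, rather than over the whole batch, keeps the loss scale independent of how many safe functions a batch happens to contain.&lt;/P&gt;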
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Where the Two-Head Design Came From&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This two-head structure wasn't the original plan — early versions of the architecture experimented with a single combined output space (CWE categories plus an explicit "safe" class, predicted by one head). We moved to separate binary and CWE heads after running into a familiar problem: a single combined classifier tends to behave like a generalist binary detector with poor sensitivity to specific weakness types, because the "safe" class dominates the label distribution and pulls the decision boundary toward itself. Splitting the binary detection objective from the CWE classification objective let each head specialize — one for the broad "is this dangerous" question, one for the fine-grained "what kind of dangerous" question — while still sharing the same upstream encoders and fusion layer, so neither head requires its own separate feature extraction.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Training Strategy: Composite Loss and Adversarial Training&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Getting a model to F1 = 0.7012 on a 4,000-sample dataset with eight unevenly distributed classes is not a single-loss problem. A model trained with plain cross-entropy on this data converges quickly to predicting the dominant classes and essentially ignores rare CWE categories — the validation F1 looks acceptable on paper while the model is functionally blind to the vulnerability types that matter most. We addressed this by combining four loss components, each solving a different failure mode we hit during early experiments.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Focal Loss: Fixing the Class Imbalance Problem&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Our CWE distribution is heavily skewed — some categories have hundreds of examples, others barely clear the five-sample minimum from Stage 2 of preprocessing. With standard cross-entropy, the gradient signal from the dominant classes drowns out the rare ones, and the model learns to be "confidently correct" on easy majority-class examples while never improving on the hard minority-class examples.&lt;/P&gt;
&lt;P&gt;Focal Loss adds a modulating term, (1 − p_t)^γ, to the standard cross-entropy loss. When the model is already confident and correct on an example (p_t close to 1), this term shrinks toward zero and the loss contribution from that example is suppressed. When the model is wrong or uncertain (p_t low), the term stays close to 1 and the full loss applies. In practice, this redirects the gradient budget toward the examples the model is actually struggling with — which, in our case, were almost always the underrepresented CWE categories. We ran an ablation over γ ∈ {1, 2, 3} to find the value that best balanced this trade-off without destabilizing training on the majority classes.&lt;/P&gt;
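&lt;P&gt;The effect of the modulating term is easiest to see on single examples. The sketch below computes the per-example focal loss for a given p_t, the probability the model assigns to the true class. The function name and the default γ = 2 are illustrative choices, not the value selected by the ablation, and the optional class-weighting factor from the original focal loss formulation is left out.&lt;/P&gt;

```python
import math


def focal_loss(p_t, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t) ** gamma for one example."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = focal_loss(0.9)  # confident and correct: weight (1 - 0.9)^2, about 0.01
hard = focal_loss(0.1)  # wrong or uncertain: weight (1 - 0.1)^2 = 0.81

# With gamma = 0 the modulating term is 1 and plain cross-entropy is recovered.
plain = focal_loss(0.9, gamma=0.0)
```

&lt;P&gt;Raising γ suppresses easy examples more aggressively, which is the trade-off the γ ablation explores.&lt;/P&gt;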
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;SCL-CVD: Making the Embedding Space Class-Aware&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Focal Loss fixes the gradient imbalance, but it doesn't directly address a separate problem: two functions with the same CWE label can look extremely different at the token level, while two functions with different CWE labels can look superficially similar (a few lines of diff apart). Without an explicit signal to organize the embedding space, the classification head has to do all the work of separating classes from a representation that wasn't built with that goal in mind.&lt;/P&gt;
&lt;P&gt;Supervised Contrastive Learning for Code Vulnerability Detection (SCL-CVD) adds a second objective directly on the embeddings, before the classification heads. For every anchor sample in a batch, it pulls embeddings from the same CWE class closer together in representation space and pushes embeddings from different classes apart, using a temperature parameter (tau) to control how sharply similarity is weighted. The result is an embedding space where same-class functions cluster together even if their surface code differs substantially, and where the classification head's decision boundaries become easier to learn because the classes are already partially separated upstream.&lt;/P&gt;
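&lt;P&gt;A minimal sketch of a supervised contrastive loss, assuming the widely used formulation in which each anchor is compared against every other sample in the batch on L2-normalized embeddings. Whether SCL-CVD matches this formulation term by term is an assumption, and the function names and the default tau are hypothetical.&lt;/P&gt;

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(embeddings, labels, tau=0.1):
    """Pull same-label embeddings together and push other labels apart."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    per_anchor = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        sims = {j: dot(z[i], z[j]) / tau for j in others}
        peak = max(sims.values())
        log_denominator = peak + math.log(
            sum(math.exp(s - peak) for s in sims.values())
        )
        per_anchor.append(
            -sum(sims[j] - log_denominator for j in positives) / len(positives)
        )
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


# Four toy embeddings forming two tight clusters.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = supervised_contrastive_loss(embeddings, [0, 0, 1, 1])
mixed = supervised_contrastive_loss(embeddings, [0, 1, 0, 1])
```

&lt;P&gt;When the labels agree with the clusters, the loss is close to zero. When the labels cut across the clusters, it is large, and that gradient is what reorganizes the embedding space. A smaller tau sharpens the contrast between near and far neighbors.&lt;/P&gt;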
&lt;P&gt;This was one of the more iteration-heavy components of the project. We ran a direct SCL vs. no-SCL F1 comparison to confirm the contrastive objective was actually helping (it was), then separately tuned the temperature tau and the SCL loss weight alpha — two parameters that interact with each other and with the rest of the composite loss in non-obvious ways. Too high a weight on SCL and the model over-prioritizes embedding geometry at the expense of classification accuracy; too low and the contrastive signal gets lost in the noise of the other three losses.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;R-Drop: Consistency Under Dropout&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Dropout is applied during training for regularization, but it introduces a subtle problem: the same input, passed through the model twice, can produce noticeably different output distributions depending on which neurons happen to be dropped each time. For a model that needs to make confident, stable predictions about whether a specific line of code is vulnerable, this stochasticity is undesirable — it means the model's "opinion" about a given function can shift run to run without any change to the input.&lt;/P&gt;
&lt;P&gt;R-Drop addresses this directly. Each training sample is passed through the model twice, with two independent dropout masks, producing two output distributions. The loss adds a KL divergence term between these two distributions, on top of the standard task loss. This forces the model to produce consistent predictions regardless of which dropout mask is active — effectively training the model to be robust to its own regularization noise. We tested this on small batches first to confirm the KL term behaved as expected (it shouldn't dominate the loss or collapse the distributions to a degenerate point) before integrating it into the full training loop.&lt;/P&gt;
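&lt;P&gt;The consistency term can be sketched as a KL divergence between the two output distributions produced by the two dropout passes. The version below averages the divergence in both directions. That symmetric form is an assumption here, since the text only says a KL term is added. The function names are hypothetical.&lt;/P&gt;

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_total for v in logits]


def kl_divergence(log_p, log_q):
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def r_drop_penalty(logits_a, logits_b):
    """Symmetric KL between two forward passes of the same input."""
    log_p, log_q = log_softmax(logits_a), log_softmax(logits_b)
    return 0.5 * (kl_divergence(log_p, log_q) + kl_divergence(log_q, log_p))


# Logits from two passes of one sample under different dropout masks (made up).
pass_one = [2.0, 0.5, -1.0]
pass_two = [1.5, 0.9, -0.8]
penalty = r_drop_penalty(pass_one, pass_two)       # positive: the passes disagree
no_penalty = r_drop_penalty(pass_one, pass_one)    # zero: identical predictions
```

&lt;P&gt;In training, this penalty would be scaled by a weight and added to the task loss computed on both passes.&lt;/P&gt;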
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;EDAT: Adversarial Robustness with Syntactic Guarantees&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The fourth component, Embedding-Disturbed Adversarial Training (EDAT), was the most involved to build, and the one that drew on the most members of the team.&lt;/P&gt;
&lt;P&gt;The motivation is straightforward: vulnerability detection models are often brittle to small, semantically meaningless changes in code — renaming a variable, adding a comment, reordering independent statements. A model that flips its prediction because of a cosmetic change isn't actually reasoning about the vulnerability; it's keying on superficial patterns. EDAT trains against this by generating adversarial perturbations in the model's embedding space using Projected Gradient Descent (PGD) — small, gradient-directed nudges to the embedding designed to push the model toward a wrong prediction — and then training the model to be robust to those nudges via an adversarial KL loss between the clean and perturbed predictions.&lt;/P&gt;
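&lt;P&gt;To make the PGD step concrete, here is a toy sketch on a two-dimensional embedding with a fixed linear classifier, for which the gradient of the binary cross-entropy with respect to the embedding has a closed form. A real model would obtain that gradient through automatic differentiation. The function names, the L-infinity bound, and the epsilon, step size and step count are all illustrative assumptions.&lt;/P&gt;

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, embedding):
    """Probability of the positive class under a toy linear classifier."""
    return sigmoid(sum(w * e for w, e in zip(weights, embedding)))


def pgd_perturb(weights, embedding, label, epsilon=0.1, step_size=0.05, steps=3):
    """Take signed gradient-ascent steps on the loss, clipped to an epsilon box."""
    delta = [0.0] * len(embedding)
    for _ in range(steps):
        perturbed = [e + d for e, d in zip(embedding, delta)]
        error = predict(weights, perturbed) - label  # d(BCE) / d(logit)
        grad = [error * w for w in weights]          # d(BCE) / d(embedding)
        delta = [
            # Step in the direction that increases the loss, then project
            # back into [-epsilon, epsilon].
            max(-epsilon, min(epsilon, d + step_size * math.copysign(1.0, g)))
            for d, g in zip(delta, grad)
        ]
    return [e + d for e, d in zip(embedding, delta)]


weights = [2.0, -1.0]  # made-up classifier weights
clean = [0.5, 0.2]     # made-up embedding of a vulnerable sample (label 1)
adversarial = pgd_perturb(weights, clean, label=1.0)

clean_prob = predict(weights, clean)
adversarial_prob = predict(weights, adversarial)  # lower than clean_prob
```

&lt;P&gt;The adversarial KL loss described above would then compare the clean and perturbed predictions and penalize the gap between them.&lt;/P&gt;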
&lt;P&gt;The catch is that perturbations applied carelessly in embedding space can correspond to nothing — there's no guarantee that a perturbed embedding still maps back to anything resembling valid Python. That's where AST constraint checking comes in: before a perturbed sample is accepted into the adversarial training loop, it's checked against AST-level constraints to ensure the perturbation corresponds to a syntactically valid transformation, not an out-of-distribution artifact the model could "cheat" against.&lt;/P&gt;
&lt;P&gt;Building this pipeline required several distinct pieces working together: tree-sitter-based identifier extraction to identify which tokens in a function are safe to perturb without breaking syntax, the AST constraint checker itself, the PGD perturbation loop, the adversarial KL loss term, and a tunable epsilon controlling the perturbation magnitude. Each of these had to be validated independently on small batches before being wired into the full training run, because a bug in any one component (a perturbation that's too large, an AST check that's too permissive, an epsilon that's miscalibrated) can silently degrade training without throwing an error — the model still trains, it just gets worse, and that's much harder to debug than a crash.&lt;/P&gt;
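&lt;P&gt;Two of these pieces, identifier extraction and the syntax check, can be approximated with Python's standard ast module. This is a stand-in sketch only: the project used tree-sitter, and the function names and the choice to treat only parameters and assigned names as safe to perturb are assumptions made for illustration.&lt;/P&gt;

```python
import ast


def locally_bound_names(source):
    """Collect parameter names and assigned variable names in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


def is_valid_python(source):
    """Accept a candidate only if it still parses as Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


original = "def run(cmd):\n    out = execute(cmd)\n    return out\n"
broken = "def run(cmd):\n    out = execute(cmd\n    return out\n"

candidates = locally_bound_names(original)  # {"cmd", "out"}
accepted = is_valid_python(original)        # True
rejected = is_valid_python(broken)          # False
```

&lt;P&gt;The callee name execute is deliberately left out of the candidate set, because renaming a function defined elsewhere would change what the code does rather than only how it looks.&lt;/P&gt;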
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Who Built What&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This was genuinely distributed work, and each piece depended on the one before it. Hend Elhout built the tree-sitter identifier extraction that underpins EDAT, the foundation the rest of the adversarial pipeline depends on. Jomana Mekheimar implemented the AST constraint checking on top of that, and later ran the full EDAT training and the EDAT vs. no-EDAT F1 comparison that confirmed the approach was worth the added complexity. Menna Reda built the PGD perturbation loop and the adversarial KL loss, and ran the small-batch tests that caught early calibration issues. Youstina Adel tuned the epsilon parameter and did the final integration of EDAT into the main training loop. Separately, Farida Hassan implemented and tested both SCL-CVD and R-Drop, and later ran the full GraphCodeBERT training run that produced our final F1 = 0.7012 result.&lt;/P&gt;
&lt;P&gt;The composite loss that resulted from all of this — Focal Loss, SCL-CVD, R-Drop, and EDAT, combined and weighted together — was the single largest factor separating our early-epoch results (F1 ≈ 0.54) from the final 0.7012. No individual component alone got us there; it was the combination, tuned iteratively, that did.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Results&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 62.962963%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Method&lt;/th&gt;&lt;th&gt;Language&lt;/th&gt;&lt;th&gt;F1&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Bandit&lt;/td&gt;&lt;td&gt;Regex&lt;/td&gt;&lt;td&gt;Python&lt;/td&gt;&lt;td&gt;~0.62&lt;/td&gt;&lt;td&gt;Industry standard&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DetectVul&lt;/td&gt;&lt;td&gt;Dual-BERT, full fine-tune&lt;/td&gt;&lt;td&gt;Python&lt;/td&gt;&lt;td&gt;0.7447&lt;/td&gt;&lt;td&gt;Prior SOTA&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CSI-GCB&lt;/td&gt;&lt;td&gt;GraphCodeBERT + LoRA&lt;/td&gt;&lt;td&gt;Python&lt;/td&gt;&lt;td&gt;0.7012&lt;/td&gt;&lt;td&gt;0.24% of params trained&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CSI-Dual&lt;/td&gt;&lt;td&gt;GCB + VulBERTa fusion&lt;/td&gt;&lt;td&gt;Python&lt;/td&gt;&lt;td&gt;0.6630&lt;/td&gt;&lt;td&gt;Faster early convergence&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;CSI-GCB outperforms Bandit by roughly 8 percentage points of F1 (0.7012 vs. ~0.62), a relative improvement of about 13%. The single-encoder LoRA model outperformed the dual-encoder fusion model by 3.82 points. CSI-Dual showed faster early convergence (F1 = 0.6099 at epoch 1 vs. ~0.541 for CSI-GCB) but plateaued earlier because frozen VulBERTa could not adapt to our seven-class CWE taxonomy. At ~4,000 training samples, the fusion layer's complexity outweighs its gains. This is a data scale problem, not an architecture problem.&lt;/P&gt;
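&lt;P&gt;The gaps quoted here can be recomputed directly from the F1 column of the table. The short sketch below separates the absolute gap in percentage points from the relative gain. The helper names are hypothetical, and Bandit's score is the approximate value from the table.&lt;/P&gt;

```python
# F1 scores copied from the results table (Bandit's value is approximate).
f1 = {"Bandit": 0.62, "DetectVul": 0.7447, "CSI-GCB": 0.7012, "CSI-Dual": 0.6630}


def gap_in_points(better, worse):
    """Absolute F1 difference, in percentage points."""
    return (f1[better] - f1[worse]) * 100


def relative_gain(better, worse):
    """F1 difference as a fraction of the weaker model's F1."""
    return (f1[better] - f1[worse]) / f1[worse]


vs_bandit = gap_in_points("CSI-GCB", "Bandit")   # about 8.1 points
vs_dual = gap_in_points("CSI-GCB", "CSI-Dual")   # about 3.82 points
relative = relative_gain("CSI-GCB", "Bandit")    # about 0.13, that is 13%
```

&lt;P&gt;Because F1 combines precision and recall, neither number translates directly into a count of additional vulnerabilities found.&lt;/P&gt;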
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why This Matters&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;For security teams: a tool that traces taint flows rather than flagging token patterns, catching injections that Bandit misses entirely.&lt;/P&gt;
&lt;P&gt;For ML practitioners: validation that LoRA is viable for production security tasks. 0.24% of parameters updated, F1 = 0.7012. Meaningful security tooling without large GPU infrastructure.&lt;/P&gt;
&lt;P&gt;For researchers: a reproducible Python-specific multi-task vulnerability detection baseline demonstrating parameter-efficient single-encoder fine-tuning can approach full fine-tuning performance at significantly lower compute cost.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 50.37037%; height: 177px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Result&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Training Time&lt;/td&gt;&lt;td&gt;~30 epochs on A100&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Trainable Parameters&lt;/td&gt;&lt;td&gt;2.07M (0.24%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;F1 Score&lt;/td&gt;&lt;td&gt;0.7012&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Building the Dataset: Three Sources, One Pipeline&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We unified approximately 4,000 deduplicated Python functions from three real-world sources:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;AI-Generated Code&lt;/STRONG&gt; (121 records): GPT-assisted Python functions labeled with CWE identifiers, spanning 68 unique CWE types. Primary training signal for the CWE classification head.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;GitHub Security Commits&lt;/STRONG&gt; (2,173 records): Real commit-level vulnerability pairs where pre-patch = vulnerable and post-patch = safe, verified by GPT-4 at ~94% label accuracy — vs. 40–51% for automated commit-only labeling.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Raw Diff Files&lt;/STRONG&gt; (~300 records): GitHub diffs across seven vulnerability types: XSS, SQL injection, command injection, open redirect, path disclosure, RCE, and XSRF. Used as augmentation for underrepresented CWE categories.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;One Critical Insight: Commit-Stratified Evaluation&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Random data splits leak information. Functions in the same Git commit share context. If the model sees part of a commit during training and the rest during testing, it learns commit-specific signatures — metrics inflate by up to 40 percentage points. Solution: entire commits are assigned to either train or test, never split across partitions.&lt;/P&gt;
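&lt;P&gt;A commit-level split can be sketched by grouping samples by commit and assigning whole groups to a partition. This is a minimal illustration with hypothetical field names (commit, fn), a hypothetical 20% test fraction, and made-up records. A library utility such as scikit-learn's GroupShuffleSplit offers similar behavior.&lt;/P&gt;

```python
import random
from collections import defaultdict


def commit_level_split(samples, test_fraction=0.2, seed=0):
    """Assign every function from a commit to the same partition."""
    by_commit = defaultdict(list)
    for sample in samples:
        by_commit[sample["commit"]].append(sample)

    commits = sorted(by_commit)              # sort first for reproducibility
    random.Random(seed).shuffle(commits)
    n_test = max(1, round(len(commits) * test_fraction))
    test_commits = set(commits[:n_test])

    train = [s for c in commits if c not in test_commits for s in by_commit[c]]
    test = [s for c in commits if c in test_commits for s in by_commit[c]]
    return train, test


# Made-up records: five commits, two functions each.
samples = [
    {"commit": commit, "fn": f"{commit}_{i}"}
    for commit in ["a1", "b2", "c3", "d4", "e5"]
    for i in range(2)
]
train, test = commit_level_split(samples)

train_commits = {s["commit"] for s in train}
test_commits = {s["commit"] for s in test}
```

&lt;P&gt;The split fraction applies to commits rather than to functions, so partition sizes can drift from the target when commits vary widely in size.&lt;/P&gt;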
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The Architecture: Dual-Encoder Fusion&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;CSI runs two encoders in parallel on every input:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;GraphCodeBERT + LoRA&lt;/STRONG&gt;: captures AST structure, data flow between variables, and token relationships. Outputs a 768-dimensional embedding.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;VulBERTa&lt;/STRONG&gt; (frozen): a RoBERTa model pre-trained exclusively on NVD entries and CVE-linked code. Captures dangerous code idioms, unsanitized input patterns, and similarity to known vulnerable code. Outputs a 768-dimensional embedding.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Both embeddings are concatenated into a 1,536-dimensional vector, passed through a FusionProjectionLayer (Linear → LayerNorm → GELU → Dropout), then routed to two task heads: a binary head (Linear 1536→1) predicting vulnerable vs. safe, and a CWE head (Linear 1536→384 → GELU → Dropout → Linear 384→8) classifying across seven CWE categories plus unknown.&lt;/P&gt;
&lt;P&gt;Training used a composite loss: Focal Loss for class imbalance, SCL-CVD for intra-class compactness, R-Drop for output consistency, and EDAT adversarial perturbation on embeddings with AST constraint checking to preserve program semantics.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Results: Outperforming Baselines&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 94.259259%; height: 245px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Language&lt;/th&gt;&lt;th&gt;Method&lt;/th&gt;&lt;th&gt;F1&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Bandit&lt;/td&gt;&lt;td&gt;Python&lt;/td&gt;&lt;td&gt;Regex&lt;/td&gt;&lt;td&gt;~0.62&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DetectVul&lt;/td&gt;&lt;td&gt;Python&lt;/td&gt;&lt;td&gt;Dual-BERT&lt;/td&gt;&lt;td&gt;0.7447&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CSI-GCB&lt;/td&gt;&lt;td&gt;Python&lt;/td&gt;&lt;td&gt;GraphCodeBERT+LoRA&lt;/td&gt;&lt;td&gt;0.7012&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CSI-Dual&lt;/td&gt;&lt;td&gt;Python&lt;/td&gt;&lt;td&gt;GCB+VulBERTa&lt;/td&gt;&lt;td&gt;0.6630&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;CSI-GCB outperforms Bandit by 8 percentage points on Python. The single-encoder LoRA approach outperforms the dual-encoder fusion under current data scale — dual-encoder architectures require larger corpora to realize their complementary representation advantage.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why This Matters&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Security teams get a tool that understands code semantics, not surface patterns. ML practitioners see validation of parameter-efficient fine-tuning (LoRA) on a real security task. Microsoft's GraphCodeBERT + PEFT ecosystem proves viable for production Python security tooling.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Meet the Team&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;CSI was built by a 9-person team at the Egyptian Chinese University.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 1053px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Member&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Tasks Owned&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Specific Contributions&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;LinkedIn&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Anas Abuelhaag&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Project lead, architecture, training infrastructure&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;GitHub repo setup, Drive infrastructure, CWE classification head design, checkpoint save/load, validation loop (F1/precision/recall), SCL integration into training loop, overall system architecture&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/anaselhaag/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Sohaila Tamer&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Graph extraction, SCL tuning, VulBERTa go/no-go&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;AST/CFG/DFG graph extraction, SCL vs no-SCL F1 comparison, temperature tau tuning, SCL weight alpha tuning, results logging, VulBERTa go/no-go decision&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/sohaila-tamer/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Farida Hassan&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Tokenization, loss functions, full training run&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;GraphCodeBERT tokenization cache, checkpoint co-development, validation loop co-development, supervised contrastive loss implementation and testing, R-Drop KL divergence loss, full GraphCodeBERT training run&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/farida-hassan-ali/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Hend Elhout&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;EDAT identifier extraction, Streamlit UI&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Tree-sitter identifier extraction for EDAT, Streamlit line highlighting display, 7 fix suggestion texts, example code snippets, loading spinner, edge case handling, VulBERTa fusion layer implementation&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/hend-elhout-32253b313/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Jomana Mekheimar&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;AST constraints, hyperparameter search, VulBERTa tokenization&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;EDAT AST constraint checking, full EDAT training run, EDAT vs no-EDAT F1 comparison, LoRA rank ablation (4 vs 8 vs 16), focal loss gamma ablation (1 vs 2 vs 3), VulBERTa BPE tokenization&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/jomana-mekheimar/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Menna Reda&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;PGD adversarial training, VulBERTa training&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;EDAT PGD perturbation loop, adversarial KL loss, small-batch EDAT testing, VulBERTa dual-encoder training run&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/menna-reda-146834319/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Youstina Adel&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Epsilon tuning, VulBERTa LoRA&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;EDAT epsilon tuning, EDAT integration into training loop, VulBERTa LoRA adapter implementation&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/youstina-adel-7b3bbb251/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;MennatAllah Amr&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Hyperparameter optimization, VulBERTa forward pass&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Batch size ablation (4 vs 8), learning rate ablation (1e-5, 2e-5, 5e-5), hyperparameter results compilation and best config selection, VulBERTa dual-input forward pass update&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/mennatallah-amr-5643322b7/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Hesham Elshimy&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Streamlit app, deployment, demo&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Streamlit UI layout design, hardcoded data testing, app.py with code input, 4 metric cards, CWE description section, 7 CWE descriptions, model download integration, model connection, safe/vulnerable code testing, demo video production&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.linkedin.com/in/hesham-elshimy-03b989266/" target="_blank" rel="noopener"&gt;LinkedIn Profile&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 182px" /&gt;&lt;col style="width: 233px" /&gt;&lt;col style="width: 374px" /&gt;&lt;col style="width: 264px" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;REFERENCES&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;[1] PyVul Team, "PyVul: A Real-World Python Vulnerability Benchmark with LLM-Assisted Data Cleansing," arXiv:2404.15687, 2024.&lt;/P&gt;
&lt;P&gt;[2] Y. Chen et al., "DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning-Based Vulnerability Detection," in Proc. ACM RAID, 2023, pp. 1-16.&lt;/P&gt;
&lt;P&gt;[3] H. Husain et al., "CodeSearchNet Challenge: Evaluating the State of Semantic Code Search," arXiv:1909.09436, 2019.&lt;/P&gt;
&lt;P&gt;[4] Y. Feng et al., "CodeBERT: A Pre-Trained Model for Programming and Natural Languages," in Proc. EMNLP Findings, 2020.&lt;/P&gt;
&lt;P&gt;[5] M. T. Tran et al., "DetectVul: Statement-Level Python Vulnerability Detection Using Dual-BERT," Future Generation Computer Systems, 2025.&lt;/P&gt;
&lt;P&gt;[6] R. Mussabayev, "Structure-Aware Code Vulnerability Analysis with Graph Neural Networks," arXiv:2307.11454, 2023.&lt;/P&gt;
&lt;P&gt;[7] Anonymous, "From Generalist to Specialist: Exploring CWE-Specific Vulnerability Detection," arXiv:2408.02329, 2024.&lt;/P&gt;
&lt;P&gt;[8] H. Hanif and S. Maffeis, "VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection," in Proc. IJCNN, 2022.&lt;/P&gt;
&lt;P&gt;[9] J. Liu et al., "Vul-LMGNNs: Vulnerability Detection by Fusing Language Models and Online-Distilled Graph Neural Networks," arXiv:2404.14719, 2025.&lt;/P&gt;
&lt;P&gt;[10] X. Wen et al., "AMPLE: Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning," arXiv:2302.04675, 2023.&lt;/P&gt;
&lt;P&gt;[11] L. Peng et al., "ANGEL: Accurate Vulnerability Detection for Large Code Graphs," arXiv:2412.10164, 2024.&lt;/P&gt;
&lt;P&gt;[12] J. de Kraker, H. Vranken, and A. Hommersom, "MultiGLICE: GNNs with Program Slicing for Multiclass Vulnerability Detection," Computers, vol. 14, no. 3, p. 98, 2025.&lt;/P&gt;
&lt;P&gt;[13] Anonymous, "Vignat: Vulnerability Identification by Learning Code Semantics via Graph Attention Networks," arXiv:2310.20067, 2023.&lt;/P&gt;
&lt;P&gt;[14] Anonymous, "Enhancing Vulnerability Detection Using Code Property Graphs and CNNs," in Proc. ACM CCS Workshop, 2023.&lt;/P&gt;
&lt;P&gt;[15] Y. Hu et al., "Interpreters for GNN-Based Vulnerability Detection: Are We There Yet?," in Proc. ACM ISSTA, 2023.&lt;/P&gt;
&lt;P&gt;[16] K. Wartschinski et al., "VUDENC: Vulnerability Detection with Deep Learning for a Large Codebase in Python," Computers, vol. 14, 3, 2025.&lt;/P&gt;
&lt;P&gt;[17] C. Liang et al., "Source Code Vulnerability Analysis Based on Deep Learning: A Survey," Computers &amp;amp; Security, vol. 148, 2025.&lt;/P&gt;
&lt;P&gt;[18] Y. Hu et al., "SoK: Automated Vulnerability Repair, Methods, Tools, and Assessments," arXiv:2506.11697, 2025.&lt;/P&gt;
&lt;P&gt;[19] G. Bhandari, P. Gavric, and A. Shalaginov, "PatchLM: Generating Vulnerability Security Fixes with Code Language Models," Information and Software Technology, vol. 185, 2025.&lt;/P&gt;
&lt;P&gt;[20] D. Guo et al., "GraphCodeBERT: Pre-Training Code Representations with Data Flow," in &lt;EM&gt;Proc. International Conference on Learning Representations (ICLR)&lt;/EM&gt;, 2021.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jun 2026 07:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/detecting-python-vulnerabilities-with-graphcodebert/ba-p/4517909</guid>
      <dc:creator>a_elhaag</dc:creator>
      <dc:date>2026-06-16T07:00:00Z</dc:date>
    </item>
    <item>
      <title>Building ShadowQuest: A Multi-Agent RPG</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/building-shadowquest-a-multi-agent-rpg/ba-p/4528055</link>
      <description>&lt;P class="lia-align-justify"&gt;Artificial Intelligence is rapidly evolving beyond traditional chatbots. Today, developers are building intelligent systems where multiple AI agents collaborate, retrieve knowledge, and solve problems together. Microsoft's Agents League Hackathon provided the perfect opportunity to explore this new approach through the Reasoning Agents challenge.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;For this challenge, I built ShadowQuest, a fantasy role-playing game (RPG) powered by Microsoft Foundry, Foundry IQ, Azure AI Search, GPT-4.1, and GitHub Copilot. The project demonstrates how specialized AI agents can work together while using Retrieval-Augmented Generation (RAG) to deliver accurate and context-aware responses.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;About the Challenge&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;&lt;A class="lia-external-url" href="https://info.microsoft.com/Agents-League-Hackathon-Registration.html" target="_blank"&gt;Microsoft Agents League &lt;/A&gt;is a global developer challenge designed to encourage developers to build intelligent AI applications using Microsoft's latest AI technologies. Participants could choose from three tracks: Creative Apps, Reasoning Agents, and Enterprise Agents.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;I selected the Reasoning Agents track because I wanted to explore how multiple AI agents could collaborate instead of relying on a single large language model. Another important requirement for this year's challenge was integrating at least one Microsoft Intelligence Layer. For ShadowQuest, I chose Foundry IQ as the project's intelligence layer.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;The Idea Behind ShadowQuest&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;Fantasy RPGs are built around storytelling, exploration, and collaboration between different characters. Every character usually has a unique role, whether it's a warrior protecting the team, a mage interpreting magical knowledge, or a rogue discovering hidden paths.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;I wanted to recreate this experience using AI.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Instead of building one AI assistant responsible for everything, I designed a system where multiple specialized agents collaborate to create a richer and more immersive adventure.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;ShadowQuest is set in a fantasy world filled with magical artifacts, forgotten kingdoms, mysterious locations, and story-driven quests. Players can ask questions about the world, explore different locations, and learn about the game's lore through conversations with AI agents.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Building the Multi-Agent Architecture&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;The architecture follows a simple but scalable design.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;At the center of the system is the &lt;STRONG&gt;Game Master Agent&lt;/STRONG&gt;, which acts as the orchestrator. Every player interaction starts with the Game Master. It receives the player's request, determines what information is needed, retrieves additional knowledge when required, and generates the final response.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Supporting the Game Master are three specialized agents:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;STRONG&gt;Warrior Agent&lt;/STRONG&gt; – Focuses on combat strategy and tactical decisions.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Mage Agent&lt;/STRONG&gt; – Provides magical knowledge, world lore, and information about ancient artifacts.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Rogue Agent&lt;/STRONG&gt; – Specializes in exploration, investigation, and discovering hidden information.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;Each agent has a clearly defined responsibility, making the system easier to understand, maintain, and extend in the future.&lt;/P&gt;
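&lt;P class="lia-align-justify"&gt;The orchestrator pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the ShadowQuest code: the Agent class, the keyword lists, and the game_master function are hypothetical, and the keyword match stands in for whatever model-driven routing the real Game Master performs.&lt;/P&gt;

```python
import re
from dataclasses import dataclass


@dataclass
class Agent:
    """A specialist with a name and the topics it is responsible for."""
    name: str
    keywords: tuple

    def answer(self, question: str) -> str:
        # A real agent would call a model here; this stub only labels the reply.
        return f"[{self.name}] handling: {question}"


# Hypothetical specialists mirroring the three roles described above.
SPECIALISTS = [
    Agent("Warrior", ("fight", "combat", "attack", "defend")),
    Agent("Mage", ("spell", "artifact", "lore", "magic")),
    Agent("Rogue", ("explore", "hidden", "trap", "investigate")),
]


def game_master(question: str) -> str:
    """Route a player question to the first specialist whose keywords match."""
    # Match whole words so that "explore" does not accidentally match "lore".
    words = set(re.findall(r"[a-z]+", question.lower()))
    for agent in SPECIALISTS:
        if words.intersection(agent.keywords):
            return agent.answer(question)
    # No specialist matched, so the Game Master answers directly.
    return f"[Game Master] handling: {question}"
```

&lt;P class="lia-align-justify"&gt;Because each specialist is just an entry in a list, adding another role in this sketch means adding one more Agent entry and leaving the routing function untouched.&lt;/P&gt;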
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Using Foundry IQ as the Knowledge Layer&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;One of the most important parts of the project was integrating &lt;STRONG&gt;Foundry IQ&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Instead of storing every piece of game information inside prompts, I created a dedicated knowledge base containing information about characters, magical artifacts, locations, quests, and the history of the ShadowQuest world.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;This approach separates &lt;STRONG&gt;knowledge&lt;/STRONG&gt; from &lt;STRONG&gt;reasoning&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Whenever a player asks a question, the Game Master Agent first retrieves relevant information from the knowledge base before generating a response. This ensures that answers remain consistent with the game's world while reducing hallucinations.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Foundry IQ became the central source of truth for the entire project, making it easy to manage and expand the game world without constantly modifying prompts.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Azure AI Search and Retrieval-Augmented Generation&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;To enable intelligent retrieval, I connected Foundry IQ with &lt;STRONG&gt;Azure AI Search&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The RPG documents were indexed, and vector embeddings were generated using Microsoft's embedding models. This enables semantic search, allowing the system to understand the meaning behind a player's question instead of relying only on keyword matching.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;For example, if a player asks about a magical relic without mentioning its exact name, Azure AI Search can still retrieve the correct information based on semantic similarity.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The complete workflow looks like this:&lt;/P&gt;
&lt;OL class="lia-align-justify"&gt;
&lt;LI&gt;The player submits a question.&lt;/LI&gt;
&lt;LI&gt;The Game Master Agent receives the request.&lt;/LI&gt;
&lt;LI&gt;Foundry IQ queries Azure AI Search.&lt;/LI&gt;
&lt;LI&gt;Relevant documents are retrieved.&lt;/LI&gt;
&lt;LI&gt;GPT-4.1 generates a grounded response using the retrieved context.&lt;/LI&gt;
&lt;/OL&gt;
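&lt;P class="lia-align-justify"&gt;This Retrieval-Augmented Generation (RAG) approach grounds each response in retrieved documents, which improves the quality and reliability of responses compared with relying on the model's internal knowledge alone.&lt;/P&gt;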
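&lt;P class="lia-align-justify"&gt;The retrieve-then-generate workflow above can be illustrated with a toy example. This sketch makes several assumptions: the lore snippets are invented for illustration, a bag-of-words vector with cosine similarity stands in for learned embeddings (so it matches shared words rather than true meaning), and build_prompt only assembles the grounded prompt that a model such as GPT-4.1 would receive. It does not call Foundry IQ or Azure AI Search.&lt;/P&gt;

```python
import math
import re
from collections import Counter

# Hypothetical lore snippets standing in for the indexed RPG documents.
DOCUMENTS = {
    "moonblade": "The Moonblade is a silver sword forged by the elves of Aldoria.",
    "ember_crown": "The Ember Crown is a relic that lets its wearer command fire.",
    "sunken_keep": "The Sunken Keep is a flooded fortress north of the marshes.",
}


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, top_k: int = 1) -> list:
    """Rank documents by similarity to the question and keep the best matches."""
    query = embed(question)
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc_id: cosine(query, embed(DOCUMENTS[doc_id])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str) -> str:
    """Assemble the grounded prompt that the generation step would receive."""
    context = "\n".join(DOCUMENTS[doc_id] for doc_id in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

&lt;P class="lia-align-justify"&gt;Asking "Which relic lets someone command fire?" retrieves the Ember Crown entry even though the question never names it, which is the behaviour the workflow relies on. A production system would swap the toy embed function for a real embedding model and the dictionary for a search index.&lt;/P&gt;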
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Accelerating Development with GitHub Copilot&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;GitHub Copilot played an important role throughout the development process.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;It helped generate Python classes, improve documentation, create helper functions, and speed up repetitive coding tasks.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;During the live demonstration, I also showed how Copilot could quickly generate a new &lt;STRONG&gt;Healer Agent&lt;/STRONG&gt;, demonstrating how AI-assisted development makes it easier to extend a multi-agent application while maintaining a consistent architecture.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Rather than replacing the developer, Copilot acted as an intelligent coding assistant, allowing me to focus more on architecture and design decisions.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Demonstrating ShadowQuest&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;During the Microsoft Agents League Reasoning Agents Battle, I demonstrated the Game Master Agent by asking questions about the ShadowQuest world, magical artifacts, and game lore.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;One of the most interesting parts of the demonstration was observing the retrieval process.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Before generating a response, the Game Master Agent called the knowledge retrieval function through Foundry IQ. This confirmed that the system was retrieving relevant information from the indexed knowledge base rather than relying only on GPT-4.1's internal knowledge.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;This demonstrated how RAG can create more grounded, reliable, and context-aware AI experiences.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Lessons Learned&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;Building ShadowQuest taught me that designing multi-agent systems is as much about architecture as it is about AI models.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Clearly defining responsibilities for each agent made the application easier to maintain and opened the door for future expansion.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;I also learned how valuable Retrieval-Augmented Generation can be for applications that depend on structured knowledge. Separating reasoning from knowledge allows AI systems to remain accurate while making it easier to update information over time.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Finally, participating in the Microsoft Agents League was an incredible opportunity to experiment with Microsoft's latest AI technologies, learn from other developers, and share ideas with a global community passionate about agentic AI.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Looking Ahead&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;ShadowQuest is only the beginning.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;In future iterations, I plan to expand the project by introducing additional agents such as a Merchant Agent and Healer Agent, implementing persistent player memory, adding dynamic quest generation, improving combat mechanics, and enabling deeper collaboration between agents.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;These improvements will make the game world more immersive while continuing to explore the possibilities of agent-based AI systems.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;ShadowQuest demonstrates how Microsoft Foundry, Foundry IQ, Azure AI Search, GPT-4.1, and GitHub Copilot can be combined to build intelligent multi-agent applications.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;More importantly, the project reinforced a key idea: the future of AI is not a single assistant performing every task, but a team of specialized agents collaborating with shared knowledge to solve increasingly complex problems.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Participating in the Microsoft Agents League was an inspiring experience that allowed me to explore the next generation of AI development while building a project that combines storytelling, reasoning, and knowledge retrieval. I look forward to continuing this journey and discovering new ways to build intelligent applications using Microsoft's growing AI ecosystem.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jun 2026 08:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/building-shadowquest-a-multi-agent-rpg/ba-p/4528055</guid>
      <dc:creator>ShardaKaur</dc:creator>
      <dc:date>2026-06-15T08:00:00Z</dc:date>
    </item>
    <item>
      <title>Gamifying World Improvement: A Reasoning-Agent RPG on Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/gamifying-world-improvement-a-reasoning-agent-rpg-on-microsoft/ba-p/4527648</link>
      <description>&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Building a multi-agent demo is the straightforward part. Building one where you can prove, live, with judges watching, that the agents reasoned, retrieved grounded context, called tools, and deferred to a human before awarding anything: that is where most teams run into friction.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We found out on stage. This project was demoed live at the &lt;A href="https://www.youtube.com/watch?v=Xj5LqH6k0U4" target="_blank" rel="noopener"&gt;Agents League Reasoning Agents battle on Microsoft Reactor&lt;/A&gt;: real-time narration, real Foundry calls, real latency, and a host watching the terminal.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For Microsoft Agents League Battle #2 (Reasoning Agents) the brief was the classic Game Master pattern: an orchestrator decomposes a goal, specialist agents execute, shared state tracks the world. We reskinned it with the biggest stakes we could defend. The game opens in a world that terraforms the Sahara and automates basic needs: food, water, energy, shelter. A vision that size is never commanded into existence; it has to be aligned. The player enters the story by founding a company on one front of the mission and taking the CEO's chair.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From there the loop is concrete. The player pitches the idea (or pastes a real company URL); a Master Narrator turns it into a quest line; a digital workforce, designed per company rather than hardcoded, produces real launch artefacts: positioning, landing-page structure, launch copy. Nothing earns XP until the human CEO approves it at a verification gate. That is the game's one law, and its title mechanic: your company is the dungeon.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;A mission too big to command needs a workforce you can verify: reason on Foundry, ground with Foundry IQ, validate with deterministic tools, ask the human before anything counts.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;Why a reasoning-agent demo is different&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A normal chatbot demo gates on "did it answer." A reasoning-agent demo has to gate on something harder: can you show the work, and can you stop the work? That reframes the whole build. The deployment unit is not "a model." It is an orchestrated run with a visible decomposition, logged tool calls, cited retrieval, and a human decision point. Borrowing Lee Stott's framing from &lt;A href="https://techcommunity.microsoft.com/blog/educatordeveloperblog/cicd-for-ai-agents-on-microsoft-foundry/4522218" target="_blank" rel="noopener"&gt;CI/CD for AI Agents on Microsoft Foundry&lt;/A&gt;: release gates should be driven by evaluation outcomes, not just test results. We applied that idea one level down, to each artefact, at runtime.&lt;/P&gt;
&lt;H2&gt;The architecture in one diagram&lt;/H2&gt;
&lt;img /&gt;
&lt;P&gt;Two properties to read off the diagram. First, the loop is closed: the CEO's gate decision is written to agent memory and recalled into the next chapter's brief. Second, every cloud arrow has a keyless fallback. Clone the repo with no credentials and the whole game still plays in simulation mode.&lt;/P&gt;
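&lt;P&gt;The fallback property can be sketched as an ordered list of providers, tried until one succeeds. This is a minimal illustration under stated assumptions, not the repo's implementation: first_available, cloud_retrieval, and local_markdown_retrieval are hypothetical names, and the cloud function simply raises to simulate a clone with no credentials.&lt;/P&gt;

```python
from typing import Callable, Sequence


def first_available(providers: Sequence[Callable[[str], str]], query: str) -> str:
    """Try each provider in order and return the first result that succeeds."""
    errors = []
    for provider in providers:
        try:
            return provider(query)
        except Exception as exc:  # a demo fallback deliberately catches broadly
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def cloud_retrieval(query: str) -> str:
    # Hypothetical stand-in for a cloud call; raises as if no credentials exist.
    raise ConnectionError("no credentials configured")


def local_markdown_retrieval(query: str) -> str:
    # Hypothetical local fallback that always works offline.
    return f"local notes matching '{query}'"
```

&lt;P&gt;Calling first_available([cloud_retrieval, local_markdown_retrieval], "pricing") returns the local result here, because the cloud stand-in fails first. Collecting the error messages means a total failure still reports why each path was skipped.&lt;/P&gt;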
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;The pipeline: four agents before any work happens&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Most multi-agent demos hardcode their cast. We do not. The business defines the workforce; the workforce defines the quests. Four agents run before any artefact work starts:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Company Analyst.&lt;/STRONG&gt; Scrapes a real URL, reasons about the business, seeds the brief.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Org Designer.&lt;/STRONG&gt; Designs a digital-workforce blueprint for this specific company.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;World Designer.&lt;/STRONG&gt; Decomposes the pitch into a chapter quest line.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Worker Factory.&lt;/STRONG&gt; Binds each chapter to a worker and builds it as an Agent Framework agent with tools.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Pitch a bakery and you get a different org chart, and different quests, than a dev-tools startup. That is the difference between a scripted demo and a system.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Workers are real Agent Framework agents on Foundry&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Each designed worker is built at runtime with the Microsoft Agent Framework, with inference through the Foundry project Responses endpoint under Microsoft Entra ID (formerly Azure AD, hence AAD) authentication. The setup is keyless, with no secrets in .env:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="lia-code-sample language-python" tabindex="0" contenteditable="false" data-lia-code-value="# agents/maf_runtime.py
from agent_framework.foundry import FoundryChatClient

client = FoundryChatClient(
    project_endpoint=foundry_project_endpoint(),
    model=deployment,          # gpt-5 family deployment
    credential=_aad_credential(),   # DefaultAzureCredential - no API key
)"&gt;&lt;CODE&gt;# agents/maf_runtime.py
from agent_framework.foundry import FoundryChatClient

client = FoundryChatClient(
    project_endpoint=foundry_project_endpoint(),
    model=deployment,          # gpt-5 family deployment
    credential=_aad_credential(),   # DefaultAzureCredential - no API key
)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Our deterministic validators are exposed to the model as real FunctionTools, capped so a stuck model cannot loop, and every mid-run call writes a receipt (args, result, latency) into the replay log:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="lia-code-sample language-python" tabindex="0" contenteditable="false" data-lia-code-value="@tool(name=tool_name,
      description=f&amp;quot;Run the deterministic '{tool_name}' check on a draft artifact...&amp;quot;,
      max_invocations=2)
def _t(artifact_json: str) -&amp;gt; str:
    meta[&amp;quot;maf_tools_called&amp;quot;].append(tool_name)
    receipt = {&amp;quot;tool&amp;quot;: tool_name, &amp;quot;source&amp;quot;: &amp;quot;maf-midrun&amp;quot;, &amp;quot;args&amp;quot;: {}, &amp;quot;result&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;ms&amp;quot;: 0.0}
    ..."&gt;&lt;CODE&gt;@tool(name=tool_name,
      description=f"Run the deterministic '{tool_name}' check on a draft artifact...",
      max_invocations=2)
def _t(artifact_json: str) -&amp;gt; str:
    meta["maf_tools_called"].append(tool_name)
    receipt = {"tool": tool_name, "source": "maf-midrun", "args": {}, "result": "", "ms": 0.0}
    ...&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So the model can check its own draft mid-run, but the gate score never comes from the model alone.&lt;/P&gt;
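&lt;P&gt;To make the cap-and-receipt idea concrete without depending on the Agent Framework, here is a plain-Python sketch. It is an assumption-laden illustration rather than the framework's behaviour: the capped decorator, the has_headline check, and the refusal message are all hypothetical, and the real max_invocations option may respond differently when the cap is hit.&lt;/P&gt;

```python
import functools
import time


def capped(max_invocations: int, receipts: list):
    """Decorator: refuse calls beyond the cap and record a receipt for each call."""
    def wrap(func):
        calls = 0

        @functools.wraps(func)
        def inner(*args, **kwargs):
            nonlocal calls
            if calls >= max_invocations:
                # Past the cap: do not run the tool, tell the caller to stop.
                return "cap reached: stop calling this tool"
            calls += 1
            start = time.perf_counter()
            result = func(*args, **kwargs)
            receipts.append({
                "tool": func.__name__,
                "args": args,
                "result": result,
                "ms": (time.perf_counter() - start) * 1000,
            })
            return result
        return inner
    return wrap


receipts = []


@capped(max_invocations=2, receipts=receipts)
def has_headline(artifact: str) -> str:
    # Hypothetical deterministic check on a draft artefact.
    return "pass" if "# " in artifact else "fail"
```

&lt;P&gt;The third call to has_headline returns the refusal string and writes no receipt, so the log holds exactly one entry per executed check.&lt;/P&gt;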
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Best practice: deterministic gates first, model judgement second&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is the single most important reliability decision, and it is the same principle Lee Stott states in the &lt;A href="https://techcommunity.microsoft.com/blog/educatordeveloperblog/hybrid-ai-agents-in-python-routing-between-foundry-local-and-microsoft-foundry/4522979" target="_blank" rel="noopener"&gt;hybrid agents post&lt;/A&gt;: code the rules, and let the LLM judge only what is left. Our gate score is a weighted rubric with a deterministic floor. Structural validators (does the landing page have a headline, a CTA, a hero section; is the email parseable; do URLs resolve) set the minimum, and the narrator's rubric judgement can only move the score above that floor, never below the facts.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Three scoring layers, and only one of them can award XP:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Mid-run tool calls.&lt;/STRONG&gt; Scored by deterministic validators. Advisory to the model only; cannot award XP.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;rubric_evaluate.&lt;/STRONG&gt; A Foundry model judges weighted dimensions, floored by the validators. Cannot award XP.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The CEO gate.&lt;/STRONG&gt; The human. The only path to XP.&lt;/LI&gt;
&lt;/OL&gt;
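&lt;P&gt;One way to read the floor rule and the three layers is the sketch below. It is an interpretation, not the project's scoring code: gate_score and award_xp are hypothetical names, scores are assumed to lie between 0 and 1, and the floor is assumed to be the fraction of structural checks that passed.&lt;/P&gt;

```python
def gate_score(validator_results: dict, rubric_scores: dict, weights: dict) -> float:
    """Combine a deterministic floor with a weighted model rubric (both in 0..1)."""
    # Floor: fraction of structural checks that passed.
    floor = sum(validator_results.values()) / len(validator_results)
    # Weighted average of the model's rubric dimensions.
    total_weight = sum(weights.values())
    rubric = sum(rubric_scores[name] * weights[name] for name in weights) / total_weight
    # The model can raise the score above the floor but never push it below.
    return max(floor, rubric)


def award_xp(score: float, ceo_approved: bool, xp_per_point: int = 100) -> int:
    """Only an explicit human approval converts a score into XP."""
    return round(score * xp_per_point) if ceo_approved else 0
```

&lt;P&gt;In this reading, a high score on its own is worth nothing: award_xp returns 0 until the human approval flag is set, which mirrors the rule that only the CEO gate can award XP.&lt;/P&gt;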
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Four proof points are logged on every single invocation, including simulation mode: iq_hits, memory_injected, tools_called, inference_usage. Every claim in the demo is a logged number in the replay log. It is the same discipline as carrying a correlation ID through every path.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Operational lessons learned&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Stream the reasoning, not just the answer.&lt;/STRONG&gt; The UI is a reasoning theater fed by SSE. Every decomposition beat, tool receipt, and retrieval hit arrives as an event. If a user cannot see the handoff, the system feels like one big chatbot.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Give every cloud arrow a dashed twin.&lt;/STRONG&gt; Foundry IQ falls back to local markdown retrieval; the Agent Service memory store falls back to a local JSON file; the model path falls back to scripted simulation. A live demo that depends on perfect connectivity is a demo waiting to fail.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Cap tool invocations.&lt;/STRONG&gt; max_invocations=2 on every FunctionTool. Without it, a model in a tight spot calls the validator in a loop.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Reasoning models and strict JSON do not mix.&lt;/STRONG&gt; Anything that must emit JSON gets a tolerant extractor (_extract_json) that survives think-blocks and markdown fences. This is the same think-block gotcha Lee Stott flags for local router models.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Scrub secrets at the sink.&lt;/STRONG&gt; Captured reasoning traces run through a secret scrubber before they touch the replay log.&lt;/LI&gt;
&lt;/UL&gt;
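&lt;P&gt;A tolerant extractor of the kind described in the list above can be sketched with the standard library. This is not the repo's _extract_json: extract_first_json_object is a hypothetical stand-in that assumes the payload is a JSON object and simply scans for the first brace from which a valid object can be decoded, ignoring whatever reasoning text or code fences surround it.&lt;/P&gt;

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first parseable JSON object embedded anywhere in the text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores any trailing text after it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # This brace did not start a valid object, so try the next one.
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```

&lt;P&gt;The trade-off in this sketch is that a valid object quoted inside the model's reasoning would be returned before the intended answer, so a stricter variant might prefer the last object instead of the first.&lt;/P&gt;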
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Responsible AI&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The verification gate is the responsible-AI story, and it is also the lore's law: in the game's fiction, a human holds the seal, because a vision too big to command must keep a human at the root of every result. No artefact becomes progress without explicit human approval; every approval is logged with the full reasoning chain; rejected work goes back for rework with the rejection written into agent memory as binding direction. Deterministic validators bound what the model can claim about its own output, and the replay log preserves the whole chain for audit. Auth is keyless via DefaultAzureCredential. Nothing to leak, rotate, or commit by accident.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Try it&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Five minutes, no Azure account needed for the full game loop:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="lia-code-sample language-bash" tabindex="0" contenteditable="false" data-lia-code-value="git clone https://github.com/princepspolycap/agentsleague-afterbuild
cd agentsleague-afterbuild
python3 -m venv .venv &amp;amp;&amp;amp; source .venv/bin/activate
pip install -r submission/requirements.txt

python3 submission/tools/run_quest_simulation.py --pitch &amp;quot;Your idea here&amp;quot;"&gt;&lt;CODE&gt;git clone https://github.com/princepspolycap/agentsleague-afterbuild
cd agentsleague-afterbuild
python3 -m venv .venv &amp;amp;&amp;amp; source .venv/bin/activate
pip install -r submission/requirements.txt

python3 submission/tools/run_quest_simulation.py --pitch "Your idea here"&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For live Foundry runs, copy submission/.env.example to submission/.env and point it at your Foundry project endpoint and a gpt-5 family deployment, then az login.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Where this goes next&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Asked on stream where the project goes after the battle, the answer was already on the roadmap: "I'll deploy what I have as a web app that people can just use... I want to do some local models, make it even more accessible. Something people can play with even after the fact." That is the rest of this series. Part 2 covers how the agents remember the CEO's decisions, and Part 3 covers the fallback architecture and routing toward Foundry Local. The dungeon stays open after the competition ends.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Useful links:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Watch the live battle: &lt;A href="https://www.youtube.com/watch?v=Xj5LqH6k0U4" target="_blank" rel="noopener"&gt;Agents League - Reasoning Agents on Microsoft Reactor&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Clone the repo: &lt;A href="https://github.com/princepspolycap/agentsleague-afterbuild" target="_blank" rel="noopener"&gt;github.com/princepspolycap/agentsleague-afterbuild&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;The official challenge (submissions close 14 June 2026): &lt;A href="https://aka.ms/agentsleague/aisf" target="_blank" rel="noopener"&gt;Agents League registration&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/ai-foundry/" target="_blank" rel="noopener"&gt;Microsoft Foundry docs&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Key takeaways&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Map your domain onto a proven orchestration pattern (Game Master) instead of inventing one.&lt;/LI&gt;
&lt;LI&gt;Design the workforce per input. The business defines the org, the org defines the quests.&lt;/LI&gt;
&lt;LI&gt;Deterministic gates first, model judgement second. A validator floor stops a model talking its way past a broken artefact.&lt;/LI&gt;
&lt;LI&gt;LLMs create, tools validate, humans approve, replay logs preserve.&lt;/LI&gt;
&lt;LI&gt;Ship a simulation fallback for every cloud dependency. Forkability is reliability.&lt;/LI&gt;
&lt;LI&gt;Log proof points on every invocation so every claim is an auditable number.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The dungeon is not a metaphor for difficulty. It is a metaphor for structure: a mission too big to command gets aligned one company at a time. Rooms you cannot skip, gates a human must pass, and a logged map of every step you took.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jun 2026 06:40:36 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/gamifying-world-improvement-a-reasoning-agent-rpg-on-microsoft/ba-p/4527648</guid>
      <dc:creator>Princeps</dc:creator>
      <dc:date>2026-06-15T06:40:36Z</dc:date>
    </item>
    <item>
      <title>Copilot Notebooks and Study guide now available to Copilot Chat users</title>
      <link>https://techcommunity.microsoft.com/t5/education-blog/copilot-notebooks-and-study-guide-now-available-to-copilot-chat/ba-p/4527320</link>
      <description>&lt;P&gt;Every student knows the feeling: the test is coming, the materials are everywhere, and the hardest part is not finding information but knowing where to start. There is a PDF from the teacher. Slides from last week. A Word document with notes. Each piece has value. But studying means turning all of it into something usable: what to review first, what connects together, what still feels fuzzy, and what to practice.&lt;/P&gt;
&lt;P&gt;That is why we are excited to share two updates for education.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;First, Copilot Notebooks is now rolling out to Copilot Chat users, available with Microsoft 365 Education licenses.&lt;/STRONG&gt; Copilot Notebooks are AI-powered workspaces for a subject or group project built on reference materials—bringing together all context behind a topic in one place for you or your study group and Copilot to collaborate on. This addresses one of the top asks we have heard from Microsoft 365 Education customers: bring the power of Copilot Notebooks to the education licenses schools already use.&lt;/P&gt;
&lt;P&gt;This education expansion builds on the broader Copilot Notebooks announcement that brings Notebooks to all education and enterprise Copilot Chat users, including new ways to work with your own materials in Notebooks with mind maps, Study Guides, and, coming soon, the ability to create Word documents, Excel spreadsheets, and PowerPoint presentations. You can read that announcement in the &lt;A class="lia-external-url" href="https://aka.ms/NotebooksJuneBlog" target="_blank" rel="noopener"&gt;broader commercial Copilot Notebooks blog&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Copilot Notebooks are available in the Microsoft 365 Copilot web and desktop versions for Education users. Expect them to be available in Education tenants in the next two weeks. They will be available in OneNote in the weeks to follow.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Get started:&lt;/STRONG&gt; Find&lt;SPAN data-teams="true"&gt; Copilot Notebooks in the Microsoft 365 Copilot app waffle.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Second, Copilot Notebook Study Guide is now generally available for both education and enterprise users.&lt;/STRONG&gt; Study Guide is an AI-powered feature in Notebooks that turns the learning materials you provide into a complete, interactive study companion. Organized, editable, grounded in your references. Ready when you are.&lt;/P&gt;
&lt;H3&gt;For Education IT admins - what you need to know&lt;/H3&gt;
&lt;P&gt;Study Guide lives inside Copilot Notebooks, which are available from the Microsoft 365 Copilot app waffle.&lt;/P&gt;
&lt;P&gt;With this update, Copilot Notebooks are available to Microsoft 365 Education A1, A3, and A5 users.&lt;/P&gt;
&lt;P&gt;Study Guide is available for education users ages 13+. Student accounts need the right &lt;A class="lia-external-url" href="https://aka.ms/enablecopilotchatstudentagegroup" target="_blank" rel="noopener"&gt;Age Group in Microsoft Entra ID&lt;/A&gt;, and K-12 students ages 13-17 need Microsoft 365 Copilot Chat enabled by an IT admin before they can use Copilot Notebooks and Study Guide.&lt;/P&gt;
&lt;P&gt;No additional deployment is needed for Study Guide. Study Guide is rolling out to enterprise and education customers starting June 11. It may take a few days to show up in your account.&lt;/P&gt;
&lt;H3&gt;What Study guide does&lt;/H3&gt;
&lt;P&gt;Study Guide takes the materials learners already have and helps turn them into a collection of organized study topics and activities of your choice.&lt;/P&gt;
&lt;P&gt;Drop in PDFs, Word documents, PowerPoint presentations, or Excel files. Study Guide reads across those references, identifies the key ideas, and creates a multi-page study guide inside the notebook.&lt;/P&gt;
&lt;P&gt;The important part:&lt;STRONG&gt; it is grounded in the sources you provide&lt;/STRONG&gt;. It is not pulling random facts from the internet. Summary pages and Topic pages include citations back to the original materials, so learners can check where information came from and return to the source when something needs a closer look.&lt;/P&gt;
&lt;P&gt;That matters for learning. It helps students stay connected to the actual course materials. It helps educators trust what students are practicing from, and it turns citation-checking into a habit, not an afterthought.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;H3&gt;What is available in Study Guide&lt;/H3&gt;
&lt;P&gt;Study Guide creates materials that span all phases of learning: understand, practice, and test.&lt;/P&gt;
&lt;H4&gt;Understand: deepen your knowledge of the material&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Summary page:&lt;/STRONG&gt; Start with a high-level overview of the materials you added. The Summary includes an overview, why the topic matters, key topics, a glossary, common misconceptions, and citations back to the source materials.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Topic pages:&lt;/STRONG&gt; Study Guide creates deeper pages for the major topics it finds in your content. These pages work like mini-chapters that cover content across all your references. They include explanations, sub-topic deep dives, worked examples, questions that make you think critically and analyze concepts, short exercises, and citations throughout.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Practice: strengthen recall and make connections&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Flashcards:&lt;/STRONG&gt; Study Guide generates interactive cards from the learner's materials. Learners can flip cards, use hints, and edit the set so the wording matches how they think about the concept.&lt;BR /&gt;&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Fill in the blanks:&lt;/STRONG&gt; Key terms are removed from important sentences, and learners choose from a set of distractor answers to complete each sentence. It is especially useful for processes and sequences of events where the order and relationships matter.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Matching:&lt;/STRONG&gt; Study Guide creates matching tiles that ask learners to connect related ideas: terms to definitions, causes to effects, structures to functions, or concepts to examples.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Test: check what is sticking&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Quiz:&lt;/STRONG&gt; Study Guide creates a Microsoft Forms-powered quiz with questions generated from the materials. Learners can answer directly from the page, review results, and see explanations for multiple-choice answers. Results are private to the learner unless they choose to share them.&lt;BR /&gt;&lt;img /&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Every one of these formats is designed to move studying from passive review to active practice. Not just rereading or highlighting. Actually trying to remember, connect, explain, and check.&lt;/P&gt;
&lt;P&gt;Study Guide supports 21 languages at launch: Arabic, Chinese (Simplified), Chinese (Traditional), Danish, Dutch, English (US), Estonian, French (Canada), French (France), German, Hebrew, Italian, Japanese, Korean, Norwegian Bokmål, Portuguese (Brazil), Spanish (Mexico), Spanish (Spain), Swedish, Ukrainian, and Vietnamese.&lt;/P&gt;
&lt;P&gt;That means students can study from the materials they already use, in the language they already learn in, without having to move everything into a separate tool.&lt;/P&gt;
&lt;H3&gt;For educators&lt;/H3&gt;
&lt;P&gt;A few ways you can bring Study Guide to your students or use it yourself:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Get started by taking the Microsoft Learn course. &lt;/STRONG&gt;Available now at &lt;A class="lia-external-url" href="https://learn.microsoft.com/training/modules/build-study-guides-copilot-notebooks/" target="_blank" rel="noopener"&gt;aka.ms/notebooksandstudyguidemodule&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Point learners to a specific study moment.&lt;/STRONG&gt; "Before Friday's quiz, add this week's slides and generate flashcards" is more useful than "use AI to study."&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Encourage active practice.&lt;/STRONG&gt; Flashcards, fill-in-the-blanks, matching, and quizzes help your students retrieve information from memory instead of only rereading it.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use citations as an AI literacy moment. &lt;/STRONG&gt;Study Guide shows where information came from. That opens a natural classroom conversation about checking sources, verifying AI-generated content, and staying grounded in the material.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Keep assessment separate from practice.&lt;/STRONG&gt; Study Guide quizzes are for self-checking. They are not a gradebook, and quiz results are private unless a student chooses to share them.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Keep building your own AI fluency.&lt;/STRONG&gt;&amp;nbsp;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Study Guide is built with privacy, safety, and learner control in mind. Study Guide pages are private by default, stored in the learner's Microsoft 365 notebook, and can be edited or deleted by the learner. Prompts and outputs are not used to train AI models, and quiz results are private unless a learner chooses to share them.&lt;/P&gt;
&lt;H3&gt;Get started&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Take the professional development course at &lt;A class="lia-external-url" href="https://learn.microsoft.com/training/modules/build-study-guides-copilot-notebooks/" target="_blank" rel="noopener"&gt;aka.ms/notebooksandstudyguidemodule&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://aka.ms/studyguidedocumentation" target="_blank" rel="noopener"&gt;Learn more&lt;/A&gt; about Study guide&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Anoo Padte is Principal Product Manager for AI in Education at Microsoft.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2026 02:31:33 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/education-blog/copilot-notebooks-and-study-guide-now-available-to-copilot-chat/ba-p/4527320</guid>
      <dc:creator>AnnoPadte</dc:creator>
      <dc:date>2026-06-17T02:31:33Z</dc:date>
    </item>
    <item>
      <title>Compliance Academy: A Multi-Agent Cyber Mystery Built on Microsoft Foundry Agent Service</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/compliance-academy-a-multi-agent-cyber-mystery-built-on/ba-p/4526526</link>
      <description>&lt;H2 data-line="6"&gt;The Challenge&lt;/H2&gt;
&lt;P data-line="8"&gt;Microsoft Reactor invited three individuals to compete in a live coding battle: build something that shows what reasoning agents can do, in front of a streaming audience, in roughly twelve minutes of airtime each.&lt;/P&gt;
&lt;img /&gt;
&lt;P data-line="8"&gt;Watch live at 9am PT on 10th June: &lt;A class="lia-external-url" href="https://developer.microsoft.com/en-us/reactor/events/26942/" target="_blank" rel="noopener"&gt;https://developer.microsoft.com/en-us/reactor/events/26942/&lt;/A&gt;&lt;/P&gt;
&lt;P data-line="8"&gt;The temptation is always to default to a clean Q&amp;amp;A demo or a document summarizer. I wanted to try something harder.&lt;/P&gt;
&lt;P data-line="10"&gt;The solution I will be building is &lt;STRONG&gt;Compliance Academy&lt;/STRONG&gt;: a multi-agent cyber-mystery role-playing game where the player is the lead investigator at&amp;nbsp;&lt;EM&gt;Helix Dynamics&lt;/EM&gt;, a fictional biotech that just lost 14 GB of clinical trial data. Five suspects. One perpetrator. Multiple frameworks (SOC 2, HIPAA, ISO 27001) and the company's own policies are all in play. By the end of the case, the player has learned a real compliance lesson, grounded in real policy excerpts retrieved from a real search index.&lt;/P&gt;
&lt;P data-line="12"&gt;The whole stack runs on&amp;nbsp;&lt;STRONG&gt;Microsoft Foundry Agent Service&lt;/STRONG&gt;, with&amp;nbsp;&lt;STRONG&gt;Azure OpenAI&lt;/STRONG&gt;&amp;nbsp;for the model layer and&amp;nbsp;&lt;STRONG&gt;Azure AI Search&lt;/STRONG&gt;&amp;nbsp;for retrieval. The source code is open:&amp;nbsp;&lt;A href="https://github.com/lwhieldon/msft-enterprise-learning-agent" target="_blank" rel="noopener" data-href="https://github.com/lwhieldon/msft-enterprise-learning-agent"&gt;github.com/lwhieldon/msft-enterprise-learning-agent&lt;/A&gt;.&lt;/P&gt;
&lt;H2 data-line="16"&gt;Why a Game?&lt;/H2&gt;
&lt;P data-line="18"&gt;Compliance training is something most professionals click through to satisfy a deadline. The content is dense, the stakes feel abstract, and the answers are usually "ask Legal." People learn just enough to pass the quiz, which is the wrong outcome for material that should change how they make decisions.&lt;/P&gt;
&lt;P data-line="20"&gt;Reasoning agents let us flip the contract. Instead of pushing content at the learner, a multi-agent system can stage a scenario where the learner has agency. They ask questions because they need answers. They interrogate suspects who are evasive on purpose. They piece evidence together. When the&amp;nbsp;&lt;STRONG&gt;Compliance Officer&lt;/STRONG&gt;&amp;nbsp;agent steps in at the end and cites SOC 2 CC6.1 alongside Helix Dynamics' own HD-SEC-AC-001 §4.1, the citation lands because the player just spent twenty minutes earning it.&lt;/P&gt;
&lt;P data-line="22"&gt;That was the hypothesis worth testing on stream: reasoning agents are good enough now that we can make compliance training feel like a case file instead of a quiz.&lt;/P&gt;
&lt;H2 data-line="26"&gt;The Architecture&lt;/H2&gt;
&lt;P data-line="28"&gt;Compliance Academy uses a&amp;nbsp;&lt;STRONG&gt;Connected Agents&lt;/STRONG&gt;&amp;nbsp;pattern on Microsoft Foundry Agent Service. Four party agents are always present, providing broad expertise. A roster of suspect agents activates dynamically as the player interrogates them. The player is the orchestrator. They decide who to talk to and when.&lt;/P&gt;
&lt;H3 data-line="30"&gt;The four party agents&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Agent&lt;/th&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Backing&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Game Master&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Scene-setting, action menu, scene close&lt;/td&gt;&lt;td&gt;gpt-4.1-mini&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Forensic Analyst&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Evidence reasoning, log analysis, framework lookups&lt;/td&gt;&lt;td&gt;gpt-4.1-mini + Azure AI Search retrieval&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Compliance Officer&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Post-scene verdict and framework grounding&lt;/td&gt;&lt;td&gt;gpt-4.1-mini + Azure AI Search retrieval&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Scenario Generator&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Hot-loads brand-new cases from a one-sentence breach prompt&lt;/td&gt;&lt;td&gt;gpt-4.1-mini&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-line="39"&gt;The five suspect agents&lt;/H3&gt;
&lt;P data-line="41"&gt;Each suspect is a separate agent instance with a templated system prompt that loads from scenario JSON. Personas include an HR Director, an IT Administrator, a vendor account manager, an executive assistant, and a summer intern. Each carries a distinct backstory, alibi, conversational style, and a list of "leak conditions" that determine when they reveal guarded information.&lt;/P&gt;
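&lt;P&gt;A minimal sketch of how a templated suspect prompt could be rendered from scenario JSON. The field names (&lt;CODE&gt;name&lt;/CODE&gt;, &lt;CODE&gt;role&lt;/CODE&gt;, &lt;CODE&gt;alibi&lt;/CODE&gt;, &lt;CODE&gt;style&lt;/CODE&gt;, &lt;CODE&gt;leak_conditions&lt;/CODE&gt;), the template wording, and the sample suspect are all assumptions made for illustration; the repository's actual schema may differ.&lt;/P&gt;

```python
import json
from string import Template

# Hypothetical template; the real prompt wording in the repo may differ.
SUSPECT_TEMPLATE = Template(
    "You are $name, the $role at Helix Dynamics.\n"
    "Alibi: $alibi\n"
    "Conversational style: $style\n"
    "Reveal guarded information only when one of these conditions is met:\n"
    "$leak_conditions"
)


def render_suspect_prompt(suspect: dict) -> str:
    """Fill the template from one suspect entry in the scenario JSON."""
    conditions = "\n".join(f"- {c}" for c in suspect["leak_conditions"])
    return SUSPECT_TEMPLATE.substitute(
        name=suspect["name"],
        role=suspect["role"],
        alibi=suspect["alibi"],
        style=suspect["style"],
        leak_conditions=conditions,
    )


# Illustrative scenario fragment (invented for this example).
scenario_json = """
{"suspects": [{
  "name": "Jordan Reyes",
  "role": "IT Administrator",
  "alibi": "Patching servers in the data center all evening.",
  "style": "Terse and defensive.",
  "leak_conditions": ["Player presents the VPN log", "Player asks about the MFA exception"]
}]}
"""

scenario = json.loads(scenario_json)
prompts = [render_suspect_prompt(s) for s in scenario["suspects"]]
print(prompts[0])
```

&lt;P&gt;Keeping the persona in data rather than in code is what would let a generated scenario swap in new suspects without touching the agent logic.&lt;/P&gt;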
&lt;H3 data-line="43"&gt;The two surfaces&lt;/H3&gt;
&lt;P data-line="45"&gt;The audience-facing surface is a&amp;nbsp;&lt;STRONG&gt;Chainlit&lt;/STRONG&gt;&amp;nbsp;UI, branded for SC&amp;amp;H Group, with click-through action buttons (Briefing, Suspects, Evidence, Generate, Accuse, Wrap) and a side panel that shows the retrieved policy snippets the agents are reasoning over.&lt;/P&gt;
&lt;P data-line="47"&gt;The proof-of-orchestration surface is a&amp;nbsp;&lt;STRONG&gt;live activity log&lt;/STRONG&gt;&amp;nbsp;streaming to a second terminal. Every Azure AI Search retrieval, every Azure OpenAI POST, every first-token latency, every source name and relevance score scrolls by in real time. This terminal is the show-don't-tell evidence that the agents are genuinely doing work, not just dressing up a single prompt.&lt;/P&gt;
&lt;H2 data-line="51"&gt;Grounding Compliance Answers in Real Policy&lt;/H2&gt;
&lt;P data-line="53"&gt;The non-negotiable for compliance content is&amp;nbsp;&lt;STRONG&gt;hallucination prevention&lt;/STRONG&gt;. If the Forensic Analyst tells a learner that SOC 2 CC6.1 says something it doesn't, the entire training is worse than useless. So both the Forensic Analyst and the Compliance Officer ground their responses against an Azure AI Search index containing 52 chunked policy documents covering SOC 2, HIPAA, ISO 27001, NIST 800-53, and the fictional Helix Dynamics internal policy library.&lt;/P&gt;
&lt;P data-line="55"&gt;The retrieval is straightforward:&lt;/P&gt;
&lt;P style="font-family: Consolas, 'Courier New', monospace; font-size: 13px; line-height: 1.5; background-color: #f4f6f8; padding: 14px 18px; border-left: 3px solid #0079ff; border-radius: 3px; white-space: pre-wrap;"&gt;&lt;SPAN style="color: #0070c0;"&gt;def&lt;/SPAN&gt; retrieve_context(query: str, top_k: int = 5) -&amp;gt; list[dict]:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #2e8b57;"&gt;"""Search the compliance knowledge index and return top-k snippets."""&lt;/SPAN&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;client = build_search_client()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;results = client.search(&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;search_text=query.strip(),&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;top=top_k,&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;select=[&lt;SPAN style="color: #2e8b57;"&gt;"uid"&lt;/SPAN&gt;, &lt;SPAN style="color: #2e8b57;"&gt;"snippet"&lt;/SPAN&gt;, &lt;SPAN style="color: #2e8b57;"&gt;"blob_url"&lt;/SPAN&gt;, &lt;SPAN style="color: #2e8b57;"&gt;"snippet_parent_id"&lt;/SPAN&gt;],&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #0070c0;"&gt;return&lt;/SPAN&gt; [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #2e8b57;"&gt;"source_url"&lt;/SPAN&gt;: r[&lt;SPAN style="color: #2e8b57;"&gt;"blob_url"&lt;/SPAN&gt;],&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #2e8b57;"&gt;"snippet"&lt;/SPAN&gt;: r[&lt;SPAN style="color: 
#2e8b57;"&gt;"snippet"&lt;/SPAN&gt;],&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #2e8b57;"&gt;"score"&lt;/SPAN&gt;: r[&lt;SPAN style="color: #2e8b57;"&gt;"@search.score"&lt;/SPAN&gt;],&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #0070c0;"&gt;for&lt;/SPAN&gt; r &lt;SPAN style="color: #0070c0;"&gt;in&lt;/SPAN&gt; results&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;]&lt;/P&gt;
&lt;P data-line="76"&gt;What makes it land in the demo is the per-source event logging. Each retrieval emits its filename and relevance score to the activity log:&lt;/P&gt;
&lt;P style="font-family: Consolas, 'Courier New', monospace; font-size: 13px; line-height: 1.5; background-color: #f4f6f8; padding: 14px 18px; border-left: 3px solid #41efaf; border-radius: 3px; white-space: pre-wrap;"&gt;[Foundry IQ]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Retrieved 5 sources in 1233ms&lt;BR /&gt;[Foundry IQ]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;vendor_breach_response.md&amp;nbsp;&amp;nbsp;(score=11.20)&lt;BR /&gt;[Foundry IQ]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;helix_dynamics_overview.md&amp;nbsp;&amp;nbsp;(score=7.65)&lt;BR /&gt;[Foundry IQ]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;credential_compromise_response.md&amp;nbsp;&amp;nbsp;(score=7.41)&lt;BR /&gt;[Azure OpenAI]&amp;nbsp;&amp;nbsp;POST gpt-4.1-mini&amp;nbsp;&amp;nbsp;(max_tokens=1500, temp=0.4)&lt;BR /&gt;[Azure OpenAI]&amp;nbsp;&amp;nbsp;First token in 8328ms&lt;BR /&gt;[Azure OpenAI]&amp;nbsp;&amp;nbsp;Stream complete: ~816 tokens in 14.9s&lt;/P&gt;
&lt;P data-line="88"&gt;When the Forensic Analyst then cites HD-SEC-AC-001 §4.1 on MFA exceptions, the audience can see the specific document the citation came from. The trust loop closes on screen.&lt;/P&gt;
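&lt;P&gt;The per-source logging could be sketched roughly as follows. This is not the project's code: &lt;CODE&gt;format_retrieval_log&lt;/CODE&gt; is a hypothetical helper, the sample URLs are placeholders, and it assumes the blob URL path ends in the source document's filename.&lt;/P&gt;

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def format_retrieval_log(results: list[dict], elapsed_ms: int) -> list[str]:
    """Build activity-log lines: one header, then one line per source."""
    lines = [f"[Foundry IQ]    Retrieved {len(results)} sources in {elapsed_ms}ms"]
    for r in results:
        # Assumption: the last path segment of the blob URL is the filename.
        filename = PurePosixPath(urlparse(r["source_url"]).path).name
        lines.append(f"[Foundry IQ]      {filename}  (score={r['score']:.2f})")
    return lines


# Hypothetical results in the shape returned by retrieve_context().
sample = [
    {
        "source_url": "https://example.blob.core.windows.net/policies/vendor_breach_response.md",
        "snippet": "...",
        "score": 11.2,
    },
    {
        "source_url": "https://example.blob.core.windows.net/policies/helix_dynamics_overview.md",
        "snippet": "...",
        "score": 7.65,
    },
]

log_lines = format_retrieval_log(sample, elapsed_ms=1233)
for line in log_lines:
    print(line)
```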
&lt;H2 data-line="92"&gt;Live World-Building with the Scenario Generator&lt;/H2&gt;
&lt;P data-line="94"&gt;The piece I am most proud of technically is the&amp;nbsp;&lt;STRONG&gt;Scenario Generator&lt;/STRONG&gt;. Mid-demo, the host or an audience member can pitch a breach in one or two sentences. Roughly forty seconds later, a brand new scenario hot-loads into the game.&lt;/P&gt;
&lt;P data-line="96"&gt;The agent emits structured JSON: a premise narration, five suspects with assigned roles and hidden truths, six to ten pieces of evidence, four to six violated controls, a clue graph, and a compliance lesson. The output then runs through a validation layer:&lt;/P&gt;
&lt;P style="font-family: Consolas, 'Courier New', monospace; font-size: 13px; line-height: 1.5; background-color: #f4f6f8; padding: 14px 18px; border-left: 3px solid #0079ff; border-radius: 3px; white-space: pre-wrap;"&gt;&lt;SPAN style="color: #0070c0;"&gt;try&lt;/SPAN&gt;:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;merged_scenario = load_scenario_from_dict(scenario_override)&lt;BR /&gt;&lt;SPAN style="color: #0070c0;"&gt;except&lt;/SPAN&gt; ScenarioValidationError &lt;SPAN style="color: #0070c0;"&gt;as&lt;/SPAN&gt; exc:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #0070c0;"&gt;if&lt;/SPAN&gt; validation_attempt &amp;lt; max_validation_retries:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #808080;"&gt;# Feed the validation error back to the model as a corrective message&lt;/SPAN&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;user_message = _build_validation_retry_message(&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;breach_description, exc&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #0070c0;"&gt;continue&lt;/SPAN&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;SPAN style="color: #0070c0;"&gt;raise&lt;/SPAN&gt;&lt;/P&gt;
&lt;P data-line="111"&gt;If validation fails (wrong perpetrator count, missing canonical suspect, malformed evidence reference), the generator retries with the specific error fed back as a corrective message. Two validation cycles is usually enough. The session state then resets cleanly, the new briefing renders, and the player can start investigating immediately. Same agents. Brand new world.&lt;/P&gt;
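&lt;P&gt;To see the corrective-retry idea end to end, here is a self-contained sketch with a stand-in model. The validation rules, the &lt;CODE&gt;is_perpetrator&lt;/CODE&gt; field, &lt;CODE&gt;generate_with_retry&lt;/CODE&gt;, and &lt;CODE&gt;fake_model&lt;/CODE&gt; are assumptions for illustration only; the project's own validator and retry message are more detailed.&lt;/P&gt;

```python
import json


class ScenarioValidationError(ValueError):
    """Raised when generated scenario JSON breaks a game rule."""


def validate_scenario(scenario: dict) -> dict:
    # Hypothetical rules mirroring two of the failure modes named above.
    suspects = scenario.get("suspects", [])
    perpetrators = [s for s in suspects if s.get("is_perpetrator")]
    if len(suspects) != 5:
        raise ScenarioValidationError(f"expected 5 suspects, got {len(suspects)}")
    if len(perpetrators) != 1:
        raise ScenarioValidationError(
            f"expected exactly 1 perpetrator, got {len(perpetrators)}"
        )
    return scenario


def generate_with_retry(call_model, breach_description: str, max_retries: int = 2) -> dict:
    """Call the model, validate, and feed any error back as a corrective message."""
    user_message = breach_description
    for attempt in range(max_retries + 1):
        raw = call_model(user_message)
        try:
            return validate_scenario(json.loads(raw))
        except ScenarioValidationError as exc:
            if attempt == max_retries:
                raise  # out of retries: surface the last validation error
            user_message = (
                f"{breach_description}\n\n"
                f"Your previous output failed validation: {exc}. "
                "Return corrected JSON only."
            )
    raise RuntimeError("no attempts were made")


def fake_model(message: str) -> str:
    """Stand-in model: first reply names two perpetrators, the retry is valid."""
    fixed = "failed validation" in message
    suspects = [
        {"name": f"S{i}", "is_perpetrator": i == 0 or (i == 1 and not fixed)}
        for i in range(5)
    ]
    return json.dumps({"suspects": suspects})


scenario = generate_with_retry(fake_model, "A vendor laptop was stolen.")
print(sum(s["is_perpetrator"] for s in scenario["suspects"]))
```

&lt;P&gt;The design choice worth noting is that the error text itself goes back to the model, so each retry is targeted rather than a blind regeneration.&lt;/P&gt;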
&lt;H2 data-line="115"&gt;Lessons Learned&lt;/H2&gt;
&lt;P data-line="117"&gt;A few takeaways I will carry to the next reasoning-agent project:&lt;/P&gt;
&lt;UL data-line="119"&gt;
&lt;LI data-line="119"&gt;&lt;STRONG&gt;Ground anything compliance-related in retrieval, always.&lt;/STRONG&gt;&amp;nbsp;Even "general knowledge" policy citations are a hallucination risk. The Connected Agents pattern made it natural to wire retrieval into the two agents that needed it (Forensic Analyst, Compliance Officer) without complicating the others.&lt;/LI&gt;
&lt;LI data-line="120"&gt;&lt;STRONG&gt;Plan the observability surface as a first-class deliverable.&lt;/STRONG&gt;&amp;nbsp;The activity log was not an afterthought; it was the audience's proof that the agents were real. For a live demo, observability is part of the user experience.&lt;/LI&gt;
&lt;LI data-line="121"&gt;&lt;STRONG&gt;Validation with corrective retry beats perfect prompting.&lt;/STRONG&gt;&amp;nbsp;Let the model fix its own structured output errors when the schema permits.&lt;/LI&gt;
&lt;LI data-line="122"&gt;&lt;STRONG&gt;Specialize and route.&lt;/STRONG&gt;&amp;nbsp;The Connected Agents pattern works well for clear role separation. Don't try to make one agent do everything; give each agent a tight remit and a clear handoff.&lt;/LI&gt;
&lt;LI data-line="123"&gt;&lt;STRONG&gt;Keep a terminal-driven backup surface.&lt;/STRONG&gt;&amp;nbsp;When the live demo gods are unhappy, falling back to a CLI is a graceful recovery path. The CLI orchestrator and the Chainlit UI in this project share the same agent functions, so either surface tells the same story.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-line="127"&gt;Why Microsoft Foundry Agent Service Was the Right Fit&lt;/H2&gt;
&lt;P data-line="129"&gt;Foundry Agent Service was the right home for this build because:&lt;/P&gt;
&lt;UL data-line="131"&gt;
&lt;LI data-line="131"&gt;&lt;STRONG&gt;Connected Agents&lt;/STRONG&gt;&amp;nbsp;maps cleanly onto a party-of-agents game design.&lt;/LI&gt;
&lt;LI data-line="132"&gt;&lt;STRONG&gt;Model Router&lt;/STRONG&gt;&amp;nbsp;gives one endpoint that picks the right backing model per agent without rebuilding clients.&lt;/LI&gt;
&lt;LI data-line="133"&gt;&lt;STRONG&gt;Azure AI Search&lt;/STRONG&gt;&amp;nbsp;with agentic retrieval is first-class — no separate retrieval service to provision.&lt;/LI&gt;
&lt;LI data-line="134"&gt;&lt;STRONG&gt;Microsoft Entra ID&lt;/STRONG&gt;&amp;nbsp;integration means the entire stack runs under enterprise SSO with no token juggling.&lt;/LI&gt;
&lt;LI data-line="135"&gt;&lt;STRONG&gt;Streaming responses&lt;/STRONG&gt;&amp;nbsp;from Azure OpenAI deployments work cleanly with Python's async generators, which made the activity log timing events trivial to wire up.&lt;/LI&gt;
&lt;/UL&gt;
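&lt;P&gt;As a rough illustration of the streaming point, an async generator can wrap a token stream and record timing events as tokens pass through. The sketch below uses a fake stream instead of a real Azure OpenAI call; &lt;CODE&gt;fake_token_stream&lt;/CODE&gt; and &lt;CODE&gt;timed_stream&lt;/CODE&gt; are hypothetical names, not functions from the repository.&lt;/P&gt;

```python
import asyncio
import time
from typing import AsyncIterator


async def fake_token_stream() -> AsyncIterator[str]:
    """Stand-in for a streaming model response (hypothetical)."""
    for token in ["The ", "MFA ", "exception ", "was ", "undocumented."]:
        await asyncio.sleep(0.01)
        yield token


async def timed_stream(stream: AsyncIterator[str], events: list[str]) -> AsyncIterator[str]:
    """Pass chunks through unchanged while recording timing events."""
    start = time.perf_counter()
    count = 0
    async for token in stream:
        if count == 0:
            first_ms = (time.perf_counter() - start) * 1000
            events.append(f"[Azure OpenAI]  First token in {first_ms:.0f}ms")
        count += 1
        yield token
    total_s = time.perf_counter() - start
    events.append(f"[Azure OpenAI]  Stream complete: {count} chunks in {total_s:.1f}s")


async def main() -> tuple[str, list[str]]:
    events: list[str] = []
    text = "".join([t async for t in timed_stream(fake_token_stream(), events)])
    return text, events


text, events = asyncio.run(main())
print(text)
for event in events:
    print(event)
```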
&lt;P data-line="137"&gt;The full stack — Foundry Agent Service, Azure OpenAI, Azure AI Search, and Azure Storage — sits inside a single resource group, which made teardown between dry runs trivial.&lt;/P&gt;
&lt;P data-line="137"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P data-line="137"&gt;Join and watch live to learn more about how we tackle the Agents League Challenges: &lt;A class="lia-external-url" href="https://developer.microsoft.com/en-us/reactor/events/26942/" target="_blank" rel="noopener"&gt;https://developer.microsoft.com/en-us/reactor/events/26942/&lt;/A&gt;&lt;/P&gt;
&lt;H2 data-line="141"&gt;Try It Yourself&lt;/H2&gt;
&lt;P data-line="143"&gt;The repository is open source. Clone it and follow the README:&lt;/P&gt;
&lt;P style="font-family: Consolas, 'Courier New', monospace; font-size: 13px; line-height: 1.5; background-color: #f4f6f8; padding: 14px 18px; border-left: 3px solid #41efaf; border-radius: 3px; white-space: pre-wrap;"&gt;git clone https://github.com/lwhieldon/msft-enterprise-learning-agent.git&lt;BR /&gt;cd msft-enterprise-learning-agent&lt;BR /&gt;python -m venv .venv&lt;BR /&gt;.\.venv\Scripts\activate&lt;BR /&gt;pip install -r requirements.txt&lt;BR /&gt;chainlit run app.py -w&lt;/P&gt;
&lt;P data-line="154"&gt;Three pre-built scenarios ship with the repo (Default, Supply Chain, Vishing). The Generate button creates new ones live.&lt;/P&gt;
&lt;P data-line="156"&gt;If you build on top of this, want to compare notes on multi-agent design, or want to swap reasoning-agent war stories, I am&amp;nbsp;&lt;A href="https://github.com/Lwhieldon" target="_blank" rel="noopener" data-href="https://github.com/Lwhieldon"&gt;@Lwhieldon on GitHub&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&lt;A href="https://www.linkedin.com/in/lee-w-9b3a0620/" target="_blank" rel="noopener" data-href="https://www.linkedin.com/in/lee-w-9b3a0620/"&gt;Lee Whieldon on LinkedIn&lt;/A&gt;.&lt;/P&gt;
&lt;P data-line="158"&gt;Thanks to&amp;nbsp;&lt;STRONG&gt;Lee Stott&lt;/STRONG&gt;&amp;nbsp;and&amp;nbsp;&lt;STRONG&gt;Carlotta Castelluccio&lt;/STRONG&gt;&amp;nbsp;at Microsoft for hosting the Reactor battle and giving these projects a stage.&lt;/P&gt;
&lt;P data-line="162"&gt;&lt;STRONG&gt;About the author.&lt;/STRONG&gt; Lee Whieldon is a Principal at SC&amp;amp;H Group, leading the Data Analytics &amp;amp; AI advisory practice. She works at the intersection of structured data, reasoning agents, and enterprise delivery.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2026 07:39:39 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/compliance-academy-a-multi-agent-cyber-mystery-built-on/ba-p/4526526</guid>
      <dc:creator>lwhieldon</dc:creator>
      <dc:date>2026-06-10T07:39:39Z</dc:date>
    </item>
    <item>
      <title>Make Your Copilot Credits Count: A Student's Guide to Smarter AI Usage</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/make-your-copilot-credits-count-a-student-s-guide-to-smarter-ai/ba-p/4526668</link>
      <description>&lt;P&gt;If you're a student enrolled in&amp;nbsp;&lt;A href="https://education.github.com" target="_blank" rel="noopener"&gt;GitHub Education&lt;/A&gt;, you already have something most developers pay for: free access to GitHub Copilot and its premium features. That's incredible. But here's the thing, free access doesn't mean unlimited usage, and not all AI interactions cost the same. Every chat message, every agent task, every model call consumes something called &lt;STRONG&gt;AI Credits&lt;/STRONG&gt;, and knowing how they work will help you use Copilot smarter, produce better code, and build the kind of disciplined AI habits that professional developers are only just starting to learn.&lt;/P&gt;
&lt;P&gt;This post is inspired by a fantastic deep-dive from my collegaue developer advocate&lt;STRONG&gt;&amp;nbsp;Bruno&lt;/STRONG&gt;: &lt;A href="https://elbruno.com/2026/06/04/github-copilot-and-tokens-how-to-keep-using-ai-without-burning-your-budget-in-three-prompts-some-personal-lessons-learned/" target="_blank" rel="noopener"&gt; "GitHub Copilot and Tokens: How to Keep Using AI Without Burning Your Budget" &lt;/A&gt;. We've taken those professional lessons and tailored them specifically for students&amp;nbsp; because your learning environment, your assignments, and your goals are different from a seasoned engineer at a tech company.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;STRONG&gt;TL;DR:&lt;/STRONG&gt; Use autocomplete before chat. Choose the right model. Keep context small. Start fresh chats often. Plan before you build. These habits will make you a better developer and stretch your credits further.&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;What Are AI Credits and Why Do They Matter?&lt;/H2&gt;
&lt;P&gt;When you interact with GitHub Copilot through chat, agent mode, or inline edits, the model processes &lt;STRONG&gt;tokens&lt;/STRONG&gt;. Tokens are small chunks of text (roughly 3–4 characters each). Every interaction consumes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Input tokens&lt;/STRONG&gt; — everything sent to the model (your message, attached files, chat history, instructions)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Output tokens&lt;/STRONG&gt; — everything the model generates back to you&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Cached tokens&lt;/STRONG&gt; — context the model reuses from previous turns (cheaper)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These tokens are converted to &lt;STRONG&gt;AI Credits&lt;/STRONG&gt;, where &lt;STRONG&gt;1 AI Credit = $0.01 USD&lt;/STRONG&gt;. Different models have very different token costs: a lightweight model like GPT-5 mini charges $0.25 per million input tokens, while a powerful model like GPT-5.5 charges $5.00 per million input tokens (20x more expensive). Using the wrong model for a simple task is like taking a taxi to a destination that's a 5-minute walk away.&lt;/P&gt;
&lt;P&gt;See the official pricing table: &lt;A href="https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing" target="_blank" rel="noopener"&gt; GitHub Copilot Models and Pricing &lt;/A&gt;.&lt;/P&gt;
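&lt;P&gt;A quick worked example of the arithmetic, using only the two input prices quoted above. It ignores output and cached tokens, whose prices are not given here, and the 4-characters-per-token estimate is a rough rule of thumb rather than a real tokenizer. The function names are made up for this sketch.&lt;/P&gt;

```python
# Input prices quoted in this post, in USD per million input tokens.
PRICE_PER_MILLION_INPUT = {"GPT-5 mini": 0.25, "GPT-5.5": 5.00}
USD_PER_CREDIT = 0.01  # 1 AI Credit = $0.01 USD


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate only: real tokenizers vary by model and content."""
    return max(1, round(len(text) / chars_per_token))


def input_credits(tokens: int, model: str) -> float:
    """Convert an input-token count to AI Credits for the given model."""
    usd = tokens / 1_000_000 * PRICE_PER_MILLION_INPUT[model]
    return usd / USD_PER_CREDIT


tokens = 40_000  # assumed size: a long chat history plus several attached files
for model in PRICE_PER_MILLION_INPUT:
    print(f"{model}: {input_credits(tokens, model):.2f} credits")
```

&lt;P&gt;With these numbers, the same 40,000 input tokens cost about 1 credit on the lightweight model and about 20 credits on the powerful one.&lt;/P&gt;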
&lt;img /&gt;
&lt;P&gt;&lt;EM&gt;Figure 1: The four cost tiers of Copilot interactions. Autocomplete and Next Edit Suggestions are free; they do not consume AI Credits on paid plans.&lt;/EM&gt;&lt;/P&gt;
&lt;H2&gt;Strategy 1: Tab Before Chat - The Free Tier Is Powerful&lt;/H2&gt;
&lt;P&gt;Here is the single most impactful habit you can build: &lt;STRONG&gt;always try autocomplete before opening chat&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;According to &lt;A href="https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing" target="_blank" rel="noopener"&gt;GitHub's official billing documentation&lt;/A&gt;, code completions and Next Edit Suggestions are &lt;STRONG&gt;not billed as AI Credits&lt;/STRONG&gt; on paid plans. That means every time you press Tab to accept an inline suggestion, you are getting AI assistance for free.&lt;/P&gt;
&lt;P&gt;Use autocomplete (Tab) for:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Completing a line or a simple function&lt;/LI&gt;
&lt;LI&gt;Generating repetitive boilerplate (constructors, properties, getters/setters)&lt;/LI&gt;
&lt;LI&gt;Completing a repeated pattern you've started&lt;/LI&gt;
&lt;LI&gt;Writing obvious next lines like &lt;CODE&gt;console.log&lt;/CODE&gt;, imports, or variable declarations&lt;/LI&gt;
&lt;LI&gt;Adjusting variable names inline&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Only move to &lt;STRONG&gt;Inline Edit&lt;/STRONG&gt; (Ctrl+I / Cmd+I) when autocomplete isn't enough for a local change. Only open a &lt;STRONG&gt;Chat&lt;/STRONG&gt; window when you need genuine reasoning: an explanation, a plan, or a multi-step solution.&lt;/P&gt;
&lt;P&gt;As Bruno puts it:&amp;nbsp;&lt;EM&gt;"The most expensive model in the world should not be helping you write &lt;CODE&gt;public string Name { get; set; }&lt;/CODE&gt;. That's what Tab is for. And coffee."&lt;/EM&gt;&lt;/P&gt;
&lt;H2&gt;Strategy 2: Choose the Right Model for the Job&lt;/H2&gt;
&lt;P&gt;GitHub Copilot gives you access to models from OpenAI, Anthropic, and Google, each at different price points and capability levels. The key insight from &lt;A href="https://code.visualstudio.com/docs/copilot/guides/optimize-usage" target="_blank" rel="noopener"&gt;VS Code's official Copilot usage guide&lt;/A&gt; is: &lt;STRONG&gt;reserve powerful reasoning models for tasks that genuinely need them&lt;/STRONG&gt;.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Your Task&lt;/th&gt;&lt;th&gt;Recommended Model Tier&lt;/th&gt;&lt;th&gt;Example Models&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Simple question or boilerplate&lt;/td&gt;&lt;td&gt;Lightweight&lt;/td&gt;&lt;td&gt;GPT-5 mini, Gemini 3 Flash&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Code explanation or basic docs&lt;/td&gt;&lt;td&gt;Lightweight&lt;/td&gt;&lt;td&gt;GPT-5 mini, GPT-5.4 nano&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Writing tests or debugging a single function&lt;/td&gt;&lt;td&gt;Medium / Versatile&lt;/td&gt;&lt;td&gt;Claude Haiku 4.5, GPT-5.4&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Multi-file refactor or code review&lt;/td&gt;&lt;td&gt;Medium / Versatile&lt;/td&gt;&lt;td&gt;Claude Sonnet 4.6, GPT-5.4&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Complex system design or architecture&lt;/td&gt;&lt;td&gt;Powerful&lt;/td&gt;&lt;td&gt;Claude Opus 4.7, GPT-5.5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Long agentic workflows&lt;/td&gt;&lt;td&gt;Powerful (scoped!)&lt;/td&gt;&lt;td&gt;Claude Opus 4.8, GPT-5.5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Not sure what you need&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Auto (recommended default)&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Copilot selects for you&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;GitHub Copilot's &lt;A href="https://docs.github.com/en/copilot/concepts/auto-model-selection" target="_blank" rel="noopener"&gt;Auto Model Selection&lt;/A&gt; feature automatically chooses a model based on task complexity, availability, and policies. For most students, &lt;STRONG&gt;Auto should be your default&lt;/STRONG&gt;; only switch manually when you have a specific reason. And when the complex task is done, switch back to Auto or a lighter model.&lt;/P&gt;
&lt;H2&gt;Strategy 3: Context is Currency, Smaller is Smarter&lt;/H2&gt;
&lt;P&gt;Here's the counterintuitive truth that surprises most developers: &lt;STRONG&gt;the expensive part of a prompt is usually not the question you type; it's everything surrounding it.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The tokens Copilot consumes on each request include:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;All your previous chat messages in the session&lt;/LI&gt;
&lt;LI&gt;Every file you have open or attached&lt;/LI&gt;
&lt;LI&gt;Workspace search results Copilot pulled in&lt;/LI&gt;
&lt;LI&gt;Build output, terminal logs, or diff content&lt;/LI&gt;
&lt;LI&gt;Responses from any MCP (Model Context Protocol) servers you have enabled&lt;/LI&gt;
&lt;LI&gt;Your custom instructions file (&lt;CODE&gt;.github/copilot-instructions.md&lt;/CODE&gt;)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;A single question inside a conversation with 80 messages, 12 open files, and 3 tool call results can cost significantly more than the same question asked fresh in a new chat with one relevant file attached.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;EM&gt;Figure 2: The same task asked two ways. Scope your prompts to save credits and often get better answers.&lt;/EM&gt;&lt;/P&gt;
&lt;H3&gt;Practical rules for context management:&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Attach only 2–3 relevant files&lt;/STRONG&gt; — not your entire project&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Don't ask Copilot to analyse the whole repo&lt;/STRONG&gt; when you only need changes in one module&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Paste only the first relevant error&lt;/STRONG&gt; from a log, not 2,000 lines of output&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Remove timestamps and duplicate stack traces&lt;/STRONG&gt; from pasted logs&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;State the expected output format&lt;/STRONG&gt; explicitly so the model stops early&lt;/LI&gt;
&lt;LI&gt;Use &lt;CODE&gt;/compact&lt;/CODE&gt; in VS Code Chat to summarise a long conversation without losing key context&lt;/LI&gt;
&lt;LI&gt;Use &lt;CODE&gt;/fork&lt;/CODE&gt; to explore an alternative direction without polluting the main conversation&lt;/LI&gt;
&lt;/UL&gt;
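&lt;P&gt;The log-related rules above can be automated before you paste anything into chat. The following is a minimal Python sketch, not an official tool: the helper name &lt;CODE&gt;trim_log&lt;/CODE&gt;, the ISO-like timestamp pattern, and the 40-line cap are all assumptions chosen for illustration, so adjust them to whatever your own logs look like.&lt;/P&gt;

```python
import re

# Assumed timestamp shape: an ISO-like prefix such as "2026-06-09 07:40:42,123 ".
TIMESTAMP = re.compile(
    r"^\[?\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?Z?\]?\s*"
)


def trim_log(text, max_lines=40):
    """Strip timestamps, drop repeated lines, and cap the result length."""
    seen = set()
    kept = []
    for raw in text.splitlines():
        line = TIMESTAMP.sub("", raw).rstrip()
        if not line or line in seen:
            continue  # skip blanks and lines that were already kept once
        seen.add(line)
        kept.append(line)
        if len(kept) == max_lines:
            break  # keep only the first part of the log
    return "\n".join(kept)


# Hypothetical log excerpt: the same error and stack frame logged twice.
sample = """2026-06-09 07:40:42,101 ERROR auth failed: token expired
2026-06-09 07:40:42,102   at verify (auth.ts:42)
2026-06-09 07:40:43,500 ERROR auth failed: token expired
2026-06-09 07:40:43,501   at verify (auth.ts:42)
"""
print(trim_log(sample))
```

&lt;P&gt;Removing the timestamps first is what lets the duplicate check work: two copies of the same stack trace only compare equal once their differing time prefixes are gone. Note that the pattern also swallows the whitespace after the timestamp, so indented stack frames lose their indentation.&lt;/P&gt;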
&lt;H2&gt;Strategy 4: Start Fresh Chats When You Change Tasks&lt;/H2&gt;
&lt;P&gt;This is one of the simplest optimisations, and one of the most ignored. The &lt;A href="https://code.visualstudio.com/docs/copilot/guides/optimize-usage" target="_blank" rel="noopener"&gt;VS Code Copilot usage guide&lt;/A&gt; is explicit about it: when a conversation grows, it carries context from all previous messages. If you switch to an unrelated task in the same session, the model still processes that irrelevant history and you pay for it in credits.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Bad pattern:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;Chat session:
  - "Help me fix the JWT bug in auth.ts"   [10 messages]
  - "Now write unit tests for my sorting algorithm"  [still in same chat!]
  - "Can you generate the README for my project?"    [still in same chat!]
  - "Now debug this CSS layout issue..."             [still in same chat!]&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Smart pattern:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;Chat 1: "Fix JWT bug in auth.ts" - DONE, close chat.
Chat 2: "Write unit tests for sorting algorithm" - DONE, close chat.
Chat 3: "Generate README for project" - fresh context, fresh cost.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;New task = new chat. Your human brain benefits too — focused sessions produce better outcomes than sprawling multi-topic conversations.&lt;/P&gt;
&lt;H2&gt;Strategy 5: Plan Before You Build and Use Agent Mode Wisely&lt;/H2&gt;
&lt;P&gt;Agent mode is one of the most powerful Copilot features for students working on larger assignments — it can create files, run terminal commands, edit across multiple files, and execute tests. But agent mode also carries the highest token cost, because it loops: it plans, acts, observes tool output, then plans again.&lt;/P&gt;
&lt;P&gt;The VS Code documentation &lt;A href="https://code.visualstudio.com/docs/copilot/guides/optimize-usage" target="_blank" rel="noopener"&gt;recommends&lt;/A&gt; separating planning from implementation to reduce rework and back-and-forth. Here's a phased approach that saves credits and produces better results:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;EM&gt;Figure 3: The credit-smart workflow. Always try the cheaper option first, escalate only when needed.&lt;/EM&gt;&lt;/P&gt;
&lt;H3&gt;Phase 1: Plan (lightweight model, low cost)&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;I need to add user authentication to my Express app.
Before writing any code, give me a step-by-step plan
covering which files to create, which packages to install,
and what tests to write. Do not write code yet.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Phase 2: Scoped Implementation (one feature at a time)&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;Using the plan we agreed, implement only Step 1:
create src/middleware/auth.ts with JWT validation.
Do not modify any other files yet.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Phase 3: Validate&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;Run the existing tests in tests/auth.test.ts
and report the results. Fix only test failures
related to the new auth middleware.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Phase 4: Cleanup&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;The implementation is complete.
Update README.md with setup instructions for the auth module.
Keep it under 200 words.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each phase is small, scoped, and verifiable. You can stop at any phase, check the result, and only continue when you're satisfied. This dramatically reduces expensive re-runs where the agent reverses its own changes.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Strategy 6: Review Your MCP Servers and Custom Instructions&lt;/H2&gt;
&lt;H3&gt;MCP Servers&lt;/H3&gt;
&lt;P&gt;MCP (Model Context Protocol) servers let Copilot connect to external tools: databases, GitHub issues, Jira, Slack, browser automation, and more. Each enabled server expands what the agent can do, but also adds to the context the model must consider, which increases token usage.&lt;/P&gt;
&lt;P&gt;For students, a practical rule: &lt;STRONG&gt;only enable MCP servers relevant to your current project&lt;/STRONG&gt;. If you're working on a simple Python web app, you probably don't need browser automation, a Kubernetes connector, and a Slack integration all active at the same time.&lt;/P&gt;
&lt;P&gt;See the &lt;A href="https://code.visualstudio.com/docs/copilot/customization/mcp-servers" target="_blank" rel="noopener"&gt;VS Code MCP servers documentation&lt;/A&gt; for how to enable, disable, and configure them.&lt;/P&gt;
&lt;H3&gt;Custom Instructions&lt;/H3&gt;
&lt;P&gt;A &lt;CODE&gt;.github/copilot-instructions.md&lt;/CODE&gt; file in your repository lets you give Copilot standing instructions — coding standards, testing commands, architecture conventions. This is a fantastic feature. But that file is &lt;STRONG&gt;included in every prompt's context&lt;/STRONG&gt;, so a bloated instructions file costs credits on every single interaction.&lt;/P&gt;
&lt;P&gt;A good custom instructions file is:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Short — under 200 words for a student project&lt;/LI&gt;
&lt;LI&gt;Specific to &lt;EM&gt;this&lt;/EM&gt; repository's real conventions&lt;/LI&gt;
&lt;LI&gt;Clear about test commands (e.g., &lt;CODE&gt;npm test&lt;/CODE&gt;, &lt;CODE&gt;pytest&lt;/CODE&gt;)&lt;/LI&gt;
&lt;LI&gt;Free of generic advice that applies to every codebase on earth&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Example of a good student instructions file:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Copilot Instructions for MyWebApp

Language: TypeScript (strict mode)
Framework: Express.js with Prisma ORM
Tests: Run with `npm test` (Jest)
Lint: Run with `npm run lint` (ESLint + Prettier)

Conventions:
- Use async/await, not callbacks
- Validate all request inputs with Zod
- Keep controllers thin; put logic in service files
- Write a test for every new public function&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That's it. Short, actionable, and genuinely useful — not a 500-line manifesto.&lt;/P&gt;
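&lt;P&gt;If you want to keep the file honest over time, a tiny check can flag when it grows past the 200-word guideline. This is a sketch only: the function names and the &lt;CODE&gt;WORD_BUDGET&lt;/CODE&gt; constant are hypothetical, and a whitespace word count is just a rough proxy for the tokens the model actually receives.&lt;/P&gt;

```python
from pathlib import Path

WORD_BUDGET = 200  # the guideline above for a student project


def instruction_word_count(text):
    """Count whitespace-separated words in the instructions text."""
    return len(text.split())


def check_instructions(path=".github/copilot-instructions.md"):
    """Return (word_count, within_budget) for the file at path."""
    count = instruction_word_count(Path(path).read_text(encoding="utf-8"))
    return count, WORD_BUDGET >= count
```

&lt;P&gt;Calling &lt;CODE&gt;check_instructions()&lt;/CODE&gt; from the repository root returns the word count and a boolean, which you could print by hand or wire into a pre-commit step.&lt;/P&gt;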
&lt;HR /&gt;
&lt;H2&gt;Strategy 7: Use Traditional Tools First&lt;/H2&gt;
&lt;P&gt;AI is excellent for reasoning, explaining, planning, and connecting ideas. It is &lt;EM&gt;not&lt;/EM&gt; the right tool for every job. Before reaching for Copilot chat, ask yourself whether a traditional tool can answer your question faster, cheaper, and more reliably:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Compiler / type-checker&lt;/STRONG&gt; — to find type errors (TypeScript, mypy)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Linter&lt;/STRONG&gt; — to find style and logic issues (ESLint, Pylint, Checkstyle)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Formatter&lt;/STRONG&gt; — to fix formatting (Prettier, Black, gofmt)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Test runner&lt;/STRONG&gt; — to confirm whether your code works (Jest, pytest, JUnit)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Debugger&lt;/STRONG&gt; — to step through execution and inspect state&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Docs / Stack Overflow&lt;/STRONG&gt; — for well-documented APIs and common patterns&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If your linter tells you there's a missing import, fix it directly — don't ask Copilot to analyse your code to find it. Let deterministic tools do deterministic work, and let AI do the reasoning where it genuinely adds value.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Your GitHub Education Benefits: What You Get&lt;/H2&gt;
&lt;P&gt;If you haven't already, apply for &lt;A href="https://education.github.com" target="_blank" rel="noopener"&gt;GitHub Education&lt;/A&gt; with your school email address. Once verified, you receive:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Free GitHub Copilot&lt;/STRONG&gt; including premium features — see &lt;A href="https://docs.github.com/en/copilot/how-tos/copilot-on-github/set-up-copilot/enable-copilot/set-up-for-students" target="_blank" rel="noopener"&gt; how to enable Copilot as a student &lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Free GitHub Codespaces&lt;/STRONG&gt; — 180 core hours per month, equivalent to GitHub Pro (great for browser-based coding with Copilot built in)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;GitHub Student Developer Pack&lt;/STRONG&gt; — free access to dozens of professional tools from GitHub's partners, including cloud credits, domains, and IDEs&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;GitHub Classroom&lt;/STRONG&gt; — your instructors can manage assignments and provide feedback&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;GitHub Community Exchange&lt;/STRONG&gt; — discover and contribute to student-built projects&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Campus Experts program&lt;/STRONG&gt; — become a student leader in your tech community&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These benefits are designed to give you real-world tools in an educational setting. Copilot is the standout feature — it's the same tool professional developers use every day. Using it wisely during your studies means you'll arrive in the workforce already ahead of the curve.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Pre-Prompt Checklist for Students&lt;/H2&gt;
&lt;P&gt;Before you fire off your next Copilot prompt, run through this checklist. It takes 10 seconds and can save significant credits — and more importantly, it builds the mental habits of a professional AI user.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;EM&gt;Figure 4: Two-column checklist covering what to check before opening chat and when writing your prompt.&lt;/EM&gt;&lt;/P&gt;
&lt;H3&gt;Before you open chat:&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;☐ Can Tab / autocomplete solve this?&lt;/LI&gt;
&lt;LI&gt;☐ Is inline edit (Ctrl+I) enough for this local change?&lt;/LI&gt;
&lt;LI&gt;☐ Can a linter, compiler, or test runner answer this?&lt;/LI&gt;
&lt;LI&gt;☐ Is this a different task from my last message? If so, start a new chat.&lt;/LI&gt;
&lt;LI&gt;☐ Am I on Auto model selection (or the right tier for this task)?&lt;/LI&gt;
&lt;LI&gt;☐ Should I ask for a plan before asking for code?&lt;/LI&gt;
&lt;LI&gt;☐ Do I have MCP servers enabled that I don't need right now?&lt;/LI&gt;
&lt;LI&gt;☐ Is my &lt;CODE&gt;copilot-instructions.md&lt;/CODE&gt; file concise and current?&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;When writing your prompt:&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;☐ Attach only 2–3 relevant files, not the whole project&lt;/LI&gt;
&lt;LI&gt;☐ Paste only the first relevant error from any logs&lt;/LI&gt;
&lt;LI&gt;☐ Define the files to change, the goal, and any files &lt;EM&gt;not&lt;/EM&gt; to touch&lt;/LI&gt;
&lt;LI&gt;☐ Ask for a plan before implementation on complex tasks&lt;/LI&gt;
&lt;LI&gt;☐ Remove timestamps and duplicate stack traces from pasted logs&lt;/LI&gt;
&lt;LI&gt;☐ State the expected output format and length&lt;/LI&gt;
&lt;LI&gt;☐ Use &lt;CODE&gt;/compact&lt;/CODE&gt; if the session is getting long&lt;/LI&gt;
&lt;LI&gt;☐ Use &lt;CODE&gt;/fork&lt;/CODE&gt; to explore alternatives without polluting the main thread&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H2&gt;A Note on Responsible AI Use in Education&lt;/H2&gt;
&lt;P&gt;Using Copilot smartly is not just about saving credits; it's about developing genuine skills. When you ask Copilot to write all your code without understanding it, you lose the learning opportunity the assignment was designed to create. When you review and understand every suggestion Copilot makes, you learn faster, build better instincts, and can confidently explain your own work.&lt;/P&gt;
&lt;P&gt;Best practices for academic integrity with AI tools:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Understand before you accept&lt;/STRONG&gt; — never paste code you can't explain&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use Copilot to learn, not to skip learning&lt;/STRONG&gt; — ask it to explain the code it generates&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Follow your institution's AI policy&lt;/STRONG&gt; — many universities have specific guidance on AI use in assessments&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Treat Copilot as a senior pair-programmer&lt;/STRONG&gt;, not an answer machine — question its suggestions, push back, iterate&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Verify facts and documentation links&lt;/STRONG&gt; — AI can hallucinate; always check official sources&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;GitHub Education exists to give you real professional tools while you learn. The goal is for you to graduate with genuine skills, a real portfolio, and the confidence that comes from building things yourself — with AI as your collaborator, not your ghostwriter.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Key Takeaways&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Tab first&lt;/STRONG&gt; — autocomplete and Next Edit Suggestions are free; use them for everything small&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Auto model by default&lt;/STRONG&gt; — only switch to a powerful model when you have a clear reason&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Context is cost&lt;/STRONG&gt; — fewer files, fewer messages, fewer tools = fewer tokens&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;New task = new chat&lt;/STRONG&gt; — don't carry stale context into unrelated work&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Plan before you build&lt;/STRONG&gt; — a 10-message plan session is cheaper than 50 messages of rework&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Keep instructions short&lt;/STRONG&gt; — your &lt;CODE&gt;copilot-instructions.md&lt;/CODE&gt; runs on every prompt&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use traditional tools first&lt;/STRONG&gt; — linters and compilers are free, fast, and deterministic&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Understand your code&lt;/STRONG&gt; — Copilot is a collaborator, not a replacement for learning&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H2&gt;Resources and Next Steps&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://education.github.com" target="_blank" rel="noopener"&gt;GitHub Education&lt;/A&gt; — apply for your free student benefits&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://education.github.com/pack" target="_blank" rel="noopener"&gt;GitHub Student Developer Pack&lt;/A&gt; — explore free tools for students&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.github.com/en/copilot/how-tos/copilot-on-github/set-up-copilot/enable-copilot/set-up-for-students" target="_blank" rel="noopener"&gt; Enable GitHub Copilot as a student &lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing" target="_blank" rel="noopener"&gt; GitHub Copilot: Models and Pricing &lt;/A&gt; — understand exactly what each model costs&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.github.com/en/copilot/concepts/auto-model-selection" target="_blank" rel="noopener"&gt; Auto Model Selection in GitHub Copilot &lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://code.visualstudio.com/docs/copilot/guides/optimize-usage" target="_blank" rel="noopener"&gt; VS Code: Optimising GitHub Copilot Usage &lt;/A&gt; — the official guide that inspired many of these tips&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://code.visualstudio.com/docs/copilot/customization/mcp-servers" target="_blank" rel="noopener"&gt; Managing MCP Servers in VS Code &lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://elbruno.com/2026/06/04/github-copilot-and-tokens-how-to-keep-using-ai-without-burning-your-budget-in-three-prompts-some-personal-lessons-learned/" target="_blank" rel="noopener"&gt; El Bruno: GitHub Copilot and Tokens (the original professional perspective) &lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://github.com/orgs/community/discussions/categories/github-education" target="_blank" rel="noopener"&gt; GitHub Education Community Discussions &lt;/A&gt; — connect with students and educators worldwide&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;EM&gt;This post draws on insights from &lt;A href="https://elbruno.com/2026/06/04/github-copilot-and-tokens-how-to-keep-using-ai-without-burning-your-budget-in-three-prompts-some-personal-lessons-learned/" target="_blank" rel="noopener"&gt;El Bruno's developer blog&lt;/A&gt; and best practices from &lt;A href="https://education.github.com" target="_blank" rel="noopener"&gt;GitHub Education&lt;/A&gt;. All pricing figures are sourced from the &lt;A href="https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing" target="_blank" rel="noopener"&gt;official GitHub Copilot billing documentation&lt;/A&gt; and are correct as of June 2026.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jun 2026 07:40:42 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/make-your-copilot-credits-count-a-student-s-guide-to-smarter-ai/ba-p/4526668</guid>
      <dc:creator>Lee_Stott</dc:creator>
      <dc:date>2026-06-09T07:40:42Z</dc:date>
    </item>
    <item>
      <title>Communication with parents in Teams / Teams Community</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-teams-for-education/communication-with-parents-in-teams-teams-communitu/m-p/4525231#M1187</link>
      <description>&lt;P&gt;We have Teams chats set up with families in Teams for Education, but for some reason parents (i.e. external users) are being removed, and we can't tell who or what is removing them.&lt;/P&gt;&lt;P&gt;This is causing a lot of issues with messages between school and home.&lt;/P&gt;&lt;P&gt;Can you help?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jun 2026 18:14:29 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-teams-for-education/communication-with-parents-in-teams-teams-communitu/m-p/4525231#M1187</guid>
      <dc:creator>FmGonzalez</dc:creator>
      <dc:date>2026-06-03T18:14:29Z</dc:date>
    </item>
    <item>
      <title>Announcing the 2026 Imagine Cup World Champion</title>
      <link>https://techcommunity.microsoft.com/t5/student-developer-blog/announcing-the-2026-imagine-cup-world-champion/ba-p/4524686</link>
      <description>&lt;P&gt;On June 2 at &lt;A class="lia-external-url" href="https://build.microsoft.com/en-US/home" target="_blank" rel="noopener"&gt;Microsoft Build&lt;/A&gt;, student founders from around the world watched as this year’s winner was revealed live on stage. Hosted by &lt;A class="lia-external-url" href="https://www.linkedin.com/in/donasarkar/" target="_blank" rel="noopener"&gt;Dona Sarkar&lt;/A&gt; of Microsoft Enterprise AI Advocacy, the Imagine Cup World Championship brought together a global audience of developers, founders, investors, and technology leaders as &lt;A class="lia-external-url" href="https://www.linkedin.com/in/hanscyang/" target="_blank" rel="noopener"&gt;Hans Yang&lt;/A&gt;, Vice President of Microsoft for Startups and Microsoft Learn Programs, announced the 2026 Imagine Cup World Champion. The final three startups showcased bold solutions to meaningful problems through AI.&lt;/P&gt;
&lt;P&gt;Built by &lt;A class="lia-external-url" href="https://www.linkedin.com/in/patrick-brown8/" target="_blank" rel="noopener"&gt;Patrick Brown&lt;/A&gt; from the University of Oxford, CopyFlag enables creators to protect their original work in the age of generative AI. Using Azure-powered AI models, the platform detects copied and AI-modified content across the internet, helping creators identify infringement faster and automate takedowns at scale.&lt;/P&gt;
&lt;P&gt;The winning startup received $150,000 USD, a mentorship session with &lt;A class="lia-external-url" href="https://www.instagram.com/microsoft/" target="_blank"&gt;Microsoft&lt;/A&gt; Chairman and CEO &lt;A class="lia-external-url" href="https://www.linkedin.com/in/satyanadella/" target="_blank"&gt;Satya Nadella&lt;/A&gt;, and the opportunity to continue growing through &lt;A class="lia-external-url" href="https://www.microsoft.com/en-us/startups" target="_blank" rel="noopener"&gt;Microsoft for Startups&lt;/A&gt;. The World Champion also received an additional 6 months and another $5,000 USD in credits from &lt;A class="lia-external-url" href="https://replit.com/" target="_blank" rel="noopener"&gt;Replit&lt;/A&gt; to continue building what comes next.&lt;/P&gt;
&lt;P&gt;But the World Championship was never just about one moment on stage. It represented months of iteration, collaboration, learning, and resilience from student founders building solutions with real-world impact.&lt;/P&gt;
&lt;P&gt;This year’s finalists showed how students are applying AI across industries including creator protection, healthcare, and supply chain intelligence: not just imagining what technology could become, but actively building it.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Meet the 2026 Imagine Cup World Champion: CopyFlag&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.copyflag.com/" target="_blank" rel="noopener"&gt;CopyFlag&lt;/A&gt; is redefining how creators protect their work online. As AI-generated and modified content continues to scale, creators often lack the tools to identify where original work is being copied, altered, or redistributed.&lt;/P&gt;
&lt;P&gt;CopyFlag uses Azure-based AI models to detect both direct copies and AI-modified derivatives, helping creators trace misuse across platforms and automate enforcement workflows in minutes.&lt;/P&gt;
&lt;P&gt;What began through Patrick’s personal experience of having his own work copied online evolved into a mission to make advanced protection tools accessible beyond the world’s largest brands and enterprises.&lt;/P&gt;
&lt;P&gt;By combining AI, pattern recognition, and scalable detection systems, CopyFlag is building toward a future where creators can focus more on creating and less on defending their work.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Celebrating This Year’s World Finalists&lt;/STRONG&gt;&lt;/H4&gt;
&lt;H4&gt;&lt;STRONG&gt;Revora Health&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;&lt;EM&gt;Carnegie Mellon University - United States 🇺🇸&lt;/EM&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Founded by &lt;A class="lia-external-url" href="https://www.linkedin.com/in/suryakukkapalli/" target="_blank" rel="noopener"&gt;Surya Kukkapalli&lt;/A&gt;, &lt;A class="lia-external-url" href="https://revora.health/" target="_blank" rel="noopener"&gt;Revora Health&lt;/A&gt; is reimagining physical therapy and recovery through AI-guided care. The platform connects patients to licensed providers in minutes while delivering real-time multimodal support powered through Azure Health Data Services and AI-driven computer vision.&lt;/P&gt;
&lt;P&gt;Built from Surya’s firsthand experience working with clients navigating injury recovery, Revora Health focuses on making specialized care more accessible, connected, and continuous beyond the clinic setting.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;SpoilSafe&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;&lt;EM&gt;Carnegie Mellon University - United States 🇺🇸&lt;/EM&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Built by &lt;A class="lia-external-url" href="https://www.linkedin.com/in/advikav/" target="_blank" rel="noopener"&gt;Advika Vuppala&lt;/A&gt;, &lt;A class="lia-external-url" href="https://www.linkedin.com/in/rohanganesh7/" target="_blank" rel="noopener"&gt;Rohan Ganesh&lt;/A&gt;, and &lt;A class="lia-external-url" href="https://www.linkedin.com/in/troy-mcbride-6a29a1300/" target="_blank" rel="noopener"&gt;Troy McBride&lt;/A&gt;, &lt;A class="lia-external-url" href="https://spoilsafe.tech/" target="_blank" rel="noopener"&gt;SpoilSafe&lt;/A&gt; is bringing predictive intelligence into the food supply chain.&lt;/P&gt;
&lt;P&gt;Using sensor data processed through Azure IoT Hub, Azure Data Explorer, Azure Synapse Analytics, and Azure AI Foundry, SpoilSafe helps teams estimate freshness in real time, enabling smarter operational decisions and reducing food waste before spoilage occurs.&lt;/P&gt;
&lt;P&gt;Their work represents a shift from reactive decision-making toward more resilient, data-driven systems across the global food ecosystem.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;More Than a Competition&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://imaginecup.microsoft.com/en-us" target="_blank" rel="noopener"&gt;Imagine Cup&lt;/A&gt; is a global stage for student founders building commercially viable startups on Microsoft’s cloud and AI platforms.&lt;/P&gt;
&lt;P&gt;From mentorship and technical skilling to investor visibility and startup support, the program helps students move from ideas to scalable businesses while learning how modern teams build, iterate, and grow. Through Microsoft for Startups, founders continue receiving the tools, support, and opportunities needed to keep building beyond the competition.&lt;/P&gt;
&lt;P&gt;The future is being built by students bold enough to solve problems that are good for business and good for the world. And this year’s finalists proved what’s possible when great ideas meet the right mentorship, technology, and momentum.&lt;/P&gt;
&lt;P&gt;Think you could be next? The 2027 Imagine Cup is officially underway. Use this summer to build your MVP, refine your idea, and explore what it takes to turn your startup into something bigger. &lt;A class="lia-external-url" href="https://imaginecup.microsoft.com/en-us" target="_blank" rel="noopener"&gt;Learn more&lt;/A&gt; and get started today.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jun 2026 16:31:19 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/student-developer-blog/announcing-the-2026-imagine-cup-world-champion/ba-p/4524686</guid>
      <dc:creator>StudentDeveloperTeam</dc:creator>
      <dc:date>2026-06-02T16:31:19Z</dc:date>
    </item>
    <item>
      <title>Evaluating the Evaluator: How to Test an LLM Judge with Microsoft Agent Framework</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/evaluating-the-evaluator-how-to-test-an-llm-judge-with-microsoft/ba-p/4516639</link>
      <description>&lt;P&gt;&lt;STRONG&gt;The four verdicts, up front&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;PRE&gt;Consistency: mean CV across posts &lt;STRONG&gt;5.30%&lt;/STRONG&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;PRE&gt;Pipeline format checks: pipeline pass rate &lt;STRONG&gt;100%&lt;/STRONG&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;PRE&gt;Rubric adherence (strict judge): &lt;STRONG&gt;5.00 / 5&lt;/STRONG&gt;, mean math drift &lt;STRONG&gt;0.05 pts&lt;/STRONG&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;PRE&gt;Calibration vs. labels: &lt;STRONG&gt;Pearson r = 0.51, MAE = 22.9 pts&lt;/STRONG&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Three of those say the model is healthy. The last one is the only one that compared the model against anything real, and it tells a different story.&lt;/P&gt;
&lt;H2&gt;Where we left off&lt;/H2&gt;
&lt;P&gt;In &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/educatordeveloperblog/creating-a-fun-multi-agent-content-strategy-system-with-microsoft-agent-framewor/4495105" data-lia-auto-title="Post 1" data-lia-auto-title-active="0" target="_blank"&gt;Post 1&lt;/A&gt; I built &lt;STRONG&gt;Viral or Fail&lt;/STRONG&gt;, three Microsoft Agent Framework agents that pressure-test a gaming social post before you publish it. A Content Creator drafts the post, an Algorithm Simulator scores it the way a recommendation system might, and an Audience Persona reacts the way an actual viewer would. The whole thing runs on the GitHub Models free tier, with no paid API keys.&lt;/P&gt;
&lt;P&gt;That post ended on a cliffhanger I left deliberately open. The Algorithm Simulator scored the post 75/100, but how do I know the Simulator itself is any good? How consistent are its scores? Do they track real engagement? Would a human social strategist agree with its rubric weights?&lt;/P&gt;
&lt;P&gt;This post answers that empirically. I built four tests: consistency, pipeline format checks, rubric adherence, and calibration. Three came back healthy. The fourth caught a problem structural enough that it changed how I think about evaluating LLM judges in general.&lt;/P&gt;
&lt;P&gt;The surprising part isn't that the model failed somewhere, but that it passed the three tests you naturally reach for first and failed only the one most people skip.&lt;/P&gt;
&lt;H2&gt;Why I built my own harness&lt;/H2&gt;
&lt;P&gt;The Microsoft Agent Framework ships a real evaluation surface. You get evaluate_agent, LocalEvaluator, an @evaluator decorator, and the EvalItem / EvalResults data types. It's well designed, and for production agents it's the right choice.&lt;/P&gt;
&lt;P&gt;It also pairs most naturally with Azure AI Foundry. The path of least resistance assumes you already have an Azure project, a model deployment, and the budget for cloud-tier LLM-as-judge calls. Post 1 went the other way on purpose: zero paid keys, GitHub Models free tier only. To keep that footing, I wrote a small in-house harness that mirrors the call shape of evaluate_agent.&lt;/P&gt;
&lt;P&gt;The framework's evaluation surface is provider-agnostic in principle, but it leans toward Azure in practice. What the SDK hands you for free on Azure, you can rebuild for yourself on GitHub Models, as you will see shortly, and the patterns transfer directly when you upgrade.&lt;/P&gt;
&lt;P&gt;The harness is one file, roughly 150 lines. The trick that makes it more than a wrapper is that it tries to import the SDK's primitives first and only defines its own if they aren't there yet:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;LI-CODE lang="python"&gt;try:
    from agent_framework import EvalItem, EvalResults, evaluator
    _USING_SDK_PRIMITIVES = True
except ImportError:
    # agent-framework-core==1.0.0rc1 doesn't ship these yet,
    # so we define local equivalents with the same shape.
    @dataclass
    class EvalItem:
        query: str
        response: str
        expected_output: str | None = None
        scores: dict[str, float] = field(default_factory=dict)
        repetition: int = 0
    # ... EvalResults, evaluator defined the same way&lt;/LI-CODE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The day Microsoft ships these types, the suite picks them up with no code change. An evaluator looks like this:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;LI-CODE lang="python"&gt;@evaluator
def correlates_with_truth(response: str, expected_output: str | None) -&amp;gt; float:
    sim = parse_weighted_total(response)
    if sim is None or expected_output is None:
        return 0.0
    truth = float(expected_output)
    return 1.0 - (abs(sim - truth) / 100.0)&lt;/LI-CODE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;If you've used the SDK's @evaluator, you've used this one. Same parameter-name dispatch (query, response, expected_output), same return convention (a float in [0, 1]). The runner wraps a retry-aware async loop around a list of these. GitHub Models caps this model at about 15 requests per minute, so the loop sleeps 4.5 seconds between calls (roughly 12 to 13 calls a minute, comfortably under the cap). When it does hit a 429, it waits at least 30 seconds rather than using the short exponential backoff it applies to ordinary transient failures. Boring glue code, and important glue code.&lt;/P&gt;
&lt;P&gt;When you eventually move to Azure, you swap runner.run(...) for evaluate_agent(...) and nothing else in your codebase has to change.&lt;/P&gt;
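&lt;P&gt;The pacing and retry behaviour described above can be sketched roughly as follows. This is not the repo's runner: RateLimitError, call_with_retry, run_all, and every parameter name are hypothetical stand-ins, and the growing 30-second wait is one possible reading of "at least 30 seconds".&lt;/P&gt;

```python
import asyncio


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 from the model endpoint."""


async def call_with_retry(call, *, sleep=asyncio.sleep, max_attempts=4,
                          rate_limit_wait=30.0, base_backoff=1.0):
    """Run `call()`; wait long on 429s, back off exponentially otherwise."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            # Rate limits clear slowly: 30 s, then 60 s, then 90 s.
            delay = rate_limit_wait * (attempt + 1)
        except ConnectionError:
            # Ordinary transient failure: 1 s, 2 s, 4 s.
            delay = base_backoff * (2 ** attempt)
        if attempt == max_attempts - 1:
            raise RuntimeError("call failed after retries")
        await sleep(delay)


async def run_all(calls, *, pace=4.5, sleep=asyncio.sleep):
    """Run calls one at a time, pausing `pace` seconds between them."""
    results = []
    for index, call in enumerate(calls):
        if index:  # no pause before the first call
            await sleep(pace)
        results.append(await call_with_retry(call, sleep=sleep))
    return results
```

&lt;P&gt;Passing sleep in as a parameter is a design choice for testability: a test can substitute a fake sleep and assert on the recorded delays without actually waiting.&lt;/P&gt;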
&lt;H2&gt;What 'good' even means for a judge&lt;/H2&gt;
&lt;P&gt;Before running anything, it's worth being precise about what "good" even means for a judge agent. There are four versions of it, and they split into two camps.&lt;/P&gt;
&lt;P&gt;The first three are process checks. They probe the model against itself. No external reference data, just the model and its own outputs.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Consistency&lt;/EM&gt; means same input, same output. Run the Simulator twice on the same post and the scores should land in roughly the same place. If they don't, the score is noise.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Pipeline format checks&lt;/EM&gt; ask whether each agent followed its required output shape. Did the Creator produce platform-native text? Did the Simulator emit a parseable weighted total? Did the Persona stay in character? These are the cheapest tests of all, just regex and keyword matching, no LLM judge needed.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Rubric adherence&lt;/EM&gt; is harder. The Simulator's prompt asks it to score five weighted criteria and report a weighted total. Did it actually do that, or did it list the criteria and then invent a number? Checking this needs an LLM. The cloud-tier equivalent is FoundryEvals.TaskAdherence, and I'll build the free-tier version below.&lt;/P&gt;
&lt;P&gt;The fourth check is a different animal: validation against ground truth. Calibration asks whether the Simulator's scores correlate with real engagement. It's the same operation you'd run on any predictive model: predict, compare against a labeled set, report correlation and error. It's the only check that tells you the model is &lt;EM&gt;correct&lt;/EM&gt; rather than merely consistent and well-formatted, and it's the only one that needs data the model didn't produce.&lt;/P&gt;
&lt;P&gt;That's the thesis of this post, and the reason the order matters. The three process checks can all come back green and still tell you nothing about the validation result. And because validation needs ground truth, the design of the ground-truth dataset becomes part of the result. I'll be explicit about that when we get there.&lt;/P&gt;
&lt;H2&gt;The posts under test&lt;/H2&gt;
&lt;P&gt;Every test runs against the same thing: a 10-post golden dataset of gaming social posts I wrote and hand-labeled. Each entry carries the post content, its real-world engagement numbers, a normalized engagement_score from 0 to 100, and a label (viral, decent, flop, or outlier). Here's the viral Valorant post that Test 1 keeps referencing, in full:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;LI-CODE lang="json"&gt;{
  "id": "post_001",
  "platform": "Twitter/X",
  "topic": "Valorant Champions 2025",
  "content": "EG winning Champions 2025 was the most underrated moment in Valorant esports history and people still don't talk about it enough.\n\nDemon1 carried that grand final on a level we won't see again until at least Champions '26. The map veto into Bind alone deserves a documentary.",
  "real_engagement": {
    "impressions": 2100000,
    "likes": 45000,
    "shares_or_retweets": 8000,
    "replies_or_comments": 1200,
    "engagement_rate_pct": 2.58
  },
  "engagement_score": 82,
  "label": "viral",
  "notes": "Hot take + esports nostalgia + named callout (Demon1) drove QRTs from competing fanbases."
}&lt;/LI-CODE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The full set is in the repo at &lt;A href="https://github.com/HamidOna/viral-or-fail/blob/main/evals/golden_dataset.json" target="_blank" rel="noopener"&gt;evals/golden_dataset.json&lt;/A&gt;: two viral hits, four decent posts, three flops, and one outlier, across Twitter/X, TikTok, YouTube, and Instagram.&lt;/P&gt;
&lt;H2&gt;Test 1: Consistency&lt;/H2&gt;
&lt;P&gt;The easiest test to write. Run the Simulator ten times on the same post with identical input. Compute the mean, standard deviation, and coefficient of variation. Repeat across five posts spanning viral, decent, flop, and outlier labels.&lt;/P&gt;
&lt;P&gt;The harness call is one line:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;LI-CODE lang="python"&gt;runner = EvalRunner(rate_limit_sleep=4.5)   # 12 RPM, under the cap

results = await runner.run(
    agent=agent,
    queries=[_build_simulator_prompt(p) for p in selected],
    evaluators=[weighted_total_score],       # parses the score out of each run
    num_repetitions=NUM_REPETITIONS,         # 10
)&lt;/LI-CODE&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Fifty Simulator runs in total. Group by query, compute std/mean per post, then average the resulting CVs.&lt;/P&gt;
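&lt;P&gt;The grouping step can be sketched in a few lines. This is a minimal version under two assumptions: the parsed scores arrive as (post_id, score) pairs, and the spread is the sample standard deviation. The repo may use the population form instead, and mean_cv is a hypothetical name.&lt;/P&gt;

```python
from collections import defaultdict
from statistics import mean, stdev


def mean_cv(runs):
    """runs: iterable of (post_id, score) pairs, one per Simulator repetition."""
    by_post = defaultdict(list)
    for post_id, score in runs:
        by_post[post_id].append(score)
    # CV = sample standard deviation divided by the mean, per post.
    per_post = {pid: stdev(scores) / mean(scores)
                for pid, scores in by_post.items()}
    # Average the per-post CVs to get the headline figure.
    return per_post, mean(per_post.values())
```

&lt;P&gt;Multiply the returned values by 100 to report them as percentages.&lt;/P&gt;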
&lt;img /&gt;
&lt;P&gt;Mean coefficient of variation across the five posts: 5.30%. With the rubric pinned, the Simulator is meaningfully non-deterministic, but it isn't chaotic. Most scores cluster within about four points of the per-post mean.&lt;/P&gt;
&lt;P&gt;That's the headline, and it's fine.&lt;/P&gt;
&lt;P&gt;Now look at the chart again. post_001 (the viral Valorant Champions post, mean 70.3) and post_003 (the decent Steam Deck OLED post, mean 72.4) sit at almost the same place on the x-axis. The decent post averages slightly &lt;EM&gt;higher&lt;/EM&gt; than the viral one. Across ten reps each. Twenty data points, and the Simulator can't reliably tell which one is supposed to be the success. If you trace the mean diamonds left to right, the decent post outranks both viral posts.&lt;/P&gt;
&lt;P&gt;A consistency test won't flag this as a problem, because the Simulator is being consistent. It consistently rates these two posts in the same band. The problem is what that band &lt;EM&gt;means&lt;/EM&gt;. If consistency were your only check, you'd close the laptop and ship.&lt;/P&gt;
&lt;P&gt;Hold onto that. It comes back.&lt;/P&gt;
&lt;H2&gt;Test 4: Pipeline format checks&lt;/H2&gt;
&lt;P&gt;Now zoom out from a single agent and run the full Viral or Fail pipeline (Creator, then Simulator, then Persona) on five live trending gaming topics, applying format-level checks to each agent's output.&lt;/P&gt;
&lt;P&gt;The checks are deliberately cheap. For the Creator: does the output contain Twitter/X-native vocabulary (the keyword list looks for things like &lt;EM&gt;thread&lt;/EM&gt;, &lt;EM&gt;ratio&lt;/EM&gt;, &lt;EM&gt;QRT&lt;/EM&gt;, &lt;EM&gt;take&lt;/EM&gt;, &lt;EM&gt;based&lt;/EM&gt;)? For the Simulator: is there a parseable weighted total between 0 and 100? For the Persona (TryHard_Tyler, the competitive esports fan, in this run): does the output use any of the persona's keywords, like &lt;EM&gt;diff&lt;/EM&gt;, &lt;EM&gt;cope&lt;/EM&gt;, &lt;EM&gt;goated&lt;/EM&gt;, &lt;EM&gt;ratio&lt;/EM&gt;, &lt;EM&gt;cap&lt;/EM&gt;?&lt;/P&gt;
&lt;P&gt;Five topics, fetched live from Google Trends: xbox game pass, the hobbit mtg collector booster, crimson desert patch notes, xbox, olden era steam.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Per-agent pass rate: Creator 100%, Simulator 100%, Persona 100%. Pipeline pass rate 100%.&lt;/P&gt;
&lt;P&gt;The format checks are doing their job. Every agent produced output in the shape it was supposed to, on every topic. No regex misses, no missing weighted totals, no out-of-character personas.&lt;/P&gt;
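&lt;P&gt;For a sense of how little machinery these checks need, here is a sketch using the keyword lists quoted above. It assumes whole-word, case-insensitive matching and a total written as "N/100"; the repo's actual matching rules may differ, and has_keyword and check_pipeline are hypothetical names.&lt;/P&gt;

```python
import re

CREATOR_KEYWORDS = ("thread", "ratio", "qrt", "take", "based")
PERSONA_KEYWORDS = ("diff", "cope", "goated", "ratio", "cap")
# A number from 0 to 100 followed by "/100", for example "73.25/100".
TOTAL_RE = re.compile(r"\b(100(?:\.0+)?|\d{1,2}(?:\.\d+)?)\s*/\s*100\b")


def has_keyword(text, keywords):
    """True when any keyword appears as a whole word, ignoring case."""
    pattern = r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def check_pipeline(creator_out, simulator_out, persona_out):
    """Return per-agent pass/fail plus an overall pipeline verdict."""
    checks = {
        "creator": has_keyword(creator_out, CREATOR_KEYWORDS),
        "simulator": TOTAL_RE.search(simulator_out) is not None,
        "persona": has_keyword(persona_out, PERSONA_KEYWORDS),
    }
    checks["pipeline"] = all(checks.values())
    return checks
```

&lt;P&gt;Whole-word matching keeps "take" from firing on "mistake", which matters when the keyword list is this short.&lt;/P&gt;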
&lt;P&gt;This is the point where, if you'd only run consistency and pipeline checks, you'd write the triumphant report. "Our agents are reliable. CV under 6%. Pipeline pass rate 100%. Ship it." That report would be true. It would also be wrong about whether the model is correct, because format adherence is not output validity. Keep going.&lt;/P&gt;
&lt;H2&gt;Test 3: Rubric adherence, and a free-tier LLM-as-judge&lt;/H2&gt;
&lt;P&gt;Format checks tell you what the output &lt;EM&gt;looks&lt;/EM&gt; like. Rubric adherence asks whether the Simulator actually did the work it was prompted to do: score five weighted criteria, sum them correctly, and explain each score with platform-mechanic reasoning rather than vibes.&lt;/P&gt;
&lt;P&gt;There's no regex for that. You need an LLM to read the Simulator's full evaluation and judge whether it followed its own rubric. That's an LLM-as-judge, and the cloud-tier equivalent is FoundryEvals.TaskAdherence on Azure. Since we're staying free, I built it.&lt;/P&gt;
&lt;P&gt;The judge is just another Agent with a stricter system prompt:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;LI-CODE lang="python"&gt;JUDGE_SYSTEM_PROMPT = """You are a Rubric Adherence Judge — strict and skeptical. You evaluate whether another AI agent ACTUALLY followed its scoring rubric, not just whether it produced output that looks like it did.

You will check three things, in order of severity (the strictest failing check sets the score):

A. MATHEMATICAL FIDELITY (most important).
   Compute sum(criterion_score × weight) yourself from the agent's per-criterion scores. Compare it to the agent's stated WEIGHTED TOTAL. If they differ by more than 2 points, the agent is doing the rubric wrong even if it looks correct on the surface. Report the difference as `math_diff`.

B. REASONING SPECIFICITY.
   Each criterion's justification must reference platform-specific algorithm mechanics — "FYP retention threshold", "QRT velocity", "average view duration". Generic praise ("strong hook", "good engagement") is GENERIC and lowers the score. Classify reasoning as "specific", "mixed", or "generic".

C. COVERAGE.
   Every criterion in the rubric must be explicitly scored. Missing criteria fail this check.
...
Be strict. Format-following ≠ rubric-following."""&lt;/LI-CODE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The full prompt is in the repo. The key decision is point A: the judge recomputes the math itself. That catches the failure where an agent lists every criterion with a score, but the weighted total it reports doesn't actually equal the weighted sum. That kind of quiet drift is exactly what format checks miss.&lt;/P&gt;
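&lt;P&gt;The arithmetic the judge is asked to redo is simple enough to cross-check deterministically. In the post the LLM judge does this itself; the sketch below only shows the computation, and the scores and weights are hypothetical values chosen to give weighted terms of 22.5, 15, 14, 12.75, and 9.&lt;/P&gt;

```python
def math_diff(criteria, stated_total):
    """criteria: list of (score, weight) pairs, with weights summing to 1."""
    recomputed = sum(score * weight for score, weight in criteria)
    return round(abs(recomputed - stated_total), 2)


# Hypothetical per-criterion scores and weights (not the real rubric).
criteria = [(75, 0.30), (60, 0.25), (70, 0.20), (85, 0.15), (90, 0.10)]
print(math_diff(criteria, 73.25))  # 0.0
print(math_diff(criteria, 78.0))   # 4.75
```

&lt;P&gt;The second call is the failure the judge is built to catch: every criterion is present and scored, yet the stated total sits 4.75 points from the weighted sum, well past the 2-point tolerance in the prompt.&lt;/P&gt;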
&lt;P&gt;The judge returns strict JSON: adherence_score (1 to 5), math_diff, reasoning_quality, criteria_present, missing_criteria, weight_drift, plus a few sentences of reasoning. Test 3 doesn't go through runner.run; it orchestrates the two agents by hand, one post at a time, so the judge sees the Simulator's full evaluation:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;LI-CODE lang="python"&gt;for post in posts:
    sim_text = await call_agent_with_retry(simulator, build_simulator_prompt(post))
    verdict = await judge.judge(
        rubric=PLATFORM_RULES[post["platform"]],
        post_content=post["content"],
        evaluation_output=sim_text,
    )&lt;/LI-CODE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Run across all 10 posts in the golden dataset, here's what comes back.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Mean adherence score: 5.00 / 5. Mean absolute math drift: 0.05 points (max 0.25). Reasoning quality classified "specific" on 100% of evaluations. Zero missing criteria, zero weight drift.&lt;/P&gt;
&lt;P&gt;This was not the result I expected. I built the judge to be strict on purpose, after my first version turned out too lenient (more on that in the bugs section). The strict version recomputes the weighted sum, classifies generic praise as a failure, and demands platform-mechanic citations. The Simulator passed every dimension anyway.&lt;/P&gt;
&lt;P&gt;The per-post reasoning is genuinely fun to read. On the Activision Blizzard flop, the judge noted that the Simulator's reasoning leaned on engagement velocity, quote-retweet incentives, topicality timing, and hashtag discoverability rather than generic praise. On the GTA 6 viral TikTok, it cited pattern interrupts, trending-cluster signals, and share-velocity drivers. That's the language I asked for, and the Simulator is producing it.&lt;/P&gt;
&lt;P&gt;So the Simulator does the rubric correctly. The math is right, the reasoning is specific, every criterion is covered. By every internal measure, it works.&lt;/P&gt;
&lt;P&gt;You can probably see where this is going. There's exactly one thing left to check, and it's the most expensive and most important one.&lt;/P&gt;
&lt;H2&gt;Test 2: Calibration, the reckoning&lt;/H2&gt;
&lt;P&gt;This one isn't a test in the same sense as the first three. They asked whether the model was malfunctioning. This asks whether it's &lt;EM&gt;correct&lt;/EM&gt;, which is a different question entirely, because it's the only one that needs data the model didn't produce.&lt;/P&gt;
&lt;P&gt;And because it's a validation, what I validate against matters as much as the model. So before running anything, here's exactly what the ground truth is: a 10-post golden dataset that I &lt;EM&gt;built&lt;/EM&gt;, not measured. I wrote the post content myself in platform-native style, then assigned each post an engagement_score from a back-of-envelope formula (impressions × engagement rate × shareability), calibrated against publicly observable performance for similar posts. The set spans two viral hits, four decent posts, three flops, and one deliberate outlier (a post that got ratio'd into orbit, with high reach and terrible reception).&lt;/P&gt;
&lt;P&gt;So when I show you a Pearson r in a moment, hold it loosely. The exact number is partly a function of how I designed the labels. The &lt;EM&gt;shape&lt;/EM&gt; of the failure (whether the Simulator's predictions cluster, spread, invert, or track the labels) is what's actually informative, because the shape doesn't depend on the labels being precise. It only depends on them being roughly ordered: viral out-ranks decent, decent out-ranks flop. Whether viral is 91 against flop 18, or viral 85 against flop 25, doesn't change which way the comparison runs.&lt;/P&gt;
&lt;P&gt;With that on the table: run the Simulator once per post, compute Pearson r and Spearman rho, compute MAE.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Pearson r = 0.51. Spearman rho = 0.52. MAE = 22.9 points.&lt;/P&gt;
&lt;P&gt;That r-value isn't a small problem. Here's what it means in practice, post by post:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Post&lt;/th&gt;&lt;th&gt;Topic&lt;/th&gt;&lt;th&gt;Label&lt;/th&gt;&lt;th&gt;Truth&lt;/th&gt;&lt;th&gt;Simulator&lt;/th&gt;&lt;th&gt;Error&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;001&lt;/td&gt;&lt;td&gt;Valorant Champions 2025&lt;/td&gt;&lt;td&gt;viral&lt;/td&gt;&lt;td&gt;82&lt;/td&gt;&lt;td&gt;69.75&lt;/td&gt;&lt;td&gt;12.25&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;002&lt;/td&gt;&lt;td&gt;GTA 6 reveal reaction&lt;/td&gt;&lt;td&gt;viral&lt;/td&gt;&lt;td&gt;91&lt;/td&gt;&lt;td&gt;65.75&lt;/td&gt;&lt;td&gt;25.25&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;003&lt;/td&gt;&lt;td&gt;Steam Deck OLED price&lt;/td&gt;&lt;td&gt;decent&lt;/td&gt;&lt;td&gt;55&lt;/td&gt;&lt;td&gt;71.00&lt;/td&gt;&lt;td&gt;16.00&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;004&lt;/td&gt;&lt;td&gt;Genshin Impact 5.0 pulls&lt;/td&gt;&lt;td&gt;decent&lt;/td&gt;&lt;td&gt;48&lt;/td&gt;&lt;td&gt;65.00&lt;/td&gt;&lt;td&gt;17.00&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;005&lt;/td&gt;&lt;td&gt;Hollow Knight: Silksong&lt;/td&gt;&lt;td&gt;decent&lt;/td&gt;&lt;td&gt;60&lt;/td&gt;&lt;td&gt;76.50&lt;/td&gt;&lt;td&gt;16.50&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;006&lt;/td&gt;&lt;td&gt;Xbox Showcase 2025&lt;/td&gt;&lt;td&gt;decent&lt;/td&gt;&lt;td&gt;42&lt;/td&gt;&lt;td&gt;74.75&lt;/td&gt;&lt;td&gt;32.75&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;007&lt;/td&gt;&lt;td&gt;Activision Blizzard acquisition&lt;/td&gt;&lt;td&gt;flop&lt;/td&gt;&lt;td&gt;18&lt;/td&gt;&lt;td&gt;59.50&lt;/td&gt;&lt;td&gt;41.50&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;008&lt;/td&gt;&lt;td&gt;5 games to play this weekend&lt;/td&gt;&lt;td&gt;flop&lt;/td&gt;&lt;td&gt;22&lt;/td&gt;&lt;td&gt;37.00&lt;/td&gt;&lt;td&gt;15.00&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;009&lt;/td&gt;&lt;td&gt;Pentiment retrospective&lt;/td&gt;&lt;td&gt;flop&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;td&gt;60.75&lt;/td&gt;&lt;td&gt;45.75&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;010&lt;/td&gt;&lt;td&gt;Concord shutdown post-mortem&lt;/td&gt;&lt;td&gt;outlier&lt;/td&gt;&lt;td&gt;50&lt;/td&gt;&lt;td&gt;57.00&lt;/td&gt;&lt;td&gt;7.00&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;The pattern is structural. The Simulator's natural output band is roughly 60 to 76. Posts that should clear 80 get pulled down to 65 to 70. Posts that should land below 25 get pulled up toward 60, with one flop (post_008, the "5 games this weekend" listicle) the only exception at 37. The model has an attractor zone in the middle of the scale and refuses to leave it.&lt;/P&gt;
&lt;P&gt;Look at the most accurate prediction in the table. It's post_010, the outlier (truth 50, Simulator 57, error 7). Why is it the most accurate? Because 50 happens to sit inside the attractor zone. The Simulator's bias accidentally cancels out for posts that are supposed to be average. It isn't accurate, it's wrong in a way that lands near the truth for one specific case.&lt;/P&gt;
&lt;P&gt;This was the test I almost didn't run. It needs labeled data, which is annoying to gather, and three out of four tests had already declared the model healthy. By every internal measure, the Simulator was working as designed.&lt;/P&gt;
&lt;P&gt;It just couldn't tell viral from decent. It rated the GTA 6 reaction TikTok (truth 91) at 66, and the Steam Deck OLED post (truth 55) at 71. The model is consistent, rubric-faithful, and format-stable, and on real cases it literally inverts virality and decentness.&lt;/P&gt;
&lt;P&gt;The shape of that failure (flops pulled up hard, by 15, 42, and 46 points; virals pulled down by 12 and 25; the whole range collapsed into a narrow band) is what survives the synthetic-label uncertainty. If the labels were simply inaccurate, you'd see scatter. A symmetric squeeze toward the middle requires the Simulator itself to be conservative. The Pearson r of 0.51 (p around 0.13, not significant on n = 10) is the number to hold loosely. The squeeze is the result. Running this against measured engagement metrics is the natural Post 3, and I'd expect the qualitative finding to hold.&lt;/P&gt;
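&lt;P&gt;Both the headline numbers and the squeeze can be recomputed from the ten rows of the table above using only the standard library. This is a sketch for checking the arithmetic, not the repo's calibration script. Spearman is computed as Pearson on ranks, which is valid here because neither column contains ties.&lt;/P&gt;

```python
from math import sqrt
from statistics import mean

# (label, truth, simulator) rows copied from the table above.
ROWS = [
    ("viral", 82, 69.75), ("viral", 91, 65.75), ("decent", 55, 71.00),
    ("decent", 48, 65.00), ("decent", 60, 76.50), ("decent", 42, 74.75),
    ("flop", 18, 59.50), ("flop", 22, 37.00), ("flop", 15, 60.75),
    ("outlier", 50, 57.00),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    return cov / sqrt(spread)


def ranks(values):
    """1-based ranks; assumes no ties, which holds for this table."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


truth = [t for _, t, _ in ROWS]
sim = [s for _, _, s in ROWS]
r = pearson(truth, sim)
rho = pearson(ranks(truth), ranks(sim))  # Spearman = Pearson on ranks
mae = mean(abs(s - t) for _, t, s in ROWS)
print(round(r, 2), round(rho, 2), round(mae, 1))  # 0.51 0.52 22.9

# Mean signed error per label: positive means the Simulator scored too high.
bias = {
    label: round(mean(s - t for lab, t, s in ROWS if lab == label), 2)
    for label in ("viral", "decent", "flop")
}
print(bias)
```

&lt;P&gt;The signed errors make the direction of the bias explicit: negative for the viral posts, positive for the decent posts and the flops.&lt;/P&gt;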
&lt;H2&gt;Bugs the suite caught along the way&lt;/H2&gt;
&lt;P&gt;This is something I want to keep doing in my write-ups. I usually publish the clean, glamorous version (here's what I built, here's what I learned, the end), which quietly erases the bugs that taught me the most about how the system actually behaves. So here are three real ones the eval suite caught while I was running it.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The production parser regex was silently failing.&lt;/STRONG&gt; Post 1's viral_or_fail.py extracts the Simulator's weighted total with a regex like Weighted\s*Total[^\n]+. That works for same-line layouts (Weighted Total: 73/100). It does not work for the multi-line layout the model produces about half the time:&lt;/P&gt;
&lt;PRE&gt;**WEIGHTED TOTAL:** &lt;BR /&gt;&lt;BR /&gt;= 22.5 + 15 + 14 + 12.75 + 9 =&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;**73.25/100**&lt;/PRE&gt;
&lt;P&gt;When the regex misses, the production code silently falls back to a default of 50. Which means the public Viral or Fail demo had been quietly showing readers 50/100 on many of its runs since Post 1 went live. The eval suite caught it on the very first call: parse_weighted_total returned None, the harness logged it loudly, and the bug had nowhere to hide. The fix strips the bold markers, finds the header, then scans a few non-blank lines past it, preferring N/100, then a trailing = N, then the first number it sees:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;LI-CODE lang="python"&gt;clean = response.replace("**", "")
header = _WT_HEADER_RE.search(clean)          # r"Weighted\s*Total\s*:?"
if not header:
    return None
after = clean[header.end():]
window = []
for raw in after.splitlines()[: _WT_LOOKAHEAD_LINES + 1]:
    line = raw.strip()
    if not line and window:
        break
    if line:
        window.append(line)
blob = " ".join(window)
# prefer "N/100", then a trailing "= N", then the first number found&lt;/LI-CODE&gt;&lt;/BLOCKQUOTE&gt;
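&lt;P&gt;The final comment in that snippet describes a preference order without showing it. One way to express that order, given the joined blob, is sketched below. The pattern names and pick_total are hypothetical, and the repo's own regexes may differ.&lt;/P&gt;

```python
import re

_NUM = r"\d+(?:\.\d+)?"
_OUT_OF_100_RE = re.compile(rf"({_NUM})\s*/\s*100")
_TRAILING_EQ_RE = re.compile(rf"=\s*({_NUM})\s*$")
_FIRST_NUM_RE = re.compile(_NUM)


def pick_total(blob):
    """Pick the weighted total from the joined lines after the header."""
    # Try the most specific layout first and fall through to looser ones.
    for pattern in (_OUT_OF_100_RE, _TRAILING_EQ_RE, _FIRST_NUM_RE):
        match = pattern.search(blob)
        if match:
            # The first two patterns capture the number in group 1; the
            # fallback has no group, so the whole match is the number.
            return float(match.group(match.lastindex or 0))
    return None
```

&lt;P&gt;Returning None when nothing matches, rather than a default score, is what lets the harness log the miss instead of hiding it.&lt;/P&gt;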
&lt;P&gt;That regex hunt alone justified the whole exercise.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The Google Trends "Games" topic is contaminated.&lt;/STRONG&gt; Test 4 originally fetched live trending topics and got back "kentucky derby 2026", "kentucky oaks", and "fanduel" alongside the actual gaming topics. The cause: Google's taxonomy bundles horse racing, gambling, and sportsbooks under the same Games topic ID it uses for video games, and the trends_tool.py filter from Post 1 was matching on that topic ID alone. The fix was a two-layer filter: require the games topic &lt;EM&gt;and not&lt;/EM&gt; topic 17 (Sports), plus a small denylist for gambling keywords. Now the results come back as xbox game pass, crimson desert, and the hobbit mtg collector booster, with no horse racing.&lt;/P&gt;
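&lt;P&gt;A two-layer filter of that kind might look like the sketch below. The entry shape (a dict with "query" and "topic_ids") and the denylist terms are assumptions made for illustration. Only the Sports topic ID of 17 comes from the description above, so the Games topic ID is left as a parameter rather than guessed.&lt;/P&gt;

```python
SPORTS_TOPIC_ID = 17  # from the post; the Games ID is supplied by the caller
# Illustrative denylist, not the repo's actual list.
GAMBLING_DENYLIST = ("fanduel", "derby", "oaks", "sportsbook", "betting")


def is_gaming_trend(entry, games_topic_id):
    """entry: dict with a "query" string and a "topic_ids" collection."""
    topics = set(entry["topic_ids"])
    # Layer 1: must carry the Games topic and must not carry Sports.
    if games_topic_id not in topics or SPORTS_TOPIC_ID in topics:
        return False
    # Layer 2: drop anything that mentions a denylisted gambling term.
    query = entry["query"].lower()
    return not any(term in query for term in GAMBLING_DENYLIST)
```

&lt;P&gt;The denylist is a backstop for entries tagged with Games alone, which the topic check cannot separate from video games.&lt;/P&gt;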
&lt;P&gt;&lt;STRONG&gt;The first version of the judge was too lenient.&lt;/STRONG&gt; My initial RubricAdherenceJudge rewarded "every criterion explicitly scored." But the Simulator's system prompt forces exactly that, so the judge handed out 5/5 trivially across all 10 posts and told me nothing. I tightened it to recompute the weighted sum and report math_diff, and to classify reasoning as specific, mixed, or generic based on whether justifications cite platform mechanics. Even under the strict judge the Simulator still scored 5/5, but now I'd earned that result instead of getting it for free.&lt;/P&gt;
&lt;H2&gt;Why this matters in production&lt;/H2&gt;
&lt;P&gt;I built four tests to evaluate the Algorithm Simulator from Post 1. Three of them (consistency, rubric adherence, pipeline format checks) declared it healthy. The fourth, calibration, compared its scores against labeled engagement and found systematic bias: the predictions are squashed into a narrow band regardless of how the post actually performed. A flop with engagement of 18 gets a 60. A viral hit with engagement of 91 gets a 66. The model isn't broken in any visible way. It's just consistently, faithfully, formally wrong. That's exactly why validation against ground truth isn't optional. It's the only check that catches a model doing everything right except being correct.&lt;/P&gt;
&lt;P&gt;Format, consistency, and rubric-coverage tests tell you the model isn't malfunctioning. They cannot tell you it's correct. They test the process, and only validation tests the output. A model can have a flawless process and still produce numbers that don't track reality.&lt;/P&gt;
&lt;P&gt;Now zoom out. Viral or Fail's Simulator is low stakes. Worst case, a creator publishes a post the Simulator liked and it flops. Embarrassing, not dangerous. The same failure at higher stakes is dangerous, and the same shape shows up everywhere in production AI. Ask a language model to be "objective" and it hedges toward the middle. Content moderation agents under-flag clearly harmful content and over-flag clearly benign content, because both extremes feel risky to the model. Resume screeners compress every candidate into a 60-to-80 band and call the lack of spread "fairness." Code-review bots return a comfortable 7/10 on a PR with real problems and on a PR with none. Support routing labels almost everything "medium priority" and quietly breaks the downstream automation that relied on the signal meaning something.&lt;/P&gt;
&lt;P&gt;Each of those has shipped in real deployments and then underperformed for months before anyone noticed. The teams weren't careless. They had observability, CI, process checks. What they lacked was a labeled validation set. And without one, a confidently miscalibrated model looks identical to a working one. A model that's wrong randomly gets caught, because outliers get flagged and reviewed. A model that's wrong &lt;EM&gt;consistently&lt;/EM&gt; gets trusted, because it never trips an alarm. Once a downstream product depends on the miscalibrated output, the bias gets amplified at scale.&lt;/P&gt;
&lt;P&gt;Most production AI systems are not validated this way. Most LLM-as-judge components in agentic systems have never had their predictions compared against any external ground truth at all. And when something does feel off, teams reach for fine-tuning. But you can't fine-tune what you haven't characterized, and characterization is exactly what calibration testing produces. Without it, fine-tuning is guesswork in an engineering costume. "It works in eval" usually means it passed process checks, which is not the same thing as working.&lt;/P&gt;
&lt;P&gt;So evaluation is a discipline, not a phase. It belongs in the same loop as deployment, not as a one-off before launch. Internal-process checks belong in CI. Validation against labels belongs on a schedule. Both should alert when they regress, and both should be visible to the people accountable for the model's decisions.&lt;/P&gt;
&lt;P&gt;If there's one thing to take from this post: build a validation step into your eval suite from day one, even with synthetic labels, and especially if you can't get measured ones yet. Process tests keep you safe from regressions. Only the validation step keeps you honest about whether the model is right.&lt;/P&gt;
&lt;H2&gt;What's next: the cloud-tier upgrade path&lt;/H2&gt;
&lt;P&gt;Everything here runs on the GitHub Models free tier. That's deliberate, and it also means I've built the free-tier version of three things Microsoft already does better at production scale.&lt;/P&gt;
&lt;P&gt;The first is &lt;STRONG&gt;FoundryEvals&lt;/STRONG&gt; in agent_framework_azure_ai. My RubricAdherenceJudge is a homemade FoundryEvals.TaskAdherence. Foundry's version uses Azure-hosted judges on a managed pipeline, with calibration handled internally and a portal for tracking runs over time. Same structural test, but operationally serious. The same idea applies to Relevance, Coherence, Groundedness, IntentResolution, and the rest of the catalogue. If you've built the harness from this post, swapping it for evaluate_agent plus FoundryEvals is mostly an import change.&lt;/P&gt;
&lt;P&gt;The second is the &lt;STRONG&gt;AI Red Teaming Agent&lt;/STRONG&gt;. I didn't run any safety evaluation in this suite. The Audience Persona is the agent most likely to drift into unsafe territory, and the natural counterpart to quality evaluation is adversarial probing with PyRIT. The AI Red Teaming Agent wires that straight into Foundry. That's a Post 4.&lt;/P&gt;
&lt;P&gt;The third is &lt;STRONG&gt;observability&lt;/STRONG&gt;. DevUI gives you real-time visualization of agent sessions, and OpenTelemetry traces flow into Azure Monitor. Both earn their keep when an eval flags a regression and you need to walk back through the failing run to find the cause.&lt;/P&gt;
&lt;P&gt;And then there's Post 3: the calibration test against real engagement data. If you have a Twitter, YouTube, or TikTok dataset with both post content and post-hoc engagement metrics, and you'd be open to collaborating, I'd love to hear from you.&lt;/P&gt;
&lt;P&gt;The full eval suite is on GitHub: &lt;A href="https://github.com/HamidOna/viral-or-fail" target="_blank" rel="noopener"&gt;github.com/HamidOna/viral-or-fail&lt;/A&gt;. Run pip install -r requirements.txt, set GITHUB_TOKEN, and run python -m evals.run_all. Six to eight minutes start to finish on the free tier. The suite runs, the JSONs write, the plots render, and you'll see the same thing I did: the easy tests will tell you everything is fine.&lt;/P&gt;
&lt;P&gt;The last test will tell you what's actually happening.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jun 2026 13:09:41 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/evaluating-the-evaluator-how-to-test-an-llm-judge-with-microsoft/ba-p/4516639</guid>
      <dc:creator>Abdulhamid_Onawole</dc:creator>
      <dc:date>2026-06-02T13:09:41Z</dc:date>
    </item>
    <item>
      <title>OneNote Future?</title>
      <link>https://techcommunity.microsoft.com/t5/class-notebook/onenote-future/m-p/4524560#M490</link>
      <description>&lt;P&gt;MS has been putting out a lot of new apps, but OneNote seems to be left out to pasture. Are there any development roadmaps for OneNote? There are some other notetaking apps in the market: Notion, Obsidian, etc. If there are any MS OneNote devs on this thread, I'd be interested to know whether I should cut bait or keep my notes in OneNote.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2026 17:26:26 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/class-notebook/onenote-future/m-p/4524560#M490</guid>
      <dc:creator>RobertoFoster1971</dc:creator>
      <dc:date>2026-06-01T17:26:26Z</dc:date>
    </item>
    <item>
      <title>Making Academic Standards More Accessible</title>
      <link>https://techcommunity.microsoft.com/t5/education-blog/making-academic-standards-more-accessible/ba-p/4523719</link>
      <description>&lt;H3&gt;Why standards matter&lt;/H3&gt;
&lt;P&gt;Academic standards are the shared language that connects curriculum, instruction, and assessment. When educators can easily access and apply them:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Lesson planning becomes more intentional.&lt;/STRONG&gt; You design instruction around clear learning goals rather than guessing what to cover.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Assessment aligns with instruction.&lt;/STRONG&gt; Quizzes, rubrics, and assignments reflect what students are actually expected to demonstrate.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;AI-powered tools become more relevant.&lt;/STRONG&gt; AI-generated content is grounded in real curriculum expectations, not generic suggestions.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Collaboration improves.&lt;/STRONG&gt; Teachers across grade levels and departments can speak the same language about what students should know and do.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;How Microsoft Education uses standards&lt;/H3&gt;
&lt;P&gt;Standards are woven into the experiences educators use every day.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;In the Teach module and Microsoft 365 LTI&lt;/STRONG&gt;, educators can align lesson plans to specific standards by location, subject, and grade band, use the "Align to Standards" tool to refine lesson instructions, and generate quizzes and rubrics grounded in standards.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;In Assignments in Teams for Education and Microsoft 365 LTI&lt;/STRONG&gt;, educators can tag assignments with curriculum expectations, build standards-aligned rubrics, and create a clear thread from instruction to assessment.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Across AI-powered workflows&lt;/STRONG&gt;, standards can serve as grounding data that helps make generated lesson plans, quizzes, and rubrics more relevant to real curriculum expectations. This reflects Microsoft’s broader approach to AI in education: using AI to support educators with useful, contextual assistance while helping institutions maintain alignment with their instructional goals, policies, and professional judgment. Educators can select standards by location, subject, and language.&lt;/P&gt;
&lt;H3&gt;Expanding coverage through partnership with EdGate&lt;/H3&gt;
&lt;P&gt;Making standards useful in digital tools globally requires more than a large catalog. It requires structured, machine-readable data, ongoing maintenance, and a partner with deep expertise in education standards. EdGate has spent years building and maintaining one of the largest catalogs of digitized standards in education technology. Microsoft partnered with EdGate to help make that infrastructure more accessible inside the workflows educators and institutions already use.&lt;/P&gt;
&lt;P&gt;Through this partnership, Microsoft has significantly expanded the set of standards EdGate offers, especially internationally. Together, we have grown coverage to include:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;All 50 U.S. states&lt;/STRONG&gt;, including Common Core, NGSS, and state-specific frameworks&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;70+ countries&lt;/STRONG&gt;, with international standards covering core subjects, vocational education, and qualification frameworks&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Hundreds of supplemental frameworks&lt;/STRONG&gt;, from career and technical education to world languages and the arts&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We continue to expand coverage with new international standards rolling out regularly.&lt;/P&gt;
&lt;P&gt;EdGate offers access to over 5 million standard statements, aggregating and normalizing global standards for consistent delivery across platforms. Their capabilities include a comprehensive standards catalog, standards authoring tools used by ministries of education, API-based access for platform integration, and certified CASE 1.1 compliance.&lt;/P&gt;
&lt;P&gt;Microsoft and EdGate are partnering to make a select set of standards freely available to education institutions, lowering barriers for educators and developers who want to explore standards-aligned workflows without a commercial commitment. To expand the impact further, EdGate is piloting a project in 1EdTech's CASE Global Ecosystem initiative to demonstrate how interoperable, machine-readable frameworks can improve the discoverability, alignment, and portability of learning and credentialing data across platforms, institutions, and borders.&lt;/P&gt;
&lt;H3&gt;The CASE format: Why it matters&lt;/H3&gt;
&lt;P&gt;CASE stands for &lt;STRONG&gt;Competencies and Academic Standards Exchange&lt;/STRONG&gt;, an open standard from &lt;A href="https://www.1edtech.org/" target="_blank" rel="noopener"&gt;1EdTech&lt;/A&gt; that defines how learning outcomes and standards are represented in a machine-readable, interoperable format.&lt;/P&gt;
&lt;P&gt;Why does CASE matter?&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Machine-readability:&lt;/STRONG&gt; Platforms, AI tools, and learning management systems can read, search, and apply standards programmatically.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Interoperability:&lt;/STRONG&gt; Standards move between systems. An assignment tagged with a standard in Microsoft Teams can be understood by an LMS, a reporting tool, or a curriculum mapping platform without manual re-entry.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Cross-region equivalence:&lt;/STRONG&gt; CASE enables comparing and mapping standards across countries and frameworks.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;EdGate is a certified CASE 1.1 provider, meaning the standards they deliver to Microsoft (and to the broader ecosystem) follow this open, interoperable format. The expanded catalog we have built together benefits not just Microsoft's products, but the entire ecosystem of education technology that relies on structured standards data.&lt;/P&gt;
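&lt;P&gt;To make "machine-readable" concrete, here is a minimal Python sketch of how a tool might walk a CASE-style package to find the statements nested under a parent standard. The key names (&lt;CODE&gt;CFDocument&lt;/CODE&gt;, &lt;CODE&gt;CFItems&lt;/CODE&gt;, &lt;CODE&gt;CFAssociations&lt;/CODE&gt;, &lt;CODE&gt;humanCodingScheme&lt;/CODE&gt;, &lt;CODE&gt;fullStatement&lt;/CODE&gt;) are intended to follow the published CASE specification and should be checked against it. The identifiers, codes, and statements are invented for illustration and do not come from EdGate or any real framework.&lt;/P&gt;

```python
# Minimal sketch: index a CASE-style package and find the children of an item.
# All identifiers, codes, and statements below are made up for illustration.
package = {
    "CFDocument": {"identifier": "doc-1", "title": "Example Math Framework"},
    "CFItems": [
        {"identifier": "item-1", "humanCodingScheme": "M.3",
         "fullStatement": "Number and operations, grade 3"},
        {"identifier": "item-2", "humanCodingScheme": "M.3.1",
         "fullStatement": "Add and subtract within 1000"},
    ],
    "CFAssociations": [
        {"associationType": "isChildOf",
         "originNodeURI": {"identifier": "item-2"},
         "destinationNodeURI": {"identifier": "item-1"}},
    ],
}


def children_of(pkg, parent_id):
    """Return the coding-scheme labels of items that are children of parent_id."""
    items = {item["identifier"]: item for item in pkg["CFItems"]}
    return [
        items[assoc["originNodeURI"]["identifier"]]["humanCodingScheme"]
        for assoc in pkg["CFAssociations"]
        if assoc["associationType"] == "isChildOf"
        and assoc["destinationNodeURI"]["identifier"] == parent_id
    ]


print(children_of(package, "item-1"))
```

&lt;P&gt;Because the hierarchy is expressed as data rather than as document layout, the same lookup works for any framework delivered in this format.&lt;/P&gt;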
&lt;H3&gt;A shared commitment to open standards&lt;/H3&gt;
&lt;P&gt;Microsoft is proud to be a &lt;STRONG&gt;Contributing Member of 1EdTech&lt;/STRONG&gt;, the organization that stewards CASE and other critical interoperability standards for education technology, including LTI, OneRoster, and Open Badges. By collaborating with fellow 1EdTech members like EdGate, we ensure that investments in standards infrastructure benefit educators everywhere, regardless of which platforms or tools they use.&lt;/P&gt;
&lt;P&gt;When standards are open, structured, and interoperable, everyone wins: educators spend less time on manual alignment, developers can build smarter tools, and students benefit from instruction that is intentionally connected to what they are expected to learn.&lt;/P&gt;
&lt;H3&gt;What this means for educators&lt;/H3&gt;
&lt;P&gt;Within Microsoft Education, you do not need to think about CASE or data formats to benefit from this work. What you will see is:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;More standards available in the Teach module and Assignments, covering more countries, subjects, and grade bands&lt;/LI&gt;
&lt;LI&gt;AI-powered experiences that are better grounded in your actual curriculum&lt;/LI&gt;
&lt;LI&gt;Less manual work translating curriculum documents into classroom materials&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We are committed to continuing this investment: expanding coverage, improving the experience, and working with partners like EdGate and the 1EdTech community to make standards-aligned teaching easier for educators everywhere.&lt;/P&gt;
&lt;H3&gt;Helpful links&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://support.microsoft.com/topic/c4b05fdd-527f-4f85-9775-afb0781a9178" target="_blank" rel="noopener"&gt;Getting started with Teach&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://support.microsoft.com/topic/22ef609f-eb15-454b-9f77-5b1d19ec0d57" target="_blank" rel="noopener"&gt;Modify content: Align to Standards&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://www.microsoft.com/education/products/teams" target="_blank" rel="noopener"&gt;Microsoft Teams for Education&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://aka.ms/M365LTIGABlog" target="_blank" rel="noopener"&gt;Microsoft 365 LTI&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://edgate.com/standards/international" target="_blank" rel="noopener"&gt;International standards currently available through EdGate&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://aka.ms/StandardsFeedback" target="_blank" rel="noopener"&gt;Request additional standards in Microsoft Education&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://www.1edtech.org/" target="_blank" rel="noopener"&gt;About 1EdTech&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://www.1edtech.org/standards/case" target="_blank" rel="noopener"&gt;About CASE (Competencies and Academic Standards Exchange)&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Have questions or feedback about standards in Microsoft Education? Drop a comment below or submit a request through our &lt;A href="https://aka.ms/StandardsFeedback" target="_blank" rel="noopener"&gt;Standards Feedback form&lt;/A&gt;.&lt;/P&gt;
</description>
      <pubDate>Mon, 01 Jun 2026 14:01:25 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/education-blog/making-academic-standards-more-accessible/ba-p/4523719</guid>
      <dc:creator>MikeMast</dc:creator>
      <dc:date>2026-06-01T14:01:25Z</dc:date>
    </item>
    <item>
      <title>Building a hands-free voice concierge with Microsoft Foundry Voice Live and a Hosted Agent</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/building-a-hands-free-voice-concierge-with-microsoft-foundry/ba-p/4523960</link>
      <description>&lt;P&gt;This post walks through a small, working sample that wires the browser microphone to&amp;nbsp;&lt;STRONG&gt;Azure AI Speech Voice Live&lt;/STRONG&gt;, binds the realtime session to a &lt;STRONG&gt;Foundry hosted agent&lt;/STRONG&gt;, and lets the agent answer travel questions using tool calls. The full source, infrastructure, and labs live in the repository linked at the end.&lt;/P&gt;
&lt;H2&gt;Why this combination matters&lt;/H2&gt;
&lt;P&gt;Voice user interfaces have long been hard to build well. Streaming audio, partial transcripts, barge-in, voice activity detection, tool dispatch, and audio playback usually meant stitching together five or six services. The combination of Voice Live and a Foundry hosted agent collapses that into &lt;STRONG&gt;one realtime WebSocket session&lt;/STRONG&gt; with a single binding field.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Voice Live&lt;/STRONG&gt; owns the audio loop: speech to text, neural text to speech, semantic turn detection, noise suppression, and echo cancellation.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The Foundry hosted agent&lt;/STRONG&gt; owns the brain: instructions, memory, model selection, evaluators, and tool calling.&lt;/LI&gt;
&lt;LI&gt;The link between them is one query parameter on the WebSocket URL.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;What this means in practice: the browser never sees a model API key, never instantiates a tool, and never owns the agent prompt. The browser does microphone capture and audio playback. Everything else lives server-side.&lt;/P&gt;
&lt;H2&gt;The scenario&lt;/H2&gt;
&lt;P&gt;The sample is called &lt;STRONG&gt;Contoso Travel Concierge&lt;/STRONG&gt;. The user is mid-journey, hands and eyes busy, and wants to ask things like:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;What is the weather in Tokyo this weekend?&lt;/LI&gt;
&lt;LI&gt;Is BA005 from Heathrow on time?&lt;/LI&gt;
&lt;LI&gt;What time is check-in at the Marriott Marquis?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Each question triggers a tool call on the hosted agent. The reply is short, speakable, and synthesised back to the user in under a second on a warm connection.&lt;/P&gt;
&lt;H2&gt;Architecture&lt;/H2&gt;
&lt;P&gt;There are four moving parts. Voice Live and the hosted agent are managed Azure services, and the browser client is a thin page that handles microphone capture and playback. The broker is the only server-side code you write.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Browser client&lt;/STRONG&gt; – captures PCM16 audio at 24 kHz and streams it over a WebSocket to the broker. Plays back audio chunks the broker forwards from Voice Live.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Session broker&lt;/STRONG&gt; (FastAPI) – authenticates to Azure with &lt;CODE&gt;DefaultAzureCredential&lt;/CODE&gt;, builds the Voice Live WebSocket URL with a short-lived bearer token, and relays frames in both directions.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Voice Live&lt;/STRONG&gt; – the Azure AI Speech realtime endpoint. Transcribes the user, hands the text to the bound agent, and synthesises the agent’s reply.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry hosted agent&lt;/STRONG&gt; – a prompt-kind agent in Azure AI Foundry with instructions, tool definitions, and the &lt;CODE&gt;microsoft.voice-live.enabled&lt;/CODE&gt; metadata flag set to &lt;CODE&gt;true&lt;/CODE&gt;.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Two design choices are worth calling out.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The broker is small on purpose.&lt;/STRONG&gt; It does authentication, URL construction, and WebSocket relay. It does not transcode audio, run business logic, or hold conversation state. Voice Live and the agent already do those things well.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The agent binding is a URL query parameter, not an SDK call.&lt;/STRONG&gt; There is no per-turn HTTP request to the agent runtime. Voice Live opens a session against the agent once and streams turns through it for the lifetime of the WebSocket. That is what keeps latency low.&lt;/P&gt;
&lt;H2&gt;The Voice Live URL contract&lt;/H2&gt;
&lt;P&gt;This is the single most important thing to get right. The public Microsoft sample that ships under &lt;CODE&gt;liupeirong/ai-foundry-voice-agent&lt;/CODE&gt; targets a different URL shape (&lt;CODE&gt;services.ai.azure.com&lt;/CODE&gt; host, &lt;CODE&gt;agent-id&lt;/CODE&gt; + &lt;CODE&gt;agent-access-token&lt;/CODE&gt; parameters, an &lt;CODE&gt;Authorization&lt;/CODE&gt; header). That shape is rejected by Foundry resources that expose voice-live-enabled agents. The shape described in this section is the one the portal itself uses, and the one this sample dials; the function under "Inside the broker" builds it.&lt;/P&gt;
&lt;P&gt;Three details cause most failures:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The host must be &lt;CODE&gt;&amp;lt;resource&amp;gt;.cognitiveservices.azure.com&lt;/CODE&gt;, not &lt;CODE&gt;services.ai.azure.com&lt;/CODE&gt;. The broker rewrites this automatically from &lt;CODE&gt;VOICE_LIVE_ENDPOINT&lt;/CODE&gt;.&lt;/LI&gt;
&lt;LI&gt;The bearer token travels in the &lt;CODE&gt;authorization&lt;/CODE&gt; query parameter, URL-encoded, with a literal &lt;CODE&gt;Bearer&lt;/CODE&gt; prefix and a &lt;CODE&gt;+&lt;/CODE&gt; (or &lt;CODE&gt;%20&lt;/CODE&gt;) before the token. No &lt;CODE&gt;Authorization&lt;/CODE&gt; header is sent.&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;agent-name&lt;/CODE&gt; and &lt;CODE&gt;model&lt;/CODE&gt; are both the agent’s display name. &lt;CODE&gt;agent-version&lt;/CODE&gt; is empty when you want the latest published version.&lt;/LI&gt;
&lt;/UL&gt;
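&lt;P&gt;The first two details can be sketched in a few lines of Python. The helper names and the resource name &lt;CODE&gt;contoso-voice&lt;/CODE&gt; are hypothetical, and the rewrite rule (swap the &lt;CODE&gt;services.ai.azure.com&lt;/CODE&gt; suffix for &lt;CODE&gt;cognitiveservices.azure.com&lt;/CODE&gt;) is an assumption about what the broker does; the repository remains the authority.&lt;/P&gt;

```python
from urllib.parse import quote, urlencode, urlsplit


def ws_host_from_endpoint(endpoint):
    """Return the Voice Live host, swapping the Foundry suffix if present."""
    host = urlsplit(endpoint).hostname or ""
    foundry_suffix = ".services.ai.azure.com"
    if host.endswith(foundry_suffix):
        resource = host[: -len(foundry_suffix)]
        host = f"{resource}.cognitiveservices.azure.com"
    return host


def authorization_param(token, plus_for_space=False):
    """Encode the bearer token as the 'authorization' query parameter."""
    pair = {"authorization": f"Bearer {token}"}
    if plus_for_space:
        return urlencode(pair)  # default quoting turns the space into '+'
    return urlencode(pair, quote_via=quote)  # quote() turns it into '%20'


endpoint = "wss://contoso-voice.services.ai.azure.com/voice-live/realtime"
print(ws_host_from_endpoint(endpoint))
print(authorization_param("abc123"))
print(authorization_param("abc123", plus_for_space=True))
```

&lt;P&gt;Both encodings of the space decode to the same value; what matters is that the &lt;CODE&gt;Bearer&lt;/CODE&gt; prefix and the token arrive as one query parameter.&lt;/P&gt;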
&lt;H2&gt;Walkthrough: from clone to spoken reply&lt;/H2&gt;
&lt;H3&gt;Prerequisites&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Python 3.11 or later (the sample is developed on 3.13).&lt;/LI&gt;
&lt;LI&gt;The Azure CLI, signed in with &lt;CODE&gt;az login --tenant &amp;lt;your-tenant-id&amp;gt;&lt;/CODE&gt;.&lt;/LI&gt;
&lt;LI&gt;An Azure AI Foundry project in a Voice Live region (&lt;CODE&gt;eastus2&lt;/CODE&gt;, &lt;CODE&gt;swedencentral&lt;/CODE&gt;, or &lt;CODE&gt;westus2&lt;/CODE&gt;).&lt;/LI&gt;
&lt;LI&gt;A deployed prompt-kind agent in that project with &lt;STRONG&gt;Enable Voice Live&lt;/STRONG&gt; turned on.&lt;/LI&gt;
&lt;LI&gt;The &lt;STRONG&gt;Cognitive Services User&lt;/STRONG&gt; role on the Foundry resource for the identity the broker will use.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Configure the broker&lt;/H3&gt;
&lt;P&gt;Copy &lt;CODE&gt;.env.sample&lt;/CODE&gt; to &lt;CODE&gt;.env&lt;/CODE&gt; and fill in five values:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;AZURE_AI_PROJECT_ENDPOINT=https://&amp;lt;your-resource&amp;gt;.services.ai.azure.com
AZURE_AI_PROJECT_NAME=&amp;lt;your-foundry-project-name&amp;gt;
VOICE_LIVE_ENDPOINT=wss://&amp;lt;your-resource&amp;gt;.services.ai.azure.com/voice-live/realtime
VOICE_LIVE_API_VERSION=2025-10-01
FOUNDRY_AGENT_ID=&amp;lt;your-agent-name&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The agent name is what the Foundry portal shows on the agent card. The broker uses it for both the &lt;CODE&gt;agent-name&lt;/CODE&gt; and &lt;CODE&gt;model&lt;/CODE&gt; query parameters.&lt;/P&gt;
&lt;H3&gt;Install and run&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
.\scripts\start-local.ps1
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The broker exposes three endpoints:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;CODE&gt;GET /healthz&lt;/CODE&gt; – liveness probe.&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;GET /config&lt;/CODE&gt; – returns the &lt;CODE&gt;session.update&lt;/CODE&gt; the browser sends as its first frame.&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;WS /ws&lt;/CODE&gt; – the bi-directional relay to Voice Live.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Smoke test&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;.\scripts\test-session.ps1
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A successful run prints:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;[OK] /ws upgraded
   -&amp;gt; sent session.update
   &amp;lt;- {"type":"session.created",…}
   &amp;lt;- {"type":"session.updated",…}
[OK] session.updated received -- E2E works
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This confirms the entire chain: local broker, &lt;CODE&gt;DefaultAzureCredential&lt;/CODE&gt; token, Foundry Portal URL shape, Voice Live handshake, and the bound agent acknowledging the session.&lt;/P&gt;
&lt;H3&gt;Open the browser UI&lt;/H3&gt;
&lt;P&gt;Browse to &lt;CODE&gt;http://localhost:8000/&lt;/CODE&gt;, click &lt;STRONG&gt;Start talking&lt;/STRONG&gt;, and ask one of the sample questions. Transcripts appear in real time and the spoken reply plays back through the audio context.&lt;/P&gt;
&lt;H2&gt;Inside the broker&lt;/H2&gt;
&lt;P&gt;The relay logic is tiny – the heavy lifting is the URL construction. The function below is the canonical reference; copy it if you are porting the pattern to another language.&lt;/P&gt;
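&lt;PRE&gt;&lt;CODE&gt;import uuid
from urllib.parse import quote, urlencode

# VOICE_LIVE_ENDPOINT, FOUNDRY_AGENT_ID, AZURE_AI_PROJECT_NAME, and
# VOICE_LIVE_API_VERSION are module-level settings read from .env.
# _ws_host_from_endpoint is defined elsewhere in the broker.


def build_voice_live_ws_url(agent_access_token: str) -&amp;gt; str:
    """
    Build the Foundry Portal style Voice Live WebSocket URL.

    Auth lives in the query string only. No Authorization header is sent.
    """
    host = _ws_host_from_endpoint(VOICE_LIVE_ENDPOINT)
    qs = urlencode(
        {
            "trafficType": "FoundryPortal",
            "agent-name": FOUNDRY_AGENT_ID,
            "agent-version": "",  # empty selects the latest published version
            "agent-project-name": AZURE_AI_PROJECT_NAME,
            "api-version": VOICE_LIVE_API_VERSION,
            "model": FOUNDRY_AGENT_ID,  # same value as agent-name
            "client-request-id": str(uuid.uuid4()),
            "authorization": f"Bearer {agent_access_token}",
        },
        quote_via=quote,  # encodes the space after "Bearer" as %20
    )
    return f"wss://{host}/voice-live/realtime?{qs}"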
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The relay itself is a pair of asyncio tasks: one forwarding browser frames upstream, one forwarding Voice Live frames back. Audio bytes are passed straight through – the broker never decodes them.&lt;/P&gt;
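&lt;P&gt;The two-task shape looks roughly like the sketch below. To keep it runnable without a network, &lt;CODE&gt;asyncio.Queue&lt;/CODE&gt; objects stand in for the browser and Voice Live sockets, and a &lt;CODE&gt;None&lt;/CODE&gt; sentinel stands in for a socket closing. The function names are hypothetical and this is not the repository's code.&lt;/P&gt;

```python
import asyncio


async def pump(source, sink):
    """Forward frames from source to sink untouched until a None sentinel."""
    while True:
        frame = await source.get()
        await sink.put(frame)  # pass the close sentinel along as well
        if frame is None:
            return


async def relay(browser_in, upstream_out, upstream_in, browser_out):
    """Run both directions concurrently, like the broker's pair of tasks."""
    await asyncio.gather(
        pump(browser_in, upstream_out),  # browser frames go upstream
        pump(upstream_in, browser_out),  # Voice Live frames come back
    )


async def demo():
    browser_in, upstream_out = asyncio.Queue(), asyncio.Queue()
    upstream_in, browser_out = asyncio.Queue(), asyncio.Queue()
    for frame in (b"\x01\x02", None):  # fake audio bytes, then close
        browser_in.put_nowait(frame)
    for frame in ('{"type":"session.created"}', None):
        upstream_in.put_nowait(frame)
    await relay(browser_in, upstream_out, upstream_in, browser_out)
    return upstream_out.get_nowait(), browser_out.get_nowait()


print(asyncio.run(demo()))
```

&lt;P&gt;Neither direction inspects the payload, which mirrors the point above: binary audio and JSON events are relayed as opaque frames.&lt;/P&gt;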
&lt;H2&gt;Deploying the hosted agent&lt;/H2&gt;
&lt;P&gt;The most reliable way to create a voice-live-enabled agent is the Foundry portal. Agents created via the Assistants v2 SDK do not carry the required metadata by default and will be rejected by the Voice Live URL shape above.&lt;/P&gt;
&lt;P&gt;The portal steps are:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Open the Foundry project, go to &lt;STRONG&gt;Agents&lt;/STRONG&gt;, and click &lt;STRONG&gt;New agent&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Choose &lt;STRONG&gt;Prompt agent&lt;/STRONG&gt; as the kind, name it (for example &lt;CODE&gt;travel-concierge&lt;/CODE&gt;), and pick a model deployment.&lt;/LI&gt;
&lt;LI&gt;Paste the contents of &lt;CODE&gt;agent/src/prompts/system.txt&lt;/CODE&gt; into the instructions box.&lt;/LI&gt;
&lt;LI&gt;On the &lt;STRONG&gt;Voice&lt;/STRONG&gt; tab, switch &lt;STRONG&gt;Enable Voice Live&lt;/STRONG&gt; on. This is what sets the &lt;CODE&gt;microsoft.voice-live.enabled = true&lt;/CODE&gt; metadata.&lt;/LI&gt;
&lt;LI&gt;Add the three tools (&lt;CODE&gt;get_weather&lt;/CODE&gt;, &lt;CODE&gt;get_flight_status&lt;/CODE&gt;, &lt;CODE&gt;get_hotel_info&lt;/CODE&gt;) from &lt;CODE&gt;agent/agent.yaml&lt;/CODE&gt; on the &lt;STRONG&gt;Tools&lt;/STRONG&gt; tab.&lt;/LI&gt;
&lt;LI&gt;Publish the version and write the agent name back to &lt;CODE&gt;.env&lt;/CODE&gt; as &lt;CODE&gt;FOUNDRY_AGENT_ID&lt;/CODE&gt;.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The full deployment guide, including how to host the broker on Azure Container Apps with a managed identity, is in &lt;CODE&gt;docs/deployment.md&lt;/CODE&gt; in the repository.&lt;/P&gt;
&lt;H2&gt;Three lessons from getting this to production&lt;/H2&gt;
&lt;H3&gt;1. Voice output must be written for speech, not for screens&lt;/H3&gt;
&lt;P&gt;Foundry agents tend to format answers in markdown with citations like &lt;CODE&gt;([data.jma.go.jp](https://…))&lt;/CODE&gt;. When Voice Live synthesises that text, the user hears the URL read aloud, character by character. The fix is to write the agent instructions so the spoken text never contains URLs, markdown, or symbols. A short block at the end of the agent instructions does the job:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;Voice output rules
- This output is read aloud by TTS. Never include URLs, domain names, or
  citation markers like "(source.com)" in your reply. Cite by speakable
  source name only.
- Never use markdown for formatting. No asterisks, brackets, backticks,
  bullets, or hashes. Write in plain spoken sentences.
- Keep numbers speakable: say "thirty degrees Celsius", not "30C / 86F".
- Keep replies under about 40 words unless the user asks for detail.
&lt;/CODE&gt;&lt;/PRE&gt;
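&lt;P&gt;Prompt rules like these are easy to regress, so it can help to check sample replies automatically. The Python sketch below is one hypothetical way to do that. The function name, the regular expressions, and the short list of top-level domains are illustrative choices, the 40-word limit is taken from the rules above, and none of this is part of the sample repository.&lt;/P&gt;

```python
import re

# Matches explicit URLs plus bare domains for a few illustrative suffixes.
URL_PATTERN = re.compile(r"https?://\S+|\b[\w-]+\.(?:com|org|net|jp)\b")
# Asterisks, backticks, hashes, and square brackets.
MARKDOWN_PATTERN = re.compile(r"[*`#\[\]]")
MAX_WORDS = 40


def voice_rule_violations(reply):
    """Return a list of human-readable problems found in a spoken reply."""
    problems = []
    if URL_PATTERN.search(reply):
        problems.append("contains a URL or domain name")
    if MARKDOWN_PATTERN.search(reply):
        problems.append("contains markdown symbols")
    if len(reply.split()) > MAX_WORDS:
        problems.append("longer than 40 words")
    return problems


print(voice_rule_violations("**30C** (source.com)"))
```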
&lt;P&gt;The browser transcript can still render markdown for the eyes. The sample does so with a small, escaping markdown renderer that whitelists bold, italic, code, and &lt;CODE&gt;http(s)&lt;/CODE&gt; links only, so the same agent reply looks polished on screen even though the spoken version contains none of it.&lt;/P&gt;
&lt;H3&gt;2. Identity is simpler than it looks&lt;/H3&gt;
&lt;P&gt;The broker uses &lt;CODE&gt;DefaultAzureCredential&lt;/CODE&gt; and requests the &lt;CODE&gt;https://ai.azure.com/.default&lt;/CODE&gt; scope. Locally that resolves to your &lt;CODE&gt;az login&lt;/CODE&gt; credentials. In Azure Container Apps it resolves to the user-assigned managed identity. In both cases the only role assignment you need on the Foundry account is &lt;STRONG&gt;Cognitive Services User&lt;/STRONG&gt;. There is no API key path on the working URL shape – it is bearer tokens all the way down.&lt;/P&gt;
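&lt;P&gt;In code, the token request is a single call. The sketch below accepts any credential object that exposes &lt;CODE&gt;get_token&lt;/CODE&gt;, which is how &lt;CODE&gt;DefaultAzureCredential&lt;/CODE&gt; from the &lt;CODE&gt;azure-identity&lt;/CODE&gt; package behaves. The &lt;CODE&gt;FakeCredential&lt;/CODE&gt; class and the function name are hypothetical stand-ins so the example runs without an Azure sign-in.&lt;/P&gt;

```python
from collections import namedtuple

FOUNDRY_SCOPE = "https://ai.azure.com/.default"

# Mirrors the shape of azure.core.credentials.AccessToken (token, expires_on).
AccessToken = namedtuple("AccessToken", ["token", "expires_on"])


class FakeCredential:
    """Stand-in for DefaultAzureCredential, for local illustration only."""

    def get_token(self, *scopes):
        return AccessToken(token=f"fake-token-for:{scopes[0]}", expires_on=0)


def mint_bearer_token(credential):
    """Request a token for the Foundry scope and return the raw token string."""
    return credential.get_token(FOUNDRY_SCOPE).token


# With the real SDK this would be:
#   from azure.identity import DefaultAzureCredential
#   token = mint_bearer_token(DefaultAzureCredential())
print(mint_bearer_token(FakeCredential()))
```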
&lt;H3&gt;3. The wrong sample wastes a day&lt;/H3&gt;
&lt;P&gt;If you start from the public &lt;CODE&gt;liupeirong/ai-foundry-voice-agent&lt;/CODE&gt; repository against a portal-provisioned voice-live agent, the WebSocket either returns HTTP 400 or closes silently with code 1006. The cause is the URL shape, not your code. The reference probe in &lt;CODE&gt;scripts/probe_portal_shape.py&lt;/CODE&gt; is the single source of truth for the working contract – keep it as a regression test.&lt;/P&gt;
&lt;H2&gt;Responsible AI and security notes&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Credentials never reach the browser.&lt;/STRONG&gt; Tokens are minted server-side and travel only on the upstream Voice Live URL.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;No secrets in source.&lt;/STRONG&gt; The &lt;CODE&gt;.env&lt;/CODE&gt; file is gitignored. The &lt;CODE&gt;.env.sample&lt;/CODE&gt; contains only placeholders.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Markdown rendering is escape-first.&lt;/STRONG&gt; The browser HTML-escapes the agent reply before applying its small markdown whitelist, and links are restricted to &lt;CODE&gt;http(s)&lt;/CODE&gt; URLs so the rule cannot emit &lt;CODE&gt;javascript:&lt;/CODE&gt; hrefs.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Tool calls are auditable.&lt;/STRONG&gt; Every turn shows up as a run in the Foundry portal under the agent, with the prompt, model output, and tool inputs and outputs visible for review.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Voice biometric considerations.&lt;/STRONG&gt; If you plan to handle account verification by voice, plug in dedicated speaker recognition rather than relying on the conversational model.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Key takeaways&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Voice Live plus a Foundry hosted agent is a session-level integration, not an API integration. One URL, one binding field, one WebSocket.&lt;/LI&gt;
&lt;LI&gt;The browser is a thin client. Authentication, URL construction, and relay all live in a small FastAPI broker.&lt;/LI&gt;
&lt;LI&gt;Get the URL shape right (&lt;CODE&gt;cognitiveservices.azure.com&lt;/CODE&gt;, token in the query string, &lt;CODE&gt;agent-name&lt;/CODE&gt; equals &lt;CODE&gt;model&lt;/CODE&gt; equals the agent display name) and the rest is plumbing.&lt;/LI&gt;
&lt;LI&gt;Use the Foundry portal to create the agent so the voice-live metadata is set correctly.&lt;/LI&gt;
&lt;LI&gt;Write agent instructions for the ear, not the eye, then layer screen formatting on top in the browser.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Get the code and try it&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Repository:&lt;/STRONG&gt; &lt;A href="https://github.com/microsoft/foundry-agent-voice-mode-sample" target="_blank" rel="noopener"&gt;github.com/microsoft/foundry-agent-voice-mode-sample&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Deployment guide:&lt;/STRONG&gt; &lt;CODE&gt;docs/deployment.md&lt;/CODE&gt; in the repository.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Labs:&lt;/STRONG&gt; three progressive workshops under &lt;CODE&gt;labs/&lt;/CODE&gt; – basic voice, adding tools, and binding a hosted agent.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Reference docs:&lt;/STRONG&gt; &lt;A href="https://learn.microsoft.com/azure/ai-services/speech-service/voice-live" target="_blank" rel="noopener"&gt;Voice Live in Azure AI Speech&lt;/A&gt; and &lt;A href="https://learn.microsoft.com/azure/ai-foundry/agents/overview" target="_blank" rel="noopener"&gt;Agents in Microsoft Foundry&lt;/A&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If you build something on top of this pattern, open an issue or pull request on the repository. The sample is intentionally small so it stays easy to fork.&lt;/P&gt;</description>
      <pubDate>Fri, 29 May 2026 10:16:09 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/building-a-hands-free-voice-concierge-with-microsoft-foundry/ba-p/4523960</guid>
      <dc:creator>Lee_Stott</dc:creator>
      <dc:date>2026-05-29T10:16:09Z</dc:date>
    </item>
    <item>
      <title>Building Reliable AI Coding Workflows Using Modular AI Agent Optimization</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/building-reliable-ai-coding-workflows-using-modular-ai-agent/ba-p/4523252</link>
      <description>&lt;P class="lia-align-justify"&gt;Artificial Intelligence is rapidly transforming the modern software development industry. AI-powered coding assistants such as GitHub Copilot, Claude Code, and other Large Language Model (LLM)-based systems are helping developers automate repetitive coding tasks, improve productivity, and accelerate software development processes. These tools can generate code, assist with debugging, provide recommendations, and support developers during implementation. However, despite their growing capabilities, many AI coding assistants still face challenges related to reliability, maintainability, project-specific conventions, and structured software engineering workflows.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Most coding assistants perform well for generic programming tasks but often struggle when working with domain-specific development requirements, API integrations, project architectures, validation workflows, and coding standards. In real-world software engineering environments, developers require systems that not only generate code but also follow project conventions, maintain readability, support modular development, and improve long-term maintainability.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The project &lt;A class="lia-external-url" href="https://github.com/shardakaurr/ai-agent-optimization" target="_blank"&gt;“AI Agents Optimization”&lt;/A&gt; focuses on improving the reliability and effectiveness of AI coding agents by designing structured workflows, modular configurations, validation mechanisms, and optimized task execution strategies. The objective of the project is to investigate how AI agents can become dependable collaborators in practical software engineering tasks instead of functioning only as autocomplete systems.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The project explores different approaches for organizing AI agent workflows using structured instruction handling, modular task division, context management, validation systems, and integration of external tools and documentation sources. Different agent configurations are analyzed and evaluated to understand how workflow optimization affects software development quality and performance.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Why Existing AI Coding Workflows Often Fail&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;Most AI coding assistants perform well for isolated coding tasks but struggle in real-world engineering environments where projects involve multiple files, coding standards, APIs, validation requirements, and contextual dependencies.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;For example, a generic prompt such as:&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;“Build authentication middleware”&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;may generate functional code, but the output often lacks:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;Project-specific architecture&lt;/LI&gt;
&lt;LI&gt;Error handling consistency&lt;/LI&gt;
&lt;LI&gt;Validation logic&lt;/LI&gt;
&lt;LI&gt;Security best practices&lt;/LI&gt;
&lt;LI&gt;Dependency awareness&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;This project approaches the problem differently by introducing a structured workflow pipeline where AI agents operate in defined stages rather than generating outputs in a single step.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The workflow separates planning, generation, validation, and refinement into independent modules. This improves maintainability, reduces inconsistent outputs, and supports iterative refinement similar to real software engineering workflows.&lt;/P&gt;
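&lt;P class="lia-align-justify"&gt;As a rough illustration of the staged idea, the Python sketch below chains four independent stage functions. Everything in it is hypothetical: the stage functions are trivial stand-ins for LLM calls and validators, and the names do not come from the project repository.&lt;/P&gt;

```python
def plan(task):
    """Break a task into ordered steps (stand-in for a planning agent)."""
    return [f"design {task}", f"implement {task}", f"test {task}"]


def generate(steps):
    """Produce a draft per step (stand-in for a code-generating agent)."""
    return [f"# TODO: {step}" for step in steps]


def validate(drafts):
    """Return the drafts that fail a check; here, any leftover TODO marker."""
    return [draft for draft in drafts if "TODO" in draft]


def refine(drafts, failures):
    """Rewrite only the drafts that failed validation."""
    return [d.replace("# TODO:", "# done:") if d in failures else d for d in drafts]


def run_pipeline(task, max_rounds=3):
    """Plan, generate, then loop validate and refine until checks pass."""
    drafts = generate(plan(task))
    for _ in range(max_rounds):
        failures = validate(drafts)
        if not failures:
            break
        drafts = refine(drafts, failures)
    return drafts


print(run_pipeline("auth middleware"))
```

&lt;P class="lia-align-justify"&gt;Because each stage is a separate function, one stage can be swapped or tested on its own without touching the others, which is the maintainability benefit described above.&lt;/P&gt;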
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Project Objectives&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;The primary objective of this project is to optimize AI coding agents for real-world software engineering workflows. The project aims to improve how AI systems handle development tasks such as code generation, debugging, testing, validation, feature implementation, and workflow management.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Another major objective is to design modular AI workflows where different stages of software development are managed systematically. The workflow focuses on task planning, instruction processing, validation, refinement, and output evaluation. This structured approach improves transparency, maintainability, and consistency in AI-generated outputs.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The project also aims to evaluate how AI coding agents perform under different configurations and development scenarios. By testing multiple workflows and structured instruction methods, the project analyzes how optimization techniques improve development reliability and coding quality.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Technologies and Tools Used&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;The project utilizes multiple modern technologies and development tools for experimentation and workflow optimization.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-align-justify"&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Technology / Tool&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Purpose&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Python&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Automation and scripting&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;GitHub Copilot&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;AI-assisted coding&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Claude / LLM APIs&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;AI workflow experimentation&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Visual Studio Code&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Development environment&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Git &amp;amp; GitHub&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Version control and repository management&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Structured Prompting&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Workflow optimization&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;MCP Concepts&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Tool and context integration&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;These tools collectively support the implementation and testing of optimized AI coding workflows.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Implementation Workflow&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;The system was implemented using a modular AI workflow pipeline where each stage performs a dedicated engineering task.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Step 1 — Task Parsing&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The user submits a development task or coding requirement. The Instruction Processing Module extracts:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;Objective&lt;/LI&gt;
&lt;LI&gt;Constraints&lt;/LI&gt;
&lt;LI&gt;Project context&lt;/LI&gt;
&lt;LI&gt;Expected output format&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;Example structured prompt:&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Task: Create JWT authentication middleware&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Language: Node.js&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Constraints:&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;- Use Express.js&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;- Add token validation&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;- Follow modular architecture&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;- Include error handling&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Step 2 — Planning &amp;amp; Reasoning&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The Planning Module divides the task into subtasks such as:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;Route handling&lt;/LI&gt;
&lt;LI&gt;Token verification&lt;/LI&gt;
&lt;LI&gt;Error management&lt;/LI&gt;
&lt;LI&gt;Security validation&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;This improves reasoning consistency before generation begins.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Step 3 — Code Generation&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The Code Generation Module produces outputs using structured prompts and contextual references instead of generic instructions.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Step 4 — Validation&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Generated outputs are validated using:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;Syntax checks&lt;/LI&gt;
&lt;LI&gt;Logical consistency checks&lt;/LI&gt;
&lt;LI&gt;Formatting standards&lt;/LI&gt;
&lt;LI&gt;Dependency validation&lt;/LI&gt;
&lt;/UL&gt;
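&lt;P class="lia-align-justify"&gt;Two of these checks, syntax and dependency validation, can be sketched with the Python standard library alone. The example assumes the generated output is Python source; output in another language, such as the Node.js middleware in the example prompt, would need that language's own parser. &lt;CODE&gt;validate_python&lt;/CODE&gt; is a hypothetical helper, not part of the project.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import ast
import importlib.util


def validate_python(source: str) -> list[str]:
    """Return a list of problems found in generated source (empty means it passed)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for name in modules:
            top_level = name.split(".")[0]
            # find_spec returns None when the top-level package is not installed.
            if importlib.util.find_spec(top_level) is None:
                problems.append(f"unresolved dependency: {top_level}")
    return problems


print(validate_python("import json\nprint(json.dumps({}))"))
print(validate_python("def broken(:\n    pass"))
print(validate_python("import surely_not_installed_pkg_xyz"))
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="lia-align-justify"&gt;Returning a list of problems instead of a single pass/fail flag gives the refinement stage concrete feedback to act on.&lt;/P&gt;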
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Step 5 — Refinement&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;If validation fails, the workflow loops back into refinement where issues are corrected before final delivery.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;System Workflow&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;The workflow of the&lt;A class="lia-external-url" href="https://github.com/shardakaurr/ai-agent-optimization" target="_blank"&gt; AI Agents Optimization system&lt;/A&gt; is based on modular task execution and structured development processes. The workflow begins with task planning and requirement analysis. The AI agent receives structured instructions along with coding constraints, project context, and validation requirements.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The system processes the provided instructions and generates outputs according to defined workflows and development standards. Different configurations are tested to evaluate how instruction structures and modular task handling influence the quality of generated code&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The workflow also includes validation and refinement stages where generated outputs are analyzed for correctness, maintainability, and consistency. The project focuses not only on code generation but also on improving readability, workflow transparency, debugging support, and adherence to project conventions.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Key Features of the Project&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;Structured AI workflow design&lt;/LI&gt;
&lt;LI&gt;Modular task execution&lt;/LI&gt;
&lt;LI&gt;AI-assisted software development&lt;/LI&gt;
&lt;LI&gt;Workflow optimization strategies&lt;/LI&gt;
&lt;LI&gt;Validation and refinement mechanisms&lt;/LI&gt;
&lt;LI&gt;Integration of development tools and documentation&lt;/LI&gt;
&lt;LI&gt;Improved maintainability and readability&lt;/LI&gt;
&lt;LI&gt;Support for practical software engineering workflows&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Challenges Faced During Development&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;One of the major challenges encountered during the project was maintaining consistency and reliability in AI-generated outputs. Different AI models often produce different responses depending on prompts, context, and task structure. Designing workflows that improve output stability and maintain coding standards required careful experimentation and optimization.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Another challenge involved integrating structured workflows while ensuring flexibility in task execution. AI systems often require clear instructions and contextual information to produce accurate outputs. Balancing automation with maintainability and project-specific requirements was an important aspect of the project.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Managing validation and refinement processes was also challenging because generated outputs needed to be evaluated not only for correctness but also for readability, maintainability, and software engineering best practices.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Observations and Outcomes&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;During experimentation, structured workflows produced more reliable and maintainable outputs compared to single-prompt generation approaches.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Some important observations included:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;Reduced repetitive corrections during code refinement&lt;/LI&gt;
&lt;LI&gt;Improved consistency in generated outputs&lt;/LI&gt;
&lt;LI&gt;Better adherence to coding structure and formatting&lt;/LI&gt;
&lt;LI&gt;More stable workflow behavior for multi-step tasks&lt;/LI&gt;
&lt;LI&gt;Improved readability and maintainability of generated code&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;The validation and refinement stages were particularly effective in reducing incomplete outputs and improving response quality.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Although the project focuses primarily on workflow architecture and qualitative analysis rather than benchmark testing, the results demonstrate that modular AI pipelines can significantly improve practical software engineering workflows.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Future Enhancements&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;The project can be further enhanced by implementing advanced multi-agent collaboration systems where multiple AI agents work together on complex software development tasks. Future versions may also include real-time documentation integration, automated testing frameworks, cloud-based workflow management, and improved reasoning models.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Additional enhancements may include IDE extensions, intelligent debugging systems, automated code review mechanisms, and adaptive workflow optimization based on project requirements.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;The &lt;A class="lia-external-url" href="https://github.com/shardakaurr/ai-agent-optimization" target="_blank"&gt;AI Agents Optimization project &lt;/A&gt;demonstrates how structured workflows and modular configurations can improve the effectiveness of AI-powered coding assistants in modern software engineering environments.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;By focusing on workflow optimization, validation mechanisms, modular task execution, and structured instruction handling, the project highlights the future potential of AI agents as reliable development collaborators capable of supporting real-world software engineering processes.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The project represents an important step toward building dependable AI-assisted development systems that improve productivity, maintainability, and software quality while supporting modern engineering practices.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H4&gt;&lt;STRONG&gt;How to Try This Workflow&lt;/STRONG&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;OL class="lia-align-justify"&gt;
&lt;LI&gt;Define a structured development task&lt;/LI&gt;
&lt;LI&gt;Provide project constraints and context&lt;/LI&gt;
&lt;LI&gt;Break the task into subtasks&lt;/LI&gt;
&lt;LI&gt;Generate output using structured prompts&lt;/LI&gt;
&lt;LI&gt;Validate output quality&lt;/LI&gt;
&lt;LI&gt;Refine based on validation feedback&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Thu, 28 May 2026 19:53:45 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/building-reliable-ai-coding-workflows-using-modular-ai-agent/ba-p/4523252</guid>
      <dc:creator>ShardaKaur</dc:creator>
      <dc:date>2026-05-28T19:53:45Z</dc:date>
    </item>
    <item>
      <title>Hybrid AI Agents in Python: Routing Between Foundry Local and Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/hybrid-ai-agents-in-python-routing-between-foundry-local-and/ba-p/4522979</link>
      <description>&lt;H2&gt;Why hybrid, and why now&lt;/H2&gt;
&lt;P&gt;If you build AI features today, you are caught between three forces. Users want low latency and strong privacy. Product teams want frontier reasoning capability. Finance teams want predictable cost. No single model satisfies all three. Run everything on a small on-device model and you bottleneck on complex questions. Send everything to a frontier cloud model and you pay for trivial requests, leak sensitive data across a network boundary, and add hundreds of milliseconds of latency to greetings.&lt;/P&gt;
&lt;P&gt;The pragmatic answer is hybrid inference: a lightweight local model classifies every request first, simple or sensitive ones stay on the device, and only the genuinely hard or frontier-capability requests escalate to the cloud. Microsoft now ships both halves of that pattern as supported Python SDKs — &lt;A href="https://pypi.org/project/foundry-local-sdk/" target="_blank"&gt;foundry-local-sdk&lt;/A&gt; for on-device inference and &lt;A href="https://pypi.org/project/azure-ai-projects/" target="_blank"&gt;azure-ai-projects&lt;/A&gt; for Microsoft Foundry cloud models. This post walks through a working reference implementation that combines them behind a single &lt;CODE&gt;ask()&lt;/CODE&gt; call.&lt;/P&gt;
&lt;P&gt;The full source is at &lt;A href="https://github.com/leestott/fl-mixedmodel" target="_blank"&gt;github.com/leestott/fl-mixedmodel&lt;/A&gt;. It is Python-only, secretless by design, and ships with a Gradio diagnostics UI, a CLI demo mode, and a full &lt;CODE&gt;pytest&lt;/CODE&gt; suite.&lt;/P&gt;
&lt;H2&gt;The contract: one schema, two paths&lt;/H2&gt;
&lt;P&gt;The most important architectural decision is that callers never know which path served a request. Every response, local or cloud, returns the same dataclass:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class InferencePath(str, Enum):
    LOCAL = "local"
    CLOUD = "cloud"
    LOCAL_FALLBACK = "local_fallback"   # cloud attempted, fell back to local
    CLOUD_FALLBACK = "cloud_fallback"   # local attempted, fell back to cloud

@dataclass
class AgentResponse:
    answer: str
    path: InferencePath
    model: str
    reason: str
    confidence: float
    latency_ms: float
    correlation_id: str
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    fallback: bool = False
    fallback_reason: Optional[str] = None
    metadata: dict = field(default_factory=dict)
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This is what makes the design honest. The router can change, the cloud model can be swapped from &lt;CODE&gt;gpt-4o&lt;/CODE&gt; to &lt;CODE&gt;gpt-5.4&lt;/CODE&gt;, fallback policies can flip — and the calling code never breaks. The four &lt;CODE&gt;InferencePath&lt;/CODE&gt; values give you full observability without leaking implementation details into the API surface.&lt;/P&gt;
&lt;H2&gt;Architecture in one diagram&lt;/H2&gt;
&lt;PRE&gt;&lt;CODE&gt;┌─────────────┐   prompt    ┌──────────────────────────┐
│   caller    │ ──────────► │   HybridAgentService     │
└─────────────┘             │      .ask(prompt)        │
                            └────────────┬─────────────┘
                                         │
                            ┌────────────▼─────────────┐
                            │     RoutingPolicy        │
                            │  1. Heuristic gate       │
                            │  2. Local router LLM     │
                            │  3. Hard policy gates    │
                            └─────┬─────────────┬──────┘
                                  │             │
                          LOCAL  ◄┘             └► CLOUD
                                  │             │
                       ┌──────────▼──┐   ┌──────▼───────┐
                       │ Foundry     │   │ Microsoft    │
                       │ Local SDK   │   │ Foundry      │
                       │ (phi-4-mini)│   │ (gpt-5.4)    │
                       └─────────────┘   └──────────────┘
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H2&gt;Best practice: the two-stage router pattern&lt;/H2&gt;
&lt;P&gt;Before walking through the implementation, it is worth stating the design pattern explicitly, because it is the part that generalises beyond this specific repo. &lt;STRONG&gt;The cleanest design for hybrid inference is a two-stage router.&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Stage 1 — local router.&lt;/STRONG&gt; A small local model performs intent and complexity classification first. It does not answer the question; it decides where the question should go.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Stage 2 — route the answer.&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;If the prompt is &lt;STRONG&gt;simple, private, latency-sensitive, or clearly within local capability&lt;/STRONG&gt;, route to a &lt;EM&gt;local task model&lt;/EM&gt; on the device.&lt;/LI&gt;
&lt;LI&gt;If the prompt is &lt;STRONG&gt;complex, needs deeper reasoning, a larger context window, or a capability unavailable locally&lt;/STRONG&gt;, escalate to a &lt;EM&gt;cloud frontier model in Microsoft Foundry&lt;/EM&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Microsoft's current guidance for the cloud side is to use the &lt;STRONG&gt;Responses API&lt;/STRONG&gt; and choose one of two control modes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Pass a &lt;STRONG&gt;specific deployment name&lt;/STRONG&gt; (for example &lt;CODE&gt;gpt-5.4&lt;/CODE&gt;) when you want deterministic control over which model serves the request, which is the right choice for regulated workloads, repeatable evaluations, or cost ceilings.&lt;/LI&gt;
&lt;LI&gt;Pass &lt;STRONG&gt;&lt;CODE&gt;model-router&lt;/CODE&gt;&lt;/STRONG&gt; as the deployment when you want Microsoft Foundry to automatically select the best available cloud model for each request. This is a sensible default for general-purpose agents where you would rather let the platform optimise the model choice as new ones are released.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The reference repo exposes both as environment variables so you can switch without code changes:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# .env.example
FOUNDRY_CLOUD_MODEL_DEPLOYMENT=gpt-5.4        # deterministic
FOUNDRY_CLOUD_ROUTER_DEPLOYMENT=model-router  # auto-select
&lt;/CODE&gt;&lt;/PRE&gt;
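&lt;P&gt;Reading those two variables can be sketched as below. The variable names come from &lt;CODE&gt;.env.example&lt;/CODE&gt;; the &lt;CODE&gt;cloud_deployment&lt;/CODE&gt; helper, its &lt;CODE&gt;use_router&lt;/CODE&gt; flag, and the hard-coded defaults are assumptions for illustration, not how the repo's &lt;CODE&gt;config.py&lt;/CODE&gt; is necessarily written.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import os


def cloud_deployment(use_router: bool = False) -> str:
    """Pick the Foundry deployment name from the environment."""
    if use_router:
        # Let the platform choose the model for each request.
        return os.environ.get("FOUNDRY_CLOUD_ROUTER_DEPLOYMENT", "model-router")
    # Deterministic: always the same named deployment.
    return os.environ.get("FOUNDRY_CLOUD_MODEL_DEPLOYMENT", "gpt-5.4")


print(cloud_deployment())
print(cloud_deployment(use_router=True))
&lt;/CODE&gt;&lt;/PRE&gt;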
&lt;H2&gt;Best practice: pin the right SDK versions&lt;/H2&gt;
&lt;P&gt;Two SDKs do the heavy lifting and both have had recent breaking changes, so version discipline matters.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Local development: &lt;CODE&gt;foundry-local-sdk&lt;/CODE&gt;.&lt;/STRONG&gt; The current public guidance is to use the Foundry Local SDK package &lt;A href="https://pypi.org/project/foundry-local-sdk/" target="_blank"&gt;&lt;CODE&gt;foundry-local-sdk&lt;/CODE&gt;&lt;/A&gt;, which provides model discovery, download, cache, load, unload, chat completions, embeddings, audio transcription, and an optional built-in web service. Use &lt;STRONG&gt;version 1.1.0&lt;/STRONG&gt;, released on &lt;STRONG&gt;5 May 2026&lt;/STRONG&gt;. Earlier versions used an OpenAI-compatible client surface that has since been replaced by the &lt;CODE&gt;FoundryLocalManager → load_model → get_chat_client → complete_chat&lt;/CODE&gt; chain shown in the Stage 2 section below. Set it as the minimum version in your requirements, and tighten to an exact &lt;CODE&gt;==&lt;/CODE&gt; pin if you need reproducible builds:
&lt;PRE&gt;&lt;CODE&gt;# requirements.txt
foundry-local-sdk&amp;gt;=1.1.0&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Cloud orchestration and agents: &lt;CODE&gt;azure-ai-projects&lt;/CODE&gt;.&lt;/STRONG&gt; For cloud-side orchestration, Microsoft's current Python guidance is to use &lt;A href="https://pypi.org/project/azure-ai-projects/" target="_blank"&gt;&lt;CODE&gt;azure-ai-projects&lt;/CODE&gt;&lt;/A&gt;, which the docs describe as part of the Microsoft Foundry SDK and as the entry point for &lt;EM&gt;agents, deployments, connections, datasets, evaluations&lt;/EM&gt;, and an OpenAI-compatible client returned by &lt;CODE&gt;get_openai_client()&lt;/CODE&gt;. The current PyPI listing shows &lt;STRONG&gt;azure-ai-projects 2.1.0&lt;/STRONG&gt;. Set it as the minimum version in your requirements, and tighten to an exact &lt;CODE&gt;==&lt;/CODE&gt; pin if you need reproducible builds:
&lt;PRE&gt;&lt;CODE&gt;# requirements.txt
azure-ai-projects&amp;gt;=2.1.0
azure-identity&amp;gt;=1.17.0&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If you find yourself reading old samples that import &lt;CODE&gt;azure.ai.inference&lt;/CODE&gt; as the cloud entry point, or that initialise Foundry Local through a raw &lt;CODE&gt;openai.OpenAI(base_url=...)&lt;/CODE&gt; client, you are looking at pre-2026 patterns. The current shape is what the reference repo uses: &lt;CODE&gt;FoundryLocalManager.initialize(Configuration(...))&lt;/CODE&gt; for the device and &lt;CODE&gt;AIProjectClient(...).get_openai_client()&lt;/CODE&gt; for the cloud.&lt;/P&gt;
&lt;H2&gt;Stage 1: a deterministic privacy gate&lt;/H2&gt;
&lt;P&gt;Before any model touches a prompt, a deterministic heuristic classifier scans for sensitive patterns — passwords, API keys, SSN/NHS numbers, PII signals, explicit "do not share" flags. If the heuristic returns &lt;CODE&gt;PrivacyClass.RESTRICTED&lt;/CODE&gt;, the prompt is forced local. The router LLM is not called. The cloud provider is not called. The decision is auditable from a single regex pass.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# app/routing/policy.py
def decide(self, prompt: str, correlation_id: str = "") -&amp;gt; RoutingDecision:
    hint, privacy, complexity, h_reason = self._heuristic.classify(prompt)

    # Hard gate: restricted content never leaves the device
    if privacy == PrivacyClass.RESTRICTED:
        return self._make_decision(
            target=RouteTarget.LOCAL,
            confidence=1.0,
            reason=f"Policy hard-gate: {h_reason}",
            privacy=privacy,
            complexity=complexity,
            deterministic=True,
            correlation_id=correlation_id,
        )

    # Hard gate: very high complexity always goes to cloud
    if complexity == ComplexityBand.VERY_HIGH:
        return self._make_decision(
            target=RouteTarget.CLOUD,
            confidence=1.0,
            reason="Policy hard-gate: very_high complexity requires frontier model",
            ...
        )
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This is the most important responsible-AI control in the whole system. If your privacy review depends on an LLM correctly classifying every prompt, you do not have a privacy control — you have a probability distribution. Deterministic gates first, model judgement second.&lt;/P&gt;
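&lt;P&gt;A deterministic classifier of this kind can be as small as a table of compiled regular expressions. The sketch below is an assumption-laden illustration, not the repo's heuristic: the patterns are deliberately simplistic, &lt;CODE&gt;PrivacyClass.PUBLIC&lt;/CODE&gt; is an invented counterpart to the &lt;CODE&gt;RESTRICTED&lt;/CODE&gt; value shown above, and &lt;CODE&gt;classify_privacy&lt;/CODE&gt; is a hypothetical name.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import re
from enum import Enum


class PrivacyClass(str, Enum):
    PUBLIC = "public"          # assumed name for the non-sensitive class
    RESTRICTED = "restricted"


# Illustrative patterns only; a production list needs review by a privacy owner.
SENSITIVE_PATTERNS = {
    "password": re.compile(r"\bpass(word|phrase)\b", re.IGNORECASE),
    "API key": re.compile(r"\b(api|secret)[_ -]?key\b", re.IGNORECASE),
    "US SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "do-not-share flag": re.compile(r"\bdo not share\b", re.IGNORECASE),
}


def classify_privacy(prompt: str) -> tuple[PrivacyClass, str]:
    """Return the privacy class and a human-readable reason for the audit log."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(prompt):
            return PrivacyClass.RESTRICTED, f"matched {label} pattern"
    return PrivacyClass.PUBLIC, "no sensitive pattern matched"


print(classify_privacy("my password is hunter2, suggest a stronger one"))
print(classify_privacy("hello"))
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because the same input always yields the same class and reason, the decision can be replayed during an audit without calling any model.&lt;/P&gt;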
&lt;H2&gt;Stage 2: a local LLM as the router&lt;/H2&gt;
&lt;P&gt;For everything that passes the privacy gate, a small local model classifies whether the prompt needs frontier capability. This is the bit that surprises most engineers: &lt;EM&gt;you can do useful routing with a 4B parameter model running on a laptop CPU&lt;/EM&gt;. The router does not need to answer the question. It only needs to classify it.&lt;/P&gt;
&lt;P&gt;The reference implementation uses &lt;A href="https://huggingface.co/microsoft/Phi-4-mini-instruct" target="_blank"&gt;phi-4-mini&lt;/A&gt; via Foundry Local. Initialising it and sending the classification request takes only a few lines:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# app/providers/local_provider.py (excerpt)
from foundry_local import FoundryLocalManager
from foundry_local.models import Configuration

self._manager = FoundryLocalManager.initialize(
    Configuration(app_name="hybrid-agent")
)
self._router_model = self._manager.load_model(self._config.local_router_alias)
self._chat_client  = self._router_model.get_chat_client()

response = self._chat_client.complete_chat(
    messages=[
        {"role": "system", "content": ROUTER_SYSTEM_PROMPT},
        {"role": "user",   "content": prompt},
    ],
)
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The router prompt asks for a strict JSON response: &lt;CODE&gt;{ "target": "local|cloud", "confidence": 0.0-1.0, "complexity": "low|medium|high|very_high", "reason": "..." }&lt;/CODE&gt;. The application parses it, applies the confidence threshold from config (default 0.6), and falls back to the heuristic decision if the router LLM is unsure or its JSON is malformed. &lt;STRONG&gt;The router never blocks the answer path&lt;/STRONG&gt; — that is a deliberate reliability choice.&lt;/P&gt;
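&lt;P&gt;The parse, threshold, and fall-back step might look like the sketch below. It assumes the router returns the JSON shape described above; &lt;CODE&gt;resolve_target&lt;/CODE&gt; and the wording of the returned source labels are hypothetical, and the repo may structure this differently.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import json

VALID_TARGETS = ("local", "cloud")


def resolve_target(router_output: str, heuristic_target: str, threshold: float = 0.6) -> tuple[str, str]:
    """Return (target, source), trusting the router only when it is valid and confident."""
    try:
        verdict = json.loads(router_output)
        target = verdict["target"]
        confidence = float(verdict["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Malformed JSON, missing keys, or a non-numeric confidence.
        return heuristic_target, "heuristic (malformed router output)"

    if target not in VALID_TARGETS:
        return heuristic_target, "heuristic (unknown target)"
    if confidence >= threshold:
        return target, "router"
    return heuristic_target, "heuristic (low confidence)"


print(resolve_target('{"target": "cloud", "confidence": 0.9}', "local"))
print(resolve_target('{"target": "cloud", "confidence": 0.3}', "local"))
print(resolve_target("Sure! Here is the JSON you asked for", "local"))
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Every failure mode returns the heuristic decision rather than raising, which is what keeps the router off the critical path.&lt;/P&gt;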
&lt;H2&gt;Cloud inference via Microsoft Foundry&lt;/H2&gt;
&lt;P&gt;When the policy returns &lt;CODE&gt;RouteTarget.CLOUD&lt;/CODE&gt;, the request goes through &lt;CODE&gt;AIProjectClient&lt;/CODE&gt;, which gives you an &lt;CODE&gt;openai.OpenAI&lt;/CODE&gt;-compatible client wired to your Foundry project with &lt;CODE&gt;DefaultAzureCredential&lt;/CODE&gt;. No API keys. No secrets in &lt;CODE&gt;.env&lt;/CODE&gt;.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# app/providers/cloud_provider.py (excerpt)
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

self._project = AIProjectClient(
    endpoint=self._config.foundry_project_endpoint,
    credential=DefaultAzureCredential(),
)
self._openai_client = self._project.get_openai_client()

response = self._openai_client.chat.completions.create(
    model=self._config.foundry_cloud_model_deployment,  # e.g. "gpt-5.4"
    messages=messages,
    max_completion_tokens=max_tokens,
)
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A subtle gotcha worth flagging: gpt-5 and o-series deployments reject the legacy &lt;CODE&gt;max_tokens&lt;/CODE&gt; parameter and require &lt;CODE&gt;max_completion_tokens&lt;/CODE&gt;. They also reject custom &lt;CODE&gt;temperature&lt;/CODE&gt; values. The reference repo handles this by trying the new parameter first and falling back to the legacy one only when the API returns the specific &lt;CODE&gt;unsupported parameter&lt;/CODE&gt; error. That keeps the same code working against older deployments without forking the provider.&lt;/P&gt;
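&lt;P&gt;That try-new-then-legacy behaviour can be sketched without any SDK dependency by passing the create function in as a callable. This is an illustration only: &lt;CODE&gt;create_with_token_limit&lt;/CODE&gt; is a hypothetical helper, and matching on the error message text is an assumption about how the rejection surfaces. With the &lt;CODE&gt;openai&lt;/CODE&gt; package you would likely catch its bad-request exception instead of the broad &lt;CODE&gt;Exception&lt;/CODE&gt; used here.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;def create_with_token_limit(create, max_tokens, **kwargs):
    """Call create with max_completion_tokens, retrying with max_tokens if it is rejected."""
    try:
        return create(max_completion_tokens=max_tokens, **kwargs)
    except Exception as exc:  # assumption: narrow this to the SDK's bad-request error
        message = str(exc).lower()
        if "unsupported parameter" in message and "max_completion_tokens" in message:
            return create(max_tokens=max_tokens, **kwargs)
        raise  # any other failure is not ours to hide


def legacy_create(**params):
    # Stand-in for an older deployment that only understands max_tokens.
    if "max_completion_tokens" in params:
        raise ValueError("Unsupported parameter: 'max_completion_tokens'")
    return params


print(create_with_token_limit(legacy_create, 256, model="demo"))
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In real use the first argument would be something like &lt;CODE&gt;self._openai_client.chat.completions.create&lt;/CODE&gt;, with &lt;CODE&gt;model&lt;/CODE&gt; and &lt;CODE&gt;messages&lt;/CODE&gt; passed as keyword arguments.&lt;/P&gt;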
&lt;H2&gt;Graceful degradation: the fallback paths&lt;/H2&gt;
&lt;P&gt;Hybrid systems fail in interesting ways. The cloud can be down. The local model can throw because the GPU ran out of memory. A reasoning model can return an empty completion. The service handles all of these by attempting the alternative path and labelling the response so observability stays honest:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Cloud route fails → local fallback.&lt;/STRONG&gt; The response carries &lt;CODE&gt;path=LOCAL_FALLBACK&lt;/CODE&gt;, &lt;CODE&gt;fallback=true&lt;/CODE&gt;, and a populated &lt;CODE&gt;fallback_reason&lt;/CODE&gt;. The user gets an answer instead of an error.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Local route fails → cloud fallback,&lt;/STRONG&gt; &lt;EM&gt;but only if privacy class is not RESTRICTED.&lt;/EM&gt; A sensitive prompt that the local model could not handle never leaks to the cloud as a fallback. It returns a clear error instead. This is the second hard gate in the system.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Both fail.&lt;/STRONG&gt; A structured error response with a correlation ID, never a stack trace.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The second rule, that fallback respects privacy class, is the kind of decision that is easy to skip and impossible to bolt on later. Encode it once in the service layer and your privacy reviewers will thank you.&lt;/P&gt;
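&lt;P&gt;A privacy-aware fallback can be sketched in a few lines. The function below is a simplified stand-in for the service layer, not the repo's code: &lt;CODE&gt;answer_with_fallback&lt;/CODE&gt;, its dictionary result, and the two stub providers are assumptions made for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;def answer_with_fallback(prompt, target, restricted, ask_local, ask_cloud):
    """Try the routed path, then the alternative, without leaking restricted prompts."""
    if restricted:
        target = "local"  # restricted prompts are never routed to the cloud
    primary, secondary = (ask_local, ask_cloud) if target == "local" else (ask_cloud, ask_local)
    try:
        return {"answer": primary(prompt), "path": target, "fallback": False}
    except Exception as exc:
        reason = f"{target} path failed: {exc}"

    if restricted:
        # Second hard gate: report an error instead of falling back to the cloud.
        return {"error": "local path failed for a restricted prompt", "fallback_reason": reason}

    fallback_path = "cloud_fallback" if target == "local" else "local_fallback"
    try:
        answer = secondary(prompt)
    except Exception as exc:
        return {"error": f"both paths failed: {exc}", "fallback_reason": reason}
    return {"answer": answer, "path": fallback_path, "fallback": True, "fallback_reason": reason}


# Stub providers standing in for the real local and cloud clients.
def unavailable(prompt):
    raise RuntimeError("provider unavailable")


def echo(prompt):
    return f"answer to: {prompt}"


print(answer_with_fallback("summarise this", "cloud", False, echo, unavailable))
print(answer_with_fallback("my password is hunter2", "local", True, unavailable, echo))
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The restricted check sits between the first failure and the fallback attempt, so there is no code path on which a restricted prompt reaches the cloud provider.&lt;/P&gt;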
&lt;H2&gt;What it looks like in practice&lt;/H2&gt;
&lt;P&gt;The diagnostics panel in the Gradio UI shows the routing decision live: path, model, confidence, latency, privacy class, complexity band, and the full JSON response. Five canonical scenarios shake out the entire decision tree:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;CODE&gt;"hello"&lt;/CODE&gt; → &lt;CODE&gt;path=local, confidence=1.0, complexity=low&lt;/CODE&gt;. Heuristic only. No router LLM call. ~3 seconds end-to-end with phi-4-mini cached.&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;"explain transformer self-attention in depth with maths"&lt;/CODE&gt; → &lt;CODE&gt;path=cloud, model=gpt-5.4, complexity=high&lt;/CODE&gt;. Router LLM classifies, hard gate confirms.&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;"my password is hunter2, suggest a stronger one"&lt;/CODE&gt; → &lt;CODE&gt;path=local, privacy=restricted, deterministic=true&lt;/CODE&gt;. Privacy gate fires before any model sees it.&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;"summarise this 8 KB document"&lt;/CODE&gt; with cloud unavailable → &lt;CODE&gt;path=cloud_fallback&lt;/CODE&gt; (local handles it, response is labelled).&lt;/LI&gt;
&lt;LI&gt;Complex prompt with local model error → &lt;CODE&gt;path=local_fallback&lt;/CODE&gt;, &lt;CODE&gt;fallback_reason&lt;/CODE&gt; populated.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;You can reproduce all five without any models installed by running &lt;CODE&gt;python -m app.main --demo&lt;/CODE&gt;. The demo mode swaps the providers for deterministic stubs so you can validate the routing logic and the response schema in under a second on any machine.&lt;/P&gt;
&lt;H2&gt;Operational lessons learned&lt;/H2&gt;
&lt;P&gt;Some things the reference implementation only gets right because it got them wrong first:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Pick a non-reasoning model for the router.&lt;/STRONG&gt; Reasoning-tuned local models (Phi-4-reasoning, o-style) wrap their output in &lt;CODE&gt;&amp;lt;think&amp;gt;&lt;/CODE&gt; blocks and break your JSON parser. &lt;CODE&gt;phi-4-mini&lt;/CODE&gt; is faster and more reliable for classification.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Cache the local model.&lt;/STRONG&gt; First load can take 30–60 seconds while Foundry Local downloads weights. Initialise the service once at process startup, not per request.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use correlation IDs everywhere.&lt;/STRONG&gt; The service attaches one per request and the structured JSON logger emits it on every event. When you are debugging a fallback path across two model providers, this is the difference between five minutes and five hours.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Run the privacy heuristic on every fallback path too.&lt;/STRONG&gt; A naive implementation might route locally, fail, and then send the same sensitive prompt to the cloud as a "graceful" fallback. That is not graceful, it is a data leak.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Keep configuration in &lt;CODE&gt;.env&lt;/CODE&gt; and out of code.&lt;/STRONG&gt; Privacy mode, fallback toggles, confidence threshold, model aliases — all environment-driven. The &lt;CODE&gt;config.py&lt;/CODE&gt; module is the only place that reads them.&lt;/LI&gt;
&lt;/UL&gt;
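&lt;P&gt;The correlation-ID lesson is simple to apply with the standard library. The sketch below is one possible shape, not the repo's logger: &lt;CODE&gt;log_event&lt;/CODE&gt;, the logger name, and the event names are illustrative assumptions.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("hybrid_agent")


def log_event(event: str, correlation_id: str, **fields) -> str:
    """Emit one JSON log line that always carries the correlation ID."""
    line = json.dumps({"event": event, "correlation_id": correlation_id, **fields}, sort_keys=True)
    logger.info(line)
    return line


# One ID per request, reused on every event for that request.
correlation_id = str(uuid.uuid4())
log_event("route_decided", correlation_id, target="cloud", confidence=0.82)
log_event("fallback", correlation_id, path="local_fallback", reason="cloud timeout")
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Filtering the logs on a single &lt;CODE&gt;correlation_id&lt;/CODE&gt; then reconstructs the full path a request took across both providers.&lt;/P&gt;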
&lt;H2&gt;Responsible AI in a hybrid topology&lt;/H2&gt;
&lt;P&gt;Hybrid does not make responsible AI harder, but it does make it different. Three controls earn their keep:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Data residency by default.&lt;/STRONG&gt; The local path keeps prompts and answers on the device. For RESTRICTED content this is mandatory; for everything else it is a free latency and cost win.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Auditability.&lt;/STRONG&gt; Every routing decision is logged with the deterministic reason, the heuristic class, the router LLM output, the confidence, and the correlation ID. You can answer "why did this prompt go to the cloud?" months later.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Keyless auth.&lt;/STRONG&gt; &lt;CODE&gt;DefaultAzureCredential&lt;/CODE&gt; means there is no API key to leak, rotate, or commit by accident. The repo's &lt;CODE&gt;.gitignore&lt;/CODE&gt;, &lt;CODE&gt;SECURITY.md&lt;/CODE&gt;, and pre-push checklist enforce this end-to-end.&lt;/LI&gt;
&lt;/UL&gt;
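&lt;P&gt;To make the auditability point concrete, the sketch below emits one JSON line per routing decision with the fields listed above. The logger name, the function name, and the exact key names are assumptions for illustration; the repository's logger may be shaped differently:&lt;/P&gt;

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("router.audit")  # hypothetical logger name


def log_routing_decision(correlation_id, target, reason, heuristic_class,
                         router_output, confidence):
    """Emit one JSON line per routing decision and return it for inspection."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,
        "target": target,                    # "local" or "cloud"
        "reason": reason,                    # deterministic rule that fired, if any
        "heuristic_class": heuristic_class,  # e.g. "RESTRICTED"
        "router_output": router_output,      # raw label from the router model
        "confidence": confidence,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

&lt;P&gt;Because every record is a single sorted JSON object, answering "why did this prompt go to the cloud?" becomes a filter on &lt;CODE&gt;correlation_id&lt;/CODE&gt;.&lt;/P&gt;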
&lt;H2&gt;Try it&lt;/H2&gt;
&lt;P&gt;Five minutes, no Azure account needed for the demo:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;git clone https://github.com/leestott/fl-mixedmodel.git
cd fl-mixedmodel

python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # macOS / Linux

pip install -r requirements.txt
python -m app.main --demo       # all five scenarios, no models required
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;To run with real models, install &lt;A href="https://learn.microsoft.com/azure/ai-foundry/foundry-local/" target="_blank"&gt;Foundry Local&lt;/A&gt;, copy &lt;CODE&gt;.env.example&lt;/CODE&gt; to &lt;CODE&gt;.env&lt;/CODE&gt;, set your &lt;CODE&gt;FOUNDRY_PROJECT_ENDPOINT&lt;/CODE&gt;, then:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;az login
python -m app.main --ui --port 7860
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H2&gt;Where to go next&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Repository:&lt;/STRONG&gt; &lt;A href="https://github.com/leestott/fl-mixedmodel" target="_blank"&gt;github.com/leestott/fl-mixedmodel&lt;/A&gt; — full source, tests, specification, screenshots.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry Local SDK:&lt;/STRONG&gt; &lt;A href="https://pypi.org/project/foundry-local-sdk/" target="_blank"&gt;pypi.org/project/foundry-local-sdk&lt;/A&gt; and the &lt;A href="https://learn.microsoft.com/azure/ai-foundry/foundry-local/" target="_blank"&gt;Foundry Local docs&lt;/A&gt;.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure AI Projects SDK:&lt;/STRONG&gt; &lt;A href="https://pypi.org/project/azure-ai-projects/" target="_blank"&gt;pypi.org/project/azure-ai-projects&lt;/A&gt; and the &lt;A href="https://learn.microsoft.com/azure/ai-foundry/" target="_blank"&gt;Microsoft Foundry docs&lt;/A&gt;.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Identity:&lt;/STRONG&gt; &lt;A href="https://learn.microsoft.com/python/api/azure-identity/azure.identity.defaultazurecredential" target="_blank"&gt;DefaultAzureCredential reference&lt;/A&gt;.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Phi-4-mini:&lt;/STRONG&gt; &lt;A href="https://huggingface.co/microsoft/Phi-4-mini-instruct" target="_blank"&gt;Phi-4-mini on Hugging Face&lt;/A&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Key takeaways&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;The best-practice pattern is a &lt;STRONG&gt;two-stage router&lt;/STRONG&gt;: a local model classifies the request first, then either a local task model or a Microsoft Foundry cloud model answers.&lt;/LI&gt;
&lt;LI&gt;For cloud control, use the &lt;STRONG&gt;Responses API&lt;/STRONG&gt; with either a named deployment (deterministic) or &lt;CODE&gt;model-router&lt;/CODE&gt; (auto-select).&lt;/LI&gt;
&lt;LI&gt;Pin &lt;STRONG&gt;&lt;CODE&gt;foundry-local-sdk &amp;gt;= 1.1.0&lt;/CODE&gt;&lt;/STRONG&gt; (5 May 2026) and &lt;STRONG&gt;&lt;CODE&gt;azure-ai-projects &amp;gt;= 2.1.0&lt;/CODE&gt;&lt;/STRONG&gt;. The 2026 SDK surfaces are not backwards-compatible with pre-2026 samples.&lt;/LI&gt;
&lt;LI&gt;Hybrid inference is a routing problem, not a model problem. A small local model is enough to classify the request.&lt;/LI&gt;
&lt;LI&gt;Deterministic privacy gates beat probabilistic ones. Code the rules; let the LLM judge only what is left.&lt;/LI&gt;
&lt;LI&gt;Return the same response schema from every path. Label fallbacks honestly. Carry a correlation ID everywhere.&lt;/LI&gt;
&lt;LI&gt;Keep auth keyless with &lt;CODE&gt;DefaultAzureCredential&lt;/CODE&gt;, and keep your &lt;CODE&gt;.env&lt;/CODE&gt; out of git.&lt;/LI&gt;
&lt;LI&gt;Test the routing decisions, not just the model outputs. Demo mode and a strong &lt;CODE&gt;pytest&lt;/CODE&gt; suite pay back every time you swap a model.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Hybrid AI is not a compromise between local and cloud. It is the supervisor pattern applied to inference — fast and private where you can be, frontier where you have to be, observable everywhere. The hard part is the contract, not the models.&lt;/P&gt;</description>
      <pubDate>Wed, 27 May 2026 07:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/hybrid-ai-agents-in-python-routing-between-foundry-local-and/ba-p/4522979</guid>
      <dc:creator>Lee_Stott</dc:creator>
      <dc:date>2026-05-27T07:00:00Z</dc:date>
    </item>
    <item>
      <title>Set clear AI expectations for every assignment with Student AI Guidelines</title>
      <link>https://techcommunity.microsoft.com/t5/education-blog/set-clear-ai-expectations-for-every-assignment-with-student-ai/ba-p/4521932</link>
      <description>&lt;H3&gt;&lt;STRONG&gt;The challenge: students don't know where they stand with AI&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Every educator has a different approach to AI in their classroom. Some want students using it freely. Others want AI limited to brainstorming or editing. Some assignments shouldn't involve AI at all.&lt;/P&gt;
&lt;P&gt;The problem? Students are left guessing. Educators have been piecing together workarounds — writing AI policies into assignment instructions, referencing school handbooks, or adding disclaimers to rubrics. None of these are built into the assignment itself, and students often miss them entirely.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Student AI Guidelines in Assignments&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Student AI Guidelines give educators a structured way to set AI expectations directly inside an assignment in Microsoft Teams. When creating an assignment, educators now see a new option to set a guideline level with suggested text:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Full AI use allowed.&lt;/STRONG&gt;&amp;nbsp;Students can use Copilot for any part of the assignment.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;AI for editing only.&lt;/STRONG&gt;&amp;nbsp;Students write their own work first, then use Copilot to polish, revise, or check grammar.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;AI for brainstorming only.&lt;/STRONG&gt;&amp;nbsp;Students can use Copilot to generate ideas or explore topics, but the final work should be their own.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;No AI.&lt;/STRONG&gt;&amp;nbsp;The assignment should be completed without AI assistance.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Student AI Guidelines are available for all grade levels, on desktop and mobile. All students in the assignment see the same guideline.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;A note on what these guidelines are — and aren't.&lt;/STRONG&gt;&amp;nbsp;Student AI Guidelines are a communication tool, not a lockdown. They set clear expectations that students see in the assignment, but they don't technically block access to AI tools. They work the same way a teacher's verbal instruction does: "Here's what I expect for this assignment." The value is in making that expectation visible, consistent, and built into the assignment itself.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;These are starting points, not fixed rules.&lt;/STRONG&gt;&amp;nbsp;Each level comes with suggested text that educators can edit freely to match their school's policies, terminology, or classroom norms. If your school uses different language around AI use — or has its own framework — update the text to reflect that. The feature adapts to your school, not the other way around.&lt;/P&gt;
&lt;P&gt;Even if your school hasn't enabled Copilot, Student AI Guidelines give you a structured way to communicate AI expectations to students — whether that's encouraging responsible AI use or formalizing a no-AI policy.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;What students see&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;When an educator sets a guideline, students see it in their assignment view — no hunting through instructions or attachments. The guideline card shows the educator's expectations and, for levels that allow AI use, a direct button to launch Copilot Chat.&lt;/P&gt;
&lt;P&gt;The Copilot launch button appears for students aged 13 and older at schools where an IT admin has enabled Copilot. If your school hasn't set up Copilot yet, check out the&amp;nbsp;&lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=PBMUtNZizM8" target="_blank" rel="noopener"&gt;Copilot setup guide for IT admins&lt;/A&gt;&amp;nbsp;to get started. If Copilot isn't enabled, students still see the guideline — just without the launch button.&lt;/P&gt;
&lt;P&gt;If no guideline is set, nothing changes — the student experience stays exactly as it is today.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Save time: set a default and reuse across classes&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Two features help you avoid repeating setup work:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Set as default.&lt;/STRONG&gt;&amp;nbsp;Any guideline level — including "No AI" — can be set as the default for all new assignments you create. If your school's policy is that most assignments should restrict AI use, set that as your default and you're covered. You can always override it on individual assignments when you want to allow more (or less) AI use.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Import Settings.&lt;/STRONG&gt;&amp;nbsp;Once you've configured your Student AI Guidelines in one class, you can apply those same settings to other classes using Import Settings. This copies your guideline levels and custom text across classes so you don't have to re-create them each time. Learn more:&amp;nbsp;&lt;A class="lia-external-url" href="https://support.microsoft.com/education/import-settings-in-assignments-and-grades" target="_blank" rel="noopener"&gt;Import Settings in Assignments and Grades&lt;/A&gt;.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Why this matters&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;This feature sits at the intersection of two things educators have been asking for: clarity around AI use, and an easy on-ramp to Copilot. Instead of competing with third-party AI tools through restriction, Student AI Guidelines give educators a way to channel AI use purposefully — on their terms, per assignment, with clear communication to students.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Resources&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://support.microsoft.com/education/assignments/set-ai-guidelines-assignment-microsoft-teams" target="_blank" rel="noopener"&gt;Set Student AI Guidelines on and assignment in Microsoft Teams&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://support.microsoft.com/education/assignments/manage-student-ai-guidelines-assignments" target="_blank" rel="noopener"&gt;Manage Student AI Guidelines in Assignments&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 26 May 2026 14:30:34 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/education-blog/set-clear-ai-expectations-for-every-assignment-with-student-ai/ba-p/4521932</guid>
      <dc:creator>Leif Brenne</dc:creator>
      <dc:date>2026-05-26T14:30:34Z</dc:date>
    </item>
    <item>
      <title>CI/CD for AI Agents on Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/ci-cd-for-ai-agents-on-microsoft-foundry/ba-p/4522218</link>
      <description>&lt;H1&gt;Introduction&lt;/H1&gt;
&lt;P&gt;Building an AI agent is the straightforward part. Shipping it reliably to production with version control, evaluation-driven quality gates, multi-environment promotion, and enterprise governance is where most teams run into friction.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Microsoft Foundry&lt;/STRONG&gt; changes this. It is Microsoft's AI app and agent factory: a fully managed platform for building, deploying, and governing AI agents at scale. It provides a first-class agent runtime with built-in lifecycle management, making it possible to apply the same CI/CD rigour you already use for application software to AI agents — regardless of whether you are building containerised hosted agents or declarative prompt-based agents.&lt;/P&gt;
&lt;P&gt;This post walks through a complete, production-ready reference architecture for doing exactly that. You will find the GitHub Actions workflow, the Azure DevOps pipeline YAML, and the architecture diagram linked throughout.&lt;/P&gt;
&lt;P&gt;Reference implementation repositories: &lt;A href="https://github.com/ericchansen/foundry-agents-lifecycle" target="_blank" rel="noopener"&gt;foundry-agents-lifecycle&lt;/A&gt; and &lt;A class="lia-external-url" href="https://github.com/leestott/foundry-cicd" target="_blank" rel="noopener"&gt;CI/CD for AI Agents on Microsoft Foundry&lt;/A&gt;.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Why Agent CI/CD Is Different&lt;/H2&gt;
&lt;P&gt;Traditional software pipelines gate releases on test pass/fail. Agent pipelines require an additional, critical layer: &lt;STRONG&gt;evaluation-driven quality gates&lt;/STRONG&gt;. Before any agent version can be promoted to the next environment, it must pass three categories of evaluation:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Quality&lt;/STRONG&gt; — answer correctness, task completion rate, hallucination rate&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Safety&lt;/STRONG&gt; — grounded responses, policy compliance, tool usage validation&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Performance&lt;/STRONG&gt; — token usage per query, p95 response latency&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;A second key difference is the &lt;STRONG&gt;deployment unit&lt;/STRONG&gt;. You are not deploying a binary or a container tag in isolation. You are deploying an &lt;EM&gt;agent version&lt;/EM&gt; — an immutable artefact that bundles the model selection, system instructions, tool definitions, and configuration together. This is what enables deterministic promotion and full auditability across environments.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;"Agents follow a standard CI/CD pattern, but with a critical shift: promotion happens at the &lt;STRONG&gt;agent version&lt;/STRONG&gt; level, and release gates are driven by &lt;STRONG&gt;evaluation outcomes&lt;/STRONG&gt;, not just test results."&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
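&lt;P&gt;One way to picture an immutable agent version is as a frozen bundle with a content-derived fingerprint. The sketch below is only a mental model: the &lt;CODE&gt;AgentVersion&lt;/CODE&gt; class and its &lt;CODE&gt;fingerprint&lt;/CODE&gt; method are hypothetical and do not reflect how Foundry stores or identifies versions:&lt;/P&gt;

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentVersion:
    """Illustrative bundle of everything that defines one agent version."""
    model: str
    instructions: str
    tools: tuple        # tool names; a tuple keeps the bundle immutable
    config: tuple = ()  # (key, value) pairs

    def fingerprint(self):
        """Stable hash of the bundle: identical inputs give identical IDs."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

&lt;P&gt;Changing a single word of the instructions yields a different fingerprint, which is the property that makes promotion deterministic: the thing evaluated in Dev is the same thing that reaches Production.&lt;/P&gt;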
&lt;HR /&gt;
&lt;H2&gt;Reference Architecture&lt;/H2&gt;
&lt;P&gt;&lt;EM&gt;Figure 1: End-to-end CI/CD reference architecture for hosted and prompt-based agents on Microsoft Foundry.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The architecture has five logical layers, flowing from developer commit to production monitoring:&lt;/P&gt;
&lt;H3&gt;Layer 1 — Developer Layer&lt;/H3&gt;
&lt;P&gt;The developer layer is a standard source-controlled repository in GitHub or Azure DevOps. It contains:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Agent code written in Python or .NET&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;agent.yaml&lt;/CODE&gt; or prompt definition files for prompt-based agents&lt;/LI&gt;
&lt;LI&gt;Tool configurations: MCP servers, REST API connectors, or other integrations&lt;/LI&gt;
&lt;LI&gt;Infrastructure as Code: Bicep or ARM templates for provisioning the Foundry project and dependencies&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Layer 2 — CI Pipeline (Build · Validate · Evaluate)&lt;/H3&gt;
&lt;P&gt;Every push or pull request triggers the CI pipeline. It performs five steps:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Docker build&lt;/STRONG&gt; — for hosted agents, build and tag the container image&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Static checks&lt;/STRONG&gt; — lint with &lt;CODE&gt;ruff&lt;/CODE&gt;, security scan with &lt;CODE&gt;bandit&lt;/CODE&gt;, agent YAML schema validation&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Unit and tool tests&lt;/STRONG&gt; — &lt;CODE&gt;pytest&lt;/CODE&gt; suites covering agent logic and tool integrations&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Evaluation gate&lt;/STRONG&gt; — run evaluation datasets; fail the pipeline if thresholds are breached&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Image push&lt;/STRONG&gt; — push the validated container to Azure Container Registry (ACR)&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Prompt-based agents skip the Docker build step. Instead, the YAML definition and prompt bundle are validated against schema and evaluated against golden datasets.&lt;/P&gt;
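&lt;P&gt;As a rough illustration of the schema validation step, the sketch below checks an agent definition that has already been parsed into a dictionary (for example with a YAML loader). The required fields are an assumed schema chosen for the example, not the actual &lt;CODE&gt;agent.yaml&lt;/CODE&gt; contract, and &lt;CODE&gt;validate_agent_config&lt;/CODE&gt; here is a hypothetical function rather than the repository script:&lt;/P&gt;

```python
# Assumed schema for illustration only: field name mapped to expected type.
REQUIRED_FIELDS = {"name": str, "model": str, "instructions": str, "tools": list}


def validate_agent_config(config):
    """Return a list of problems; an empty list means the definition passed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in config:
            problems.append(f"missing field: {field}")
        elif not isinstance(config[field], expected):
            problems.append(f"{field} must be {expected.__name__}")
    # Only check content once the structure itself is sound.
    if not problems and not config["instructions"].strip():
        problems.append("instructions must not be empty")
    return problems
```

&lt;P&gt;Returning every problem at once, instead of raising on the first, gives the pull request author a complete list in a single CI run.&lt;/P&gt;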
&lt;H3&gt;Layer 3 — CD Pipeline (Multi-stage Promotion)&lt;/H3&gt;
&lt;P&gt;A single agent version is promoted through three Foundry project environments:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Stage&lt;/th&gt;&lt;th&gt;Environment&lt;/th&gt;&lt;th&gt;Activities&lt;/th&gt;&lt;th&gt;Gate&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Stage 1&lt;/td&gt;&lt;td&gt;Dev Foundry Project&lt;/td&gt;&lt;td&gt;Deploy vNext version, smoke tests, developer evals&lt;/td&gt;&lt;td&gt;Eval quality thresholds&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Stage 2&lt;/td&gt;&lt;td&gt;Test / QA Foundry Project&lt;/td&gt;&lt;td&gt;Scenario tests, HITL validation, safety evaluation&lt;/td&gt;&lt;td&gt;Eval gates + human approval&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Stage 3&lt;/td&gt;&lt;td&gt;Production Foundry Project&lt;/td&gt;&lt;td&gt;Promote version, enable endpoint, post-deploy smoke test&lt;/td&gt;&lt;td&gt;Required reviewer approval&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Rollback is straightforward: switch the active version pointer back to the previous agent version. No re-deployment is needed.&lt;/P&gt;
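&lt;P&gt;The pointer-switch idea can be sketched in a few lines. This is an in-memory model for intuition only: &lt;CODE&gt;EnvironmentPointer&lt;/CODE&gt; is a hypothetical class and says nothing about how the Foundry control plane implements version switching:&lt;/P&gt;

```python
class EnvironmentPointer:
    """Tracks which agent version is active in one environment (illustrative only)."""

    def __init__(self):
        self.history = []  # version IDs in activation order; the last one is active

    @property
    def active(self):
        return self.history[-1] if self.history else None

    def promote(self, version_id):
        """Make version_id the active version."""
        self.history.append(version_id)

    def rollback(self):
        """Re-activate the previous version without redeploying anything."""
        if len(self.history) in (0, 1):
            raise RuntimeError("no previous version to roll back to")
        self.history.pop()
        return self.active
```

&lt;P&gt;Nothing is rebuilt or redeployed during &lt;CODE&gt;rollback&lt;/CODE&gt;: the earlier version already exists, so only the pointer moves.&lt;/P&gt;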
&lt;H3&gt;Layer 4 — Microsoft Foundry Agent Service&lt;/H3&gt;
&lt;P&gt;The Foundry Agent Service runtime provides:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Hosted agent runtime&lt;/STRONG&gt; — managed container execution supporting Agent Framework, LangGraph, Semantic Kernel, or custom code&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Prompt-based agent runtime&lt;/STRONG&gt; — declarative agent definitions, no container required&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Built-in lifecycle operations&lt;/STRONG&gt; — version, start, stop, rollback&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Entra Agent Identity&lt;/STRONG&gt; — each deployed version receives a dedicated Microsoft Entra managed identity&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;RBAC and policy enforcement&lt;/STRONG&gt; — Azure role-based access controls per project&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Observability&lt;/STRONG&gt; — distributed traces, structured logs, and evaluation signals&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Layer 5 — Monitoring, Governance, and Control Plane&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Foundry control plane: agent registry, environment configuration, version history&lt;/LI&gt;
&lt;LI&gt;OpenTelemetry forwarded to Azure Monitor and Application Insights&lt;/LI&gt;
&lt;LI&gt;Continuous evaluation pipelines for ongoing quality, grounding, and safety monitoring&lt;/LI&gt;
&lt;LI&gt;Azure Policy and RBAC enforcement at the platform level&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H2&gt;Environment Topology&lt;/H2&gt;
&lt;P&gt;There are two topology options. We recommend &lt;STRONG&gt;Option A&lt;/STRONG&gt; for all production workloads:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Option&lt;/th&gt;&lt;th&gt;Structure&lt;/th&gt;&lt;th&gt;Best for&lt;/th&gt;&lt;th&gt;Trade-off&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;A — Recommended&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Dev Project → Test Project → Prod Project (separate Foundry projects)&lt;/td&gt;&lt;td&gt;Enterprise workloads&lt;/td&gt;&lt;td&gt;Full isolation, clean RBAC boundaries, easier governance&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;B — Lightweight&lt;/td&gt;&lt;td&gt;Single Foundry project with agent version tags (dev/test/prod)&lt;/td&gt;&lt;td&gt;Small teams, prototyping&lt;/td&gt;&lt;td&gt;Simpler setup, but weaker environment separation&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Separate projects mean separate RBAC policies, separate connection strings, and separate evaluation signals. A developer service principal has access only to the Dev project; the CI/CD identity has restricted access to promote to Test and Production.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Evaluation Gates — The Core Difference&lt;/H2&gt;
&lt;P&gt;Evaluation gates transform a standard software pipeline into an AI-safe deployment pipeline. They run at two points: pre-merge (CI) and pre-promotion (CD).&lt;/P&gt;
&lt;H3&gt;Defining the Gates&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Category&lt;/th&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;CI threshold&lt;/th&gt;&lt;th&gt;Prod threshold&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Quality&lt;/td&gt;&lt;td&gt;Hallucination rate&lt;/td&gt;&lt;td&gt;&amp;lt; 5%&lt;/td&gt;&lt;td&gt;&amp;lt; 3%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Quality&lt;/td&gt;&lt;td&gt;Task completion rate&lt;/td&gt;&lt;td&gt;&amp;gt; 90%&lt;/td&gt;&lt;td&gt;&amp;gt; 95%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Safety&lt;/td&gt;&lt;td&gt;Grounded response rate&lt;/td&gt;&lt;td&gt;&amp;gt; 95%&lt;/td&gt;&lt;td&gt;&amp;gt; 98%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Safety&lt;/td&gt;&lt;td&gt;Policy violations&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Performance&lt;/td&gt;&lt;td&gt;p95 latency&lt;/td&gt;&lt;td&gt;&amp;lt; 4 000 ms&lt;/td&gt;&lt;td&gt;&amp;lt; 3 000 ms&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cost&lt;/td&gt;&lt;td&gt;Token usage per query&lt;/td&gt;&lt;td&gt;Track only&lt;/td&gt;&lt;td&gt;Alert on &amp;gt; 20% regression&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;Gate Enforcement (Python)&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;import json
import sys

def check_gates(results_path: str) -&amp;gt; None:
    with open(results_path) as f:
        results = json.load(f)

    failures = []

    if results["hallucination_rate"] &amp;gt; 0.05:
        failures.append(f"Hallucination rate {results['hallucination_rate']:.1%} exceeds 5% threshold")

    if results["task_completion_rate"] &amp;lt; 0.90:
        failures.append(f"Task completion {results['task_completion_rate']:.1%} below 90% threshold")

    if results["latency_p95_ms"] &amp;gt; 4000:
        failures.append(f"p95 latency {results['latency_p95_ms']}ms exceeds 4000ms threshold")

    if results.get("policy_violations", 0) &amp;gt; 0:
        failures.append(f"Policy violations detected: {results['policy_violations']}")

    if failures:
        for failure in failures:
            print(f"GATE FAILED: {failure}", file=sys.stderr)
        sys.exit(1)

    print("All evaluation gates passed — proceeding to deployment")

if __name__ == "__main__":
    check_gates(sys.argv[1])
&lt;/CODE&gt;&lt;/PRE&gt;
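&lt;P&gt;The script above hard-codes the CI thresholds. A possible variation, sketched here under assumptions, keeps the CI and production thresholds from the gates table as data and selects them by environment. The &lt;CODE&gt;grounded_response_rate&lt;/CODE&gt; key is an assumed field name, the comparisons follow the table's strict inequalities, and a missing metric is treated as a failure:&lt;/P&gt;

```python
# Thresholds from the gates table. "below" passes when the value is under the
# limit, "above" when it is over, "equal" when it matches exactly.
GATES = {
    "ci": [
        ("hallucination_rate", "below", 0.05),
        ("task_completion_rate", "above", 0.90),
        ("grounded_response_rate", "above", 0.95),  # assumed key name
        ("policy_violations", "equal", 0),
        ("latency_p95_ms", "below", 4000),
    ],
    "prod": [
        ("hallucination_rate", "below", 0.03),
        ("task_completion_rate", "above", 0.95),
        ("grounded_response_rate", "above", 0.98),  # assumed key name
        ("policy_violations", "equal", 0),
        ("latency_p95_ms", "below", 3000),
    ],
}

PASSES = {
    "below": lambda value, limit: limit > value,
    "above": lambda value, limit: value > limit,
    "equal": lambda value, limit: value == limit,
}


def failed_gates(results, environment):
    """Return a message for every gate that the results do not satisfy."""
    failures = []
    for metric, kind, limit in GATES[environment]:
        value = results.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from results")
        elif not PASSES[kind](value, limit):
            failures.append(f"{metric}: {value} is not {kind} {limit}")
    return failures
```

&lt;P&gt;With this shape, tightening a production threshold is a one-line data change, and the same function serves both the pre-merge and pre-promotion checks.&lt;/P&gt;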
&lt;HR /&gt;
&lt;H2&gt;Hosted vs Prompt-Based Agents — Pipeline Differences&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Capability&lt;/th&gt;&lt;th&gt;Hosted Agents&lt;/th&gt;&lt;th&gt;Prompt-Based Agents&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Deployment unit&lt;/td&gt;&lt;td&gt;Container image + agent definition&lt;/td&gt;&lt;td&gt;YAML / prompt configuration bundle&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Build step required&lt;/td&gt;&lt;td&gt;Yes — Docker build + ACR push&lt;/td&gt;&lt;td&gt;No — YAML validation only&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Supported frameworks&lt;/td&gt;&lt;td&gt;Agent Framework, LangGraph, Semantic Kernel, custom&lt;/td&gt;&lt;td&gt;Foundry declarative runtime&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Promotion artefact&lt;/td&gt;&lt;td&gt;Versioned agent with container image reference&lt;/td&gt;&lt;td&gt;Versioned prompt/config bundle&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CI focus&lt;/td&gt;&lt;td&gt;Code quality, tool tests, evaluation&lt;/td&gt;&lt;td&gt;Prompt schema validation, evaluation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Rollback mechanism&lt;/td&gt;&lt;td&gt;Switch active agent version&lt;/td&gt;&lt;td&gt;Switch active agent version&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Runtime management&lt;/td&gt;&lt;td&gt;Foundry manages container lifecycle&lt;/td&gt;&lt;td&gt;Foundry manages declarative runtime&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;HR /&gt;
&lt;H2&gt;CI Pipeline Walkthrough&lt;/H2&gt;
&lt;P&gt;The following steps are representative of the full GitHub Actions workflow available in &lt;CODE&gt;github-actions-pipeline.yml&lt;/CODE&gt; alongside this post.&lt;/P&gt;
&lt;H3&gt;Hosted Agent CI&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;# 1. Static checks
ruff check .
bandit -r src/ -ll
python scripts/validate_agent_config.py --config agent.yaml

# 2. Tests
pytest tests/unit/ -v --tb=short
pytest tests/tools/ -v --tb=short

# 3. Evaluation gate
python scripts/run_evaluations.py \
    --dataset eval/datasets/golden_set.jsonl \
    --output  eval/results/results.json

python scripts/check_eval_gates.py \
    --results eval/results/results.json \
    --max-hallucination   0.05 \
    --min-task-completion 0.90 \
    --max-latency-p95     4000

# 4. Build and push container image (az acr build does both in one step)
az acr build \
    --registry myregistry.azurecr.io \
    --image    "myagent:$SHA" \
    --file     Dockerfile .
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Prompt-Based Agent CI&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;# Validate YAML / prompt definitions
python scripts/validate_agent_config.py --config agent.yaml

# Evaluation against golden dataset
python scripts/run_evaluations.py \
    --dataset eval/datasets/golden_set.jsonl \
    --output  eval/results/results.json

python scripts/check_eval_gates.py \
    --results eval/results/results.json
&lt;/CODE&gt;&lt;/PRE&gt;
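&lt;P&gt;The walkthrough calls &lt;CODE&gt;check_eval_gates.py&lt;/CODE&gt; with named flags, whereas the gate enforcement listing earlier reads a single positional path. A minimal sketch of a command-line front end that accepts those flags could look like this. The function names and the choice of CI thresholds as defaults are assumptions, not the repository's actual script:&lt;/P&gt;

```python
import argparse


def build_parser():
    """CLI matching the flags used in the walkthrough; defaults are the CI thresholds."""
    parser = argparse.ArgumentParser(description="Fail the build when evaluation gates are breached.")
    parser.add_argument("--results", required=True, help="path to the evaluation results JSON")
    parser.add_argument("--max-hallucination", type=float, default=0.05)
    parser.add_argument("--min-task-completion", type=float, default=0.90)
    parser.add_argument("--max-latency-p95", type=float, default=4000)
    return parser


def breaches(results, args):
    """Compare loaded results with the parsed thresholds and list every breached metric."""
    found = []
    if results["hallucination_rate"] > args.max_hallucination:
        found.append("hallucination_rate")
    if args.min_task_completion > results["task_completion_rate"]:
        found.append("task_completion_rate")
    if results["latency_p95_ms"] > args.max_latency_p95:
        found.append("latency_p95_ms")
    return found
```

&lt;P&gt;Omitting the threshold flags, as the prompt-based pipeline does, then falls back to the CI defaults, while the Test stage can pass stricter values explicitly.&lt;/P&gt;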
&lt;HR /&gt;
&lt;H2&gt;CD Pipeline Walkthrough&lt;/H2&gt;
&lt;H3&gt;Stage 1 — Dev Deployment&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;python scripts/deploy_agent.py \
    --env              dev \
    --image            "myregistry.azurecr.io/myagent:$SHA" \
    --foundry-endpoint $FOUNDRY_ENDPOINT_DEV \
    --agent-config     agent.yaml

# Returns the new agent version ID, stored for promotion
AGENT_VERSION=$(python scripts/get_active_version.py --env dev)
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Stage 2 — Promote to Test (after approval gate)&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;python scripts/promote_agent.py \
    --from-env         dev \
    --to-env           test \
    --agent-version    $AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_TEST

# Run scenario tests and safety evaluation
python scripts/run_evaluations.py \
    --dataset  eval/datasets/scenario_set.jsonl \
    --output   eval/results/test-results.json

python scripts/check_eval_gates.py \
    --results              eval/results/test-results.json \
    --max-hallucination    0.03 \
    --min-task-completion  0.95
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Stage 3 — Promote to Production (after required reviewer approval)&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;python scripts/promote_agent.py \
    --from-env         test \
    --to-env           prod \
    --agent-version    $AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_PROD

# Enable the production endpoint
python scripts/enable_agent_endpoint.py \
    --agent-version    $AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_PROD
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Rollback&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;# Switch the active version to the previous known-good version
python scripts/promote_agent.py \
    --from-env         prod \
    --to-env           prod \
    --agent-version    $PREVIOUS_AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_PROD

# OR delete the failing version
python scripts/delete_agent_version.py \
    --agent-version    $AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_PROD
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;HR /&gt;
&lt;H2&gt;Deployment Using the Azure AI Projects SDK&lt;/H2&gt;
&lt;P&gt;The &lt;CODE&gt;azure-ai-projects&lt;/CODE&gt; SDK provides programmatic control over the full agent lifecycle. This is the recommended approach for CI/CD scripts where you need deterministic, scriptable deployment.&lt;/P&gt;
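&lt;PRE&gt;&lt;CODE&gt;import os

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Both values come from the pipeline environment. GIT_SHA is an assumed
# variable name; GitHub Actions, for example, exposes GITHUB_SHA.
FOUNDRY_PROJECT_ENDPOINT = os.environ["FOUNDRY_PROJECT_ENDPOINT"]
GIT_SHA = os.environ.get("GIT_SHA", "unknown")

# Connect to the Foundry project
client = AIProjectClient(
    endpoint=FOUNDRY_PROJECT_ENDPOINT,
    credential=DefaultAzureCredential()
)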

# List existing agents (useful for idempotent deploy scripts)
for agent in client.agents.list():
    print(f"Agent: {agent.name}  id: {agent.id}")

# Create a new agent version (a hosted agent would also reference its container image)
agent = client.agents.create_agent(
    model="gpt-4o",
    name="my-enterprise-agent",
    instructions="You are a helpful assistant ...",
    tools=[...],  # tool definitions
    metadata={"version": GIT_SHA, "environment": "dev"}
)
print(f"Created agent version: {agent.id}")
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For hosted agents, the SDK call also references the container image pushed to ACR. Refer to the &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/deploy-hosted-agent" target="_blank" rel="noopener"&gt;Deploy a hosted agent&lt;/A&gt; documentation for Microsoft Foundry, which covers the full SDK flow including container image registration and version polling.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Reference Implementation Stack&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Concern&lt;/th&gt;&lt;th&gt;Technology&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Source control and pipelines&lt;/td&gt;&lt;td&gt;GitHub Actions or Azure DevOps Pipelines&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Infrastructure and agent deployment&lt;/td&gt;&lt;td&gt;Azure Developer CLI (&lt;CODE&gt;azd up&lt;/CODE&gt;)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Programmatic agent lifecycle&lt;/td&gt;&lt;td&gt;&lt;CODE&gt;azure-ai-projects&lt;/CODE&gt; Python SDK&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Agent evaluation&lt;/td&gt;&lt;td&gt;&lt;CODE&gt;azure-ai-evaluation&lt;/CODE&gt; Python SDK&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Agent runtime&lt;/td&gt;&lt;td&gt;Microsoft Foundry Agent Service&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Container registry&lt;/td&gt;&lt;td&gt;Azure Container Registry (hosted agents only)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Observability&lt;/td&gt;&lt;td&gt;OpenTelemetry, Azure Monitor, Application Insights&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Identity and access&lt;/td&gt;&lt;td&gt;Microsoft Entra (Agent ID, OIDC workload identity federation)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Governance&lt;/td&gt;&lt;td&gt;Azure Policy, RBAC, Foundry control plane&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;HR /&gt;
&lt;H2&gt;Governance and Responsible AI&lt;/H2&gt;
&lt;P&gt;Shipping AI agents at enterprise scale requires governance beyond what a traditional CI/CD pipeline provides. Microsoft Foundry addresses this at the platform level:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;RBAC per environment&lt;/STRONG&gt; — each Foundry project has independent access controls. Developers deploy to Dev; only CI/CD service principals (with audited OIDC tokens) can promote to Test and Production.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Agent registry and audit trail&lt;/STRONG&gt; — the Foundry control plane records which agent version is active in each environment, who deployed it, and when. This satisfies enterprise audit requirements without additional tooling.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Content safety and policy enforcement&lt;/STRONG&gt; — Azure Policy governs model access, data handling, and content safety rules at the infrastructure level, not just at the application code level. Policy violations block deployment automatically.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Entra Agent Identity&lt;/STRONG&gt; — each deployed agent version receives a dedicated, short-lived managed identity. Agents authenticate to downstream services using least-privilege credentials scoped to that specific deployment.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Continuous evaluation in production&lt;/STRONG&gt; — evaluation pipelines run on sampled production traffic, alerting when quality, safety, or cost metrics drift from their baseline.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;A key trade-off to be transparent about: evaluation datasets must be maintained and updated as the agent's tasks evolve. Stale datasets produce misleading pass/fail signals. Treat your golden evaluation set as a first-class engineering artefact alongside the agent code itself.&lt;/P&gt;
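&lt;P&gt;The continuous evaluation point above can be illustrated with a minimal sketch. It assumes each metric has a stored, non-zero baseline and a tolerated relative drift; the metric names, numbers, and the &lt;CODE&gt;find_drifted_metrics&lt;/CODE&gt; helper are hypothetical and are not part of any Foundry SDK.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricBaseline:
    """Baseline value and allowed relative drift for one metric (hypothetical)."""
    baseline: float            # assumed to be non-zero
    max_relative_drift: float  # e.g. 0.10 means 10 percent


def find_drifted_metrics(baselines, observed):
    """Return the names of metrics whose observed value drifted beyond tolerance."""
    drifted = []
    for name, spec in baselines.items():
        if name not in observed:
            # A missing metric is treated as drift so it cannot fail silently.
            drifted.append(name)
            continue
        relative_change = abs(observed[name] - spec.baseline) / abs(spec.baseline)
        if relative_change > spec.max_relative_drift:
            drifted.append(name)
    return drifted


# Illustrative numbers only; real baselines come from earlier evaluation runs.
baselines = {
    "quality": MetricBaseline(baseline=0.90, max_relative_drift=0.05),
    "safety": MetricBaseline(baseline=0.99, max_relative_drift=0.01),
    "cost_per_call_usd": MetricBaseline(baseline=0.020, max_relative_drift=0.25),
}
observed = {"quality": 0.80, "safety": 0.99, "cost_per_call_usd": 0.021}

print(find_drifted_metrics(baselines, observed))  # prints ['quality']
```

&lt;P&gt;In this sketch a non-empty result is what would raise an alert. Refreshing the baselines whenever the golden evaluation set changes keeps the comparison meaningful.&lt;/P&gt;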
&lt;HR /&gt;
&lt;H2&gt;Pipeline Files&lt;/H2&gt;
&lt;P&gt;Two pipeline files accompany this reference architecture. Both implement the same five-stage pipeline (CI Build, CI Evaluate, CD Dev, CD Test, CD Production) with environment-appropriate approval gates.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;CODE&gt;&lt;A class="lia-external-url" href="https://github.com/leestott/foundry-cicd" target="_blank" rel="noopener"&gt;github-actions-pipeline.yml&lt;/A&gt;&lt;/CODE&gt;&lt;/STRONG&gt; — GitHub Actions workflow. Uses GitHub Environments for approval gates and OIDC Workload Identity Federation for passwordless Azure authentication. No stored Azure credentials required.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;CODE&gt;&lt;A class="lia-external-url" href="https://github.com/leestott/foundry-cicd" target="_blank" rel="noopener"&gt;azure-devops-pipeline.yml&lt;/A&gt;&lt;/CODE&gt;&lt;/STRONG&gt; — Azure DevOps multi-stage YAML pipeline. Uses ADO Environments with required approvers and variable groups per environment.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Both pipelines share these security practices:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;OIDC / Workload Identity Federation — no long-lived Azure credentials stored in pipeline secrets&lt;/LI&gt;
&lt;LI&gt;Per-environment variable groups, each with scoped connection strings and endpoints&lt;/LI&gt;
&lt;LI&gt;Evaluation quality gates enforced before every promotion step&lt;/LI&gt;
&lt;LI&gt;Mandatory human approval before production deployment&lt;/LI&gt;
&lt;/UL&gt;
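&lt;P&gt;As a rough illustration of an evaluation quality gate, the sketch below compares evaluation results against fixed thresholds and reports every failure. The thresholds, metric names, and the &lt;CODE&gt;evaluate_gate&lt;/CODE&gt; function are assumptions made for this example; they are not taken from the pipeline files in the repository.&lt;/P&gt;

```python
# Hypothetical thresholds; real values depend on the agent and its golden dataset.
MIN_SCORES = {"quality": 0.85, "safety": 0.95}
MAX_P95_LATENCY_SECONDS = 5.0


def evaluate_gate(results):
    """Return a list of readable failures; an empty list means the gate passes."""
    failures = []
    for metric, minimum in MIN_SCORES.items():
        score = results.get(metric)
        # A missing score fails the gate, the same as a score under the minimum.
        if score is None or minimum > score:
            failures.append(f"{metric}: {score} is below the minimum {minimum}")
    latency = results.get("p95_latency_seconds")
    if latency is None or latency > MAX_P95_LATENCY_SECONDS:
        failures.append(
            f"p95_latency_seconds: {latency} exceeds {MAX_P95_LATENCY_SECONDS}"
        )
    return failures


# In a pipeline these numbers would be read from the evaluation step's output.
sample_results = {"quality": 0.91, "safety": 0.97, "p95_latency_seconds": 3.2}
problems = evaluate_gate(sample_results)
for problem in problems:
    print(f"GATE FAILED: {problem}")

# A pipeline step would turn this into its exit status to block the promotion.
exit_code = 1 if problems else 0
print(exit_code)  # prints 0
```

&lt;P&gt;Collecting every failure before deciding, rather than stopping at the first one, gives the developer the full picture in a single pipeline run.&lt;/P&gt;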
&lt;HR /&gt;
&lt;H2&gt;Summary&lt;/H2&gt;
&lt;P&gt;The full pipeline in one view:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;Developer commit
        |
   CI Pipeline
   ├── Docker build (hosted agents) / YAML validation (prompt agents)
   ├── Static checks + unit tests + tool tests
   └── Evaluation gate  ←  quality · safety · performance
        |
   Agent Version created  ← immutable, versioned artefact
        |
   CD Pipeline
   ├── Deploy to Dev       → smoke tests + eval gate
   ├── Promote to Test     → scenario tests + HITL + approval gate
   └── Promote to Prod     → enable endpoint + monitoring
        |
   Microsoft Foundry Agent Service
   └── Versioned runtime · Entra identity · RBAC · Observability
        |
   Control Plane
   └── Agent registry · Governance · Continuous evaluation
&lt;/CODE&gt;&lt;/PRE&gt;
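&lt;P&gt;The promotion order in the diagram can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions: the environment names, the &lt;CODE&gt;promote&lt;/CODE&gt; function, and the boolean inputs are hypothetical stand-ins for real deployment steps, test results, and approval records.&lt;/P&gt;

```python
ENVIRONMENTS = ["dev", "test", "prod"]  # promotion order from the diagram


def promote(agent_version, checks_passed, approved_for_prod):
    """Walk one immutable agent version through the environments.

    checks_passed: dict mapping an environment name to the boolean result of
        the checks that run after deploying there (smoke tests, eval gate, ...).
    approved_for_prod: True once a human has approved the production step.
    Returns the (environment, version) pairs the version was deployed to.
    """
    reached = []
    for env in ENVIRONMENTS:
        if env == "prod" and not approved_for_prod:
            break  # production always waits for human approval
        reached.append((env, agent_version))
        if not checks_passed.get(env, False):
            break  # failed or missing checks stop any further promotion
    return reached


# Checks pass in Dev and Test, but nobody has approved production yet.
print(promote("support-agent:7", {"dev": True, "test": True}, approved_for_prod=False))
# prints [('dev', 'support-agent:7'), ('test', 'support-agent:7')]
```

&lt;P&gt;The same version string appears at every stage, which reflects the idea that the artefact is built once and promoted unchanged.&lt;/P&gt;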
&lt;P&gt;Microsoft Foundry provides the platform primitives — versioned agent deployments, multi-environment Foundry projects, built-in lifecycle management, and an enterprise observability stack — needed to operate AI agents with the same confidence as any production software system.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The key takeaway:&lt;/STRONG&gt; treat the agent version as your deployment artefact, and evaluation outcomes as your release gate. The rest follows familiar CI/CD patterns you already know and trust.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Next Steps&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Clone the CI/CD repo: &lt;A href="https://github.com/leestott/foundry-cicd" target="_blank" rel="noopener"&gt;leestott/foundry-cicd&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Clone the reference demo:&amp;nbsp;&lt;A href="https://github.com/ericchansen/foundry-agents-lifecycle" target="_blank" rel="noopener"&gt;foundry-agents-lifecycle on GitHub&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Set up your environment: &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/agents/environment-setup" target="_blank" rel="noopener"&gt;Set up your environment for Foundry Agent Service&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Deploy your first hosted agent: &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/agents/quickstarts/quickstart-hosted-agent" target="_blank" rel="noopener"&gt;Quickstart: Deploy your first hosted agent&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Understand hosted agent concepts: &lt;A href="https://learn.microsoft.com/en-us/agent-framework/hosting/foundry-hosted-agent" target="_blank" rel="noopener"&gt;Foundry Hosted Agents concepts&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Automate deployments in CI/CD: &lt;A href="https://microsoft.github.io/TechWorkshop-L300-AI-Apps-and-agents/docs/05_agentic_devops/05_02.html" target="_blank" rel="noopener"&gt;Automate deployment of Microsoft Foundry agents&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Manage agent versions: &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/manage-hosted-agent" target="_blank" rel="noopener"&gt;Manage hosted agents — Microsoft Foundry&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Deploy via SDK: &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/deploy-hosted-agent" target="_blank" rel="noopener"&gt;Deploy a hosted agent — Microsoft Foundry&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;SDK and endpoint reference: &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/sdk-endpoints-reference" target="_blank" rel="noopener"&gt;Microsoft Foundry SDK and Endpoints reference&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Azure AI Projects SDK: &lt;A href="https://learn.microsoft.com/en-us/python/api/overview/azure/ai-projects-readme" target="_blank" rel="noopener"&gt;azure-ai-projects Python SDK&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Azure Developer CLI: &lt;A href="https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/overview" target="_blank" rel="noopener"&gt;Azure Developer CLI (azd) overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Microsoft Foundry documentation hub: &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/" target="_blank" rel="noopener"&gt;Microsoft Foundry on Microsoft Learn&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 22 May 2026 08:50:02 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/ci-cd-for-ai-agents-on-microsoft-foundry/ba-p/4522218</guid>
      <dc:creator>Lee_Stott</dc:creator>
      <dc:date>2026-05-22T08:50:02Z</dc:date>
    </item>
    <item>
      <title>Meet Dileepa Bandara</title>
      <link>https://techcommunity.microsoft.com/t5/student-developer-blog/meet-dileepa-bandara/ba-p/4522125</link>
      <description>&lt;P&gt;&lt;EM&gt;This is the next segment of our blog series highlighting Microsoft Student Ambassadors who achieved the Gold milestone, the highest level attainable, and have recently graduated from university. Each blog in the series features a different student and highlights their accomplishments, their experience with the Student Ambassador community, and what they are up to now. &lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;How has being part of the Microsoft Student Ambassadors community helped you connect with other tech enthusiasts and expand your global network? &lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Before beginning my journey as a Microsoft Student Ambassador in 2022, I had only a few connections on campus. Joining the program transformed my network and opened doors I had never imagined. It gave me the opportunity to connect with student organizations, collaborate with peers, and even establish new student-led communities inspired by the program.&lt;/P&gt;
&lt;P&gt;Through this experience, I founded and led initiatives such as the NIBM Computing Society, NIBM FOSS Community, and Microsoft Student Club – NIBM. These initiatives not only strengthened the tech culture on campus but also expanded my network across the country through the recognition and credibility of the Microsoft Student Ambassadors program.&lt;/P&gt;
&lt;P&gt;Beyond my university, I connected with like-minded individuals and communities such as Microsoft Student Ambassadors Sri Lanka, Sri Lanka Developer Forum, and AI Community Sri Lanka. One of the most memorable milestones was delivering sessions at the Microsoft Sri Lanka office, a once-in-a-lifetime experience where I met inspiring professionals.&lt;/P&gt;
&lt;P&gt;Through these connections, I also had opportunities to volunteer and conduct sessions for industry-level audiences. From university students to professionals, I built meaningful relationships locally and globally, including connections formed through international sessions and LinkedIn networking.&lt;/P&gt;
&lt;P&gt;Being part of this community has truly expanded my global network and strengthened my passion for technology.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What impact did access to Microsoft tools and resources - like Azure credits, Visual Studio Enterprise, and Copilot - have on your journey as an AI-focused tech leader?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Before becoming a Microsoft Student Ambassador, I had never used Microsoft Azure because I couldn’t afford paid cloud services as a student. Receiving Azure credits through the program removed that barrier and became a turning point in my journey as an AI-focused tech leader.&lt;/P&gt;
&lt;P&gt;With Azure, I was able to explore AI services such as Foundry, Azure Machine Learning, and cloud-based deployments. I built proof-of-concept solutions, intelligent apps, and demo projects that addressed real-world problems. This hands-on experimentation deepened my understanding of model deployment, scalability, and responsible AI practices, all skills essential for leading AI initiatives.&lt;/P&gt;
&lt;P&gt;As I grew technically, I also grew as a leader. I used these tools to conduct workshops, mentor peers, and guide student communities in building AI-powered solutions. Instead of just learning AI concepts, I was enabling others to adopt them.&lt;/P&gt;
&lt;P&gt;Visual Studio Enterprise enhanced my development workflow, while Microsoft 365 improved collaboration and project management. Copilot accelerated my coding, supported debugging, and helped me explore new AI frameworks efficiently.&lt;/P&gt;
&lt;P&gt;These resources did more than support my learning; they empowered me to innovate, lead AI-driven communities, and confidently position myself as an emerging AI-focused tech leader.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;In what ways has the Student Ambassadors program strengthened your resume, boosted your confidence, or helped you develop AI-ready skills for your future career?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We are living in the AI era, where competition is intense and building a strong career path requires more than just academic knowledge. The &lt;A class="lia-external-url" href="https://mvp.microsoft.com/studentambassadors" target="_blank"&gt;Microsoft Student Ambassadors program&lt;/A&gt; became a turning point during this critical phase of my journey.&lt;/P&gt;
&lt;P&gt;Through continuous learning, hands-on experimentation, and knowledge sharing, I was able to build strong foundations in cloud computing and AI. The program encouraged me not only to consume knowledge but to apply it through real projects, workshops, and community initiatives. This practical exposure significantly strengthened my resume and helped me showcase measurable impact rather than just technical interest.&lt;/P&gt;
&lt;P&gt;One of the biggest advantages was access to free Microsoft certification exams. Earning these certifications validated my skills, increased my professional credibility, and made my profile stand out in a competitive job market. As a result, I secured a role as an Associate AI Engineer, marking a major milestone in my career.&lt;/P&gt;
&lt;P&gt;Beyond technical growth, the program boosted my confidence. Speaking at events, leading communities, mentoring peers, and collaborating with industry professionals shaped my communication and leadership abilities.&lt;/P&gt;
&lt;P&gt;Overall, the Student Ambassadors program equipped me with AI-ready technical skills, industry-recognized credentials, leadership experience, and the confidence to pursue my future career in AI with clarity and purpose.&lt;/P&gt;</description>
      <pubDate>Fri, 22 May 2026 05:22:47 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/student-developer-blog/meet-dileepa-bandara/ba-p/4522125</guid>
      <dc:creator>StudentDeveloperTeam</dc:creator>
      <dc:date>2026-05-22T05:22:47Z</dc:date>
    </item>
    <item>
      <title>Meet Azmat Ullah</title>
      <link>https://techcommunity.microsoft.com/t5/student-developer-blog/meet-azmat-ullah/ba-p/4522123</link>
      <description>&lt;P&gt;&lt;EM&gt;This is the next segment of our blog series highlighting Microsoft Student Ambassadors who achieved the Gold milestone, the highest level attainable, and have recently graduated from university. Each blog in the series features a different student and highlights their accomplishments, their experience with the Student Ambassador community, and what they are up to now.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;How has being part of the Microsoft Student Ambassadors community helped you connect with other tech enthusiasts and expand your global network? &lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Being part of the Microsoft Student Ambassadors community has transformed my journey from a local learner into a truly global collaborator. I began my ambassador experience in the APAC region in 2021, where I connected with passionate technologists across South Asia and Southeast Asia. In 2023, transitioning to the Americas region expanded my perspective even further, allowing me to bridge communities across continents.&lt;/P&gt;
&lt;P&gt;Through co-hosting events with ambassadors from India, Pakistan, Nepal, Sri Lanka, Malaysia, the Middle East, and Latin America, I experienced firsthand how technology dissolves borders. These collaborations didn’t just amplify impact; they built lasting relationships. I was also invited to speak at events hosted by Middle Eastern and Latin American ambassadors, giving me a platform to share knowledge while learning from diverse global voices.&lt;/P&gt;
&lt;P&gt;Microsoft Learn became a common language for this network, helping me continuously skill up and mentor learners worldwide. Over the past five years, these experiences have helped me build thousands of meaningful connections, including fellow ambassadors, Microsoft staff, and MVPs. More than a community, the Microsoft Student Ambassadors program became my gateway to a global ecosystem driven by learning, collaboration, and shared growth.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What impact did access to Microsoft tools and resources - like Azure credits, Visual Studio Enterprise, and Copilot - have on your journey as an AI-focused tech leader?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Access to Microsoft tools shaped how I grew into an AI-focused tech leader. Azure credits gave me the confidence to move from learning to building real systems. I was able to deploy AI models, host live demos, and test ideas at scale. As an AI Project Lead, this hands-on experience taught me how real-world AI solutions behave beyond theory.&lt;/P&gt;
&lt;P&gt;The monthly credits also helped me run practical workshops on Azure Databases and cloud fundamentals. During my time as a Microsoft Student Trainer, I created sandbox environments where learners could experiment without fear. Watching others gain confidence through these environments reminded me that leadership in AI is about enabling people, not just building technology.&lt;/P&gt;
&lt;P&gt;Visual Studio Enterprise and Copilot supported me every day. They helped me prototype faster, debug more clearly, and mentor more effectively. By removing barriers to experimentation, Microsoft tools allowed me to focus on impact. They helped me grow with empathy, technical depth, and a strong sense of responsibility toward the AI community I serve.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;In what ways has the Student Ambassadors program strengthened your resume, boosted your confidence, or helped you develop AI-ready skills for your future career?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The &lt;A class="lia-external-url" href="https://mvp.microsoft.com/studentambassadors" target="_blank"&gt;Microsoft Student Ambassadors program&lt;/A&gt; played a major role in shaping my confidence, skills, and future career. Microsoft Learn helped me start from scratch and build a strong foundation, eventually growing into a subject matter expert on topics like Databases, Cloud Computing &amp;amp; AI. The structured learning paths and hands-on practice made complex technologies approachable and practical.&lt;/P&gt;
&lt;P&gt;Support from Microsoft Cloud Advocates and the program staff helped me narrow my focus and refine my strengths. Skill-based training sessions challenged me to think deeper and build with intent. AI Cloud Skills Challenges and certifications strengthened my technical knowledge and added strong credibility to my resume.&lt;/P&gt;
&lt;P&gt;Hosting global workshops significantly improved my communication and leadership skills. Engaging with diverse audiences taught me how to explain technical concepts clearly while keeping equity, diversity, and inclusion at the center. It helped me grow as a thoughtful and inclusive leader.&lt;/P&gt;
&lt;P&gt;Throughout this journey, Copilot became a trusted learning companion. It helped me explore ideas faster, sharpen my thinking, and stay AI-ready. Together, these experiences transformed me into a confident, future-ready AI professional.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 22 May 2026 05:18:31 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/student-developer-blog/meet-azmat-ullah/ba-p/4522123</guid>
      <dc:creator>StudentDeveloperTeam</dc:creator>
      <dc:date>2026-05-22T05:18:31Z</dc:date>
    </item>
    <item>
      <title>Student Devs: Build AI Agents, Compete for $55K in Prizes</title>
      <link>https://techcommunity.microsoft.com/t5/educator-developer-blog/student-devs-build-ai-agents-compete-for-55k-in-prizes/ba-p/4521764</link>
      <description>&lt;H1&gt;Student Devs: Build AI Agents, Compete for $55K in Prizes&lt;/H1&gt;
&lt;P&gt;&lt;EM&gt;🎮 AI Skills Fest • June 4–14, 2026 • Free to Enter&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;$55K&lt;/STRONG&gt;&lt;BR /&gt;Prize Pool&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;3&lt;/STRONG&gt;&lt;BR /&gt;Challenge Tracks&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;10&lt;/STRONG&gt;&lt;BR /&gt;Days of Hacking&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Free&lt;/STRONG&gt;&lt;BR /&gt;To Enter&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;Whether you're a first-year CS student or a final-year senior with a portfolio full of projects, &lt;STRONG&gt;Agents League&lt;/STRONG&gt; is the best way to gain hands-on experience with agentic AI this summer and walk away with real skills employers are hiring for right now.&lt;/BLOCKQUOTE&gt;
&lt;HR /&gt;
&lt;H2&gt;What You'll Actually Learn&lt;/H2&gt;
&lt;P&gt;Forget passive tutorials. Agents League is project-based learning at full speed. By the end of the hackathon, you'll have built a working AI agent and gained practical experience with the tools shaping the future of software development.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;🤖 AI-Assisted Development&lt;/STRONG&gt;&lt;BR /&gt;Use GitHub Copilot to accelerate your coding workflow — from scaffolding to debugging — the way professional developers do today.&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;🧩 Multi-Step Reasoning&lt;/STRONG&gt;&lt;BR /&gt;Build agents with Microsoft Foundry that can plan, reason, and execute complex tasks — the core of agentic AI.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;🏢 Enterprise AI Patterns&lt;/STRONG&gt;&lt;BR /&gt;Learn to build production-ready agents that integrate with Microsoft 365 and Copilot Studio — skills that translate directly to industry jobs.&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;🔧 Prompt Engineering&lt;/STRONG&gt;&lt;BR /&gt;Design effective prompts and orchestration flows that make AI agents reliable and useful in the real world.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;📦 GitHub Workflows&lt;/STRONG&gt;&lt;BR /&gt;Submit your project through GitHub — practising version control, README writing, and open-source collaboration.&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;🎯 Competitive Problem-Solving&lt;/STRONG&gt;&lt;BR /&gt;Work under real constraints with deadlines, judging criteria, and peer competition — just like industry hackathons and sprints.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;HR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Pick Your Track (or Try All Three)&lt;/H2&gt;
&lt;P&gt;Agents League has three challenge tracks, each using different Microsoft AI tools. Choose based on your interests or stretch yourself by competing in multiple tracks.&lt;/P&gt;
&lt;H3&gt;Track 01. Creative Apps&lt;/H3&gt;
&lt;P&gt;Build an innovative application with AI-assisted development. This track rewards creativity: dream big and let GitHub Copilot help you bring ideas to life faster than ever.&lt;BR /&gt;&lt;STRONG&gt;Tool:&lt;/STRONG&gt; GitHub Copilot&lt;/P&gt;
&lt;H3&gt;Track 02. Reasoning Agents&lt;/H3&gt;
&lt;P&gt;Create intelligent agents that solve complex problems through multi-step reasoning. Think: agents that can research, plan, and act. This is the cutting edge of AI.&lt;BR /&gt;&lt;STRONG&gt;Tool:&lt;/STRONG&gt; Microsoft Foundry&lt;/P&gt;
&lt;H3&gt;Track 03. Enterprise Agents&lt;/H3&gt;
&lt;P&gt;Build knowledge agents that integrate with Microsoft 365 Copilot. Learn how businesses are deploying AI today and add enterprise AI to your skillset.&lt;BR /&gt;&lt;STRONG&gt;Tool:&lt;/STRONG&gt; Copilot Studio • M365&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Opportunities You Won't Want to Miss&lt;/H2&gt;
&lt;P&gt;Agents League isn't just a competition; it's a launchpad. Here's what's in it for you beyond the code:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;💰 Win from a $55,000 USD Prize Pool&lt;/STRONG&gt;&lt;BR /&gt;Prizes are awarded across all three tracks, so smaller teams and solo hackers have a real shot.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;📺 Watch Live Coding Battles at Microsoft Reactor&lt;/STRONG&gt;&lt;BR /&gt;See industry experts go head-to-head building AI agents live. Learn advanced techniques you can apply immediately to your own project.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;🎓 Free Learning Resources on Microsoft Learn&lt;/STRONG&gt;&lt;BR /&gt;Access curated learning paths and the AI Skills Navigator, structured content designed to get you from zero to submission-ready.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;🌍 Join a Global Developer Community&lt;/STRONG&gt;&lt;BR /&gt;Connect with thousands of developers on the Agents League Discord. Find teammates, ask questions, and build your professional network.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;📂 Build Your Portfolio with a Real Project&lt;/STRONG&gt;&lt;BR /&gt;Every submission lives on GitHub. Walk away with a polished, public project that demonstrates your AI skills to future employers and grad schools.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;🏆 Gain Recognition from Microsoft and the Community&lt;/STRONG&gt;&lt;BR /&gt;Top projects get visibility across the Microsoft developer ecosystem. Stand out from the crowd in internship and job applications.&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H2&gt;Key Dates to Remember&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Event&lt;/th&gt;&lt;th&gt;Date&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Hacking Period Opens&lt;/td&gt;&lt;td&gt;June 4, 2026&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Registration Deadline&lt;/td&gt;&lt;td&gt;June 12, 2026 — 12:00 PM PT&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Submission Deadline&lt;/td&gt;&lt;td&gt;June 14, 2026 — 11:59 PM PT&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;HR /&gt;
&lt;H2&gt;How to Get Started (Right Now)&lt;/H2&gt;
&lt;P&gt;You don't have to wait until June 4th to start preparing. Here's your pre-hackathon game plan:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Register for the hackathon&lt;/STRONG&gt;: it's free and open to everyone.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pick a track&lt;/STRONG&gt; that matches your interests or curiosity.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Explore the learning resources&lt;/STRONG&gt; on Microsoft Learn and the AI Skills Navigator.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Join the Discord community&lt;/STRONG&gt; to find teammates and get early tips.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Watch the Reactor event series&lt;/STRONG&gt; for live coding battles and expert walkthroughs.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Set up your GitHub repo&lt;/STRONG&gt; and start experimenting before the hacking window opens.&lt;/LI&gt;
&lt;/OL&gt;
&lt;HR /&gt;
&lt;H2&gt;Helpful Links&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://aka.ms/agents-league" target="_blank" rel="noopener"&gt;Register for Agents League&lt;/A&gt;&amp;nbsp; Free entry, sign up now&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://developer.microsoft.com/en-us/reactor/" target="_blank" rel="noopener"&gt;Microsoft Reactor Events&lt;/A&gt;&amp;nbsp; Live coding battles &amp;amp; workshops&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://aiskillsfest.com" target="_blank" rel="noopener"&gt;AI Skills Fest&lt;/A&gt; The broader event&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com" target="_blank" rel="noopener"&gt;Microsoft Learn&lt;/A&gt; Free learning paths&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H2&gt;The Arena Awaits 🏆&lt;/H2&gt;
&lt;P&gt;Ten days. Three tracks. $55K in prizes. Whether you go solo or squad up, this is your chance to build something real with AI and have a blast doing it.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://aka.ms/agents-league" target="_blank" rel="noopener"&gt;Register Now It's Free&lt;/A&gt;&lt;/STRONG&gt; &amp;nbsp;|&amp;nbsp; &lt;A href="https://developer.microsoft.com/en-us/reactor/" target="_blank" rel="noopener"&gt;Watch Reactor Events&lt;/A&gt;&lt;/P&gt;
&lt;HR /&gt;
&lt;P&gt;&lt;SMALL&gt; Agents League is part of &lt;A href="https://aiskillsfest.com" target="_blank" rel="noopener"&gt;AI Skills Fest&lt;/A&gt; and is open to the public at no cost.&lt;BR /&gt;Review the &lt;A href="https://aka.ms/agents-league" target="_blank" rel="noopener"&gt;Hackathon Rules and Regulations&lt;/A&gt; and the &lt;A href="https://developer.microsoft.com/en-us/reactor/" target="_blank" rel="noopener"&gt;Microsoft Event Code of Conduct&lt;/A&gt; before participating. &lt;/SMALL&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 May 2026 07:23:46 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/educator-developer-blog/student-devs-build-ai-agents-compete-for-55k-in-prizes/ba-p/4521764</guid>
      <dc:creator>Lee_Stott</dc:creator>
      <dc:date>2026-05-21T07:23:46Z</dc:date>
    </item>
  </channel>
</rss>

