<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>rss.livelink.threads-in-node</title>
    <link>https://techcommunity.microsoft.com/t5/microsoft-foundry/ct-p/azure-ai-foundry</link>
    <description>rss.livelink.threads-in-node</description>
    <pubDate>Sat, 20 Jun 2026 06:10:01 GMT</pubDate>
    <dc:creator>azure-ai-foundry</dc:creator>
    <dc:date>2026-06-20T06:10:01Z</dc:date>
    <item>
      <title>Cross-Region Model Connectivity Options in Microsoft Foundry: Supported Patterns and Tradeoffs</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/cross-region-model-connectivity-options-in-microsoft-foundry/ba-p/4528620</link>
      <description>&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Model availability in Microsoft Foundry is region-dependent. The region approved for your project may not be the one where the model or Foundry Agent Service support you need is available. That creates a common cross-region design choice: connect directly to another Foundry resource, or add a gateway layer for more control and governance.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This article complements the official Bring Your Own Model (BYOM) guidance by focusing specifically on cross-region scenarios with Foundry resources, including practical architecture tradeoffs and enterprise deployment patterns.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Executive summary&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&lt;SPAN data-contrast="auto"&gt;Foundry supports more than one way to access models across regions. The simplest option is to add another Foundry resource as a connection and use the models deployed in that resource. An &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Azure API Management (APIM)&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; connection adds a customer-controlled gateway for routing, policy, observability, and network isolation. This article explains the tradeoffs, then walks through an APIM-based implementation in a VNet-secured topology, &lt;SPAN data-olk-copy-source="MessageBody"&gt;including how to connect Foundry agents (agents built with the Foundry Agent Service) to models in other regions&lt;/SPAN&gt;.&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Architecture options &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;at a glance&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&lt;SPAN data-contrast="auto"&gt;Before going deeper on the APIM implementation, it helps to visualize the main options. The diagrams below illustrate each one; the VNet-secured variant is covered in more detail later in this article.&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Direct &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Connection &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;(&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;N&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;o gateway)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Use when the goal is to reach models in another Foundry resource with the fewest moving parts.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Azure API Management AI Gateway&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Use when the platform team wants a governed model surface and a customer-owned control plane in front of backend model traffic.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;VNet&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;-secured APIM variant&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In this variant, APIM ingress is itself fronted by a private endpoint in the project VNet's &lt;EM&gt;pe-subnet&lt;/EM&gt;, resolved through the &lt;EM&gt;privatelink.azure-api.net&lt;/EM&gt; private DNS zone. Combined with the backend Foundry account's private endpoint, the entire request path (caller → APIM → backend) stays on the Azure backbone with no public endpoint in the path: every hostname resolves to a private IP address.&lt;/SPAN&gt;&lt;/P&gt;
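&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;One way to confirm that this variant never routes traffic through a public endpoint is to resolve each hop's hostname from inside the project VNet and check that every address is private. The minimal sketch below assumes you have already collected those results, for example with nslookup from a VM in the VNet. The hostnames and IP addresses are hypothetical placeholders, and only Python's standard &lt;EM&gt;ipaddress&lt;/EM&gt; module is used.&lt;/SPAN&gt;&lt;/P&gt;

```python
import ipaddress

# Hypothetical hop -> resolved address map, collected from inside the project
# VNet. Hostnames and IPs are placeholders, not values from the article.
RESOLVED = {
    "contoso-apim.azure-api.net": "10.0.1.4",
    "contoso-backend.cognitiveservices.azure.com": "10.1.1.5",
}


def public_hops(resolved: dict) -> list:
    """Return hostnames whose resolved address is not in a private range."""
    return [
        host
        for host, ip in resolved.items()
        if not ipaddress.ip_address(ip).is_private
    ]


if __name__ == "__main__":
    leaks = public_hops(RESOLVED)
    print("all hops resolve privately" if not leaks else f"public hops: {leaks}")
```

&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;If any hop comes back public, the usual suspects are a missing private DNS zone link to the VNet or a client that resolves through a public DNS server.&lt;/SPAN&gt;&lt;/P&gt;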
&lt;H2&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Choosing the right cross-region model pattern&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The table below compares the options and their tradeoffs.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Pattern&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Best fit&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Control plane and auth model&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Tradeoffs&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Direct connection to another Foundry resource&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Best when a team needs the simplest supported path to models in another Foundry resource and does not need an additional gateway tier.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The model connection points directly to the backend Foundry resource endpoint. Authentication can use an API key or managed identity, and model exposure is controlled by the connection configuration.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Lowest architectural overhead, but less room for a platform-owned policy, routing, throttling, or observability layer on every model call.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Azure API Management model connection&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Best when the platform team wants a stable gateway surface in front of model traffic and expects to need routing, throttling, centralized policy, or richer telemetry.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The model connection targets an Azure API Management endpoint that fronts the backend Foundry resource. Azure API Management can normalize auth, apply policies, route across backends, and enforce governance before forwarding the call.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Adds another service hop and requires Azure API Management design, subnet planning, and operations ownership.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Azure API Management in front of the agent surface&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Best when the platform team wants governance and telemetry on agent ingress: throttling, rate-limit-by-key, request/response logging, schema validation, and regional egress policy.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;APIM fronts inbound traffic to the agent endpoint, but Foundry continues to validate the caller's token directly, so MCP On-Behalf-Of (OBO), customer-data tool authorization, and per-user audit trails still work. The underlying model calls stay inside Foundry-managed surfaces.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;APIM cannot take over identity on the agent surface today, because Foundry needs to see the original caller (at least for MCP OBO). Treat APIM here as a governance and observability layer, not an authentication layer.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
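&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The third pattern keeps identity in Foundry and uses APIM only for governance. The sketch below models that split in plain Python. It uses a per-key token bucket as the throttling rule, which is similar in spirit to APIM's &lt;EM&gt;rate-limit-by-key&lt;/EM&gt; policy but is not its implementation. It also assumes the &lt;EM&gt;Ocp-Apim-Subscription-Key&lt;/EM&gt; header is the throttling key, and the class names are hypothetical. The detail that matters is that the caller's &lt;EM&gt;Authorization&lt;/EM&gt; header is forwarded unchanged, so Foundry can still validate the user token itself.&lt;/SPAN&gt;&lt;/P&gt;

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenBucket:
    """Per-key token bucket: refills `rate` tokens per second, up to `capacity`."""
    rate: float
    capacity: float
    tokens: float
    updated: float

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last request, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AgentIngressGovernor:
    """Hypothetical governance-only front door for the agent endpoint."""

    def __init__(self, rate: float, capacity: float,
                 key_header: str = "Ocp-Apim-Subscription-Key") -> None:
        self.rate = rate
        self.capacity = capacity
        self.key_header = key_header
        self.buckets: Dict[str, TokenBucket] = {}

    def handle(self, headers: Dict[str, str], now: float) -> Optional[Dict[str, str]]:
        """Return headers to forward, or None if throttled (APIM would answer 429)."""
        key = headers.get(self.key_header, "anonymous")
        bucket = self.buckets.get(key)
        if bucket is None:
            # The first request for a key starts with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[key] = bucket
        if not bucket.allow(now):
            return None
        # Identity is untouched: Authorization passes through so Foundry
        # validates the caller's own token (needed for MCP OBO).
        return dict(headers)
```

&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Throttling decisions live entirely at the gateway, while authentication stays with Foundry. That is the division of responsibility the table recommends for this pattern.&lt;/SPAN&gt;&lt;/P&gt;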
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In practice, teams have two supported ways to reach models across regions. Direct connections are the simplest starting point. Azure API Management is useful when the platform team wants a governed model surface with centralized routing, policy, or observability. The right choice depends on ownership, control needs, and operational overhead.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;What this architecture proves: agents, models, and tracing still work&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This architecture answers a key question: does the experience still hold together when models are reached cross-region through a gateway? In this setup, the answer is yes. Both Foundry agents (state-bearing agent runs in the project Playground) and prompt agents (Responses-API invocations through the same BYO connection) execute end-to-end against models reached through APIM.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Operational workflows also remain intact. APIM tracing shows the rewritten backend URL and managed-identity token path for each model call. Per-request values such as &lt;/SPAN&gt;&lt;EM&gt;&lt;SPAN data-contrast="auto"&gt;apim-request-id&lt;/SPAN&gt;&lt;/EM&gt;&lt;SPAN data-contrast="auto"&gt; and &lt;/SPAN&gt;&lt;EM&gt;&lt;SPAN data-contrast="auto"&gt;apim-trace-id&lt;/SPAN&gt;&lt;/EM&gt;&lt;SPAN data-contrast="auto"&gt; continue to flow through the standard APIM telemetry surfaces that platform teams already use. The routing policy is reusable across model deployments because the &lt;/SPAN&gt;&lt;EM&gt;&lt;SPAN data-contrast="auto"&gt;/inference/deployments/{deploymentName}/chat/completions&lt;/SPAN&gt;&lt;/EM&gt;&lt;SPAN data-contrast="auto"&gt; path is parameterized. APIM does not need a separate API or policy for every backend model that follows the same inference route. To expose a new model end to end, the model still needs to be deployed on the backend Foundry account and registered in the project-side AI Gateway or connected-model configuration so agents can select it.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
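&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;A parameterized route means one operation template can serve any deployment name. The sketch below shows the equivalent rewrite logic in Python: it extracts &lt;EM&gt;{deploymentName}&lt;/EM&gt; from the incoming path and builds the backend URL. The backend host, the Azure OpenAI-style backend path, and the &lt;EM&gt;api-version&lt;/EM&gt; value are assumptions for illustration. Check what your own APIM policy sets with &lt;EM&gt;set-backend-service&lt;/EM&gt; and &lt;EM&gt;rewrite-uri&lt;/EM&gt;.&lt;/SPAN&gt;&lt;/P&gt;

```python
import re

# Parameterized operation path from the article; group 1 is {deploymentName}.
ROUTE = re.compile(r"^/inference/deployments/([A-Za-z0-9._-]+)/chat/completions$")

# Hypothetical private backend; resolves to the backend private endpoint.
BACKEND = "https://contoso-backend.cognitiveservices.azure.com"


def rewrite(path: str, api_version: str) -> str:
    """Map the gateway path to a backend URL for the same deployment."""
    match = ROUTE.match(path)
    if match is None:
        raise ValueError(f"unsupported path: {path}")
    deployment = match.group(1)
    # Assumed Azure OpenAI-style backend path; adjust to your backend.
    return f"{BACKEND}/openai/deployments/{deployment}/chat/completions?api-version={api_version}"


if __name__ == "__main__":
    # A new deployment (including a fine-tuned one) needs no new route.
    for name in ("gpt-4o", "gpt-4o-mini-ft"):
        print(rewrite(f"/inference/deployments/{name}/chat/completions", "2024-10-21"))
```

&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Any path outside the template is rejected. This keeps the gateway surface narrow while still letting new deployments flow through without policy edits.&lt;/SPAN&gt;&lt;/P&gt;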
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This URL-parameterized pattern works as written for the chat-completions surface, where the deployment name is part of the URL. The Responses API carries the model name in the JSON payload ("model": "&amp;lt;deployment&amp;gt;"), so a single APIM route cannot fan out by URL parameter alone. APIM would need a body-inspection policy that reads the model field from the request JSON and rewrites the backend on every call. The cleaner answer for that surface is Foundry's dynamic model connection, which resolves model → backend on the project side at invocation time.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
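&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;To see why the Responses API needs more than a URL template, consider what a body-inspection policy would have to do on every call: buffer and parse the JSON payload, read the model field, and look up a backend. The Python sketch below mirrors that logic. The model-to-backend map and hostnames are hypothetical, and in APIM the same steps would be written as policy expressions rather than Python.&lt;/SPAN&gt;&lt;/P&gt;

```python
import json

# Hypothetical model -> backend map that a body-inspection policy would maintain.
BACKENDS = {
    "gpt-4o": "https://contoso-eastus2.cognitiveservices.azure.com",
    "o3-mini": "https://contoso-swedencentral.cognitiveservices.azure.com",
}


def pick_backend(body: bytes) -> str:
    """Return the backend base URL for the model named in a Responses API body."""
    payload = json.loads(body)  # the whole request body must be buffered and parsed
    model = payload.get("model")
    if model not in BACKENDS:
        raise LookupError(f"no backend registered for model {model!r}")
    return BACKENDS[model]
```

&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Keeping this map in sync with backend deployments, and parsing every payload to use it, is the per-call work that the dynamic model connection moves off the gateway.&lt;/SPAN&gt;&lt;/P&gt;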
&lt;H3&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Dynamic connection&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:180,&amp;quot;335559739&amp;quot;:90}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:180,&amp;quot;335559739&amp;quot;:90}"&gt;&lt;SPAN data-contrast="auto"&gt;For payload-based routing (the Responses API, or any future surface that carries the model name in the request body), Foundry's dynamic model connection resolves model → backend on the project side at invocation time. The advantage is that APIM never has to inspect the request body, so the AI Gateway can stay payload-agnostic. The tradeoff is that the routing decision moves back into the project, so a single observable APIM URL no longer covers every model call. Teams that want centralized APIM control should stay with the parameterized chat-completions template. Teams adopting the Responses API at scale should pair APIM (for backend access and managed identity) with the dynamic connection (for per-call dispatch).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:180,&amp;quot;335559739&amp;quot;:90}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:120}"&gt;&lt;SPAN data-contrast="none"&gt;BYOM feature compatibility at the time of publication&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This table summarizes how current Foundry features interact with BYOM via APIM at the time this post is published. The experience continues to evolve, and teams across Foundry and related platform areas are actively improving coverage and integration over time.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Foundry feature&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Status with BYOM via APIM&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Notes&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Chat completions / Responses API&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;✅&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Works&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Both surfaces invoke the BYO connection like any HTTP backend; APIM forwards to the backend Foundry account and the response is returned unchanged.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Foundry agents (Assistants / Threads / Runs)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;✅&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Works&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Agent state lives on the project side; only the model call traverses APIM. The agent runtime, tool calls, and conversation memory remain managed by Foundry.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Prompt agents (Responses API)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;✅&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Works&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Prompt-agent invocations go through the BYO connection on every call. No project-side state is required beyond the prompt definition.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;First-party (1P) on-behalf-of tools (SharePoint Grounding, Fabric Data Agent, etc.)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;❌&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Not supported&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Foundry rejects these calls with bad_request, because forwarding the first-party token's tenant context to a non-Microsoft-managed model endpoint would carry it outside the Microsoft trust boundary. Use a directly connected Foundry deployment for those tools, or split the work across two agents (one for 1P-tool calls, one for BYOM calls).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Evaluations (offline, on a dataset)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;⚠️&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Partial&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Eval runs that call the model via the BYO connection execute fine. Built-in eval quality metrics that rely on Foundry-side model introspection (token-level scoring, log-probs, internal model metadata) may degrade because Foundry only sees the request/response surface that APIM exposes.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Continuous evaluations (production)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;⚠️&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Partial&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Same caveat as offline evals. Sampling/scoring works on the request-response pair; deeper telemetry that needs first-party model metadata is reduced.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Red teaming / adversarial testing&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;✅&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Works&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Red-team probes are payload-level — they invoke the BYO connection like any other caller and score the response. No first-party hook into the model deployment is required.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Foundry IQ (model evaluation suggestions)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;⚠️&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Reduced fidelity&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;IQ recommendations rely on Foundry's view of model usage patterns. BYO traffic shows up as opaque connection-level usage, so recommendations are coarser than for directly connected deployments.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Content Safety in Foundry Control Plane &lt;SPAN data-contrast="auto"&gt;(input/output filters)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;⚠️&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Configure on APIM&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Foundry-side content filters do not apply to BYO connections automatically. Either attach the Content Safety policy at APIM (recommended, because every BYO caller is then scanned in one place) or invoke Content Safety as an explicit pre/post step in the agent.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Prompt Shields (jailbreak / indirect injection)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;⚠️&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Configure on APIM&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Same approach as Content Safety in Foundry Control Plane: attach the Prompt Shields policy at APIM so every request is scanned before it reaches the model, regardless of which agent or caller originated it.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Fine-tuning&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;❌&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Not applicable to the connection&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Fine-tuning targets a specific deployment on a Foundry account, not a connection. Fine-tune on the backend Foundry account directly; once the new deployment exists, it becomes routable through the same parameterized /inference/deployments/{deploymentName}/chat/completions template with no APIM change.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Batch API&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;⚠️&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; Direct only&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Batch jobs use long-running deployment-scoped APIs that don't fit the synchronous APIM passthrough model. Submit batch jobs directly against the backend Foundry account; APIM is bypassed for those calls.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
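&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The Content Safety and Prompt Shields rows also describe an agent-side alternative, in which the check wraps the model call as an explicit pre-step and post-step. The sketch below assumes two hypothetical callables. &lt;EM&gt;analyze&lt;/EM&gt; stands in for whatever Content Safety or Prompt Shields client you use and returns True when text should be blocked. &lt;EM&gt;call_model&lt;/EM&gt; stands in for the call through the BYO connection. It shows the control flow only and does not use a real SDK.&lt;/SPAN&gt;&lt;/P&gt;

```python
from typing import Callable

# Hypothetical hooks: `analyze` stands in for a Content Safety / Prompt Shields
# check (True means block); `call_model` stands in for the BYO connection call.
Analyzer = Callable[[str], bool]
ModelCall = Callable[[str], str]


class BlockedError(Exception):
    """Raised when the safety check rejects a prompt or a completion."""


def guarded_call(prompt: str, call_model: ModelCall, analyze: Analyzer) -> str:
    # Pre-step: scan the input before the model ever sees it.
    if analyze(prompt):
        raise BlockedError("prompt blocked")
    completion = call_model(prompt)
    # Post-step: scan the output before returning it to the caller.
    if analyze(completion):
        raise BlockedError("completion blocked")
    return completion
```

&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Because every agent must remember to use a wrapper like this, the table recommends the APIM policy instead, where every BYO caller is covered in one place.&lt;/SPAN&gt;&lt;/P&gt;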
&lt;H2&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Sample implementation topology for APIM in a VNet and core components&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This implementation uses two Foundry resources with distinct roles. The project-side resource hosts the agent-facing project in the approved region and remains VNet-integrated. The backend resource hosts model deployments in the capacity region and is exposed only through a private endpoint. APIM sits between them as the customer-owned gateway, with private DNS resolving backend FQDNs to private IPs.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/microsoft-foundry/foundry-samples/tree/main/infrastructure/infrastructure-setup-bicep/16-private-network-standard-agent-apim-setup" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Template 16&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; provides the baseline private-network and APIM setup for Foundry. This extension builds on that template by adding one more subnet for the backend Foundry model resource, so APIM can reach the backend privately in a cross-region setup.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
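&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Adding the backend subnet is mostly an address-planning exercise. Whether it lands in the same VNet or in a peered one, its range must not overlap the ranges already in use. The sketch below checks a candidate CIDR against an existing plan using Python's standard &lt;EM&gt;ipaddress&lt;/EM&gt; module. Apart from &lt;EM&gt;pe-subnet&lt;/EM&gt;, the subnet names are hypothetical, and all CIDRs are placeholders rather than values taken from Template 16.&lt;/SPAN&gt;&lt;/P&gt;

```python
import ipaddress

# Hypothetical address plan for the existing subnets. CIDRs are placeholders,
# not the values used by Template 16; substitute your own deployment's plan.
EXISTING = {
    "agent-subnet": "192.168.0.0/24",
    "pe-subnet": "192.168.1.0/24",
    "apim-subnet": "192.168.2.0/24",
}


def conflicts(new_cidr: str, existing: dict) -> list:
    """Return the names of existing subnets that overlap the candidate range."""
    new = ipaddress.ip_network(new_cidr)
    return [
        name
        for name, cidr in existing.items()
        if new.overlaps(ipaddress.ip_network(cidr))
    ]


if __name__ == "__main__":
    for candidate in ("192.168.3.0/24", "192.168.1.128/25"):
        print(candidate, conflicts(candidate, EXISTING) or "no overlap")
```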
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The request flow is straightforward: the agent runs in the project-side Foundry resource, the project calls APIM through the AI Gateway connection, and APIM forwards the request privately to the backend Foundry account that hosts the model deployment. The table below maps each component in that path to its role in the overall design.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Component&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Resource&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Notes&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Project resource&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&amp;lt;foundry-account&amp;gt; / &amp;lt;project-name&amp;gt; (project region, e.g., canadaeast)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Hosts the agent-facing project in the approved region. The project remains VNet-integrated and uses a system managed identity.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;APIM AI Gateway&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;&amp;lt;apim-name&amp;gt; (StandardV2)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Serves as the customer-owned gateway between the project and the backend model resource. Its system-assigned managed identity is granted Cognitive Services User on the backend account so it can forward model calls without API keys.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Backend account&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;&amp;lt;backend-account&amp;gt; (backend region, e.g., japaneast)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Hosts the model deployments in the capacity region. After setup, public network access is disabled so model traffic reaches the account only through the private path.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Private endpoint&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&amp;lt;backend-account&amp;gt;-pe in the backend-pe subnet&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Provides a private, cross-region path from the project VNet to the backend Foundry account. Private DNS resolves the backend hostname to the private endpoint so APIM can reach it without using a public route.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;AI Gateway connection&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&amp;lt;connection-name&amp;gt; on the project&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The project uses this connection to send model calls to APIM by using managed identity authentication and a defined list of available models.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
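&lt;P&gt;The private DNS behavior described in the table can be spot-checked from a machine inside the project VNet. The following Python sketch is a hypothetical helper, not taken from the companion repo. It assumes the standard Azure private endpoint pattern, in which a public hostname such as account.cognitiveservices.azure.com is a CNAME to account.privatelink.cognitiveservices.azure.com, and the A record for that name lives in a private DNS zone linked to the VNet. The account name contoso-backend-je and the three DNS suffixes are assumptions to confirm against your own deployment.&lt;/P&gt;
&lt;PRE&gt;
```python
import ipaddress
import socket

# Assumption: the DNS suffixes commonly used by Foundry (AIServices) private
# endpoints. Confirm them against the private DNS zones linked to your VNet.
FOUNDRY_DNS_SUFFIXES = (
    "cognitiveservices.azure.com",
    "openai.azure.com",
    "services.ai.azure.com",
)


def privatelink_records(account_name):
    """Return (public FQDN, privatelink FQDN, private DNS zone) for each suffix."""
    records = []
    for suffix in FOUNDRY_DNS_SUFFIXES:
        zone = f"privatelink.{suffix}"
        records.append((f"{account_name}.{suffix}", f"{account_name}.{zone}", zone))
    return records


def all_private(addresses):
    """True only if there is at least one address and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


def resolves_privately(hostname, port=443):
    """Resolve hostname with the local DNS configuration and check every result."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return all_private({info[4][0] for info in infos})


if __name__ == "__main__":
    # "contoso-backend-je" is a hypothetical backend account name.
    for public_fqdn, private_fqdn, zone in privatelink_records("contoso-backend-je"):
        print(f"{public_fqdn} should CNAME to {private_fqdn} (A record in {zone})")
    # From a VM inside the project VNet, for example:
    # resolves_privately("contoso-backend-je.cognitiveservices.azure.com")
```
&lt;/PRE&gt;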
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Together, these components make the pattern practical for production use. Standard service names resolve privately, public network access is disabled on the backend Foundry account, model calls authenticate through APIM by using managed identity, and every model is reachable through a single observable gateway URL.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:120}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;What is &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;actually inside&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt; the &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;VNet&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt; today&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Today, the project VNet (&amp;lt;vnet-name&amp;gt;) contains four subnets, each with a specific role in the design:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;agent-subnet: delegated to Microsoft.App/environments for the Foundry agent runtime.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:0}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;pe-subnet: private endpoints for the project Foundry account, Storage, Cosmos DB, and AI Search.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:0}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;apim-outbound: outbound VNet integration for APIM Standard v2, with defaultOutboundAccess: false.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:0}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="4" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;backend-pe: the cross-region private endpoint to the backend Foundry account, with defaultOutboundAccess: false.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:0}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The Foundry project, the APIM outbound integration, and the cross-region private endpoint to the backend account all sit inside the same VNet. The backend Foundry account itself remains a regional managed resource in the backend region, such as japaneast, but with public network access disabled, the cross-region private endpoint becomes the only route to it. The apim-outbound and backend-pe subnets also use defaultOutboundAccess: false, which means neither subnet has implicit public internet egress.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Hop&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Auth artefact&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Identity used&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Configured by&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Agent → Project&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;None — in-process&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;n/a&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Foundry SDK&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Project → APIM&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;AAD bearer token&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Foundry Project managed identity&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;validate-azure-ad-token policy needs the project MI's app ID (projectMiClientId template param)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;APIM → Backend&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Fresh AAD bearer token&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;APIM managed identity&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;authentication-managed-identity policy + Cognitive Services User role (Azure RBAC) on the backend account for &lt;SPAN data-olk-copy-source="MessageBody"&gt;Azure AI Services acces&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Network path&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Private throughout&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;n/a&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Backend publicNetworkAccess: Disabled; resolution via privatelink DNS zones; data path via the cross-region PE on the Microsoft backbone&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:278}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;No API keys are stored anywhere. No APIM subscription key is required (subscriptionRequired: false on the inference API). Token rotation is handled automatically by both managed identities.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;For the full hop-by-hop trace and the source-of-truth auth-chain breakdown, see the extension README ("How the pieces are wired together" section).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Consuming the connection from an agent&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:180,&amp;quot;335559739&amp;quot;:90}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Once the connection exists, every Foundry surface that accepts a deployment name — Foundry agents, prompt agents, raw chat-completions and Responses calls from the SDK, the portal Playground — accepts the alias &amp;lt;connection-name&amp;gt;/&amp;lt;deployment-name&amp;gt; in the same slot:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock" data-ccp-parastyle-defn="{&amp;quot;ObjectId&amp;quot;:&amp;quot;73c1f9d8-8b0b-56b7-b4e6-b3e744084a67|1&amp;quot;,&amp;quot;ClassId&amp;quot;:1073872969,&amp;quot;Properties&amp;quot;:[469777841,&amp;quot;Times New Roman&amp;quot;,469777843,&amp;quot;Aptos&amp;quot;,469777844,&amp;quot;Times New Roman&amp;quot;,469769226,&amp;quot;Times New Roman,Aptos&amp;quot;,268442635,&amp;quot;24&amp;quot;,201342446,&amp;quot;1&amp;quot;,201342447,&amp;quot;5&amp;quot;,201342448,&amp;quot;1&amp;quot;,201342449,&amp;quot;1&amp;quot;,469777842,&amp;quot;Times New Roman&amp;quot;,201341986,&amp;quot;1&amp;quot;,469775450,&amp;quot;msocodeblock&amp;quot;,201340122,&amp;quot;2&amp;quot;,134233614,&amp;quot;true&amp;quot;,469778129,&amp;quot;msocodeblock&amp;quot;,335572020,&amp;quot;1&amp;quot;,335559740,&amp;quot;240&amp;quot;,201341983,&amp;quot;0&amp;quot;,134233118,&amp;quot;true&amp;quot;,134233117,&amp;quot;true&amp;quot;,469778324,&amp;quot;Normal&amp;quot;]}"&gt;from &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;azure.ai.projects&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt; import AIProjectClient&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;from &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;azure.identity&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt; import DefaultAzureCredential&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;project = &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;AIProjectClient&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;(&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
endpoint="https://&amp;lt;foundry-account&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;&amp;gt;.services.ai.azure.com&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;/&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;api&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;/projects/&amp;lt;project-name&amp;gt;",&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; credential=&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;DefaultAzureCredential&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;(),&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;agent = &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;project.agents.create_agent&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;(&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model="ai-gateway/gpt-5&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;",&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # &amp;lt;connection-name&amp;gt;/&amp;lt;deployment-name&amp;gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; name="cross-region-agent",&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; instructions="You are a helpful assistant.",&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN 
data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt; &lt;BR /&gt;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;H2&gt;What's Next&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;If you want to keep going, these are the next resources and actions that matter most.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="6" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/azure/foundry/agents/how-to/ai-gateway" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Bring your own model with the AI Gateway&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; — Canonical BYOM pattern for Foundry Agent Service.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/azure/foundry/configuration/enable-ai-api-management-gateway-portal" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Configure AI Gateway in your Foundry resources&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; — Enable Azure API Management as the gateway in front of Foundry from the portal.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;AI gateway capabilities in Azure API Management&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; — The APIM side of the story: policies, throttling, observability, and token limits for AI traffic.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/azure/api-management/azure-ai-foundry-api" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Import a Microsoft Foundry API &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;— Bring a Foundry endpoint into APIM as a managed API.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/azure/foundry/" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Microsoft Foundry documentation&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; — Explore the full platform.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Deployment walkthrough&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The full implementation lives in the companion repo: &lt;A href="https://github.com/microsoft-foundry/foundry-samples/tree/main/infrastructure/infrastructure-setup-bicep/16-private-network-standard-agent-apim-setup/extensions/byom-cross-region" target="_blank" rel="noopener"&gt;foundry-samples/infrastructure/infrastructure-setup-bicep/16-private-network-standard-agent-apim-setup/extensions/byom-cross-region at main · microsoft-foundry/foundry-samples&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;It contains the phased Bicep modules, a PowerShell orchestrator, a preflight checker, and a sample configuration. The condensed sequence below is enough to stand up the pattern and verify the gateway path works end to end.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-contrast="none"&gt;Prerequisites&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Azure CLI 2.67+ and PowerShell 7+&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;An Azure subscription with Contributor on the target resource group&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Az PowerShell module (Install-Module Az -Scope CurrentUser)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="4" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Owner, or Contributor plus User Access Administrator, on the target resource group. The template creates role assignments, which Contributor alone cannot do.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="5" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Foundry Account Owner to create the Foundry account.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="6" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Quota for gpt-4o/gpt-5/gpt-5.1 in the backend region, and gpt-4o in the project region.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="7" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Quota for APIM StandardV2 in the project region.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="8" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;The Foundry project's managed-identity client ID (projectMiClientId). Look this up after creating the project; APIM needs it to validate inbound tokens.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;Deploy&lt;/SPAN&gt;&lt;/H3&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;git clone https://github.com/azure-ai-foundry/foundry-samples.git&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;cd foundry-samples/samples/microsoft/infrastructure-setup/16-private-network-standard-agent-apim-setup/extensions/byom-cross-region&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;az group create --name &amp;lt;rg&amp;gt; --location &amp;lt;project-region&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;az deployment group create \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;  --resource-group &amp;lt;rg&amp;gt; \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;  --template-file main.bicep \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;  --parameters @samples/parameters-cross-region.json \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;  --parameters projectMiClientId=&amp;lt;paste-client-id&amp;gt;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;After deployment, the gateway URL is available in the deployment output outputs.apimGatewayUrl. The connected models appear in the Foundry portal under Connected resources as &amp;lt;connection-name&amp;gt;/&amp;lt;deployment-name&amp;gt; (for example, ai-gateway/gpt-5).&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Smoke test&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The /inference API is configured with subscriptionRequired: false because authentication uses managed identity end to end, so callers do not need an APIM subscription key. To verify that the gateway is reachable from your own session, run the following PowerShell:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock" data-ccp-parastyle-defn="{&amp;quot;ObjectId&amp;quot;:&amp;quot;73c1f9d8-8b0b-56b7-b4e6-b3e744084a67|1&amp;quot;,&amp;quot;ClassId&amp;quot;:1073872969,&amp;quot;Properties&amp;quot;:[469777841,&amp;quot;Times New Roman&amp;quot;,469777843,&amp;quot;Aptos&amp;quot;,469777844,&amp;quot;Times New Roman&amp;quot;,469769226,&amp;quot;Times New Roman,Aptos&amp;quot;,268442635,&amp;quot;24&amp;quot;,201342446,&amp;quot;1&amp;quot;,201342447,&amp;quot;5&amp;quot;,201342448,&amp;quot;1&amp;quot;,201342449,&amp;quot;1&amp;quot;,469777842,&amp;quot;Times New Roman&amp;quot;,201341986,&amp;quot;1&amp;quot;,469775450,&amp;quot;msocodeblock&amp;quot;,201340122,&amp;quot;2&amp;quot;,134233614,&amp;quot;true&amp;quot;,469778129,&amp;quot;msocodeblock&amp;quot;,335572020,&amp;quot;1&amp;quot;,335559740,&amp;quot;240&amp;quot;,201341983,&amp;quot;0&amp;quot;,134233118,&amp;quot;true&amp;quot;,134233117,&amp;quot;true&amp;quot;,469778324,&amp;quot;Normal&amp;quot;]}"&gt;$&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;apim&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = '&amp;lt;your-&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;apim&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;-name&amp;gt;'&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;$deployment = 'gpt-5'&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;$token = &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;az&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt; account get-access-token --resource 'https://cognitiveservices.azure.com' --query &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;accessToken&lt;/SPAN&gt;&lt;SPAN 
data-ccp-parastyle="msocodeblock"&gt; -o tsv&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;# gpt-5 is a reasoning model: it rejects max_tokens, and reasoning tokens count against max_completion_tokens.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;$body&amp;nbsp; = @{ messages = @(@{ role = 'user'; content = 'Say hi in five words.' }); &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;max_completion_tokens&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt; = 256 } | &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;ConvertTo&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;-Json -Depth 5 -Compress&lt;/SPAN&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;Invoke-&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;RestMethod&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt; -Method POST `&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;  -Uri "https://$apim.azure-api.net/inference/deployments/$deployment/chat/&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;completions?api-version&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;=2024-10-21" `&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;  -Headers @{ Authorization = "Bearer $token"; 'Content-Type' = 'application/&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;json&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;' } `&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;  -Body $body `&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;  -ResponseHeadersVariable respHeaders&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;# Gateway headers added by the APIM outbound policy&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="msocodeblock"&gt;$respHeaders['x-aigw-backend'], $respHeaders['x-aigw-region']&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;A 200 response that carries the x-aigw-backend and x-aigw-region headers confirms that the policy chain (MI validation → backend rewrite → cross-region private endpoint) is healthy. &lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&lt;SPAN data-contrast="auto"&gt;For end-to-end validation through a Foundry hosted agent (agent creation, model selection, and the run loop), see the "Use the connected model from a Foundry agent" section of the extension README.&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:120,&amp;quot;335559739&amp;quot;:180}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Appendix — what the APIM trace proves&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;A live POST against /inference/deployments/gpt-4o/chat/completions confirms the policy chain: APIM acquires a managed-identity token for the Cognitive Services audience, deletes the caller api-key, rewrites the backend to the japaneast Foundry account, forwards the request privately, and returns self-describing gateway headers.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Inbound&lt;BR /&gt;  authentication-managed-identity -&amp;gt; token acquired for https://cognitiveservices.azure.com&lt;BR /&gt;  set-header Authorization        -&amp;gt; Bearer token from APIM SMI, not the caller&lt;BR /&gt;  set-header api-key (delete)     -&amp;gt; removed&lt;BR /&gt;  set-backend-service             -&amp;gt; https://ctcfvce-jpe.openai.azure.com/openai&lt;BR /&gt;  set-header x-aigw-route         -&amp;gt; customer-apim-fvce to japaneast&lt;BR /&gt;&lt;BR /&gt;Backend&lt;BR /&gt;  request-forwarder               -&amp;gt; POST /openai/deployments/gpt-4o/chat/completions&lt;BR /&gt;  forward-request                 -&amp;gt; 200 OK, x-ms-region: Japan East&lt;BR /&gt;&lt;BR /&gt;Outbound&lt;BR /&gt;  set-header x-aigw-backend       -&amp;gt; apim-fvce-aigw / ctcfvce-jpe&lt;BR /&gt;  set-header x-aigw-region        -&amp;gt; canadaeast to japaneast&lt;BR /&gt;  transfer-response               -&amp;gt; response sent to caller in full&lt;/PRE&gt;</description>
      <pubDate>Fri, 19 Jun 2026 15:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/cross-region-model-connectivity-options-in-microsoft-foundry/ba-p/4528620</guid>
      <dc:creator>rayankhoury</dc:creator>
      <dc:date>2026-06-19T15:00:00Z</dc:date>
    </item>
    <item>
      <title>Auto-Generated Rubric Evaluators: Building Context-Aware Evaluators for AI Agents</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/auto-generated-rubric-evaluators-building-context-aware/ba-p/4524095</link>
      <description>&lt;P&gt;Authors: Shuo Qiu, Sydney Lister, Ilya Matiach, Ali Mahmoudzadeh, Salma Elshafey, José Santos, Vivek Bhadauria, &lt;SPAN style="color: rgb(30, 30, 30);"&gt;Morteza Ziyadi, April Kwong&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2&gt;Why Your Agent Needs a Task-Specific Evaluator&lt;/H2&gt;
&lt;P&gt;Picture a customer-service agent for a telecom company. A customer messages in asking to switch plans and get a refund for last month's overcharge. The agent needs to verify the customer's identity and confirm the new plan before ending the conversation. Miss the verification step and you have a security incident. Those success criteria are specific to this one scenario.&lt;/P&gt;
&lt;P&gt;The &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/concepts/evaluation-evaluators/rubric-evaluators" target="_blank" rel="noopener"&gt;auto-generated rubric evaluator&lt;/A&gt; is designed to help address this. It uses the context you already have to generate a task-specific rubric evaluator that returns a weighted score with per-dimension explanations and can then be reused across iterations.&lt;/P&gt;
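&lt;P&gt;To make that output shape concrete, here is a minimal Python sketch of how per-dimension verdicts can roll up into one weighted score. The DimensionResult class, the 0.0 to 1.0 score scale, the weights, and the telecom dimensions are hypothetical illustrations based on the scenario above. They are not the Foundry evaluator's output schema or its exact scoring formula.&lt;/P&gt;
&lt;PRE&gt;
from dataclasses import dataclass


@dataclass
class DimensionResult:
    """One rubric dimension's verdict (hypothetical shape, for illustration only)."""
    name: str
    weight: float       # relative importance of the dimension
    score: float        # assumed scale: 0.0 (fails) to 1.0 (fully meets)
    explanation: str    # the judge's per-dimension reasoning


def weighted_score(results: list[DimensionResult]) -> float:
    """Weighted mean of the dimension scores; weights need not sum to 1."""
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        raise ValueError("at least one dimension needs a positive weight")
    return sum(r.weight * r.score for r in results) / total_weight


results = [
    DimensionResult("identity_verification", 3.0, 0.0,
                    "Agent changed the plan without verifying the caller."),
    DimensionResult("plan_confirmation", 2.0, 1.0,
                    "Agent restated the new plan and got explicit consent."),
    DimensionResult("refund_handling", 1.0, 1.0,
                    "Agent issued the refund for last month's overcharge."),
]
print(f"overall: {weighted_score(results):.2f}")
for r in results:
    print(f"  {r.name}: {r.score:.1f} ({r.explanation})")
&lt;/PRE&gt;
&lt;P&gt;Because identity verification carries the most weight, the single missed security step pulls the overall score down to 0.50 even though the other two dimensions pass. The per-dimension explanations show why.&lt;/P&gt;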
&lt;H2&gt;How We Validated Evaluator Quality&lt;/H2&gt;
&lt;P&gt;We validate auto-generated rubric evaluators across four aspects:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Verdict Validity&lt;/STRONG&gt; — whether the evaluator's scores agree with trusted reference signals, such as benchmark pass/fail labels and leaderboard rankings.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Rubric Validity&lt;/STRONG&gt; — whether generated rubrics capture the task requirements and failure modes.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Manual Quality Inspection&lt;/STRONG&gt; — whether judgments on real cases look right to a human reviewer.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Reliability and Separability&lt;/STRONG&gt; — whether judgments are stable across repeated runs and distinguish stronger from weaker candidate agents.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Validation Results&lt;/H2&gt;
&lt;H3&gt;1. Agreement with Trusted Reference Signals&lt;/H3&gt;
&lt;P&gt;We first validate the auto-generated rubric evaluator end to end. The rubric generator produces the rubric's dimensions, and the rubric evaluator then scores each case against them. We use GPT-5.4 for both the rubric generator and the rubric evaluator.&lt;/P&gt;
&lt;P&gt;The first question is whether those end-to-end scores move with signals teams already trust. For example, does the rubric evaluator give lower scores to failed cases, and higher scores to successful ones? We start by choosing benchmarks the community already uses as reference points:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; height: 385px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;Dataset&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;What It Tests&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style="height: 67px;"&gt;&lt;td style="height: 67px;"&gt;
&lt;P&gt;JSON Editing&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 67px;"&gt;
&lt;P&gt;Deterministic structured-editing tasks where outputs can be checked exactly.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 67px;"&gt;&lt;td style="height: 67px;"&gt;
&lt;P&gt;&lt;A href="https://taubench.com/" target="_blank" rel="noopener"&gt;TauBench Telecom&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 67px;"&gt;
&lt;P&gt;Customer-service agent tasks requiring policy following, tool use, and task completion.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 67px;"&gt;&lt;td style="height: 67px;"&gt;
&lt;P&gt;&lt;A href="https://the-agent-company.com/" target="_blank" rel="noopener"&gt;The Agent Company&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 67px;"&gt;
&lt;P&gt;Long-horizon workplace-agent tasks with multi-step tool use. We use InspectAI’s 10-case subset.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;&lt;A href="https://gorilla.cs.berkeley.edu/leaderboard.html" target="_blank" rel="noopener"&gt;BFCL Multi-Turn Tool Calling&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;Multi-turn function-calling behavior across realistic tool-use scenarios.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 67px;"&gt;&lt;td style="height: 67px;"&gt;
&lt;P&gt;&lt;A href="https://github.com/Mosi-AI/LiveClawBench" target="_blank" rel="noopener"&gt;LiveClawBench&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 67px;"&gt;
&lt;P&gt;Open-ended web-agent tasks that require browsing, interaction, and judgment.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39px;"&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;Retail-Agent Customer Service&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 39px;"&gt;
&lt;P&gt;Real production-style retail support conversations.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;We then use the generation pipeline to create a rubric evaluator for each scenario and measure how well the evaluator's scores agree with the trusted reference signals. For the three datasets with per-case reference signals (JSON Editing, TauBench Telecom, and The Agent Company), we can directly check whether the evaluator gives higher scores to successful cases than to failed ones.&lt;/P&gt;
&lt;P&gt;We then create traces from different candidate agents. In these experiments, each candidate agent uses the same task setup and prompt but a different underlying model, which gives us a controlled range of stronger and weaker agent behaviors.&lt;/P&gt;
&lt;P&gt;Because the evaluator returns a continuous score, we use receiver operating characteristic area under the curve (ROC AUC) when the trusted case-level signal can be read as success versus failure. It measures how often, when comparing a successful case with a failed case, the evaluator assigns the successful case the higher score.&lt;/P&gt;
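&lt;P&gt;The pairwise definition above translates directly into code. This minimal Python sketch compares every successful case with every failed case and counts ties as half a win. This is the Mann-Whitney formulation, which gives the same value as the area under the ROC curve. The scores are made-up examples, not data from these experiments.&lt;/P&gt;
&lt;PRE&gt;
def pairwise_roc_auc(success_scores: list[float], failure_scores: list[float]) -> float:
    """Probability that a random successful case outscores a random failed case.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    if not success_scores or not failure_scores:
        raise ValueError("need at least one successful and one failed case")
    wins = 0.0
    for s in success_scores:
        for f in failure_scores:
            if s > f:
                wins += 1.0
            elif s == f:
                wins += 0.5
    return wins / (len(success_scores) * len(failure_scores))


# Hypothetical evaluator scores, split by the trusted pass/fail label.
passed = [0.9, 0.8, 0.4]
failed = [0.3, 0.4, 0.1]
print(round(pairwise_roc_auc(passed, failed), 3))
&lt;/PRE&gt;
&lt;P&gt;Here 8.5 of the 9 success-versus-failure pairs are ordered correctly (the 0.4 versus 0.4 pair is a tie), so the script prints 0.944. A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect separation. The nested loop is quadratic. For large datasets, a rank-based computation gives the same value faster.&lt;/P&gt;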
&lt;P&gt;In these experiments, generated rubric evaluators align well with trusted signals at the case level, with ROC AUC of 0.794 on TauBench Telecom, 0.869 on The Agent Company, and 0.972 on JSON Editing.&lt;/P&gt;
&lt;P&gt;An important goal of evaluation is that candidate agents that perform better on the reference signal also score higher with the evaluator. This view is more directly relevant when choosing among candidate agents. It is also a more forgiving test of alignment, because aggregated scores are less sensitive to noise in individual judgments.&lt;/P&gt;
&lt;P&gt;We measure this with aggregate candidate-agent Spearman ρ, which checks whether the evaluator ranks candidate agents the same way as the oracle — a ρ of 1.0 means the evaluator's ranking is perfectly aligned with the oracle's, while 0 means no relationship. For BFCL and LiveClawBench, the oracle ranking comes from their official leaderboard scores.&lt;/P&gt;
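&lt;P&gt;Spearman ρ is the Pearson correlation of the two rankings. The minimal Python sketch below ranks the candidate agents twice, once by mean evaluator score and once by oracle score, with tied values sharing an average rank. It then correlates the two sets of ranks. The four agents and their scores are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while len(order) > start:
        end = start
        # Extend the run while the next value ties with the current one.
        while len(order) > end + 1 and values[order[end + 1]] == values[order[start]]:
            end += 1
        for i in order[start:end + 1]:
            ranks[i] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of the average ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one ranking is constant")
    return cov / (sx * sy)


# Hypothetical candidate agents: mean evaluator score vs. oracle (leaderboard) score.
evaluator = [0.62, 0.71, 0.55, 0.80]
oracle = [47.5, 41.0, 39.2, 52.3]
print(round(spearman_rho(evaluator, oracle), 2))
&lt;/PRE&gt;
&lt;P&gt;The evaluator swaps the two middle-ranked agents relative to the oracle, which lowers ρ from 1.0 to 0.8. Reversing the order entirely would give -1.0.&lt;/P&gt;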
&lt;P&gt;At the aggregate candidate-agent level, Spearman ρ ranges from 0.69 on The Agent Company to 0.98 on JSON Editing across all five benchmarks. Aggregation reduces per-case noise, so the candidate-agent ranking is the more relevant view when the goal is agent selection.&lt;/P&gt;
&lt;H3&gt;2. Rubric Quality on GDPVal&lt;/H3&gt;
&lt;P&gt;&lt;A href="https://openai.com/index/gdpval/" target="_blank" rel="noopener"&gt;GDPVal&lt;/A&gt; is a benchmark that measures how well AI models perform real-world, economically valuable work in sectors such as government, manufacturing, and technical services. This benchmark includes a rubric for each task, authored by a domain expert, which is useful for rubric-validity measurement.&lt;/P&gt;
&lt;P&gt;We ask the rubric generator to produce a rubric for each test case, then use a separate matching judge to match the generated dimensions to the expert dimensions. This gives us two metrics for rubric quality:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Recall.&lt;/STRONG&gt; For each expert dimension, did at least one generated dimension express a similar requirement?&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Precision.&lt;/STRONG&gt; For each generated dimension, did at least one expert dimension express a similar requirement?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Under this setup, the generated rubric achieved 72.1% recall and 86.4% precision against the expert dimensions on GDPVal tasks.&lt;/P&gt;
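&lt;P&gt;Once the matching judge has produced its verdicts, both metrics reduce to simple counts. This minimal Python sketch assumes the judge's output for one task can be represented as a boolean matrix of expert dimensions by generated dimensions. The article does not say how per-task numbers are aggregated across GDPVal tasks, so the example covers a single hypothetical task.&lt;/P&gt;
&lt;PRE&gt;
def rubric_recall_precision(matches: list[list[bool]]) -> tuple[float, float]:
    """matches[i][j] is True when the matching judge says expert dimension i
    and generated dimension j express a similar requirement."""
    n_expert = len(matches)
    n_generated = len(matches[0])
    # Recall: share of expert dimensions covered by at least one generated dimension.
    recall = sum(any(row) for row in matches) / n_expert
    # Precision: share of generated dimensions backed by at least one expert dimension.
    precision = sum(
        any(matches[i][j] for i in range(n_expert)) for j in range(n_generated)
    ) / n_generated
    return recall, precision


# Hypothetical task: 4 expert dimensions (rows) vs. 3 generated dimensions (columns).
matches = [
    [True, False, False],
    [False, True, False],
    [False, False, False],   # expert dimension the generator missed
    [False, True, False],    # covered by the same generated dimension as row 1
]
recall, precision = rubric_recall_precision(matches)
print(f"recall={recall:.2f} precision={precision:.2f}")
&lt;/PRE&gt;
&lt;P&gt;This prints recall=0.75 precision=0.67. The two metrics are not symmetric. One generated dimension can cover several expert dimensions, and the third generated dimension here counts against precision because no expert dimension asks for it.&lt;/P&gt;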
&lt;H3&gt;3. Manual Quality on Retail-Agent Conversations&lt;/H3&gt;
&lt;P&gt;For a real-world retail-agent customer-service dataset, we generated a rubric with six dimensions, then graded 12 conversations over those dimensions, and manually inspected every case-by-dimension judgment.&lt;/P&gt;
&lt;P&gt;In this small sample (12 conversations × 6 dimensions = 72 case-by-dimension judgments), the reviewer disagreed with only one judgment. Most of the judgments the reviewer rated as neutral involved applicability questions (whether a dimension applied to the conversation at all), which the evaluator flagged inconsistently.&lt;/P&gt;
&lt;H2&gt;Reliability and Separability&lt;/H2&gt;
&lt;P&gt;Another key question is how reliable the evaluator's scores are. We look at two things: &lt;STRONG&gt;reliability&lt;/STRONG&gt; (does the same case get the same score next time?) and &lt;STRONG&gt;separability&lt;/STRONG&gt; (can the evaluator confidently rank two candidate agents against each other?).&lt;/P&gt;
&lt;H3&gt;Reliability&lt;/H3&gt;
&lt;P&gt;If you re-grade the same case tomorrow, do you get the same score? We measure this in two ways. &lt;STRONG&gt;Single-measure intraclass correlation, ICC(3,1)&lt;/STRONG&gt;, measures how much of the score variance comes from real differences between cases rather than from repeat-to-repeat noise. &lt;STRONG&gt;Kendall's W&lt;/STRONG&gt; measures how consistently repeated runs rank the cases; a value of 1.0 means the evaluator ranks cases in the same order every time.&lt;/P&gt;
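&lt;P&gt;Both statistics can be computed from a matrix of scores, with one row per case and one column per repeated run. The minimal Python sketch below treats each repeated run as a fixed "rater". ICC(3,1) uses the standard two-way mixed-effects, consistency, single-measure formula (MS_rows - MS_err) / (MS_rows + (k - 1) * MS_err). Kendall's W ranks the cases within each run and applies W = 12 * S / (m^2 * (n^3 - n)) without a tie correction. The scores are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while len(order) > start:
        end = start
        while len(order) > end + 1 and values[order[end + 1]] == values[order[start]]:
            end += 1
        for i in order[start:end + 1]:
            ranks[i] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def icc_3_1(scores: list[list[float]]) -> float:
    """ICC(3,1) for scores[case][run]: two-way mixed, consistency, single measure."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # real case differences
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # systematic run-level shifts
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # repeat noise
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def kendalls_w(scores: list[list[float]]) -> float:
    """Kendall's W for scores[case][run]; each run ranks the n cases (no tie correction)."""
    n, m = len(scores), len(scores[0])
    run_ranks = [average_ranks([row[j] for row in scores]) for j in range(m)]
    rank_sums = [sum(ranks[i] for ranks in run_ranks) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical scores: 5 cases, each graded in 3 repeated runs.
scores = [
    [0.90, 0.85, 0.90],
    [0.40, 0.50, 0.45],
    [0.70, 0.65, 0.75],
    [0.20, 0.25, 0.20],
    [0.60, 0.70, 0.60],
]
print(f"ICC(3,1)={icc_3_1(scores):.3f}  Kendall's W={kendalls_w(scores):.3f}")
&lt;/PRE&gt;
&lt;P&gt;With these made-up scores, the repeats change the case ordering only once (two mid-ranked cases swap in the second run), so both statistics come out high: about 0.97 for ICC(3,1) and 0.96 for W. Because ICC(3,1) measures consistency, a run that shifts every score by the same amount does not lower it.&lt;/P&gt;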
&lt;P&gt;On JSON Editing, ICC(3,1) is 0.852 and Kendall's W is 0.767. In this experimental setup, re-running the evaluator on the same case therefore yields similar scores. TauBench Telecom shows similarly strong reliability, with ICC(3,1) of 0.85 and Kendall's W of 0.89 under the same recommended configuration.&lt;/P&gt;
&lt;H3&gt;Separability&lt;/H3&gt;
&lt;P&gt;Separability measures whether the score is &lt;EM&gt;decisive&lt;/EM&gt;: when you put two candidate agents side by side, can the evaluator confidently say which one is better?&lt;/P&gt;
&lt;P&gt;We report &lt;STRONG&gt;mean pairwise bootstrap confidence&lt;/STRONG&gt;, which measures ranking stability. For each pair of candidate agents, we resample cases and recompute each agent's mean evaluator score. The pair confidence is the fraction of bootstrap samples supporting the more common ordering: a value near 0.5 means the ordering is unstable, while a value near 1.0 means the evaluator consistently separates that pair. We average this across all candidate-agent pairs.&lt;/P&gt;
&lt;P&gt;The candidate-agent intervals are tight on both JSON Editing and TauBench Telecom: mean pairwise bootstrap confidence is 0.96 on the JSON Editing dataset and 0.95 on the TauBench Telecom dataset.&lt;/P&gt;
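&lt;P&gt;The pairwise bootstrap procedure can be sketched in a few lines of NumPy. This sketch assumes every candidate agent was graded on the same cases, so each bootstrap draw resamples case indices once and applies them to all agents (a paired bootstrap). It also counts exact ties in the resampled means as half a vote for each ordering. The function name, the number of resamples, and the seed are illustrative choices, not documented settings.&lt;/P&gt;

```python
import itertools

import numpy as np


def mean_pairwise_bootstrap_confidence(scores, n_boot=2000, seed=0):
    """Average, over all agent pairs, of the bootstrap support for the more common ordering.

    scores: shape (n_agents, n_cases); every agent is assumed to be graded on the same cases.
    """
    scores = np.asarray(scores, dtype=float)
    n_agents, n_cases = scores.shape
    rng = np.random.default_rng(seed)
    # One row of resampled case indices (with replacement) per bootstrap sample,
    # shared by all agents so the comparison stays paired.
    idx = rng.integers(0, n_cases, size=(n_boot, n_cases))
    # boot_means[b, a] = mean score of agent a on bootstrap sample b.
    boot_means = scores[:, idx].mean(axis=2).T
    confidences = []
    for a, b in itertools.combinations(range(n_agents), 2):
        ma, mb = boot_means[:, a], boot_means[:, b]
        # Share of samples where agent a ranks above b; exact ties split 50/50.
        p_a = np.mean(ma > mb) + 0.5 * np.mean(ma == mb)
        # About 0.5 means the ordering flips often; 1.0 means it never flips.
        confidences.append(max(p_a, 1.0 - p_a))
    return float(np.mean(confidences))
```

&lt;P&gt;For example, two agents that receive identical scores on every case get a pair confidence of 0.5, and two agents whose per-case scores never overlap get 1.0.&lt;/P&gt;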
&lt;H3&gt;Get Started&lt;/H3&gt;
&lt;P&gt;Results from the auto-generated rubric evaluator may vary with task design, input quality, and evaluation setup. Start with a clear, well-defined description of what you want to evaluate in the prompt field. Include as much high-quality context as possible, such as the agent definition and examples, and review the generated rubric carefully before using it. Then run it against a small set of known-good and known-bad cases to understand how the score reflects different failure modes.&lt;/P&gt;
&lt;P&gt;Try the workflow in the &lt;A href="https://ai.azure.com" target="_blank" rel="noopener"&gt;Foundry portal&lt;/A&gt; and follow the &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/concepts/evaluation-evaluators/rubric-evaluators" target="_blank" rel="noopener"&gt;rubric evaluator tutorial&lt;/A&gt;. For a demo that covers Rubric in the broader observability workflow, watch the Build breakout session &lt;A href="https://build.microsoft.com/en-US/sessions/BRK252" target="_blank" rel="noopener"&gt;From observability to ROI for AI agents on any framework&lt;/A&gt;. For the full set of Build observability announcements, read &lt;A href="https://devblogs.microsoft.com/foundry/build-2026-from-observability-to-roi-for-ai-agents-on-any-framework/" target="_blank" rel="noopener"&gt;Build 2026: From observability to ROI for AI agents on any framework&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jun 2026 19:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/auto-generated-rubric-evaluators-building-context-aware/ba-p/4524095</guid>
      <dc:creator>Shuo</dc:creator>
      <dc:date>2026-06-18T19:00:00Z</dc:date>
    </item>
    <item>
      <title>How to Score a User Simulator: Introducing USR-8</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/how-to-score-a-user-simulator-introducing-usr-8/ba-p/4523642</link>
      <description>&lt;H5&gt;&lt;SPAN class="lia-text-color-19"&gt;&lt;STRONG&gt;Authors: José Santos, Shuo Qiu, Morteza Ziyadi&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/H5&gt;
&lt;P&gt;User simulators have become a standard part of the agent-building toolkit: cheaper than real-user pilots, faster than scripted dialogues, and the only way to push an agent through hundreds of plausible conversations between every code change. The hard question is the one that comes next: how do you know the simulator itself is any good?&lt;/P&gt;
&lt;P&gt;It's easy to underestimate how much can go wrong. A simulator that ends every turn with "Thanks so much for your help!" will quietly inflate your agent's scores. A simulator that coaches your agent ("could you check the fare rules first?") will hide failures behind helpful nudges. A simulator that scores well on a holistic "user-coherence" metric can still produce conversations that cluster around the same two openings and the same four invented names. None of these failure modes are visible from the agent's score sheet; all of them distort it. If the test harness isn't measured with the same rigor as the agent, you're back to vibes — just with bigger numbers attached.&lt;/P&gt;
&lt;P&gt;This post describes a research study on how we developed the &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/how-to/develop/cloud-evaluation?tabs=python#conversation-simulation" target="_blank" rel="noopener"&gt;Microsoft Foundry user simulator&lt;/A&gt; and measured its quality. It offers two contributions: 1) an &lt;STRONG&gt;Eight-Metric User Simulation Rubric (USR-8)&lt;/STRONG&gt;, a minimal, orthogonal, and sufficient framework for scoring a user simulator that separates behavior from style and surfaces failure modes a single composite score erases, and 2) a&amp;nbsp;&lt;STRONG&gt;set of empirical findings&lt;/STRONG&gt;&amp;nbsp;from running that rubric across 1,200 conversations, three domains, and four simulator configurations.&lt;/P&gt;
&lt;P&gt;The short version: the Foundry user simulator scored at or near ceiling on every per-conversation metric except realism, and a small prompt revision substantially improved realism. More interestingly, the same prompt loaded into a third-party harness produced very similar results on the metrics we measured, which suggests that, in our evaluation, most of what makes a "good" simulator is driven by the prompt policy, not the orchestration code. That last finding may generalize beyond Foundry, although we compared only two harnesses.&lt;/P&gt;
&lt;H2&gt;Two philosophies of user simulation&lt;/H2&gt;
&lt;P&gt;Before you can evaluate a user simulator, you have to decide what you want it to be. There are two coherent answers, and they pull in opposite directions.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Philosophy A — Realistic foil.&lt;/STRONG&gt;&amp;nbsp;The simulator stays in character and never coaches the agent. If the agent skips a step, the simulator notices the way a real customer would — by getting confused, getting frustrated, or moving on — not by saying "you forgot to check the fare rules." Conversation failures attributable to the agent stay visible and clean. This is what you want when you're measuring the agent's intrinsic quality.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Philosophy B — Helpful tester.&lt;/STRONG&gt;&amp;nbsp;The simulator is part of an end-to-end task-success benchmark. It may nudge the agent back on track when it goes off-script, but only as a real cooperative user would: by answering clarifying questions and supplying missing details, not by coaching the agent on how to do its job. That help can mask failures, so this setup measures whether the task gets completed by the agent and simulator together. Treat the result as an optimistic system-level success rate: it credits the simulator’s help and can overstate the agent’s standalone ability if that help slips into coaching.&lt;/P&gt;
&lt;P&gt;Neither philosophy is wrong. They serve different product use cases, and the choice has direct consequences for what your scores mean.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 96.7593%; height: 210px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr style="height: 35px;"&gt;&lt;th style="height: 35px;"&gt;You're trying to…&lt;/th&gt;&lt;th style="height: 35px;"&gt;Use Philosophy A&lt;/th&gt;&lt;th style="height: 35px;"&gt;Use Philosophy B&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;Compare two agent prompts&lt;/td&gt;&lt;td style="height: 35px;"&gt;✅&lt;/td&gt;&lt;td style="height: 35px;"&gt;⚠️ (simulator help can flatten differences)&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;Catch regressions during development&lt;/td&gt;&lt;td style="height: 35px;"&gt;✅&lt;/td&gt;&lt;td style="height: 35px;"&gt;⚠️&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;Measure end-to-end task success&lt;/td&gt;&lt;td style="height: 35px;"&gt;⚠️ (no help may lead to lower pass rates)&lt;/td&gt;&lt;td style="height: 35px;"&gt;✅&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;Red-team for adversarial inputs&lt;/td&gt;&lt;td style="height: 35px;"&gt;needs persona seeds&lt;/td&gt;&lt;td style="height: 35px;"&gt;✅&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;Generate realistic-looking sample conversations&lt;/td&gt;&lt;td style="height: 35px;"&gt;✅&lt;/td&gt;&lt;td style="height: 35px;"&gt;❌&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Microsoft Foundry's simulator is built around Philosophy A: stay in character, don't coach. Several widely used third-party user-simulator frameworks default to Philosophy B — a fixed opening prompt and explicit instructions for the simulated user to correct the agent when it misses a step. We'll come back to that contrast when we look at the results.&lt;/P&gt;
&lt;P&gt;The first thing to do, before you score anything, is pick the philosophy that matches your use case. Otherwise, you'll end up measuring a simulator on a job it wasn't designed to do.&lt;/P&gt;
&lt;H2&gt;Eight metrics for scoring a user simulator&lt;/H2&gt;
&lt;P&gt;Once you've picked a philosophy, you need a rubric. Generic agent metrics — task adherence, intent resolution, tool-call accuracy — don't apply, because the simulator isn't trying to&amp;nbsp;&lt;EM&gt;complete&lt;/EM&gt;&amp;nbsp;the task, it's trying to&amp;nbsp;&lt;EM&gt;be a user&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;We built&amp;nbsp;&lt;STRONG&gt;USR-8&lt;/STRONG&gt;: eight LLM-judge metrics specifically for evaluating user-simulator output. We designed these metrics to be&amp;nbsp;&lt;STRONG&gt;minimal, orthogonal, and sufficient&lt;/STRONG&gt;&amp;nbsp;— each captures a distinct failure mode that cannot be reliably inferred from the others. Seven are per-conversation; one is cohort-level. All score on a 1–5 integer scale with a short rationale, and all were judged by a current-generation GPT model running at low reasoning effort.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Per-conversation metrics (each judge sees the full transcript and the original scenario):&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Clarity&lt;/STRONG&gt;&amp;nbsp;— Does the user state their request clearly enough that a competent agent could act on it without guessing? Penalizes ambiguous, fragmented, or under-specified asks.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Relevance&lt;/STRONG&gt;&amp;nbsp;— Does the user stay on the topic the scenario specifies? Penalizes drift, off-topic detours, and wholesale topic swaps (e.g., scenario asks for a California vendor agreement; user opens with an English NDA). Does not penalize the user when the&amp;nbsp;&lt;EM&gt;agent&lt;/EM&gt;&amp;nbsp;goes off-topic.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Steering&lt;/STRONG&gt;&amp;nbsp;— Does the user keep the conversation moving toward their goal&amp;nbsp;&lt;EM&gt;without coaching the agent&lt;/EM&gt;&amp;nbsp;— i.e., without telling the agent how to do its job? A high score requires productive steering that stops short of becoming the agent's manager. This is the Philosophy A guard rail; expect a Philosophy B simulator to score lower here by design.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Responsiveness&lt;/STRONG&gt;&amp;nbsp;— Does each user turn acknowledge and respond to what the agent just said, rather than ignoring it or repeating prior asks? Picks up on options the agent actually offered.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Consistency (coherence)&lt;/STRONG&gt;&amp;nbsp;— Does the user maintain a single coherent identity, set of facts, and conversational thread across the whole conversation? Penalizes self-contradiction — e.g., changing a booking code mid-conversation.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Realism&lt;/STRONG&gt;&amp;nbsp;— Does the user sound like a human writing in the moment (hesitations, false starts, contractions, an emotional register that fits the scenario, plausible imperfection) rather than a polished ghostwriter playing a role? This metric scores&amp;nbsp;&lt;EM&gt;how human the prose reads&lt;/EM&gt;, not whether the behavior is correct.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Persona fidelity&lt;/STRONG&gt; — When the scenario specifies a persona ("frustrated returning customer", "junior on-call engineer", "in-house counsel"), does the user embody it faithfully? Defaults to 5 when the scenario specifies no particular traits.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Cohort metric (judged on an entire run of conversations):&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Diversity&lt;/STRONG&gt;&amp;nbsp;— Across, say, 100 conversations of the same simulator on the same scenario set, how varied are the conversations? Considers distinct names, distinct opening framings, distinct emotional registers, and unique scoring terms. 1 = visibly clustered (same names, same opening template); 5 = high variety across all axes.&lt;/LI&gt;
&lt;/UL&gt;
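&lt;P&gt;If you implement USR-8 judges yourself, it helps to pin the rubric down in code and reject malformed judge replies instead of silently clipping them. The sketch below assumes, hypothetically, that each judge is prompted to reply with JSON containing a score field and a rationale field. The metric keys, the reply format, and the parse_judgment helper are our own illustration, not a Foundry API.&lt;/P&gt;

```python
import json
from dataclasses import dataclass

# USR-8 metric keys (naming is our own): seven per-conversation metrics, one cohort metric.
PER_CONVERSATION = ("clarity", "relevance", "steering", "responsiveness",
                    "consistency", "realism", "persona_fidelity")
COHORT = ("diversity",)


@dataclass(frozen=True)
class Judgment:
    metric: str
    score: int       # integer on the 1-5 scale
    rationale: str   # the judge's short justification


def parse_judgment(metric, raw_reply):
    """Validate one judge reply, assumed to look like {"score": 4, "rationale": "..."}."""
    if metric not in PER_CONVERSATION + COHORT:
        raise ValueError(f"unknown USR-8 metric: {metric}")
    data = json.loads(raw_reply)
    score = data.get("score")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or score not in range(1, 6):
        raise ValueError(f"{metric}: expected an integer score from 1 to 5, got {score!r}")
    return Judgment(metric, score, str(data.get("rationale", "")).strip())
```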
&lt;P&gt;A few notes on the design:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Separate behavior from style.&lt;/STRONG&gt;&amp;nbsp;Clarity, relevance, steering, responsiveness, consistency, and persona fidelity describe&amp;nbsp;&lt;EM&gt;what the simulator does&lt;/EM&gt;. Realism describes&amp;nbsp;&lt;EM&gt;how it sounds&lt;/EM&gt;. They tend to move independently and conflating them hides important signal.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Include a no-coaching guard.&lt;/STRONG&gt;&amp;nbsp;If you're under Philosophy A, you need at least one metric that explicitly penalizes the simulator for coaching the agent. Otherwise the judge's prior about "helpful" user turns will let coaching slip through.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Score the cohort, not just the conversation.&lt;/STRONG&gt;&amp;nbsp;Per-conversation metrics can be at ceiling while the cohort is visibly clustered (same names, same opening, same closer). Diversity catches that.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use scenarios, not scripts.&lt;/STRONG&gt;&amp;nbsp;A "scenario" in our setup is a short prose task description (typically 80–200 words) given to the simulator as "the thing the user is trying to accomplish today." The simulator invents concrete details — names, booking codes, dates, error counts — that the scenario doesn't specify. This intentionally probes the simulator's ability to stay in character while improvising plausibly.&lt;/LI&gt;
&lt;/UL&gt;
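&lt;P&gt;Cohort clustering of the kind the diversity judge looks for (the same opening template, the same phrasing) can also be screened cheaply before spending judge calls. The sketch below is our own lexical proxy, not part of USR-8 and not a substitute for the judge. It reports the share of distinct first user turns in a cohort and distinct-2, the ratio of unique word bigrams to total bigrams across user turns. Each conversation is assumed to be a list of user-turn strings.&lt;/P&gt;

```python
def opening_diversity(conversations):
    """Share of distinct first user turns (case- and whitespace-normalized) in a cohort."""
    openings = [" ".join(conv[0].lower().split()) for conv in conversations if conv]
    return len(set(openings)) / len(openings) if openings else 0.0


def distinct_n(conversations, n=2):
    """Unique word n-grams divided by total word n-grams across all user turns."""
    grams = []
    for conv in conversations:
        for turn in conv:
            words = turn.lower().split()
            grams.extend(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0
```

&lt;P&gt;Low values on either number point at the clustering symptom the judge reports; high values do not by themselves show that names or emotional registers vary.&lt;/P&gt;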
&lt;P&gt;The minimum subset, in our experience, is&amp;nbsp;&lt;STRONG&gt;relevance + steering + consistency + realism&lt;/STRONG&gt;: it covers on-topic behavior, no-coaching, within-conversation coherence, and prose quality. The other four sharpen the picture, especially when comparing two simulators that look close on the minimum set.&lt;/P&gt;
&lt;H2&gt;How we put it to the test&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Three agents, in three domains.&lt;/STRONG&gt;&amp;nbsp;We created three Microsoft Foundry agents, each in simulation mode (tool responses generated by a current-generation GPT model against a tool catalog rather than a live backend), chosen to span both consumer-vs-professional tone and short-turn-vs-long-turn conversational shape:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;airline-customer-service-sim — refunds, rebookings, fare-rule lookups, baggage allowance, seat changes, 12 tools.&lt;/LI&gt;
&lt;LI&gt;sre-incident-triage-sim&amp;nbsp;— on-call engineer that confirms alerts, looks up service ownership, opens incidents, paginates runbooks, coordinates rollback, 9 tools.&lt;/LI&gt;
&lt;LI&gt;legal-contract-review-sim&amp;nbsp;— vendor agreements, redline liability and indemnity clauses, deviations from a contract template, negotiation drafts, 8 tools.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Ten scenarios per domain, ten conversations per scenario.&lt;/STRONG&gt;&amp;nbsp;Per simulator configuration: 10 scenarios × 10 conversations × 3 domains = 300 conversations. With four simulator configurations, that's&amp;nbsp;&lt;STRONG&gt;1,200 conversations scored&lt;/STRONG&gt;. The 10 × 10 design gives 95% CI half-widths of 0.05–0.15 on per-conversation metrics — tight enough to detect effects above ~0.1.&lt;/P&gt;
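&lt;P&gt;A 95% confidence-interval half-width is straightforward to compute. The sketch below uses the normal approximation (1.96 × standard deviation / √n) over conversations, plus a more conservative variant that treats each scenario's mean as the independent unit, since the ten conversations of one scenario are correlated. The post does not say which variant produced its 0.05–0.15 figures, and the 2.262 critical value (Student's t with 9 degrees of freedom, for ten scenarios) is our choice.&lt;/P&gt;

```python
import math
import statistics


def ci95_half_width(scores, crit=1.96):
    """Half-width of a 95% CI for the mean: crit * sample std / sqrt(n).

    crit=1.96 is the normal approximation; pass a t critical value for small n.
    """
    return crit * statistics.stdev(scores) / math.sqrt(len(scores))


def clustered_ci95_half_width(scores_by_scenario, crit=2.262):
    """Treat each scenario's mean score, not each conversation, as one observation.

    2.262 is the two-sided 95% t critical value for 9 degrees of freedom (10 scenarios).
    """
    scenario_means = [statistics.fmean(s) for s in scores_by_scenario]
    return ci95_half_width(scenario_means, crit)
```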
&lt;P&gt;&lt;STRONG&gt;Four simulator configurations.&lt;/STRONG&gt;&amp;nbsp;This is where the comparative claims come from:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry user simulator, baseline prompt:&lt;/STRONG&gt;&amp;nbsp;an earlier prompt, before the realism revision, included here as an ablation baseline.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry user simulator, production prompt:&lt;/STRONG&gt;&amp;nbsp;the current production prompt, whose realism-focused revision is the change we isolate below.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;A publicly available third-party user-simulator framework&lt;/STRONG&gt;&amp;nbsp;with its default prompt.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The same third-party framework with our production prompt&lt;/STRONG&gt; ported in through its prompt-customization mechanism. This was the isolation experiment: same underlying model, same third-party harness, same scenarios, with only the prompt swapped. If results changed dramatically, the prompt was the lever; if they held steady, the harness was.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Every conversation in all four configurations was scored by the eight judges. Total judgments: roughly 10,000.&lt;/P&gt;
&lt;H2&gt;What we found&lt;/H2&gt;
&lt;P&gt;Three findings are worth reporting in detail.&lt;/P&gt;
&lt;H3&gt;1. The Foundry simulator is at or near ceiling on every metric except realism&lt;/H3&gt;
&lt;P&gt;For the baseline prompt (before the realism revision), at n=100 conversations per domain:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Metric&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Airline&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;SRE&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Legal&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Clarity&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.89&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Relevance&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Steering&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.86&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Responsiveness&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Consistency&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.00&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Persona fidelity&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.48&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.89&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.86&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Realism&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;3.97&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;4.13&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;3.34&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Diversity, the eighth USR-8 metric, is scored per cohort rather than per conversation, so it does not appear in this table; on our current scenario set it does not yet discriminate between simulators. Five of the seven per-conversation metrics are at or near 5.00 in every domain: relevance, responsiveness, and consistency score 5.00 throughout, while clarity and steering dip only on SRE (4.89 and 4.86). Persona fidelity is strong without being at ceiling.&amp;nbsp;&lt;STRONG&gt;Realism is the lowest-scoring metric in our evaluation.&lt;/STRONG&gt; The judge's notes describe the problem in plain language: polished phrasing, no hesitation, no false starts, no contractions, fictional names that read like marketing copy ("Priya Singh", "Olivia Hartwell"), and a polished closing that thanks the agent the way a real customer rarely does.&lt;/P&gt;
&lt;P&gt;In other words, the simulator sounds less like a customer and more like a professional ghostwriter playing a customer.&lt;/P&gt;
&lt;H3&gt;2. A single prompt revision substantially improved realism&lt;/H3&gt;
&lt;P&gt;The realism revision (now the production default), which focused entirely on register, hesitation, and dropping the closing-thanks tic, moved realism by +0.61 on airline, +0.62 on SRE, and&amp;nbsp;&lt;STRONG&gt;+1.06&lt;/STRONG&gt;&amp;nbsp;on legal, with no regression on any other metric. In each domain the delta is 4–10× the 95% CI half-width, well outside noise.&lt;/P&gt;
&lt;P&gt;This says something specific about what "improving a user simulator" actually looks like in practice: at the production-prompt stage, the high-leverage edits are about&amp;nbsp;&lt;EM&gt;prose register&lt;/EM&gt;, not about behavior. The behavior-side metrics were already pinned at ceiling; the work that moved the needle was style work.&lt;/P&gt;
&lt;H3&gt;3. Most of the gap lived in the prompt, not the harness&lt;/H3&gt;
&lt;P&gt;A surprising finding came from the isolation experiment. The third-party simulator with its&amp;nbsp;&lt;STRONG&gt;default&lt;/STRONG&gt;&amp;nbsp;prompt scored realism in the range&amp;nbsp;&lt;STRONG&gt;1.5–2.0&lt;/STRONG&gt;&amp;nbsp;across the three domains — 2.5–3.2 points below the Foundry simulator. The judge's notes named two recurring failures: meta-reasoning leakage in the user turn ("Step 1: Analyze what the Agent just said. Step 2: …") and explicit coaching of the agent ("It looks like you skipped a step. Could you please check the fare rules first?"). Both behaviors are&amp;nbsp;&lt;EM&gt;required&lt;/EM&gt;&amp;nbsp;by the third-party default prompt and&amp;nbsp;&lt;EM&gt;forbidden&lt;/EM&gt; by Foundry's.&lt;/P&gt;
&lt;P&gt;When we ported the Foundry production prompt into the third-party framework's prompt-customization hook (same model, same harness, same scenarios, only the prompt swapped), realism on the third-party harness jumped to&amp;nbsp;&lt;STRONG&gt;4.4–4.6&lt;/STRONG&gt;, essentially matching the Foundry production reference (4.4–4.7). Across all per-conversation metrics, the swap closed 96–99% of the gap between the third-party default and the Foundry reference in our evaluation. Both failure modes we had flagged in the default run, the reasoning leakage and the coaching, disappeared once the Foundry prompt was in place.&lt;/P&gt;
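&lt;P&gt;The post does not spell out its gap-reduction formula. A natural reading, sketched below as an assumption, is the share of the distance between the third-party default run and the Foundry reference that the ported-prompt run closes, computed per metric. The example inputs are hypothetical, not numbers from this study.&lt;/P&gt;

```python
def gap_reduction(default_score, ported_score, reference_score):
    """Fraction of the default-to-reference gap closed after swapping in the reference prompt.

    1.0: the ported run matches the reference; 0.0: no change from the default.
    Values above 1.0 mean the ported run overshot the reference.
    """
    gap = reference_score - default_score
    if gap == 0:
        raise ValueError("default and reference scores are equal; there is no gap to close")
    return (ported_score - default_score) / gap


# Hypothetical per-metric means: third-party default, third-party + ported prompt, Foundry reference.
example = {"realism": (1.8, 4.5, 4.55), "steering": (2.9, 4.9, 4.95)}
for metric, (default, ported, reference) in example.items():
    print(f"{metric}: {gap_reduction(default, ported, reference):.1%} of the gap closed")
```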
&lt;P&gt;This means &lt;STRONG&gt;simulator behavior is encoded primarily in the prompt policy, not in the orchestration code.&lt;/STRONG&gt; The two harnesses are similar enough that, given the same prompt, they produce very similar conversations on every measured per-conversation metric. Where they diverge is in the default policy each one ships with — and that policy is editable.&lt;/P&gt;
&lt;P&gt;The practical consequence, with the caveat that we compared only two harnesses: within that pair, the choice of framework mattered far less than the prompt running on top of it. We would not over-generalize from two systems, but the direction is clear — most of the behavior we measured lived in the prompt policy, and that is the asset worth investing in.&lt;/P&gt;
&lt;H3&gt;A note on diversity&lt;/H3&gt;
&lt;P&gt;Across all four configurations, cohort diversity scored a flat&amp;nbsp;&lt;STRONG&gt;2/5&lt;/STRONG&gt;. This is a property of the scenario set, not the simulator: when the simulator is given the same 10 prose scenarios 10 times each, it produces conversations that cluster on names, openings, and emotional registers. Discriminating between simulators on diversity will require either much richer scenarios or a persona-seed mechanism that supplies named persona archetypes (for example, cooperative happy-path, frustrated, misinformed, adversarial, non-native speaker, or low-effort). That's on our roadmap; it's not a near-term fix you can drop into an existing simulator.&lt;/P&gt;
&lt;H2&gt;Methodological recommendations&lt;/H2&gt;
&lt;P&gt;A checklist for any team doing this kind of evaluation — on Foundry's simulator or a different one — based on the gaps we ran into and the choices that paid off:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Pick the philosophy first.&lt;/STRONG&gt;&amp;nbsp;Decide whether you're measuring a realistic foil or a helpful tester, write that decision down, and design your metrics around it. Mixing them inside one rubric produces incoherent scores.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Separate behavior from style.&lt;/STRONG&gt;&amp;nbsp;Use at least one metric per category. Concretely: in our baseline-versus-production comparison, the entire improvement landed on realism while every behavior metric stayed flat — proof that a single composite score would have averaged away the only signal that moved.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Add an explicit no-coaching metric&lt;/STRONG&gt;&amp;nbsp;(under Philosophy A). LLM judges tend to reward helpfulness; without an explicit penalty, coaching slips through and gets scored as conscientious behavior.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Run at n ≥ 100 per condition.&lt;/STRONG&gt;&amp;nbsp;Below that, CIs are too wide to call effects under ~0.5 reliably. We caught one apparent catastrophic failure at n=10 that turned out to be a sampling artifact at n=100.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Score the cohort, not just the conversation.&lt;/STRONG&gt;&amp;nbsp;A cohort-level diversity judge is what told us our diversity weakness was scenario-set-bound, not simulator-bound.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use scenarios, not scripts.&lt;/STRONG&gt;&amp;nbsp;Forcing the simulator to improvise specifics is what surfaces persona fidelity and realism. Scripted dialogues let a weak simulator pass by having nothing to invent.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Build a human-rated calibration set.&lt;/STRONG&gt; A few dozen conversations, hand-rated by at least two annotators, is a reasonable starting point for bounding judge bias with the Spearman correlation between judge and human scores, though the right size also depends on inter-annotator agreement and should be validated. We have not done human calibration for this study; it is an important methodological gap to close in future work.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Compare against at least one external baseline.&lt;/STRONG&gt;&amp;nbsp;A score on a custom rubric only means something relative to another simulator scored on the same rubric. "4.7 out of 5" reads as excellent in isolation, but if a baseline simulator also scores 4.7, you've learned nothing about your simulator — only that the rubric is generous. External baselines defend you against your own optimism.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;When you compare, isolate the variable.&lt;/STRONG&gt;&amp;nbsp;A simulator's output depends on multiple components — at minimum the prompt and the surrounding harness, often also the judge configuration and the model. A head-to-head between two simulators changes all of them at once, so the result is uninterpretable: you can't tell which component drove the gap. Run at least one swap experiment that holds everything constant except the variable you care about — port your prompt into the other harness, or theirs into yours, and re-score. Without a controlled swap, framework-vs-framework numbers are a confounded measurement, not a result.&lt;/LI&gt;
&lt;/OL&gt;
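&lt;P&gt;Once a human-rated calibration set exists, the check itself is small. The NumPy sketch below computes Spearman's rho as the Pearson correlation of average ranks, which handles the heavy ties a 1–5 scale produces. Comparing judge-versus-human rho with human-versus-human rho shows whether the judge agrees with annotators about as well as they agree with each other. The variable names are illustrative and the ratings are hypothetical.&lt;/P&gt;

```python
import numpy as np


def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors (tie-aware)."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


# Hypothetical 1-5 realism ratings for the same eight conversations.
human_a = [4, 3, 5, 2, 4, 3, 5, 1]
human_b = [4, 2, 5, 2, 3, 3, 4, 1]
judge = [5, 3, 5, 3, 4, 4, 5, 2]

human_mean = (np.array(human_a) + np.array(human_b)) / 2
print("human vs human rho:", round(spearman(human_a, human_b), 3))
print("judge vs mean human rho:", round(spearman(judge, human_mean), 3))
```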
&lt;H2&gt;Where this kind of measurement has limits&lt;/H2&gt;
&lt;P&gt;User simulators are not the right instrument for every evaluation. Three honest caveats:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;A deterministic oracle beats a simulated conversation, when one exists.&lt;/STRONG&gt; If you can assert programmatically that a refund record was created, that check is tighter and cheaper than judging a whole conversation, so reserve user simulation for what no oracle can see: tone, persistence, and recovery from confusion.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;You cannot validate invented facts against a real system.&lt;/STRONG&gt; Because the user simulator will make up specifics such as booking codes, account numbers, and dates, nothing it asserts can be cross-checked against ground-truth records, so simulation cannot test correctness that depends on real identifiers. This is structural, not a defect of the simulator.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Adversarial behaviors are a different design.&lt;/STRONG&gt;&amp;nbsp;A simulator built to stay in character is, by construction, not built to actively try to break the agent. Red-teaming belongs in a separate evaluation track with its own metrics.&lt;/LI&gt;
&lt;/UL&gt;
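&lt;P&gt;To make the first caveat concrete: a deterministic oracle is just an assertion on backend state after the conversation. The sketch below assumes a hypothetical test backend that exposes refund records as plain objects, and a scenario seeded with a known booking code, because codes the simulator invents cannot be checked. The record fields and the function name are illustrative.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundRecord:
    """Hypothetical row in a test backend's refunds table."""
    booking_code: str
    amount_cents: int
    status: str


def refund_was_issued(records, booking_code, expected_cents):
    """Pass only if exactly one issued refund exists for the seeded booking, for the expected amount."""
    matches = [r for r in records if r.booking_code == booking_code and r.status == "issued"]
    return len(matches) == 1 and matches[0].amount_cents == expected_cents


# State a hypothetical test backend might hold after one simulated conversation.
records = [
    RefundRecord("SEED42", 18900, "issued"),
    RefundRecord("OTHER1", 5000, "pending"),
]
print(refund_was_issued(records, "SEED42", 18900))
```

&lt;P&gt;Storing amounts as integer cents keeps the comparison exact, and requiring exactly one match also catches duplicate refunds.&lt;/P&gt;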
&lt;P&gt;Within those limits, the methodology does what we need: it quantifies realism, coherence, on-topic behavior, and persona fidelity at a scale hand-rating cannot reach.&lt;/P&gt;
&lt;H2&gt;Final thoughts&lt;/H2&gt;
&lt;P&gt;Three takeaways are worth carrying out of this work, whether or not you ever look at the Foundry user simulator specifically.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Evaluation tools can be wrong too.&lt;/STRONG&gt; Evaluation is recursive: any tool you use to score your agent is itself a system that can be wrong. A simulator that sounds polished can flatter a mediocre agent; a simulator that coaches when it shouldn't can hide real regressions.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Behavior and style are separate axes.&lt;/STRONG&gt;&amp;nbsp;Style failures (the polished-ghostwriter problem) and behavior failures (the coaching problem) move independently and call for different fixes. A single composite score erases that signal.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The prompt did most of the work.&lt;/STRONG&gt; Swapping the prompt between two harnesses closed 96–99% of the measured gap on every per-conversation metric in our evaluation. A fair comparison between simulator frameworks has to control for the prompt; otherwise you're measuring prompts, not frameworks. We saw this across the two harnesses we compared; a harness with more simulation-side logic could move outcomes more, so treat it as a strong signal, not a universal law.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;User simulators are how teams ship conversational agents without waiting for real users. Measuring them with the same rigor as the agents they're meant to test is what keeps the end-to-end evaluation honest.&lt;/P&gt;
&lt;H2&gt;Get started&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Start building in Microsoft Foundry:&lt;/STRONG&gt; &lt;A href="https://ai.azure.com/" target="_blank" rel="noopener"&gt;ai.azure.com&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Watch the BRK252 session:&lt;/STRONG&gt; &lt;A href="https://aka.ms/build26-BRK252" target="_blank" rel="noopener"&gt;aka.ms/build26-BRK252&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Build 2026 Observability team blog post:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://devblogs.microsoft.com/foundry/build-2026-from-observability-to-roi-for-ai-agents-on-any-framework/" target="_blank" rel="noopener"&gt;From observability to ROI for AI agents on any framework&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Read the docs: &lt;/STRONG&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/foundry/how-to/develop/cloud-evaluation?tabs=python#conversation-simulation" target="_blank" rel="noopener"&gt;Microsoft Foundry user simulation documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Join the community:&lt;/STRONG&gt; &lt;A href="https://aka.ms/ai/discord" target="_blank" rel="noopener"&gt;aka.ms/ai/discord&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 19 Jun 2026 13:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/how-to-score-a-user-simulator-introducing-usr-8/ba-p/4523642</guid>
      <dc:creator>jcasantos</dc:creator>
      <dc:date>2026-06-19T13:00:00Z</dc:date>
    </item>
    <item>
      <title>Build an Automated SLA Risk Agent with Routines in Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/build-an-automated-sla-risk-agent-with-routines-in-microsoft/ba-p/4528103</link>
      <description>&lt;P&gt;&lt;STRONG&gt;What if your support team could start every morning with an AI-generated summary of tickets most likely to breach SLA? &lt;/STRONG&gt;&lt;SPAN data-olk-copy-source="MessageBody"&gt;This tutorial shows you how to build exactly that:&lt;/SPAN&gt; a grounded agent in Microsoft Foundry that automatically analyzes your ticket data and surfaces risks. Using Routines in Microsoft Foundry, we run the agent on a recurring schedule so risks are identified early and addressed before they escalate.&lt;/P&gt;
&lt;P&gt;You'll learn how to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Connect Azure Blob Storage data to Azure AI Search for RAG&lt;/LI&gt;
&lt;LI&gt;Create a Foundry IQ knowledge base from your search index&lt;/LI&gt;
&lt;LI&gt;Build a grounded agent with SLA triage instructions&lt;/LI&gt;
&lt;LI&gt;Schedule daily automated runs using Routines&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Time to complete:&lt;/STRONG&gt;&amp;nbsp;~45 minutes |&amp;nbsp;&lt;STRONG&gt;Prerequisites:&lt;/STRONG&gt; Azure subscription, Microsoft Foundry access&lt;/P&gt;
&lt;H1&gt;Prepare the dataset in Blob Storage&lt;/H1&gt;
&lt;P&gt;Before the agent can summarize SLA risk, it needs a trusted place to retrieve ticket context from. In this example, we used a &lt;A class="lia-external-url" href="https://huggingface.co/datasets/ameau01/synthetic-it-support-tickets" target="_blank" rel="noopener"&gt;sample dataset from Hugging Face&lt;/A&gt;. A subset of this data is uploaded to a private Azure Blob Storage container and later indexed by Azure AI Search.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Instructions: &lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Open the &lt;STRONG&gt;Azure portal&lt;/STRONG&gt; and go to &lt;STRONG&gt;Storage accounts&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Create or open the storage account that you want to use.&lt;/LI&gt;
&lt;LI&gt;In the storage account left navigation, select &lt;STRONG&gt;Data storage → Containers&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Create a new container for your data. This is where the sample dataset will be stored. Open the newly created container.&lt;img /&gt;
&lt;BLOCKQUOTE&gt;&lt;STRONG&gt;&lt;EM&gt;Permission note:&lt;/EM&gt;&lt;/STRONG&gt;&lt;EM&gt; if the portal cannot list or upload blobs with the signed-in Entra account, go to the storage account &lt;STRONG&gt;Access Control (IAM)&lt;/STRONG&gt; and assign &lt;STRONG&gt;Storage Blob Data Contributor&lt;/STRONG&gt; to the signed-in user. The later Search indexing step also requires the Search service managed identity to have &lt;STRONG&gt;Storage Blob Data Reader&lt;/STRONG&gt; on the storage account.&lt;/EM&gt;&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI&gt;In the container toolbar, select&amp;nbsp;&lt;STRONG style="color: rgb(30, 30, 30);"&gt;Upload&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;In the &lt;STRONG style="color: rgb(30, 30, 30);"&gt;Upload blob&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt; pane, select &lt;/SPAN&gt;&lt;STRONG style="color: rgb(30, 30, 30);"&gt;Browse for files&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;Select the files you want to upload, then select &lt;STRONG&gt;Upload&lt;/STRONG&gt;.&lt;img /&gt;&lt;/LI&gt;
&lt;/OL&gt;
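&lt;P&gt;If you prefer to script the upload instead of using the portal, the following Python sketch uses the azure-identity and azure-storage-blob packages to upload every matching file in a local folder with the signed-in Entra identity, so the same Storage Blob Data Contributor assignment from the permission note applies. The account URL, container name, local folder, and the *.json file pattern are placeholders; adjust the pattern to whatever format you saved the sample data in.&lt;/P&gt;

```python
from pathlib import Path


def plan_uploads(local_dir: str, pattern: str = "*.json") -> list[tuple[str, str]]:
    """Map each matching local file to its blob name (path relative to local_dir, forward slashes)."""
    root = Path(local_dir)
    return [
        (str(path), path.relative_to(root).as_posix())
        for path in sorted(root.rglob(pattern))
        if path.is_file()
    ]


def upload_files(account_url: str, container: str, local_dir: str, pattern: str = "*.json") -> int:
    """Upload files as the signed-in Entra identity (needs Storage Blob Data Contributor)."""
    # Imported lazily so plan_uploads can be tried without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(account_url=account_url, credential=DefaultAzureCredential())
    container_client = service.get_container_client(container)
    uploaded = 0
    for local_path, blob_name in plan_uploads(local_dir, pattern):
        with open(local_path, "rb") as data:
            # overwrite=True lets you rerun the script after editing the sample files.
            container_client.upload_blob(name=blob_name, data=data, overwrite=True)
        uploaded += 1
    return uploaded
```

&lt;P&gt;For example, upload_files("https://mystorageaccount.blob.core.windows.net", "sla-tickets", "./tickets") would upload ./tickets/a.json as the blob a.json; all three arguments are placeholders for your own account, container, and folder.&lt;/P&gt;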
&lt;H1&gt;Create the Azure AI Search vector index&lt;/H1&gt;
&lt;P&gt;Now that the sample data is available in Blob Storage, the next step is to make it searchable for grounding. Azure AI Search reads the content from the blob container, chunks and vectorizes the text, and creates the index that Foundry IQ will use later as the knowledge source for the agent.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Instructions: &lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Go to &lt;STRONG&gt;Azure AI Search&lt;/STRONG&gt;. Create an Azure AI Search service or select the one you want to use, then select &lt;STRONG&gt;Import data&lt;/STRONG&gt;.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;On &lt;STRONG&gt;Choose a data source&lt;/STRONG&gt;, select &lt;STRONG&gt;Azure Blob Storage&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;On &lt;STRONG&gt;What scenario are you targeting?&lt;/STRONG&gt;, select &lt;STRONG&gt;RAG&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Configure the &lt;STRONG&gt;blob source&lt;/STRONG&gt;. Fill in your subscription, storage account and blob container. Leave &lt;STRONG&gt;parsing mode&lt;/STRONG&gt; on &lt;STRONG&gt;default&lt;/STRONG&gt; and &lt;STRONG&gt;authentication using managed identity&lt;/STRONG&gt; on &lt;STRONG&gt;enabled&lt;/STRONG&gt;.&amp;nbsp;Select &lt;STRONG&gt;Next&lt;/STRONG&gt;.&lt;img /&gt;
&lt;BLOCKQUOTE&gt;&lt;STRONG&gt;&lt;EM&gt;Permission note:&lt;/EM&gt;&lt;/STRONG&gt;&lt;EM&gt; if the import wizard cannot read from the container, make sure the Search service managed identity has &lt;STRONG&gt;Storage Blob Data Reader&lt;/STRONG&gt; on the storage account. If needed, also assign &lt;STRONG&gt;Reader&lt;/STRONG&gt; so the service can resolve the storage resource during import.&lt;/EM&gt;&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI&gt;On &lt;STRONG&gt;Vectorize your text&lt;/STRONG&gt;, in the&amp;nbsp;&lt;STRONG&gt;Kind&lt;/STRONG&gt;&amp;nbsp;dropdown, select&amp;nbsp;&lt;STRONG&gt;Microsoft Foundry&lt;/STRONG&gt; (this replaces the default Azure OpenAI option with Foundry's integrated embedding capabilities). Select your Foundry project and embedding model, and set the authentication type to &lt;STRONG&gt;System assigned identity&lt;/STRONG&gt;. Check the acknowledgement box for additional costs, then select &lt;STRONG&gt;Next&lt;/STRONG&gt;.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;On&amp;nbsp;&lt;STRONG&gt;Vectorize your images&lt;/STRONG&gt;, leave image vectorization and image text extraction disabled. The sample data is text-first, so text vectorization is enough for this scenario. Select &lt;STRONG&gt;Next&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;On&amp;nbsp;&lt;STRONG&gt;Advanced settings&lt;/STRONG&gt;, keep the defaults and select &lt;STRONG&gt;Next&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;On&amp;nbsp;&lt;STRONG&gt;Review and create&lt;/STRONG&gt;, keep the defaults and select &lt;STRONG&gt;Create&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;When the import completes, Azure AI Search creates the index, indexer, and skillset.&lt;/P&gt;
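&lt;P&gt;Before wiring the index into Foundry IQ, it can help to confirm that the indexer actually produced searchable chunks. The Python sketch below uses the azure-search-documents SearchClient with an API key, matching the key-based connection used later. It assumes the wizard-created index stores chunk text in a field named chunk; check your index schema and change text_field if yours differs. The endpoint, index name, and query are placeholders.&lt;/P&gt;

```python
def top_chunks(results, text_field: str = "chunk", limit: int = 3) -> list[str]:
    """Collect up to `limit` non-empty text values from dict-like search results."""
    chunks = []
    for doc in results:
        if len(chunks) >= limit:
            break
        text = doc.get(text_field)
        if isinstance(text, str) and text.strip():
            chunks.append(text.strip())
    return chunks


def smoke_test_index(endpoint: str, index_name: str, api_key: str, query: str) -> list[str]:
    """Run a plain keyword query against the index and return a few chunk texts."""
    # Imported lazily so top_chunks can be tried without the Azure SDK installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(endpoint=endpoint, index_name=index_name, credential=AzureKeyCredential(api_key))
    # A keyword query is enough to prove documents were indexed; no vector query is needed.
    return top_chunks(client.search(search_text=query, top=5))
```

&lt;P&gt;An empty result usually means the indexer has not finished yet or the text field has a different name in your index.&lt;/P&gt;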
&lt;H1&gt;Connect Azure AI Search to Foundry IQ and create the Foundry IQ knowledge base&lt;/H1&gt;
&lt;P&gt;Next, connect Azure AI Search to Foundry IQ so the agent can do agentic RAG across data sources, then create a Foundry IQ knowledge base from the search index.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Go to&amp;nbsp;&lt;STRONG&gt;Microsoft Foundry&lt;/STRONG&gt;&amp;nbsp;at&amp;nbsp;&lt;A href="https://ai.azure.com" target="_blank" rel="noopener"&gt;https://ai.azure.com&lt;/A&gt; and select your project.&lt;/LI&gt;
&lt;LI&gt;In the new Microsoft Foundry experience, click&amp;nbsp;&lt;STRONG&gt;Build&lt;/STRONG&gt; in the top right.&lt;/LI&gt;
&lt;LI&gt;In the left navigation under Build, click&amp;nbsp;&lt;STRONG&gt;Knowledge&lt;/STRONG&gt;.&amp;nbsp;&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;In the&amp;nbsp;&lt;STRONG&gt;Foundry IQ resource&lt;/STRONG&gt; field, click the dropdown/open button.&lt;/LI&gt;
&lt;LI&gt;Select the Azure AI Search service you created.&lt;/LI&gt;
&lt;LI&gt;Leave&amp;nbsp;&lt;STRONG&gt;Auth Type&lt;/STRONG&gt; as API Key.&lt;/LI&gt;
&lt;LI&gt;Click&amp;nbsp;&lt;STRONG&gt;Connect&lt;/STRONG&gt;. Foundry IQ is now connected to the Azure AI Search resource and ready to create a knowledge base from one of its indexes.&lt;/LI&gt;
&lt;LI&gt;Click&amp;nbsp;&lt;STRONG&gt;Create a knowledge base&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;In&amp;nbsp;&lt;STRONG&gt;Choose a knowledge type&lt;/STRONG&gt;, select&amp;nbsp;&lt;STRONG&gt;Azure AI Search Index&lt;/STRONG&gt;.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;Click&amp;nbsp;&lt;STRONG&gt;Connect&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;In the knowledge-base wizard, enter the knowledge source details. Open the &lt;STRONG&gt;Search index&lt;/STRONG&gt;&amp;nbsp;dropdown and select the search index you created.&lt;/LI&gt;
&lt;LI&gt;Click &lt;STRONG&gt;Create&lt;/STRONG&gt;&amp;nbsp;/&amp;nbsp;&lt;STRONG&gt;Next&lt;/STRONG&gt;&amp;nbsp;to create the knowledge source.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;After the knowledge source is created, select the chat completions model for the knowledge base. Open the&amp;nbsp;&lt;STRONG&gt;Model&lt;/STRONG&gt;&amp;nbsp;dropdown, select the model you want, and click&amp;nbsp;&lt;STRONG&gt;Save&lt;/STRONG&gt;&amp;nbsp;/&amp;nbsp;&lt;STRONG&gt;Save knowledge base&lt;/STRONG&gt;.&lt;img /&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Foundry IQ now has a knowledge base that wraps the Azure AI Search index and can be connected to a Foundry agent.&lt;/P&gt;
&lt;H1&gt;Create the grounded SLA risk agent&lt;/H1&gt;
&lt;P&gt;Create a prompt agent and instruct it to use the connected Foundry IQ knowledge base for SLA risk analysis.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;In the left navigation, click&amp;nbsp;&lt;STRONG&gt;Agents&lt;/STRONG&gt;. On the&amp;nbsp;&lt;STRONG&gt;Agents&lt;/STRONG&gt;&amp;nbsp;page, click&amp;nbsp;&lt;STRONG&gt;New agent&lt;/STRONG&gt; and then&amp;nbsp;&lt;STRONG&gt;Build an agent&lt;/STRONG&gt;.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;In the agent name field, enter a name for your agent.&lt;/LI&gt;
&lt;LI&gt;In the agent setup, select a model/deployment.&lt;/LI&gt;
&lt;LI&gt;In the&amp;nbsp;&lt;STRONG&gt;Instructions&lt;/STRONG&gt; field, enter instructions such as the following:
&lt;P&gt;&lt;EM&gt;You are an SLA risk triage agent for IT support operations.&lt;BR /&gt;&lt;/EM&gt;&lt;EM&gt;Use the connected Foundry IQ knowledge base / Azure AI Search knowledge source to answer all SLA ticket questions.&lt;BR /&gt;&lt;/EM&gt;&lt;EM&gt;Do not answer SLA ticket questions from memory. Use the connected knowledge source whenever ticket details, breach risk, categories, queues, priorities, deadlines, or summaries are requested.&lt;BR /&gt;&lt;/EM&gt;&lt;EM&gt;Identify the tickets at highest risk of SLA breach. Prioritize tickets with:&lt;BR /&gt;- Critical or High priority&lt;BR /&gt;- Open or In progress status&lt;BR /&gt;- Low hours_until_breach values&lt;BR /&gt;- Clear business impact in the ticket summary&lt;BR /&gt;&lt;/EM&gt;&lt;EM&gt;For each high-risk ticket, include:&lt;BR /&gt;- Ticket ID&lt;BR /&gt;- Priority&lt;BR /&gt;- Status&lt;BR /&gt;- Category&lt;BR /&gt;- Support queue&lt;BR /&gt;- Hours until breach, if available&lt;BR /&gt;- Why it is risky&lt;BR /&gt;- Recommended next action&lt;BR /&gt;&lt;/EM&gt;&lt;EM&gt;If the answer is not available in the connected knowledge base, say: “I don’t know based on the available SLA ticket knowledge base.”&lt;BR /&gt;&lt;/EM&gt;&lt;EM&gt;Cite sources where available.&lt;/EM&gt;&lt;/P&gt;
&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;Click into the &lt;STRONG&gt;Knowledge&lt;/STRONG&gt;&amp;nbsp;section of the agent. Click&amp;nbsp;&lt;STRONG&gt;Add&lt;/STRONG&gt;. Choose the option to connect&amp;nbsp;&lt;STRONG&gt;Foundry IQ&lt;/STRONG&gt; knowledge. In the Foundry IQ connection dialog, select the knowledge base we created. Click &lt;STRONG&gt;Connect&lt;/STRONG&gt;.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;Click &lt;STRONG&gt;Save&lt;/STRONG&gt;. The agent now exists in Foundry, has SLA triage instructions, and is grounded on the Foundry IQ knowledge base.&lt;/LI&gt;
&lt;LI&gt;Test the agent with a prompt such as: &lt;EM&gt;Using the connected SLA ticket knowledge base, summarize the tickets at highest risk of SLA breach and recommend next actions.&lt;/EM&gt;&lt;img /&gt;&lt;img /&gt;&lt;/LI&gt;
&lt;/OL&gt;
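&lt;P&gt;The triage rules in these instructions are applied by the model, so a deterministic reference ranking can be handy for spot-checking the agent's answer. The Python sketch below is one reasonable reading of the rules, not the agent's actual logic: the Ticket shape and IDs are hypothetical, it keeps only Critical or High tickets with Open or In progress status, and it orders them by hours_until_breach (unknown deadlines last), breaking ties by priority. The "clear business impact" criterion needs the ticket text, so it is left to the agent.&lt;/P&gt;

```python
from dataclasses import dataclass
from typing import Optional

# Priority labels and statuses taken from the agent instructions; lower rank = more urgent.
URGENT_PRIORITY_RANK = {"Critical": 0, "High": 1}
ACTIVE_STATUSES = {"Open", "In progress"}


@dataclass
class Ticket:
    """Hypothetical ticket shape; adjust field names to match your data."""
    ticket_id: str
    priority: str
    status: str
    hours_until_breach: Optional[float] = None


def rank_sla_risk(tickets: list[Ticket], top_n: int = 5) -> list[Ticket]:
    """Return active Critical/High tickets, soonest breach first (unknown deadlines last)."""
    candidates = [
        t for t in tickets
        if t.priority in URGENT_PRIORITY_RANK and t.status in ACTIVE_STATUSES
    ]

    def risk_key(t: Ticket) -> tuple[bool, float, int]:
        unknown = t.hours_until_breach is None
        hours = float("inf") if unknown else t.hours_until_breach
        # Known deadlines first, then fewest hours left, then higher priority.
        return (unknown, hours, URGENT_PRIORITY_RANK[t.priority])

    return sorted(candidates, key=risk_key)[:top_n]


tickets = [
    Ticket("T-101", "High", "Open", 12.0),
    Ticket("T-102", "Critical", "In progress", 2.0),
    Ticket("T-103", "Critical", "Closed", 1.0),
    Ticket("T-104", "Low", "Open", 0.5),
    Ticket("T-105", "Critical", "Open"),
    Ticket("T-106", "Critical", "Open", 12.0),
]
print([t.ticket_id for t in rank_sla_risk(tickets)])
```

&lt;P&gt;Closed and low-priority tickets drop out even when their deadlines are close, which mirrors the priority and status rules in the instructions.&lt;/P&gt;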
&lt;H1&gt;Create the daily Routine&lt;/H1&gt;
&lt;OL&gt;
&lt;LI&gt;In the left navigation, click&amp;nbsp;&lt;STRONG&gt;Agents&lt;/STRONG&gt; again and open the agent you created.&lt;/LI&gt;
&lt;LI&gt;In the top right, click&amp;nbsp;&lt;STRONG&gt;Publish&lt;/STRONG&gt; and then&amp;nbsp;&lt;STRONG&gt;Create routine&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;In the routine setup form, enter a routine name and choose a suitable schedule and run time.&lt;/LI&gt;
&lt;LI&gt;In the routine prompt/instructions field, enter the user prompt the agent should be triggered with, for example: &lt;EM&gt;Run the daily SLA risk summary from the connected SLA ticket knowledge base.&lt;/EM&gt; Click&amp;nbsp;&lt;STRONG&gt;Create &amp;amp; start&lt;/STRONG&gt;.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;Wait for the routine creation to complete. Open the routine detail/run area.&lt;/LI&gt;
&lt;LI&gt;Click ‘test run’ to trigger a manual test run from the portal.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;Confirm the routine dispatched successfully. Wait for the run history/status to update to ‘Completed’.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;Open the completed run entry / trace result by clicking on the response name.&lt;img /&gt;
&lt;/LI&gt;
&lt;LI&gt;Confirm that the output came from the grounded agent connected to the SLA ticket knowledge base.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The routine is now active, scheduled to run daily at&amp;nbsp;&lt;STRONG&gt;9:00 AM&lt;/STRONG&gt;, and has completed one successful manual test run.&lt;/P&gt;
&lt;H1&gt;Conclusion&lt;/H1&gt;
&lt;P data-olk-copy-source="MessageBody"&gt;You've built a production-ready SLA risk agent that runs automatically every morning, grounded on your actual ticket data. The same pattern works for any recurring analysis task e.g. inventory alerts, compliance checks, or customer health scores.&lt;/P&gt;
&lt;H2&gt;Next Steps&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Ready to go deeper?&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;📚&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/what-is-foundry-iq" target="_blank" rel="noopener" data-auth="NotApplicable" data-linkindex="3" data-ogsc=""&gt;Foundry IQ documentation&lt;/A&gt;&amp;nbsp;– Learn about advanced knowledge base configurations&lt;/LI&gt;
&lt;LI&gt;⏰&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/routines" target="_blank" rel="noopener"&gt;Routines in Foundry Agent Service documentation&lt;/A&gt; – Learn more about Routines&lt;/LI&gt;
&lt;LI&gt;🔧&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/search/search-get-started-portal" target="_blank" rel="noopener" data-auth="NotApplicable" data-linkindex="4" data-ogsc=""&gt;Azure AI Search tutorials&lt;/A&gt; – Optimize your search index for better grounding&lt;/LI&gt;
&lt;LI&gt;💬&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-category" href="https://techcommunity.microsoft.com/category/azure-ai-foundry" data-auth="NotApplicable" data-linkindex="5" data-ogsc="" data-lia-auto-title="Microsoft Foundry community" data-lia-auto-title-active="0" target="_blank"&gt;Microsoft Foundry community&lt;/A&gt; – Share your implementation and get help&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Want to extend this solution?&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Add Microsoft Teams notifications when high-risk tickets are detected&lt;/LI&gt;
&lt;LI&gt;Connect multiple data sources to your Foundry IQ knowledge base&lt;/LI&gt;
&lt;LI&gt;Build a dashboard to track SLA trends over time&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Have questions or built something cool with &lt;SPAN data-olk-copy-source="MessageBody"&gt;Microsoft Foundry Routines&lt;/SPAN&gt;? Drop a comment below!&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2026 12:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/build-an-automated-sla-risk-agent-with-routines-in-microsoft/ba-p/4528103</guid>
      <dc:creator>LauraVerghote</dc:creator>
      <dc:date>2026-06-17T12:00:00Z</dc:date>
    </item>
    <item>
      <title>A Guided Tour of the New Microsoft Foundry Labs</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/a-guided-tour-of-the-new-microsoft-foundry-labs/ba-p/4527908</link>
      <description>&lt;P&gt;There's a moment every developer chases — the first time you take something genuinely at the edge of what's possible and watch it run on &lt;EM&gt;your&lt;/EM&gt; screen, against &lt;EM&gt;your&lt;/EM&gt; problem. &lt;A href="https://labs.ai.azure.com/" target="_blank"&gt;Microsoft Foundry Labs&lt;/A&gt; exists to give you that moment on repeat.&lt;/P&gt;
&lt;P&gt;This is the place where Microsoft's most ambitious AI research stops being a headline and becomes something you can click, run, fork, and ship. Frontier models for computer use, biomolecular structure prediction, real-time 3D generation, multilingual speech — they're not locked behind a paper or a waitlist. They're live, they're interactive, and a lot of them are one button away from your next project. Let's walk through what's waiting for you.&lt;/P&gt;
&lt;H2&gt;The mission: closing the gap between research and reality&lt;/H2&gt;
&lt;P&gt;Foundry Labs starts from a conviction stated plainly on its &lt;A href="https://labs.ai.azure.com/about" target="_blank"&gt;About page&lt;/A&gt;: AI advances fast, and access to it should too. Historically, the distance between a breakthrough research paper and real-world impact has been far too wide — months or years of reimplementation between the idea and anyone actually using it.&lt;/P&gt;
&lt;P&gt;Labs is built to close that gap. It takes experiments straight out of Microsoft's labs and puts them directly in the hands of builders and researchers, framed around three moves: &lt;STRONG&gt;discover&lt;/STRONG&gt; what's at the frontier, &lt;STRONG&gt;experiment&lt;/STRONG&gt; with it hands-on, and &lt;STRONG&gt;connect&lt;/STRONG&gt; with the people pushing it forward.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;And the work isn't scattered randomly — it concentrates in six domains where AI is having outsized real-world impact:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Pick the domain that maps to your work, and Labs becomes a map of the frontier in &lt;EM&gt;your&lt;/EM&gt; field.&lt;/P&gt;
&lt;H2&gt;Featured and fresh: the homepage&lt;/H2&gt;
&lt;P&gt;The homepage opens with &lt;STRONG&gt;"Breakthrough AI you can try today"&lt;/STRONG&gt; — a rotating showcase of the experiments worth knowing about right now, each with a &lt;EM&gt;Try it now&lt;/EM&gt; button straight into Microsoft Foundry and a &lt;EM&gt;Learn more&lt;/EM&gt; link for the deep dive. A quick look at the marquee:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;MAI-Image-2.5&lt;/STRONG&gt; — Microsoft AI's image-generation model with image-to-image editing and "control with preservation," a recent No. 2 debut for editing capabilities among image-model families on Arena.ai.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;MAI-Thinking-1&lt;/STRONG&gt; — Microsoft AI's first large language model, tuned for strong reasoning and math at a fraction of frontier-scale cost.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;MagenticLite&lt;/STRONG&gt; — an open-source agentic app for small models and the successor to Magentic-UI, pairing the MagenticBrain orchestrator with Fara 1.5 to run autonomously and transparently on your own hardware.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Fara 1.5&lt;/STRONG&gt; — Microsoft's agentic small language models for computer use, perceiving the screen from screenshots and predicting clicks directly.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;EO/OS Object Detection&lt;/STRONG&gt; — a first-party Earth Observation model that detects and localizes objects in satellite and aerial imagery at petabyte scale, anchoring the new GeoAI category alongside the Planetary Computer team.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Scroll on and the &lt;STRONG&gt;"Just added to Labs"&lt;/STRONG&gt; strip keeps a finger on the pulse, surfacing the newest arrivals — recently VibeVoice ASR, MAI-Image-2-Efficient, Magentic Marketplace, and BugPilot. It's the fastest way to see what shipped this week without going hunting.&lt;/P&gt;
&lt;H2&gt;Going deeper: the Innovations catalog&lt;/H2&gt;
&lt;P&gt;The homepage is the highlight reel; the &lt;A href="https://labs.ai.azure.com/innovations" target="_blank"&gt;Innovations&lt;/A&gt; tab is the whole library — &lt;STRONG&gt;50+ experiments&lt;/STRONG&gt; and growing, built from frontier research and available to you today. The images across the experience are generated using Microsoft’s in-house MAI image models.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;This is where the experience really opens up, because you're not stuck scrolling. You can &lt;STRONG&gt;filter and sort&lt;/STRONG&gt; the entire catalog to cut straight to what matters to you — narrow by the six domains, by what's newest, or by the kind of artifact you're after. Hunting for a vision model in Creative &amp;amp; Generative Media? A production-ready framework in Code &amp;amp; Software Engineering? A few clicks and the list is exactly your shortlist. Each result opens onto its own page, where the experiment goes from a name to something you can understand and try.&lt;/P&gt;
&lt;H2&gt;A closer look: TRELLIS&lt;/H2&gt;
&lt;P&gt;To see what one of those pages actually delivers, open &lt;A class="lia-external-url" href="https://labs.ai.azure.com/innovations/trellis/" target="_blank"&gt;TRELLIS&lt;/A&gt;. It's a great one to start with, because it doesn't just describe the model; it lets you&amp;nbsp;&lt;EM&gt;use it on the spot.&lt;/EM&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;An interactive playground, right on the page.&lt;/STRONG&gt; TRELLIS turns a single image (or a text prompt) into a fully textured 3D asset, and the page gives you a live playground to do exactly that. Drop in an image, tune the parameters — latent and sparse-structure CFG scale, sampling steps, seed — hit &lt;STRONG&gt;Generate 3D Model&lt;/STRONG&gt;, and preview the result. When you like it, &lt;STRONG&gt;export a GLB&lt;/STRONG&gt; and pull it straight into your pipeline. No setup, no notebook, no GPU of your own.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The substance behind the demo.&lt;/STRONG&gt; Below the playground, the page lays out what makes TRELLIS notable. It's built on a novel &lt;STRONG&gt;Structured LATent (SLat)&lt;/STRONG&gt; representation with rectified-flow transformers scaled to 2 billion parameters, trained on 500,000 diverse 3D objects. Its trick is decoupling the latent from the decoder, so a single model can output &lt;STRONG&gt;meshes, radiance fields, and 3D Gaussians&lt;/STRONG&gt; — pick the format your pipeline needs without retraining. It can go from one image to a textured mesh in under 10 seconds on a single A100, supports local editing that changes specific 3D regions while preserving the rest, and was adopted by NVIDIA AI Blueprints in September 2025. (The usage counter on the page — north of 2.4 million users — tells you it's resonating.)&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Everything you need to go further.&lt;/STRONG&gt; Clear tags label it (Creative &amp;amp; Generative Media · Model · Vision · Experimental), the technology stack is spelled out (PyTorch, diffusion models, NeRF, 3D Gaussians, CUDA), and a "Ready to Explore?" section links you to the &lt;A href="https://github.com/microsoft/TRELLIS" target="_blank"&gt;GitHub repo&lt;/A&gt;, the &lt;A href="https://arxiv.org/abs/2412.01506" target="_blank"&gt;research paper&lt;/A&gt;, and the project blog. That's the full arc on one page: try it, understand it, then take the code.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;Once you've experimented in the playground and you're ready to go from prototype to production, head to &lt;A href="https://ai.azure.com/home" target="_blank"&gt;Microsoft Foundry&lt;/A&gt; to deploy the model — and turn what you just tried into part of your own application.&lt;/DIV&gt;
&lt;P&gt;Every experiment page follows this same shape — so once you've explored TRELLIS, you know how to read any of them.&lt;/P&gt;
&lt;H2&gt;Proof it works: Customer Stories&lt;/H2&gt;
&lt;P&gt;Trying a model is exciting; betting a product on it takes evidence. That's what the &lt;A href="https://labs.ai.azure.com/stories" target="_blank"&gt;Stories&lt;/A&gt; page delivers — accounts from teams who took Labs experiments into production and came back with hard numbers.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Space Intelligence — A UK-based organization mapping the world’s forests. By combining Microsoft Foundry with the Microsoft Planetary Computer, they increased data production 100× and reduced forest mapping timelines from six months to six weeks (a 75% reduction), covering 3 billion hectares across 50+ countries in a single year.&lt;/LI&gt;
&lt;LI&gt;Sight Machine — Achieved a 10% lift in manufacturing productivity using Foundry-powered capabilities.&lt;/LI&gt;
&lt;LI&gt;Commerzbank — Scaled to 30,000 customer conversations per month on Foundry Agent Service.&lt;/LI&gt;
&lt;LI&gt;MediaTek — Delivered 50% faster on-device AI performance using Phi models.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If you're building a case for adoption, these are concrete signals of impact. And if you've shipped something with Labs, you can submit your own story to be featured.&lt;/P&gt;
&lt;H2&gt;Better together: Community&lt;/H2&gt;
&lt;P&gt;Great tools build communities, and the &lt;A href="https://labs.ai.azure.com/community" target="_blank"&gt;Community&lt;/A&gt; page is where 25,000+ developers and researchers gather under a simple banner: &lt;EM&gt;build together, think further.&lt;/EM&gt; It brings three things into one place:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;An events calendar&lt;/STRONG&gt; — where the Foundry team and community will be, from Microsoft Build and Ignite to the KubeCon circuit, GitHub Universe, and NeurIPS, each with a registration link so you can plan to meet up in person.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The latest from the labs&lt;/STRONG&gt; — a feed of blogs and publications straight from the teams behind the experiments, so you follow the research as it unfolds.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The conversation&lt;/STRONG&gt; — direct lines into Discord, Reddit, the Microsoft Research Blog, and Tech Community, where you can ask questions, swap ideas, and watch your feedback help steer what comes next.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Foundry Labs is the shortest path from Microsoft's most ambitious research to something running in front of you. The frontier is open, the experiments are live, and the community is already building. &lt;A href="https://labs.ai.azure.com/" target="_blank"&gt;Go explore it&lt;/A&gt; — and tell us what you create.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jun 2026 23:41:16 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/a-guided-tour-of-the-new-microsoft-foundry-labs/ba-p/4527908</guid>
      <dc:creator>vaidyas</dc:creator>
      <dc:date>2026-06-15T23:41:16Z</dc:date>
    </item>
    <item>
      <title>Intelligent sampling in Microsoft Foundry: the science behind selecting better production traces</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/intelligent-sampling-in-microsoft-foundry-the-science-behind/ba-p/4523722</link>
      <description>&lt;H5&gt;&lt;STRONG&gt;Authors: &lt;SPAN data-teams="true"&gt;Ilya Matiach, Morteza Ziyadi, José Santos, Ali Mahmoudzadeh, Shuo Qiu, Salma Elshafey, April Kwong, Vivek Bhadauria&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H5&gt;
&lt;H2&gt;TL;DR&lt;/H2&gt;
&lt;P&gt;Microsoft Foundry's &lt;STRONG&gt;intelligent sampling&lt;/STRONG&gt; feature (used when creating an evaluation or fine-tuning dataset from production agent traces) uses a MinHash farthest-first diversity sampler. On WildChat (the primary validation dataset, sampling 100 items from a 5,000-trace pool), diversity sampling produces &lt;STRONG&gt;+29.1% higher lexical diversity&lt;/STRONG&gt; and &lt;STRONG&gt;+44.8% larger vocabularies&lt;/STRONG&gt; than a uniform-random baseline; across five additional datasets (Dolly, No Robots, OASST2, ShareGPT-GPT4, UltraChat), vocabulary gains range from +5.7% to +86.3%. An LLM judge prefers diversity-sampled data &lt;STRONG&gt;78% of the time for evaluation and 71% for training&lt;/STRONG&gt; (268 paired judgments). By design, the technique prioritizes coverage of agent behavior over mirroring production frequencies — which is exactly what most evaluation and fine-tuning workflows benefit from. This post shares the science behind the approach, how we validated it, and where it shines.&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3223.41/18/06_workflow_hero_v3.png" alt="06_workflow_hero.png" /&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 1. The intelligent-sampling flow at a glance: a pool of agent traces, the MinHash farthest-first selection step, and the resulting curated subset that maximizes coverage of agent behavior.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What we mean by “better.”&lt;/STRONG&gt; Throughout this post, we evaluate intelligent sampling against three operational definitions of “better”:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Lexical variety&lt;/STRONG&gt; — unigram diversity (unique tokens / total tokens) and vocabulary size of the selected subset, capturing whether the subset covers more of the input space.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;LLM-judge preference&lt;/STRONG&gt; — pairwise GPT-4.1 judgments comparing the diversity-sampled subset to a uniform-random subset, under two framings (“which is the better evaluation dataset?” and “which is the better training dataset?”).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Human-rated quality&lt;/STRONG&gt; — for three datasets with genuine human quality annotations (HelpSteer2, OASST2, OpenAI’s Summarize from Feedback), the mean human-annotator quality score of the items each method selects, confirming that diversity sampling doesn’t systematically pick worse items.&lt;/LI&gt;
&lt;/UL&gt;
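&lt;P&gt;The lexical-variety metrics are simple to compute. The Python sketch below assumes lowercase whitespace tokenization (the post does not specify the tokenizer) and shows how a relative lift, such as the +29.1% in the TL;DR, is derived from the scores of two subsets. Function names and sample texts are illustrative.&lt;/P&gt;

```python
def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def lexical_stats(texts: list[str]) -> tuple[float, int]:
    """Return (unigram diversity, vocabulary size) for a subset of traces."""
    tokens = [token for text in texts for token in tokenize(text)]
    vocabulary = set(tokens)
    # Unigram diversity = unique tokens / total tokens (0.0 for an empty subset).
    diversity = len(vocabulary) / len(tokens) if tokens else 0.0
    return diversity, len(vocabulary)


def relative_lift(candidate: float, baseline: float) -> float:
    """Relative change versus the uniform-random baseline, e.g. 0.291 for +29.1%."""
    return (candidate - baseline) / baseline


diverse = ["reset my password", "vpn token expired", "refund duplicate charge"]
uniform = ["reset my password", "reset my password again", "reset my vpn password"]
div_d, vocab_d = lexical_stats(diverse)
div_u, vocab_u = lexical_stats(uniform)
print(f"diversity lift: {relative_lift(div_d, div_u):+.1%}, vocabulary lift: {relative_lift(vocab_d, vocab_u):+.1%}")
```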
&lt;H2&gt;Why trace selection matters&lt;/H2&gt;
&lt;P&gt;The simplest conceptual approach to picking traces is uniform random selection. It's statistically unbiased and preserves the true input distribution of your production traffic — a clear baseline that's the right choice when mirroring production frequencies matters. But uniform sampling has a well-known weakness: when your real traffic is dominated by a small number of common patterns, a uniform sample is dominated by those same patterns. Rare prompts, unusual tool-call sequences, and edge cases are systematically under-represented — exactly the cases evaluation is supposed to stress-test, and exactly the cases fine-tuning needs to learn from.&lt;/P&gt;
&lt;P&gt;Diversity sampling explicitly addresses this gap. The goal is to select a subset that covers as much of the input space as possible — including the less-frequent regions a uniform sample would systematically miss. By design, it prioritizes coverage over mirroring production frequencies, which makes it particularly well-suited for evaluation and fine-tuning workflows where breadth of behavior matters.&lt;/P&gt;
&lt;H2&gt;The technique: MinHash + farthest-first traversal&lt;/H2&gt;
&lt;P&gt;Intelligent sampling runs server-side in Foundry with no LLM calls and no external embedding-model dependencies. Selection is purely hash-based, so it adds no per-token model cost and completes in seconds to minutes. It combines two classic, well-understood components:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;MinHash signatures.&lt;/STRONG&gt; Each trace's user text is split into shingles, and the shingle set is hashed under 128 random permutations, keeping the minimum hash value for each one. The resulting fixed-size signature lets us estimate the Jaccard similarity between any two traces in constant time (one comparison per signature slot, regardless of text length), without storing or comparing the original text. This is the same technique that powers near-duplicate detection in search engines.&lt;/P&gt;
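&lt;P&gt;A minimal Python sketch of MinHash follows, to make the signature idea concrete. It is not Foundry's implementation: word trigrams as shingles, SHA-1 as the base hash, and random affine hashes modulo a Mersenne prime standing in for true permutations are all our assumptions. Only the 128-slot signature size comes from the text above.&lt;/P&gt;

```python
import hashlib
import random

NUM_PERM = 128       # the num_perm=128 setting described in the article
PRIME = 2**61 - 1    # Mersenne prime modulus for the affine hash family


def shingles(text: str, n: int = 3) -> set[str]:
    """Overlapping word n-grams; n=3 and whitespace tokenization are assumptions."""
    tokens = text.lower().split()
    if len(tokens) >= n:
        return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return {" ".join(tokens)} if tokens else set()


def base_hash(shingle: str) -> int:
    # Stable 32-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha1(shingle.encode("utf-8")).digest()[:4], "big")


def make_permutations(num_perm: int = NUM_PERM, seed: int = 1) -> list[tuple[int, int]]:
    # Each random "permutation" is simulated by h(x) = (a * x + b) mod PRIME.
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash_signature(text: str, perms: list[tuple[int, int]]) -> list[int]:
    hashes = [base_hash(s) for s in shingles(text)]
    if not hashes:
        return [PRIME] * len(perms)  # sentinel signature for empty text
    # For each permutation, keep the smallest permuted hash over all shingles.
    return [min((a * h + b) % PRIME for h in hashes) for a, b in perms]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    # Under truly random permutations, two signatures agree in a slot with
    # probability equal to the Jaccard similarity; the affine hashes above
    # approximate that, so the fraction of agreeing slots estimates it.
    agree = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return agree / len(sig_a)


perms = make_permutations()
a = minhash_signature("how do I cook a t-bone steak in the oven", perms)
b = minhash_signature("how do I cook a ribeye steak in the oven", perms)
print(estimate_jaccard(a, b))
```

&lt;P&gt;For the two steak prompts in the demo, the exact trigram Jaccard similarity is 5/11 (about 0.45). With 128 slots the estimate's standard error is roughly 0.04, so the printed value should land near that.&lt;/P&gt;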
&lt;P&gt;&lt;STRONG&gt;Farthest-first traversal.&lt;/STRONG&gt; Starting from a seed trace, repeatedly select the trace whose highest similarity to any already-selected trace is lowest. In other words, pick the trace whose nearest selected neighbor is farthest away, which makes it the most different from everything chosen so far. This greedy algorithm is a standard approximation for the maximum-diversity subset selection problem.&lt;/P&gt;
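&lt;P&gt;The greedy max-min loop can be sketched as below. This is an illustration, not the production sampler: it accepts any pairwise similarity function, and the demo uses exact Jaccard similarity on word sets where the real method would compare MinHash signatures. The names farthest_first and jaccard are hypothetical.&lt;/P&gt;

```python
from typing import Callable


def farthest_first(n_items: int, k: int,
                   similarity: Callable[[int, int], float],
                   seed_index: int = 0) -> list[int]:
    """Greedy max-min selection over items 0..n_items-1."""
    target = max(0, min(k, n_items))
    if target == 0:
        return []
    selected = [seed_index]
    chosen = {seed_index}
    # closest[i]: highest similarity between item i and any selected item so far.
    closest = [similarity(i, seed_index) for i in range(n_items)]
    for _ in range(target - 1):
        # Farthest item = lowest similarity to its nearest selected neighbor.
        pick = min((i for i in range(n_items) if i not in chosen),
                   key=lambda i: closest[i])
        selected.append(pick)
        chosen.add(pick)
        # Incremental update: one similarity call per item per step, O(n * k) total.
        for i in range(n_items):
            closest[i] = max(closest[i], similarity(i, pick))
    return selected


def jaccard(a: set[str], b: set[str]) -> float:
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


prompts = [
    "how do i cook a steak in the oven",
    "how do i cook a steak on the grill",
    "how do i cook a steak in a pan",
    "write a haiku about autumn leaves",
    "convert these usd amounts to euro",
]
token_sets = [set(p.split()) for p in prompts]
picked = farthest_first(len(prompts), 3,
                        lambda i, j: jaccard(token_sets[i], token_sets[j]))
print([prompts[i] for i in picked])
```

&lt;P&gt;Starting from the oven prompt, the loop picks the currency prompt and then the haiku prompt, skipping the two near-duplicate steak prompts. Because each step updates every item's nearest-selected similarity once, selecting k items from a pool of n costs about n × k similarity evaluations: roughly 500,000 signature comparisons for a 5,000-trace pool and a target of 100.&lt;/P&gt;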
&lt;P&gt;Diversity sampling — the MinHash farthest-first algorithm — is the core selection mechanism used when you create a dataset from traces. It runs after a small set of supporting steps: exact deduplication to remove redundant traces, hard filters to drop malformed or trivial ones, and aggregation across an agent's runs. Our validation work focuses on whether the diversity-sampling step itself drives the quality gains we measure.&lt;/P&gt;
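&lt;P&gt;The pre-selection steps might look like the following sketch. The post does not specify the filters, so the normalization rule and the token-count thresholds below are hypothetical, and the aggregation step across an agent's runs is omitted because its details are not described.&lt;/P&gt;

```python
import hashlib


def normalize(text: str) -> str:
    # Assumption: case and whitespace differences do not make two traces distinct.
    return " ".join(text.lower().split())


def exact_dedup(traces: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized trace text."""
    seen: set[str] = set()
    unique = []
    for text in traces:
        key = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def passes_hard_filters(text: str, min_tokens: int = 3, max_tokens: int = 4000) -> bool:
    # Hypothetical thresholds: drop trivial inputs and oversized blobs.
    n = len(text.split())
    return n >= min_tokens and max_tokens >= n


def prepare_pool(traces: list[str]) -> list[str]:
    # Dedup first, then filter; diversity sampling would run on the result.
    return [t for t in exact_dedup(traces) if passes_hard_filters(t)]


pool = prepare_pool([
    "Hello",
    "How do I reset my password?",
    "how do I  reset my password?",
    "Summarize this quarterly report for me",
])
print(pool)
```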
&lt;P&gt;Selection involves no LLM or embedding-model calls, its cost scales linearly in pool size for a fixed target size, and it is dominated by hashing, which is negligible compared with running the evaluation or fine-tuning job that consumes the result.&lt;/P&gt;
&lt;H2&gt;How we validated the method&lt;/H2&gt;
&lt;P&gt;A diversity sampler can be evaluated along several axes. We split our work into four complementary studies, designed to assess the technique's value from independent angles.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Intrinsic diversity metrics.&lt;/STRONG&gt; Unigram diversity (unique tokens divided by total tokens, a length-normalized measure of lexical variety) and vocabulary size (count of unique tokens across the selected subset). Both are aggregated across 5 random seeds and compared via paired t-tests.&lt;/P&gt;
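&lt;P&gt;Both intrinsic metrics, plus the paired t statistic used to compare them across seeds, fit in a few lines of Python. This sketch assumes lowercase whitespace tokenization (the post does not name its tokenizer) and pairs one diversity-sampled subset with one random subset per seed.&lt;/P&gt;

```python
import math
import statistics


def tokens_of(texts: list[str]) -> list[str]:
    # Assumption: lowercase whitespace tokenization.
    return [tok for text in texts for tok in text.lower().split()]


def unigram_diversity(texts: list[str]) -> float:
    # Unique tokens divided by total tokens across the whole subset.
    toks = tokens_of(texts)
    return len(set(toks)) / len(toks) if toks else 0.0


def vocabulary_size(texts: list[str]) -> int:
    # Count of unique tokens across the whole subset.
    return len(set(tokens_of(texts)))


def paired_t(xs: list[float], ys: list[float]) -> tuple[float, int]:
    """t statistic and degrees of freedom for per-seed paired differences."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


subset = ["the cat sat", "the dog ran"]
print(unigram_diversity(subset), vocabulary_size(subset))
```

&lt;P&gt;The p-value comes from a Student's t distribution with n − 1 degrees of freedom; scipy.stats.ttest_rel(xs, ys) returns the same statistic together with that p-value.&lt;/P&gt;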
&lt;P&gt;&lt;STRONG&gt;LLM-as-judge preference.&lt;/STRONG&gt; A blinded GPT-4.1 judge scored pairwise comparisons (diversity-sampled vs. uniform-random subsets, each containing 10 trace examples) across 268 judgments spanning three dataset sizes (1k / 5k / 10k) and five seeds. The judge is asked which subset would produce a better evaluation dataset, and separately which would produce a better training dataset, with reasoning. A second judge (GPT-5.2) was run on 50 shared comparisons to test directional robustness across judge models.&lt;/P&gt;
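&lt;P&gt;One common way to blind a pairwise judge is to hide method names and randomize which subset appears first, so position bias cannot favor either method. Below is a sketch of that setup with the judge passed in as a plain function; the prompt wording and the names blinded_judgment and win_rate are hypothetical, not the prompts used in the study.&lt;/P&gt;

```python
import random
from typing import Callable


def blinded_judgment(diverse: list[str], uniform: list[str],
                     judge: Callable[[str], str], framing: str,
                     rng: random.Random) -> str:
    """Present two subsets as A/B in random order and report which method won."""
    diverse_is_a = rng.random() >= 0.5
    subset_a, subset_b = (diverse, uniform) if diverse_is_a else (uniform, diverse)
    # Hypothetical prompt; method names never appear, only anonymous labels A and B.
    prompt = (
        f"Which subset would make the better {framing} dataset? Answer A or B.\n\n"
        + "Subset A:\n" + "\n".join(subset_a)
        + "\n\nSubset B:\n" + "\n".join(subset_b)
    )
    judge_picked_a = judge(prompt).strip().upper().startswith("A")
    # The diversity subset wins when the judge's pick matches its hidden position.
    return "diversity" if judge_picked_a == diverse_is_a else "random"


def win_rate(outcomes: list[str]) -> float:
    return outcomes.count("diversity") / len(outcomes)


def always_a(prompt: str) -> str:
    # Stand-in judge with pure position bias: it ignores the content entirely.
    return "A"


rng = random.Random(0)
outcomes = [blinded_judgment(["rare q"], ["common q"], always_a, "evaluation", rng)
            for _ in range(1000)]
print(win_rate(outcomes))  # close to 0.5: randomized order cancels the position bias
```

&lt;P&gt;Running the same loop once with framing="evaluation" and once with framing="training" mirrors the two framings described above.&lt;/P&gt;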
&lt;P&gt;&lt;STRONG&gt;Downstream supervised fine-tuning.&lt;/STRONG&gt; We fine-tuned gpt-4.1 using the standard OpenAI fine-tuning API (supervised fine-tuning on &amp;lt;prompt, response&amp;gt; pairs) on two 80-example WildChat subsets — one diversity-sampled, one randomly selected — held out 20 additional examples for validation, and ran 3 epochs each. We measured training convergence (train loss, validation loss, token accuracy) and held-out pairwise generation quality on 48 unseen prompts judged by a blind GPT-4.1 evaluator.&lt;/P&gt;
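&lt;P&gt;Preparing the training and validation files might look like this sketch, which writes prompt/response pairs in the chat-style JSONL layout (one {"messages": [...]} record per line) that OpenAI's fine-tuning API accepts. The 200-item pool and the index list standing in for the sampler's 80 picks are hypothetical; uploading files and creating the job are not shown.&lt;/P&gt;

```python
import json
import random


def to_chat_example(prompt: str, response: str) -> dict:
    # Chat-format record used in supervised fine-tuning JSONL files.
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]}


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    # One JSON object per line.
    return "".join(json.dumps(to_chat_example(p, r)) + "\n" for p, r in pairs)


def train_and_holdout(pool: list[tuple[str, str]], selected_idx: list[int],
                      n_val: int = 20, seed: int = 0):
    """Training set = the sampler's picks; validation = n_val items it did not pick."""
    rng = random.Random(seed)
    picked = set(selected_idx)
    remaining = [i for i in range(len(pool)) if i not in picked]
    val_idx = rng.sample(remaining, n_val)
    return [pool[i] for i in selected_idx], [pool[i] for i in val_idx]


pool = [(f"question {i}", f"answer {i}") for i in range(200)]
# Hypothetical stand-in for 80 indices returned by a sampler.
train, val = train_and_holdout(pool, selected_idx=list(range(0, 160, 2)))
print(len(train), len(val))
print(to_jsonl(train[:1]), end="")
```

&lt;P&gt;For a fair comparison, both fine-tuning jobs would presumably share one validation set; one way is to draw the 20 validation items once, from traces neither method selected, and reuse them for both jobs.&lt;/P&gt;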
&lt;P&gt;&lt;STRONG&gt;Golden-dataset quality.&lt;/STRONG&gt; We re-ran the sampler against three datasets with genuine human quality annotations — HelpSteer2 (Scale AI professionals), OASST2 (13.5K crowd-sourced volunteers), and OpenAI's Summarize from Feedback (crowd workers) — to ask whether selecting for diversity systematically picks lower-quality items according to human raters.&lt;/P&gt;
&lt;P&gt;All experiments use five deterministic seeds for the random baseline, paired t-tests for statistical significance, and the same sampler configuration (num_perm=128, alpha=0.35, target=100). The primary dataset is &lt;A href="https://huggingface.co/datasets/allenai/WildChat-1M" target="_blank" rel="noopener"&gt;WildChat&lt;/A&gt; — 1k, 5k, and 10k subsets of real user / chatbot conversations — supplemented by five cross-dataset studies on &lt;A href="https://huggingface.co/datasets/databricks/databricks-dolly-15k" target="_blank" rel="noopener"&gt;Dolly&lt;/A&gt;, &lt;A href="https://huggingface.co/datasets/HuggingFaceH4/no_robots" target="_blank" rel="noopener"&gt;No Robots&lt;/A&gt;, &lt;A href="https://huggingface.co/datasets/OpenAssistant/oasst2" target="_blank" rel="noopener"&gt;OASST2&lt;/A&gt;, &lt;A href="https://huggingface.co/datasets/shibing624/sharegpt_gpt4" target="_blank" rel="noopener"&gt;ShareGPT-GPT4&lt;/A&gt;, and &lt;A href="https://huggingface.co/datasets/stingning/ultrachat" target="_blank" rel="noopener"&gt;UltraChat&lt;/A&gt;. Golden-dataset quality validation uses &lt;A href="https://huggingface.co/datasets/nvidia/HelpSteer2" target="_blank" rel="noopener"&gt;HelpSteer2&lt;/A&gt;, OASST2, and OpenAI's &lt;A href="https://github.com/openai/summarize-from-feedback" target="_blank" rel="noopener"&gt;Summarize from Feedback&lt;/A&gt;.&lt;/P&gt;
&lt;H2&gt;What the data says&lt;/H2&gt;
&lt;P&gt;Before the aggregate numbers, here is what diversity sampling actually does on real user conversations. We embedded all 5,000 WildChat user prompts with a sentence-transformer model and projected them into 2D with UMAP. Every grey dot is one trace, and the colored markers are the 100 traces each method selected:&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3223.8/8/04a_semantic_scatter.png" alt="04a_semantic_scatter.png" /&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 2. Semantic view — UMAP of sentence-transformer embeddings. Each grey dot is one trace; colored markers are the 100 traces each method picked. Both methods look broadly spread in semantic space, yet they share only 1 of 100 selected traces — the methods reach genuinely different content, even where the spatial coverage looks similar.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Next, consider the MinHash–Jaccard space that the algorithm actually optimizes. Instead of a 2D projection, which loses the fine-scale distance information the algorithm cares about, we plot the distance distributions directly:&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3223.27/14/jaccard_dist_v2.png" alt="jaccard_dist_v2.png" /&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 3. Pairwise (left) and nearest-neighbor (right) Jaccard-distance distributions between the 100 selected items, on Dolly and WildChat. Random sampling produces a long left tail in nearest-neighbor distance: some random picks have near-duplicate siblings in the selected set. Diversity sampling explicitly maximizes each selection’s minimum distance to the items already selected, shifting the entire distribution to the right. On Dolly the gap is dramatic (mean +0.08, with random’s left tail reaching down to 0.3). On WildChat the gap is smaller because the pool already has high baseline diversity, so there are simply fewer near-duplicates to avoid.&lt;/EM&gt;&lt;/P&gt;
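&lt;P&gt;The two distributions in Figure 3 can be computed as in this sketch. It uses exact Jaccard distance on token sets for readability; with MinHash signatures you would substitute one minus the estimated similarity. The function names are hypothetical.&lt;/P&gt;

```python
import statistics
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    union = a.union(b)
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a.intersection(b)) / len(union)


def pairwise_distances(sets: list[set[str]]) -> list[float]:
    # Left panel: every pair of selected items.
    return [jaccard_distance(a, b) for a, b in combinations(sets, 2)]


def nearest_neighbor_distances(sets: list[set[str]]) -> list[float]:
    # Right panel: each item's distance to its closest other selected item.
    return [
        min(jaccard_distance(s, t) for j, t in enumerate(sets) if j != i)
        for i, s in enumerate(sets)
    ]


selected = [
    {"reset", "my", "password"},
    {"reset", "my", "password", "please"},
    {"haiku", "about", "autumn"},
]
nn = nearest_neighbor_distances(selected)
print(nn, statistics.mean(nn))
```

&lt;P&gt;The first two items form a near-duplicate pair, so both get a small nearest-neighbor distance; that is the left tail the caption describes for random sampling.&lt;/P&gt;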
&lt;P&gt;The two methods select almost entirely different traces (99 unique each, 1 in common) — and the visual difference foreshadows what the aggregate metrics will show. The rest of this section quantifies the gap on four independent axes.&lt;/P&gt;
&lt;H3&gt;Diversity gains are large and consistent&lt;/H3&gt;
&lt;P&gt;On the primary WildChat dataset (5k pool, 100 selected, 5 seeds), diversity sampling produces subsets that are significantly more diverse than a uniform-random baseline:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Metric&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Diversity sampling&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Random&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Δ%&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;p-value&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Unigram diversity&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.307&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.238&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;+29.1%&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.010&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Vocabulary size&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;7,019&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4,849&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;+44.8%&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.003&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Vocabulary gains generalize to five additional datasets (Dolly +5.7%, No Robots +44.5%, OASST2 +55.7%, ShareGPT-GPT4 +86.3%, UltraChat +49.3%), although the size of the gain varies widely from one pool to another.&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3223.6/3/01_cross_dataset_vocab.png" alt="01_cross_dataset_vocab.png" /&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 4. Vocabulary increase relative to a uniform-random baseline, across six datasets. Vocabulary gains are positive on every dataset but vary widely with how lexically rich the underlying pool is. ShareGPT-GPT4 (long, varied conversational turns) gives the sampler the most room to spread out, while Dolly (short, formulaically written instructions) leaves it the least lexical variety to surface.&lt;/EM&gt;&lt;/P&gt;
&lt;H3&gt;LLM judges prefer diversity-sampled data&lt;/H3&gt;
&lt;P&gt;We ran 268 pairwise comparisons with GPT-4.1, each judged under two framings (which subset produces a better evaluation dataset, and which produces a better training dataset). Aggregated win rates:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Framing&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Diversity sampling wins&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Random wins&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Evaluation dataset&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;78.0% (209/268)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;22.0% (59/268)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Training dataset&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;71.3% (191/268)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;28.7% (77/268)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Diversity-sampled subsets win across all three dataset sizes (1k, 5k, 10k). A secondary GPT-5.2 judge run on 50 shared comparisons reproduced the same directional result, with 76% raw agreement on the evaluation framing and 68% on training — supporting the conclusion that the gap is not specific to a single judge model.&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3223.6/4/03_judge_wins_by_scale.png" alt="03_judge_wins_by_scale.png" /&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 5. Per-pool-size pairwise win rates for the diversity-sampled subset, across both evaluation and training framings. The 50% dashed line is the coin-flip baseline; every bar is well above it.&lt;/EM&gt;&lt;/P&gt;
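&lt;P&gt;To check how far these win rates sit from the 50% coin-flip line, an exact one-sided binomial test needs only math.comb. This is our illustration, not the authors' analysis, and it treats the 268 judgments as independent even though they share pools and seeds, so its p-values are optimistic.&lt;/P&gt;

```python
from math import comb


def binomial_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5): k or more wins from a coin-flip judge."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


for framing, wins in [("evaluation", 209), ("training", 191)]:
    print(f"{framing}: {wins}/268 wins, one-sided p = {binomial_tail(wins, 268):.1e}")
```

&lt;P&gt;Both p-values come out far below 0.001, consistent with every bar in Figure 5 clearing the dashed line.&lt;/P&gt;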
&lt;P&gt;Vocabulary gain and judge preference measure different things — lexical variety in the selection vs. a holistic quality signal. They mostly agree, but not always:&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3223.30/15/07_vocab_vs_judge_scatter.png" alt="Vocabulary gain vs LLM-judge preference, per dataset" width="900" /&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 6. Vocabulary gain vs. LLM-judge preference, one point per dataset. Four datasets (No&amp;nbsp;Robots, OASST2, WildChat, Dolly) sit in the win-win region — diversity sampling delivers measurable lexical gains and judges prefer the curated subset. ShareGPT-GPT4 produces the largest vocabulary lift but a more modest judge preference. UltraChat is the clear outlier: a +49% vocabulary gain that doesn’t translate to any judge preference — on this fully-synthetic dataset the lexical novelty is real but judges don’t see the resulting subset as meaningfully better. Cross-dataset judge percentages come from a small (~4 judgments each) experiment, so individual-dataset values are noisy; WildChat’s 78% is the high-confidence anchor from 268 judgments.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The UltraChat result is informative on its own. On fully synthetic datasets, where both sides of the conversation are model-generated, diversity sampling still surfaces lexical variety, but that variety does not translate into a subset judges prefer, likely because model-generated conversations are phrased homogeneously and leave little long-tail variation to surface.&lt;/P&gt;
&lt;H3&gt;Fine-tuning: faster convergence, similar final quality&lt;/H3&gt;
&lt;P&gt;We ran a supervised fine-tuning experiment on gpt-4.1 using the standard OpenAI fine-tuning API: two 80-example WildChat subsets (diversity-sampled vs. randomly selected), 20 held-out validation examples, 3 epochs each.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Metric&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Diversity sampling&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Random&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Δ&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Final train loss&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.547&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.908&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;−40%&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Train token accuracy&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;85.3%&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;76.7%&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;+8.6pp&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Validation loss&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.869&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.873&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;comparable&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Holdout pairwise (N=48)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;37.5% wins&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;33.3% wins&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;not significant&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Training dynamics differ sharply: the diversity-sampled model converges by epoch 2 and reaches 40% lower training loss. On held-out generation quality, both models perform comparably (37.5% vs. 33.3% pairwise wins, not statistically significant at N=48), which suggests that the diversity advantage does not come at the cost of downstream model quality.&lt;/P&gt;
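&lt;P&gt;The "not significant" verdict can be reproduced with an exact two-sided sign test. At N=48, 37.5% and 33.3% correspond to 18 and 16 wins; this sketch assumes the remaining 14 comparisons were ties and drops them, since the post does not say how they were scored.&lt;/P&gt;

```python
from math import comb


def sign_test_two_sided(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test on decisive comparisons (ties already dropped)."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Probability of a split at least this lopsided toward one side, doubled.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)


# 37.5% and 33.3% of 48 held-out prompts correspond to 18 and 16 wins.
print(sign_test_two_sided(18, 16))
```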
&lt;H3&gt;Quality holds up across human-annotated datasets&lt;/H3&gt;
&lt;P&gt;A natural concern with diversity sampling is that it might over-select hard, weird, or low-quality items. We tested this on three datasets with genuine human quality annotations:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Dataset&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Annotators&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Quality dimension&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Diversity&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Random&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Δ%&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;HelpSteer2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Scale AI professionals&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Helpfulness (0–4)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2.66&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2.94&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;−9.7%&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;OASST2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;13.5K volunteers&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Quality (0–1)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.72&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0.67&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;+8.1%&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Summarize from Feedback&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Crowd workers&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Overall (1–7)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;5.16&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.77&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;+8.3%&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3223.33/16/02_golden_quality_v2.png" alt="02_golden_quality.png" /&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 7. Human quality scores for diversity-sampled vs. uniform-random subsets, on each dataset’s own annotation scale. Diversity sampling produces statistically significant quality gains on OASST2 and Summarize from Feedback, and a statistically significant quality drop on HelpSteer2; the qualitative examples in the next section show why.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;On two of three datasets, diversity sampling selects items rated as higher quality by human annotators (OASST2 +8.1%, Summarize from Feedback +8.3%). On HelpSteer2 the trend reverses: diversity sampling picks more creative-writing, roleplay, and niche-topic prompts, which are harder to answer well, so their responses receive systematically lower helpfulness ratings. Both methods select a similar number of score-zero items; the gap sits at the high end of the scale.&lt;/P&gt;
&lt;P&gt;To see where the HelpSteer2 quality gap actually lives, here is the full distribution of helpfulness scores for items selected by each method:&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3223.33/17/05_helpsteer_distribution_v2.png" alt="05_helpsteer_distribution.png" /&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 8. Distribution of HelpSteer2 helpfulness scores for the 500 items each method selected across 5 seeds. Diversity sampling slightly over-picks low- and mid-scoring items (scores 0–2) and ties random at score 3, while random over-picks the highest-scoring score-4 prompts (45% vs. 35%) — the common, well-trodden requests where models already produce reliable answers. The 9.7% mean-helpfulness gap is driven almost entirely by that single score-4 bin.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Note: OpenAI's Summarize from Feedback is a summarization task (Reddit / CNN articles), not general instruction-following. We include it to broaden the range of annotation methodologies covered.&lt;/EM&gt;&lt;/P&gt;
&lt;H3&gt;What the sampler actually picks&lt;/H3&gt;
&lt;P&gt;To make the diversity-vs-quality trade-off concrete, here are real examples of items selected by each method on HelpSteer2 (seed=42). Of the 100 items each method selected, only 3 overlapped, so the two methods choose almost entirely different subsets.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Diversity-sampled examples&lt;/STRONG&gt; (selected by diversity sampling, skipped by random):&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;#&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;User prompt (truncated)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Helpfulness&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Correctness&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Coherence&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;I have a vacation rental website and I am looking for alliterative and descriptive headlines that are 4–5 words in length. Examples: "Get Away to Galveston", "Sleep Soundly in Seattle". Each headline should have at least 50% alliteration…&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;You are a branding consultant with a creative mind. Give me 30 naming ideas for a baby’s website for parents in a table format.&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;my table has dates and USD values. give me historical exchange rates for each date for USD to Euro. also the converted USD to EU value: 01/02/2021&amp;nbsp; 84.62 / 01/03/2021&amp;nbsp; 79.48 / 01/04/2021&amp;nbsp; 79.69 / 01/05/2021&amp;nbsp; 38.06 / 01/06/2021&amp;nbsp; 58.46…&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 4.72572%" /&gt;&lt;col style="width: 46.6086%" /&gt;&lt;col style="width: 16.2157%" /&gt;&lt;col style="width: 17.0481%" /&gt;&lt;col style="width: 15.3833%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Randomly-selected examples&lt;/STRONG&gt; (selected by random, skipped by diversity sampling):&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;#&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;User prompt (truncated)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Helpfulness&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Correctness&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Coherence&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;How to cook t-bone in the oven&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Create a 4-day dumbbell and EZ-bar workout program to build over 10lbs of muscle in 3 months&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;How did life originate?&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 4.72572%" /&gt;&lt;col style="width: 46.5159%" /&gt;&lt;col style="width: 16.3084%" /&gt;&lt;col style="width: 16.9555%" /&gt;&lt;col style="width: 15.4759%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;The diversity-sampled prompts are uncommon, creative, and often underspecified: writing tasks with unusual constraints, open-ended branding requests, and ill-formed data tasks the model can’t reliably answer. These are exactly the kinds of inputs an evaluation suite or fine-tuning corpus should stress-test. The randomly selected prompts are common, well-trodden requests where models already produce reliable, high-scoring responses. The quality-score gap (3–10%) is a direct consequence of this content shift, not a sign that diversity sampling is picking lower-quality work.&lt;/P&gt;
&lt;H2&gt;Considerations&lt;/H2&gt;
&lt;P&gt;Diversity sampling is a strong default for evaluation and fine-tuning workloads, but a few caveats are worth flagging for practitioners:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Coverage, not representativeness.&lt;/STRONG&gt; By design, diversity sampling emphasizes breadth of behavior over mirroring production frequencies. For use cases that need a faithful population estimate (latency, error rates), a uniform sample is the better fit.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Synthetic data sees smaller gains.&lt;/STRONG&gt; On fully synthetic datasets (e.g. UltraChat), the underlying conversations are already diversified by the generation process, so the technique has less signal to work with: lexical variety still increases, but LLM judges do not prefer the resulting subset.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Fine-tuned model quality is comparable, not dramatically better.&lt;/STRONG&gt; Diversity sampling accelerates training convergence and produces richer datasets, but at the scales we tested, held-out generation quality is on par with a uniform-random baseline.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The diversity-vs-quality trade-off is dataset-specific.&lt;/STRONG&gt; Two of three human-annotated datasets show higher quality scores with diversity sampling; one (HelpSteer2) shows lower scores, driven by selection of harder, more creative prompts.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Where diversity sampling shines&lt;/H2&gt;
&lt;P&gt;Across our evaluation, the technique consistently helps for workloads where broad input coverage drives downstream quality, and adds less value for workloads that depend on faithful production frequencies:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Workload&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Diversity sampling fit&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Fine-tuning data selection&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Strong — faster convergence, broader coverage&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Rubric &amp;amp; evaluator generation&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Strong — surfaces clearer evaluation themes&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Long-tail evaluation suites&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Strong — maximizes input-space coverage&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Quality-critical training&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Strong — pair diversity coverage with manual review of selected items&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Production-distribution benchmarking&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Weaker — uniform sample mirrors true frequencies&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Pre-curated or synthetic data&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Weaker — data is already diverse&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;Closing thoughts&lt;/H2&gt;
&lt;P&gt;Intelligent sampling — MinHash diversity sampling — gives Foundry developers a fast, zero-extra-cost way to curate higher-coverage datasets from their production traces. In our evaluation, the technique consistently delivered measurable diversity gains, was strongly preferred by LLM judges for both evaluation and training framings, and accelerated fine-tuning convergence — while running in under a minute on typical trace pools.&lt;/P&gt;
&lt;P&gt;It’s worth keeping the scope clear: the technique is designed for breadth of behavior, not faithful production-frequency representation. For most evaluation and fine-tuning workloads, broader coverage is exactly what you want — and that’s where this feature shines.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Try it out.&lt;/STRONG&gt; If you’re building agents on Microsoft Foundry, enable Application Insights on your project, head to the Traces tab, and try creating your first dataset from traces. See the official documentation for step-by-step instructions: &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/observability/how-to/traces-to-dataset?view=foundry" target="_blank" rel="noopener"&gt;Convert agent traces into evaluation datasets (preview)&lt;/A&gt;. Two Python samples — one for evaluation, one for fine-tuning — are available in the azure-ai-projects SDK: &lt;A href="https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-projects/samples/datasets" target="_blank" rel="noopener"&gt;azure-sdk-for-python / datasets samples&lt;/A&gt;. We’d love to hear how it works for your workloads.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Author contributions&lt;BR /&gt;&lt;/STRONG&gt;Shuo Qiu conceptualized and implemented the core method and developed the initial LLM judge-based validation method. Ilya Matiach&amp;nbsp;&lt;SPAN data-teams="true"&gt;designed and ran the validation experiments and authored the blog post&lt;/SPAN&gt;. April Kwong and Morteza Ziyadi reviewed the core methodology. José Santos, Salma Elshafey, Ali Mahmoudzadeh, and Vivek Bhadauria contributed to methodology and validation review.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Get Started&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Start building in Microsoft Foundry:&lt;/STRONG&gt; &lt;A class="lia-external-url" href="https://ai.azure.com/" target="_blank" rel="noopener"&gt;ai.azure.com&lt;/A&gt;&lt;BR /&gt;&lt;STRONG&gt;Check out Observability BRK252 session:&lt;/STRONG&gt; &lt;A class="lia-external-url" href="https://aka.ms/build26-BRK252" target="_blank" rel="noopener"&gt;aka.ms/build26-BRK252&lt;/A&gt;&lt;BR /&gt;&lt;STRONG&gt;Read the docs:&lt;/STRONG&gt; &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/concepts/observability" target="_blank" rel="noopener"&gt;Foundry observability documentation&lt;/A&gt;&lt;BR /&gt;&lt;STRONG&gt;Join the community:&lt;/STRONG&gt; &lt;A class="lia-external-url" href="https://aka.ms/ai/discord" target="_blank" rel="noopener"&gt;aka.ms/ai/discord&lt;/A&gt;&lt;BR /&gt;&lt;STRONG&gt;Build blog:&lt;/STRONG&gt; &lt;A class="lia-external-url" href="https://devblogs.microsoft.com/foundry/build-2026-from-observability-to-roi-for-ai-agents-on-any-framework/" target="_blank"&gt;From observability to ROI for AI agents on any framework&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2026 18:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/intelligent-sampling-in-microsoft-foundry-the-science-behind/ba-p/4523722</guid>
      <dc:creator>imatiach</dc:creator>
      <dc:date>2026-06-17T18:00:00Z</dc:date>
    </item>
    <item>
      <title>Benchmarks in Microsoft Foundry (preview): Standardized model and agent quality checks</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/benchmarks-in-microsoft-foundry-preview-standardized-model-and/ba-p/4523711</link>
      <description>&lt;H5&gt;&lt;STRONG&gt;Authors:&amp;nbsp;&lt;SPAN data-teams="true"&gt;Ilya Matiach, Morteza Ziyadi, Han Che, José Santos, Ali Mahmoudzadeh, Shuo Qiu, Salma Elshafey, April Kwong, Vivek Bhadauria&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H5&gt;
&lt;H2&gt;Introduction&lt;/H2&gt;
&lt;P&gt;Benchmarks in Microsoft Foundry (preview) make standardized, repeatable quality measurement a first-class part of the development workflow. You can run well-known open-source benchmarks against any model deployment or agent in your project, compare runs side by side in the evaluation group view, and drive the whole flow from the portal or the REST API.&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3222.4/5/01-evaluations-list-dark.png" alt="01-evaluations-list-dark.png" /&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Figure 1. Benchmarks appear in the Microsoft Foundry Evaluations list alongside your evaluations.&lt;/EM&gt;&lt;/P&gt;
&lt;H2&gt;How is this different from the model leaderboard?&lt;/H2&gt;
&lt;P&gt;Microsoft Foundry already includes a model leaderboard that surfaces precomputed benchmark scores across the model catalog. The leaderboard answers, “How do these models perform in general?”&lt;/P&gt;
&lt;P&gt;Benchmarks answer a different question: “How does my deployment or my agent, with my judge model and my configuration, perform right now?” That distinction matters when you’re comparing two deployments of the same model, validating a fine-tune, checking whether an agent regressed after a tool change, or re-running a benchmark after a model version upgrade.&lt;/P&gt;
&lt;H2&gt;What a benchmark contains&lt;/H2&gt;
&lt;P&gt;A benchmark is a predefined evaluation package. Instead of asking you to upload evaluation data and define scoring criteria, the benchmark provides them for you:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Benchmark dataset — a curated, fixed-size dataset of prompts and expected answers.&lt;/LI&gt;
&lt;LI&gt;Task category — the capability being measured (reasoning, math, science, truthfulness, …).&lt;/LI&gt;
&lt;LI&gt;Evaluation logic — a built-in evaluator such as builtin.regex_match, or a benchmark-specific scorer.&lt;/LI&gt;
&lt;LI&gt;Optional judge model — for benchmarks that use model-based judging (for example, FrontierScience), you also pick a deployment to act as the grader. Most regex- or scorer-based benchmarks (such as GPQA Diamond, BBH, MuSR, BBEH, and TruthfulQA) do not need a Judge model.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The result of each run is typically a score: a percentage plus a pass count, such as 82% with 645 / 790 examples passing the benchmark metric.&lt;/P&gt;
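&lt;P&gt;As a quick check of the arithmetic, the percentage is the pass count divided by the number of examples: 645 / 790 is about 81.6%, which displays as 82%. The minimal Python sketch below reproduces that conversion. &lt;CODE&gt;benchmark_score&lt;/CODE&gt; is a hypothetical helper for your own reporting, not part of any Foundry SDK, and the portal's exact rounding rule is an assumption.&lt;/P&gt;

```python
def benchmark_score(passed: int, total: int) -> dict:
    """Hypothetical helper: convert a pass count into a percentage-style score."""
    if not total > 0:
        raise ValueError("total must be a positive number of examples")
    if passed not in range(total + 1):
        raise ValueError("passed must be between 0 and total")
    percent = 100 * passed / total
    # round() with no digits argument returns an int (round-half-to-even on ties).
    shown = round(percent)
    return {"percent": shown, "label": f"{shown}% ({passed} / {total})"}


print(benchmark_score(645, 790))  # prints {'percent': 82, 'label': '82% (645 / 790)'}
```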
&lt;H3&gt;Target vs. judge model&lt;/H3&gt;
&lt;P&gt;The target is what you are evaluating — a model deployment or an agent. The judge model is a separate deployment that the evaluation service uses to score outputs when a benchmark requires model-based judging. The judge model’s score does not represent its own quality; it is part of the scoring pipeline. Keep the judge model consistent across runs you want to compare, the same way you would keep a measurement instrument fixed in a scientific experiment.&lt;/P&gt;
&lt;H2&gt;Benchmarks available in preview&lt;/H2&gt;
&lt;P&gt;The initial set of benchmarks covers reasoning, math, science, and truthfulness, with example counts ranging from small smoke-test benchmarks (AIME 2025, 30 examples) to larger suites (BBEH, 4,520 examples).&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Task&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Examples&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Evaluation logic&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;AIME 2025 Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;reasoning, quality, math&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;30&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;AIME 2025&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;BBEH Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;reasoning, quality&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4,520&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;builtin.bbeh&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;BIG-Bench Hard Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;reasoning, quality&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;934&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;builtin.regex_match&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;ChemBench Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;reasoning, quality, sciences&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2,785&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;ChemBench&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;FrontierScience Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;reasoning, quality&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;160&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;FrontierScience&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;GPQA Diamond Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;reasoning, quality&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;198&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;builtin.regex_match&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;MuSR Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;reasoning, quality&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;756&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;builtin.regex_match&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;TruthfulQA Benchmark&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;truthfulness, quality, reasoning&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;790&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;TruthfulQA&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;Common benchmark scenarios&lt;/H2&gt;
&lt;H3&gt;Comparing model deployments&lt;/H3&gt;
&lt;P&gt;Suppose you’re deciding between two model deployments, such as two versions of the same model family, or a frontier model and a smaller, cheaper one. With benchmarks, you create a single evaluation group (for example, GPQA Diamond) and then add one run per target. The group view shows each run’s score and token usage side by side, so you can weigh quality against cost in the same place.&lt;/P&gt;
&lt;P&gt;This same pattern catches regressions after a model version upgrade or a fine-tune: re-run the established benchmark, look at the group view, and the delta tells you whether things moved in the right direction.&lt;/P&gt;
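&lt;P&gt;The comparison itself is a subtraction and a ratio. The Python sketch below assumes you have copied each run’s pass count, example count, and token usage from the group view (or from Download results). The &lt;CODE&gt;RunSummary&lt;/CODE&gt; record and the &lt;CODE&gt;side_by_side&lt;/CODE&gt; helper are hypothetical, and the numbers in the usage lines are placeholders, not measured results.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical record of one run, copied from the evaluation group view."""
    target: str
    passed: int
    total: int
    total_tokens: int

    @property
    def score(self) -> float:
        return 100 * self.passed / self.total


def side_by_side(baseline: RunSummary, candidate: RunSummary) -> dict:
    """Score change in percentage points and token usage relative to the baseline run."""
    return {
        "target": candidate.target,
        "score_delta_pts": round(candidate.score - baseline.score, 1),
        "token_ratio": round(candidate.total_tokens / baseline.total_tokens, 2),
    }


# Placeholder values for illustration only (same benchmark, same example count).
base = RunSummary("deployment-a", passed=150, total=198, total_tokens=400_000)
cand = RunSummary("deployment-b", passed=141, total=198, total_tokens=200_000)
print(side_by_side(base, cand))
```

&lt;P&gt;A negative &lt;CODE&gt;score_delta_pts&lt;/CODE&gt; combined with a &lt;CODE&gt;token_ratio&lt;/CODE&gt; below 1 means the candidate is cheaper but less accurate on this benchmark; whether that trade is acceptable is a product decision that the score alone does not settle.&lt;/P&gt;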
&lt;H3&gt;Benchmarking an agent&lt;/H3&gt;
&lt;P&gt;Agents introduce variability that pure model evaluations miss: prompts, tools, orchestration logic, and connected data all influence outputs. Pointing benchmarks at an agent gives you a model-agnostic, reproducible signal that complements the agent-specific evaluators you may already be using (intent resolution, task adherence, tool-call accuracy).&lt;/P&gt;
&lt;P&gt;A practical pattern: pick a reasoning-heavy benchmark like GPQA Diamond or MuSR, target your agent (and, if relevant, one or two of its versions), and run it whenever you change the underlying model, the system prompt, or the tool set. The benchmark score becomes a stable yardstick across changes that would otherwise be hard to compare. From an agent’s detail page in Microsoft Foundry, the Evaluation tab opens the same wizard scoped to that agent.&lt;/P&gt;
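&lt;P&gt;To turn that yardstick into a pass/fail signal, you need a tolerance larger than the score change caused by a single example. On a 30-example benchmark such as AIME 2025, one example moves the score by about 3.3 points; on GPQA Diamond (198 examples), by about 0.5. The Python sketch below encodes that idea. The function names, the scores, and the "two examples' worth" rule of thumb are hypothetical choices, not Foundry settings, and a fixed tolerance ignores run-to-run variance, so repeating a run is a more robust check for nondeterministic targets.&lt;/P&gt;

```python
def points_per_example(total: int) -> float:
    """Percentage points that a single example is worth on a benchmark of this size."""
    return 100 / total


def check_regression(baseline_score: float, new_score: float, tolerance_pts: float) -> str:
    """Classify a score change (in percentage points) against a fixed tolerance."""
    change = new_score - baseline_score
    if -change > tolerance_pts:
        return "regressed"
    if change > tolerance_pts:
        return "improved"
    return "within tolerance"


# Hypothetical rule of thumb: tolerate a swing of about two examples.
tolerance = 2 * points_per_example(198)  # GPQA Diamond
print(round(tolerance, 2))
# Placeholder scores for an agent before and after a system-prompt change.
print(check_regression(baseline_score=75.8, new_score=71.2, tolerance_pts=tolerance))
```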
&lt;H2&gt;Running a benchmark in the portal&lt;/H2&gt;
&lt;P&gt;From your Foundry project’s Build experience, open Evaluations and choose Create. The wizard walks through three short steps.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Target — pick Model or Agent, then select one or more deployments or agent versions.&lt;/LI&gt;
&lt;LI&gt;Data — choose Benchmarks as the dataset source, pick the benchmarks you want, and select a Judge model if the benchmark requires one.&lt;/LI&gt;
&lt;LI&gt;Review — confirm targets, datasets, and evaluators, then submit.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Because the benchmark owns its dataset, prompts, and evaluator, the wizard skips the dataset upload, prompt setup, and most evaluator configuration that custom evaluations require.&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3222.4/4/03-create-evaluation-model-selected-dark.png" alt="03-create-evaluation-model-selected-dark.png" /&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Figure 2. Start by selecting one or more model deployments (or an agent) as the evaluation target.&lt;/EM&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3222.4/1/09-judge-model-frontierscience-dark.png" alt="09-judge-model-frontierscience-dark.png" /&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Figure 3. In the Data step, choose Benchmarks and pick one or more from the catalog. When you select a benchmark that uses model-based judging (here, FrontierScience), a Judge model dropdown appears for picking the deployment that will grade outputs.&lt;/EM&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3222.4/2/07-review-step-dark.png" alt="07-review-step-dark.png" /&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Figure 4. The Review step summarizes targets, benchmark datasets, and evaluators before submission.&lt;/EM&gt;&lt;/P&gt;
&lt;P class="lia-align-center"&gt;&lt;IMG class="lia-image-align-center" src="https://techcommunity.microsoft.com/t5/s/gxcuf89792/attachments/gxcuf89792/azure-ai-foundry-blog/3222.4/3/08-evaluation-detail-dark.png" alt="08-evaluation-detail-dark.png" /&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Figure 5. A completed evaluation group shows each run’s status, token usage, and benchmark score.&lt;/EM&gt;&lt;/P&gt;
&lt;H2&gt;Reading benchmark results&lt;/H2&gt;
&lt;P&gt;Benchmark results show up at two levels. The evaluation group page lists every run in the group with its target, dataset, status, token usage, and benchmark score, which makes side-by-side comparison straightforward. Open an individual run to see its status, the Raw JSON, the Download results and Download user logs options, and the overall metric breakdown, including token usage and the benchmark score.&lt;/P&gt;
&lt;P&gt;A score can be expressed as a percentage (for example, 82%) and as a pass/total count (for example, 645 / 790). When detailed per-example metrics aren’t available, the overall metric result is still surfaced, and Download results gives you the row-level output for deeper analysis.&lt;/P&gt;
&lt;P&gt;Token usage is reported as a first-class metric. Benchmarks can run hundreds or thousands of examples, and model-based judging multiplies the token cost, so it pays to keep an eye on this column when scaling up.&lt;/P&gt;
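&lt;P&gt;A rough preflight estimate helps decide whether to launch a large suite. The Python sketch below multiplies the example count by per-example token figures. The per-example numbers are assumptions you would take from a single small preflight run, the helper name is hypothetical, and the estimate ignores retries and any service-side overhead.&lt;/P&gt;

```python
def estimate_run_tokens(
    examples: int,
    target_tokens_per_example: int,
    judge_tokens_per_example: int = 0,
    targets: int = 1,
) -> int:
    """Approximate total tokens for one benchmark across one or more targets.

    judge_tokens_per_example stays 0 for regex- or scorer-based benchmarks;
    for model-judged benchmarks, every example also costs a judge call.
    """
    per_example = target_tokens_per_example + judge_tokens_per_example
    return examples * per_example * targets


# Placeholder per-example figures; measure your own with a single preflight run.
print(estimate_run_tokens(790, target_tokens_per_example=600))  # TruthfulQA, one target
print(estimate_run_tokens(160, 1_500, judge_tokens_per_example=1_000, targets=2))  # FrontierScience, judged
```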
&lt;H2&gt;Driving benchmarks from the REST API (preview)&lt;/H2&gt;
&lt;P&gt;Everything in the portal is also available through the Foundry evaluations REST API. You create an evaluation group that pins the benchmark (and judge model, when one is required), then add one run per target you want to evaluate. The example below shows the shape of a group-create request; the full reference — authentication, the judge-model variant, adding runs, error responses, limitations, and troubleshooting — lives on Microsoft Learn.&lt;/P&gt;
&lt;PRE class="lia-code-sample language-http" tabindex="0" contenteditable="false" data-lia-code-value="POST {project-endpoint}/openai/evals?api-version=2025-11-15-preview
Authorization: Bearer {token}
Content-Type: application/json

{
  &amp;quot;name&amp;quot;: &amp;quot;truthfulqa-benchmark-eval-group&amp;quot;,
  &amp;quot;display_name&amp;quot;: &amp;quot;TruthfulQA Benchmark&amp;quot;,
  &amp;quot;data_source_config&amp;quot;: {
    &amp;quot;type&amp;quot;: &amp;quot;azure_ai_source&amp;quot;,
    &amp;quot;scenario&amp;quot;: &amp;quot;benchmark_preview&amp;quot;,
    &amp;quot;benchmark_name&amp;quot;: &amp;quot;builtin.truthful_qa&amp;quot;,
    &amp;quot;benchmark_version&amp;quot;: &amp;quot;3&amp;quot;
  }
}"&gt;&lt;CODE&gt;POST {project-endpoint}/openai/evals?api-version=2025-11-15-preview
Authorization: Bearer {token}
Content-Type: application/json

{
  "name": "truthfulqa-benchmark-eval-group",
  "display_name": "TruthfulQA Benchmark",
  "data_source_config": {
    "type": "azure_ai_source",
    "scenario": "benchmark_preview",
    "benchmark_name": "builtin.truthful_qa",
    "benchmark_version": "3"
  }
}&lt;/CODE&gt;&lt;/PRE&gt;
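&lt;P&gt;If you script this step, the Python sketch below builds the same group-create request with the standard library. It assumes you already have a Microsoft Entra bearer token (for example, from the azure-identity package; token acquisition is out of scope here) and a project endpoint. The helper names and the placeholder endpoint are hypothetical, and the body only mirrors the example above, so check the Microsoft Learn reference for the judge-model variant and error handling.&lt;/P&gt;

```python
import json
import urllib.request

API_VERSION = "2025-11-15-preview"  # pin the preview API version, as in the request above


def build_group_request(
    project_endpoint: str,
    token: str,
    name: str,
    display_name: str,
    benchmark_name: str,
    benchmark_version: str,
) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a benchmark evaluation group."""
    body = {
        "name": name,
        "display_name": display_name,
        "data_source_config": {
            "type": "azure_ai_source",
            "scenario": "benchmark_preview",
            "benchmark_name": benchmark_name,
            "benchmark_version": benchmark_version,
        },
    }
    url = f"{project_endpoint.rstrip('/')}/openai/evals?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (urlopen raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


req = build_group_request(
    "https://example.invalid/api/projects/my-project",  # placeholder endpoint
    "PLACEHOLDER_TOKEN",
    name="truthfulqa-benchmark-eval-group",
    display_name="TruthfulQA Benchmark",
    benchmark_name="builtin.truthful_qa",
    benchmark_version="3",
)
# group = send(req)  # uncomment with a real endpoint and token
```

&lt;P&gt;The &lt;CODE&gt;{evaluation-id}&lt;/CODE&gt; used when adding runs presumably comes from the &lt;CODE&gt;id&lt;/CODE&gt; field of this response; treat that as an assumption and confirm it against the reference.&lt;/P&gt;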
&lt;P&gt;Each evaluation group needs at least one run — the run is what actually evaluates a target model or agent against the group’s benchmark. Add one run per target you want to compare.&lt;/P&gt;
&lt;PRE class="lia-code-sample language-http" tabindex="0" contenteditable="false" data-lia-code-value="POST {project-endpoint}/openai/evals/{evaluation-id}/runs?api-version=2025-11-15-preview
Authorization: Bearer {token}
Content-Type: application/json

{
  &amp;quot;name&amp;quot;: &amp;quot;truthfulqa-run-target-a&amp;quot;,
  &amp;quot;display_name&amp;quot;: &amp;quot;TruthfulQA - {target-deployment-a}&amp;quot;,
  &amp;quot;data_source&amp;quot;: {
    &amp;quot;type&amp;quot;: &amp;quot;azure_ai_benchmark_preview&amp;quot;,
    &amp;quot;target&amp;quot;: {
      &amp;quot;type&amp;quot;: &amp;quot;azure_ai_model&amp;quot;,
      &amp;quot;model&amp;quot;: &amp;quot;{connection-name}/{target-deployment-a}&amp;quot;
    }
  }
}"&gt;&lt;CODE&gt;POST {project-endpoint}/openai/evals/{evaluation-id}/runs?api-version=2025-11-15-preview
Authorization: Bearer {token}
Content-Type: application/json

{
  "name": "truthfulqa-run-target-a",
  "display_name": "TruthfulQA - {target-deployment-a}",
  "data_source": {
    "type": "azure_ai_benchmark_preview",
    "target": {
      "type": "azure_ai_model",
      "model": "{connection-name}/{target-deployment-a}"
    }
  }
}&lt;/CODE&gt;&lt;/PRE&gt;
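&lt;P&gt;Adding one run per target is a natural loop. The Python sketch below builds the run-create request for each model deployment using the same body shape as above. &lt;CODE&gt;my-connection&lt;/CODE&gt;, the deployment names, the evaluation ID, and the run-naming scheme are placeholders. The sketch covers only the model-deployment target shown above; for agent targets, see the Microsoft Learn reference.&lt;/P&gt;

```python
import json
import urllib.request

API_VERSION = "2025-11-15-preview"


def build_run_request(
    project_endpoint: str,
    token: str,
    evaluation_id: str,
    connection_name: str,
    deployment: str,
    run_prefix: str = "truthfulqa",
    display_prefix: str = "TruthfulQA",
) -> urllib.request.Request:
    """Build (but do not send) the POST that adds one model-deployment run to a group."""
    body = {
        "name": f"{run_prefix}-run-{deployment}",
        "display_name": f"{display_prefix} - {deployment}",
        "data_source": {
            "type": "azure_ai_benchmark_preview",
            "target": {
                "type": "azure_ai_model",
                "model": f"{connection_name}/{deployment}",
            },
        },
    }
    url = (
        f"{project_endpoint.rstrip('/')}/openai/evals/{evaluation_id}/runs"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# One run per target in the same evaluation group (all values are placeholders).
run_requests = [
    build_run_request(
        "https://example.invalid/api/projects/my-project",
        "PLACEHOLDER_TOKEN",
        evaluation_id="EVAL_ID",
        connection_name="my-connection",
        deployment=deployment,
    )
    for deployment in ("deployment-a", "deployment-b")
]
```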
&lt;P&gt;See the Microsoft Learn how-to for the full reference: &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/observability/how-to/benchmark-evaluations" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/foundry/observability/how-to/benchmark-evaluations&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;Best practices&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Start with one or two benchmarks to validate your setup before scaling to larger suites.&lt;/LI&gt;
&lt;LI&gt;Pin a stable judge model and a stable benchmark_version when comparing runs over time.&lt;/LI&gt;
&lt;LI&gt;Watch token usage. Large benchmarks and model-based judges add up; preflight a single run before launching a batch.&lt;/LI&gt;
&lt;LI&gt;Download results for failure analysis when an aggregate score doesn’t tell the whole story.&lt;/LI&gt;
&lt;LI&gt;Treat preview features as preview. Pin api-version, validate benchmark identifiers, and review release notes before depending on the API in production.&lt;/LI&gt;
&lt;/UL&gt;
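&lt;P&gt;The pinning advice can be enforced mechanically before two runs are compared. The Python sketch below records the settings that should match (benchmark name, benchmark_version, judge deployment, and api-version) and lists any that differ. The &lt;CODE&gt;RunConfig&lt;/CODE&gt; record is hypothetical, and the second benchmark_version is an invented placeholder.&lt;/P&gt;

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of the settings that must match for runs to be comparable."""
    benchmark_name: str
    benchmark_version: str
    judge_deployment: Optional[str]  # None for benchmarks that need no judge model
    api_version: str


def comparability_issues(a: RunConfig, b: RunConfig) -> list:
    """Return a human-readable list of settings that differ between two runs."""
    issues = []
    for f in fields(RunConfig):
        left, right = getattr(a, f.name), getattr(b, f.name)
        if left != right:
            issues.append(f"{f.name} differs: {left!r} vs {right!r}")
    return issues


earlier = RunConfig("builtin.truthful_qa", "3", None, "2025-11-15-preview")
later = RunConfig("builtin.truthful_qa", "4", None, "2025-11-15-preview")
print(comparability_issues(earlier, later))  # prints ["benchmark_version differs: '3' vs '4'"]
```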
&lt;H2&gt;Final thoughts&lt;/H2&gt;
&lt;P&gt;Reliable AI applications depend on reliable measurement. As models, prompts, fine-tuning datasets, and agents continue to evolve, the teams that move fastest will be the ones that can quickly and credibly answer: “did this change make things better?”&lt;/P&gt;
&lt;P&gt;Benchmarks in Microsoft Foundry bring industry-standard measurement into the same place you build, deploy, and observe your AI systems, through both a streamlined portal flow and a programmable REST API. To try it, open your Foundry project, go to Build &amp;gt; Evaluations, choose Create, and run a small benchmark such as AIME 2025 (30 examples) or GPQA Diamond (198 examples) against a model deployment or agent. Once you have a baseline, add larger suites and additional targets, and let standardized benchmarks become part of how your team ships AI.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Get Started&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Start building in Microsoft Foundry:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://ai.azure.com/" target="_blank" rel="noopener"&gt;ai.azure.com&lt;/A&gt;&lt;BR /&gt;&lt;STRONG&gt;Check out Observability BRK252 session:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://aka.ms/build26-BRK252" target="_blank" rel="noopener"&gt;aka.ms/build26-BRK252&lt;/A&gt;&lt;BR /&gt;&lt;STRONG&gt;Read the docs:&lt;/STRONG&gt;&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/concepts/observability" target="_blank" rel="noopener"&gt;Foundry observability documentation&lt;/A&gt;&lt;BR /&gt;&lt;STRONG&gt;Join the community:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://aka.ms/ai/discord" target="_blank" rel="noopener"&gt;aka.ms/ai/discord&lt;/A&gt;&lt;BR /&gt;&lt;STRONG&gt;Build blog:&lt;/STRONG&gt;&amp;nbsp;&lt;A class="lia-external-url" href="https://devblogs.microsoft.com/foundry/build-2026-from-observability-to-roi-for-ai-agents-on-any-framework/" target="_blank"&gt;From observability to ROI for AI agents on any framework&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2026 16:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/benchmarks-in-microsoft-foundry-preview-standardized-model-and/ba-p/4523711</guid>
      <dc:creator>imatiach</dc:creator>
      <dc:date>2026-06-17T16:00:00Z</dc:date>
    </item>
    <item>
      <title>How to Measure Token Impact of MCP Tool Invocation in Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/how-to-measure-token-impact-of-mcp-tool-invocation-in-microsoft/ba-p/4527338</link>
      <description>&lt;P&gt;Your MCP-enabled agent just ran, and the token counts don't add up. The API says 773 tokens. The portal trace shows 581/141. The trajectory view displays something else entirely. Before you file a bug report, here's what's actually happening—and how to build an evidence pattern that makes enterprise token accounting defensible.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What you'll get:&lt;/STRONG&gt; A reproducible A/B comparison method, visual evidence templates, and a reconciliation pattern that reduces FinOps review cycles.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Quick Reference&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;MCP (Model Context Protocol):&lt;/STRONG&gt; A standard for connecting AI models to external tools and data sources.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Token:&lt;/STRONG&gt; The basic unit of text processing and billing for AI models—roughly 4 characters or three-quarters of a word.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Trajectory view:&lt;/STRONG&gt; Microsoft Foundry's visualization of the step-by-step execution path of an agent.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Token chips:&lt;/STRONG&gt; The inline token count indicators shown in Microsoft Foundry's trace UI.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;devtunnel:&lt;/STRONG&gt; A Microsoft tool for exposing local services to the internet for testing.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H4&gt;The Challenge&lt;/H4&gt;
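&lt;P&gt;A common assumption is that Execute Tool spans should directly expose billed token deltas for the tool call itself. In practice, token accounting in Microsoft Foundry traces typically appears on the model's responses rather than on the tool span, so an Execute Tool span is best read as evidence that a tool ran, not as a record of what the call cost.&lt;/P&gt;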
&lt;P&gt;&lt;STRONG&gt;What typically goes wrong:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Teams treat tool spans as billing boundaries.&lt;/LI&gt;
&lt;LI&gt;Teams compare numbers from different runs as if they were the same transaction.&lt;/LI&gt;
&lt;LI&gt;Teams mix thread token chips, trace table token columns, and API usage fields without reconciliation.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Example scenario:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;An MCP-enabled run shows tool activity and a large total token count in API usage.&lt;/LI&gt;
&lt;LI&gt;A portal traces screenshot shows different Tokens In/Out values.&lt;/LI&gt;
&lt;LI&gt;Reviewers conclude the system is inconsistent, when the data is actually from different response IDs and run contexts.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Deep Dive&lt;/H4&gt;
&lt;P&gt;This validation used a prompt agent in Microsoft Foundry with an inline MCP tool that calls a weather MCP server.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Core components:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Microsoft Foundry prompt agent: weather-mcp-token-test-agent&lt;/LI&gt;
&lt;LI&gt;MCP server: remote endpoint over devtunnel&lt;/LI&gt;
&lt;LI&gt;Evidence sources:
&lt;UL&gt;
&lt;LI&gt;API invocation usage object (input, output, total tokens)&lt;/LI&gt;
&lt;LI&gt;Microsoft Foundry Traces table (Tokens In, Tokens Out)&lt;/LI&gt;
&lt;LI&gt;Trajectory view (Execute Tool and Chat spans)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Observed architecture behavior:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;MCP invocation is visible through &lt;CODE&gt;mcp_list_tools&lt;/CODE&gt; and &lt;CODE&gt;execute_tool&lt;/CODE&gt; spans.&lt;/LI&gt;
&lt;LI&gt;Token accounting appears in the model's response metadata—specifically the usage object that reports input tokens (what you sent) and output tokens (what the model generated).&lt;/LI&gt;
&lt;LI&gt;Tool outputs can increase subsequent model context, which increases turn-level token usage.&lt;/LI&gt;
&lt;/UL&gt;
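&lt;P&gt;Capturing that usage object together with the response ID is what makes later reconciliation possible. The Python sketch below pulls the fields from a response payload that has already been decoded from JSON. It assumes Responses-style field names (&lt;CODE&gt;id&lt;/CODE&gt;, &lt;CODE&gt;usage.input_tokens&lt;/CODE&gt;, &lt;CODE&gt;usage.output_tokens&lt;/CODE&gt;, &lt;CODE&gt;usage.total_tokens&lt;/CODE&gt;), so verify them against your own payloads. The sample uses the MCP-enabled usage values reported in the Solution Approach below with a placeholder response ID; the helper name is hypothetical.&lt;/P&gt;

```python
def usage_record(payload: dict) -> dict:
    """Extract token usage plus the response ID from a decoded response payload."""
    usage = payload.get("usage") or {}
    record = {
        "response_id": payload.get("id"),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
    # Assumed invariant for a single response: total = input + output.
    record["consistent"] = (
        record["total_tokens"] == record["input_tokens"] + record["output_tokens"]
    )
    return record


sample = {
    "id": "resp_PLACEHOLDER",
    "usage": {"input_tokens": 581, "output_tokens": 192, "total_tokens": 773},
}
print(usage_record(sample))
```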
&lt;P&gt;&lt;STRONG&gt;Visual evidence from the run:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Image 1: Agent MCP tool configuration&lt;/P&gt;
&lt;P&gt;Image 2: Traces table with token columns&lt;/P&gt;
&lt;P&gt;Image 3: Traces trajectory dialog&lt;/P&gt;
&lt;H4&gt;Solution Approach&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;Establish two comparison paths explicitly:
&lt;UL&gt;
&lt;LI&gt;Path A: strict API A/B comparison on the same prompt.&lt;/LI&gt;
&lt;LI&gt;Path B: portal trace evidence for invocation and trace-level token columns.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Run MCP-enabled and baseline (no-tool) agents with the same prompt.&lt;/LI&gt;
&lt;LI&gt;Capture API usage values for both runs:
&lt;UL&gt;
&lt;LI&gt;MCP-enabled: input 581, output 192, total 773&lt;/LI&gt;
&lt;LI&gt;Baseline: input 57, output 57, total 114&lt;/LI&gt;
&lt;LI&gt;Delta: +659 total tokens&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Capture portal screenshots for trace-table and trajectory views, then record their row-level token values as separate run evidence:
&lt;UL&gt;
&lt;LI&gt;Example portal rows observed (Tokens In/Tokens Out): 581/141, 581/141, 868/97&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Add a reconciliation statement in the report:
&lt;UL&gt;
&lt;LI&gt;API A/B totals and portal trace rows can represent different response IDs.&lt;/LI&gt;
&lt;LI&gt;They are complementary evidence and should not be forced into one exact row match.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
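&lt;P&gt;The delta in step 3 is simple arithmetic, but computing it per field shows where the extra tokens went. The Python sketch below uses the API usage values above; the &lt;CODE&gt;Usage&lt;/CODE&gt; record and the &lt;CODE&gt;ab_delta&lt;/CODE&gt; helper are hypothetical names.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token usage for one response, as reported by the API usage object."""
    input_tokens: int
    output_tokens: int

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


def ab_delta(mcp: Usage, baseline: Usage) -> dict:
    """Per-field token increase of the MCP-enabled run over the no-tool baseline."""
    return {
        "input_delta": mcp.input_tokens - baseline.input_tokens,
        "output_delta": mcp.output_tokens - baseline.output_tokens,
        "total_delta": mcp.total - baseline.total,
    }


print(ab_delta(Usage(581, 192), Usage(57, 57)))
# prints {'input_delta': 524, 'output_delta': 135, 'total_delta': 659}
```

&lt;P&gt;In this run, 524 of the 659 additional tokens are input tokens, which is consistent with the earlier observation that tool outputs increase subsequent model context.&lt;/P&gt;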
&lt;P&gt;&lt;STRONG&gt;Token variability note:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Absolute token counts can vary across runs due to differences in system instructions, tool payload length, model behavior, and response formatting.&lt;/LI&gt;
&lt;LI&gt;Preserve the same prompt and baseline conditions when producing A/B comparisons.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Key decisions:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use API usage as primary delta proof.&lt;/LI&gt;
&lt;LI&gt;Use traces and trajectory as operational proof of tool invocation and run behavior.&lt;/LI&gt;
&lt;LI&gt;Keep evidence linked by response IDs when possible.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Key Learnings / Best Practices&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Always separate token evidence sources:
&lt;UL&gt;
&lt;LI&gt;API usage fields for strict per-response accounting&lt;/LI&gt;
&lt;LI&gt;Portal trace table for run observability&lt;/LI&gt;
&lt;LI&gt;Trajectory spans for invocation semantics&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Never compare token rows across different response IDs without labeling them as separate runs.&lt;/LI&gt;
&lt;LI&gt;Treat Execute Tool spans as invocation evidence, not standalone billing truth.&lt;/LI&gt;
&lt;LI&gt;Capture both baseline and MCP-enabled runs using the exact same prompt for defensible deltas.&lt;/LI&gt;
&lt;LI&gt;Preserve screenshots and IDs together in the same evidence package.&lt;/LI&gt;
&lt;LI&gt;Add one reconciliation paragraph whenever mixed evidence sources are presented.&lt;/LI&gt;
&lt;LI&gt;For enterprise reporting, prefer clear A/B tables over narrative-only claims.&lt;/LI&gt;
&lt;/UL&gt;
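&lt;P&gt;The labeling rule can be built into the evidence package itself. The Python sketch below ties every token figure to its source and response ID and refuses to report figures from different response IDs as a mismatch. The record type, source labels, and response IDs are hypothetical; the token values are the API and trace examples above.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenEvidence:
    """Hypothetical evidence record: one token figure, its source, and its response ID."""
    source: str  # e.g. "api_usage", "trace_table", "trajectory"
    response_id: str
    tokens_in: int
    tokens_out: int


def reconcile(a: TokenEvidence, b: TokenEvidence) -> str:
    """Produce a one-line reconciliation note for two pieces of token evidence."""
    if a.response_id != b.response_id:
        return (
            f"Separate runs: {a.source} ({a.response_id}) and {b.source} "
            f"({b.response_id}) are different transactions; report them side by side."
        )
    if (a.tokens_in, a.tokens_out) == (b.tokens_in, b.tokens_out):
        return f"Match for {a.response_id}: {a.tokens_in}/{a.tokens_out}."
    return (
        f"Same response {a.response_id}, different figures "
        f"({a.source} {a.tokens_in}/{a.tokens_out}, {b.source} {b.tokens_in}/{b.tokens_out}); "
        "check that both sources report at the same granularity."
    )


api = TokenEvidence("api_usage", "resp_A", 581, 192)
trace = TokenEvidence("trace_table", "resp_B", 581, 141)
print(reconcile(api, trace))
```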
&lt;H4&gt;Conclusion&lt;/H4&gt;
&lt;P&gt;MCP tool invocation in Microsoft Foundry can materially increase turn-level token usage, but the increase must be measured with disciplined evidence handling. In this validation, the API A/B comparison showed a +659 total-token increase (773 versus 114 tokens) for the same prompt when MCP tools were enabled.&lt;/P&gt;
&lt;P&gt;Going forward, enterprise teams should standardize an evidence pattern that combines API usage for accounting with trace/trajectory views for operational transparency.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Recommended next steps:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Engineers:&lt;/STRONG&gt; &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/overview" target="_blank" rel="noopener"&gt;Create a prompt agent in Microsoft Foundry&lt;/A&gt; and run the A/B method on one of your production prompts. Record response IDs with token evidence.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Platform/FinOps owners:&lt;/STRONG&gt; Standardize one token evidence reporting template that separates API usage evidence from portal trace evidence, then connect the results to your existing &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/cost-management-billing/costs/quick-acm-cost-analysis" target="_blank" rel="noopener"&gt;Azure cost analysis&lt;/A&gt; workflow.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Launch post authors:&lt;/STRONG&gt; Reference this article's evidence pattern: include one trace screenshot, one A/B token table, and one reconciliation note to reduce review cycles.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Questions?&lt;/STRONG&gt; Join the discussion in the comments or connect with the Microsoft Foundry community on&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/answers/tags/1587/microsoft-foundry" target="_blank" rel="noopener"&gt;Microsoft Q&amp;amp;A&lt;/A&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;References&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/what-is-foundry?tabs=python" target="_blank" rel="noopener"&gt;Microsoft Foundry documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview?tabs=webapps" target="_blank" rel="noopener"&gt;Azure Monitor Application Insights overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview?tabs=webapps#getting-started" target="_blank" rel="noopener"&gt;OpenTelemetry with Application Insights&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure?tabs=global-standard&amp;amp;pivots=azure-openai" target="_blank" rel="noopener"&gt;Models available in Microsoft Foundry&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/concepts/built-in-evaluators" target="_blank" rel="noopener"&gt;Microsoft Foundry evaluations&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 12 Jun 2026 13:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/how-to-measure-token-impact-of-mcp-tool-invocation-in-microsoft/ba-p/4527338</guid>
      <dc:creator>vrajakishore</dc:creator>
      <dc:date>2026-06-12T13:00:00Z</dc:date>
    </item>
    <item>
      <title>Now in Foundry: Command A+ (W4A4), Chandra OCR 2, and GLM-OCR</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/now-in-foundry-command-a-w4a4-chandra-ocr-2-and-glm-ocr/ba-p/4526875</link>
      <description>&lt;P&gt;We are seeing two distinct trends this week. The first is around how low-bit quantization has developed to the point where large reasoning models can fit on a single accelerator with less quality loss. Second, a new wave of OCR-specialized vision-language models are redefining the accuracy-throughput frontier for document understanding.&lt;/P&gt;
&lt;P&gt;This week we are highlighting three &lt;A class="lia-external-url" href="https://aka.ms/hf/foundry-models" target="_blank"&gt;Hugging Face&lt;/A&gt; models in &lt;A class="lia-external-url" href="https://ai.azure.com/explore/models" target="_blank"&gt;Microsoft Foundry&lt;/A&gt;: &lt;STRONG&gt;Cohere Labs' Command A+ (W4A4)&lt;/STRONG&gt;, a 218B-parameter sparse Mixture-of-Experts (MoE) reasoning model optimized for agentic, multilingual, and reasoning-heavy tasks; &lt;STRONG&gt;Datalab's Chandra OCR 2&lt;/STRONG&gt;, a 5.3B vision-language model that converts images and PDFs to Markdown, HTML, and JSON while preserving layout, with state-of-the-art results on the olmOCR benchmark and coverage of 90+ languages; and &lt;STRONG&gt;Z.ai's GLM-OCR&lt;/STRONG&gt;, a compact 0.9B OCR model (roughly 6× smaller than Chandra OCR 2) built on the GLM-V encoder–decoder architecture. GLM-OCR ranks first on OmniDocBench V1.5 while serving at high concurrency.&lt;/P&gt;
&lt;H1 aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Models of the week&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H1&gt;
&lt;H3 aria-level="1"&gt;&lt;A href="https://ai.azure.com/catalog/models/coherelabs-command-a-plus-05-2026-w4a4" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Cohere Labs: Command A+ (W4A4)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:360,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Model Specs&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Parameters / size: 218B total, 25B active per token&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Context length: 128K input, 64K output&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Primary task: Text generation with vision input, reasoning, and tool use&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Why&amp;nbsp;it's&amp;nbsp;interesting&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Efficient, low-compute deployment&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Command A+ is designed to run on relatively modest hardware for its size while&amp;nbsp;maintaining&amp;nbsp;high performance, using quantization and other optimizations that reduce&amp;nbsp;compute, latency, and cost. Reasoning models are especially sensitive to quantization, however, because errors can accumulate over long decoding sequences. To mitigate this, the quantized student model is post-trained against the full-precision teacher’s output distribution, using fake quantization in the forward pass and straight-through estimators during backpropagation.&amp;nbsp;CohereLabs&amp;nbsp;recommends the W4A4 variant (4-bit weights and 4-bit activations) for its strong speed and latency profile.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Multilingual, multimodal, and reasoning-focused performance gains&lt;/STRONG&gt;:&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Command A+ supports 48 languages (up from 23 previously) and is built for complex reasoning and multimodal tasks, with&amp;nbsp;measurable&amp;nbsp;improvements in document understanding, math reasoning, and enterprise QA workflows.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
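&lt;P&gt;The quantization-aware post-training described above combines two standard ideas. As a rough, pure-Python illustration of the general technique (not Cohere's training code), the sketch below fake-quantizes values to one of 16 signed 4-bit levels and maps them back to floats, and applies a straight-through estimator (STE) that treats the non-differentiable rounding as the identity during backpropagation. The per-tensor max-abs scale and the rule that zeroes gradients for clipped values are assumptions chosen for clarity.&lt;/P&gt;
&lt;PRE&gt;
```python
"""Minimal sketch of 4-bit fake quantization with a straight-through estimator (STE).

Assumptions for illustration only (not Cohere's published recipe): signed 4-bit
integers, a single per-tensor scale taken from the largest magnitude, and an STE
that passes gradients through unchanged for values inside the representable range.
"""

QMIN, QMAX = -8, 7  # signed 4-bit integer range


def scale_for(values: list[float]) -> float:
    """Per-tensor scale that maps the largest magnitude onto QMAX."""
    max_abs = max(abs(v) for v in values)
    return max_abs / QMAX if max_abs > 0 else 1.0


def fake_quantize(values: list[float], scale: float) -> list[float]:
    """Forward pass: round to a 4-bit integer level, clamp, then dequantize back to float."""
    out = []
    for v in values:
        q = max(QMIN, min(QMAX, round(v / scale)))  # Python's round() uses round-half-to-even
        out.append(q * scale)
    return out


def ste_backward(values: list[float], upstream: list[float], scale: float) -> list[float]:
    """Backward pass: treat rounding as the identity, but zero gradients for clipped values."""
    lo, hi = QMIN * scale, QMAX * scale
    return [g if hi >= v >= lo else 0.0 for v, g in zip(values, upstream)]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.12, -0.9]
    s = scale_for(weights)
    print(fake_quantize(weights, s))  # each value snapped to a multiple of s
    print(ste_backward([2.0, 0.3], [1.0, 1.0], 0.1))  # [0.0, 1.0]: the clipped 2.0 gets no gradient
```
&lt;/PRE&gt;
&lt;P&gt;In an actual training loop, an autograd framework would apply these rules to both weight and activation tensors (W4A4 quantizes both), often with finer-grained scales; the list-based version only shows the arithmetic.&lt;/P&gt;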
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Try it&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="lia-embeded-content" contenteditable="false"&gt;&lt;IFRAME src="https://coherelabs-command-a-plus-05-2026.hf.space" width="850" height="450" frameborder="0" sandbox="allow-scripts allow-same-origin allow-forms"&gt;&lt;/IFRAME&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Test this prompt in the&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://huggingface.co/spaces/CohereLabs/command-a-plus-05-2026" target="_blank"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;CohereLabs Hugging Face Space&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;before deploying the model in Foundry:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Sample prompt:&lt;/STRONG&gt; You are Command, a legal AI for multinational contract review with access to the CONTRACT_VAULT_QUERY and POLICY_TEMPLATE_RETRIEVAL tools. Analyze the input clause by first detecting its language and classifying its obligation type, then use CONTRACT_VAULT_QUERY to find comparable {jurisdiction} clauses and POLICY_TEMPLATE_RETRIEVAL to retrieve the relevant policy template. Output structured JSON with the obligation classification, comparative findings, risk assessment, and English recommendations with exact document citations. Include confidence scores, similarity metrics, and a reasoning trace showing each analysis step. Handle Polish and Japanese legal terminology accurately, preserve legal precision, and ensure all citations reference actual source documents. Use chain-of-thought reasoning, stay within 128K tokens, and never hallucinate references; state limitations explicitly when tools fail.&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
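&lt;P&gt;If your Command A+ deployment exposes an OpenAI-compatible chat completions API, the two tools named in the sample prompt could be declared in the request body roughly as follows. This is a sketch under assumptions: it presumes the endpoint accepts the standard tools array with JSON Schema parameters, and the parameter names (clause_text, jurisdiction, obligation_type) and descriptions are hypothetical placeholders rather than a published Cohere or Foundry schema.&lt;/P&gt;
&lt;PRE&gt;
```python
import json

# Hypothetical tool schemas: the parameter names below are illustrative placeholders,
# not a published contract for CONTRACT_VAULT_QUERY or POLICY_TEMPLATE_RETRIEVAL.


def tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Build one function-tool entry in the OpenAI-style chat completions format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


TOOLS = [
    tool(
        "CONTRACT_VAULT_QUERY",
        "Search the contract vault for clauses comparable to the input clause.",
        {"clause_text": {"type": "string"}, "jurisdiction": {"type": "string"}},
        ["clause_text", "jurisdiction"],
    ),
    tool(
        "POLICY_TEMPLATE_RETRIEVAL",
        "Retrieve the internal policy template for a given obligation type.",
        {"obligation_type": {"type": "string"}},
        ["obligation_type"],
    ),
]


def build_request(system_prompt: str, clause: str, max_tokens: int = 4096) -> dict:
    """Assemble a chat completions body that offers both tools and lets the model decide when to call them."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": clause},
        ],
        "tools": TOOLS,
        "tool_choice": "auto",
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    body = build_request("You are Command, a legal AI ...", "Clause text to review goes here.")
    print(json.dumps(body, indent=2))
```
&lt;/PRE&gt;
&lt;P&gt;In the usual OpenAI-style flow, when the reply contains tool_calls instead of text, your application runs the named tool, appends its result as a message with the role "tool", and calls the endpoint again.&lt;/P&gt;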
&lt;H3 aria-level="1"&gt;&lt;A href="https://ai.azure.com/catalog/models/datalab-to-chandra-ocr-2" target="_blank"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Datalab: Chandra OCR 2&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Model Specs&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="6" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Parameters / size: 5.3B&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="6" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Output formats: Markdown, HTML, and JSON&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="6" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Primary task: Document OCR (image-text-to-text)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Why&amp;nbsp;it's&amp;nbsp;interesting&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="3" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;State-of-the-art on the&amp;nbsp;olmOCR&amp;nbsp;benchmark&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;: Chandra OCR 2&amp;nbsp;scored 85.9% on the&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://huggingface.co/datasets/allenai/olmOCR-bench" target="_blank"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;olmOCR Benchmark&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;and achieved a 77.8% multilingual benchmark score (a 12% improvement over Chandra 1).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="3" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Support for 90 world languages&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;: Indic scripts, European languages, and right-to-left languages saw&amp;nbsp;substantial&amp;nbsp;improvements in&amp;nbsp;Datalab’s&amp;nbsp;internal benchmarking. View the full list of languages and the benchmark results here:&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://github.com/datalab-to/chandra/blob/master/FULL_BENCHMARKS.md" target="_blank"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Chandra 2 Language List&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="3" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Better complex layout understanding&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Handles multi-level tables, nested structures, forms, math, and mixed handwriting with structured outputs (HTML/JSON/Markdown + bounding boxes), removing the need for post-OCR layout reconstruction.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Try it&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Build an automated compliance intake pipeline using Chandra OCR 2 for structured extraction across complex, handwritten and form-based documents.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In this scenario,&amp;nbsp;you’re&amp;nbsp;supporting a state election commission processing large volumes of candidate filings&amp;nbsp;submitted&amp;nbsp;as scanned forms or mobile-captured images. These documents often include mixed handwriting quality, checkbox selections, signatures, and structured fields that must be&amp;nbsp;validated&amp;nbsp;for compliance.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Chandra OCR 2 can extract both printed and handwritten fields,&amp;nbsp;identify&amp;nbsp;form structure, and capture key elements such as candidate information, filing details, checkbox states, and signed declarations in a consistent JSON format. This structured output can then be passed into a compliance workflow to&amp;nbsp;validate&amp;nbsp;completeness, detect inconsistencies, and flag filings that require manual review.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This approach helps streamline high-volume intake while improving accuracy and reducing manual processing across complex document types.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Sample prompt:&amp;nbsp;Extract all fields from this filing and return a structured JSON output including form type, candidate name, office sought, district, committee name, treasurer, filing date, checkbox states, and a transcription of the signed declaration. Include bounding boxes for each extracted field.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
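&lt;P&gt;The completeness check in this workflow could start as small as the sketch below, which parses one extraction result, lists missing required fields, missing checkbox states, and a missing declaration, and flags the filing for manual review if anything is wrong. The snake_case field names mirror the sample prompt, but the exact JSON shape Chandra OCR 2 returns depends on how you prompt it, so REQUIRED_FIELDS, the checkboxes object, and the signed_declaration key are all assumptions.&lt;/P&gt;
&lt;PRE&gt;
```python
import json

# Assumed output keys, mirroring the sample prompt; align them with the schema you request.
REQUIRED_FIELDS = [
    "form_type",
    "candidate_name",
    "office_sought",
    "district",
    "committee_name",
    "treasurer",
    "filing_date",
]


def review_filing(raw_json: str) -> dict:
    """Check one extraction result; any issue routes the filing to manual review."""
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError as err:
        return {"needs_manual_review": True, "issues": [f"unparseable output: {err.msg}"]}
    if not isinstance(record, dict):
        return {"needs_manual_review": True, "issues": ["output is not a JSON object"]}

    issues = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"missing field: {field}")

    # Assumed shape: {"checkboxes": {"label": true or false, ...}}
    checkboxes = record.get("checkboxes")
    if not isinstance(checkboxes, dict) or not checkboxes:
        issues.append("no checkbox states extracted")

    if not record.get("signed_declaration"):
        issues.append("signed declaration not transcribed")

    return {"needs_manual_review": bool(issues), "issues": issues}
```
&lt;/PRE&gt;
&lt;P&gt;Filings that come back with needs_manual_review set to True can be routed to a human queue, while clean results move on to the rest of the compliance workflow.&lt;/P&gt;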
&lt;H3&gt;&lt;A href="https://ai.azure.com/catalog/models/zai-org-glm-ocr" target="_blank"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Z.ai: GLM-OCR&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Model Specs&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Parameters / size: 0.9B&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Languages: Chinese, English, French, Spanish, Russian, German, Japanese, Korean&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Primary task: Document OCR (image-text-to-text)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Why&amp;nbsp;it's&amp;nbsp;interesting&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;High accuracy at a compact scale&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;GLM-OCR achieves a score of 94.62 on&amp;nbsp;OmniDocBench&amp;nbsp;V1.5, showing&amp;nbsp;strong performance&amp;nbsp;on tasks such as formula recognition, table extraction, and document parsing—even at sub-1B scale&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Designed for structured document understanding&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;The model performs well across complex document layouts, enabling extraction of tables, forms, and mixed text-image content&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Optimized training for consistency across tasks&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Uses Multi-Token Prediction (MTP) and full-task reinforcement learning to improve stability and accuracy across diverse document types&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="4" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Efficient for real-world deployment&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Its smaller footprint makes it well suited for scalable OCR pipelines where cost, latency, and throughput matter&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
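&lt;P&gt;To make the multi-token prediction idea concrete, here is a generic illustration of how MTP training targets differ from ordinary next-token targets. It is not GLM-OCR's implementation: the function name and the choice of k are hypothetical, and real MTP setups typically add extra prediction heads and loss weighting that the sketch leaves out.&lt;/P&gt;
&lt;PRE&gt;
```python
def mtp_targets(tokens: list[int], k: int) -> list[list[int]]:
    """Generic illustration of multi-token prediction (MTP) targets.

    For each position t, the model is trained to predict the next k tokens
    (t+1 .. t+k) instead of only t+1. Positions near the end get shorter
    target lists because fewer future tokens exist; k == 1 reduces to the
    standard next-token objective.
    """
    if 1 > k:
        raise ValueError("k must be at least 1")
    return [tokens[t + 1 : t + 1 + k] for t in range(len(tokens) - 1)]


print(mtp_targets([10, 11, 12, 13], 2))  # [[11, 12], [12, 13], [13]]
```
&lt;/PRE&gt;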
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Try it&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Build a high-throughput document ingestion pipeline using GLM-OCR for structured extraction across diverse document types.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Imagine&amp;nbsp;you&amp;nbsp;are&amp;nbsp;operating&amp;nbsp;a customer onboarding platform that&amp;nbsp;processes&amp;nbsp;identity documents, invoices, and proof-of-income statements across multiple languages. GLM-OCR can be used to extract key fields—such as names, ID numbers, dates, and addresses—and output them in a consistent structured format for downstream systems.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The model’s compact footprint makes it well suited for scaling high-volume OCR workflows, enabling you to process large batches of documents efficiently while&amp;nbsp;maintaining&amp;nbsp;accuracy across layouts like tables, forms, and mixed text-image content.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Sample prompt:&amp;nbsp;Extract the following fields from this document and return a structured JSON output: full name, ID number, date of birth, address, document type, and expiration date. Ensure all fields match the document exactly, including formatting.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
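&lt;P&gt;For the high-volume onboarding scenario above, one common client-side pattern is to fan documents out to the endpoint through a bounded pool of workers and record failures per document instead of stopping the batch. In this sketch, extract_fields is a hypothetical stand-in for your GLM-OCR deployment call (stubbed so the example runs offline), and the default of 8 workers is an arbitrary starting point to tune against your deployment's throughput limits.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

FIELDS = ["full_name", "id_number", "date_of_birth", "address", "document_type", "expiration_date"]


def extract_fields(path: str) -> str:
    """Hypothetical stand-in for your GLM-OCR deployment call; returns the model's JSON text.

    Replace the body with a real HTTP request; this stub just echoes placeholder values.
    """
    return json.dumps({field: f"{field} from {path}" for field in FIELDS})


def process_batch(paths: list[str], max_workers: int = 8) -> dict[str, dict]:
    """Extract fields from many documents concurrently, recording failures instead of aborting."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_fields, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = {"ok": True, "fields": json.loads(future.result())}
            except Exception as err:  # e.g. HTTP errors or malformed JSON from the model
                results[path] = {"ok": False, "error": repr(err)}
    return results


if __name__ == "__main__":
    for path, result in sorted(process_batch(["id_card.png", "invoice.pdf"]).items()):
        print(path, result["ok"])
```
&lt;/PRE&gt;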
&lt;H2&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Getting started&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Whether you are coming straight from the Hugging Face Hub or are&amp;nbsp;already&amp;nbsp;in&amp;nbsp;Microsoft Foundry, deploying new&amp;nbsp;open models is getting simpler.&amp;nbsp;You can deploy&amp;nbsp;models on&amp;nbsp;Foundry by browsing the Hugging Face collection&amp;nbsp;in the&amp;nbsp;model catalog, or by choosing "Deploy on Microsoft Foundry"&amp;nbsp;on the Hugging Face website, which brings you straight into&amp;nbsp;Foundry&amp;nbsp;with secure, scalable inference already configured. Read the documentation to learn more:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A href="https://huggingface.co/docs/microsoft-azure/en/index" target="_blank"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Read Hugging Face on Azure docs&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A href="https://huggingface.co/docs/microsoft-azure/en/guides/one-click-deployment-foundry" target="_blank"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt; &lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;A style="font-style: normal; font-weight: 400; background-color: rgb(255, 255, 255);" href="https://ai.azure.com/catalog/publishers/hugging%20face,huggingface" target="_blank"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Explore models in Microsoft Foundry&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 10 Jun 2026 22:59:40 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/now-in-foundry-command-a-w4a4-chandra-ocr-2-and-glm-ocr/ba-p/4526875</guid>
      <dc:creator>Osi</dc:creator>
      <dc:date>2026-06-10T22:59:40Z</dc:date>
    </item>
    <item>
      <title>Transitioning from Azure Language Features to Foundry Models</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/transitioning-from-azure-language-features-to-foundry-models/ba-p/4524092</link>
      <description>&lt;P&gt;Azure AI Language in Foundry Tools will begin deprecating eight features, which will ultimately be &lt;STRONG&gt;retired by March 2029&lt;/STRONG&gt; (Entity Linking even sooner, by &lt;EM&gt;September 20, 2028&lt;/EM&gt;):&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Key Phrase Extraction, Entity Linking, Sentiment Analysis &amp;amp; Opinion Mining, Summarization (document &amp;amp; conversation), Conversational Language Understanding (CLU), Custom Question Answering (CQA), Orchestration Workflow, and Custom Text Classification.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Microsoft’s guidance is to implement these functionalities using Microsoft Foundry-based models. This series of recommendation guides is intended to help you get started as you explore the breadth of Foundry-based options at your fingertips. Each standalone guide includes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A brief overview of the Azure AI Language in Foundry Tools feature.&lt;/LI&gt;
&lt;LI&gt;A recommended Foundry model and deployment for a similar solution.&lt;/LI&gt;
&lt;LI&gt;A tutorial to implement the new solution via Foundry’s REST API.&lt;/LI&gt;
&lt;LI&gt;A point-in-time comparison table listing alternate Foundry models you might consider for each feature.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Endpoint conventions used in this guide&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;The REST examples below use the unified Microsoft Foundry inference endpoint, which is the same Azure OpenAI path used for every model (OpenAI, Mistral, Cohere, Phi, and more): https://&amp;lt;your-resource&amp;gt;.services.ai.azure.com/openai/deployments/&amp;lt;your-deployment&amp;gt;/chat/completions?api-version=2024-10-21. Authenticate with header api-key: &amp;lt;YOUR_API_KEY&amp;gt;. &amp;lt;your-deployment&amp;gt; is the deployment name you assigned when adding the model to your Foundry resource.&lt;BR /&gt;&lt;BR /&gt;For keyless (Microsoft Entra ID) authentication, swap the URL to https://&amp;lt;your-resource&amp;gt;.openai.azure.com/openai/v1/chat/completions and use header Authorization: Bearer &amp;lt;ENTRA_TOKEN&amp;gt; (issue the token with scope https://ai.azure.com/.default).&lt;/PRE&gt;
&lt;PRE&gt;The older Azure AI Inference beta route (&amp;lt;endpoint&amp;gt;.&amp;lt;region&amp;gt;.models.ai.azure.com/chat/completions) is being retired on August 26, 2026 — use the unified path above for any new code.&lt;/PRE&gt;
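&lt;P&gt;The two authentication modes differ only in the URL, one header, and where the deployment name goes. The helper below is a standard-library sketch that builds (but does not send) either request from the conventions above; the resource and deployment names are placeholders, it assumes you already hold an Entra ID access token for the scope given above, and passing the deployment name as the model field on the v1 route is an assumption to confirm against the current Azure OpenAI v1 documentation.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
import urllib.request
from typing import Optional

API_VERSION = "2024-10-21"


def build_chat_request(
    resource: str,
    deployment: str,
    messages: list[dict],
    api_key: Optional[str] = None,
    entra_token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for either authentication mode."""
    body: dict = {"messages": messages}
    if api_key:
        # Key-based route: deployment name in the path, api-version in the query string.
        url = (
            f"https://{resource}.services.ai.azure.com/openai/deployments/"
            f"{deployment}/chat/completions?api-version={API_VERSION}"
        )
        headers = {"api-key": api_key}
    elif entra_token:
        # Keyless route; assumption: the v1 path takes the deployment name in the body's "model" field.
        url = f"https://{resource}.openai.azure.com/openai/v1/chat/completions"
        headers = {"Authorization": f"Bearer {entra_token}"}
        body["model"] = deployment
    else:
        raise ValueError("pass either api_key or entra_token")
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
```
&lt;/PRE&gt;
&lt;P&gt;To send the request, pass it to urllib.request.urlopen and read the JSON response; in production code you would typically let the azure-identity package fetch and refresh the token.&lt;/P&gt;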
&lt;P&gt;&lt;STRONG&gt;New: Microsoft’s first-party MAI models in Foundry. &lt;/STRONG&gt;Microsoft’s in-house MAI models are now available in Microsoft Foundry. For the text and language features covered in this guide, the relevant new option is MAI-Thinking-1 — Microsoft’s first reasoning model (sparse Mixture-of-Experts, ~35B active / ~1T total parameters, 256K-token context), introduced at Build 2026 and currently in private preview. It is Chat Completions API-compatible and supports function calling, so it slots into the same call patterns shown below, and it now appears as an alternate option in the comparison tables for the reasoning- and context-heavy features. The MAI media models released in April 2026 — MAI-Transcribe-1 (speech-to-text), MAI-Voice-1 (text-to-speech), and MAI-Image-2 (image generation) — are not covered here, since they address audio and vision rather than the text features being retired.&lt;/P&gt;
&lt;H2&gt;1. Key Phrase Extraction → Foundry with mistral-small-2503 (24B)&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Key Phrase Extraction identifies main ideas or topics in unstructured text. &lt;/STRONG&gt;For example, given the sentence &lt;EM&gt;“The food was delicious and the staff were wonderful.”&lt;/EM&gt;, it returns the phrases “food” and “staff” as the key topics. In Foundry, you can replicate this by prompting a knowledge-dense model to extract significant terms from text and return them in JSON format. We recommend trying out &lt;A href="https://ai.azure.com/catalog/models/mistral-small-2503" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;mistral-small-2503&lt;/STRONG&gt;&lt;/A&gt;, a 24-billion-parameter open model that rivals larger proprietary models on comprehension tasks.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Deploy the Mistral model in Foundry.&lt;/STRONG&gt; In your Foundry project, deploy the Mistral Small 3.1 model (&lt;A href="https://ai.azure.com/catalog/models/mistral-small-2503" target="_blank" rel="noopener"&gt;mistral-small-2503&lt;/A&gt;). This model is licensed for enterprise use and has a 128,000-token context window, so it can process very long documents with state-of-the-art accuracy. Once it is deployed, note its &lt;EM&gt;deployment name&lt;/EM&gt; and the name of your Foundry &lt;EM&gt;resource&lt;/EM&gt;; both appear in the endpoint URL.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Write a prompt for key phrase extraction.&lt;/STRONG&gt; Instruct the model to extract the key phrases and return them as JSON. For example, your prompt can say: &lt;EM&gt;“Extract the key phrases from the given text and output them in JSON as a list.”&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Call the Foundry REST API with your text.&lt;/STRONG&gt; Using your Foundry endpoint and API key, send a POST request to the model’s chat completions API (much as you previously called the Azure AI Language API). Include a system message with the extraction instructions and a user message containing your text. For example:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;POST https://&amp;lt;your-resource&amp;gt;.services.ai.azure.com/openai/deployments/&amp;lt;your-deployment&amp;gt;/chat/completions?api-version=2024-10-21&lt;BR /&gt;api-key: &amp;lt;YOUR_API_KEY&amp;gt;&lt;BR /&gt;Content-Type: application/json&lt;BR /&gt;&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"messages": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"role": "system",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"content": "Extract the key phrases (main topics) from the given text. Respond in JSON as {\"keyPhrases\": [ ... ]}."&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;},&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"role": "user",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"content": "Text: \"The food was delicious and there were wonderful staff.\"\nTask: What are the key phrases?"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR /&gt;&amp;nbsp;&amp;nbsp;]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
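&lt;P&gt;If you call the endpoint from Python, you can assemble the request above with the standard library alone. This is a minimal sketch, not an official SDK sample. The resource URL, deployment name, and key are hypothetical placeholders, and the path and api-version are copied from the example above, so confirm them against your own deployment’s details.&lt;/P&gt;

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own resource, deployment, and key.
ENDPOINT = "https://your-resource.services.ai.azure.com"
DEPLOYMENT = "your-deployment"
API_VERSION = "2024-10-21"

SYSTEM_PROMPT = (
    "Extract the key phrases (main topics) from the given text. "
    'Respond in JSON as {"keyPhrases": [ ... ]}.'
)


def build_key_phrase_request(text, api_key):
    """Build (but do not send) the chat completions POST request shown above."""
    url = (
        f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}"
        f"/chat/completions?api-version={API_VERSION}"
    )
    body = {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f'Text: "{text}"\nTask: What are the key phrases?'},
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_key_phrase_request(
        "The food was delicious and there were wonderful staff.", "YOUR_API_KEY"
    )
    # Printing only; urllib.request.urlopen(req) would actually send the request.
    print(req.get_method(), req.full_url)
```

&lt;P&gt;The function only builds the request; passing the result to urllib.request.urlopen sends it. In production you would more likely use an official client library, but the raw request makes the wire format explicit.&lt;/P&gt;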
&lt;P&gt;&lt;STRONG&gt;Step 4: Receive the JSON output.&lt;/STRONG&gt; The model returns the key phrases as JSON in the message content of its response. For this example, the content should look similar to this:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"keyPhrases": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"food",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"staff"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Spelling out the output schema in your prompt keeps the response a predictable list of extracted key phrases. Your application can then read them from the "keyPhrases" array.&lt;/P&gt;
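&lt;P&gt;One detail is easy to miss. In a chat completions response, the model’s JSON arrives as a string in choices[0].message.content, so your code has to decode it a second time. The Python sketch below shows one defensive way to do that. The fence-stripping step guards against an occasional model habit of wrapping JSON in Markdown; the service does not guarantee either behavior.&lt;/P&gt;

```python
import json

FENCE = "`" * 3  # a Markdown code fence, which some models wrap around JSON


def extract_key_phrases(response):
    """Return the keyPhrases list from a parsed chat completions response.

    The model's JSON is a string inside choices[0].message.content,
    so it is decoded here a second time.
    """
    content = response["choices"][0]["message"]["content"].strip()
    # Strip an optional Markdown fence, with or without a "json" language tag.
    if content.startswith(FENCE):
        content = content[len(FENCE):]
        if content.startswith("json"):
            content = content[len("json"):]
        content = content.rsplit(FENCE, 1)[0]
    data = json.loads(content)
    phrases = data.get("keyPhrases")
    if not isinstance(phrases, list) or not all(isinstance(p, str) for p in phrases):
        raise ValueError("response does not contain a keyPhrases list of strings")
    return phrases
```

&lt;P&gt;Raising an error on a malformed reply, rather than silently returning an empty list, makes prompt regressions visible. If your deployment supports a JSON response mode, enabling it can reduce these failures further.&lt;/P&gt;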
&lt;P&gt;&lt;STRONG&gt;Other recommended models for Key Phrase Extraction:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rationale (When to Consider)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/Phi-3.5-mini-instruct" target="_blank" rel="noopener"&gt;phi-3.5-mini-instruct&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;A 3.8B-parameter LLM from Microsoft’s Phi family that offers faster inference. Use it if you need low latency or an on-premises deployment. It is smaller, but it was trained on high-quality data, and Microsoft reports that it outperforms other models of similar size. It is effective for straightforward key phrase extraction, though less nuanced than larger models.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/claude-sonnet-4-6" target="_blank" rel="noopener"&gt;claude-sonnet-4-6&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Anthropic’s advanced model with multilingual skills and extremely long context (up to 200K tokens). Choose this if summarizing or extracting topics from very large or complex documents across many languages is required, and top-tier quality is worth the higher cost.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;2. Entity Linking → Foundry with gpt-4o + Retrieval&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Azure AI Entity Linking identifies named entities in text (people, places, organizations, etc.) and links each to a reference ID or entry.&lt;/STRONG&gt; Azure AI’s stand-alone Entity Linking will be retired. Microsoft’s officially recommended replacement is the Named Entity Recognition (NER) feature of Azure AI Language in Foundry Tools, which remains supported and identifies the same entity types. If you need URL-linked entities, multilingual disambiguation, or richer reasoning than NER’s tag set provides, Foundry offers an alternative that combines knowledge retrieval with a generative model. One option is &lt;A href="https://ai.azure.com/catalog/models/gpt-4o-mini" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;gpt-4o&lt;/STRONG&gt;&lt;/A&gt;, which handles reasoning and complex instructions well. By pairing it with your enterprise or public knowledge sources, you can link entities to references you trust.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Set up an entity knowledge base.&lt;/STRONG&gt; Prepare the reference data for your entities. For public knowledge, you might use Wikipedia; for internal terms, index the definitions using Azure AI Search. Your application retrieves factual descriptions from this knowledge base and passes them to the model as context for each candidate entity.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Deploy gpt-4o in Foundry.&lt;/STRONG&gt; In your Foundry resource, deploy &lt;A href="https://ai.azure.com/catalog/models/gpt-4o-mini" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;gpt-4o&lt;/STRONG&gt;&lt;/A&gt; from the Foundry model catalog (or choose an equivalent high-accuracy model). The model is built for broad reasoning and “agentic” tasks, meaning it can use tools and contextual data to solve problems. Its general knowledge and reasoning make it well suited to disambiguating entities.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Implement retrieval + linking prompt.&lt;/STRONG&gt; When processing text, first retrieve relevant information about each potential entity. For instance, if the text says &lt;EM&gt;“Jane moved to Paris to work at Microsoft.”&lt;/EM&gt;, search your knowledge base or call an API to get info on &lt;EM&gt;“Paris”&lt;/EM&gt; and &lt;EM&gt;“Microsoft”&lt;/EM&gt;. Then, craft a prompt that provides this context to the model and asks it to output a JSON mapping of entities to their IDs/URLs. For example:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"messages": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{"role": "system", "content":&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"You are an entity linking assistant. Identify entities in text and link each to its official reference ID and URL. Use provided context as needed and respond with JSON."},&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{"role": "user", "content":&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"Text: 'Jane moved to Paris to work at Microsoft.'\n\nContext:\nParis – City in France (ID: Paris, URL: https://en.wikipedia.org/wiki/Paris).\nMicrosoft – Technology company (ID: Microsoft, URL: https://en.wikipedia.org/wiki/Microsoft).\n\nTask: List each entity with its name, id, and url in JSON format as {\"entities\": [...]}."&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR /&gt;&amp;nbsp;&amp;nbsp;]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
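&lt;P&gt;The retrieval and prompt-assembly step can be sketched in Python as follows. The dictionary is a hypothetical stand-in for your real knowledge base, which in practice would be a query against an Azure AI Search index or another lookup service. The word-match retrieval is deliberately naive; the point is the shape of the messages, which mirrors the request above.&lt;/P&gt;

```python
# Hypothetical in-memory stand-in for a knowledge base such as an
# Azure AI Search index. Keys are lowercase surface forms.
KNOWLEDGE_BASE = {
    "paris": {"description": "City in France", "id": "Paris",
              "url": "https://en.wikipedia.org/wiki/Paris"},
    "microsoft": {"description": "Technology company", "id": "Microsoft",
                  "url": "https://en.wikipedia.org/wiki/Microsoft"},
}

SYSTEM_PROMPT = (
    "You are an entity linking assistant. Identify entities in text and link each "
    "to its official reference ID and URL. Use provided context as needed and "
    "respond with JSON."
)


def retrieve_context(text):
    """Naive retrieval: return entries whose key appears as a word in the text."""
    words = {w.strip(".,;:!?'\"").lower() for w in text.split()}
    return {key: entry for key, entry in KNOWLEDGE_BASE.items() if key in words}


def build_linking_messages(text):
    """Assemble system and user messages in the format of the request above."""
    lines = [
        f"{entry['id']} - {entry['description']} (ID: {entry['id']}, URL: {entry['url']})."
        for entry in retrieve_context(text).values()
    ]
    context = "\n".join(lines)
    task = 'Task: List each entity with its name, id, and url in JSON format as {"entities": [...]}.'
    user = f"Text: '{text}'\n\nContext:\n{context}\n\n{task}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]
```

&lt;P&gt;Keeping retrieval in your own code, rather than relying on the model’s built-in knowledge, means every ID and URL offered to the model comes from a source you control.&lt;/P&gt;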
&lt;P&gt;&lt;STRONG&gt;Step 4: Get the JSON linking output.&lt;/STRONG&gt; Grounded by the context you provided, gpt-4o returns a JSON block such as:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"entities": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"name": "Paris",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"id": "Paris",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"url": "https://en.wikipedia.org/wiki/Paris"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;},&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"name": "Microsoft",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"id": "Microsoft",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"url": "https://en.wikipedia.org/wiki/Microsoft"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR /&gt;&amp;nbsp;&amp;nbsp;]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
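&lt;P&gt;Even with grounding, a generative model can still produce a plausible-looking ID or URL that was not in the supplied context, so it is worth checking the output before storing it. Below is a minimal Python sketch. It assumes you keep the supplied context as a dictionary of entries with id and url fields; that structure is chosen for illustration.&lt;/P&gt;

```python
import json


def validate_links(model_output, context):
    """Keep only entities whose (id, url) pair was actually supplied as context.

    model_output: the JSON string returned by the model.
    context: mapping of knowledge-base entries, each with "id" and "url" keys.
    Returns (accepted, rejected) lists of entity dicts.
    """
    allowed = {(entry["id"], entry["url"]) for entry in context.values()}
    accepted, rejected = [], []
    for entity in json.loads(model_output).get("entities", []):
        key = (entity.get("id"), entity.get("url"))
        (accepted if key in allowed else rejected).append(entity)
    return accepted, rejected
```

&lt;P&gt;You can log rejected entities for review or send them back through retrieval, rather than silently linking them to an unverified reference.&lt;/P&gt;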
&lt;P&gt;&lt;STRONG&gt;Other recommended models for Entity Linking:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rationale (When to Consider)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/claude-sonnet-4-6" target="_blank" rel="noopener"&gt;claude-sonnet-4-6&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Anthropic’s advanced model, with broad world knowledge and a large context window (up to 200K tokens). Useful if you want a non-OpenAI option with strong multilingual understanding that can process very long texts for entity linking. Evaluate it against gpt-4o on your own data, since the two differ in output style and safety behavior.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/Phi-3.5-mini-instruct" target="_blank" rel="noopener"&gt;phi-3.5-mini&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;A 3.8B-parameter open model developed by Microsoft. If you need a self-hosted solution, consider fine-tuning it on your specific entity domain. Paired with a high-quality search index, it can perform domain-specific entity linking at a fraction of the cost, which suits smaller knowledge bases and offline use.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://microsoft.ai/news/introducing-mai-thinking-1/" target="_blank" rel="noopener"&gt;MAI-Thinking-1&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Microsoft’s first in-house reasoning model (sparse Mixture-of-Experts, ~35B active parameters, 256K-token context), in private preview on Microsoft Foundry. Its step-by-step reasoning suits disambiguating entities over retrieved context, and it supports function calling for tool-based knowledge lookups. It was trained only on commercially licensed data, with no third-party distillation, which helps with provenance in regulated scenarios. Requires preview access approval.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;3. Sentiment Analysis &amp;amp; Opinion Mining → Foundry with phi-3.5-mini (3.8B)&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Azure AI Sentiment Analysis evaluates whether text expresses a positive, neutral, or negative tone, while Opinion Mining provides granular sentiment on specific aspects or targets in the text.&lt;/STRONG&gt; For example, in the hotel review &lt;EM&gt;“The room was great, but the staff was unfriendly,”&lt;/EM&gt; the overall sentiment would be mixed, with aspect-level results: &lt;EM&gt;room: positive; staff: negative&lt;/EM&gt;. We suggest trying a small model such as the &lt;A href="https://ai.azure.com/catalog/models/Phi-3.5-mini-instruct" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Phi-3.5-mini model&lt;/STRONG&gt;&lt;/A&gt;, which supports a 128K-token context and is instruction-tuned by Microsoft. For sentiment classification it is fast and can be competitive with much larger models.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Deploy the Phi model.&lt;/STRONG&gt; In the Foundry portal model catalog, search for &lt;A href="https://ai.azure.com/catalog/models/Phi-3.5-mini-instruct" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Phi-3.5-mini-instruct&lt;/STRONG&gt;&lt;/A&gt;&lt;STRONG&gt;&lt;U&gt; (128k)&lt;/U&gt;&lt;/STRONG&gt; and deploy it. Once deployed, record the deployment name and have your API key ready for the following steps.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Prepare your sentiment analysis prompt.&lt;/STRONG&gt; Write a prompt that instructs the model to produce sentiment analysis output as JSON. Use a prompt like: &lt;EM&gt;"Analyze the following text. Provide an overall 'sentiment' (positive, neutral, negative, or mixed) and, if applicable, list 'opinions' on specific aspects with their sentiment. Respond in JSON."&lt;/EM&gt; This asks explicitly for an overall sentiment and for the aspect-level opinions behind it.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Send the text to the model’s REST endpoint.&lt;/STRONG&gt; Using your Foundry endpoint and key, call the phi-3.5-mini chat completions API with the text. For example, send this request body:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"messages": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{"role": "system", "content":&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"Analyze the sentiment. Output JSON with an overall 'sentiment' (positive, neutral, negative, or mixed), 'confidenceScores' for positive, neutral, and negative, and 'opinions' listing each aspect mentioned with its sentiment."},&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{"role": "user", "content":&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"Review: 'The room was great, but the staff was unfriendly and rude.'"}&lt;BR /&gt;&amp;nbsp;&amp;nbsp;]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4: Review the output JSON. &lt;/STRONG&gt;With a prompt tuned to your use case, the model can return an analysis similar in shape to the Azure service output:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"sentiment": "mixed",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"confidenceScores": {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"positive": 0.5,&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"neutral": 0.0,&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"negative": 0.5&lt;BR /&gt;&amp;nbsp;&amp;nbsp;},&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"opinions": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"aspect": "room",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"sentiment": "positive"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;},&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"aspect": "staff",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"sentiment": "negative"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR /&gt;&amp;nbsp;&amp;nbsp;]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
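&lt;P&gt;In this example, the Phi model’s output is concise and assigns the expected sentiment to each aspect. You can adjust the prompt to request more or fewer details (for example, whether to include confidenceScores). Scores that the model writes into its JSON are generated text, not classifier probabilities computed the way the Azure service computed them. Validate them before you use them as thresholds.&lt;/P&gt;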
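&lt;P&gt;Downstream code usually branches on these labels, so it helps to reject anything outside the expected vocabulary. The Python sketch below assumes the JSON shape shown above and the four document-level labels used by the Azure service (positive, neutral, negative, mixed). Adjust the label set if your prompt defines a different one.&lt;/P&gt;

```python
import json

# Document-level labels used by Azure AI Language sentiment analysis.
LABELS = {"positive", "neutral", "negative", "mixed"}


def parse_sentiment(model_output):
    """Parse the model's sentiment JSON and reject labels outside LABELS.

    Returns (overall_label, {aspect: label}). Raises ValueError on unexpected
    labels so changes in the model's wording surface as errors.
    """
    data = json.loads(model_output)
    overall = data.get("sentiment")
    if overall not in LABELS:
        raise ValueError(f"unexpected overall sentiment: {overall!r}")
    aspects = {}
    for opinion in data.get("opinions", []):
        label = opinion.get("sentiment")
        if label not in LABELS:
            raise ValueError(f"unexpected sentiment for {opinion.get('aspect')!r}: {label!r}")
        aspects[opinion["aspect"]] = label
    return overall, aspects
```

&lt;P&gt;Failing loudly here is deliberate. A label such as "somewhat positive" would otherwise fall through whatever branching logic your application uses.&lt;/P&gt;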
&lt;P&gt;&lt;STRONG&gt;Other recommended models for Sentiment:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rationale (When to Consider)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/mistral-small-2503" target="_blank" rel="noopener"&gt;mistral-small-2503&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;A larger (24B) model that may catch subtle nuances in sentiment. Its extra capacity can help with very complex or lengthy documents that contain mixed sentiment signals. Use it if you need higher accuracy for multi-paragraph analysis or multilingual sentiment detection.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/claude-sonnet-4-6" target="_blank" rel="noopener"&gt;claude-sonnet-4-6&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Anthropic’s advanced model with a deep understanding of context and tone. It’s especially capable for long-form text (e.g., multi-page reviews or chat transcripts) due to its large context window. Consider this if you require highly detailed sentiment reasoning and summarization in one go (with higher compute cost).&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;4. Text &amp;amp; Conversation Summarization → Foundry with claude-sonnet-4-6&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Azure AI Summarization automatically produces a concise summary of input text.&lt;/STRONG&gt; Azure AI’s summarization includes &lt;EM&gt;extractive&lt;/EM&gt; (selecting key sentences) and &lt;EM&gt;abstractive&lt;/EM&gt; (generating new summary text) modes for documents and conversations. To continue summarizing content, we recommend &lt;A href="https://ai.azure.com/catalog/models/claude-sonnet-4-6" target="_blank"&gt;&lt;STRONG&gt;Claude Sonnet 4.6&lt;/STRONG&gt;&lt;/A&gt;, known for its strength in summarization and very long context support. Claude can ingest large documents or transcripts and generate coherent, human-like summaries.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Deploy the Claude model.&lt;/STRONG&gt; In Foundry, deploy &lt;A href="https://ai.azure.com/catalog/models/claude-sonnet-4-6" target="_blank"&gt;&lt;STRONG&gt;Claude Sonnet 4.6&lt;/STRONG&gt;&lt;/A&gt; and note the deployment name for use in the API call.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Create a summarization prompt.&lt;/STRONG&gt; Decide on the summary style (short paragraph, bullet points, etc.). For example: &lt;EM&gt;“Summarize the following document in 2-3 sentences.”&lt;/EM&gt; For conversation transcripts, you might ask: &lt;EM&gt;“Summarize the conversation below, highlighting key issues and resolutions.”&lt;/EM&gt; Also specify that the response should be JSON (e.g., {"summary": "..."}) so your code can parse the result reliably.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Call Claude with your content.&lt;/STRONG&gt; Use the Foundry API to send the text and prompt. For example, to summarize a document:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"messages": [&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{"role": "system", "content": "You summarize texts. Output JSON with a 'summary' field."},&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{"role": "user", "content": "Document: At any given moment, turnaround coordinators at Lufthansa CityLine monitor video feeds of airplanes at gates, ensuring each step of unloading and servicing planes happens on time. Even minutes of delay can add up, costing airlines millions of dollars a year. ...\n\nTask: Summarize the main point of this document in 2-3 sentences."}&lt;BR /&gt;&amp;nbsp;&amp;nbsp;]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4: Inspect the summary output.&lt;/STRONG&gt; Claude will reply with a short summary, for example:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"summary": "Lufthansa CityLine uses real-time video monitoring to streamline aircraft turnarounds. By tracking tasks like unloading, refueling, and cleaning, they minimize delays and save significant costs."&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Claude produces high-quality abstractive summaries. To approximate extractive summarization, instruct it to return only sentences copied verbatim from the source. Because Claude Sonnet 4.6 accepts up to 200K tokens of context, you can often send long documents or transcripts in a single request instead of splitting them.&lt;/P&gt;
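&lt;P&gt;Very long inputs still have to fit within the context window, together with the prompt and the generated summary. The Python sketch below decides whether to split before sending a request. It uses the common rule of thumb of roughly four characters per English token, which is only an approximation; count tokens with the model’s own tooling when you need precision. The headroom value is a hypothetical choice.&lt;/P&gt;

```python
# Rough rule of thumb for English text; real counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 4
CONTEXT_TOKENS = 200_000  # context window cited for Claude Sonnet 4.6 in this article
RESERVED_TOKENS = 8_000   # hypothetical headroom for instructions and the summary
MAX_CHARS = (CONTEXT_TOKENS - RESERVED_TOKENS) * CHARS_PER_TOKEN


def plan_chunks(document, max_chars=MAX_CHARS):
    """Return [document] if it likely fits in one request, else paragraph-based chunks.

    A single paragraph longer than max_chars is kept whole, so callers that need
    a hard limit should split such paragraphs further.
    """
    if max_chars >= len(document):
        return [document]
    chunks, current = [], ""
    for paragraph in document.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) > max_chars and current:
            chunks.append(current)  # close the chunk before it overflows
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

&lt;P&gt;If a document does need splitting, one common pattern is to summarize each chunk and then summarize the combined chunk summaries.&lt;/P&gt;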
&lt;P&gt;&lt;STRONG&gt;Conversation summarization: per-aspect starter prompts&lt;/STRONG&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The example above covers plain document summarization. If you are migrating the Conversation Summarization API, note that it let you request one or more summaryAspects: issue, resolution, recap, chapterTitle, and narrative — each with a distinct purpose, so a single generic summary prompt does not reproduce them. In Foundry, reproduce each aspect with its own targeted system prompt on claude-sonnet-4-6, sending the speaker-labeled transcript (for example, Agent: / Customer:) as the user message. The output field names below mirror the Azure API (summaries: [{aspect, text}]) so your integration code barely changes. Request several aspects in parallel calls, or combine them into a single call.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt; Issue — the customer’s problem (call-center focused)&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;{"role":"system","content":"You replicate the Azure AI Language conversation summarization 'issue' aspect. Read the agent/customer transcript and summarize ONLY the customer's primary problem — what they contacted support about — from the customer's perspective. Do NOT include any resolution or troubleshooting steps. Respond ONLY with JSON: {\"summaries\":[{\"aspect\":\"issue\",\"text\":\"&amp;lt;concise issue&amp;gt;\"}]}"}&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Example output:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;{"summaries":[{"aspect":"issue","text":"Customer could not set up the Wi-Fi connection for their Smart Brew 300 espresso machine."}]}&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt; Resolution — what fixed it, or didn’t (call-center focused)&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;{"role":"system","content":"You replicate the Azure AI Language conversation summarization 'resolution' aspect. Summarize ONLY the actions the agent took to resolve the customer's issue and the final outcome, including any steps that failed or remain open. Do NOT restate the issue itself. Respond ONLY with JSON: {\"summaries\":[{\"aspect\":\"resolution\",\"text\":\"&amp;lt;concise resolution&amp;gt;\"}]}"}&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Example output:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;{"summaries":[{"aspect":"resolution","text":"The agent had the customer reset the Wi-Fi button and attempt a factory reset; neither restored the connection, so the issue was escalated for further troubleshooting."}]}&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt; Recap — a brief one-paragraph summary of the whole conversation&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;{"role":"system","content":"You replicate the Azure AI Language conversation summarization 'recap' aspect. Condense the entire conversation into a single brief paragraph (2–4 sentences) capturing the participants, the main topic, and the outcome. Respond ONLY with JSON: {\"summaries\":[{\"aspect\":\"recap\",\"text\":\"&amp;lt;brief recap&amp;gt;\"}]}"}&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;OL start="4"&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt; Chapter title — topic-based segmentation with titles&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;{"role":"system","content":"You replicate the Azure AI Language conversation summarization 'chapterTitle' aspect. Segment the conversation into sequential chapters by topic, in chronological order, and give each a short title (max ~10 words). For each chapter, also return the verbatim opening line so offsets can be recomputed client-side. Respond ONLY with JSON: {\"summaries\":[{\"aspect\":\"chapterTitle\",\"text\":\"&amp;lt;chapter title&amp;gt;\",\"startsAt\":\"&amp;lt;first utterance of the chapter, verbatim&amp;gt;\"}]}"}&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Azure returned character offsets per chapter. Because an LLM cannot reliably count characters, this prompt returns the verbatim opening line of each chapter; map it back to an offset with a string search on the client side.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL start="5"&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt; Narrative — detailed call or meeting notes&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;&lt;SPAN data-contrast="none"&gt;{"role":"system","content":"You replicate the Azure AI Language conversation summarization 'narrative' aspect. Produce detailed call notes / meeting notes for the conversation — more thorough than a recap — in coherent prose or structured notes, preserving the order events occurred and any decisions or follow-ups. Respond ONLY with JSON: {\"summaries\":[{\"aspect\":\"narrative\",\"text\":\"&amp;lt;detailed notes&amp;gt;\"}]}"}&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Migration note: where you previously passed summaryAspects: ["issue","resolution"] in a single call, now select the matching prompt(s) above. For multiple aspects, make parallel calls (cleanest — keeps each output focused) or combine the instructions and ask for one summaries array containing every requested aspect.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Other recommended models for Summarization:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rationale (When to Consider)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/gpt-4o-mini" target="_blank"&gt;gpt-4o&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;A general-purpose OpenAI model that writes concise, accurate summaries and often captures nuance. Consider it if you prefer OpenAI’s style or need highly polished results; it is slower and more expensive than smaller models such as gpt-4o-mini.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/mistral-small-2503" target="_blank"&gt;mistral-small-2503&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;A strong open-weight model (24B parameters) that often matches larger LLMs on NLP tasks. It is a cost-effective summarizer for moderate-length documents and a reasonable base if you plan to fine-tune a summarization model. Its Apache 2.0 license also lets you self-host the weights if you want to avoid depending on an external API.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://microsoft.ai/news/introducing-mai-thinking-1/" target="_blank"&gt;MAI-Thinking-1&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Microsoft’s first-party reasoning model, in private preview on Microsoft Foundry. Its 256K-token context window can ingest very long documents or transcripts in a single pass (Microsoft cites roughly a 600-page document), making it a first-party option for abstractive summarization where multi-step reasoning over the whole text matters. Chat Completions API-compatible, so it uses the same call pattern shown above. Requires preview access approval.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;5. Conversational Language Understanding (CLU) → Foundry (claude-sonnet-4-6, or fine-tuned model)&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Azure AI’s Conversational Language Understanding (CLU) allows chatbots to interpret user utterances by predicting a top intent and extracting key entities.&lt;/STRONG&gt; For example, given &lt;EM&gt;“Book a flight to Paris tomorrow”&lt;/EM&gt;, a custom CLU model might return Intent: &lt;EM&gt;BookFlight&lt;/EM&gt; and Entities like &lt;EM&gt;Destination = Paris&lt;/EM&gt;, &lt;EM&gt;Date = tomorrow&lt;/EM&gt;. Foundry offers two solutions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Approach A – Prompt-based CLU with an LLM:&lt;/STRONG&gt; Use a single advanced model (like &lt;A href="https://ai.azure.com/catalog/models/claude-sonnet-4-6" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Claude Sonnet 4.6&lt;/STRONG&gt;&lt;/A&gt;) to classify intents &lt;EM&gt;zero-shot&lt;/EM&gt; and format a JSON response with the predicted intent and entities. This avoids any training and leverages the model’s high generalization ability.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Approach B – Fine-tuned custom model:&lt;/STRONG&gt; Fine-tune an open model (e.g., &lt;A href="https://ai.azure.com/catalog/models/Ministral-3B" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Ministral 3B&lt;/STRONG&gt;&lt;/A&gt;) on your existing labeled utterances to get a dedicated classifier for your scenario. This requires training (as CLU does today) but typically gives the highest accuracy on your domain, plus more deployment options, including offline hosting where the model license allows it.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Approach A: Use claude-sonnet-4-6 as an Intent &amp;amp; Entity Model&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Gather CLU schema and examples.&lt;/STRONG&gt; Assemble your existing list of intents (and what each means), plus any entity types. Include a &lt;EM&gt;None&lt;/EM&gt; intent for irrelevant utterances as CLU does. For example, suppose your bot’s intents are: &lt;EM&gt;GetWeather&lt;/EM&gt; (ask about weather), &lt;EM&gt;BookFlight&lt;/EM&gt; (book a flight), &lt;EM&gt;CancelBooking&lt;/EM&gt; (cancel a booking), and &lt;EM&gt;None&lt;/EM&gt; (no match).&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Build a detailed prompt.&lt;/STRONG&gt; Incorporate the above schema into a prompt for the model. For example, you might use a system message like:&lt;/P&gt;
&lt;PRE&gt;You are a conversational NLU classifier that mimics Azure Conversational Language Understanding (CLU). Your job is to read the user's text and return a CLU-like prediction JSON for intents and entities.

TASK
1) Identify the single best matching intent from the ALLOWED INTENTS list below.
2) Extract zero or more entities from the text using the ALLOWED ENTITY TYPES list below.
3) Return ONLY valid JSON that follows the OUTPUT SCHEMA exactly (no markdown, no explanation, no extra keys).

ALLOWED INTENTS (choose exactly one)
- GetWeather: user is asking about weather conditions or forecast.
- BookFlight: user is requesting to book, reserve, or search for a flight.
- CancelBooking: user is requesting to cancel an existing reservation/booking (flight/hotel/etc.).
- None: none of the intents above apply.

ALLOWED ENTITY TYPES (extract only these)
- Location: city, region, airport, or place name that is relevant to the request.
- Date: a date or date-like phrase (e.g., "tomorrow", "next Friday", "2026-03-24").

SCORING RULES
- intentConfidence (the confidenceScore reported for topIntent) must be a number from 0.00 to 1.00.
- If the intent is None, set intentConfidence between 0.00 and 0.49 and return an empty entities array.
- If an intent matches well, set intentConfidence between 0.70 and 1.00.
- If uncertain between two intents, pick the best one and set intentConfidence between 0.50 and 0.69.

ENTITY RULES
- Each entity object MUST include: category, text, offset, length, confidenceScore.
- offset is the 0-based character index of the first character of the entity span in the ORIGINAL user text.
- length is the number of characters in the span.
- confidenceScore must be a number from 0.00 to 1.00.
- Do not invent entities that are not explicitly present in the text.

OUTPUT SCHEMA (return exactly this structure)
{
  "query": "&amp;lt;original user query&amp;gt;",
  "prediction": {
    "topIntent": "&amp;lt;best matching intent name&amp;gt;",
    "intents": {
      "GetWeather": {"confidenceScore": &amp;lt;0.00-1.00&amp;gt;},
      "BookFlight": {"confidenceScore": &amp;lt;0.00-1.00&amp;gt;},
      "CancelBooking": {"confidenceScore": &amp;lt;0.00-1.00&amp;gt;},
      "None": {"confidenceScore": &amp;lt;0.00-1.00&amp;gt;}
    },
    "entities": [
      {
        "category": "&amp;lt;entity type, e.g. Location or Date&amp;gt;",
        "text": "&amp;lt;entity span text&amp;gt;",
        "offset": &amp;lt;integer&amp;gt;,
        "length": &amp;lt;integer&amp;gt;,
        "confidenceScore": &amp;lt;0.00-1.00&amp;gt;
      }
    ]
  }
}

CONSTRAINTS
- Output MUST be valid JSON and MUST match the schema.
- Populate confidence scores for ALL intents; the topIntent should have the highest score.
- If no entities are found, return an empty array for prediction.entities.
- Do not output any additional text before or after the JSON.&lt;/PRE&gt;
&lt;P&gt;This explicit enumeration guides Claude to stick to the known intents only.&lt;/P&gt;
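&lt;P&gt;Because the intent list appears in the prompt (and again in the intents block of the OUTPUT SCHEMA), it can help to generate that text from one source of truth, such as the intent list you assembled in Step 1. The minimal Python sketch below renders the two ALLOWED sections; the function name and the dictionary layout are illustrative, not part of any Foundry API.&lt;/P&gt;
&lt;PRE&gt;
```python
def render_schema_sections(intents: dict[str, str], entity_types: dict[str, str]) -> str:
    """Build the ALLOWED INTENTS and ALLOWED ENTITY TYPES sections of the prompt."""
    lines = ["ALLOWED INTENTS (choose exactly one)"]
    lines += [f"- {name}: {meaning}" for name, meaning in intents.items()]
    if "None" not in intents:
        # CLU always has a None intent for out-of-scope utterances; keep that behavior.
        lines.append("- None: none of the intents above apply.")
    lines.append("")
    lines.append("ALLOWED ENTITY TYPES (extract only these)")
    lines += [f"- {name}: {meaning}" for name, meaning in entity_types.items()]
    return "\n".join(lines)
```
&lt;/PRE&gt;
&lt;P&gt;Adding an intent then means editing one dictionary rather than several places in a long prompt string.&lt;/P&gt;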
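&lt;P&gt;&lt;STRONG&gt;Step 3: Call the model with user utterances.&lt;/STRONG&gt; Send a chat completion request for each user query. The system prompt in this example is a shortened variant of the Step 2 prompt that asks for a simplified output. For example, for &lt;EM&gt;“Is it going to rain in Seattle tomorrow?”&lt;/EM&gt;:&lt;/P&gt;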
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;  "messages": [&lt;BR /&gt;    {"role": "system", "content":&lt;BR /&gt;      "Intents:\n- GetWeather: user asks about weather.\n- BookFlight: user wants to book a flight.\n- CancelBooking: user wants to cancel a booking.\n- None: input doesn't match above.\nEntities to extract: Location, Date.\nOutput JSON with {\"intent\":..., \"entities\":...} for each user query."&lt;BR /&gt;    },&lt;BR /&gt;    {"role": "user", "content": "Is it going to rain in Seattle tomorrow?"}&lt;BR /&gt;  ]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4: Examine the JSON response.&lt;/STRONG&gt; Claude uses your schema to infer the best intent and fills in entities:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;  "intent": "GetWeather",&lt;BR /&gt;  "entities": [&lt;BR /&gt;    {&lt;BR /&gt;      "category": "Location",&lt;BR /&gt;      "text": "Seattle"&lt;BR /&gt;    },&lt;BR /&gt;    {&lt;BR /&gt;      "category": "Date",&lt;BR /&gt;      "text": "tomorrow"&lt;BR /&gt;    }&lt;BR /&gt;  ]&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;The response above is the simplified shape your application typically consumes, as requested by the shortened Step 3 prompt. The full Step 2 prompt instead requires the complete CLU-compatible schema (query, prediction.topIntent, prediction.intents.&amp;lt;each&amp;gt;.confidenceScore, and entities with offset, length, and confidenceScore). If you use the full prompt, extract just the fields you need on the client side, or relax the Step 2 schema to match the simplified output. Treat model-reported offset and length values with caution: an LLM cannot reliably count characters, so recompute them from the entity text. With careful prompting, Claude Sonnet 4.6 can approach the accuracy of your old CLU model without any fine-tuning (measure this on your own labeled utterances before switching over), and the classifier is easy to maintain because you change it by editing the schema in the prompt.&lt;/P&gt;
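&lt;P&gt;Reducing a full Step 2 response to the simplified shape, while not trusting the model's character counts, is a small post-processing step. This Python sketch (the function name is hypothetical) keeps topIntent only if it is in your allowed list and otherwise falls back to None, then recomputes each entity's offset and length from its text and drops entities whose text does not appear verbatim in the query. It does not check whether the model's confidence scores are calibrated.&lt;/P&gt;
&lt;PRE&gt;
```python
import json


def normalize_prediction(raw_json: str, query: str, allowed_intents: set[str]) -> dict:
    """Reduce a CLU-style model reply to {intent, entities} with locally computed offsets."""
    prediction = json.loads(raw_json).get("prediction", {})
    intent = prediction.get("topIntent", "None")
    if not isinstance(intent, str) or intent not in allowed_intents:
        intent = "None"  # never hand your bot an intent it was not built to handle
    entities = []
    search_from = 0
    for entity in prediction.get("entities", []):
        span = entity.get("text", "")
        if not span:
            continue
        offset = query.find(span, search_from)
        if offset == -1:
            offset = query.find(span)  # the model may list entities out of order
        if offset == -1:
            continue  # drop entities whose text is not in the query verbatim
        entities.append(
            {"category": entity.get("category"), "text": span,
             "offset": offset, "length": len(span)}  # recomputed, not the model's count
        )
        search_from = offset + len(span)
    return {"intent": intent, "entities": entities}
```
&lt;/PRE&gt;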
&lt;H3&gt;Approach B: Fine-tune a Custom Intent Classification Model&lt;/H3&gt;
&lt;P&gt;If you have many intents and domain-specific language, training a custom model typically gives the highest accuracy. Fine-tuning an open model on your utterances yields a tailor-made classifier that you control and, where the base model and your deployment option allow it, can deploy outside the cloud.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Export CLU training data.&lt;/STRONG&gt; Collect the utterances, intent labels, and annotated entities from your CLU project. Ensure each example is labeled correctly. This will be your training dataset.&lt;/P&gt;
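&lt;P&gt;Fine-tuning expects the exported utterances as a JSONL training file. This Python sketch assumes the CLU export layout in which assets.utterances holds objects with text, intent, and an optional entities list of category/offset/length spans, and it writes one system/user/assistant messages record per utterance, the format used for chat-model fine-tuning. Check both assumptions against your own export file and the data format your chosen base model expects; the short system prompt is a placeholder. CLU may record offsets in UTF-16 code units, so verify the sliced spans if your utterances contain emoji or similar characters.&lt;/P&gt;
&lt;PRE&gt;
```python
import json

# Placeholder system prompt; reuse the same text at inference time.
SYSTEM_PROMPT = "Classify the utterance into one intent and extract entities. Respond with JSON only."


def clu_export_to_jsonl(export: dict, system_prompt: str = SYSTEM_PROMPT) -> list[str]:
    """Turn CLU-exported utterances into chat-format fine-tuning lines (one JSON object each)."""
    lines = []
    for utterance in export["assets"]["utterances"]:
        text = utterance["text"]
        entities = [
            {
                "category": e["category"],
                # CLU stores spans as offset/length; slice them back into surface text.
                "text": text[e["offset"]:e["offset"] + e["length"]],
            }
            for e in utterance.get("entities", [])
        ]
        target = {"intent": utterance["intent"], "entities": entities}
        record = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
                {"role": "assistant", "content": json.dumps(target)},
            ]
        }
        lines.append(json.dumps(record))
    return lines
```
&lt;/PRE&gt;
&lt;P&gt;Join the returned lines with newlines, save them as a .jsonl file, and upload that file as the training data in Step 2.&lt;/P&gt;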
&lt;P&gt;&lt;STRONG&gt;Step 2: Fine-tune an open model in Foundry.&lt;/STRONG&gt; Use Foundry’s fine-tuning UI or API to train one of the supported base models on your data: open-source models such as &lt;A href="https://ai.azure.com/catalog/models/Ministral-3B" target="_blank" rel="noopener"&gt;Ministral-3B&lt;/A&gt;, &lt;A href="https://ai.azure.com/catalog/models/qwen3-32b" target="_blank" rel="noopener"&gt;Qwen3-32B&lt;/A&gt;, or &lt;A href="https://ai.azure.com/catalog/models/Llama-3.3-70B-Instruct" target="_blank" rel="noopener"&gt;Llama-3.3-70B-Instruct&lt;/A&gt;, or OpenAI-managed models such as &lt;A href="https://ai.azure.com/catalog/models/gpt-4.1-mini" target="_blank" rel="noopener"&gt;gpt-4.1-mini&lt;/A&gt; or &lt;A href="https://ai.azure.com/catalog/models/gpt-4o-mini" target="_blank" rel="noopener"&gt;gpt-4o-mini&lt;/A&gt;. During training, the model learns to classify your intents and tag entities as CLU did. For more detailed information on fine-tuning in Foundry, see &lt;A href="https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/fine-tuning?tabs=oai-sdk&amp;amp;pivots=programming-language-studio" target="_blank" rel="noopener"&gt;Customize a model with fine-tuning - Azure AI Foundry | Microsoft Learn&lt;/A&gt;. After training, deploy the fine-tuned model as a new endpoint.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Integrate the fine-tuned model.&lt;/STRONG&gt; Once deployed, use the Foundry REST API to send user utterances and get predictions. The model returns whatever output format you trained it on, so train it to emit the intent (and, if needed, entity spans) in the shape your application already expects. For instance, if you send &lt;EM&gt;“I want to cancel my Contoso booking”&lt;/EM&gt;, the model might output &lt;EM&gt;Intent: CancelBooking; Entity: { type: Subscription, value: “Contoso” }&lt;/EM&gt; (assuming your training data included those labels).&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4: Evaluate and iterate.&lt;/STRONG&gt; Validate the new model against a held-out test set or real queries, ideally the same test set you used for CLU so the comparison is direct. On domain-specific utterances, a fine-tuned model often matches or exceeds CLU’s performance. Fine-tuning also keeps the model focused on the intents you provided, although a generative model can still occasionally return an unexpected label or malformed output, so validate each prediction against your intent list. You can retrain periodically with new data to keep improving it.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why fine-tune?&lt;/STRONG&gt; If your chatbot has a large set of intents or very specialized language, a fine-tuned model will be more consistent and precise than prompting a general LLM. It also avoids sending data to third-party APIs, because the model is your own custom deployment. This path is ideal for enterprises needing predictable, controllable NLU: you invest once in training and get a dedicated model that aligns exactly with your taxonomy and, where the model license and deployment option allow it, can be packaged for offline or on-premises use.&lt;/P&gt;
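&lt;P&gt;For the Step 4 comparison, overall accuracy alone can hide a weak intent. This minimal sketch (standard library only; the function name is illustrative) scores any model's predictions against the same gold labels and reports per-intent precision and recall.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import Counter


def evaluate_intents(gold: list[str], predicted: list[str]) -> dict:
    """Overall accuracy plus per-intent precision and recall for a held-out test set."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    report = {}
    for intent in sorted(set(gold) | set(predicted)):
        tp = true_pos[intent]
        report[intent] = {
            # precision: of the times the model said this intent, how often it was right
            "precision": tp / pred_counts[intent] if pred_counts[intent] else 0.0,
            # recall: of the utterances that really had this intent, how many were found
            "recall": tp / gold_counts[intent] if gold_counts[intent] else 0.0,
            "support": gold_counts[intent],
        }
    accuracy = sum(true_pos.values()) / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "per_intent": report}
```
&lt;/PRE&gt;
&lt;P&gt;Run it once on CLU's predictions and once on the fine-tuned model's predictions for the same utterances, then compare the intents with the lowest recall first.&lt;/P&gt;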
&lt;P&gt;&lt;STRONG&gt;Other recommended models for CLU:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rationale (When to Consider)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/gpt-4o" target="_blank" rel="noopener"&gt;gpt-4o&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;An OpenAI model suited to Approach A if you prefer OpenAI’s LLMs. Its strong reasoning helps with complex utterances and mixed-language queries. Use it for prompt-based CLU when you need maximum out-of-the-box accuracy and can accept the higher cost.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/Ministral-3B" target="_blank" rel="noopener"&gt;Ministral-3B&lt;/A&gt;&lt;/STRONG&gt;&lt;STRONG&gt; (fine-tune)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;One of the open-source base models that Foundry supports for fine-tuning: a 3B-parameter model trained by Mistral AI. Despite its small size, Ministral-3B is competitive on instruction-following benchmarks and can be fine-tuned into a tailor-made NLU classifier you host privately. Use it when you want an Apache 2.0–licensed Mistral-family model that the Foundry fine-tuning UI supports.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://microsoft.ai/news/introducing-mai-thinking-1/" target="_blank" rel="noopener"&gt;MAI-Thinking-1&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Microsoft’s first-party reasoning model (private preview on Foundry). Useful for Approach A zero-shot intent and entity classification, where careful reasoning over ambiguous utterances improves top-intent selection; multi-layered instruction following helps keep outputs on your fixed schema, and function calling lets you route intents to tools. A first-party alternative to gpt-4o or Claude. Requires preview access approval.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;6. Custom Question Answering (CQA) → Foundry with gpt-4o or Cohere Rerank + Retrieval&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Azure AI’s Custom Question Answering (CQA) allows you to upload your documents or FAQs and answer natural language queries from that content.&lt;/STRONG&gt; In Foundry, you can reproduce this with a Retrieval-Augmented Generation (RAG) pattern: first retrieve relevant content from your data with Azure AI Search (the first-stage, or L1, retrieval step), then use a second model to rank the candidates and return or generate an answer based only on your knowledge base (the L2 step).&lt;/P&gt;
&lt;H3&gt;Approach A: Use &lt;A href="https://ai.azure.com/catalog/models/Cohere-rerank-v4.0-pro" target="_blank" rel="noopener"&gt;Cohere Rerank v4.0 Pro&lt;/A&gt; as a Deterministic L2 Ranker&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Export your existing CQA knowledge base.&lt;/STRONG&gt; Export your current Custom Question Answering knowledge base, keeping all existing question-answer pairs. No retraining or rewriting of answers is required.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Create an Azure AI Search index.&lt;/STRONG&gt; Create a new Azure AI Search index that will store your existing QA pairs. Each QA pair should be uploaded as a document containing the stored question text and stored answer text. This index will replace the internal retrieval layer previously used by the CQA runtime.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Deploy the Cohere Rerank v4.0 Pro model in Foundry.&lt;/STRONG&gt; In Microsoft Foundry, deploy the Cohere Rerank v4.0 Pro model to your project. This model will replace the confidence scoring functionality that was previously handled by the CQA service and will be used to determine which stored answer best matches the incoming user query.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4: Update your application’s query endpoint. &lt;/STRONG&gt;Locate the existing service call in your application that sends user questions to the CQA endpoint. Replace this call so that user queries are now sent to your Azure AI Search index instead. A minimal index schema for QA retrieval is: id (key), question (searchable text), answer (retrievable text), and questionVector (vector field populated by an embedding model such as text-embedding-3-small). Configure Azure AI Search to run a hybrid query — combining BM25 keyword match on the question field with vector similarity on questionVector — and return the top 10 candidate QA pairs for each query. The reranker in Step 5 will reduce those candidates to the single best answer.&lt;/P&gt;
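&lt;P&gt;The hybrid query in Step 4 is a single Search Documents request. The sketch below builds its JSON body using the field names from the schema above; it assumes the Azure AI Search REST API at api-version 2023-11-01 or later, where vector queries go in a vectorQueries array, and that question_vector comes from the same embedding model used to populate questionVector. The function name is illustrative.&lt;/P&gt;
&lt;PRE&gt;
```python
def build_hybrid_query(question: str, question_vector: list[float], top: int = 10) -> dict:
    """Request body for POST /indexes/{index}/docs/search combining BM25 and vector search."""
    return {
        "search": question,           # BM25 keyword match...
        "searchFields": "question",   # ...restricted to the stored question text
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": question_vector,  # embedding of the user query
                "fields": "questionVector",
                "k": top,                   # nearest neighbors from the vector side
            }
        ],
        "select": "id,question,answer",    # only the fields the reranker needs
        "top": top,                        # candidates handed to Step 5
    }
```
&lt;/PRE&gt;
&lt;P&gt;POST the body to https://{service-name}.search.windows.net/indexes/{index-name}/docs/search with your api-version and an api-key header (or Microsoft Entra ID auth). Azure AI Search fuses the keyword and vector result lists into one ranked list.&lt;/P&gt;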
&lt;P&gt;&lt;STRONG&gt;Step 5: Pass search results to the Cohere Reranker. &lt;/STRONG&gt;Modify your application workflow so that the QA pairs returned from Azure AI Search are passed to the deployed Cohere Rerank model along with the user’s original query. This replaces the scoring logic that your application previously received from the CQA runtime.&lt;/P&gt;
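&lt;P&gt;Passing the search hits to the reranker mostly means turning each QA pair into a text document while remembering its position, because rerank results refer back to documents by index. The sketch below assumes the request fields of Cohere's public rerank API (query, documents, top_n); check whether your Foundry endpoint also expects a model field. Pairing question and answer in one string is a design choice; reranking on the question text alone would mirror CQA's question matching more closely.&lt;/P&gt;
&lt;PRE&gt;
```python
def build_rerank_request(query: str, hits: list[dict], top_n: int = 3) -> tuple[dict, list[dict]]:
    """Build a rerank body from search hits; also return the candidates in document order."""
    # Skip hits without an answer so every document the reranker scores is usable.
    candidates = [h for h in hits if h.get("answer")]
    documents = [f"Q: {h['question']}\nA: {h['answer']}" for h in candidates]
    body = {"query": query, "documents": documents, "top_n": min(top_n, len(documents))}
    # Result 'index' values from the reranker point into this candidates list.
    return body, candidates
```
&lt;/PRE&gt;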
&lt;P&gt;&lt;STRONG&gt;Step 6: Select the highest ranked QA pair.&lt;/STRONG&gt; Update your application logic to 1) sort the reranked results returned by Cohere and 2) select the highest ranked result. The answer associated with that QA pair can then be returned directly to the user using the same response logic that currently exists in your application.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 7: Return the selected answer to the user.&lt;/STRONG&gt; Update your application response logic so that the answer returned to the user is taken from the top ranked QA pair produced by the reranker. This preserves the same deterministic answer selection behavior that existed in the original CQA workflow while allowing your application to operate using Foundry hosted models.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 8: Optional GPT-based fallback. &lt;/STRONG&gt;If needed, add a GPT-based validation or fallback step that evaluates the top reranked results, or generates a synthesized response when no candidate answer meets your required relevance threshold. This lets you introduce retrieval-augmented generation gradually, without changing the primary deterministic selection path your application uses today.&lt;/P&gt;
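&lt;P&gt;Steps 6 to 8 reduce to a sort and a threshold check. This sketch assumes the rerank response shape from Cohere's public API, a results list of objects with index and relevance_score, and uses a hypothetical threshold of 0.5 that you should tune on your own queries. Cohere already orders results by relevance, so the explicit sort is only a safeguard.&lt;/P&gt;
&lt;PRE&gt;
```python
from typing import Optional


def select_answer(rerank_response: dict, candidates: list[dict], threshold: float = 0.5) -> Optional[dict]:
    """Pick the best QA pair from a rerank response, or None if nothing clears the threshold."""
    results = sorted(
        rerank_response.get("results", []),
        key=lambda r: r["relevance_score"],
        reverse=True,  # highest relevance first
    )
    if not results:
        return None
    best = results[0]
    if threshold > best["relevance_score"]:
        return None  # caller can fall back to a GPT-generated answer or a default message
    chosen = candidates[best["index"]]  # 'index' points into the documents list you sent
    return {"answer": chosen["answer"], "id": chosen.get("id"), "score": best["relevance_score"]}
```
&lt;/PRE&gt;
&lt;P&gt;A None return is the natural hook for the optional Step 8 fallback, while every other path keeps the deterministic, stored-answer behavior.&lt;/P&gt;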
&lt;H3&gt;Approach B: Use GPT-4o as a Generative Answerer (RAG)&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Index your content for retrieval.&lt;/STRONG&gt; Load the knowledge bases you were using in CQA (documents, manuals, Q&amp;amp;A pairs, etc.) into an index such as Azure AI Search. Enable vector search, or hybrid keyword plus vector search, so that queries find passages that are relevant in meaning, not just keyword matches.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Deploy the GPT-4o model.&lt;/STRONG&gt; Use Foundry to deploy &lt;A href="https://ai.azure.com/catalog/models/gpt-4o" target="_blank" rel="noopener"&gt;gpt-4o&lt;/A&gt; as your answer generator. If you use vector search, also deploy a smaller embedding model (e.g., text-embedding-3-small or text-embedding-ada-002) to embed both your content and incoming queries.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Formulate a retrieval-augmented prompt.&lt;/STRONG&gt; When a user asks a question, your application will first retrieve the top matching content from the index (e.g., a paragraph from an FAQ or a snippet from a manual). Then, you send the question along with that content to GPT-4o, instructing it to use &lt;EM&gt;only&lt;/EM&gt; the provided content to answer. For example: &lt;EM&gt;“Content: [insert sample retrieved passage] \n\n Question: [insert sample user question] \n\n If the answer is in the content, answer in one sentence; otherwise say “Answer not found”.”&lt;/EM&gt;&lt;/P&gt;
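&lt;P&gt;The retrieval-augmented prompt can be assembled as ordinary chat messages. In this sketch the passages are numbered so the model can cite them (as Step 5 suggests), and the exact “Answer not found” reply is defined once so your code can detect it reliably. The function names and prompt wording are illustrative assumptions, not a required format.&lt;/P&gt;
&lt;PRE&gt;
```python
NOT_FOUND = "Answer not found"


def build_grounded_messages(question: str, passages: list[str]) -> list[dict]:
    """Chat messages that restrict the model to the retrieved passages."""
    # Number the passages so the model can cite them, e.g. [2].
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    system = (
        "Answer using ONLY the numbered content provided. "
        "If the answer is in the content, answer in one sentence and cite the passage number. "
        f'Otherwise reply exactly: "{NOT_FOUND}".'
    )
    user = f"Content:\n{context}\n\nQuestion: {question}"
    return [{"role": "system", "content": system}, {"role": "user", "content": user}]


def is_not_found(answer: str) -> bool:
    """True when the model used the agreed 'no answer' reply (ignoring quotes and a final period)."""
    return answer.strip().strip('"').rstrip(".") == NOT_FOUND
```
&lt;/PRE&gt;
&lt;P&gt;Send the returned messages to your gpt-4o deployment; when is_not_found is true, show your standard fallback message instead of the model text.&lt;/P&gt;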
&lt;P&gt;&lt;STRONG&gt;Step 4: Call GPT-4o via REST API.&lt;/STRONG&gt; The user message will include both the Context (retrieved content) and the Question. GPT-4o will then generate an answer &lt;EM&gt;grounded in that context&lt;/EM&gt;. For instance, if a user asks “How do I reset my device?”, you pass in the relevant instruction from your manual as context, and the model’s answer should be factual and derived from those instructions.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 5: Validate the answer output.&lt;/STRONG&gt; The model should return an accurate answer, e.g.: &lt;EM&gt;“Press and hold the reset button for 10 seconds until the device restarts.”&lt;/EM&gt; When the context doesn’t cover the question, instruct the model to reply “Answer not found.” This greatly reduces hallucinated answers, although it does not eliminate them, so keep monitoring responses. You can also instruct the model to output references or citations to the source content if needed.&lt;/P&gt;
&lt;P&gt;By combining search and an LLM, you can build an effective question answering solution on Foundry: the search step finds the relevant content, and the model turns it into a user-friendly answer.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Other recommended models for CQA:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rationale (When to Consider)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/claude-sonnet-4-6" target="_blank" rel="noopener"&gt;claude-sonnet-4-6&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Another excellent choice for RAG solutions. Claude’s large context window (hundreds of thousands of tokens) lets you pass long documents directly, often without splitting, which is useful if you prefer a single-model approach. Its answers tend to be detailed and polite, which suits some Q&amp;amp;A use cases.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/Ministral-3B" target="_blank" rel="noopener"&gt;Ministral-3B&lt;/A&gt;&lt;/STRONG&gt;&lt;STRONG&gt; (fine-tune)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;A Mistral-family open model supported by Foundry fine-tuning. With 3B parameters it’s lightweight enough to host on your own infrastructure, and after fine-tuning on your QnA dataset it can be specialized to extract answers from text with minimal deviation. Use it when you want a lower-cost, privately-hosted alternative to GPT-4o for narrow Q&amp;amp;A domains.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://microsoft.ai/news/introducing-mai-thinking-1/" target="_blank" rel="noopener"&gt;MAI-Thinking-1&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Microsoft’s first-party reasoning model (private preview on Foundry). A strong RAG answerer when responses require reasoning across multiple retrieved passages; its 256K-token context lets you pass many candidate QA pairs or long source documents at once, and instruction following helps enforce “answer only from the provided context” grounding. Requires preview access approval.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;7. Orchestration Workflow → Foundry Agent (using &lt;A href="https://ai.azure.com/catalog/models/gpt-4o" target="_blank" rel="noopener"&gt;gpt-4o&lt;/A&gt;)&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Azure AI’s orchestration workflow directs incoming requests to the correct Language sub-service (CLU, CQA, etc.) by first determining the user’s goal. &lt;/STRONG&gt;Instead of maintaining a separate orchestrator, you can now rely on a Foundry agent – essentially a single LLM that can interpret user input and decide how to fulfill it using the available tools. GPT-4o, for instance, can analyze an utterance and either answer it or format an action, as needed.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Define the agent’s tasks.&lt;/STRONG&gt; Identify the distinct actions and knowledge sources your orchestrator used to route to. For example, the bot might either answer a question (using QnA) or execute a command (such as scheduling). The LLM will detect which scenario applies and respond appropriately.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Construct an agent prompt.&lt;/STRONG&gt; Write a system message that explains the assistant’s capabilities. For example: &lt;EM&gt;“You are an AI assistant. If the user asks a knowledge question, answer it using the provided content. If the user makes a request to perform an action, output a JSON specifying the action.”&lt;/EM&gt; This effectively gives GPT-4o a simple set of tools (answering vs. acting).&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Provide context or tools per query.&lt;/STRONG&gt; In each conversation turn, include any info needed. For instance, if the user query is factual (“What’s our security policy?”), attach a relevant snippet from your data (like in the QnA solution). If the query is an action (“Schedule a meeting with IT”), you might not have extra context, but expect an action-oriented output.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4: Evaluate the outputs.&lt;/STRONG&gt; Following the instructions, GPT-4o analyzes the input and produces either a direct answer or an action. For the fact question, it might output a paragraph answer; for the command, it could output a JSON like {"action": "ScheduleMeeting", "topic": "IT Support", "time": "tomorrow 10am"}. Because GPT-4o can reason about the user’s intent and choose the correct format on its own, its reasoning can replace a separate orchestration layer in many scenarios; your application still needs to validate and execute the actions it returns.&lt;/P&gt;
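&lt;P&gt;On the application side, the prompt-only convention from Step 2 (plain text for answers, JSON for actions) needs a small dispatcher. This Python sketch treats non-JSON output as an answer, runs known actions through a registry, and refuses unknown ones. The handler and registry names are hypothetical. Native tool (function) calling returns structured tool calls instead of free text and is usually the more robust option when your deployment supports it.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
from typing import Callable


def schedule_meeting(args: dict) -> str:
    # Hypothetical handler; replace with a call into your calendar or ticketing system.
    return f"Scheduled '{args.get('topic', 'meeting')}' for {args.get('time', 'an unspecified time')}"


# Hypothetical registry mapping the action names your prompt allows to handler functions.
HANDLERS: dict[str, Callable[[dict], str]] = {"ScheduleMeeting": schedule_meeting}


def route_model_output(output: str) -> str:
    """Run JSON action outputs through a known handler; pass plain-text answers through."""
    try:
        payload = json.loads(output)
    except json.JSONDecodeError:
        return output  # not JSON, so treat it as a direct answer
    if not isinstance(payload, dict) or "action" not in payload:
        return output
    action = payload.get("action")
    handler = HANDLERS.get(action) if isinstance(action, str) else None
    if handler is None:
        # Unknown action: refuse rather than execute something the model invented.
        return "Sorry, I can't do that yet."
    return handler(payload)
```
&lt;/PRE&gt;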
&lt;P&gt;Using an AI agent for orchestration leads to a simpler flow: you only call the LLM (and perhaps a search for facts) rather than maintaining multiple endpoints and routing logic. Foundry’s advanced models like GPT-4o have demonstrated they can handle multi-step tasks and tool usage in a single conversation, making them a powerful replacement for manual orchestration.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Other recommended models for Orchestration:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rationale (When to Consider)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/claude-sonnet-4-6" target="_blank" rel="noopener"&gt;claude-sonnet-4-6&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;A strong alternative for building an AI agent. Claude is known for coherent multi-turn reasoning and decision-making. It can be used if you prefer a non-OpenAI model; it integrates well with external tools and its large context helps when orchestrating complex, multi-step processes.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://microsoft.ai/news/introducing-mai-thinking-1/" target="_blank" rel="noopener"&gt;MAI-Thinking-1&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Microsoft’s first-party reasoning model (private preview on Foundry). Built for agentic, multi-step reasoning and supports function calling, so a single MAI-Thinking-1 agent can interpret intent and route to the right tool or answer — a first-party alternative to gpt-4o for the orchestration role. Requires preview access approval.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;8. Custom Text Classification → Foundry (fine-tuned Ministral-3B)&lt;/H2&gt;
&lt;P&gt;Custom Text Classification allowed you to train a model to classify text into your custom categories (e.g., labeling customer reviews as Complaint, Suggestion, or Praise). In Foundry, you achieve the same by &lt;STRONG&gt;fine-tuning a foundation model with your data&lt;/STRONG&gt;. A good base model is &lt;A href="https://ai.azure.com/catalog/models/Ministral-3B" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Ministral-3B&lt;/STRONG&gt;&lt;/A&gt;: it is one of the open-source base models Foundry currently supports for fine-tuning, and it is an &lt;STRONG&gt;Apache 2.0 licensed&lt;/STRONG&gt; model with strong accuracy on language tasks for its size. Fine-tuning it with your training data can produce a robust classifier that matches or exceeds your previous model’s performance (confirm this on a held-out test set), with the added benefit that you fully control how it is deployed.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Prepare training data.&lt;/STRONG&gt; Export your labeled text samples and their categories from the Language Studio (or however you stored them). Ensure you have a representative dataset for each label. Clean the data if needed (e.g., remove duplicates, ensure consistent labeling).&lt;/P&gt;
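&lt;P&gt;The following is a minimal sketch of Step 1. It assumes your export is a list of (text, label) pairs. It also assumes the fine-tuning job accepts chat-formatted JSONL, with one messages array per line and the label as the assistant reply. Confirm the exact data format Foundry expects for Ministral-3B in the fine-tuning documentation. The label set, system prompt, and helper names below are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
from collections import Counter

# Hypothetical label set and system prompt; replace with your own taxonomy.
LABELS = ["Complaint", "Suggestion", "Praise"]
SYSTEM_PROMPT = (
    "Classify the customer text into exactly one category: "
    + ", ".join(LABELS)
    + '. Reply with JSON: {"category": "..."}'
)


def clean(samples: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Normalize whitespace, then drop empty texts, unknown labels, and duplicates."""
    seen = set()
    kept = []
    for text, label in samples:
        text = " ".join(text.split())
        key = (text.lower(), label)  # case-insensitive duplicate check
        if not text or label not in LABELS or key in seen:
            continue
        seen.add(key)
        kept.append((text, label))
    return kept


def to_chat_jsonl(samples: list[tuple[str, str]]) -> str:
    """One chat-format training record per line (assumed fine-tuning format)."""
    lines = []
    for text, label in samples:
        record = {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": text},
                {"role": "assistant", "content": json.dumps({"category": label})},
            ]
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


raw = [
    ("The product arrived broken and I'm very unhappy", "Complaint"),
    ("The product arrived  broken and I'm very unhappy", "Complaint"),  # duplicate
    ("Please add a dark mode", "Suggestion"),
    ("Great support team!", "Praise"),
]
data = clean(raw)
print(Counter(label for _, label in data))  # check that each label is represented
jsonl = to_chat_jsonl(data)
```
&lt;/PRE&gt;
&lt;P&gt;Write the resulting string to a file (for example, train.jsonl) and upload it as the training dataset. Keeping the assistant reply in the JSON shape you want at inference time helps the deployed model return structured output.&lt;/P&gt;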
&lt;P&gt;&lt;STRONG&gt;Step 2: Fine-tune Ministral-3B in Foundry.&lt;/STRONG&gt; In the Foundry UI or using the Python SDK, start a fine-tuning job on Ministral-3B with your dataset. Foundry will handle the training process; you simply provide the data and some configuration (like number of epochs). Once training is completed, you’ll have a new model version specialized to predict your categories.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Deploy and test the model.&lt;/STRONG&gt; Deploy the fine-tuned model to get an endpoint. Then send test texts to its REST API. For example, if you send the text &lt;EM&gt;“The product arrived broken and I’m very unhappy”&lt;/EM&gt;, the model might return:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;PRE&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"category": "Complaint",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"confidenceScore": 0.98&lt;BR /&gt;}&lt;/PRE&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
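&lt;P&gt;This indicates the text was classified as a “Complaint” with high confidence. The model learns its output format from your training examples, so you can train it to mimic the original service’s JSON structure (as shown above) and keep integration changes minimal. Treat any confidence value the model generates as a heuristic rather than a calibrated probability.&lt;/P&gt;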
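&lt;P&gt;Even a fine-tuned model can occasionally return malformed JSON or a label outside your taxonomy. It is therefore worth validating each response before it reaches downstream code. The following is a minimal sketch. It assumes the deployment returns the JSON shown above as its message text. The label set, the Unclassified fallback, and parse_classification are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
import re

ALLOWED = {"Complaint", "Suggestion", "Praise"}  # hypothetical label set
FALLBACK = "Unclassified"  # hypothetical label for unusable output


def parse_classification(raw_text: str) -> dict:
    """Turn raw model text into {"category", "confidenceScore"}, defensively."""
    unusable = {"category": FALLBACK, "confidenceScore": None}
    # Models sometimes wrap JSON in prose or code fences; take the span from
    # the first "{" to the last "}".
    match = re.search(r"\{.*\}", raw_text, flags=re.DOTALL)
    if match is None:
        return unusable
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return unusable
    if not isinstance(payload, dict):
        return unusable
    category = payload.get("category")
    if not isinstance(category, str) or category not in ALLOWED:
        category = FALLBACK
    score = payload.get("confidenceScore")
    # Keep the score only if it is a real number between 0 and 1.
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not (is_number and score >= 0 and 1 >= score):
        score = None
    return {"category": category, "confidenceScore": score}


print(parse_classification('{"category": "Complaint", "confidenceScore": 0.98}'))
```
&lt;/PRE&gt;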
&lt;P&gt;&lt;STRONG&gt;Step 4: Integrate and iterate.&lt;/STRONG&gt; Update your application to call the new Foundry model endpoint for classification requests. Because the model was trained on your exact labels and examples, it should consistently return one of your predefined classes. Still, validate each response and handle anything unexpected. The model may also be more accurate than your previous one; confirm this on a held-out test set before switching over. Over time, you can retrain with more data or adjust the model as needed, giving you full control over the classifier’s evolution.&lt;/P&gt;
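&lt;P&gt;Whether the fine-tuned model is actually more accurate is straightforward to check: score both classifiers on the same held-out labeled set. The following minimal sketch computes overall accuracy and per-label recall from two prediction lists. Collecting the predictions from the old Language endpoint and the new Foundry deployment is assumed and not shown. The example labels are illustrative only.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import defaultdict


def evaluate(gold: list[str], predicted: list[str]) -> dict:
    """Overall accuracy plus per-label recall for one classifier."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expected, actual in zip(gold, predicted):
        totals[expected] += 1
        if expected == actual:
            hits[expected] += 1
    accuracy = sum(hits.values()) / len(gold) if gold else 0.0
    recall = {label: hits[label] / totals[label] for label in totals}
    return {"accuracy": accuracy, "recall": recall}


# Illustrative held-out labels and predictions, not real results.
gold = ["Complaint", "Praise", "Suggestion", "Complaint"]
old_preds = ["Complaint", "Praise", "Complaint", "Suggestion"]
new_preds = ["Complaint", "Praise", "Suggestion", "Complaint"]
old, new = evaluate(gold, old_preds), evaluate(gold, new_preds)
print(f"old accuracy={old['accuracy']:.2f}, new accuracy={new['accuracy']:.2f}")
```
&lt;/PRE&gt;
&lt;P&gt;The per-label view matters because an overall accuracy gain can hide a regression on a rare class.&lt;/P&gt;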
&lt;P&gt;&lt;STRONG&gt;Other recommended models for Custom Classification:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rationale (When to Consider)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/gpt-oss-20b" target="_blank" rel="noopener"&gt;gpt-oss-20b (fine-tune)&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;An OpenAI-published open-weight model (20B) supported by Foundry fine-tuning. Train it on your labels for a classifier that runs on your own infrastructure or as a Foundry deployment. Use this when you want OpenAI-style quality at a fraction of GPT-4o’s inference cost and need an Apache 2.0–licensed open-weight option.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://ai.azure.com/catalog/models/Llama-3.3-70B-Instruct" target="_blank" rel="noopener"&gt;Llama-3.3-70B-Instruct (fine-tune)&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;The current open-weight Meta Llama model supported by Foundry fine-tuning. At 70B parameters it’s the largest of the open-source fine-tune options and gives you the highest ceiling for accuracy on complex or multilingual classification taxonomies. Choose this when you need maximum quality from an open model and have the budget to host a 70B deployment.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By following these guides,&lt;STRONG&gt; you’ll smoothly transition each Azure Language feature to a&lt;/STRONG&gt; &lt;STRONG&gt;Microsoft Foundry solution.&lt;/STRONG&gt; No direct replacements exist, but by leveraging &lt;STRONG&gt;foundation models + prompt engineering&lt;/STRONG&gt;, you can achieve equal or better functionality. This approach consolidates tasks onto a &lt;STRONG&gt;unified, powerful platform&lt;/STRONG&gt; with models that are continuously improving. The result is continuity for your applications and an upgrade in capabilities – from static APIs to the flexible world of &lt;STRONG&gt;Foundry’s LLMs&lt;/STRONG&gt;. Each guide above equips you to implement the change with minimal friction, ensuring your users enjoy the benefits of modern AI with the &lt;STRONG&gt;structurally similar JSON outputs&lt;/STRONG&gt; they expect.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Jun 2026 18:06:43 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/transitioning-from-azure-language-features-to-foundry-models/ba-p/4524092</guid>
      <dc:creator>renaliu</dc:creator>
      <dc:date>2026-06-11T18:06:43Z</dc:date>
    </item>
    <item>
      <title>Date extraction regression: 2025-05-01-preview vs 2025-11-01 (GA) in Azure Content Understanding</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/date-extraction-regression-2025-05-01-preview-vs-2025-11-01-ga/m-p/4527106#M1471</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Issue:&lt;/STRONG&gt; When using the documentFieldExtraction scenario in Azure Content Understanding, the GA version (2025-11-01) produces significantly worse results compared to the preview version (2025-05-01-preview) for date field extraction on scanned (Dutch) medical documents.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Observed behavior:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;With 2025-05-01-preview: all date fields are extracted correctly, including dates that are split across three separate handwritten fields (day, month, year).&lt;/LI&gt;&lt;LI&gt;With 2025-11-01 (GA): multiple date fields are either not found, returned as null, or extracted with day and year swapped (e.g. 2027-12-01 instead of 2001-12-27).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Document characteristics:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Scanned PDF (not native digital)&lt;/LI&gt;&lt;LI&gt;Dutch language&lt;/LI&gt;&lt;LI&gt;4-16 pages&lt;/LI&gt;&lt;LI&gt;Dates are handwritten and split across three separate labeled fields: dag (day), maand (month), jaar (year)&lt;/LI&gt;&lt;LI&gt;Year is written as 2 digits (e.g. "26" for 2026, "01" for 2001)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Schema used:&lt;/STRONG&gt; documentFieldExtraction with type: date fields and explicit descriptions instructing the model to read day → month → year in order and expand 2-digit years to 4 digits.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Expected behavior:&lt;/STRONG&gt; The GA version should perform at least on par with the preview version when using the exact same prompts. Is this a known regression? Any recommended workarounds while waiting for a fix?&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2026 09:28:58 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/date-extraction-regression-2025-05-01-preview-vs-2025-11-01-ga/m-p/4527106#M1471</guid>
      <dc:creator>SGRoborana</dc:creator>
      <dc:date>2026-06-10T09:28:58Z</dc:date>
    </item>
    <item>
      <title>Unable to Connect Localhost MCP Server from Azure AI Foundry Hosted Agent (o4-mini)</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/unable-to-connect-localhost-mcp-server-from-azure-ai-foundry/m-p/4526974#M1470</link>
      <description>&lt;P&gt;I'm using the Azure AI Foundry Toolkit in VS Code and have configured an MCP server running on my local machine (localhost).&lt;/P&gt;&lt;P&gt;When I run my Azure AI Foundry-hosted agent (o4-mini), it fails to connect to the MCP server. Based on the error logs, it appears that the hosted agent cannot reach the localhost endpoint.&lt;/P&gt;&lt;P&gt;My understanding is that the MCP server is running correctly locally, but the hosted agent seems unable to access services running on my machine.&lt;/P&gt;&lt;P&gt;Has anyone successfully connected a locally hosted MCP server to an Azure AI Foundry-hosted agent while using the Foundry Toolkit in VS Code?&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2026 06:04:47 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/unable-to-connect-localhost-mcp-server-from-azure-ai-foundry/m-p/4526974#M1470</guid>
      <dc:creator>aayush7g</dc:creator>
      <dc:date>2026-06-10T06:04:47Z</dc:date>
    </item>
    <item>
      <title>Online Meeting Effectiveness Index (OMEI)</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/online-meeting-effectiveness-index-omei/ba-p/4526638</link>
      <description>&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-background-color-17" border="1" style="width: 99.8148%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Samidha Shikhare*, Tim Malueg*, Shreya Goel*, and Charlie Goldenberg&lt;/STRONG&gt;*&amp;nbsp;&lt;BR /&gt;Department of Business Analytics, Santa Clara University - Leavey School of Business,&amp;nbsp;&lt;BR /&gt;Santa Clara, California 95053, United States&amp;nbsp;&lt;BR /&gt;&lt;EM&gt;*These authors contributed equally.&amp;nbsp;&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;{sshikhare, tmalueg, sgoel, cgoldenberg}@scu.edu&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;This research was conducted as part of a Microsoft-sponsored Practicum Project at Santa Clara University's Leavey School of Business, in direct partnership with Microsoft's Commercial Engineering and AI Business division. The project was led by &lt;STRONG&gt;Juhi Singh, Bonnie Ao, and Suneel Innamuri,&lt;/STRONG&gt; with a focus on delivering enterprise-grade, computationally informed insights to accelerate commercial AI adoption and drive scalable business transformation across Microsoft's engineering and go-to-market strategy.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;INTRODUCTION&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;What if your meetings could tell you how to make them better? We’ve all been there: another hour-long meeting that could have been an email. Research shows 17% of meetings are perceived as ineffective — and that’s a conservative estimate. But what if AI could analyze your meetings and give you specific, actionable feedback on how to improve them? That’s exactly what we built. The Online Meeting Effectiveness Index (OMEI) uses Azure AI to score your virtual meetings across five dimensions — Participation, Engagement, Structure, Sentiment, and Tech Quality — and tells you precisely what to fix. Here’s how it works and what we learned from analyzing 68 real meetings.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;1. Introduction&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;This paper presents the methodologies and evaluation parameters we developed to define a meeting effectiveness score using large language models. Meetings are central to organizational productivity yet remain poorly measured and frequently ineffective — Cutler et al. found that 17% of meetings were perceived as ineffective[3], a significant cost in wasted employee time, while Scott et al. identified that support for meeting intentionality is surprisingly scant in existing workflow technologies[6]. We designed this project to address that gap by developing an automated, LLM-driven framework capable of evaluating meeting quality across various real-world formats. Data was drawn from 68 publicly available recorded meetings on YouTube, categorized by size into small (3 to 5 participants), medium (6 to 10), and large (11 to 20), excluding one-on-one meetings where effectiveness can be gauged directly by whether the purpose was resolved. The resulting framework consists of five umbrella metrics — Participation, Engagement, Structure, Sentiment, and Tech Quality — each containing three sub-metrics, producing fifteen measurable indicators of meeting quality and providing a structured lens through which meeting recordings can be assessed against research-validated indicators of effectiveness, inclusiveness, and participation equity.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;TL;DR:&lt;/STRONG&gt; We built an AI system that grades your meetings on 5 dimensions. After testing 68 real meetings, we found most score well on tech quality (92/100) but struggle with engagement (77/100). The fix? Clearer agendas, intentional interaction points, and explicit action items at the end.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Key Findings at a Glance: &lt;/STRONG&gt;68 meetings analyzed across corporate, government, and academic settings | Average effectiveness score: 86/100 | Biggest opportunity: Engagement (77.3) | Strongest area: Tech Quality (92.3) | Speaking for just 10% of a meeting increases perceived inclusiveness 4x (Hosseinkashi et al., 2024)&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;2. How We Measure Meeting Effectiveness&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Dimension 1: Emotional Tone &amp;amp; Psychological Safety&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Workplace collaboration is shaped by the emotional tone set during meetings. Cutler et al. found that psychological safety — people's readiness to share their thoughts — is the strongest predictor of meeting success (OR=3.3), outweighing agenda, video, and meeting size, with inclusive meetings twice as likely to be viewed positively (OR=2.1) [3]. This dynamic intensifies in remote settings, where Cao et al. found that the absence of nonverbal signals increases multitasking and distraction, making verbal tone the primary determinant of emotional climate[1]. Hosseinkashi et al. found that communication failures reduce meeting success likelihood by 90% (OR=0.1) [4], Scott et al. identified social conflict as a major productivity obstacle[6], and Chen et al. demonstrated that AI-assisted reflection can drive measurable behavior change[2]. Based on this evidence, OMEI scores sentiment tone on a 0-100 scale: 80-100 reflects a psychologically safe environment; 50-79 moderate tone with friction; 20-49 neutral tone; and 0-19 negative or dismissive tone.&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Dimension 2: Active Engagement &amp;amp; Agenda Adherence&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;Active participation is the strongest predictor of meeting inclusiveness and effectiveness in virtual settings. Hosseinkashi et al. after analyzing 15,000 real meetings across five multinationals, found that speaking for at least 10% of the meeting increased the likelihood of an 2 inclusive meeting fourfold (OR=4.0), with small groups of eight or fewer producing seven times higher odds of vocal engagement (OR=7.1), and effectiveness dropping by one percentage point for every two additional participants[4]. Scott et al. found that mixed-role groups and unclear objectives consistently produced off-topic discussion even when agendas were present[6], while Chen et al. observed that 7 of 12 real workplace meetings drifted from their objectives mid-discussion despite having reference materials available[2]. Based on this evidence, OMEI scores engagement across responsiveness and agenda adherence on a 0-100 scale: 80-100 reflects broad vocal participation consistent with Hosseinkashi et al.'s OR=4.0 threshold; 50-79 moderate but uneven participation; 20-49 limited participation with frequent off-topic discussion; and 0-19 one-sided communication with no agenda adherence.&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Dimension 3: Who Spoke, How Much, and How Equitably&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;Participation is the strongest predictor of meeting effectiveness and inclusiveness. Hosseinkashi et al. found that vocal involvement correlates with a fourfold increase in perceived inclusiveness, yet group size consistently works against this — attendees in groups under eight are seven times more likely to contribute, and effectiveness drops by roughly one percentage point for every two people added[4]. Cutler et al. found that 17% of meetings are perceived as ineffective, and that 20% of respondents felt excluded even when 83% had spoken at least twice, suggesting that how voice and time are distributed matters more than whether someone spoke at all[3]. Chen et al. adds that silent attendees are a proxy for lost psychological safety, as junior staff often stay quiet due to social hierarchies[2]. To capture this, our participation metric is built on three pillars: unique speakers, which tracks whether contribution is truly distributed; speaking time balance, operationalized through the Gini coefficient (a measure of how evenly speaking time is distributed — 0 means perfectly equal, 1 means one person dominated) and dominant speaker share; and turn-taking, which distinguishes real dialogue from disjointed monologues and reflects Cao et al.’s finding that multitasking spikes in roughly 30% of remote meetings when discussion is dominated by a few voices[1]. Together these sub-metrics create a composite score that addresses the lack of intentionality in current workflow tools[6].&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Dimension 4: Meeting Structure &amp;amp; Outcomes&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;Structure is a critical determinant of meeting effectiveness. Scott et al. found that unclear goals are among the most significant obstacles to productivity, noting that current workflow tools provide surprisingly little support for meeting intentionality[6]. Their research draws a direct line between structure and outcomes — if goals set the destination, agendas outline the route, functioning as implementation intentions that consistently increase the likelihood of goal achievement. Yet Chen et al. found that even when agendas existed, fewer than half of observed meetings articulated specific goals, none allocated time per agenda item, and time management behaviors typically only emerged when participants realized they were running out of time — 3 resulting in overly long discussions in 6 of 12 meetings and under-addressed topics in 4 of 12[2]. Our structure metric evaluates three sub-dimensions that together capture whether a meeting was intentionally organized, purposefully conducted, and productively concluded. Agenda adherence assesses not merely whether an agenda was present, but whether the meeting followed it, stayed within scope, and gave appropriate time to stated items. Topic threads measure whether discussion followed a coherent arc, with topics opened and closed rather than abandoned — Chen et al. found demonstrable off-target discussion in 3 of 12 meetings and core issues left unaddressed in 4 of 12[2]. Action items assess whether the meeting translated discussion into concrete commitments with named owners, reflecting Cutler et al.'s finding that post-meeting summaries correlate directly with meeting quality and effectiveness[3].&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Dimension 5: Technical Quality as a Prerequisite&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;Technical quality is a foundational prerequisite for participation in remote meetings, not a peripheral IT concern. Hosseinkashi et al. found after analyzing 15,000 meetings that reliability failures reduced participation odds by 87% (OR=0.13) and poor audio and video quality reduced meeting success likelihood by 90% (OR=0.1), drawing a useful distinction between reliability — staying connected — and quality, which determines whether a meeting is inclusive and effective[4]. Cao et al. found that camera-off behavior strongly predicts multitasking and disengagement[1], Scott et al. identified technology fragmentation as a key reason meetings lose intentionality[6], and Chen et al. found that technical failures directly disrupt goal tracking[2]. Cutler et al. established that audio and video reliability each independently increase the likelihood of an inclusive meeting (OR=1.7) — comparable in effect to using a formal agenda — with headset use alone increasing participation odds by 20% (OR=1.2) [3]. Based on this, OMEI scores technical quality on a 0-100 scale: 80-100 reflects no detected issues enabling full participation; 50-79 minor degradation without reliability failures; 20-49 significant quality barriers limiting engagement; and 0-19 reliability failures preventing meaningful participation.&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;3. The Technical Pipeline: From Video to Score&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;The project’s architecture follows a four-layer process designed to transform raw video recordings into structured meeting effectiveness insights. The pipeline consists of a data ingestion layer, perception layer, intelligence layer, and output layer. Together, these layers convert videos into transcripts, metadata, metric level LLM evaluations, and a final meeting effectiveness score.&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;3.1 High Level Architecture&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;The data ingestion layer collects meeting recordings from publicly available video sources. A YouTube URL is passed into a video download pipeline using yt-dlp and FFmpeg. The processed video file is then uploaded to Azure Blob Storage, where it becomes available for downstream analysis. &amp;nbsp;The perception layer extracts meeting information from the stored video. Azure Video Indexer is used to generate diarization, transcription, sentiment signals, and meeting metadata. These outputs provide the foundation for evaluating who spoke, what was discussed, and what interaction patterns appeared during the meeting. &amp;nbsp;The intelligence layer combines preprocessed transcript data with the Azure Video Indexer JSON output. These structured inputs are sent to Microsoft Foundry for LLM based analysis. For each OMEI metric, the system uses a customized system prompt and requests structured output, allowing the model to evaluate each of the 5 metrics in a replicable and consistent format. The output layer presents the final results through a lightweight web demo formatting the returned structured output. The interface displays the core OMEI metrics designed to help users understand not only the numerical score rating the system has provided, but also which dimensions contributed to the score as well as positive comments and potential improvements.&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;3.2 LLM Methodology&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;The evaluation pipeline orchestrated with Microsoft Foundry combines a guided system prompt with an appended diarization user prompt for LLM assessment. System prompts were iteratively refined using findings from research, with each prompt aligned to its corresponding metric. GPT-5.4-mini was selected as the final model for its ability to process full transcripts within its context window while providing reliable scoring and output. Since none of the models tested were capable of complex time-based calculations from diarization data, selected quantitative fields — including total speaking time per participant, balance ratios, and silences — were preprocessed before LLM evaluation to improve consistency and reduce model error.&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;4. What We Learned from 68 Meetings&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;4.1 Quality Assurance Approach&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;To ensure the integrity and objectivity of our model's outputs, we adopted a structured quality assurance approach grounded in independent human review. Seven student analysts were recruited for their unbiased perspective, having had no prior exposure to the model's outputs or scoring criteria. Analysts reviewed a standardized dataset of 14 unique corporate and institutional meeting recordings spanning government, academic, and professional contexts, evaluating only the first 30 minutes of each recording to ensure consistency. Evaluations were submitted through a structured Google Form aligned to the five core metrics, normalized to a composite scale of 100 points with Participation, Engagement, Structure, and Sentiment each contributing 22.5 points and Tech Quality contributing 10 points. The resulting dataset comprised 20 reviews yielding a global composite average of 66.84 out of 100, with Engagement the strongest metric at 18.36 points and Tech Quality the lowest at 5.5. Video-level scores ranged from 79.26 to 44.25 — a 35-point spread confirming the rubric differentiates meaningfully between strong and weak meeting performance, as illustrated in Figure 2. This human-generated dataset serves as the ground truth against which our model's outputs are measured, validated, and iteratively refined.&amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;To evaluate the impact of prompt calibration, we compared original LLM scores, revised LLM scores, and human QA scores across all 16 videos, shown in Figure 3. Prior to calibration, the original LLM averaged 87.1 — approximately 20 points above the human QA average of 66.84 — confirming that generous default logic was systematically inflating scores regardless of actual meeting quality. Prompt revisions were applied across four metrics, replacing default scoring logic with explicit requirements for affirmative evidence and hard scoring caps tied to measurable signals such as dominant speaker percentage, silence ratio, and agenda adherence. Following revision, the revised LLM average dropped to 72.5, closing the gap with human reviewers from roughly 20 points to approximately 6, with the revised scores showing greater variance across videos — confirming that the model was now differentiating between stronger and weaker meetings in a way that more closely mirrors human judgment.&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;4.2 Overall OMEI Results&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;Across 68 tested meetings, the average OMEI score was 86.0 out of 100 based on 340 metric scores. Overall performance was strong, with the highest averages in Tech Quality at 92.3, Sentiment at 90.3, and Structure at 86.6, as shown in Figure 4. Engagement was the lowest at 77.3, indicating the greatest area for improvement. Scores ranged from 18 to 100, with the widest variation appearing in Sentiment, suggesting that while most meetings were technically stable, the LLM was more sensitive to differences in tone than in other dimensions. By meeting type, Review Sessions scored highest at approximately 90.4, reflecting consistently strong performance across all five dimensions, while Presentations scored lowest at approximately 85.6, with Engagement at 72 and Structure at 83 likely reflecting the more one-directional nature of that format. Meeting size was indirectly captured through the Participation metric using speaker occurrences identified by Azure Video Indexer as the primary proxy for contribution patterns.&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;4.3 LLM Model Performance&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;The OMEI pipeline tested GPT-5.4-mini, GPT-5.4-nano, DeepSeek-R1, GPT-5.4, and Phi-4 at the individual metric level, with all models able to process the full Azure Video Indexer JSON within their context windows. GPT-5.4-mini was selected as the final model for its best balance of scoring quality, consistency, and actionable recommendations — larger models tended to over-interpret the task while smaller models produced weaker rationales. Models performed well on qualitative assessments including tone, structure, and engagement, and generally avoided hallucinating evidence. The primary limitation was numerical reliability, as models were weaker at tracking counts, proportions, and speaker activity without preprocessing — particularly for the Participation metric, where speaker occurrences were preprocessed before scoring to improve consistency. Feedback across all models tended toward generic recommendations, likely reflecting the constraints of the rule-based evaluation framework.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;4.4 Recommendations for Meeting Organizers&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;Meeting organizers can use OMEI results as a practical guide for improving meeting design and facilitation. The strongest opportunities come from clearer agendas, defined objectives, stronger facilitation, and more explicit closing practices — meetings should begin with a stated purpose and end with summarized decisions, action items, and named owners. In presentation-heavy or larger meetings, organizers should intentionally build in interaction points such as questions, discussion pauses, or participant check-ins. OMEI scores should be interpreted as directional indicators rather than definitive judgments, helping teams identify recurring patterns and improve meeting habits over time.&amp;nbsp;&lt;BR /&gt;&lt;STRONG&gt;4.5 Recommendations for Software Integration&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;The current OMEI workflow is a proof of concept pipeline for native Microsoft Azure integration, deliberately designed to be lightweight and straightforward to extend. Although Azure Video Indexer was used in this prototype, Azure Speech may be better suited for production use since scoring relies primarily on transcripts, speaker activity, and timing data rather than visual analysis. Preprocessing numerical signals such as speaker counts, timing, and participation patterns before LLM scoring remains necessary to improve consistency and reduce model error, and future integration should preserve the framework's simplicity while improving transcript processing, metadata extraction, and preprocessing reliability.&amp;nbsp;&lt;BR /&gt;&lt;STRONG&gt;4.6 Future Work&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;Future work should expand the dataset across more organizations, sectors, and meeting types, with richer metadata including participant counts, roles, meeting purpose, and duration. 
Human evaluator ratings should be used to validate OMEI scores against expert judgment, while further development should focus on improving preprocessing, reducing generic feedback, and deepening analysis of turn-taking, interruptions, silence, and action item quality. Longitudinal dashboards would allow teams to track meeting quality trends over time and evaluate whether changes in meeting practices lead to measurable improvement.&amp;nbsp;&lt;BR /&gt;&lt;STRONG&gt;5. Get Started with OMEI&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;Want to evaluate your own meetings? Here’s how to get started:&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;U&gt;Build your own pipeline&lt;/U&gt;: Azure Video Indexer documentation | Microsoft Foundry quickstart | Sample prompts from this research&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;&lt;U&gt;Connect with the team&lt;/U&gt;: Questions or feedback? Reach out via the Microsoft Tech Community discussion thread linked above.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;U&gt;Learn more&lt;/U&gt;: Meeting effectiveness research from Microsoft | Azure Speech Services for production implementations&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;6. Conclusion&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;This project demonstrates that large language models can transform unstructured virtual meeting data into meaningful evaluations of meeting effectiveness. Through the Online Meeting Effectiveness Index, we developed a structured framework that scores meetings across five core dimensions — Participation, Engagement, Structure, Sentiment, and Tech Quality — by combining Azure Video Indexer outputs, deterministic preprocessing, and LLM-based qualitative reasoning to generate interpretable scores and actionable recommendations. Our findings show that while most virtual meetings perform well in technical quality and sentiment, engagement and structure remain consistent areas for improvement, particularly in presentation-heavy or larger formats. OMEI serves as a proof of concept for integrating reflective meeting intelligence into platforms such as Microsoft Teams, providing organizers with targeted feedback on agendas, participation balance, discussion flow, and follow-up practices.&amp;nbsp;&lt;BR /&gt;Future work should expand the dataset, validate scores against expert human judgment, and improve analysis of turn-taking, interruptions, action items, and long-term meeting trends. With continued refinement, OMEI has the potential to help organizations move beyond simple transcription toward measurable, AI-supported improvements in collaboration, productivity, and decision-making.&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;BR /&gt;&lt;STRONG&gt;References&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;[1] Cao, H., Lee, C.-J., Iqbal, S., Czerwinski, M., Wong, P. N. Y., Rintel, S., Hecht, B., Teevan, J., &amp;amp; Yang, L. (2021). Large scale analysis of multitasking behavior during remote meetings. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Article 448. https://doi.org/10.1145/3411764.3445243&amp;nbsp;&lt;BR /&gt;[2] Chen, X., Tankelevitch, L., Vanukuru, R., Scott, A. E., Panda, P., &amp;amp; Rintel, S. (2025). Are we on track? AI-assisted active and passive goal reflection during meetings. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Article 705. https://doi.org/10.1145/3706598.3714052&amp;nbsp;&lt;BR /&gt;[3] Cutler, R., Hosseinkashi, Y., Pool, J., Filipi, S., Aichner, R., Tu, Y., &amp;amp; Gehrke, J. (2021). Meeting effectiveness and inclusiveness in remote collaboration. Proceedings of the ACM on Human-Computer Interaction, 5, (CSCW1), Article 173. https://doi.org/10.1145/3449247&amp;nbsp;&lt;BR /&gt;[4] Hosseinkashi, Y., Tankelevitch, L., Pool, J., Cutler, R., &amp;amp; Madan, C. (2024). Meeting effectiveness and inclusiveness: Large-scale measurement, identification of key features, and prediction in real-world remote meetings. Proceedings of the ACM on Human-Computer Interaction, 8, (CSCW1), Article 93. https://doi.org/10.1145/3637370&amp;nbsp;&lt;BR /&gt;[5] Park, G. W., Panda, P., Tankelevitch, L., &amp;amp; Rintel, S. (2024). The CoExplorer technology probe: A generative AI-powered adaptive interface to support intentionality in planning and 12 running video meetings. Proceedings of the 2024 ACM Designing Interactive Systems Conference, (pp. 1638–1657). https://doi.org/10.1145/3643834.3661507&amp;nbsp;&lt;BR /&gt;[6] Scott, A. E., Tankelevitch, L., &amp;amp; Rintel, S. (2024). Mental models of meeting goals: Supporting intentionality in meeting technologies. 
Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Article 1025. https://doi.org/10.1145/3613904.3642670&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jun 2026 17:26:07 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/online-meeting-effectiveness-index-omei/ba-p/4526638</guid>
      <dc:creator>juhisingh</dc:creator>
      <dc:date>2026-06-09T17:26:07Z</dc:date>
    </item>
    <item>
      <title>Driving Stop-and-Go Business Processes to Closure with Foundry Hosted Agents</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/driving-stop-and-go-business-processes-to-closure-with-foundry/ba-p/4524135</link>
      <description>&lt;H2 data-olk-copy-source="MessageBody"&gt;Why This Matters&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Your AI agent just forgot everything.&lt;/STRONG&gt;&amp;nbsp;Again.&lt;/P&gt;
&lt;P&gt;If you've built agents that need to track work across days or weeks—not just single conversations—you've hit this wall: the agent loop is stateless. Each turn reasons over what it's handed, then forgets. But real business workflows don't fit in one conversation.&lt;/P&gt;
&lt;P&gt;This post shows how to build agents that remember, using Foundry's per-project sandbox VMs to maintain state across weeks of discontinuous work. No external database required.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What you'll learn:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;How to persist agent state in Foundry microVMs&lt;/LI&gt;
&lt;LI&gt;Session handling patterns for long-running workflows&lt;/LI&gt;
&lt;LI&gt;When to choose hosted vs. local agent architecture&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG data-olk-copy-source="MessageBody"&gt;Quick glossary for this post:&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;MAF (Microsoft Agent Framework)&lt;/STRONG&gt;: The framework for building AI agents that can use tools and orchestrate sub-agents&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry&lt;/STRONG&gt;: Microsoft's platform for hosting and running AI agents in production&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;MicroVM&lt;/STRONG&gt;: A lightweight, isolated virtual machine that Foundry provisions per project—think of it as your agent's persistent workspace&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;MCP (Model Context Protocol)&lt;/STRONG&gt;: A standard for connecting AI models to external tools and data sources&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;WorkIQ&lt;/STRONG&gt;: Our custom component that exposes M365 capabilities as MCP endpoints&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;The business workflow and the core challenge&lt;/H2&gt;
&lt;P&gt;The reference workflow here is&amp;nbsp;&lt;STRONG&gt;Statement of Work (SOW) response orchestration&lt;/STRONG&gt;. A SOW Owner kicks off an RFP response, and the agent runs the whole engagement across Microsoft 365. It finds the RFP, drafts a project charter, extracts owners and tasks from the kickoff meeting, and allocates the work. Once the owner approves, it fans out kickoff briefs. It then polls email and chat to track each task to closure and drafts nudges for stragglers. The catch is that this isn't a single conversation. It plays out over&amp;nbsp;&lt;STRONG&gt;days or weeks&lt;/STRONG&gt;, with people contributing at their own pace and the agent waking up intermittently to advance the work. That is the core challenge:&amp;nbsp;&lt;STRONG&gt;an agent loop is stateless&lt;/STRONG&gt;, reasoning over whatever each turn hands it and then forgetting, yet the process demands continuity across long, discontinuous gaps. Something has to remember every outstanding task, every submission, and every next step. It also has to reconstruct that picture accurately each time the loop wakes.&lt;/P&gt;
&lt;H2&gt;The solution approach&lt;/H2&gt;
&lt;P&gt;Rather than push that burden onto the client,&amp;nbsp;&lt;STRONG&gt;we let the Foundry hosted agent carry its own state in a per-project sandbox VM (microVM).&lt;/STRONG&gt;&amp;nbsp;The high-level idea:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;State lives in the sandbox, not the client.&lt;/STRONG&gt;&amp;nbsp;Each project gets its own Foundry microVM with a durable&amp;nbsp;$HOME. The agent writes everything it needs there — the charter, a structured project log of tasks and status, channel cursors, an activity trail.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Every wake-up reconstructs the world.&lt;/STRONG&gt;&amp;nbsp;The loop doesn't rely on a dangling in-memory pointer. On each turn it&amp;nbsp;&lt;STRONG&gt;reads its own state files back into context&lt;/STRONG&gt;, rebuilds the full status of the engagement, advances exactly the phase that's due, and&amp;nbsp;&lt;STRONG&gt;writes the updated state back&lt;/STRONG&gt;. Whether the human returns in five minutes or five days, the agent recreates an authoritative picture from disk.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The client stays thin.&lt;/STRONG&gt;&amp;nbsp;Because the substance is server-side, the desktop app only needs to remember&amp;nbsp;&lt;EM&gt;which sandbox belongs to which project&lt;/EM&gt;&amp;nbsp;— a session id — and nothing sensitive ever lands on the endpoint.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This turns a stateless loop into a genuinely stateful, long-running workflow without an external database and without parking business data on the local machine.&lt;/P&gt;
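&lt;P&gt;The wake-up cycle described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Charter Agent's code. The project_log.json file name comes from the state tool shown later in this post. The phase list, the JSON layout, and the advance_one_phase helper are hypothetical. A real turn would delegate each phase to a sub-agent rather than just recording it.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
import json
import tempfile
from pathlib import Path

# Hypothetical phase order, loosely following the sub-skills named in this post.
PHASES = ["rfp_search", "charter_draft", "kickoff_extract", "task_allocate", "reply_poll"]


def load_state(home):
    """Read the project log back from the sandbox $HOME (or start fresh)."""
    log_path = Path(home) / "project_log.json"
    if log_path.exists():
        return json.loads(log_path.read_text(encoding="utf-8"))
    return {"completed_phases": [], "activity": []}


def save_state(home, state):
    """Write the updated project log back to $HOME for the next wake-up."""
    log_path = Path(home) / "project_log.json"
    log_path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def advance_one_phase(home):
    """One wake-up: rebuild the picture from disk, advance the phase that is
    due, persist it, and return that phase (None once every phase is done)."""
    state = load_state(home)
    due = next((p for p in PHASES if p not in state["completed_phases"]), None)
    if due is not None:
        # A real agent would delegate to the matching sub-skill here, and a real
        # reply-poll phase would repeat until every task is closed.
        state["completed_phases"].append(due)
        state["activity"].append(f"advanced {due}")
    save_state(home, state)
    return due


with tempfile.TemporaryDirectory() as home:
    # Each call simulates a separate wake-up; nothing is kept in memory between them.
    print(advance_one_phase(home))
    print(advance_one_phase(home))
&lt;/LI-CODE&gt;
&lt;P&gt;Each call starts from disk and ends by writing back to disk. That is what lets a wake-up five minutes or five days later pick up exactly where the previous one stopped.&lt;/P&gt;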
&lt;H2&gt;Architecture and design&lt;/H2&gt;
&lt;img /&gt;
&lt;P&gt;The system has&amp;nbsp;&lt;STRONG&gt;two components&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;1) The&amp;nbsp;&lt;STRONG&gt;Foundry Hosted Agent&lt;/STRONG&gt; is the brain. Built with the &lt;STRONG&gt;Microsoft Agent Framework (MAF)&lt;/STRONG&gt;, it reasons with&amp;nbsp;&lt;STRONG&gt;OpenAI models on Foundry&lt;/STRONG&gt;. It reaches Microsoft 365 through a single &lt;STRONG&gt;Foundry Toolbox&lt;/STRONG&gt;&amp;nbsp;that fronts the&amp;nbsp;&lt;STRONG&gt;WorkIQ MCP endpoints&lt;/STRONG&gt;. It also owns the agent loop, skills, memory, session, state, and orchestration. It calls M365&amp;nbsp;&lt;EM&gt;as the signed-in user&lt;/EM&gt;&amp;nbsp;via&amp;nbsp;&lt;STRONG&gt;Foundry OAuth Identity Passthrough&lt;/STRONG&gt;, so it never holds a user token.&lt;/P&gt;
&lt;P&gt;The workflow is built as an &lt;STRONG&gt;orchestrator skill (sow-response) with specialist sub-skills&lt;/STRONG&gt;&amp;nbsp;(RFP search, charter draft, kickoff extract, task allocate, reply poll); the orchestrator inspects state, decides the current phase, and delegates to the right sub-agent.&lt;/P&gt;
&lt;P&gt;Every skill is a &lt;STRONG&gt;warm MAF Agent built once at boot, all sharing one&amp;nbsp;FoundryChatClient&lt;/STRONG&gt;, so sub-agent invocations reason through the same model client. The&amp;nbsp;&lt;STRONG&gt;skill-based design is what makes it extensible&lt;/STRONG&gt;: a skill is just a folder with a&amp;nbsp;SKILL.md. Drop a new one in, and the loader builds an agent for it and starts routing to it by its description. A brand-new business process needs no framework changes.&lt;/P&gt;
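&lt;P&gt;The folder-per-skill convention can be illustrated with a small discovery routine. This is a hypothetical sketch, not the MAF loader. It assumes each SKILL.md opens with a simple front-matter block containing name: and description: lines. It only builds an in-memory registry of descriptions that a router could match against; constructing the actual Agent for each skill is left out.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
import tempfile
from pathlib import Path


def discover_skills(skills_root):
    """Hypothetical loader: find every folder containing a SKILL.md and read
    its name and description from a simple front-matter block."""
    registry = {}
    for skill_md in sorted(Path(skills_root).glob("*/SKILL.md")):
        meta = {}
        lines = skill_md.read_text(encoding="utf-8").splitlines()
        if lines and lines[0].strip() == "---":
            for line in lines[1:]:
                if line.strip() == "---":
                    break  # end of the front-matter block
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
        name = meta.get("name", skill_md.parent.name)  # fall back to the folder name
        registry[name] = meta.get("description", "")
    return registry


with tempfile.TemporaryDirectory() as root:
    # Dropping in a new folder is all it takes for the loader to pick it up.
    skill_dir = Path(root) / "charter-draft"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\nname: charter-draft\ndescription: Draft a project charter from the RFP\n---\n",
        encoding="utf-8",
    )
    print(discover_skills(root))
&lt;/LI-CODE&gt;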
&lt;P&gt;The agent &lt;STRONG&gt;runs locally during development and as a deployed Foundry hosted agent in production&lt;/STRONG&gt;, with identical code in both cases.&lt;/P&gt;
&lt;P&gt;2) The second component, the&amp;nbsp;&lt;STRONG&gt;Desktop App&lt;/STRONG&gt;, is a deliberately thin shim. It signs the user in, attaches their bearer token to each call, remembers a session id per project, polls in the background so the workflow advances unattended, and renders the result. It holds session pointers and a non-sensitive UI cache, nothing more. The two components talk over the&amp;nbsp;&lt;STRONG&gt;OpenAI Responses API&lt;/STRONG&gt;. MAF supplies the loop itself: the per-turn reasoning, tool dispatch over MCP, and sub-agent orchestration that drive every phase to completion.&lt;/P&gt;
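&lt;P&gt;The Desktop App's background polling can be sketched with the standard library alone. This is an illustrative assumption about how such a poller might look, not the client's actual code. Here send_turn stands in for whatever function fires one Responses API turn for a project, and the interval and auto-advance prompt text are hypothetical.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
import threading


def start_background_poller(project_ids, send_turn, interval_s, stop_event):
    """Hypothetical poller: on every tick, send one automatic turn per project
    so the hosted agent can advance whatever phase is due, until stopped."""

    def loop():
        while not stop_event.is_set():
            for project_id in project_ids:
                # The client only knows project ids; the real state lives in the sandbox.
                send_turn(project_id, "auto: advance the workflow")
            # Event.wait doubles as an interruptible sleep between sweeps.
            stop_event.wait(interval_s)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


# Usage with a stand-in send_turn that stops the poller after two full sweeps.
stop = threading.Event()
sent = []


def fake_send_turn(project_id, prompt):
    sent.append(project_id)
    if len(sent) == 4:
        stop.set()


start_background_poller(["proj-a", "proj-b"], fake_send_turn, 0.05, stop).join(timeout=5)
print(sent)
&lt;/LI-CODE&gt;
&lt;P&gt;Because each automatic turn simply reattaches the project's sandbox, the poller itself stays stateless. Losing the thread, or the whole client, loses nothing but the next tick.&lt;/P&gt;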
&lt;H2&gt;Session and state handling in code&lt;/H2&gt;
&lt;P&gt;On a project's first turn, the Desktop client asks Foundry to&amp;nbsp;&lt;STRONG&gt;mint a session&lt;/STRONG&gt;. The returned agent session id&amp;nbsp;&lt;EM&gt;is&lt;/EM&gt; the handle to the distinct microVM provisioned for that project, and the client persists it immediately:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def _ensure_session_sdk(self, project_dict, isolation_key) -&amp;gt; str:
    # project_dict: this project's persisted record (label, ids, etc.).
    # isolation_key: a STABLE client-owned key — we pass the project_id — that
    #   tells Foundry "all turns with this key belong to the same session/microVM".

    # If we already minted a session for this project on a previous run, reuse it.
    # session_id is the handle to the project's durable sandbox VM.
    sid = project_dict.get("session_id")
    if sid:
        return sid                                   # sandbox already exists — reattach later

    # First turn for this project: ask Foundry to provision a new session.
    # create_session spins up a dedicated microVM whose $HOME will hold ALL
    # of this project's state. The platform MINTS the id; we don't choose it.
    session = self._get_project_client().beta.agents.create_session(
        agent_name=agent_name,                       # which deployed agent to bind to (defined outside this excerpt)
        isolation_key=isolation_key,                 # = project_id, pins one session per project
    )
    sid = session.agent_session_id                   # the server-minted sandbox handle

    # Persist the id BEFORE the first request goes out, so a crash mid-turn
    # never strands the project without a pointer to its sandbox.
    project_dict["session_id"] = sid
    _save_projects(self._projects_data)              # write the local pointer file
    return sid&lt;/LI-CODE&gt;
&lt;P&gt;From then on,&amp;nbsp;&lt;STRONG&gt;every&lt;/STRONG&gt;&amp;nbsp;request sent over the&amp;nbsp;&lt;STRONG&gt;Responses API&lt;/STRONG&gt;&amp;nbsp;surface (responses.create(stream=True, ...)) embeds that session id, which reattaches the sandbox. Each request also carries a previous_response_id that chains the transcript:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# oc: the OpenAI-compatible Responses client for the hosted agent. It's obtained
#     from the Foundry project client (AIProjectClient.get_openai_client(...)) and
#     is what actually puts the request on the wire to the agent's /responses endpoint.
oc = self._get_openai_client()

# Build the payload for one turn against the OpenAI Responses API surface.
create_kwargs = {
    "input": prompt,                                 # the user/auto message for this turn
    "stream": True,                                  # stream SSE events back as they happen
    # agent_session_id is the load-bearing field: it tells Foundry to REATTACH
    # this project's existing microVM (and its $HOME state) for this turn,
    # instead of starting fresh. This is what makes the loop stateful.
    "extra_body": {"agent_session_id": _session_id},
}

# previous_response_id chains the in-memory transcript from the prior turn so the
# model has recent conversational context. It's best-effort (the transcript can
# roll/expire); the durable state always comes from $HOME, not from this id.
if previous_response_id:
    create_kwargs["previous_response_id"] = previous_response_id

# Fire the turn. The hosted agent mounts the right sandbox, runs the agent loop,
# and streams events (text deltas, tool calls, completion) back to the client.
stream = oc.responses.create(**create_kwargs)&lt;/LI-CODE&gt;
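&lt;P&gt;On the client side, the streamed events still have to be read back. The id of the completed response must also be saved so the next turn can pass it as previous_response_id. The sketch below assumes the OpenAI Python SDK's Responses streaming event types: response.output_text.delta carries a delta string, and response.completed carries the final response object. The consume_turn name is hypothetical, and a real client would also handle tool-call and error events.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
from types import SimpleNamespace


def consume_turn(stream):
    """Hypothetical helper: collect streamed text and return it together with
    the completed response id to chain as previous_response_id next turn."""
    text_parts = []
    response_id = None
    for event in stream:
        if event.type == "response.output_text.delta":
            text_parts.append(event.delta)        # incremental assistant text
        elif event.type == "response.completed":
            response_id = event.response.id       # chain this on the next turn
    return "".join(text_parts), response_id


# Offline usage with stand-in events shaped like the SDK's stream events.
fake_stream = [
    SimpleNamespace(type="response.output_text.delta", delta="Charter drafted; "),
    SimpleNamespace(type="response.output_text.delta", delta="awaiting approval."),
    SimpleNamespace(type="response.completed", response=SimpleNamespace(id="resp_123")),
]
text, previous_response_id = consume_turn(fake_stream)
print(text, previous_response_id)
&lt;/LI-CODE&gt;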
&lt;P&gt;The agent_session_id pins the durable sandbox, which survives restarts and idle periods. The previous_response_id chains the in-memory transcript on a best-effort basis. Inside the sandbox, the agent rehydrates and updates its own state through&amp;nbsp;&lt;STRONG&gt;system state tools&lt;/STRONG&gt; exposed alongside the Toolbox. The main one is a top-level JSON merge, so each skill updates only the keys it owns:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;@tool  # exposed to the agent as a callable tool, alongside the WorkIQ MCP tools
def state_patch_json(path: str, patch: dict[str, Any]) -&amp;gt; str:
    """Merge top-level keys into a JSON file without overwriting unrelated
    fields. Used for every project_log update.

    path:  a path RELATIVE to the sandbox $HOME (e.g. "project_log.json").
           state.patch_json validates it stays inside $HOME — no escaping out.
    patch: the keys to merge in. Only these keys are touched; everything else
           in the file is preserved. This lets one skill update, say, a task's
           status without clobbering fields another skill owns.
    """
    return state.patch_json(path, patch)             # read-modify-write, atomic&lt;/LI-CODE&gt;
&lt;P&gt;The whole "reconstruct the world on every wake-up" guarantee rests on these reads and patches against $HOME.&lt;/P&gt;
&lt;H2&gt;Telemetry, the easy way&lt;/H2&gt;
&lt;P&gt;Because the agent runs on Foundry with MAF,&amp;nbsp;&lt;STRONG&gt;connecting the Foundry project to Application Insights gives you the entire agent loop as OpenTelemetry traces, turnkey&lt;/STRONG&gt;. That covers model calls, every MCP&amp;nbsp;tools/call, every sub-agent invocation, and every state patch. The only custom wiring is a span processor, registered at boot, that stamps project.id on each span; the platform owns the exporter. You can follow the full multi-week loop end to end without building a bespoke logging pipeline.&lt;/P&gt;
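&lt;P&gt;For reference, a span processor of this kind takes only a few lines with the OpenTelemetry Python SDK (opentelemetry-sdk). The sketch below is an assumption about its shape rather than the Charter Agent's code. The class name and project id value are hypothetical. An in-memory exporter stands in for the platform-owned Application Insights exporter so the example runs locally.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
from opentelemetry.sdk.trace import SpanProcessor, TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter


class ProjectIdSpanProcessor(SpanProcessor):
    """Hypothetical processor: stamp project.id on every span as it starts."""

    def __init__(self, project_id):
        self._project_id = project_id

    def on_start(self, span, parent_context=None):
        span.set_attribute("project.id", self._project_id)


# Registered once at boot; every span created afterwards carries project.id.
provider = TracerProvider()
provider.add_span_processor(ProjectIdSpanProcessor("proj-contoso-rfp"))
exporter = InMemorySpanExporter()  # local stand-in for the platform's exporter
provider.add_span_processor(SimpleSpanProcessor(exporter))

tracer = provider.get_tracer("charter-agent-demo")
with tracer.start_as_current_span("agent.turn"):
    with tracer.start_as_current_span("tools/call"):
        pass

for span in exporter.get_finished_spans():
    print(span.name, span.attributes["project.id"])
&lt;/LI-CODE&gt;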
&lt;H2&gt;The alternative: run the loop on the client&lt;/H2&gt;
&lt;P&gt;The other option is to&amp;nbsp;&lt;STRONG&gt;run the agent loop and its state on the local machine&lt;/STRONG&gt;. The trade-offs:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Data residency.&lt;/STRONG&gt;&amp;nbsp;&lt;EM&gt;Hosted:&lt;/EM&gt;&amp;nbsp;sensitive substance stays in an Azure-governed sandbox; the endpoint holds only session ids — ideal where compliance limits business data on local devices.&amp;nbsp;&lt;EM&gt;Local:&lt;/EM&gt;&amp;nbsp;charter, artifacts, and task ledger all live on the laptop, expanding the DLP and vulnerability footprint.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Scale &amp;amp; sharing.&lt;/STRONG&gt;&amp;nbsp;&lt;EM&gt;Hosted:&lt;/EM&gt;&amp;nbsp;per-project sandboxes are provisioned by the platform and any signed-in teammate can attach.&amp;nbsp;&lt;EM&gt;Local:&lt;/EM&gt;&amp;nbsp;bounded by one user's machine; multi-user collaboration is awkward.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Simplicity for a single user.&lt;/STRONG&gt;&amp;nbsp;&lt;EM&gt;Local:&lt;/EM&gt;&amp;nbsp;no session plumbing, no cold-start latency, fully offline-capable — genuinely simpler for one power-user on a permissive setup.&amp;nbsp;&lt;EM&gt;Hosted:&lt;/EM&gt;&amp;nbsp;a little more wiring (session minting, resume handling) in exchange for the governance and scale.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;When to choose what:&lt;/STRONG&gt; go hosted for teams, multi-project portfolios, or any regulated environment; keep it local for a single power-user on a data-permissive machine who values zero round trips.&lt;/P&gt;
&lt;H2&gt;See it in action&lt;/H2&gt;
&lt;div data-video-id="https://youtu.be/0gUSkJpvD6Q/1780129499957" data-video-remote-vid="https://youtu.be/0gUSkJpvD6Q/1780129499957" class="lia-video-container lia-media-is-center lia-media-size-large"&gt;&lt;iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F0gUSkJpvD6Q%3Ffeature%3Doembed&amp;amp;display_name=YouTube&amp;amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D0gUSkJpvD6Q&amp;amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F0gUSkJpvD6Q%2Fhqdefault.jpg&amp;amp;type=text%2Fhtml&amp;amp;schema=youtube" allowfullscreen="" style="max-width: 100%"&gt;&lt;/iframe&gt;&lt;/div&gt;
&lt;H2 data-olk-copy-source="MessageBody"&gt;Get Started&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Ready to build your own stateful agent?&lt;/STRONG&gt;&amp;nbsp;Here are two paths forward:&lt;/P&gt;
&lt;P&gt;🚀&amp;nbsp;&lt;STRONG&gt;Explore the code&lt;/STRONG&gt;: The complete Charter Agent source is on GitHub: &lt;A href="https://github.com/MSFT-Innovation-Hub-India/charter-agent" target="_blank"&gt;Charter-Agent-Repo&lt;/A&gt;&amp;nbsp;— clone it, run it locally, and adapt the patterns for your workflow.&lt;/P&gt;
&lt;P&gt;📚&amp;nbsp;&lt;STRONG&gt;Learn the fundamentals&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/agent-framework/hosting/foundry-hosted-agent?pivots=programming-language-python" target="_blank"&gt;Microsoft Agent Framework&lt;/A&gt; documentation — understand the agent loop and skill architecture&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/deploy-hosted-agent" target="_blank"&gt;Foundry hosted agents&lt;/A&gt; quickstart — deploy your first agent in minutes&lt;/LI&gt;
&lt;LI&gt;Working with &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/manage-hosted-sessions?pivots=python" target="_blank"&gt;Sessions&lt;/A&gt; in Foundry Hosted Agents&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/tools/toolbox?pivots=python" target="_blank"&gt;Foundry Toolbox&lt;/A&gt; overview — connect your agent to M365 and other services&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;What workflow would you build with a stateful agent?&lt;/STRONG&gt; Drop a comment below.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jun 2026 16:30:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/driving-stop-and-go-business-processes-to-closure-with-foundry/ba-p/4524135</guid>
      <dc:creator>srikantan</dc:creator>
      <dc:date>2026-06-08T16:30:00Z</dc:date>
    </item>
    <item>
      <title>A Multi-Region Microsoft Foundry Pattern for Enterprise Private Networking</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/a-multi-region-microsoft-foundry-pattern-for-enterprise-private/ba-p/4525696</link>
      <description>&lt;P&gt;Enterprise AI projects do not move to production on model quality alone. They move when architecture, networking, observability, and evaluation all work together in a way that fits existing enterprise constraints. That is why a multi-region Microsoft Foundry pattern matters: it lets teams deploy Foundry where capacity and platform readiness are available, while continuing to use existing private resources in another region without breaking security or operational expectations.&lt;/P&gt;
&lt;H2&gt;The architecture pattern: Foundry in one region, enterprise resources in another&lt;/H2&gt;
&lt;P&gt;In this pattern, a Foundry account and project are deployed in the region that best fits platform availability, quota, or rollout needs, while enterprise-owned services such as Azure AI Search, Azure Cosmos DB, Key Vault, storage, and Application Insights remain in their approved home region. The two sides are connected through private networking, private DNS, managed identity, and controlled outbound access. The result is an architecture that preserves existing landing-zone investments while still enabling modern agent scenarios in Foundry.&lt;/P&gt;
&lt;img /&gt;
&lt;H2&gt;Why this pattern is useful in real enterprise environments&lt;/H2&gt;
&lt;P&gt;This approach is especially useful when organizations already have production resources, compliance approvals, and operational processes anchored in a specific region. Instead of rebuilding every dependency for a new AI project, teams can place Foundry where they have capacity and feature readiness, then securely connect back to the resources they already trust. This is what makes the pattern practical: it reduces migration friction while keeping the deployment model realistic for enterprise platform teams.&lt;/P&gt;
&lt;H2&gt;What this architecture proves: tools, evaluations, and tracing still work&lt;/H2&gt;
&lt;P&gt;The most important question for customers is not whether the network diagram looks clean on paper. It is whether the key platform experiences still function when Foundry is deployed in one region and the data plane dependencies remain somewhere else. This architecture demonstrates that agent tools can still execute against private resources, that prompts and hosted agents can still access enterprise systems through approved paths, and that the overall solution remains usable under real enterprise constraints.&lt;/P&gt;
&lt;P&gt;Just as importantly, the architecture shows that critical operational workflows continue to work as expected. Batch and agent evaluations can run with Foundry orchestrating execution while reaching private dependencies across regions. Telemetry can still flow into Application Insights for logs and diagnostics. Conversation history, tool activity, and operational traces remain visible through the supported observability surfaces. For enterprise teams, that is the real validation: not just that the deployment succeeds, but that the development, testing, and troubleshooting lifecycle still holds together.&lt;/P&gt;
&lt;P&gt;The screenshot below shows how a prompt agent can use Knowledge IQ (powered by Azure AI Search).&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Tool call details are also visible in Foundry:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Agent invocation details appear in the configured Application Insights resource:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;The agent can also be evaluated quickly with the Foundry platform's evaluation feature:&lt;/P&gt;
&lt;img /&gt;
&lt;H2&gt;Core architectural components behind the pattern&lt;/H2&gt;
&lt;P&gt;At the center is a Foundry project deployed in the region selected for platform readiness. Around it are the services most enterprise agent solutions already depend on: search for grounding and retrieval, Cosmos DB for conversation or application state, Key Vault for secrets, storage for artifacts and datasets, and Application Insights for diagnostics. These services do not have to be recreated in the Foundry region. Instead, the architecture connects to them through private endpoints, private DNS, and identity-based authorization.&lt;/P&gt;
&lt;P&gt;That combination is what makes the pattern credible for production use. Standard service names resolve privately, public ingress can remain disabled on sensitive resources, and agent runtimes communicate only through approved paths. The architecture is designed so the question is not whether enterprise controls must be relaxed for AI adoption, but how Foundry can fit cleanly into the controls that already exist.&lt;/P&gt;
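&lt;P&gt;You can confirm that standard service names really do resolve privately by running a DNS check from inside the connected network, for example from a VM in the peered virtual network. The sketch below uses only the Python standard library. The helper name is hypothetical, and the search hostname in the last comment is a placeholder for your own service.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
import ipaddress
import socket


def resolves_privately(hostname, port=443):
    """Hypothetical check: resolve hostname and report whether every returned
    address is private, as expected when a private endpoint and private DNS
    zone are wired up correctly."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # sockaddr[0] is the IP string for both IPv4 and IPv6 results.
    addresses = sorted({info[4][0] for info in infos})
    all_private = all(ipaddress.ip_address(a).is_private for a in addresses)
    return all_private, addresses


# Literal IPs resolve without a DNS query, so these two calls run offline.
print(resolves_privately("10.1.2.4"))   # an address in a typical VNet range
print(resolves_privately("8.8.8.8"))    # a public address

# From inside the connected VNet, a real check would look like:
# print(resolves_privately("your-search-service.search.windows.net"))
&lt;/LI-CODE&gt;
&lt;P&gt;If the service name returns a public address from inside the network, the private DNS zone link is usually the first thing to check.&lt;/P&gt;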
&lt;P&gt;This pattern is already in use by enterprise teams running production Foundry workloads. If your organization has been waiting for a way to adopt Foundry without rebuilding your existing infrastructure, the templates and guidance below will get you started.&lt;/P&gt;
&lt;H2&gt;Next Steps&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Want to learn more first?&lt;/STRONG&gt;&lt;BR /&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft/azure-skills/blob/main/skills/microsoft-foundry/resource/private-network/private-network.md" target="_blank" rel="noopener"&gt;Private networking skill documentation&lt;/A&gt; - Deep dive into the networking concepts.&lt;BR /&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/what-is-foundry" target="_blank" rel="noopener"&gt;Microsoft Foundry documentation&lt;/A&gt; - Explore the full platform capabilities.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Ready to deploy?&lt;/STRONG&gt;&lt;BR /&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft-foundry/foundry-samples/tree/main/infrastructure/infrastructure-setup-bicep/19-private-network-agent-tools" target="_blank" rel="noopener"&gt;Deploy the multi-region pattern with Template 19&lt;/A&gt; - Bicep templates to help you get started quickly.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2026 17:30:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/a-multi-region-microsoft-foundry-pattern-for-enterprise-private/ba-p/4525696</guid>
      <dc:creator>pratyushmishra</dc:creator>
      <dc:date>2026-06-05T17:30:00Z</dc:date>
    </item>
    <item>
      <title>Azure AI Foundry Agent Unable to Use Credentials Stored in Key Vault Through Playwright MCP Tool</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/azure-ai-foundry-agent-unable-to-use-credentials-stored-in-key/m-p/4525755#M1468</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I am trying to understand how Azure AI Foundry agents interact with Azure Key Vault when using custom MCP tools, and I would appreciate any guidance from the community.&lt;/P&gt;&lt;P&gt;My Setup&lt;/P&gt;&lt;P&gt;- Created an Azure AI Foundry agent.&lt;BR /&gt;- Created an Azure Key Vault and configured all permissions according to Microsoft's official documentation.&lt;BR /&gt;- Stored the required website credentials (username and password) in the Key Vault.&lt;BR /&gt;- Deployed the official Playwright MCP Docker image.&lt;BR /&gt;- Exposed the MCP server using ngrok and verified that the endpoint is accessible.&lt;BR /&gt;- Connected the MCP endpoint as a Custom MCP Tool in Azure AI Foundry.&lt;BR /&gt;- Performed all configuration through the Azure portal, Foundry UI, and Playground only (no SDK or custom application code involved).&lt;/P&gt;&lt;P&gt;The Issue&lt;/P&gt;&lt;P&gt;The agent can access and use the Playwright MCP tool. 
However, when I ask it to log in to a website using credentials that are already stored in Key Vault, it does not populate the username and password fields.&lt;/P&gt;&lt;P&gt;My expectation was that the agent would be able to retrieve the secrets from Key Vault and provide them to the Playwright tool during execution.&lt;/P&gt;&lt;P&gt;Questions&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is there currently a supported mechanism for Azure AI Foundry agents to automatically retrieve Key Vault secrets and pass them to a Custom MCP tool?&lt;/LI&gt;&lt;LI&gt;Does the Playwright MCP Docker image have any built-in integration with Azure Key Vault?&lt;/LI&gt;&lt;LI&gt;When using only the Foundry UI (without SDK code), can a Foundry agent securely inject Key Vault secrets into MCP tool calls?&lt;/LI&gt;&lt;LI&gt;Are additional configurations required beyond Key Vault permissions and agent connections?&lt;/LI&gt;&lt;LI&gt;Has anyone successfully implemented a similar setup where a Foundry agent uses credentials stored in Key Vault to perform browser automation through Playwright MCP?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Any clarification on the expected architecture and whether this scenario is currently supported in Azure AI Foundry would be greatly appreciated.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2026 07:22:22 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/azure-ai-foundry-agent-unable-to-use-credentials-stored-in-key/m-p/4525755#M1468</guid>
      <dc:creator>vaibhav01</dc:creator>
      <dc:date>2026-06-05T07:22:22Z</dc:date>
    </item>
    <item>
      <title>Connecting Claude Clients with Azure API Management and Claude Models in Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/connecting-claude-clients-with-azure-api-management-and-claude/ba-p/4525212</link>
      <description>&lt;H1&gt;Summary&lt;/H1&gt;
&lt;P&gt;You want to give your engineering org&amp;nbsp;&lt;A href="https://docs.claude.com/en/docs/claude-code/overview" target="_blank" rel="noopener"&gt;Claude Code&lt;/A&gt;&amp;nbsp;without handing out Anthropic API keys, without per-developer billing sprawl, and without losing visibility into who is spending what. This post shows a battle-tested pattern:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Claude models run in Microsoft Foundry&lt;/STRONG&gt;, billed through your Azure subscription — no Anthropic contract or keys required.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure API Management (APIM)&lt;/STRONG&gt;&amp;nbsp;sits in front as an&amp;nbsp;&lt;STRONG&gt;LLM gateway&lt;/STRONG&gt;: it authenticates each developer with&amp;nbsp;&lt;STRONG&gt;Entra ID&lt;/STRONG&gt;, enforces&amp;nbsp;&lt;STRONG&gt;per-user rate limits and token quotas&lt;/STRONG&gt;, and emits&amp;nbsp;&lt;STRONG&gt;per-user usage metrics&lt;/STRONG&gt;&amp;nbsp;for chargeback.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry lives in its own Azure subscription&lt;/STRONG&gt;, and APIM authenticates to it with a&amp;nbsp;&lt;STRONG&gt;Foundry API key&lt;/STRONG&gt;&amp;nbsp;— so there's&amp;nbsp;&lt;STRONG&gt;no cross-subscription RBAC&lt;/STRONG&gt;&amp;nbsp;to untangle.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Developers hold only short-lived Entra tokens.&lt;/STRONG&gt;&amp;nbsp;The Foundry key never leaves APIM.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Everything below is grounded in the&amp;nbsp;&lt;A href="https://docs.claude.com/en/docs/claude-code/llm-gateway" target="_blank" rel="noopener"&gt;Claude Code LLM gateway requirements&lt;/A&gt;&amp;nbsp;and Azure API Management's GenAI gateway policies. All command-line steps are shown in&amp;nbsp;&lt;STRONG&gt;PowerShell&lt;/STRONG&gt; for Windows developers.&lt;/P&gt;
&lt;H2&gt;The problem&lt;/H2&gt;
&lt;P&gt;Claude Code is a terminal- and IDE-native coding agent that talks to Claude over the&amp;nbsp;&lt;STRONG&gt;Anthropic Messages API&lt;/STRONG&gt;. Pointing it directly at Anthropic (or even directly at Foundry) creates three headaches for any organization beyond a handful of users:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Key sprawl and billing.&lt;/STRONG&gt;&amp;nbsp;Direct API keys mean either a shared key (no per-user attribution, a rotation nightmare) or many keys (procurement and offboarding overhead).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;No throttle.&lt;/STRONG&gt;&amp;nbsp;Claude Code is&amp;nbsp;&lt;EM&gt;token-heavy&lt;/EM&gt;&amp;nbsp;— it reads files, plans, and edits in long loops. One runaway session or an over-enthusiastic team can produce a surprising bill with nothing standing between the developer and the model.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;No visibility.&lt;/STRONG&gt;&amp;nbsp;Finance wants to know cost per team. Security wants to know who is calling what. A raw key gives you neither.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The fix is a&amp;nbsp;&lt;STRONG&gt;gateway&lt;/STRONG&gt;&amp;nbsp;that every request flows through — one that knows&amp;nbsp;&lt;EM&gt;who&lt;/EM&gt;&amp;nbsp;the developer is (Entra ID), enforces&amp;nbsp;&lt;EM&gt;how much&lt;/EM&gt;&amp;nbsp;they can use (APIM GenAI policies), and records&amp;nbsp;&lt;EM&gt;what&lt;/EM&gt; they used (Azure Monitor). Claude Code supports exactly this through its gateway configuration.&lt;/P&gt;
&lt;H2&gt;Architecture&lt;/H2&gt;
&lt;P&gt;Claude Code on a developer laptop authenticates to Azure API Management with an Entra ID bearer token; APIM validates the token, applies per-user token and request limits, swaps in the Foundry API key, and forwards the Anthropic Messages request to Claude in Microsoft Foundry in a separate subscription; per-user token usage is emitted to Application Insights.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The request path:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;Developer laptop  (Claude Code CLI / VS Code)
   |   Authorization: Bearer &amp;lt;Entra access token for the APIM app&amp;gt;
   v
Azure API Management   (the LLM gateway)              [Subscription A]
   |  1. validate-jwt            confirm Entra identity, audience, app role
   |  2. extract oid             per-user counter key
   |  3. llm-token-limit         per-user tokens/min + monthly token quota
   |  4. rate-limit-by-key       per-user requests/min
   |  5. llm-emit-token-metric   per-user usage to App Insights
   |  6. strip Authorization; set x-api-key from secret named value
   v   (forwards Anthropic Messages format; anthropic-* headers preserved)
Microsoft Foundry  https://{resource}.services.ai.azure.com/anthropic/v1/messages
   v                                                    [Subscription B]
Claude deployments   (Sonnet 4.6 / Haiku 4.5 / Opus 4.6)&lt;/LI-CODE&gt;
&lt;P&gt;The key idea:&amp;nbsp;&lt;STRONG&gt;developer-facing auth and backend auth are independent.&lt;/STRONG&gt; Developers always authenticate as themselves with Entra ID at the gateway. How the gateway authenticates to Foundry is a separate decision — and you have two good options.&lt;/P&gt;
&lt;H3&gt;Choosing how the gateway authenticates to Foundry&lt;/H3&gt;
&lt;P&gt;Both options below are independent of the developer-facing Entra ID auth, and both work whether Foundry is in the&amp;nbsp;&lt;STRONG&gt;same&lt;/STRONG&gt;&amp;nbsp;subscription as APIM or a&amp;nbsp;&lt;STRONG&gt;different&lt;/STRONG&gt;&amp;nbsp;one. The only hard constraint for managed identity is that both resources live in the&amp;nbsp;&lt;STRONG&gt;same Entra tenant&lt;/STRONG&gt;.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&amp;nbsp;&lt;/th&gt;&lt;th&gt;&lt;STRONG&gt;Option A — Foundry API key&lt;/STRONG&gt;&lt;/th&gt;&lt;th&gt;&lt;STRONG&gt;Option B — Managed identity&lt;/STRONG&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;How APIM authenticates&lt;/td&gt;&lt;td&gt;api-key&amp;nbsp;header from a secret named value&lt;/td&gt;&lt;td&gt;Entra token from APIM's managed identity, in the&amp;nbsp;Authorization&amp;nbsp;header&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Setup&lt;/td&gt;&lt;td&gt;Read the key once, store it in APIM&lt;/td&gt;&lt;td&gt;Enable APIM's identity, assign&amp;nbsp;&lt;STRONG&gt;Cognitive Services User&lt;/STRONG&gt;&amp;nbsp;on Foundry&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Same subscription&lt;/td&gt;&lt;td&gt;Works&lt;/td&gt;&lt;td&gt;Works&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cross-subscription&lt;/td&gt;&lt;td&gt;Works — no RBAC crosses the boundary&lt;/td&gt;&lt;td&gt;Works — role assignment spans subscriptions in the same tenant&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cross-tenant&lt;/td&gt;&lt;td&gt;Works&lt;/td&gt;&lt;td&gt;Not supported — use a key&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Shared secret to rotate&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;None&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Best for&lt;/td&gt;&lt;td&gt;Fastest start; cross-tenant; key-only environments&lt;/td&gt;&lt;td&gt;Production; eliminates the shared secret&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 24.9228%" /&gt;&lt;col style="width: 34.4702%" /&gt;&lt;col style="width: 40.5761%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;This guide builds the&amp;nbsp;&lt;STRONG&gt;key-based path&lt;/STRONG&gt;&amp;nbsp;end to end, then shows the&amp;nbsp;&lt;STRONG&gt;managed-identity swap inline&lt;/STRONG&gt; at each step (Parts 3 and 4). Pick one — you don't need both.&lt;/P&gt;
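&lt;P&gt;To make the two independent hops concrete, here is a minimal Python sketch of the header rewrite the gateway performs before forwarding to Foundry. It is not APIM code, and the function name to_foundry_headers is hypothetical. It assumes, as the Part 4 policy does, that the client may send its Entra token in either Authorization or x-api-key, and that Option A passes the Foundry key in x-api-key.&lt;/P&gt;

```python
from typing import Dict, Optional


def to_foundry_headers(
    incoming: Dict[str, str],
    foundry_key: Optional[str] = None,
    mi_token: Optional[str] = None,
) -> Dict[str, str]:
    """Return the headers the gateway would forward to Foundry (hypothetical model).

    Pass foundry_key for Option A or mi_token for Option B, never both.
    """
    if (foundry_key is None) == (mi_token is None):
        raise ValueError("pass exactly one of foundry_key or mi_token")

    # HTTP header names are case-insensitive; normalize them for this sketch.
    headers = {name.lower(): value for name, value in incoming.items()}

    # Both headers may carry the developer's Entra token, which is meant for the gateway only.
    headers.pop("authorization", None)
    headers.pop("x-api-key", None)

    if foundry_key is not None:
        headers["x-api-key"] = foundry_key                # Option A: Foundry key
    else:
        headers["authorization"] = "Bearer " + mi_token   # Option B: managed-identity token
    return headers
```

&lt;P&gt;With either option, the developer's credentials come in and only backend credentials go out. Other headers, such as anthropic-version, pass through unchanged.&lt;/P&gt;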
&lt;H3&gt;What this design achieves&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Goal&lt;/th&gt;&lt;th&gt;How it's met&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Developers use Claude Code with no Anthropic billing or keys&lt;/td&gt;&lt;td&gt;Claude runs in Microsoft Foundry, billed through your Azure subscription&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Foundry can live in a different subscription&lt;/td&gt;&lt;td&gt;APIM reaches Foundry by URL + API key only — no cross-subscription RBAC&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Every developer authenticates as themselves&lt;/td&gt;&lt;td&gt;Entra ID tokens validated at the APIM gateway&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Per-developer rate limits and quotas&lt;/td&gt;&lt;td&gt;rate-limit-by-key&amp;nbsp;+&amp;nbsp;llm-token-limit&amp;nbsp;keyed on the Entra&amp;nbsp;oid&amp;nbsp;claim&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Per-developer usage and cost tracking&lt;/td&gt;&lt;td&gt;llm-emit-token-metric&amp;nbsp;→ Application Insights / Log Analytics&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;No Foundry keys on developer laptops&lt;/td&gt;&lt;td&gt;The Foundry key lives only inside APIM; developers hold short-lived Entra tokens&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;Prerequisites&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Two Azure subscriptions&lt;/STRONG&gt;, both pay-as-you-go.&amp;nbsp;&lt;STRONG&gt;Subscription A&lt;/STRONG&gt;&amp;nbsp;holds APIM;&amp;nbsp;&lt;STRONG&gt;Subscription B&lt;/STRONG&gt;&amp;nbsp;holds Foundry. (Foundry Claude does not run on free, trial, sponsored, or CSP subscriptions.)&lt;/LI&gt;
&lt;LI&gt;A&amp;nbsp;&lt;STRONG&gt;Microsoft Foundry&lt;/STRONG&gt;&amp;nbsp;resource (Subscription B) in a region where Claude is available — currently&amp;nbsp;&lt;STRONG&gt;East US 2&lt;/STRONG&gt;&amp;nbsp;or&amp;nbsp;&lt;STRONG&gt;Sweden Central&lt;/STRONG&gt;&amp;nbsp;— with Claude deployments created and&amp;nbsp;&lt;STRONG&gt;at least one API key&lt;/STRONG&gt;&amp;nbsp;under&amp;nbsp;&lt;EM&gt;Keys and Endpoint&lt;/EM&gt;.&lt;/LI&gt;
&lt;LI&gt;An&amp;nbsp;&lt;STRONG&gt;API Management&lt;/STRONG&gt;&amp;nbsp;instance (Subscription A).&amp;nbsp;&lt;STRONG&gt;Developer&lt;/STRONG&gt;&amp;nbsp;SKU is fine for a pilot;&amp;nbsp;&lt;STRONG&gt;Standard v2&lt;/STRONG&gt;&amp;nbsp;or&amp;nbsp;&lt;STRONG&gt;Premium&lt;/STRONG&gt;&amp;nbsp;for production and VNet integration.&lt;/LI&gt;
&lt;LI&gt;Permission to read the Foundry key in Subscription B, contributor on the APIM instance, and the ability to register Entra apps.&lt;/LI&gt;
&lt;LI&gt;Developers on&amp;nbsp;&lt;STRONG&gt;Windows&amp;nbsp;&lt;/STRONG&gt;with&amp;nbsp;&lt;STRONG&gt;PowerShell&lt;/STRONG&gt;&amp;nbsp;(5.1 built-in, or 7), the&amp;nbsp;&lt;STRONG&gt;Azure CLI&lt;/STRONG&gt;&amp;nbsp;(winget install Microsoft.AzureCLI), and the&amp;nbsp;&lt;STRONG&gt;Claude Code CLI&lt;/STRONG&gt;&amp;nbsp;installed.&lt;/LI&gt;
&lt;LI&gt;Pilot costs: APIM Developer SKU ~$50/month + Claude usage based on token consumption.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Option A (key):&lt;/STRONG&gt;&amp;nbsp;no cross-subscription role assignment — the only cross-subscription action is reading the Foundry key once (Part 3), which you can also do from the Foundry portal.&amp;nbsp;&lt;STRONG&gt;Option B (managed identity):&lt;/STRONG&gt;&amp;nbsp;one cross-subscription role assignment (&lt;STRONG&gt;Cognitive Services User&lt;/STRONG&gt;), supported as long as APIM and Foundry share an Entra tenant.&lt;/P&gt;
&lt;H2&gt;Part 1 — Deploy Claude in Foundry (Subscription B)&lt;/H2&gt;
&lt;OL&gt;
&lt;LI&gt;In the Foundry portal, open&amp;nbsp;&lt;STRONG&gt;Model catalog&lt;/STRONG&gt;, search&amp;nbsp;&lt;STRONG&gt;Claude&lt;/STRONG&gt;, and deploy the models Claude Code uses. Name each deployment to match its model ID so the gateway can pass the&amp;nbsp;model&amp;nbsp;field through unchanged:
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Deployment name (recommended)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Primary (general coding)&lt;/td&gt;&lt;td&gt;claude-sonnet-4-6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Fast (file reads, small edits, background tasks)&lt;/td&gt;&lt;td&gt;claude-haiku-4-5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Extended thinking (optional)&lt;/td&gt;&lt;td&gt;claude-opus-4-6&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pin versions&lt;/STRONG&gt;&amp;nbsp;— select a specific version, not&amp;nbsp;&lt;EM&gt;auto-update to latest&lt;/EM&gt;. Without pinning, a new model release can break every developer at once.&lt;/LI&gt;
&lt;LI&gt;On the resource's&amp;nbsp;&lt;STRONG&gt;Keys and Endpoint&lt;/STRONG&gt; blade, copy the endpoint and one of the two API keys. The Anthropic endpoint base is:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="markup"&gt;https://{resource}.services.ai.azure.com/anthropic&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Critical:&lt;/STRONG&gt;&amp;nbsp;Foundry's Claude endpoint is the&amp;nbsp;&lt;STRONG&gt;Anthropic surface&lt;/STRONG&gt;&amp;nbsp;(/anthropic/v1/messages),&amp;nbsp;&lt;EM&gt;not&lt;/EM&gt;&amp;nbsp;the OpenAI surface (/openai/deployments/.../chat/completions?api-version=...). When you build the APIM API, do&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt;&amp;nbsp;apply the OpenAI policy template, do&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt;&amp;nbsp;add an&amp;nbsp;api-version&amp;nbsp;query parameter, and do&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt; rewrite to an&amp;nbsp;/openai/...&amp;nbsp;path. Any of these produces the "not supported" or "resource not found" errors people commonly hit.&lt;/P&gt;
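&lt;P&gt;If you script your APIM setup, a small pre-flight check can catch these mistakes before they surface at runtime. The Python sketch below uses a hypothetical function name. It flags a URL that points at the OpenAI surface, does not start with /anthropic, or carries an api-version query parameter.&lt;/P&gt;

```python
from urllib.parse import parse_qs, urlsplit


def surface_problems(url: str) -> list:
    """Return reasons a URL looks wrong for Foundry's Anthropic surface (empty list = OK)."""
    parts = urlsplit(url)
    problems = []
    if "/openai/" in parts.path:
        problems.append("path uses the /openai/... surface")
    # The first path segment must be 'anthropic' (covers both the base and /v1/messages).
    first_segment = parts.path.strip("/").split("/")[0]
    if first_segment != "anthropic":
        problems.append("path does not start with /anthropic")
    # keep_blank_values so that a bare '?api-version=' is still caught.
    if "api-version" in parse_qs(parts.query, keep_blank_values=True):
        problems.append("api-version query parameter is present")
    return problems
```

&lt;P&gt;Run it against the backend URL from 3.3 and against any URL you paste into a test client. An empty list means the target is on the Anthropic surface.&lt;/P&gt;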
&lt;P&gt;✅&amp;nbsp;&lt;STRONG&gt;Checkpoint:&lt;/STRONG&gt;&amp;nbsp;You now have Claude deployed in Foundry.&amp;nbsp;&lt;A href="https://ai.azure.com/" target="_blank" rel="noopener"&gt;Verify your deployment&lt;/A&gt; before continuing to Part 2.&lt;/P&gt;
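&lt;P&gt;You can also smoke-test the deployment outside the portal by calling Foundry directly, before APIM is involved. The Python sketch below uses only the standard library, and build_messages_request and send are hypothetical helpers. Two details are assumptions you should confirm against the Foundry Claude documentation: that the endpoint accepts the key in an x-api-key header (the same header the Part 4 policy sets), and that it expects the standard anthropic-version: 2023-06-01 header.&lt;/P&gt;

```python
import json
import urllib.request


def build_messages_request(endpoint: str, key: str, model: str,
                           prompt: str = "Reply with the word ok.") -> urllib.request.Request:
    """Build (without sending) a minimal Anthropic Messages request.

    endpoint is the Anthropic base, e.g. https://{resource}.services.ai.azure.com/anthropic
    """
    body = {
        "model": model,  # must match a Foundry deployment name, e.g. claude-sonnet-4-6
        "max_tokens": 32,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "anthropic-version": "2023-06-01",  # assumption: standard Anthropic version header
            "x-api-key": key,                   # assumption: same header the Part 4 policy sets
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

&lt;P&gt;For example, send(build_messages_request("https://{resource}.services.ai.azure.com/anthropic", key, "claude-haiku-4-5")) should return a message object whose content list holds the model's reply. The model argument must match one of the deployment names from the table above.&lt;/P&gt;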
&lt;H2&gt;Part 2 — Entra ID app registration (developer-facing)&lt;/H2&gt;
&lt;P&gt;This registration lives in the Entra tenant that Subscription A belongs to. It defines the&amp;nbsp;&lt;STRONG&gt;audience&lt;/STRONG&gt;&amp;nbsp;that developers' tokens are issued for, which is also what APIM validates. It is unaffected by where Foundry lives.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;App registrations → New registration&lt;/STRONG&gt;&amp;nbsp;→ name it e.g.&amp;nbsp;Claude Code Gateway.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Expose an API&lt;/STRONG&gt;&amp;nbsp;→ set the Application ID URI, e.g.&amp;nbsp;api://claude-code-gateway. Add a scope&amp;nbsp;access_as_user&amp;nbsp;(admin + user consent).&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;(Optional, for tiering)&lt;/EM&gt;&amp;nbsp;&lt;STRONG&gt;App roles&lt;/STRONG&gt;&amp;nbsp;→ add roles such as&amp;nbsp;Claude.Standard&amp;nbsp;and&amp;nbsp;Claude.Premium. Assign developers or groups under&amp;nbsp;&lt;STRONG&gt;Enterprise applications → this app → Users and groups&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Note the&amp;nbsp;&lt;STRONG&gt;Application (client) ID&lt;/STRONG&gt;, the&amp;nbsp;&lt;STRONG&gt;Application ID URI&lt;/STRONG&gt;, and your&amp;nbsp;&lt;STRONG&gt;Tenant ID&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Developers request tokens for this app's audience; APIM validates aud = api://claude-code-gateway.&lt;/P&gt;
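&lt;P&gt;When a developer gets a 401 from the gateway, it helps to look at the claims APIM is checking. The Python sketch below uses the hypothetical helpers jwt_claims and gateway_checks. It decodes a token's payload without verifying the signature, so use it for debugging only. It then mirrors the audience, issuer, and app-role checks in the Part 4 policy. The tenant ID and audience you pass in are placeholders for your own values.&lt;/P&gt;

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature. Debugging only."""
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):]
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs omit
    return json.loads(base64.urlsafe_b64decode(payload))


def gateway_checks(claims: dict, audience: str, tenant_id: str) -> dict:
    """Mirror the audience, issuer, and app-role checks in the Part 4 policy."""
    accepted_issuers = {
        f"https://login.microsoftonline.com/{tenant_id}/v2.0",  # v2.0 tokens
        f"https://sts.windows.net/{tenant_id}/",                 # v1.0 tokens
    }
    roles = claims.get("roles", [])
    return {
        "audience_ok": claims.get("aud") == audience,
        "issuer_ok": claims.get("iss") in accepted_issuers,
        "role_ok": any(role in roles for role in ("Claude.Standard", "Claude.Premium")),
        "oid": claims.get("oid"),
    }
```

&lt;P&gt;For example, pass the output of the Part 5.1 helper script to jwt_claims, then call gateway_checks(claims, "api://claude-code-gateway", tenant_id). Any False entry identifies the check that is failing.&lt;/P&gt;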
&lt;H2&gt;Part 3 — Provision the APIM API and Foundry backend (Subscription A)&lt;/H2&gt;
&lt;H3&gt;3.1 Option A — Store the Foundry API key in APIM&lt;/H3&gt;
&lt;P&gt;First read the key from Foundry in Subscription B (use --subscription so you don't have to switch your active context):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Read a Foundry key from Subscription B
$FOUNDRY_KEY = az cognitiveservices account keys list `
  --name &amp;lt;foundry-resource&amp;gt; `
  --resource-group &amp;lt;foundry-rg&amp;gt; `
  --subscription &amp;lt;SUBSCRIPTION_B_ID&amp;gt; `
  --query key1 -o tsv&lt;/LI-CODE&gt;
&lt;P&gt;Then store it as a&amp;nbsp;&lt;STRONG&gt;secret named value&lt;/STRONG&gt; in APIM (Subscription A). The policy references it as&amp;nbsp;{{foundry-api-key}}:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Create a secret named value in APIM holding the Foundry key
az apim nv create -g &amp;lt;apim-rg&amp;gt; --service-name &amp;lt;apim-name&amp;gt; `
  --named-value-id foundry-api-key `
  --display-name foundry-api-key `
  --value "$FOUNDRY_KEY" `
  --secret true&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Hardening:&lt;/STRONG&gt;&amp;nbsp;instead of the raw key in APIM, put it in&amp;nbsp;&lt;STRONG&gt;Key Vault&lt;/STRONG&gt;&amp;nbsp;and create a&amp;nbsp;&lt;STRONG&gt;Key Vault-backed&lt;/STRONG&gt;&amp;nbsp;named value, so rotation lives in one place. APIM needs a managed identity with&amp;nbsp;&lt;EM&gt;Get/List&lt;/EM&gt; secret access on that vault — but the vault is in Subscription A alongside APIM, so this is still not a cross-subscription role assignment.&lt;/P&gt;
&lt;H3&gt;3.2 Option B — Give APIM a managed identity instead&lt;/H3&gt;
&lt;P&gt;If you'd rather not manage a shared key,&amp;nbsp;&lt;STRONG&gt;skip 3.1&lt;/STRONG&gt;&amp;nbsp;and give APIM an identity that Foundry trusts. This works in the&amp;nbsp;&lt;STRONG&gt;same subscription&lt;/STRONG&gt;&amp;nbsp;and&amp;nbsp;&lt;STRONG&gt;across subscriptions&lt;/STRONG&gt; alike, as long as both resources are in the same Entra tenant.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Enable a system-assigned managed identity on APIM (Subscription A)
az apim update -g &amp;lt;apim-rg&amp;gt; --name &amp;lt;apim-name&amp;gt; `
  --set identity.type=SystemAssigned

# Get the identity's principal (object) ID
$APIM_MI = az apim show -g &amp;lt;apim-rg&amp;gt; --name &amp;lt;apim-name&amp;gt; `
  --query identity.principalId -o tsv

# Get the Foundry resource ID (Subscription B)
$FOUNDRY_ID = az cognitiveservices account show `
  --name &amp;lt;foundry-resource&amp;gt; --resource-group &amp;lt;foundry-rg&amp;gt; `
  --subscription &amp;lt;SUBSCRIPTION_B_ID&amp;gt; `
  --query id -o tsv

# Grant Cognitive Services User on the Foundry resource (works cross-subscription)
az role assignment create `
  --assignee-object-id $APIM_MI `
  --assignee-principal-type ServicePrincipal `
  --role "Cognitive Services User" `
  --scope $FOUNDRY_ID&lt;/LI-CODE&gt;
&lt;P&gt;The&amp;nbsp;&lt;STRONG&gt;Cognitive Services User&lt;/STRONG&gt;&amp;nbsp;role (a97b65f3-24c7-4388-baec-2e87135dc908) grants data-plane access to call the model without key-management rights. Role assignments can take a few minutes to propagate. A&amp;nbsp;&lt;STRONG&gt;user-assigned&lt;/STRONG&gt;&amp;nbsp;identity works too — assign it to APIM and reference its client ID in the policy (Part 4, Option B). On this path there is&amp;nbsp;&lt;STRONG&gt;no&amp;nbsp;foundry-api-key&amp;nbsp;named value&lt;/STRONG&gt; to create or rotate.&lt;/P&gt;
&lt;H3&gt;3.3 Create the backend and API&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;# Named backend pointing at the Foundry Anthropic endpoint (Subscription B URL)
az apim backend create -g &amp;lt;apim-rg&amp;gt; --service-name &amp;lt;apim-name&amp;gt; `
  --backend-id foundry-claude `
  --url "https://&amp;lt;foundry-resource&amp;gt;.services.ai.azure.com/anthropic" `
  --protocol http

# API with NO path suffix so callers hit /v1/messages at the gateway root
az apim api create -g &amp;lt;apim-rg&amp;gt; --service-name &amp;lt;apim-name&amp;gt; `
  --api-id claude-anthropic --display-name "Claude (Foundry)" `
  --path="" --protocols https `
  --service-url "https://&amp;lt;foundry-resource&amp;gt;.services.ai.azure.com/anthropic"&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;PowerShell + empty strings:&lt;/STRONG&gt;&amp;nbsp;write&amp;nbsp;--path=""&amp;nbsp;(joined with&amp;nbsp;=),&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt; --path ""&amp;nbsp;as two tokens. PowerShell strips a bare&amp;nbsp;""&amp;nbsp;before the&amp;nbsp;az&amp;nbsp;wrapper sees it, so the CLI reports&amp;nbsp;argument --path: expected one argument. The&amp;nbsp;=&amp;nbsp;form keeps it a single token (--path=) that&amp;nbsp;az&amp;nbsp;reads as an empty string. The same trick applies to any empty-string value you pass to&amp;nbsp;az&amp;nbsp;from PowerShell.&lt;/P&gt;
&lt;P&gt;Add the operations Claude Code calls (a wildcard covers them all):&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;POST /v1/messages&lt;/LI&gt;
&lt;LI&gt;POST /v1/messages/count_tokens&lt;/LI&gt;
&lt;LI&gt;GET /v1/models&amp;nbsp;&lt;EM&gt;(only if you enable gateway model discovery — see Part 5.3)&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;EM&gt;az apim&amp;nbsp;can't apply XML policies. Apply the Part 4 policy via the portal (&lt;STRONG&gt;APIs → Claude (Foundry) → Inbound processing → policy editor&lt;/STRONG&gt;) or via Bicep/ARM.&lt;/EM&gt;&lt;/P&gt;
&lt;H2&gt;Part 4 — The APIM policy (auth + rate limiting + metering)&lt;/H2&gt;
&lt;P&gt;Apply this at the API level. The policy reads the tenant ID and gateway audience from the tenant-id and gateway-audience named values, so create those first. The policy below is the&amp;nbsp;&lt;STRONG&gt;key-based (Option A)&lt;/STRONG&gt;&amp;nbsp;version. Its step 6&amp;nbsp;&lt;STRONG&gt;removes the developer's Authorization header&lt;/STRONG&gt;&amp;nbsp;and sets the&amp;nbsp;&lt;STRONG&gt;x-api-key&lt;/STRONG&gt;&amp;nbsp;header from the foundry-api-key secret named value. For&amp;nbsp;&lt;STRONG&gt;managed identity (Option B)&lt;/STRONG&gt;, swap step 6 as shown immediately after the policy; every other step is identical.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;&amp;lt;policies&amp;gt;
  &amp;lt;inbound&amp;gt;
    &amp;lt;base /&amp;gt;
    &amp;lt;!-- The client sends its Entra token in x-api-key; copy it into Authorization when no Authorization header is present --&amp;gt;
    &amp;lt;set-header name="Authorization" exists-action="skip"&amp;gt;
      &amp;lt;value&amp;gt;@("Bearer " + context.Request.Headers.GetValueOrDefault("x-api-key",""))&amp;lt;/value&amp;gt;
    &amp;lt;/set-header&amp;gt;
    &amp;lt;!-- 1. Validate the developer's Entra ID token --&amp;gt;
    &amp;lt;validate-jwt header-name="Authorization"
                  failed-validation-httpcode="401"
                  failed-validation-error-message="Unauthorized: invalid or missing Entra token."&amp;gt;
      &amp;lt;openid-config url="https://login.microsoftonline.com/{{tenant-id}}/v2.0/.well-known/openid-configuration" /&amp;gt;
      &amp;lt;audiences&amp;gt;
        &amp;lt;audience&amp;gt;{{gateway-audience}}&amp;lt;/audience&amp;gt;
      &amp;lt;/audiences&amp;gt;
      &amp;lt;issuers&amp;gt;
        &amp;lt;issuer&amp;gt;https://login.microsoftonline.com/{{tenant-id}}/v2.0&amp;lt;/issuer&amp;gt;
        &amp;lt;!-- Also accept the v1.0 issuer (sts.windows.net). Needed because Claude Code's Foundry mode requests the scope https://cognitiveservices.azure.com/.default, and that audience cannot be changed to the APIM audience (api://...) --&amp;gt;
        &amp;lt;issuer&amp;gt;https://sts.windows.net/{{tenant-id}}/&amp;lt;/issuer&amp;gt;
      &amp;lt;/issuers&amp;gt;
      &amp;lt;required-claims&amp;gt;
        &amp;lt;claim name="roles" match="any"&amp;gt;
          &amp;lt;value&amp;gt;Claude.Standard&amp;lt;/value&amp;gt;
          &amp;lt;value&amp;gt;Claude.Premium&amp;lt;/value&amp;gt;
        &amp;lt;/claim&amp;gt;
      &amp;lt;/required-claims&amp;gt;
    &amp;lt;/validate-jwt&amp;gt;

    &amp;lt;!-- 2. Per-developer counter key from the stable object ID (oid) --&amp;gt;
    &amp;lt;set-variable name="callerId" value="@{
      var jwt = context.Request.Headers
        .GetValueOrDefault("Authorization","").Split(' ').Last().AsJwt();
      return jwt.Claims.GetValueOrDefault("oid", "unknown");
    }" /&amp;gt;
    &amp;lt;!-- 3. Tier from app role --&amp;gt;
    &amp;lt;set-variable name="tier" value="@{
      var jwt = context.Request.Headers
        .GetValueOrDefault("Authorization","").Split(' ').Last().AsJwt();
      return jwt.Claims.GetValueOrDefault("roles","").Contains("Claude.Premium") ? "premium" : "standard";
    }" /&amp;gt;
    &amp;lt;!-- Model name from the request body, used as a metric dimension in step 5 --&amp;gt;
    &amp;lt;set-variable name="modelName" value="@{
      var body = context.Request.Body.As&amp;lt;JObject&amp;gt;(preserveContent: true);
      return body?["model"]?.ToString() ?? "unknown";
    }" /&amp;gt;
    &amp;lt;!-- 4. Per-developer token and request-rate throttles, by tier (controls LLM cost) --&amp;gt;
    &amp;lt;choose&amp;gt;
      &amp;lt;when condition="@(((string)context.Variables["tier"]) == "premium")"&amp;gt;
        &amp;lt;llm-token-limit counter-key="@((string)context.Variables["callerId"])" tokens-per-minute="200000" estimate-prompt-tokens="true" remaining-tokens-header-name="x-tokens-remaining" token-quota="20000000" token-quota-period="Monthly" /&amp;gt;
        &amp;lt;rate-limit-by-key calls="300" renewal-period="60" counter-key="@((string)context.Variables["callerId"])" retry-after-header-name="retry-after" remaining-calls-header-name="x-ratelimit-remaining" /&amp;gt;
      &amp;lt;/when&amp;gt;
      &amp;lt;otherwise&amp;gt;
        &amp;lt;llm-token-limit counter-key="@((string)context.Variables["callerId"])" tokens-per-minute="50000" estimate-prompt-tokens="true" remaining-tokens-header-name="x-tokens-remaining" token-quota="5000000" token-quota-period="Monthly" /&amp;gt;
        &amp;lt;rate-limit-by-key calls="100" renewal-period="60" counter-key="@((string)context.Variables["callerId"])" retry-after-header-name="retry-after" remaining-calls-header-name="x-ratelimit-remaining" /&amp;gt;
      &amp;lt;/otherwise&amp;gt;
    &amp;lt;/choose&amp;gt;
    &amp;lt;!-- 5. Emit per-developer token usage metrics --&amp;gt;
    &amp;lt;llm-emit-token-metric namespace="claudecode"&amp;gt;
      &amp;lt;dimension name="UserId" value="@((string)context.Variables["callerId"])" /&amp;gt;
      &amp;lt;dimension name="Tier" value="@((string)context.Variables["tier"])" /&amp;gt;
      &amp;lt;dimension name="Model" value="@((string)context.Variables["modelName"])" /&amp;gt;
    &amp;lt;/llm-emit-token-metric&amp;gt;
    &amp;lt;!-- 6. Authenticate to Foundry with its API key (secret named value) --&amp;gt;
    &amp;lt;!-- Strip the developer's Entra token so Foundry never sees it --&amp;gt;
    &amp;lt;set-header name="Authorization" exists-action="delete" /&amp;gt;
    &amp;lt;set-header name="x-api-key" exists-action="override"&amp;gt;
      &amp;lt;value&amp;gt;{{foundry-api-key}}&amp;lt;/value&amp;gt;
    &amp;lt;/set-header&amp;gt;
    &amp;lt;set-backend-service backend-id="foundry-claude" /&amp;gt;
  &amp;lt;/inbound&amp;gt;
  &amp;lt;backend&amp;gt;
    &amp;lt;base /&amp;gt;
  &amp;lt;/backend&amp;gt;
  &amp;lt;outbound&amp;gt;
    &amp;lt;base /&amp;gt;
  &amp;lt;/outbound&amp;gt;
  &amp;lt;on-error&amp;gt;
    &amp;lt;base /&amp;gt;
  &amp;lt;/on-error&amp;gt;
&amp;lt;/policies&amp;gt;&lt;/LI-CODE&gt;
&lt;H3&gt;Option B — authenticate to Foundry with managed identity&lt;/H3&gt;
&lt;P&gt;If you chose the managed-identity path (3.2), replace&amp;nbsp;&lt;STRONG&gt;step 6&lt;/STRONG&gt; above with the block below. Instead of injecting an&amp;nbsp;api-key, APIM acquires an Entra token for its own identity and forwards it as the&amp;nbsp;Authorization&amp;nbsp;bearer token. Token validation, rate limits, and metering are unchanged.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;&amp;lt;!-- 6 (Option B). Authenticate to Foundry with APIM's managed identity --&amp;gt;
&amp;lt;!-- Replace the developer's token with an MI token scoped to AI Services --&amp;gt;
&amp;lt;authentication-managed-identity
    resource="https://cognitiveservices.azure.com"
    output-token-variable-name="msi-token" /&amp;gt;
&amp;lt;set-header name="Authorization" exists-action="override"&amp;gt;
  &amp;lt;value&amp;gt;@("Bearer " + (string)context.Variables["msi-token"])&amp;lt;/value&amp;gt;
&amp;lt;/set-header&amp;gt;
&amp;lt;!-- The client also sends its Entra token as x-api-key; drop it so Foundry never sees it --&amp;gt;
&amp;lt;set-header name="x-api-key" exists-action="delete" /&amp;gt;

&amp;lt;set-backend-service backend-id="foundry-claude" /&amp;gt;&lt;/LI-CODE&gt;
&lt;P&gt;The token audience for Azure AI Services / Foundry is&amp;nbsp;https://cognitiveservices.azure.com. For a&amp;nbsp;&lt;STRONG&gt;user-assigned&lt;/STRONG&gt; identity, add&amp;nbsp;client-id="&amp;lt;uami-client-id&amp;gt;"&amp;nbsp;to the&amp;nbsp;authentication-managed-identity&amp;nbsp;element. There's no&amp;nbsp;api-key&amp;nbsp;named value and no secret to rotate on this path — which is exactly why it's the preferred production posture.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Policy notes&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
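&lt;LI&gt;&lt;STRONG&gt;Stripping the developer's&amp;nbsp;Authorization&amp;nbsp;header&lt;/STRONG&gt;&amp;nbsp;before forwarding (step 6) matters, because that Entra token is meant for APIM only. With Option A, Foundry should receive only the Foundry key in the x-api-key header. With Option B, it should receive only APIM's managed-identity bearer token.&lt;/LI&gt;
&lt;LI&gt;{{tenant-id}},&amp;nbsp;{{gateway-audience}}, and&amp;nbsp;{{foundry-api-key}}&amp;nbsp;are APIM named values, which you can create with az apim nv create as in 3.1. Mark&amp;nbsp;foundry-api-key&amp;nbsp;as&amp;nbsp;&lt;STRONG&gt;secret&lt;/STRONG&gt;; the first two can be plain named values. The managed-identity path does not need foundry-api-key.&lt;/LI&gt;
&lt;LI&gt;The required-claims block rejects any token that lacks the Claude.Standard or Claude.Premium app role. As a result, the app roles from Part 2 are effectively mandatory with this policy. If you skip tiering, remove that block, and every caller then falls into the standard tier.&lt;/LI&gt;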
&lt;LI&gt;llm-token-limit&amp;nbsp;and&amp;nbsp;llm-emit-token-metric&amp;nbsp;are APIM's&amp;nbsp;&lt;STRONG&gt;GenAI gateway&lt;/STRONG&gt;&amp;nbsp;policies — they understand the Anthropic/OpenAI message formats and parse token usage, so you meter&amp;nbsp;&lt;EM&gt;tokens&lt;/EM&gt;, not just requests. That's the right cost lever for token-heavy Claude Code.&lt;/LI&gt;
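&lt;LI&gt;These counters are kept&amp;nbsp;&lt;STRONG&gt;per region, per gateway&lt;/STRONG&gt;. In a multi-region APIM deployment, each regional gateway enforces the limits independently, so a developer whose requests reach two regions is metered separately in each.&lt;/LI&gt;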
&lt;/UL&gt;
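&lt;P&gt;To make the Option A header contract concrete, here is a minimal Python sketch of the rewrite the gateway performs before forwarding. It drops the developer's credentials, keeps the Anthropic headers and attaches the Foundry key. This is a model, not APIM's implementation, which lives in the policy XML. The function name build_backend_headers is hypothetical. Dropping X-Api-Key as well is an assumption, based on Claude Code sending the same Entra token in that header.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Hypothetical model of the Option A header rewrite; APIM does this in policy XML.
CLIENT_CREDENTIAL_HEADERS = {"authorization", "x-api-key"}


def build_backend_headers(incoming, foundry_api_key):
    """Return the headers Foundry should receive for one forwarded request."""
    # Drop the developer's Entra token wherever the client sent it (case-insensitive).
    forwarded = {
        name: value
        for name, value in incoming.items()
        if name.lower() not in CLIENT_CREDENTIAL_HEADERS
    }
    # Foundry authenticates the gateway, not the developer.
    forwarded["api-key"] = foundry_api_key
    return forwarded


if __name__ == "__main__":
    client_headers = {
        "Authorization": "Bearer developer-entra-token",  # placeholder values
        "X-Api-Key": "developer-entra-token",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    for name, value in build_backend_headers(client_headers, "foundry-key").items():
        print(f"{name}: {value}")&lt;/LI-CODE&gt;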
&lt;H2&gt;Part 5 — Configure Claude Code, Claude Desktop and Cowork on developer machines&amp;nbsp;&lt;/H2&gt;
&lt;P&gt;Developers point Claude Code, Claude Desktop and Cowork at APIM (Anthropic Messages gateway mode) and authenticate with their own Entra token. The backend credential swap (API key or managed identity) is invisible to clients.&lt;/P&gt;
&lt;H3&gt;5.1 Entra token helper (per-developer, auto-refreshing)&lt;/H3&gt;
&lt;P&gt;Create %USERPROFILE%\.claude\get-claude-gateway-token.ps1:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Returns a short-lived Entra access token for the APIM gateway audience.
# Supports CLAUDE_HELPER_CONTEXT for Claude Desktop Cowork/Code silent refresh.

$context = $env:CLAUDE_HELPER_CONTEXT

# For non-interactive contexts, try silent token acquisition only.
# If it fails, exit non-zero so Cowork prompts the user instead of blocking.
if ($context -and $context -ne 'interactive' -and $context -ne 'setup-test') {
    try {
        $token = az account get-access-token `
            --resource "api://claude-code-gateway" `
            --query accessToken -o tsv 2&amp;gt;$null
        if ($LASTEXITCODE -ne 0 -or -not $token) {
            Write-Error "Silent token refresh failed (context: $context)"
            exit 1
        }
        Write-Output $token
        exit 0
    } catch {
        Write-Error "Silent token refresh failed: $_"
        exit 1
    }
}

# Interactive or setup-test context: run az CLI without suppressing errors so the user sees what to fix.
$token = az account get-access-token `
    --resource "api://claude-code-gateway" `
    --query accessToken -o tsv

if ($LASTEXITCODE -ne 0 -or -not $token) {
    Write-Error "Token acquisition failed. Run 'az login' first."
    exit 1
}

Write-Output $token
exit 0
&lt;/LI-CODE&gt;
&lt;P&gt;PowerShell scripts need no chmod. If execution policy blocks the helper, allow local scripts for your user once:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned&lt;/LI-CODE&gt;
&lt;H3&gt;5.2 Claude Code settings (%USERPROFILE%\.claude\settings.json)&lt;/H3&gt;
&lt;P&gt;Setting the environment variables below in settings.json under the .claude folder applies them to every Claude Code session (VS Code, terminal CLI, JetBrains, and so on).&lt;/P&gt;
&lt;LI-CODE lang=""&gt;{
  "env": {
       "ANTHROPIC_BASE_URL": "https://&amp;lt;apim-name&amp;gt;azure-api.net",
       "ANTHROPIC_MODEL": "claude-opus-4-8",
       "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-8",
       "CLAUDE_CODE_API_KEY_HELPER_TTL_MS": "600000"
 },
  "apiKeyHelper": "powershell -NoProfile -ExecutionPolicy Bypass -File C:\\Users\\&amp;lt;you&amp;gt;\\.claude\\get-claude-gateway-token.ps1"
}&lt;/LI-CODE&gt;
&lt;P&gt;In JSON, backslashes must be doubled — hence&amp;nbsp;C:\\Users\\.... Use&amp;nbsp;pwsh&amp;nbsp;in place of&amp;nbsp;powershell&amp;nbsp;if you run PowerShell 7.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;apiKeyHelper&amp;nbsp;output is sent as the&amp;nbsp;Authorization&amp;nbsp;(and&amp;nbsp;X-Api-Key) header, validated by APIM's&amp;nbsp;validate-jwt. The developer never holds the Foundry key.&lt;/LI&gt;
&lt;LI&gt;CLAUDE_CODE_API_KEY_HELPER_TTL_MS=600000&amp;nbsp;re-runs the helper every 10 minutes, comfortably inside the ~60–90 minute lifetime of Entra access tokens. Between refreshes, az CLI serves the token from its cache.&lt;/LI&gt;
&lt;LI&gt;Pinning the&amp;nbsp;ANTHROPIC_DEFAULT_*_MODEL&amp;nbsp;IDs ensures Claude Code sends model names that match your Foundry deployment names, so the gateway passes&amp;nbsp;model&amp;nbsp;through untouched. The example pins only Opus. If you also deploy Sonnet or Haiku, set ANTHROPIC_DEFAULT_SONNET_MODEL and ANTHROPIC_DEFAULT_HAIKU_MODEL to those deployment names.&lt;/LI&gt;
&lt;LI&gt;ANTHROPIC_MODEL sets the default model for new sessions.&lt;/LI&gt;
&lt;/UL&gt;
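&lt;P&gt;If you generate settings.json from a script rather than by hand, a JSON serializer doubles the backslashes for you. Below is a minimal Python sketch. The APIM name, the Windows user name and the helper name build_settings are placeholders, and build_settings is not part of Claude Code.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import json

# Placeholders: substitute your APIM name, Windows user name, and Foundry deployment name.
APIM_BASE_URL = "https://your-apim-name.azure-api.net"
HELPER_SCRIPT = r"C:\Users\you\.claude\get-claude-gateway-token.ps1"


def build_settings(base_url, model, helper_script, ttl_ms=600000):
    """Return a dict mirroring the settings.json shown in 5.2 (hypothetical helper)."""
    return {
        "env": {
            "ANTHROPIC_BASE_URL": base_url,
            "ANTHROPIC_MODEL": model,
            "ANTHROPIC_DEFAULT_OPUS_MODEL": model,
            # Kept as a string, matching the example settings file.
            "CLAUDE_CODE_API_KEY_HELPER_TTL_MS": str(ttl_ms),
        },
        "apiKeyHelper": "powershell -NoProfile -ExecutionPolicy Bypass -File " + helper_script,
    }


def to_settings_json(settings):
    # json.dumps escapes each backslash in the Windows path automatically.
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    print(to_settings_json(build_settings(APIM_BASE_URL, "claude-opus-4-8", HELPER_SCRIPT)))&lt;/LI-CODE&gt;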
&lt;P&gt;Then developers run claude from their project folder.&lt;/P&gt;
&lt;H3&gt;5.3 Optional — model discovery&lt;/H3&gt;
&lt;P&gt;To list gateway models in the /model picker, expose GET /v1/models on the API and set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 (Claude Code v2.1.129+). Only IDs starting with claude or anthropic appear.&lt;/P&gt;
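&lt;P&gt;The prefix rule determines which of your deployments the picker shows. Below is a minimal Python sketch of that filter. It assumes the gateway returns a body shaped like Anthropic's model list ({"data": [{"id": ...}]}) and that the match is a simple, case-sensitive prefix check. Both are assumptions, and the sample IDs are illustrative.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import json

# Hypothetical /v1/models body, shaped like Anthropic's model list;
# your gateway controls what it actually returns.
SAMPLE_BODY = """
{"data": [
  {"type": "model", "id": "claude-opus-4-8"},
  {"type": "model", "id": "claude-sonnet-4-6"},
  {"type": "model", "id": "gpt-4o"}
]}
"""

# Only IDs with these prefixes show up in the /model picker.
PICKER_PREFIXES = ("claude", "anthropic")


def pickable_model_ids(body):
    """Return the model IDs from a models response that the picker would list."""
    models = json.loads(body).get("data", [])
    return [m["id"] for m in models if m.get("id", "").startswith(PICKER_PREFIXES)]


print(pickable_model_ids(SAMPLE_BODY))  # ['claude-opus-4-8', 'claude-sonnet-4-6']&lt;/LI-CODE&gt;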
&lt;H3&gt;5.4 VS Code extension&lt;/H3&gt;
&lt;P&gt;The same settings.json in the .claude folder configures both the VS Code extension and the Claude Code CLI.&lt;/P&gt;
&lt;P&gt;🚀 &lt;STRONG&gt;Ready to test?&lt;/STRONG&gt; Jump to Part 7 to validate your setup, or run&amp;nbsp;claude&amp;nbsp;from your project folder to try it live.&lt;/P&gt;
&lt;H3&gt;5.5 Claude Desktop and Cowork&lt;/H3&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://claude.com/docs/cowork/3p/installation" target="_blank" rel="noopener"&gt;Install the Claude Desktop and Cowork application&lt;/A&gt; on your desktop. Enable the Developer mode in Help -&amp;gt; Troubleshooting -&amp;gt; Enable Developer Mode to reveal the Developer menu. Then go to Developer -&amp;gt; &lt;A class="lia-external-url" href="https://claude.com/docs/cowork/3p/gateway" target="_blank" rel="noopener"&gt;Configure third-party inference&lt;/A&gt; to open the configuration.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In the connection section, set Inference Provider to Gateway and point it at the APIM endpoint. Optionally, list the models the gateway exposes. For authentication, set the credential helper to the PowerShell script from step 5.1 so the app signs in with Entra ID.&lt;/P&gt;
&lt;P&gt;Once administrators have validated the configuration on their own machine, they can distribute it by following &lt;A class="lia-external-url" href="https://claude.com/docs/cowork/3p/installation" target="_blank" rel="noopener"&gt;Export and deploy via MDM&lt;/A&gt;.&lt;/P&gt;
&lt;H2&gt;Part 6 — Rate-limiting and usage-tracking design&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Per-developer keying.&lt;/STRONG&gt;&amp;nbsp;Everything is keyed on the Entra&amp;nbsp;oid&amp;nbsp;claim — stable and unique per user, unlike email or&amp;nbsp;upn&amp;nbsp;which can change. For service accounts or CI, key on&amp;nbsp;appid&amp;nbsp;instead.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Two enforcement layers:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;llm-token-limit&amp;nbsp;— tokens/min plus a monthly token quota. The real cost control.&lt;/LI&gt;
&lt;LI&gt;rate-limit-by-key&amp;nbsp;— requests/min. Guards against runaway loops.&lt;/LI&gt;
&lt;/UL&gt;
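&lt;P&gt;To reason about how the two layers interact, here is a toy fixed-window model in Python, keyed on oid as described above. It rests on simplifying assumptions: one-minute windows, token counts known up front, and a single region. It is not a description of how APIM's llm-token-limit or rate-limit-by-key count internally.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import time
from collections import defaultdict


class PerUserLimiter:
    """Toy fixed-window limiter keyed on the Entra oid claim.

    Models the two layers described above: a tokens-per-minute cap and a
    requests-per-minute cap. APIM's real counters and windows may differ.
    """

    def __init__(self, tokens_per_minute, requests_per_minute, clock=time.time):
        self.tpm = tokens_per_minute
        self.rpm = requests_per_minute
        self.clock = clock
        # (oid, minute index) -> [tokens used, requests made]
        self.usage = defaultdict(lambda: [0, 0])

    def try_consume(self, oid, tokens):
        """Record one request costing tokens; return (allowed, retry_after_seconds)."""
        now = self.clock()
        used = self.usage[(oid, int(now // 60))]
        tokens_over = max(0, used[0] + tokens - self.tpm)
        requests_over = max(0, used[1] + 1 - self.rpm)
        if tokens_over or requests_over:
            # A gateway would answer 429 with retry-after until the window rolls over.
            return False, 60 - int(now % 60)
        used[0] += tokens
        used[1] += 1
        return True, 0&lt;/LI-CODE&gt;
&lt;P&gt;Because the clock is injectable, the behavior is easy to check. With a 100 tokens/min cap, a second 60-token request from the same oid in the same minute is rejected, while the same request from a different oid succeeds. Old windows are never evicted in this toy; a real implementation would expire them.&lt;/P&gt;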
&lt;P&gt;&lt;STRONG&gt;Tiering&lt;/STRONG&gt;&amp;nbsp;is driven by Entra app roles (Claude.Standard&amp;nbsp;/&amp;nbsp;Claude.Premium) read from the JWT — no separate APIM subscription management needed.&lt;/P&gt;
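&lt;P&gt;The tier decision itself is a small lookup on the JWT roles claim. Below is a minimal Python sketch that assumes the two app role values above. The limit numbers are placeholders. Denying callers who hold neither role, rather than defaulting them to Standard, is a design assumption you may choose differently.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Placeholder limits per app role; tune these for your organization.
TIER_LIMITS = {
    "Claude.Premium": {"tokens_per_minute": 200000, "monthly_tokens": 50000000},
    "Claude.Standard": {"tokens_per_minute": 50000, "monthly_tokens": 10000000},
}
# Checked in order, so a user holding both roles gets Premium.
TIER_ORDER = ("Claude.Premium", "Claude.Standard")


def resolve_tier(claims):
    """Map the JWT roles claim to (tier, limits); (None, None) means deny."""
    roles = set(claims.get("roles", []))
    for tier in TIER_ORDER:
        if tier in roles:
            return tier, TIER_LIMITS[tier]
    return None, None&lt;/LI-CODE&gt;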
&lt;P&gt;&lt;STRONG&gt;Usage tracking&lt;/STRONG&gt; flows from&amp;nbsp;llm-emit-token-metric&amp;nbsp;into Application Insights with&amp;nbsp;UserId,&amp;nbsp;Tier, and&amp;nbsp;Model&amp;nbsp;dimensions. Example Log Analytics query for per-user monthly token spend:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;customMetrics
| where name == "Total Tokens" and customDimensions.namespace == "claudecode"
| extend UserId = tostring(customDimensions.UserId), Model = tostring(customDimensions.Model)
| summarize Tokens = sum(valueSum) by UserId, Model, bin(timestamp, 1d)
| order by Tokens desc&lt;/LI-CODE&gt;
&lt;P&gt;Foundry doesn't return Anthropic's standard rate-limit headers. Manage and observe limits through APIM, using the x-tokens-remaining and x-ratelimit-remaining headers it returns, and through Azure Monitor, rather than relying on upstream headers.&lt;/P&gt;
&lt;H2&gt;Part 7 — Test and validate&lt;/H2&gt;
&lt;LI-CODE lang=""&gt;# 1. Get a token as a developer
$TOKEN = az account get-access-token --resource "api://claude-code-gateway" --query accessToken -o tsv

# 2. Call the gateway directly in Anthropic Messages format
$body = @{
  model      = "claude-sonnet-4-6"
  max_tokens = 64
  messages   = @(@{ role = "user"; content = "Say hello in one word." })
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Method Post `
  -Uri "https://&amp;lt;apim-name&amp;gt;.azure-api.net/v1/messages" `
  -ContentType "application/json" `
  -Headers @{
    "Authorization"     = "Bearer $TOKEN"
    "anthropic-version" = "2023-06-01"
  } `
  -Body $body&lt;/LI-CODE&gt;
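&lt;P&gt;The same smoke test can be built with the Python standard library, which is handy on build agents without PowerShell. Below is a minimal sketch that assumes a placeholder APIM name. It only constructs the request; the commented lines show how you might send it. Note that urlopen raises urllib.error.HTTPError for 401 and 429 responses, and that exception also exposes the response headers.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import json
import urllib.request

GATEWAY = "https://your-apim-name.azure-api.net"  # placeholder APIM name


def build_messages_request(base_url, token, model="claude-sonnet-4-6", prompt="Say hello in one word."):
    """Build (but do not send) an Anthropic Messages call to the gateway."""
    body = {
        "model": model,
        "max_tokens": 64,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


# Sending it and reading the limit headers:
# with urllib.request.urlopen(build_messages_request(GATEWAY, token)) as resp:
#     print(resp.status, resp.headers.get("x-tokens-remaining"))
#     print(json.loads(resp.read()))&lt;/LI-CODE&gt;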
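&lt;P&gt;Invoke-RestMethod returns the parsed body but hides response headers. To see x-tokens-remaining / x-ratelimit-remaining, you have three options:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use Invoke-WebRequest, whose result exposes a Headers property in both PowerShell versions.&lt;/LI&gt;
&lt;LI&gt;In PowerShell 7, add -ResponseHeadersVariable resp to Invoke-RestMethod, then read $resp.&lt;/LI&gt;
&lt;LI&gt;Call curl.exe -i (the real curl, not PowerShell's curl alias).&lt;/LI&gt;
&lt;/UL&gt;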
&lt;P&gt;&lt;STRONG&gt;Validation checklist&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No token / expired token →&amp;nbsp;&lt;STRONG&gt;401&lt;/STRONG&gt;&amp;nbsp;from&amp;nbsp;validate-jwt&amp;nbsp;(confirm before trusting rate limits).&lt;/LI&gt;
&lt;LI&gt;Valid token →&amp;nbsp;&lt;STRONG&gt;200&lt;/STRONG&gt;&amp;nbsp;with a Claude completion; response carries&amp;nbsp;x-tokens-remaining&amp;nbsp;/&amp;nbsp;x-ratelimit-remaining.&lt;/LI&gt;
&lt;LI&gt;401 from Foundry on a valid developer token → the&amp;nbsp;api-key&amp;nbsp;named value is wrong or not injected (see&amp;nbsp;&lt;EM&gt;Troubleshooting&lt;/EM&gt;).&lt;/LI&gt;
&lt;LI&gt;Exceed the limit →&amp;nbsp;&lt;STRONG&gt;429&lt;/STRONG&gt;&amp;nbsp;with&amp;nbsp;retry-after.&lt;/LI&gt;
&lt;LI&gt;App Insights →&amp;nbsp;customMetrics&amp;nbsp;shows token counts dimensioned by&amp;nbsp;UserId.&lt;/LI&gt;
&lt;LI&gt;Then run claude end to end from a project folder.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Part 8 — Operations and hardening&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Key rotation (Option A).&lt;/STRONG&gt;&amp;nbsp;Foundry gives you two keys. Rotate by updating the&amp;nbsp;foundry-api-key&amp;nbsp;named value to&amp;nbsp;key2, then regenerating&amp;nbsp;key1&amp;nbsp;— zero downtime. A Key Vault-backed named value makes this a one-place change.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Prefer managed identity in production (Option B).&lt;/STRONG&gt;&amp;nbsp;If you started on the key path, switch to managed identity (Parts 3.2 and 4, Option B) to remove the shared secret entirely. Because the&amp;nbsp;&lt;STRONG&gt;Cognitive Services User&lt;/STRONG&gt;&amp;nbsp;role assignment works across subscriptions in the same tenant, the cross-subscription topology doesn't block this upgrade — and developers see no change, since their side of the contract is always&amp;nbsp;&lt;EM&gt;authenticate to the gateway as yourself&lt;/EM&gt;.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Private networking.&lt;/STRONG&gt;&amp;nbsp;Put APIM in internal VNet mode and reach Foundry over a&amp;nbsp;&lt;STRONG&gt;Private Endpoint&lt;/STRONG&gt;; disable Foundry public network access so the gateway is the only path in. Cross-subscription private endpoints are supported.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Resilience.&lt;/STRONG&gt;&amp;nbsp;Deploy Claude in two regions and use APIM's load-balanced backend pool with retry on 429 and 5xx.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Cost guardrails.&lt;/STRONG&gt;&amp;nbsp;Pair per-user&amp;nbsp;llm-token-limit&amp;nbsp;quotas with an&amp;nbsp;&lt;STRONG&gt;Azure budget&lt;/STRONG&gt; and cost alerts on Subscription B, where the Foundry resource lives.&lt;/LI&gt;
&lt;/UL&gt;
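&lt;P&gt;Below is a toy Python model of the resilience pattern: rotate across regional backends and retry only on 429 or 5xx. It is not APIM's backend-pool algorithm, which can also apply weights, priorities and circuit breakers, and it omits backoff delays. Here send is a caller-supplied stand-in for forwarding one request to one region.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import itertools

# Status codes worth retrying on another backend: 429 and any 5xx.
RETRYABLE = {429} | set(range(500, 600))


def send_with_failover(backends, send, max_attempts=4):
    """Rotate over regional backends, retrying only on 429/5xx.

    send(backend) is a caller-supplied stand-in for forwarding one request to
    one Foundry region; it returns the HTTP status code. Returns the attempt log.
    """
    attempts = []
    for backend in itertools.islice(itertools.cycle(backends), max_attempts):
        status = send(backend)
        attempts.append((backend, status))
        if status not in RETRYABLE:
            break  # success, or a client error that retrying will not fix
    return attempts&lt;/LI-CODE&gt;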
&lt;H2&gt;Troubleshooting&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Symptom&lt;/th&gt;&lt;th&gt;Cause / fix&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;404 resource not found&lt;/STRONG&gt;&amp;nbsp;from Foundry&lt;/td&gt;&lt;td&gt;Backend URL or path wrong, or an OpenAI-style rewrite applied. Backend must end in&amp;nbsp;/anthropic; callers hit&amp;nbsp;/v1/messages. Remove any&amp;nbsp;/openai/...&amp;nbsp;rewrite and&amp;nbsp;api-version&amp;nbsp;query param.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;401 from Foundry&lt;/STRONG&gt;&amp;nbsp;(developer token is valid) —&amp;nbsp;&lt;EM&gt;Option A&lt;/EM&gt;&lt;/td&gt;&lt;td&gt;The&amp;nbsp;api-key&amp;nbsp;header is missing/wrong, or the&amp;nbsp;foundry-api-key&amp;nbsp;named value wasn't saved as expected. Confirm the named value, and that the policy deletes the developer&amp;nbsp;Authorization&amp;nbsp;header and sets&amp;nbsp;api-key.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;401 / 403 from Foundry&lt;/STRONG&gt;&amp;nbsp;—&amp;nbsp;&lt;EM&gt;Option B (managed identity)&lt;/EM&gt;&lt;/td&gt;&lt;td&gt;The role assignment is missing or hasn't propagated yet, or the token audience is wrong. Confirm APIM's identity has&amp;nbsp;&lt;STRONG&gt;Cognitive Services User&lt;/STRONG&gt;&amp;nbsp;on the Foundry resource, wait a few minutes, and ensure the policy requests&amp;nbsp;resource="https://cognitiveservices.azure.com". For a user-assigned identity, confirm the&amp;nbsp;client-id&amp;nbsp;is set.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Managed identity works same-sub but not cross-sub&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;The two resources are in different Entra&amp;nbsp;&lt;STRONG&gt;tenants&lt;/STRONG&gt;. 
Cross-tenant managed identity isn't supported — use the API key (Option A) instead.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;401 at the gateway&lt;/STRONG&gt;&amp;nbsp;even with a token&lt;/td&gt;&lt;td&gt;aud&amp;nbsp;or issuer mismatch. Confirm the token's&amp;nbsp;aud = api://claude-code-gateway&amp;nbsp;and you used the v2.0 OIDC config and issuer.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;403 from Foundry&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;The key belongs to a different Foundry resource, or the resource disabled key auth. Re-copy a key from&amp;nbsp;&lt;EM&gt;Keys and Endpoint&lt;/EM&gt;, or re-enable local/key auth.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Reduced Claude Code functionality&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Gateway stripped&amp;nbsp;anthropic-beta&amp;nbsp;/&amp;nbsp;anthropic-version. Ensure both headers pass through.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Model not available&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Claude Code's model ID doesn't match the Foundry deployment name. Align names, or rewrite the body&amp;nbsp;model&amp;nbsp;field in policy.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;ChainedTokenCredential authentication failed&lt;/STRONG&gt;&amp;nbsp;(client side)&lt;/td&gt;&lt;td&gt;Developer not logged in. Run az login so the helper has a usable Azure credential.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
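&lt;P&gt;For the gateway 401 row, it helps to look at the claims in the token the helper actually returned. The minimal Python sketch below decodes the payload without verifying the signature; it is for diagnostics only, and validate-jwt remains the real check. It compares aud, iss and exp against the values your policy expects. Which aud and iss values appear depends on the access-token version configured on the app registration, so the expected values are parameters rather than constants.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import base64
import json
import time


def decode_jwt_claims(token):
    """Decode a JWT payload WITHOUT checking its signature (diagnostics only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def gateway_token_problems(token, expected_aud, expected_iss, now=None):
    """List aud/iss/exp mismatches that would make validate-jwt reject the token."""
    claims = decode_jwt_claims(token)
    problems = []
    if claims.get("aud") != expected_aud:
        problems.append(f"aud is {claims.get('aud')!r}, expected {expected_aud!r}")
    if claims.get("iss") != expected_iss:
        problems.append(f"iss is {claims.get('iss')!r}, expected {expected_iss!r}")
    current = time.time() if now is None else now
    if not max(0, claims.get("exp", 0) - current):
        problems.append("token has expired")
    return problems&lt;/LI-CODE&gt;
&lt;P&gt;For example, pass the helper's output together with the audience and issuer configured in your validate-jwt policy. An empty list means those claims match, which points the investigation back at the policy or the OIDC configuration URL.&lt;/P&gt;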
&lt;H2&gt;Wrapping up&lt;/H2&gt;
&lt;P&gt;With about an afternoon of setup you get a gateway that every Claude Code request flows through:&amp;nbsp;&lt;STRONG&gt;Entra ID&lt;/STRONG&gt;&amp;nbsp;proves who the developer is,&amp;nbsp;&lt;STRONG&gt;APIM GenAI policies&lt;/STRONG&gt;&amp;nbsp;cap how much each person can spend, and&amp;nbsp;&lt;STRONG&gt;Application Insights&lt;/STRONG&gt;&amp;nbsp;tells you exactly where the tokens went. For the APIM → Foundry hop you pick what fits: a&amp;nbsp;&lt;STRONG&gt;Foundry API key&lt;/STRONG&gt;&amp;nbsp;held only inside APIM (fastest start, works cross-tenant) or a&amp;nbsp;&lt;STRONG&gt;managed identity&lt;/STRONG&gt;&amp;nbsp;with no shared secret at all (the production posture). Either way Claude can live in its own subscription, and developers hold nothing more sensitive than a short-lived Entra token.&lt;/P&gt;
&lt;P&gt;When you're ready to tighten the screws, the upgrade path is clean: if you started on the key, move it into Key Vault, then graduate to&amp;nbsp;&lt;STRONG&gt;managed identity&lt;/STRONG&gt; to eliminate the secret entirely, and put the whole path on a private network.&amp;nbsp;&lt;/P&gt;
&lt;P data-olk-copy-source="MessageBody"&gt;'None of those steps disrupt developers, because their side of the contract — authenticate to the gateway as yourself — never changes.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Start your pilot today:&lt;/STRONG&gt;&amp;nbsp;Deploy a Developer-tier APIM instance, connect it to Foundry, and have your first developer running Claude Code through the gateway by end of day. The&amp;nbsp;&lt;A href="https://outlook.office.com/mail/inbox/id/AAkALgAAAAAAHYQDEapmEc2byACqAC%2FEWg0AP3d1VUAgUUuVO2mJolMPogALOul9OgAA?nativeVersion=1.2026.601.200#x_prerequisites" target="_blank" rel="noopener" data-linkindex="10" data-ogsc=""&gt;Prerequisites&lt;/A&gt; section has everything you need to begin.'&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;All command-line steps target Windows with PowerShell 5.1 or 7. Model IDs and Foundry regions reflect availability at time of writing; check the Foundry model catalog for current options.&lt;/EM&gt;&lt;/P&gt;
&lt;H2 data-olk-copy-source="MessageBody"&gt;Share Your Experience&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-olk-copy-source="MessageBody"&gt;Share your setup and implementation approach based on this pattern in the comments.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 data-olk-copy-source="MessageBody"&gt;Next Steps&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Get started now:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://ai.azure.com/explore/models" target="_blank" rel="noopener"&gt;Deploy Claude models in Microsoft Foundry&lt;/A&gt;: browse the model catalog and create your first deployment.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://portal.azure.com/#create/Microsoft.ApiManagement" target="_blank" rel="noopener"&gt;Create an API Management instance&lt;/A&gt;: spin up a Developer SKU for your pilot.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Go deeper:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://docs.claude.com/en/docs/claude-code/llm-gateway" target="_blank" rel="noopener"&gt;Claude Code LLM gateway requirements&lt;/A&gt;: full specification for gateway compatibility.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/api-management/llm-token-limit-policy" target="_blank" rel="noopener"&gt;APIM GenAI gateway policies reference&lt;/A&gt;: all available token and rate limiting options.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Get help:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Questions? Post in the&amp;nbsp;&lt;A href="https://techcommunity.microsoft.com/t5/azure-ai/bd-p/AzureAI" target="_blank" rel="noopener"&gt;Azure AI Community&lt;/A&gt;&amp;nbsp;with tag #ClaudeCode.&lt;/LI&gt;
&lt;LI&gt;Found an issue with this guide?&amp;nbsp;&lt;A href="https://github.com/Azure/azure-api-management-samples" target="_blank" rel="noopener"&gt;Open a GitHub issue&lt;/A&gt;.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 10 Jun 2026 20:02:12 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/connecting-claude-clients-with-azure-api-management-and-claude/ba-p/4525212</guid>
      <dc:creator>MuraliKumanduri</dc:creator>
      <dc:date>2026-06-10T20:02:12Z</dc:date>
    </item>
    <item>
      <title>Introducing the New Browser Automation Tool with Toolboxes in Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/introducing-the-new-browser-automation-tool-with-toolboxes-in/ba-p/4522790</link>
      <description>&lt;P&gt;&lt;SPAN data-olk-copy-source="MessageBody"&gt;&lt;SPAN data-teams="true"&gt;Your AI agent just analyzed 500 invoices—but it can't click the 'Submit' button to process them. AI agents can reason through complex problems, but they hit a wall when workflows require interacting with web interfaces&lt;/SPAN&gt; The Browser Automation Tool bridges that gap, giving your agents the ability to interact with any website the same way a human would. Here's what's new in our latest release with Toolboxes support in hosted agents.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Browser Automation Tool is available as an MCP (Model Context Protocol) tool that uses Playwright workspaces as its headless browser infrastructure layer. This gives teams a faster path from idea to automation, with better visibility into what is happening in real time and more control when workflows hit complex edge cases. If you’re working on web automation, this release is for you: it lets you automate your workflows with any browser framework or reasoning SDK that supports CDP (Chrome DevTools Protocol).&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Why this release matters&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Today’s AI agents are powerful at reasoning but fall short when workflows move beyond APIs.&lt;/P&gt;
&lt;P&gt;Enterprise reality is:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Critical systems live behind web interfaces, not APIs&lt;/LI&gt;
&lt;LI&gt;Workflows require navigation, decision-making, and multi-step interaction&lt;/LI&gt;
&lt;LI&gt;Automation breaks at the last mile of execution&lt;/LI&gt;
&lt;LI&gt;Private and internal websites are difficult to validate safely at scale.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The revised Browser Automation Tool with Toolboxes in Foundry is designed to close those gaps with a practical, operator-friendly workflow.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What is new&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;MCP-native Browser Automation Tool in &lt;/STRONG&gt;&lt;STRONG&gt;Toolboxes&lt;/STRONG&gt;: &lt;SPAN data-olk-copy-source="MessageBody"&gt;Browser Automation Tool (Public Preview) is now available as an MCP tool in Toolboxes, working seamlessly with hosted agents in Foundry Agent Service. This makes it easier to adopt browser automation in your existing agentic workflows. Teams get standardized browser tasks with built-in Microsoft Entra ID authentication&lt;/SPAN&gt; (formerly Azure AD).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Playwright workspaces as the infrastructure layer&lt;/STRONG&gt;: The underlying execution model is powered by Playwright workspaces (generally available) which brings robust browser automation foundations and reliable runtime behavior for modern web applications.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Live View for issue detection&lt;/STRONG&gt;: Live View (Public preview) helps customers detect issues as they happen. You can observe browser actions and quickly identify selector drift&amp;nbsp;&lt;SPAN data-olk-copy-source="MessageBody"&gt;(when UI elements change and automation can no longer find them)&lt;/SPAN&gt;, timing issues, navigation failures, and unexpected UI states. This speeds up debugging.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Take Control for edge cases:&lt;/STRONG&gt; When automation reaches a non-deterministic path (&lt;SPAN data-olk-copy-source="MessageBody"&gt;unpredictable user flows like CAPTCHAs or dynamic content)&lt;/SPAN&gt;, customers can use Take Control (Public preview) to intervene and steer execution through edge-case conditions. This blends automation scale with human judgment when it is needed most.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Support for private website browsing [&lt;/STRONG&gt;&lt;STRONG&gt;Private Preview&lt;/STRONG&gt;&lt;STRONG&gt;]: &lt;/STRONG&gt;Customers can browse and automate private websites in hosted environments, enabling more realistic enterprise workflows such as internal portals, authenticated research paths, and secured form flows. &lt;A href="https://forms.office.com/pages/responsepage.aspx?id=v4j5cvGGr0GRqy180BHbR2JAEqXo33NGjKiFLvs8i7pUQTJRMFdDOTZTVTA3RlZFVEhVRTAwNU80SiQlQCN0PWcu&amp;amp;route=shorturl" target="_blank" rel="noopener"&gt;Playwright Workspaces Private Website Feature - Private Preview Enrollment&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Built-in Observability &amp;amp; Optimizations&lt;/STRONG&gt;: Logs and evaluations are integrated into observability in Foundry Control Plane, with full traceability for debugging and reliability.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Choose your orchestration layer&lt;/STRONG&gt;: When writing hosted agents, customers can choose from a wide variety of open-source reasoning layers to fit their use case.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;Get started:&lt;/STRONG&gt; Here are the samples: &lt;A class="lia-external-url" href="https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/bring-your-own/responses/browser-automation" target="_blank" rel="noopener"&gt;Sample 1 (BYO)&lt;/A&gt; and &lt;A class="lia-external-url" href="https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/agent-framework/responses/14-browser-automation-agent" target="_blank" rel="noopener"&gt;Sample 2 (Microsoft Agent Framework)&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Sample scenarios include&amp;nbsp;&lt;STRONG&gt;form filling&lt;/STRONG&gt; (automating repetitive, high-volume form interactions while maintaining traceability and visibility during execution) and&amp;nbsp;&lt;STRONG&gt;web research&lt;/STRONG&gt; (scaling browser-based research flows across multiple sources, while retaining the option to step in manually when pages or data structures vary).&lt;/P&gt;
&lt;P&gt;How teams can use Browser Automation Tool in practice:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Define the desired browser workflow in your Hosted Agent setup.&lt;/LI&gt;
&lt;LI&gt;Run Browser Automation Tool execution with Live View enabled.&lt;/LI&gt;
&lt;LI&gt;Detect and resolve issues quickly as they appear.&lt;/LI&gt;
&lt;LI&gt;Use Take Control on complex or unexpected branches.&lt;/LI&gt;
&lt;LI&gt;Continue automation and capture outcomes for iteration.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;What this unlocks&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;With &lt;STRONG&gt;hosted agents with Browser Automation Tool&lt;/STRONG&gt;, you can now build:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;✅ &lt;STRONG&gt;End-to-end agents&lt;/STRONG&gt; that complete real workflows with reliable automation&lt;/LI&gt;
&lt;LI&gt;✅ &lt;STRONG&gt;Enterprise-grade systems&lt;/STRONG&gt; with identity, compliance, and observability&lt;/LI&gt;
&lt;LI&gt;✅ &lt;STRONG&gt;Human-in-the-loop experiences&lt;/STRONG&gt; when precision matters&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This unlocks a new class of applications across sectors (finance &amp;amp; operations, back office automation, testing, web-based data workflows) where teams want to automate with confidence and not just speed.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Learn more&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/tools/browser-automation?pivots=python" target="_blank" rel="noopener"&gt;Browser Automation Tool documentation&lt;/A&gt; and &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/app-testing/playwright-workspaces/overview-what-is-microsoft-playwright-workspaces" target="_blank" rel="noopener"&gt;Playwright workspaces documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/hosted-agents" target="_blank" rel="noopener"&gt;Hosted agents &lt;/A&gt;and &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/tools/toolbox?pivots=python" target="_blank" rel="noopener"&gt;Toolboxes&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="http://aka.ms/foundry/discord" target="_blank" rel="noopener"&gt;Discord&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This update is part of a broader investment in practical, agent-driven browser automation. We are focused on making automation more adaptable to real-world website behavior. If your workflows depend on forms, web navigation, and research at scale, now is a great time to evaluate the revised Browser Automation Tool experience in hosted agents in Foundry Agent Service.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2026 20:05:18 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/introducing-the-new-browser-automation-tool-with-toolboxes-in/ba-p/4522790</guid>
      <dc:creator>NandiniMuralidharan</dc:creator>
      <dc:date>2026-06-10T20:05:18Z</dc:date>
    </item>
    <item>
      <title>Azure Speech at Build 2026: Powering Voice Agents with Real-Time and Life-like Experiences</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/azure-speech-at-build-2026-powering-voice-agents-with-real-time/ba-p/4524638</link>
      <description>&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Voice is rapidly becoming the default interface for AI. At Build 2026, Azure Speech is making it dramatically easier to ship production-grade voice experiences that feel real, responsive, and global. From agentic, low-latency Voice Live experiences built natively into Foundry Agent Service, to a new generation of LLM powered Speech to text and Text to speech voices - every layer of the Speech stack is getting faster, more expressive, and more customizable. With a new unified speech experience in the latest Foundry update that brings playgrounds and self-service fine-tuning to every Speech capability, developers now have a clear path from prototype to production for real-time, multilingual, and truly agentic voice applications - faster and more scalable than ever before.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H5 aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Build&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Real-Time Voice&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Agent&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;&amp;nbsp;with&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Voice Live&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;&amp;nbsp;and Foundry Agent&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Service&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:360,&amp;quot;335559739&amp;quot;:80,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H5&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;As developers move beyond traditional chatbots, they are building a new class of real-time agents that can listen, reason, take&amp;nbsp;actions, and respond naturally in live conversations.&amp;nbsp;From customer support&amp;nbsp;agents&amp;nbsp;and virtual assistants to healthcare intake, retail concierge, field operations,&amp;nbsp;in-car assistants&amp;nbsp;and multilingual employee support, voice agents are becoming a key interface for how people interact with AI.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;At Build 2026,&amp;nbsp;we’re&amp;nbsp;announcing major updates&amp;nbsp;that support&amp;nbsp;building&amp;nbsp;enterprise-ready, voice agents at scale.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Voice Live for Foundry Prompt Agents is now generally available&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;.&lt;/STRONG&gt; This is a strong fit for developers that want enterprise-ready capabilities with minimal operational overhead. Voice Live brings the essential pieces of voice interaction into a single API, from Speech to Text and Text to Speech to turn detection, interruption handling, avatars, and other conversational capabilities. Customers can combine real-time speech-to-speech interaction with managed agent orchestration, knowledge, memory, enterprise governance, observability, and scalable deployment - all within a single developer workflow (&lt;A class="lia-external-url" href="https://aka.ms/VoiceAgent" target="_blank" rel="noopener"&gt;learn more&lt;/A&gt;).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;Hosted agents with Voice Live is available in Public Preview&lt;/STRONG&gt;.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Some customers need full control over their agent’s runtime, orchestration framework, and execution model. For those scenarios, Microsoft Foundry supports hosted agents with Voice Live&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;(public preview), so developers can build with the frameworks they prefer and deploy on managed infrastructure. Whether using Microsoft Agent Framework, LangChain, or a custom orchestration stack, they can host those agents on Foundry Agent Service and connect them directly to Voice Live. Both Response API and Invocations Protocol are supported (&lt;A class="lia-external-url" href="https://aka.ms/VoiceLive-HostedAgent" target="_blank" rel="noopener"&gt;learn more&lt;/A&gt;).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;div data-video-id="https://www.youtube.com/watch?v=NJETeSgi95k/1780627447904" data-video-remote-vid="https://www.youtube.com/watch?v=NJETeSgi95k/1780627447904" class="lia-video-container lia-media-is-center lia-media-size-large"&gt;&lt;iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FNJETeSgi95k%3Ffeature%3Doembed&amp;amp;display_name=YouTube&amp;amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DNJETeSgi95k&amp;amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FNJETeSgi95k%2Fhqdefault.jpg&amp;amp;type=text%2Fhtml&amp;amp;schema=youtube" allowfullscreen="" style="max-width: 100%"&gt;&lt;/iframe&gt;&lt;/div&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Hosted agents also adds support for &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;real-time voice interfaces such as WebSocket and WebRTC&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, which allows developers to deploy real-time voice workloads as managed containers while continuing to use frameworks such as Microsoft Voice Live, Pipecat, and LiveKit etc. These interfaces are bidirectional and full duplex, which makes them well suited to both cascaded pipelines and native speech-to-speech models (&lt;A class="lia-external-url" href="https://aka.ms/deploy-voice-agent" target="_blank" rel="noopener"&gt;learn more&lt;/A&gt;).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In addition,&amp;nbsp;&lt;STRONG&gt;we are advancing the Voice Live API with the following enhancements &lt;/STRONG&gt;that&amp;nbsp;developers&amp;nbsp;can integrate into their agents:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="32" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;New&amp;nbsp;all-in-one&amp;nbsp;speech-to-speech&amp;nbsp;models&amp;nbsp;to help&amp;nbsp;developers&amp;nbsp;build&amp;nbsp;highly responsive voice experiences.&amp;nbsp;These include GPT-Realtime 1.5 and&amp;nbsp;the&amp;nbsp;new&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Azure-Realtime model&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;(public preview), which delivers more natural voice output across multiple languages and accents, including en-US, zh-CN, es-ES, fr-FR, de-DE, hi-IN and more. This is a strong option for customers prioritizing speed, simplicity, and natural conversational quality in multilingual voice experience (&lt;/SPAN&gt;&lt;A href="https://aka.ms/azure-realtime" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;learn more)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="32" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Integration with&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;MAI Transcribe-1&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;(public&amp;nbsp;preview)&amp;nbsp;for more&amp;nbsp;accurate&amp;nbsp;multilingual speech input,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Neural HD V3 voices&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;for&amp;nbsp;more&amp;nbsp;conversational&amp;nbsp;and realistic voice experience, and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;four&amp;nbsp;new&amp;nbsp;full-body&amp;nbsp;standard avatars&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;(public&amp;nbsp;preview)&amp;nbsp;to make the voice agent more engaging.&amp;nbsp;More details about the models can be found in next sections.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="32" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Full integration with speech customization/fine-tuning capabilities&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;in Foundry,&amp;nbsp;including custom speech for better recognition accuracy, custom voice for branded voice experience and custom avatar for&amp;nbsp;one-of-a-kind&amp;nbsp;visual representations of the agent.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;More details about the features can be found in next sections.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="32" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="4" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;WebRTC&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;(Web Real-Time Communication) connection&amp;nbsp;as public preview, enabling&amp;nbsp;low&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;latency,&amp;nbsp;real&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;time&amp;nbsp;voice interactions directly from web and mobile clients&amp;nbsp;(&lt;/SPAN&gt;&lt;A href="https://go.microsoft.com/fwlink/?LinkId=2358915%20" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;learn more&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="32" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="5" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;The&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;solution template&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://aka.ms/voice-live-call-center-accelerator" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;call center voice agent&lt;/SPAN&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;&amp;nbsp;accelerator&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;now&amp;nbsp;expands&amp;nbsp;the&amp;nbsp;telephony&amp;nbsp;capabilities&amp;nbsp;by&amp;nbsp;integrating&amp;nbsp;more&amp;nbsp;third-party&amp;nbsp;providers such&amp;nbsp;as&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://www.twilio.com/docs/voice/media-streams" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Twilio&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;and&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://www.infobip.com/docs/voice-and-video/calls" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Infobip&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;,&amp;nbsp;giving&amp;nbsp;customers&amp;nbsp;greater flexibility&amp;nbsp;to&amp;nbsp;connect&amp;nbsp;with&amp;nbsp;their&amp;nbsp;preferred telephony&amp;nbsp;infrastructures.&lt;/SPAN&gt;&lt;SPAN 
data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="32" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="6" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;A new&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Voice Live Evaluation&amp;nbsp;Harness&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; gives developers a one-command-deployable pipeline to score their voice agents on 13 Foundry evaluators - intent resolution, task adherence, task completion, response completeness, and more - using pre-recorded multi-turn audio in Push-toTalk (PTT), Voice-Activity-Detection (VAD), or Foundry Agent mode (&lt;/SPAN&gt;&lt;A href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/evaluate-before-you-ship-introducing-the-voice-live-evaluation-harness/4523064?previewMessage=true" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;learn more&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5 aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Next Generation&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Speech&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Models&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;in&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Azure Speech&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___1et8vw6" data-ccp-parastyle-defn="{&amp;quot;ObjectId&amp;quot;:&amp;quot;0f0d037d-8f53-515a-a209-887e9f044733|1&amp;quot;,&amp;quot;ClassId&amp;quot;:1073872969,&amp;quot;Properties&amp;quot;:[469777841,&amp;quot;Times New Roman&amp;quot;,469777842,&amp;quot;Times New Roman&amp;quot;,469777844,&amp;quot;Times New Roman&amp;quot;,469769226,&amp;quot;Times New Roman&amp;quot;,335559740,&amp;quot;240&amp;quot;,201341983,&amp;quot;0&amp;quot;,201342446,&amp;quot;1&amp;quot;,201342447,&amp;quot;5&amp;quot;,201342448,&amp;quot;3&amp;quot;,201342449,&amp;quot;1&amp;quot;,469777843,&amp;quot;Times New Roman&amp;quot;,201341986,&amp;quot;1&amp;quot;,268442635,&amp;quot;24&amp;quot;,469775450,&amp;quot;___1et8vw6&amp;quot;,201340122,&amp;quot;2&amp;quot;,134233614,&amp;quot;true&amp;quot;,469778129,&amp;quot;1et8vw6&amp;quot;,335572020,&amp;quot;1&amp;quot;,335559705,&amp;quot;2052&amp;quot;,335551547,&amp;quot;4105&amp;quot;,134233118,&amp;quot;true&amp;quot;,134233117,&amp;quot;true&amp;quot;,469778324,&amp;quot;Normal&amp;quot;]}"&gt;We're&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___1et8vw6"&gt;&amp;nbsp;advancing&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___1et8vw6"&gt;Azure&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___1et8vw6"&gt;Speech to text&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___1et8vw6"&gt;&amp;nbsp;with a new generation of LLM-powered recognition models that raise accuracy, expand language coverage, and give developers more control across both batch and real-time scenarios.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN 
data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:180,&amp;quot;335559739&amp;quot;:180,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="41" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;multilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Strong"&gt;LLM Speech API&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___ka489f0" data-ccp-parastyle-defn="{&amp;quot;ObjectId&amp;quot;:&amp;quot;22c9a4d2-b11d-5cb2-b3b0-a7134d689ea5|1&amp;quot;,&amp;quot;ClassId&amp;quot;:1073872969,&amp;quot;Properties&amp;quot;:[469777841,&amp;quot;Times New Roman&amp;quot;,469777842,&amp;quot;Times New Roman&amp;quot;,469777844,&amp;quot;Times New Roman&amp;quot;,469769226,&amp;quot;Times New Roman&amp;quot;,335559740,&amp;quot;240&amp;quot;,201341983,&amp;quot;0&amp;quot;,201342446,&amp;quot;1&amp;quot;,201342447,&amp;quot;5&amp;quot;,201342448,&amp;quot;3&amp;quot;,201342449,&amp;quot;1&amp;quot;,469777843,&amp;quot;Times New Roman&amp;quot;,201341986,&amp;quot;1&amp;quot;,268442635,&amp;quot;24&amp;quot;,469775450,&amp;quot;___ka489f0&amp;quot;,201340122,&amp;quot;2&amp;quot;,134233614,&amp;quot;true&amp;quot;,469778129,&amp;quot;ka489f0&amp;quot;,335572020,&amp;quot;1&amp;quot;,335559705,&amp;quot;2052&amp;quot;,335551547,&amp;quot;4105&amp;quot;,134233118,&amp;quot;true&amp;quot;,134233117,&amp;quot;true&amp;quot;,469778324,&amp;quot;Normal&amp;quot;]}"&gt; &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;is&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;&amp;nbsp;now generally available&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;&amp;nbsp;in Azure speech&lt;/SPAN&gt;&lt;SPAN 
data-ccp-parastyle="___ka489f0"&gt;&amp;nbsp;for LLM-powered transcription and translation of audio files&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;(&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;A href="https://aka.ms/llm-speech" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;learn more&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;: 25 languages / 90+ locales with locale hint, renewed speech-LLM model with better context and entity recognition and reduced hallucination, up to 5-hour long-form audio, prompt-tuning with 20,000-character input and 2,000 phrase-list entries, and broader regional availability. This model&lt;SPAN data-teams="true"&gt; achieves industry-leading accuracy, ranking No.1 across all models on the Open ASR Leaderboard. 
&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;We also upgraded the&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Strong"&gt;MAI-Transcribe&lt;/SPAN&gt;&lt;SPAN data-ccp-charstyle="Strong"&gt;&amp;nbsp;model from 1.0 to&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-charstyle="Strong"&gt;1.5&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;with&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;phrase list support&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;&amp;nbsp;and verbatim mode&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="___ka489f0"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt; &amp;nbsp;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;We're upgrading the TTS&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;and TTS avatars&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;in Azure Speech.&amp;nbsp;With&amp;nbsp;flexible instruction controls&amp;nbsp;brought&amp;nbsp;into&amp;nbsp;the HD voices, upgraded recipes in personal voice and&amp;nbsp;new&amp;nbsp;TTS avatar&amp;nbsp;capabilities,&amp;nbsp;customers&amp;nbsp;can&amp;nbsp;build voice agents that feel real,&amp;nbsp;human-like&amp;nbsp;and personalized.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="40" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;Neural HD V3 (En-US Ava-Preview/Andrew-Preview/Serena-Preview) &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;is now in public preview, delivering best-in-class quality with prompt-level instruction control. We also upgraded the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;MAI-Voice&amp;nbsp;from 1.0 to&amp;nbsp;2.0&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt; in public preview with 10+ languages support. &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;Personal Voice&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;is upgraded the OmniHD and MAI-Voice-2, optimized for conversational AI, creative applications, and long-form narration with emotion and style control.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="40" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt; Avatar updates include &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;Photo Avatar and Custom Photo Avatar are in &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;generally available (&lt;/SPAN&gt;&lt;A href="https://youtu.be/rm2u56Zi1I4" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;demo&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="none"&gt;). Also, four new &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;full-body standard avatars&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt; are now in public preview in the Foundry Voice Live and Text-to-Speech Avatar playground.&amp;nbsp;&lt;/SPAN&gt;Kobie Burrell, Director of Development of Optimal Blue, sharing: &lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="none"&gt;"The photo avatar and speech service made it incredibly easy for our team bring our Virtual Economist to life. The photo avatars in particular helped us create something that feels human and intuitive - giving our users the experience of engaging with an economist, not just an interface to a set of powerful models."&lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-contrast="none"&gt; &lt;/SPAN&gt;&lt;SPAN style="color: rgb(30, 30, 30);" data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5 aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Azure Speech&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;&amp;amp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Customization&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;experience&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;in&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt; Microsoft Foundry&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:360,&amp;quot;335559739&amp;quot;:80,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P aria-level="3"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Speech &lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;P&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;laygrounds&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;are now available&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;for every Speech capability&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;in Microsoft Foundry&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:180,&amp;quot;335559739&amp;quot;:120,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Every Azure Speech capability now has a hands-on playground in one place, so developers can try different models, compare them and prototype in the different speech capabilities such as speech-to-text, text-to-speech, avatars, Voice Live, and speech translation - no code required - and go from experimentation to production without ever leaving Foundry. &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-features-foundry" target="_blank" rel="noopener" data-lia-auto-title-active="1"&gt;try it here&amp;nbsp;&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Azure Speech&amp;nbsp;is fine-tunable through&amp;nbsp;the&amp;nbsp;new&amp;nbsp;Foundry&amp;nbsp;experience&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:180,&amp;quot;335559739&amp;quot;:120,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;For the first time in the Microsoft Foundry, custom speech, voice and avatar allow the developers to tailor models to their own domain vocabulary, brand identity, and visual presence so their agents sound, understand, and look distinctly their own.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:180,&amp;quot;335559739&amp;quot;:180,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="28" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;Custom Speech:&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;adapt speech-to-text to domain vocabulary and acoustic conditions.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="28" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;Custom Voice:&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt; train brand voices with Professional Voice, or zero-shot cloning with Personal voice, including Omni and MAI-Voice-1 and 2 models.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="28" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;Custom Avatar:&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;create&amp;nbsp;high quality&amp;nbsp;avatars&amp;nbsp;using&amp;nbsp;video,&amp;nbsp;or&amp;nbsp;a quick&amp;nbsp;avatar&amp;nbsp;with&amp;nbsp;a single&amp;nbsp;image.&amp;nbsp;See the self-serving photo avatar creation in the foundry experience as follows:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;P aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Get started today  &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:360,&amp;quot;335559739&amp;quot;:80,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;The easiest way to explore is through the &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;A href="https://ai.azure.com/" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Microsoft Foundry portal&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt; and the Foundry Tools catalog. From there you can follow the &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/azure/ai-foundry/" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;documentation&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt; and &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/plans/34mi6tezkd7em" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Microsoft Learn courses&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;, and start building with &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;A href="https://azure.microsoft.com/en-us/products/ai-foundry/tools/speech" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Azure Speech&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt; referring to &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/?view=foundry-classic" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Azure Speech Documentation&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN 
data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2026 03:14:41 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/azure-speech-at-build-2026-powering-voice-agents-with-real-time/ba-p/4524638</guid>
      <dc:creator>DONGLI</dc:creator>
      <dc:date>2026-06-05T03:14:41Z</dc:date>
    </item>
    <item>
      <title>Foundry Labs @ Build 2026</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/foundry-labs-build-2026/ba-p/4524581</link>
      <description>&lt;P class="lia-align-justify"&gt;We launched&amp;nbsp;&lt;A href="https://aka.ms/foundrylabs" target="_blank" rel="noopener"&gt;Foundry Labs&lt;/A&gt; last year as the home for cutting-edge AI experiments from across Microsoft. It is meant to be a place where developers could try, fork, and build with our earliest prototypes. The response from the community has been tremendous, and we're doubling down. Today at Build 2026, we're rolling out a refreshed experience for &lt;A href="https://aka.ms/foundrylabs" target="_blank" rel="noopener"&gt;Foundry Labs&lt;/A&gt;. Here's what's new.&lt;/P&gt;
&lt;div data-video-id="https://www.youtube.com/watch?v=fLZ56u1ftl4/1780384465679" data-video-remote-vid="https://www.youtube.com/watch?v=fLZ56u1ftl4/1780384465679" class="lia-video-container lia-media-is-center lia-media-size-large"&gt;&lt;iframe src="https://cdn.embedly.com/widgets/media.html?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DfLZ56u1ftl4&amp;amp;type=text%2Fhtml&amp;amp;schema=google&amp;amp;display_name=YouTube&amp;amp;src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FfLZ56u1ftl4" allowfullscreen="" style="max-width: 100%"&gt;&lt;/iframe&gt;&lt;/div&gt;
&lt;H2&gt;A refreshed home for Microsoft’s AI experiments&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;We’ve rebuilt &lt;STRONG&gt;labs.ai.azure.com&lt;/STRONG&gt; as a central hub for every AI experiment coming out of Microsoft. The catalog is now searchable and easier to scan whether you’re hunting for a specific model or just curious about what’s shipping next.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;One theme we heard loud and clear from developers: more connection to the community. So, we added two new sections that put that front and center:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG&gt;Stories&lt;/STRONG&gt; — a dedicated space for the stories from the teams who are building real systems with experiments from Foundry Labs. The first stories are live, and we're always looking for more — if you're building with Foundry Labs, we'd love to hear from you.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG&gt;Communities&lt;/STRONG&gt; — a page for the places where our team and our developer community gather. The current calendar includes Microsoft events as well as conferences we'll be attending and sponsoring. If you're going to be at any of them, come find us!&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H2&gt;Six areas where AI is having significant real-world impact&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;Over the past year, we’ve watched the same pattern repeat: AI breaking out of demos and into the hands of builders, in six specific domains. We’ve reorganized Foundry Labs around those six areas, so you can navigate by the problem you’re solving for:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;STRONG&gt;Biomedical Sciences&lt;/STRONG&gt;: accelerating pathology, drug discovery, and clinical research with multimodal AI (e.g., GigaTIME, RosettaFold3)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Chemistry &amp;amp; Materials&lt;/STRONG&gt;: generating and screening new compounds before they ever exist in a lab (e.g., MatterGen, Skala, RetroChimera)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Code &amp;amp; Software Engineering&lt;/STRONG&gt;: debugging, refactoring, and reasoning about codebases at agentic scale (e.g., BugPilot, Debug-gym)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Creative &amp;amp; Generative Media&lt;/STRONG&gt;: text-to-image, voice, and 3D generation with enterprise-grade control (e.g., MAI-Image-2, MAI-Voice-1, Trellis)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Geospatial &amp;amp; Earth Science&lt;/STRONG&gt;: turning petabytes of satellite and overhead imagery into usable signal (e.g., EO/OS Object Detection). This work was developed in collaboration with the &lt;A href="https://planetarycomputer.microsoft.com/" target="_blank" rel="noopener"&gt;Microsoft Planetary Computer&lt;/A&gt; team.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Robotics &amp;amp; Physical AI&lt;/STRONG&gt;: translating language into action for embodied systems (e.g., Rho-Alpha)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;We’re also rolling out these categories into &lt;A href="https://ai.azure.com/catalog/models" target="_blank" rel="noopener"&gt;Microsoft Foundry Model Catalog&lt;/A&gt;, making it easier for you to find and choose models specific to your use case.&lt;/P&gt;
&lt;img /&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;New innovations from the Lab: MAI models in Foundry across text, image, voice, and speech&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;We're continuing the momentum of frontier-quality, cost-efficient models from Microsoft AI (MAI) with the availability of new releases across 4 modalities:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;STRONG&gt;Text/Reasoning: &lt;/STRONG&gt;MAI-Thinking-1 is our first large language model, designed to deliver strong reasoning, math, and general intelligence at a fraction of the cost of frontier-scale models.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Image: &lt;/STRONG&gt;MAI-Image-2.5 is an updated image generation model that adds image-to-image editing and a suite of "control with preservation" capabilities, once again debuting at No. 3 on Arena.ai for image generation model families. MAI-Image-2.5 Flash is also available as a faster, more efficient option.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Voice:&lt;/STRONG&gt; MAI-Voice-2 is an updated multilingual text-to-speech model that brings voice cloning and voice prompting to more than 10 languages. MAI-Voice-2 Flash, a faster and more efficient option, is coming soon.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Speech:&lt;/STRONG&gt; MAI-Transcribe-1.5 is an updated speech-to-text model that adds content biasing and improved accuracy.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;These are the same models already powering experiences across Copilot, Bing, OneDrive, PowerPoint, and Azure Speech, and now they're available in Foundry for developers to build with.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;A href="https://aka.ms/mai-build-2026-foundryblog" target="_blank" rel="noopener"&gt;Learn more about the new MAI models in Foundry here&lt;/A&gt;.&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H2&gt;Catch us at Build 2026&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;If you are at Build 2026, two sessions are especially relevant if you want to dig deeper into AI experiments from Microsoft in Foundry Labs:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;STRONG&gt;BRK230 — Build smarter AI systems in Foundry as models and costs evolve.&lt;/STRONG&gt; How to quickly choose, integrate, and validate AI models inside Microsoft Foundry: navigating thousands of model options, benchmarking performance, and streamlining the workflow with deep IDE support.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;LTG419 — Turn ideas into AI applications with Microsoft Foundry Labs. &lt;/STRONG&gt;A lightning talk covering the new Foundry Labs website with interactive demos of the latest releases from Microsoft Research. On-site only in San Francisco.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;Come find us in the &lt;STRONG&gt;expo hall –&lt;/STRONG&gt; we’d love to connect!&lt;/P&gt;
&lt;DIV class="lia-align-justify"&gt;
&lt;H2&gt;What’s next&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-justify"&gt;Foundry Labs is where Microsoft’s most ambitious AI research becomes accessible to builders. Whether you’re forecasting hurricanes, building voice agents, designing molecules, or shipping multimodal pipelines — the tools are here, and the next wave is already in the lab.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Stay tuned — there’s more coming soon:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Explore more AI innovations on &lt;A href="https://labs.ai.azure.com/" target="_blank" rel="noopener" data-lia-auto-title-active="1"&gt;https://labs.ai.azure.com/&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Join the &lt;A href="https://discord.com/invite/microsoftfoundry" target="_blank" rel="noopener"&gt;Microsoft Foundry Discord&lt;/A&gt; community to shape the future of AI together&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 03 Jun 2026 15:30:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/foundry-labs-build-2026/ba-p/4524581</guid>
      <dc:creator>Saumil-Shrivastava</dc:creator>
      <dc:date>2026-06-03T15:30:00Z</dc:date>
    </item>
  </channel>
</rss>

