<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>rss.livelink.threads-in-node</title>
    <link>https://techcommunity.microsoft.com/t5/microsoft-foundry/ct-p/azure-ai-foundry</link>
    <description>rss.livelink.threads-in-node</description>
    <pubDate>Sun, 26 Apr 2026 09:56:57 GMT</pubDate>
    <dc:creator>azure-ai-foundry</dc:creator>
    <dc:date>2026-04-26T09:56:57Z</dc:date>
    <item>
      <title>Deploying an Agentic Service to Microsoft 365 Copilot with Delegated OBO Access</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/deploying-an-agentic-service-to-microsoft-365-copilot-with/ba-p/4514197</link>
      <description>
&lt;P&gt;You've built an agentic service. It works. It uses the framework you chose — maybe LangChain, maybe Semantic Kernel, maybe the Microsoft Agent Framework, maybe something entirely your own. It talks to your LLM, calls your downstream APIs, and manages multi-turn conversations the way you designed it.&lt;/P&gt;
&lt;P&gt;Now someone asks: &lt;EM&gt;"Can we surface this in Microsoft 365 Copilot?"&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The obvious path is to rebuild the agent using M365 Copilot Agents (declarative or custom engine). But that means giving up control — over your orchestration logic, your framework choices, your session management, and the way you call downstream services on behalf of the signed-in user.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;This guide is for the other path.&lt;/STRONG&gt; The one where you keep your existing agentic service largely intact and place an M365 Gateway in front of it — a stateless service that handles the Bot Framework protocol, validates channel-facing tokens, performs the first On-Behalf-Of token exchange, and translates Copilot conversations into your service's native API. Your service can stay Bot- and Teams-agnostic, but if it owns downstream delegated access it should still validate the inbound service token at its own boundary and use that validated assertion for its own OBO chain.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;When to use this pattern:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;You have (or want to build) your own agentic implementation in your preferred technology and framework&lt;/LI&gt;
&lt;LI&gt;M365 Copilot Agents — declarative or custom engine — don't give you enough control over orchestration, tool calling, or downstream auth&lt;/LI&gt;
&lt;LI&gt;You want total ownership of the agentic logic: pro-code, your LLM, your prompts, your tools, your session store&lt;/LI&gt;
&lt;LI&gt;You need user-delegated access to downstream services (databases, APIs, MCP servers) via chained OBO flows, and you want to own that token chain end-to-end&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;What follows is a general development guideline — framework-agnostic, language-agnostic — for deploying any agentic service behind Azure Container Apps, exposing it to M365 Copilot through an M365 Gateway, and using chained OBO flows to call downstream services as the signed-in user.&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-anchor" name="_Toc227832836" target="_blank"&gt;&lt;/A&gt;Architecture Overview&lt;/H2&gt;
&lt;P&gt;The architecture separates concerns into two independently deployable services:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Layer&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Responsibility&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;State&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;M365 Copilot&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;User identity, SSO, conversation UX&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;None (platform)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Gateway&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Bot protocol adapter, channel auth, OBO #1 token exchange&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Stateless&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Agentic Service&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Business logic, service-token validation, downstream OBO, session memory&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Stateful&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Downstream&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Data, APIs, MCP servers — called as the delegated user&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;External&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832837" target="_blank"&gt;&lt;/A&gt;Why two services?&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Separation of trust boundaries&lt;/STRONG&gt; — The gateway handles Bot Framework protocol and channel auth. The service owns business/data access, validates the service bearer it receives, and never needs Bot credentials.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Independent scaling&lt;/STRONG&gt; — The gateway can scale to many replicas. The service (with in-memory sessions) runs as a single replica for MVP.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Framework freedom&lt;/STRONG&gt; — The service can use any agentic framework, LLM provider, or custom logic. The gateway is just an HTTP forwarder.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Adapt without modifying&lt;/STRONG&gt; — If you already have an agentic service, you can bring it to M365 Copilot by writing only the gateway + service client adapter. The existing service stays untouched.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832838" target="_blank"&gt;&lt;/A&gt;How reusable is the gateway?&lt;/H3&gt;
&lt;P&gt;In practice, most of the gateway can be reused across multiple agentic services that need to surface in M365 Copilot.&lt;/P&gt;
&lt;P&gt;Reusable across services:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;the POST /api/messages host and Bot adapter bootstrap&lt;/LI&gt;
&lt;LI&gt;Bot/channel auth wiring&lt;/LI&gt;
&lt;LI&gt;OBO #1 token acquisition for the downstream service scope&lt;/LI&gt;
&lt;LI&gt;conversation ID to service session ID mapping&lt;/LI&gt;
&lt;LI&gt;long-running turn handling and delayed acknowledgement behavior&lt;/LI&gt;
&lt;LI&gt;busy-turn rejection for overlapping requests in the same conversation&lt;/LI&gt;
&lt;LI&gt;generic user-safe auth and transient-failure handling&lt;/LI&gt;
&lt;LI&gt;ACA, Azure Bot, and app-package deployment shape&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Usually service-specific:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;the downstream service client&lt;/LI&gt;
&lt;LI&gt;activity-to-service payload translation&lt;/LI&gt;
&lt;LI&gt;the target service scope and app-registration values&lt;/LI&gt;
&lt;LI&gt;service-specific telemetry labels and fallback copy&lt;/LI&gt;
&lt;LI&gt;any session bootstrap contract such as create_session() versus direct first-turn POSTs&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The Daily Account Planner MVP follows exactly this split: the wrapper is mostly generic, while the planner client and a small amount of message shaping remain service-specific.&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-anchor" name="_Toc227832839" target="_blank"&gt;&lt;/A&gt;Token Flow Deep Dive&lt;/H2&gt;
&lt;P&gt;The user's identity flows end-to-end through two chained OBO exchanges. No service ever stores or caches user credentials.&lt;/P&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832840" target="_blank"&gt;&lt;/A&gt;Key principles&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;The user token never leaves the OBO chain.&lt;/STRONG&gt; Each service receives a scoped assertion and exchanges it for the next hop. No service ever sees the user's credentials.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;JWT validation is non-negotiable — and split by boundary.&lt;/STRONG&gt; The gateway must validate the Bot/channel-facing token it receives from Microsoft 365. The service must validate the inbound service-scoped bearer before using it for OBO, session ownership, or user scoping.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Internal-only service ingress is recommended hardening, not the only safe option.&lt;/STRONG&gt; In ACA, private ingress is a strong default. If the service remains externally reachable, it must still enforce its own bearer validation and authorization checks exactly the same way.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;ContextVar carries the validated assertion.&lt;/STRONG&gt; The service binds the validated JWT assertion to an async-safe ContextVar in middleware, so any downstream OBO call within the same request automatically picks it up.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;&lt;A class="lia-anchor" name="_Toc227832841" target="_blank"&gt;&lt;/A&gt;Entra ID App Registrations&lt;/H2&gt;
&lt;P&gt;You need &lt;STRONG&gt;two&lt;/STRONG&gt; Entra app registrations plus knowledge of your downstream API's resource app ID.&lt;/P&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832842" target="_blank"&gt;&lt;/A&gt;Step-by-step registration&lt;/H3&gt;
&lt;H4&gt;1. Agentic Service app&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Setting&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Value&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Display name&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;your-service-api&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Identifier URI&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;api://&amp;lt;service-client-id&amp;gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Sign-in audience&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;AzureADMyOrg (single tenant)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;requestedAccessTokenVersion&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Exposed scope&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;access_as_user (delegated, User consent)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Required permissions&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Downstream API user_impersonation (delegated)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Client secret&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Yes — needed for MSAL ConfidentialClient OBO&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4&gt;2. Gateway / Bot app&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Setting&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Value&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Display name&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;your-bot&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Identifier URI&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;api://botid-&amp;lt;bot-client-id&amp;gt; (Bot SSO convention)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Sign-in audience&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;AzureADMyOrg&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;requestedAccessTokenVersion&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Exposed scope&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;access_as_user (delegated)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Required permissions&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Service app access_as_user (delegated)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Redirect URI&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;https://token.botframework.com/.auth/web/redirect&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Preauthorized clients&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Teams Desktop (1fec8e78-...) and Teams Web (5e3ce6c0-...)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Client secret&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Yes — needed for Bot Framework auth&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4&gt;3. Admin consent (mandatory)&lt;/H4&gt;
&lt;P&gt;Both delegated permission grants require tenant admin consent:&lt;/P&gt;
&lt;P&gt;Gateway/Bot app → Service app access_as_user ← admin consent&lt;BR /&gt;Service app → Downstream user_impersonation ← admin consent&lt;/P&gt;
&lt;P&gt;Without admin consent, OBO calls return AADSTS65001: The user or administrator has not consented to use the application.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Operational guidance:&lt;/STRONG&gt; if you package this into an operator bootstrap, treat missing admin consent as a &lt;STRONG&gt;blocking failure&lt;/STRONG&gt;. Do not continue with service deployment and hope sign-in will work later. Failing fast here produces a much more repeatable operator experience.&lt;/P&gt;
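&lt;P&gt;As a sketch, a bootstrap script can grant and verify consent in one step (variable names are illustrative; the signed-in Azure CLI identity needs sufficient directory rights):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Grant tenant admin consent for both app registrations; abort on failure
# so the operator sees the problem before anything is deployed.
az ad app permission admin-consent --id "$BOT_APP_ID" || exit 1
az ad app permission admin-consent --id "$SERVICE_APP_ID" || exit 1&lt;/LI-CODE&gt;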
&lt;H4&gt;4. Azure Bot OAuth connection&lt;/H4&gt;
&lt;P&gt;Create an OAuth connection on the Azure Bot resource so the Microsoft Agents SDK can perform silent token exchange for the gateway-to-service hop:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;az bot authsetting create \
  -g $RESOURCE_GROUP \
  -n $BOT_RESOURCE_NAME \
  -c SERVICE_CONNECTION \
  --service Aadv2 \
  --client-id $BOT_APP_ID \
  --client-secret $BOT_APP_PASSWORD \
  --provider-scope-string "$SERVICE_API_SCOPE offline_access openid profile" \
  --parameters TenantId="$TENANT_ID" TokenExchangeUrl="api://botid-$BOT_APP_ID"&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-anchor" name="_Toc227832843" target="_blank"&gt;&lt;/A&gt;Component Implementation Guide&lt;/H2&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832844" target="_blank"&gt;&lt;/A&gt;Component 1: The M365 Gateway (Stateless, Protocol Adapter)&lt;/H3&gt;
&lt;P&gt;The gateway's job is to &lt;STRONG&gt;bridge&lt;/STRONG&gt; the Bot Framework protocol into whatever HTTP contract your agentic service exposes. It receives Bot activities, validates the channel-facing token path, exchanges the user's SSO token for a service-scoped token (OBO #1), and translates the call into the service's native API shape.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Key insight:&lt;/STRONG&gt; If you already have a stateful agentic service — even one that wasn't designed for M365 — the gateway can adapt to it. You do &lt;STRONG&gt;not&lt;/STRONG&gt; need to modify your existing service to conform to a specific API contract. The gateway absorbs the M365 protocol translation. The service should still validate the service-scoped bearer that reaches its own API boundary.&lt;/P&gt;
&lt;H4&gt;Token validation responsibilities in the Gateway&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;This is a hard requirement, not optional.&lt;/STRONG&gt; The gateway is the public-facing trust boundary for the Bot/channel request. If channel-token validation is skipped or incomplete, unauthenticated callers may be able to reach your forwarding path.&lt;/P&gt;
&lt;P&gt;The gateway must validate the Bot/channel-facing token according to the SDK/Bot Framework requirements for the channel it serves. After OBO #1 produces a service-scoped bearer, the gateway may inspect that token for diagnostics, routing, or optional metadata forwarding, but the service must remain the authoritative validator for that service token.&lt;/P&gt;
&lt;P&gt;Good gateway behavior:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Validate inbound channel traffic&lt;/STRONG&gt; using the Bot/Agents SDK middleware or equivalent issuer/signature/audience checks for the incoming channel token&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Acquire the service-scoped token&lt;/STRONG&gt; for your service API with OBO #1&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Forward the service bearer as-is&lt;/STRONG&gt; in Authorization: Bearer ...&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Optionally extract claims&lt;/STRONG&gt; for logging or convenience headers:&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;oid (user object ID)&lt;/LI&gt;
&lt;LI&gt;tid (tenant ID)&lt;/LI&gt;
&lt;LI&gt;upn or preferred_username&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="5"&gt;
&lt;LI&gt;&lt;STRONG&gt;Treat forwarded headers as non-authoritative metadata&lt;/STRONG&gt;. They can help with telemetry, correlation, or debugging, but the service should derive identity from the validated bearer it receives.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;Adapting to your service's API contract&lt;/H4&gt;
&lt;P&gt;The gateway contains a &lt;STRONG&gt;service client&lt;/STRONG&gt; — a small HTTP adapter class that translates between the Copilot conversation model and your service's native API. This is where you map:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Copilot concept&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Your service's equivalent&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Mapped in the service client&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;conversation.id&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Session/thread/chat ID&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Map on create or first message&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;User message text&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Request body field (could be text, message, prompt, input, etc.)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Reshape the request payload&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Reply text&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Response field (could be reply, response, output, content, etc.)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Extract from response payload&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Authentication&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Bearer token, API key, or custom header&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Attach the service-scoped delegated token in the right header/field&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Example: adapting to different service shapes&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Service client for a service with POST /chat/{thread_id}
class MyServiceClient:
    async def send_turn(self, session_id: str, text: str, token: str) -&amp;gt; str:
        resp = await self.http.post(
            f"{self.base_url}/chat/{session_id}",
            json={"prompt": text, "stream": False},
            headers={"Authorization": f"Bearer {token}"},
        )
        return resp.json()["response"]  # extract from service's response shape

# Service client for a service with POST /v1/conversations/{id}/messages
class AnotherServiceClient:
    async def send_turn(self, session_id: str, text: str, token: str) -&amp;gt; str:
        resp = await self.http.post(
            f"{self.base_url}/v1/conversations/{session_id}/messages",
            json={"content": text, "role": "user"},
            headers={"X-Api-Token": token},
        )
        return resp.json()["choices"][0]["message"]["content"]&lt;/LI-CODE&gt;
&lt;P&gt;The gateway message handler remains the same regardless of the service client shape:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;async def handle_message(context):
    token = await agent_auth.get_token(context)
    session_id = context.activity.conversation.id
    reply = await service_client.send_turn(session_id, context.activity.text, token)
    await context.send_activity(reply)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;What the gateway needs from any service&lt;/H4&gt;
&lt;P&gt;As long as the service supports these three capabilities — in whatever API shape — the gateway can adapt to it:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Session/thread identity&lt;/STRONG&gt; — Some way to maintain conversation state across turns (session ID, thread ID, conversation ID, etc.)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Message exchange&lt;/STRONG&gt; — An endpoint that accepts user text and returns a reply&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Authentication&lt;/STRONG&gt; — Accepts a bearer token or other credential that the gateway can supply via OBO #1&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The service does &lt;STRONG&gt;not&lt;/STRONG&gt; need to use a specific URL pattern, request/response schema, or framework. The gateway's service client class is the single place where you encode these mappings. The service should still validate the bearer it receives before it trusts any user identity implied by the call.&lt;/P&gt;
&lt;H4&gt;Key implementation details&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Auth handlers&lt;/STRONG&gt; — The gateway needs two auth handler paths:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Agentic path&lt;/STRONG&gt; (AgenticUserAuthorization): Used when the activity arrives through the M365 Copilot agentic channel. Requires both abs_oauth_connection_name and obo_connection_name.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Connector path&lt;/STRONG&gt; (UserAuthorization): Used when the activity arrives through standard Teams / Bot connector. Requires abs_oauth_connection_name only.&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="python"&gt;auth_handlers = {
    "service_agentic": AuthHandler(
        auth_type="AgenticUserAuthorization",
        abs_oauth_connection_name="SERVICE_CONNECTION",
        obo_connection_name="SERVICE_OBO_CONNECTION",   # may be same as abs
        scopes=["api://&amp;lt;service-client-id&amp;gt;/access_as_user"],
    ),
    "service_connector": AuthHandler(
        auth_type="UserAuthorization",
        abs_oauth_connection_name="SERVICE_CONNECTION",
        obo_connection_name="",
        scopes=["api://&amp;lt;service-client-id&amp;gt;/access_as_user"],
    ),
}&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Invoke handling&lt;/STRONG&gt; — Copilot sends invoke activities during the SSO token exchange handshake. The gateway must handle these gracefully (return without error) or the sign-in flow breaks.&lt;/P&gt;
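&lt;P&gt;A minimal sketch of that behavior (the handler shape depends on your SDK; invoke_response() stands in for your SDK's invoke-response type):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# "signin/tokenExchange" is the Bot Framework SSO token-exchange invoke name.
async def on_invoke_activity(context):
    if context.activity.name == "signin/tokenExchange":
        return invoke_response(status=200)  # success, no body: exchange handled
    return invoke_response(status=200)      # never raise on other invokes&lt;/LI-CODE&gt;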
&lt;P&gt;&lt;STRONG&gt;Session mapping&lt;/STRONG&gt; — Use context.activity.conversation.id as the session key when forwarding to the service. This ensures the same Copilot conversation always maps to the same agentic session.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Healthz bypass&lt;/STRONG&gt; — The ACA health probe hits /healthz. Bypass JWT middleware for this path or the probe fails.&lt;/P&gt;
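&lt;P&gt;For example, a minimal FastAPI-style middleware sketch (bearer_is_valid() is a hypothetical validator):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.middleware("http")
async def jwt_middleware(request: Request, call_next):
    # ACA health probes hit /healthz without a token; skip JWT checks for them.
    if request.url.path == "/healthz":
        return await call_next(request)
    if not await bearer_is_valid(request):  # hypothetical validator
        return JSONResponse(status_code=401, content={"error": "unauthorized"})
    return await call_next(request)&lt;/LI-CODE&gt;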
&lt;P&gt;&lt;STRONG&gt;Long-running compatibility bridge&lt;/STRONG&gt; — If the SDK's built-in long-running proactive path does not preserve the message contract your wrapper needs, keep long-running mode enabled but own a very small gateway-local compatibility bridge. That bridge should preserve the original user message activity while still using the proactive continuation context for the outbound reply. Keep this as an infrastructure concern in the gateway; do not move business logic there.&lt;/P&gt;
&lt;H4&gt;Technology choices&lt;/H4&gt;
&lt;P&gt;The gateway uses:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;microsoft-agents-hosting-fastapi for the Bot SDK adapter&lt;/LI&gt;
&lt;LI&gt;microsoft-agents-authentication-msal for MSAL-based token exchange&lt;/LI&gt;
&lt;LI&gt;httpx for forwarding HTTP calls to the service&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;You can substitute any Bot SDK or language as long as you handle the same protocol.&lt;/P&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832845" target="_blank"&gt;&lt;/A&gt;Component 2: The Agentic Service (Stateful, Framework-Agnostic)&lt;/H3&gt;
&lt;P&gt;The service is a regular HTTP API. It needs no knowledge of Bot Framework, Teams, or Copilot-specific activity shapes, but if it owns downstream delegated access it should validate the inbound service bearer at its own boundary. It receives the service-scoped token, validates it, binds user claims from that validated token, runs your agentic logic, and returns a reply.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Service trust model:&lt;/STRONG&gt; The service is the business/data trust boundary. It should trust the validated bearer token it receives, not convenience headers alone. Forwarded X-User-* headers are optional metadata and should never be the sole basis for user scoping or downstream OBO.&lt;/P&gt;
&lt;H4&gt;Minimum service requirements&lt;/H4&gt;
&lt;P&gt;The service needs only these capabilities — everything else is optional:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Requirement&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;What it means&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Why&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Validate inbound bearer&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Read and validate the Authorization: Bearer header&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Needed before using the token for OBO, ownership, or authorization&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Extract claims from the validated token&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Decode oid, tid, upn / preferred_username, scopes, etc.&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Needed for session owner isolation and audit trail&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Session/thread identity&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Maintain conversation state across turns&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Copilot expects multi-turn conversations&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Message exchange&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Accept user text, return a reply&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Core function&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;OBO #2&lt;/STRONG&gt; (if user-delegated access needed)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;MSAL acquire_token_on_behalf_of() with the forwarded assertion&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Only if calling downstream APIs as the user&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Ingress hardening&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Prefer internal-only ingress or equivalent network controls when practical&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Reduces attack surface, but does not replace service-side token validation&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4&gt;ContextVar pattern for the OBO assertion&lt;/H4&gt;
&lt;P&gt;The validated JWT assertion (the raw token string from the gateway) must be available deep in the call stack when a downstream OBO call happens — potentially several layers below the HTTP handler. Use contextvars.ContextVar rather than passing it through every function signature:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from contextvars import ContextVar, Token as CtxToken

TokenClaims = dict  # stand-in: substitute your app's validated-claims type

_USER_ASSERTION: ContextVar[str | None] = ContextVar("user_assertion", default=None)
_USER_CLAIMS: ContextVar[TokenClaims | None] = ContextVar("user_claims", default=None)

def bind_identity(assertion: str, claims: TokenClaims):
    return _USER_ASSERTION.set(assertion), _USER_CLAIMS.set(claims)

def reset_identity(a_tok: CtxToken, c_tok: CtxToken):
    _USER_ASSERTION.reset(a_tok)
    _USER_CLAIMS.reset(c_tok)

# In middleware, after validating the bearer:
a_tok, c_tok = bind_identity(raw_jwt, validated_claims)
try:
    response = await call_next(request)
finally:
    reset_identity(a_tok, c_tok)  # always clean up&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Why ContextVar?&lt;/STRONG&gt; — It is async-safe (each concurrent request gets its own scope), framework-agnostic (works with FastAPI, Flask, Django, raw asyncio), and avoids threading the assertion through your entire agentic stack.&lt;/P&gt;
&lt;H4&gt;OBO #2 to downstream services&lt;/H4&gt;
&lt;P&gt;When your agentic logic needs to call a downstream API as the user:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import msal

app = msal.ConfidentialClientApplication(
    client_id=SERVICE_CLIENT_ID,
    client_credential=SERVICE_CLIENT_SECRET,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
)

result = app.acquire_token_on_behalf_of(
    user_assertion=_USER_ASSERTION.get(),          # from ContextVar
    scopes=["&amp;lt;downstream-resource-id&amp;gt;/.default"],  # e.g. Graph, Databricks, your own API
)
if "access_token" not in result:
    # MSAL returns an error dict (e.g. AADSTS65001) instead of raising
    raise RuntimeError(result.get("error_description", "OBO token acquisition failed"))
downstream_token = result["access_token"]&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Common downstream scope examples:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Downstream&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Scope&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Microsoft Graph&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;https://graph.microsoft.com/.default&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Azure Databricks&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Azure SQL&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;https://database.windows.net/.default&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Custom internal API&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;api://&amp;lt;their-app-id&amp;gt;/.default&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;MCP server behind Entra&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;api://&amp;lt;mcp-app-id&amp;gt;/.default&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;The pattern is identical for any Entra-protected resource — only the scope changes.&lt;/P&gt;
&lt;H4&gt;Session management&lt;/H4&gt;
&lt;P&gt;Minimum viable session store for MVP (in-memory, single-replica):&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Create&lt;/STRONG&gt; — POST /api/chat/sessions → returns session_id&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Send message&lt;/STRONG&gt; — POST /api/chat/sessions/{id}/messages → returns reply&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Get session&lt;/STRONG&gt; — GET /api/chat/sessions/{id} → returns turns history&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Owner isolation&lt;/STRONG&gt; — Every session has an owner_id from the validated JWT oid claim. Reject cross-user access with 403.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Turn windowing&lt;/STRONG&gt; — Cap stored turns at max_turns * 2 entries to prevent unbounded memory growth.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Concurrency lock&lt;/STRONG&gt; — asyncio.Lock() per session prevents interleaved turns (see the sketch after this list).&lt;/LI&gt;
&lt;/UL&gt;
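&lt;P&gt;A minimal in-memory sketch of these requirements (names are illustrative, not from the reference implementation):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import asyncio
import uuid
from dataclasses import dataclass, field

@dataclass
class Session:
    owner_id: str                                      # oid from the validated JWT
    turns: list = field(default_factory=list)
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)

class SessionStore:
    def __init__(self, max_turns: int = 20):
        self._sessions: dict[str, Session] = {}
        self._max_turns = max_turns

    def create(self, owner_id: str) -&amp;gt; str:
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = Session(owner_id=owner_id)
        return session_id

    def get(self, session_id: str, caller_oid: str) -&amp;gt; Session:
        session = self._sessions[session_id]            # KeyError: surface as 404
        if session.owner_id != caller_oid:
            raise PermissionError("cross-user access")  # surface as 403
        return session

    def append_turn(self, session: Session, role: str, text: str) -&amp;gt; None:
        session.turns.append({"role": role, "text": text})
        # Turn windowing: cap at max_turns * 2 entries (one user + one reply per turn)
        session.turns = session.turns[-self._max_turns * 2:]&lt;/LI-CODE&gt;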
&lt;H4&gt;Your agentic logic goes here&lt;/H4&gt;
&lt;P&gt;The service API is a clean boundary. Inside it, use whatever you want:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Microsoft Agent Framework&lt;/STRONG&gt; — HandoffBuilder, ConcurrentBuilder, etc.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;LangChain / LangGraph&lt;/STRONG&gt; — chains, graphs, tool calling&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Semantic Kernel&lt;/STRONG&gt; — planners and plugins&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Custom code&lt;/STRONG&gt; — direct OpenAI / Azure OpenAI SDK calls with tool loops&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Any other framework&lt;/STRONG&gt; — CrewAI, AutoGen, etc.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Reference API contract&lt;/H4&gt;
&lt;P&gt;If building a new service from scratch, this contract works well with the gateway:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;POST /api/chat/sessions/{session_id}/messages
Authorization: Bearer &amp;lt;service-scoped-token&amp;gt;
Content-Type: application/json
{"text": "user message"}

→ 200 {"session_id": "...", "reply": "...", "turns": [...]}
→ 401 (bad/missing token)
→ 404 (session not found — gateway auto-creates and retries)&lt;/LI-CODE&gt;
&lt;P&gt;However, if you have an &lt;STRONG&gt;existing service&lt;/STRONG&gt; with a different contract, you do not need to change it. Instead, adapt the gateway's service client to speak your service's API (see &lt;A href="#community--1-X5cea8a8d2e7511d235f090bdea6df20202d4014" target="_blank"&gt;&lt;EM&gt;Component 1: The M365 Gateway&lt;/EM&gt;&lt;/A&gt; above). The three essential capabilities are session identity, message exchange, and authentication — the URL paths and payload shapes are flexible.&lt;/P&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832846" target="_blank"&gt;&lt;/A&gt;Component 3: The M365 App Package&lt;/H3&gt;
&lt;P&gt;The app package is a ZIP containing a manifest and icons that registers the gateway as a Custom Engine Agent in Microsoft 365.&lt;/P&gt;
&lt;H4&gt;Critical manifest fields&lt;/H4&gt;
&lt;LI-CODE lang="json"&gt;{
  "bots": [{
    "botId": "&amp;lt;BOT_APP_ID&amp;gt;",
    "scopes": ["personal"]
  }],
  "webApplicationInfo": {
    "id": "&amp;lt;BOT_SSO_APP_ID&amp;gt;",
    "resource": "api://botid-&amp;lt;BOT_APP_ID&amp;gt;"
  },
  "copilotAgents": {
    "customEngineAgents": [{
      "type": "bot",
      "id": "&amp;lt;BOT_APP_ID&amp;gt;",
      "functionsAs": "agentOnly"
    }]
  },
  "validDomains": [
    "&amp;lt;gateway-host&amp;gt;.azurecontainerapps.io",
    "token.botframework.com"
  ]
}&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-anchor" name="_Toc227832847" target="_blank"&gt;&lt;/A&gt;Deployment Sequence&lt;/H2&gt;
&lt;P&gt;Follow this exact order. Each step depends on the previous outputs.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step details:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Step&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;What&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Depends on&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Create service app + bot app in Entra ID&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Tenant access&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Grant admin consent for both delegated permission chains&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Step 1&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Create Azure Bot resource, configure OAuth connection (SERVICE_CONNECTION)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Steps 1-2&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Provision downstream resources (databases, APIs, warehouses)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Independent&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;5&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Build container image, deploy Agentic Service to ACA&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Steps 1-4&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;6&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Validate /healthz, session creation, first turn, OBO to downstream&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Step 5&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;7&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Build container image, deploy M365 Gateway to ACA&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Steps 1-3, 5&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;8&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Validate /healthz, gateway-to-service forwarding&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Steps 6-7&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;9&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Build manifest.json + icons into ZIP package&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Steps 1, 7&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;10&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Upload ZIP to Microsoft 365 admin center or Graph API&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Step 9&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;11&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Set Azure Bot messaging endpoint to https://&amp;lt;gateway&amp;gt;/api/messages&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Steps 3, 7&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;12&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Open M365 Copilot, send a message, verify end-to-end flow&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Steps 10-11&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832848" target="_blank"&gt;&lt;/A&gt;Operatorized bootstrap variant&lt;/H3&gt;
&lt;P&gt;For a reusable customer/operator flow, this pattern works better when it is packaged into &lt;STRONG&gt;two main scripts&lt;/STRONG&gt; plus &lt;STRONG&gt;two-layer env files&lt;/STRONG&gt;:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Operator input env&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;small, human-edited&lt;/LI&gt;
&lt;LI&gt;contains only the initial tenant/subscription/resource naming values and any demo-user identities&lt;/LI&gt;
&lt;/UL&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Generated runtime env&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;script-owned&lt;/LI&gt;
&lt;LI&gt;stores discovered URLs, app IDs, secrets, package IDs, and image refs&lt;/LI&gt;
&lt;/UL&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Recommended split:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;bootstrap-azure-demo.sh
&lt;UL&gt;
&lt;LI&gt;preflight Azure login and permissions&lt;/LI&gt;
&lt;LI&gt;create or reuse infra&lt;/LI&gt;
&lt;LI&gt;create/reuse app registrations&lt;/LI&gt;
&lt;LI&gt;enforce admin-consent success&lt;/LI&gt;
&lt;LI&gt;build and publish images&lt;/LI&gt;
&lt;LI&gt;deploy the service and the gateway&lt;/LI&gt;
&lt;LI&gt;wire Azure Bot and OAuth connection&lt;/LI&gt;
&lt;/UL&gt;&lt;/LI&gt;
&lt;LI&gt;bootstrap-m365-demo.sh
&lt;UL&gt;
&lt;LI&gt;build the Teams/M365 app package&lt;/LI&gt;
&lt;LI&gt;publish to the tenant catalog&lt;/LI&gt;
&lt;LI&gt;self-install for the signed-in operator&lt;/LI&gt;
&lt;/UL&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;This keeps the operator surface area small while preserving advanced lower-level scripts for recovery or debugging.&lt;/P&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832849" target="_blank"&gt;&lt;/A&gt;Environment variables reference&lt;/H3&gt;
&lt;H4&gt;Agentic Service&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Variable&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Example&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Purpose&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;AZURE_TENANT_ID&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;b5d67878-...&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Entra tenant (for OBO #2)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;SERVICE_CLIENT_ID&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;lt;service-app-id&amp;gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Service app registration (for OBO #2)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;SERVICE_CLIENT_SECRET&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;lt;secret&amp;gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Service OBO client credential&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;DOWNSTREAM_OBO_SCOPE&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;lt;resource-id&amp;gt;/.default&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Downstream resource scope for OBO #2&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;AZURE_OPENAI_ENDPOINT&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;https://....cognitiveservices.azure.com&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;LLM endpoint&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;AZURE_OPENAI_DEPLOYMENT&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;gpt-4o&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Model deployment name&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;DATABRICKS_WAREHOUSE_ID&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;warehouse-123&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Optional fixed warehouse id if already known&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;DATABRICKS_AUTO_CREATE_WAREHOUSE&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;true&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Allow bootstrap to create a warehouse when none exists&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4&gt;M365 Gateway&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Variable&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Example&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Purpose&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;AZURE_TENANT_ID&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;b5d67878-...&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Entra tenant&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;BOT_APP_ID&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;lt;bot-app-id&amp;gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Bot registration&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;BOT_APP_PASSWORD&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;lt;secret&amp;gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Bot client secret&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;SERVICE_BASE_URL&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;https://my-service.azurecontainerapps.io&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Service endpoint&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;SERVICE_API_SCOPE&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;api://&amp;lt;service-app-id&amp;gt;/access_as_user&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;OBO #1 target scope&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;SERVICE_EXPECTED_AUDIENCE&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;api://&amp;lt;service-app-id&amp;gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;JWT audience validation&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;&lt;A class="lia-anchor" name="_Toc227832850" target="_blank"&gt;&lt;/A&gt;Repeatability guidance&lt;/H3&gt;
&lt;P&gt;If you expect operators to rerun the deployment in the same working copy:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;keep operator-owned inputs separate from generated runtime values&lt;/LI&gt;
&lt;LI&gt;persist app IDs/object IDs after first create and prefer those on reruns&lt;/LI&gt;
&lt;LI&gt;avoid re-identifying Entra apps by display name alone&lt;/LI&gt;
&lt;LI&gt;bind the generated runtime env to a signature of the operator inputs so a new tenant, subscription, or prefix does not inherit stale values from an old run (see the sketch below)&lt;/LI&gt;
&lt;LI&gt;prefer early preflight failures over deep runtime failures&lt;/LI&gt;
&lt;/UL&gt;
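&lt;P&gt;One way to implement the input-signature binding, sketched in bash (file names are hypothetical):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hash the operator-owned inputs; regenerate the runtime env when they change.
INPUT_SIG=$(sha256sum operator.env | cut -d' ' -f1)
if ! grep -q "^INPUT_SIG=$INPUT_SIG$" runtime.env 2&amp;gt;/dev/null; then
  echo "Operator inputs changed (or first run); regenerating runtime env" &amp;gt;&amp;amp;2
  # ...regenerate runtime.env here, then record the new signature...
  echo "INPUT_SIG=$INPUT_SIG" &amp;gt;&amp;gt; runtime.env
fi&lt;/LI-CODE&gt;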
&lt;H2&gt;Reference Implementation&lt;/H2&gt;
&lt;P&gt;A concrete implementation of this pattern is available here:&lt;/P&gt;
&lt;P&gt;👉 &lt;A href="https://github.com/james-tn/dbx-mcp-copilot" target="_blank"&gt;dbx-mcp-copilot reference repo&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;This repository demonstrates the full end-to-end application of the architecture described in this guide, including:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A &lt;STRONG&gt;stateful agentic service&lt;/STRONG&gt; deployed on Azure Container Apps&lt;/LI&gt;
&lt;LI&gt;A &lt;STRONG&gt;thin M365 Gateway (wrapper)&lt;/STRONG&gt; that exposes the service as a Custom Engine agent&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Chained OBO flows&lt;/STRONG&gt; enabling delegated access to downstream systems (e.g., Databricks)&lt;/LI&gt;
&lt;LI&gt;Integration with the &lt;STRONG&gt;Microsoft Agent Framework&lt;/STRONG&gt; for orchestration&lt;/LI&gt;
&lt;LI&gt;Complete &lt;STRONG&gt;operator runbook, CI/CD pipelines, and app packaging&lt;/STRONG&gt; for M365 deployment &lt;A href="https://github.com/james-tn/dbx-mcp-copilot/blob/main/README.md" target="_blank"&gt;[github.com]&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The current implementation is centered on a &lt;STRONG&gt;Daily Account Planner MVP&lt;/STRONG&gt;, which illustrates how a real-world agent can:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Maintain session state and multi-turn reasoning&lt;/LI&gt;
&lt;LI&gt;Access enterprise data using user-delegated tokens&lt;/LI&gt;
&lt;LI&gt;Surface through Microsoft 365 Copilot without rewriting the agent into Copilot-native constructs &lt;A href="https://github.com/james-tn/dbx-mcp-copilot/blob/main/README.md" target="_blank"&gt;[github.com]&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Why this repo is useful&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Concrete mapping of concepts → code&lt;/STRONG&gt;&lt;BR /&gt;Every component in this guide (Gateway, Service, OBO chain) is implemented with working infrastructure and scripts.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Reusable gateway pattern&lt;/STRONG&gt;&lt;BR /&gt;The M365 wrapper in the repo is intentionally thin and reusable — aligning with the pattern described in this document and designed to be adapted to other agentic services. &lt;A href="https://github.com/james-tn/dbx-mcp-copilot/blob/main/README.md" target="_blank"&gt;[github.com]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Operator-ready deployment&lt;/STRONG&gt;&lt;BR /&gt;Includes end-to-end bootstrap scripts, environment modeling, and CI/CD flows for repeatable deployments in real tenants.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 24 Apr 2026 13:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/deploying-an-agentic-service-to-microsoft-365-copilot-with/ba-p/4514197</guid>
      <dc:creator>JamesN</dc:creator>
      <dc:date>2026-04-24T13:00:00Z</dc:date>
    </item>
    <item>
      <title>GPT Capability in Understanding Coordinates: How GPT-5.4 Transforms Spatial Precision</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/gpt-capability-in-understanding-coordinates-how-gpt-5-4/ba-p/4506726</link>
      <description>
&lt;DIV class="page lia-align-left"&gt;
&lt;DIV class="prose"&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;Why I Ran This Experiment&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;This work started not as a benchmarking exercise, but as a practical problem: I needed to automatically extract panel regions from PDF-format &lt;STRONG&gt;electrical Single-Line Diagram (SLD) drawings&lt;/STRONG&gt; using OpenAI models. All experiments were conducted with OpenAI models in &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/concepts/foundry-models-overview" target="_blank" rel="noopener"&gt;Microsoft Foundry&lt;/A&gt;, Microsoft's unified platform for building generative AI applications.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;The downstream goal was a pipeline that combines a GPT model with Azure Document Intelligence to generate Bills of Materials (BOMs) — a project I wrote about separately in &lt;EM&gt;&lt;A style="color: var(--accent);" href="#community--1-" target="_blank" rel="noopener"&gt;Extracting BOMs from Electrical Drawings with AI: Azure OpenAI GPT-5 + Azure Document Intelligence Pipeline&lt;/A&gt;&lt;/EM&gt;.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;Before building that pipeline, I needed a clear-eyed answer to a deceptively simple question: &lt;STRONG&gt;how well can GPT actually understand and return pixel-level coordinates from an image?&lt;/STRONG&gt; If the model can't reliably locate a panel bounding box, the rest of the pipeline doesn't matter.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;When I first ran these tests against GPT-5.2, the results were mixed — good enough to be promising, but inconsistent enough to leave clear room for improvement. I tried many workarounds: feeding image dimensions explicitly, overlaying coordinate grids, enabling extended reasoning, and building iterative self-correction loops. Each helped, but required deliberate engineering effort. Then GPT-5.4 was released. Re-running the same benchmark revealed that &lt;STRONG&gt;most of those workarounds were no longer necessary&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;STRONG&gt;Context:&lt;/STRONG&gt; All experiments use a fixed CAD-style test image (847 × 783 px) with a known ground-truth bounding box at &lt;CODE&gt;[135, 165, 687, 619]&lt;/CODE&gt;. Accuracy is measured by Intersection over Union (IoU) — a score of 1.0 is a perfect match. Every test was run &lt;STRONG&gt;5 times&lt;/STRONG&gt; and averaged.&lt;/BLOCKQUOTE&gt;
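&lt;P&gt;For reference, IoU over [x1, y1, x2, y2] boxes is intersection area divided by union area; a minimal sketch:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def iou(a, b):
    """Intersection over Union for boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A perfect prediction of the ground-truth box scores 1.0:
# iou([135, 165, 687, 619], [135, 165, 687, 619]) == 1.0&lt;/LI-CODE&gt;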
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 1 — The clean electrical SLD drawing (847×783 px) used as the base test image for all coordinate experiments.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;The Experiment Design&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;I designed experiments across two axes: &lt;EM&gt;prompt strategy&lt;/EM&gt; (how spatial information is presented to the model) and &lt;EM&gt;reasoning mode&lt;/EM&gt; (standard vs. extended reasoning). Each prompt strategy was tested on both GPT-5.2 and GPT-5.4, each under two reasoning modes (None vs. High), producing &lt;STRONG&gt;four conditions&lt;/STRONG&gt; per test.&lt;/P&gt;
&lt;BR /&gt;
&lt;H3&gt;Single-Shot Strategies (Tests 1–5)&lt;/H3&gt;
&lt;P&gt;These tests have &lt;STRONG&gt;no iterative validation loop&lt;/STRONG&gt; — the model gets one prompt and returns its answer. Each test was run &lt;STRONG&gt;5 times&lt;/STRONG&gt; and the results averaged, so the scores reflect consistency, not a single lucky attempt. The differences between tests lie in how spatial information is framed in the prompt.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Test 1&lt;/STRONG&gt; is a simple sanity check: can the model understand percentage-based coordinates at all? The model receives the clean image (no overlay) and is asked: &lt;EM&gt;"return the pixel coordinate at 30% width, 50% height."&lt;/EM&gt; The expected answer is &lt;STRONG&gt;(254, 392)&lt;/STRONG&gt; (0.30 × 847 ≈ 254, 0.50 × 783 ≈ 392).&lt;/P&gt;
&lt;P&gt;GPT-5.2 gets the X coordinate roughly right (~254–260), but the &lt;STRONG&gt;Y coordinate scatters wildly&lt;/STRONG&gt; — predictions range from 260 to 322, consistently &lt;STRONG&gt;100+ pixels above&lt;/STRONG&gt; the correct position. GPT-5.4 returns &lt;STRONG&gt;(254, 392) on every single run&lt;/STRONG&gt;, essentially pixel-perfect.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;&lt;EM&gt;Figure 2 — Test 1 results: GPT-5.2 (left) scatters predictions 100+ pixels above the expected point (red star). GPT-5.4 (right) nails it exactly on all 5 runs.&lt;/EM&gt;&lt;/P&gt;
&lt;DIV class="prose lia-align-center"&gt;
&lt;P class="lia-align-left"&gt;Even on this simple sanity check, the gap is stark: GPT-5.4 is pixel-perfect from the start, while GPT-5.2 shows a clear Y-axis bias. But a single-point test doesn't tell us how well the models handle real spatial tasks. The next question: &lt;STRONG&gt;can they detect a full bounding box?&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;Tests 2–5: Bounding Box Detection with Increasing Prompt Richness&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;&lt;STRONG&gt;Tests 2–5&lt;/STRONG&gt; move to the real task: &lt;STRONG&gt;detecting a bounding box&lt;/STRONG&gt; drawn on the image. Each test sends a different version of the same base image, with progressively richer spatial context in the prompt:&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&lt;EM&gt;Table 1 — Single-shot test descriptions: prompt strategy and input type for Tests 1–5.&lt;/EM&gt;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&lt;EM&gt;Figure 3 — Input images for Tests 2–5, from simplest (orange bbox only) to most structured (numbered grid).&lt;/EM&gt;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;Feedback Loop Strategies (Tests 6A–7B)&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;These tests add an &lt;STRONG&gt;iterative validation loop&lt;/STRONG&gt;: the model's predicted bounding box is overlaid on the image and sent back for self-correction — up to 5 iterations (early stop at IoU ≥ 0.99). All feedback tests share the same two-phase structure: an &lt;STRONG&gt;init step&lt;/STRONG&gt; (first prediction) and a &lt;STRONG&gt;validation loop&lt;/STRONG&gt; (iterative correction).&lt;/P&gt;
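&lt;P&gt;In pseudocode, the shared structure looks like this (call_model and overlay_box are hypothetical helpers):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Two-phase loop: one init prediction, then up to 5 correction rounds.
box = call_model(image, init_prompt)                   # init step
for _ in range(5):                                     # validation loop
    if iou(box, ground_truth) &amp;gt;= 0.99:                 # early stop
        break
    feedback = overlay_box(image, box)                 # draw current prediction
    box = call_model(feedback, correction_prompt)      # ask the model to correct&lt;/LI-CODE&gt;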
&lt;P class="lia-align-left"&gt;All feedback tests use the same two images (init + validation overlay), but differ in &lt;EM&gt;prompt strategy&lt;/EM&gt; and &lt;EM&gt;color assignment&lt;/EM&gt;. Image-wise, they fall into two groups:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Group A — Orange GT (Tests 6A, 6C, 7A)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&lt;EM&gt;Figure 4a — Feedback loop input images: orange GT box with blue prediction.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Group B — Color Bias / Blue GT (Tests 6B, 6D, 7B)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-clear-both"&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img&gt;Figure 4b — Feedback loop input images. &lt;STRONG&gt;Group B&lt;/STRONG&gt; (bottom): colors swapped to test color-role priors.&lt;/img&gt;
&lt;DIV class="prose lia-align-center"&gt;&lt;BR /&gt;
&lt;BLOCKQUOTE class="lia-align-left"&gt;&lt;STRONG&gt;What differs between tests in the same group:&lt;/STRONG&gt; The images are identical, but the &lt;EM&gt;prompt&lt;/EM&gt; changes. 6A/6B use holistic comparison ("compare and correct"). 6C/6D additionally send the full history of past predictions as multi-image input. 7A/7B ask for per-edge directional judgments ("move left/right/up/down/none" for each edge independently).&lt;/BLOCKQUOTE&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;Results&lt;/H2&gt;
&lt;/DIV&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;1. Model version is the single biggest factor&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;Across every test, GPT-5.4 dramatically outperforms GPT-5.2. The gap is not incremental — it's the difference between a bounding box that roughly overlaps the target and one that is essentially pixel-perfect. GPT-5.4 achieved an IoU of 0.99 or above on its very first attempt on tests where GPT-5.2 had only scored between 0.76 and 0.88.&lt;/P&gt;
&lt;img&gt;Figure 5 — Single-shot IoU across all 4 conditions. &lt;BR /&gt;GPT-5.4 (green bars) consistently hits ≥0.99 regardless of prompt strategy or reasoning mode. &lt;BR /&gt;GPT-5.2 (blue bars) ranges from 0.76 to 0.92.&lt;/img&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="page lia-align-left"&gt;
&lt;DIV class="prose lia-align-center"&gt;
&lt;DIV class="pull-quote lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;2. GPT-5.2 is inconsistent; GPT-5.4 locks in&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;Raw averages only tell half the story. GPT-5.2 is &lt;STRONG&gt;unpredictable&lt;/STRONG&gt;: on the exact same test with the exact same prompt and image, results fluctuate wildly between runs. The standard deviation on Test 2 is &lt;STRONG&gt;±0.084&lt;/STRONG&gt; — meaning a single run could land anywhere from 0.66 to 0.88. GPT-5.4 stays within &lt;STRONG&gt;±0.003&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;The scatter plots below make this viscerally clear. Each dot is one API call — notice how GPT-5.2 dots spray across the IoU range while GPT-5.4 dots stack on top of each other:&lt;/P&gt;
&lt;img&gt;Figure 6a — GPT-5.2: per-run IoU across Tests 2–5. &lt;BR /&gt;Wide scatter on simpler prompts (Test 2: 0.66–0.88); reasoning mode (orange) provides a lift that shrinks with richer prompts&lt;BR /&gt;(Δmean shown below each panel).&lt;/img&gt;&lt;img&gt;Figure 6b — GPT-5.4: same view — all dots cluster at 0.97–1.0. No meaningful variance, no reasoning benefit.&lt;/img&gt;&lt;/DIV&gt;
&lt;DIV class="blog-figure lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="prose lia-align-center"&gt;
&lt;BLOCKQUOTE class="lia-align-left"&gt;&lt;STRONG&gt;Production implication:&lt;/STRONG&gt; With GPT-5.2, you couldn't rely on a single inference call — building a reliable pipeline would require multiple calls and majority voting, multiplying latency and cost. With GPT-5.4, a single call is sufficient.&lt;/BLOCKQUOTE&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;3. Reasoning mode reduced variance for GPT-5.2; GPT-5.4 didn't need it&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;For GPT-5.2, enabling extended reasoning (&lt;CODE&gt;reasoning: high&lt;/CODE&gt;) provided a meaningful boost — especially when the prompt was sparse. On Test 2 (bare image, no spatial context), reasoning added &lt;STRONG&gt;+0.076 IoU&lt;/STRONG&gt; and visibly tightened the spread of results across runs. As prompts got richer, the benefit shrank: with a grid overlay (Test 4), reasoning added only +0.007. In other words, reasoning mode acted as a compensating mechanism — filling in the gaps when the prompt alone didn't provide enough spatial scaffolding.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;For GPT-5.4, reasoning mode &lt;STRONG&gt;offered no additional benefit&lt;/STRONG&gt; on this class of task. The base model already achieves 0.99+ IoU, so there was simply no room for improvement. In a few cases the reasoning runs showed marginal regressions (−0.005 to −0.015), likely within noise. The takeaway isn't that reasoning mode is harmful in general, but rather that &lt;STRONG&gt;a spatial-coordinate task at this complexity level doesn't require it&lt;/STRONG&gt; when the underlying model already has strong coordinate understanding.&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="blog-figure lia-align-left"&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img&gt;Figure 7 — Effect of reasoning mode: GPT-5.2 gains +0.04–0.08 IoU from reasoning (blue bars), largest on sparse prompts. GPT-5.4 shows no meaningful gain (green bars near zero).&lt;/img&gt;&lt;/DIV&gt;
&lt;DIV class="prose"&gt;
&lt;DIV class="blog-figure lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;4. Richer prompts close the gap (but only for GPT-5.2)&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;For GPT-5.2, providing more spatial context in the prompt made a big difference: from 0.765 (Test 2, no info) to 0.910 (Test 4, grid overlay) — a &lt;STRONG&gt;+0.145 IoU gain&lt;/STRONG&gt; just from adding visual reference rulers to the image. Telling the model the image dimensions (Test 3) was a "free win" that cost nothing.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;For GPT-5.4, all prompt variants produce essentially the same result (0.989–0.997). The model already understands spatial coordinates well enough that extra scaffolding adds no value.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="blog-figure lia-align-center"&gt;&lt;img&gt;Figure 7 — Prompt information richness: GPT-5.2 climbs steeply as more spatial context is provided. GPT-5.4 is flat at ≥0.99 regardless.&lt;/img&gt;&lt;/DIV&gt;
&lt;DIV class="page lia-align-left"&gt;
&lt;DIV class="prose"&gt;
&lt;DIV class="blog-figure lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="blog-figure lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;&lt;STRONG&gt;If you're still on GPT-5.2:&lt;/STRONG&gt; Always inject image dimensions into the prompt (free). Use grid overlays for the biggest single-shot gain (+0.145 IoU). With GPT-5.4, none of this is needed.&lt;/BLOCKQUOTE&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;5. Validation loops: essential for GPT-5.2, optional for GPT-5.4&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;The feedback loop tests (6A–7B) showed that iterative self-correction genuinely helped GPT-5.2 improve from its initial prediction. For example, in Test 7A (directional feedback), GPT-5.2 improved from an init IoU of &lt;STRONG&gt;0.926 to a best of 0.969&lt;/STRONG&gt; over 5 iterations.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;For GPT-5.4, &lt;STRONG&gt;every single run hit IoU ≥ 0.99 on iteration 1 and early-stopped immediately&lt;/STRONG&gt;. There was nothing left to correct. The validation loop infrastructure — overlay rendering, multi-turn prompting, iteration logic — becomes dead code you can remove from your pipeline.&lt;/P&gt;
&lt;DIV class="blog-figure lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="blog-figure lia-align-center"&gt;&lt;img&gt;Figure 8 — Validation loop effect (No Reasoning): GPT-5.2 init (light blue) improves to best (dark blue) over iterations. &lt;BR /&gt;GPT-5.4 (green) starts at ≥0.99 and early-stops at iteration 1.&lt;/img&gt;&lt;/DIV&gt;
&lt;DIV class="blog-figure lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;6. Prompt instruction matters: holistic vs directional feedback&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;Comparing 6A/6B (holistic: "compare the two boxes and correct") with 7A/7B (directional: "for each edge, decide which direction to move"), the directional approach consistently reached higher best IoU for GPT-5.2. The per-edge structured output forced the model to reason about each boundary independently rather than making a holistic guess.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;Separately, the &lt;STRONG&gt;color bias tests&lt;/STRONG&gt; (6B, 7B — GT drawn in blue instead of orange) revealed that swapping GT/prediction colors drops the initial accuracy significantly. In 6A (orange GT) the init IoU was &lt;STRONG&gt;0.937&lt;/STRONG&gt;, but in 6B (blue GT) it dropped to &lt;STRONG&gt;0.850&lt;/STRONG&gt;. This suggests GPT models have learned color-role priors — orange is "expected" as the ground truth color.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;However, the validation loop largely recovers this gap: after 5 iterations, 6A and 6B converge to similar best IoU (~0.96). The directional variants (7A, 7B) show the same pattern but converge faster.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="blog-figure lia-align-center"&gt;&lt;img&gt;Figure 9 — Color bias effect on GPT-5.2 (No Reasoning). &lt;BR /&gt;Left: initial accuracy drops when GT is drawn in blue. &lt;BR /&gt;Right: after the validation loop, the gap closes. Directional feedback (7A/7B) shows the same pattern.&lt;/img&gt;&lt;/DIV&gt;
&lt;DIV class="prose"&gt;
&lt;DIV class="blog-figure lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;BLOCKQUOTE class="lia-align-left"&gt;&lt;STRONG&gt;For GPT-5.4:&lt;/STRONG&gt; Color bias has no measurable effect. All variants (6A/6B/7A/7B) hit 0.994–0.998 IoU on iteration 1 regardless of color assignment.&lt;/BLOCKQUOTE&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;Summary: What Changed from GPT-5.2 to GPT-5.4&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;The story of this benchmark is really about &lt;STRONG&gt;engineering workarounds that became unnecessary&lt;/STRONG&gt;. Here's what we built for GPT-5.2 and whether you still need it:&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;&lt;STRONG&gt;Grid overlays &amp;amp; image dimensions in prompt&lt;/STRONG&gt; — Gave +0.05–0.15 IoU for GPT-5.2. &lt;EM&gt;Not needed for GPT-5.4&lt;/EM&gt; (already ≥0.99 without it).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Extended reasoning mode&lt;/STRONG&gt; — Gave +0.04–0.08 IoU for GPT-5.2. &lt;EM&gt;No benefit for GPT-5.4 on this task&lt;/EM&gt; (already at ceiling without it).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Validation loops (iterative self-correction)&lt;/STRONG&gt; — Improved GPT-5.2 by +0.02–0.10 IoU over 5 iterations. &lt;EM&gt;Unnecessary for GPT-5.4&lt;/EM&gt; (early-stops at iteration 1).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Multiple runs &amp;amp; voting&lt;/STRONG&gt; — Required for GPT-5.2 due to ±0.08 variance. &lt;EM&gt;Not needed for GPT-5.4&lt;/EM&gt; (±0.003 variance, single call sufficient).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Color convention management&lt;/STRONG&gt; — GPT-5.2 showed color bias (−0.09 IoU when colors swapped). &lt;EM&gt;No effect on GPT-5.4&lt;/EM&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="pull-quote lia-align-left"&gt;GPT-5.4 doesn't just perform better — it makes entire categories of pipeline engineering unnecessary. For clean, CAD-style images like the ones tested here, GPT-5.4 dramatically reduces prompt engineering overhead: grid overlays, image dimension injection, reasoning mode, and validation loops — all of which required deliberate effort with GPT-5.2 — are no longer necessary. This translates directly to simpler pipelines, lower latency, and lower cost. That said, for more complex scenarios — multiple overlapping panels, cluttered backgrounds, or ambiguous region boundaries — iterative validation loops could still prove valuable, and we plan to explore this in future work.&lt;/DIV&gt;
&lt;DIV class="pull-quote lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;This benchmark started as a sanity check and turned into a clear signal: &lt;STRONG&gt;GPT-5.4 represents a genuine leap in spatial coordinate understanding&lt;/STRONG&gt;, not just a marginal iteration. The gap between 0.765 and 0.997 IoU on an identical task is the difference between a prototype experiment and a production-ready component.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;Try It Yourself&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;Ready to explore GPT-5.4's spatial precision capabilities? Here are ways to get started:&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;&lt;A style="color: var(--accent);" href="https://github.com/" target="_blank" rel="noopener nofollow noreferrer"&gt;Sample notebooks for bounding box extraction&lt;/A&gt; test :&amp;nbsp; &lt;A class="lia-external-url" href="https://github.com/jihys/cad-image-understanding" target="_blank" rel="noopener"&gt;github&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Read the companion post: &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/extracting-boms-from-electrical-drawings-with-ai-azure-openai-gpt-5--azure-docum/4506891" target="_blank" rel="noopener" data-lia-auto-title="Extracting BOMs from Electrical Drawings with AI: Azure OpenAI GPT-5 + Azure Document Intelligence" data-lia-auto-title-active="0"&gt;&lt;EM&gt;Extracting BOMs from Electrical Drawings with AI: Azure OpenAI GPT-5 + Azure Document Intelligence&lt;/EM&gt;&lt;/A&gt; — See how this benchmark informed a production pipeline&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="page lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;!-- /page --&gt;</description>
      <pubDate>Thu, 23 Apr 2026 19:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/gpt-capability-in-understanding-coordinates-how-gpt-5-4/ba-p/4506726</guid>
      <dc:creator>jihyeseo</dc:creator>
      <dc:date>2026-04-23T19:00:00Z</dc:date>
    </item>
    <item>
      <title>Automate Prior Authorization with AI Agents - Now Available as a Foundry Template</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/automate-prior-authorization-with-ai-agents-now-available-as-a/ba-p/4513432</link>
      <description>&lt;P&gt;&lt;STRONG&gt;By Amit Mukherjee&lt;/STRONG&gt; · Principal Solutions Engineer, Microsoft Health &amp;amp; Life Sciences&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Lindsey Craft-Goins&lt;/STRONG&gt; · Technology Leader - Cloud &amp;amp; AI Platforms, Health &amp;amp; Life Sciences&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Joel Borellis&lt;/STRONG&gt; · Director Solutions Engineering - Cloud &amp;amp; AI Platforms, Health &amp;amp; Life Sciences&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Prior authorization (PA) is one of the most expensive bottlenecks in U.S. healthcare. Physicians complete an average of &lt;STRONG&gt;39 PA requests per week&lt;/STRONG&gt;, spending roughly &lt;STRONG&gt;13 hours of physician-and-staff time&lt;/STRONG&gt; on PA-related work (&lt;A href="https://www.ama-assn.org/system/files/prior-authorization-survey.pdf" target="_blank"&gt;AMA 2024 Prior Authorization Physician Survey&lt;/A&gt;). Turnaround averages 5–14 business days, and PA alone accounts for an estimated &lt;STRONG&gt;$35 billion in annual administrative spending&lt;/STRONG&gt; (Sahni et al., &lt;A href="https://academic.oup.com/healthaffairsscholar/article/2/9/qxae096/7727862" target="_blank"&gt;&lt;EM&gt;Health Affairs Scholar&lt;/EM&gt;&lt;/A&gt;, 2024).&lt;/P&gt;
&lt;P&gt;The regulatory clock is now ticking. &lt;A href="https://www.cms.gov/newsroom/fact-sheets/cms-interoperability-prior-authorization-final-rule-cms-0057-f" target="_blank"&gt;&lt;STRONG&gt;CMS-0057-F&lt;/STRONG&gt;&lt;/A&gt; mandates electronic PA with 72-hour urgent response starting in 2026. Forty-nine states plus DC already have PA laws on the books, and at least half of all U.S. state legislatures introduced new PA reform bills this year, including laws specifically targeting AI use in PA decisions (&lt;A href="https://kffhealthnews.org/morning-breakout/ai-streamlines-prior-authorizations-and-billing-but-raises-costs-report/" target="_blank"&gt;KFF Health News&lt;/A&gt;, April 2026).&lt;/P&gt;
&lt;P&gt;Today we’re making the &lt;STRONG&gt;Prior Authorization Multi-Agent Solution Accelerator&lt;/STRONG&gt; available as a &lt;STRONG&gt;Microsoft Foundry template&lt;/STRONG&gt;. Health plan payers can deploy a working, four-agent PA review pipeline to Azure using the Azure Developer CLI (“azd”) with a single command in supported environments, then customize it to their policies, workflows, and EHR environment.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Try it now: &lt;/STRONG&gt;Find the template in the Foundry template gallery, or clone directly from &lt;A href="https://github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator" target="_blank"&gt;&lt;STRONG&gt;github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator&lt;/STRONG&gt;&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H1&gt;What the template delivers&lt;/H1&gt;
&lt;P&gt;The accelerator deploys &lt;STRONG&gt;four specialist Foundry &lt;/STRONG&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/hosted-agents" target="_blank"&gt;&lt;STRONG&gt;hosted agents&lt;/STRONG&gt;&lt;/A&gt; (Compliance, Clinical Reviewer, Coverage, and Synthesis), each independently containerized and managed by Foundry. In internal testing with synthetic demo cases, the pipeline completed the full review workflow, from intake to recommendation, in &lt;STRONG&gt;under 5 minutes per case&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Agent&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Role&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Key capability&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Compliance&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Documentation check&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;10-item checklist with blocking/non-blocking flags&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Clinical Reviewer&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Clinical evidence&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;ICD-10 validation, PubMed + ClinicalTrials.gov search&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Coverage&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Policy matching&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;CMS NCD/LCD lookup, per-criterion MET/NOT_MET mapping&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Synthesis&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Decision rubric&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;3-gate APPROVE/PEND with weighted confidence scoring&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Compliance and Clinical run &lt;STRONG&gt;in parallel&lt;/STRONG&gt;. Coverage runs after clinical findings are ready. Synthesis evaluates all three outputs through a three-gate rubric. The result is a structured recommendation with per-criterion confidence scores and a full audit trail, not a black-box answer.&lt;/P&gt;
&lt;H1&gt;Solution architecture&lt;/H1&gt;
&lt;P&gt;The accelerator runs entirely on Azure. The frontend and backend deploy as Azure Container Apps. The four specialist agents are hosted by Microsoft Foundry. Real-time healthcare data flows through third-party MCP servers.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 1: Azure solution architecture&lt;/EM&gt;&lt;/P&gt;
&lt;H1&gt;How the pipeline works&lt;/H1&gt;
&lt;P&gt;The four agents execute in a structured parallel-then-sequential pipeline. Compliance and Clinical run simultaneously in Phase 1. Coverage runs after clinical findings are ready. The Synthesis agent applies a three-gate decision rubric over all prior outputs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-center"&gt;&lt;EM&gt;Figure 2: Agentic architecture, hosted agent pipeline&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Compliance and Clinical run in parallel via asyncio.gather, since neither depends on the other. Coverage runs sequentially after Clinical because it needs the structured clinical profile for criterion mapping. Synthesis evaluates all three outputs through a three-gate rubric (Provider, Codes, Medical Necessity) with weighted confidence scoring: &lt;STRONG&gt;40% coverage criteria + 30% clinical extraction + 20% compliance + 10% policy match&lt;/STRONG&gt;. The total pipeline time is bound by the slowest parallel agent plus the sequential agents, not the sum. In internal testing with synthetic demo cases, this architecture indicated materially reduced processing time compared to sequential manual workflows.&lt;/P&gt;
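&lt;P&gt;In code, the orchestration shape is simple. A minimal sketch follows; &lt;CODE&gt;run_agent&lt;/CODE&gt; is a placeholder for the accelerator's HTTP dispatch to a hosted agent, and the real code is in the repository.&lt;/P&gt;
&lt;PRE&gt;# Sketch of the parallel-then-sequential pipeline. run_agent() stands in for
# the accelerator's HTTP dispatch to a Foundry hosted agent.
import asyncio

async def run_pipeline(case):
    # Phase 1: Compliance and Clinical are independent -- run them concurrently.
    compliance, clinical = await asyncio.gather(
        run_agent("compliance", case),
        run_agent("clinical", case),
    )
    # Phase 2: Coverage needs the structured clinical profile.
    coverage = await run_agent("coverage", case, clinical=clinical)
    # Phase 3: Synthesis applies the three-gate rubric over all prior outputs.
    return await run_agent("synthesis", case, compliance=compliance,
                           clinical=clinical, coverage=coverage)

def weighted_confidence(s):
    """The post's weighting: 40% coverage, 30% clinical, 20% compliance, 10% policy."""
    return (0.40 * s["coverage"] + 0.30 * s["clinical"]
            + 0.20 * s["compliance"] + 0.10 * s["policy"])
&lt;/PRE&gt;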
&lt;H1&gt;Under the hood&lt;/H1&gt;
&lt;P&gt;For the architect in the room, here are four design decisions worth knowing about:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry hosted agents: &lt;/STRONG&gt;Each agent is independently containerized, versioned, and managed by Foundry’s runtime. The FastAPI backend is a pure HTTP dispatcher. All reasoning happens inside the agent containers, and there are no code changes between local (Docker Compose) and production (Foundry); the environment variable is the only switch.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Structured output: &lt;/STRONG&gt;Every agent uses MAF’s response_format enforcement to produce typed Pydantic schemas at the token level. No JSON parsing, no malformed fences, no free-form text. The orchestrator receives typed Python objects; the frontend receives a stable API contract (see the sketch after this list).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Keyless security: &lt;/STRONG&gt;DefaultAzureCredential throughout, so no API keys are stored anywhere. Managed Identity handles production; azd tokens handle local development. Role assignments are provisioned automatically by Bicep at deploy time.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Observability: &lt;/STRONG&gt;All agents emit OpenTelemetry traces to Azure Application Insights. The Foundry portal shows per-agent spans correlated by case ID. End-to-end latency, per-agent contribution, and error rates are visible from day one with no additional configuration.&lt;/LI&gt;
&lt;/UL&gt;
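&lt;P&gt;To make the structured-output point concrete, here is roughly what a typed contract looks like. The field names below are invented for illustration; the accelerator's real Pydantic schemas are in the GitHub repository.&lt;/P&gt;
&lt;PRE&gt;# Illustrative typed output contract (field names invented; the real
# schemas live in the accelerator's repository).
from enum import Enum
from pydantic import BaseModel

class Recommendation(str, Enum):
    APPROVE = "APPROVE"
    PEND = "PEND"

class SynthesisOutput(BaseModel):
    recommendation: Recommendation
    confidence: float   # weighted score in [0, 1]
    rationale: str      # per-criterion evidence summary

# With schema-enforced generation, the orchestrator parses the agent's reply
# directly into a typed object -- no free-form JSON handling:
result = SynthesisOutput.model_validate_json(
    '{"recommendation": "PEND", "confidence": 0.82, "rationale": "..."}'
)
print(result.recommendation)  # Recommendation.PEND
&lt;/PRE&gt;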
&lt;P&gt;For the full architecture documentation, agent specifications, Pydantic schemas, and extension guides, see the &lt;A href="https://github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator" target="_blank"&gt;&lt;STRONG&gt;GitHub repository&lt;/STRONG&gt;&lt;/A&gt;.&lt;/P&gt;
&lt;H1&gt;Why this matters now&lt;/H1&gt;
&lt;H2&gt;Human-in-the-loop by design&lt;/H2&gt;
&lt;P&gt;The system runs in &lt;STRONG&gt;LENIENT mode&lt;/STRONG&gt; by default: it produces only APPROVE or PEND and is &lt;STRONG&gt;not designed to produce automated DENY outcomes in its default configuration&lt;/STRONG&gt;. Every recommendation requires a clinician to Accept or Override with documented rationale before the decision is finalized. Override records flow to the audit PDF, notification letters, and downstream systems. This directly addresses the emerging wave of state legislation governing AI use in PA decisions.&lt;/P&gt;
&lt;H2&gt;Domain experts own the rules&lt;/H2&gt;
&lt;P&gt;Agent behavior is defined in &lt;STRONG&gt;markdown skill files&lt;/STRONG&gt;, not Python code. When CMS updates a coverage determination or a plan changes its commercial policy, a clinician or compliance officer edits a text file and redeploys. No engineering PR required.&lt;/P&gt;
&lt;H2&gt;Real-time healthcare data via MCP&lt;/H2&gt;
&lt;P&gt;Agents connect to &lt;STRONG&gt;five MCP servers&lt;/STRONG&gt; for real-time data: ICD-10 codes, NPI Registry, CMS Coverage policies, PubMed, and ClinicalTrials.gov. This incorporates real‑time clinical reference data sources to inform agent recommendations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;Third-party MCP servers are included for demonstration with synthetic data only. Their inclusion does not constitute an endorsement by Microsoft. See the GitHub repository for production migration guidance.&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;Audit-ready from day one&lt;/H2&gt;
&lt;P&gt;Every case generates an &lt;STRONG&gt;8-section audit justification PDF&lt;/STRONG&gt; with per-criterion evidence, data source attribution, timestamps, and confidence breakdowns. Clinician overrides, when present, are appended as Section 9. Notification letters (approval and pend) are generated automatically. These artifacts are designed to support CMS-0057-F documentation requirements.&lt;/P&gt;
&lt;H1&gt;Deploy in under 15 minutes&lt;/H1&gt;
&lt;P&gt;From the Foundry template gallery or from the command line:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;git clone https://github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator
cd Prior-Authorization-Multi-Agent-Solution-Accelerator
azd up&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That single command provisions Foundry, Azure Container Registry, and Container Apps; builds all Docker images; registers the four agents; and runs health checks. The demo is live with a synthetic sample case as soon as deployment completes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;What’s included&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;What you customize&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;4 Foundry hosted agents&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Payer-specific coverage policies&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;FastAPI orchestrator + Next.js frontend&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;EHR/FHIR integration for clinical notes&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;5 MCP healthcare data connections&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Self-hosted MCP servers for production PHI&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Audit PDF + notification letter generation&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Authentication (Microsoft Entra ID)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Full Bicep infrastructure-as-code&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Persistent storage (Cosmos DB / PostgreSQL)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;OpenTelemetry + App Insights observability&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Additional agents (Pharmacy, Financial)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H1&gt;Built on&lt;/H1&gt;
&lt;P&gt;&lt;STRONG&gt;Microsoft Foundry&lt;/STRONG&gt; + &lt;STRONG&gt;Foundry hosted agents&lt;/STRONG&gt; · &lt;STRONG&gt;Microsoft Agent Framework (MAF)&lt;/STRONG&gt; · Azure OpenAI gpt-5.4 · Azure Container Apps · Azure Developer CLI + Bicep · OpenTelemetry + Azure Application Insights · DefaultAzureCredential (keyless, no secrets)&lt;/P&gt;
&lt;P&gt;Full architecture documentation, agent specifications, and extension guides are in the &lt;A href="https://github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator" target="_blank"&gt;&lt;STRONG&gt;GitHub repository&lt;/STRONG&gt;&lt;/A&gt;.&lt;/P&gt;
&lt;H1&gt;Get started&lt;/H1&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry template gallery: &lt;/STRONG&gt;Search “AI-Powered Prior Authorization for Healthcare” in the Foundry template section&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;GitHub: &lt;/STRONG&gt;&lt;A href="https://github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator" target="_blank"&gt;github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Disclaimers&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Not a medical device.&lt;/STRONG&gt; This solution accelerator is not a medical device, is not FDA-cleared, and is not intended for autonomous clinical decision-making. All AI recommendations require qualified clinical review before any authorization decision is finalized.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Not production-ready software.&lt;/STRONG&gt; This is an open-source reference architecture (MIT License), not a supported Microsoft product. Customers are solely responsible for testing, validation, regulatory compliance, security hardening, and production deployment.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Performance figures are illustrative.&lt;/STRONG&gt; Metrics cited (including processing time reductions) are based on internal testing with synthetic demo data. Actual results will vary based on case complexity, infrastructure, and configuration.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Third-party services included for demonstration only; not endorsed by Microsoft&lt;/STRONG&gt;. Customers should evaluate providers against their compliance and data residency requirements.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The demo uses synthetic data only.&lt;/STRONG&gt; Customers deploying real patient data are responsible for HIPAA compliance and establishing appropriate Business Associate Agreements.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;This accelerator is intended to help customers&lt;/STRONG&gt; align documentation workflows with CMS‑0057‑F requirements but has not been independently validated or certified for regulatory compliance.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 23 Apr 2026 16:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/automate-prior-authorization-with-ai-agents-now-available-as-a/ba-p/4513432</guid>
      <dc:creator>amimukherjee</dc:creator>
      <dc:date>2026-04-23T16:00:00Z</dc:date>
    </item>
    <item>
      <title>Failed to add tool to agent - Preview Feature Required?</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/failed-to-add-tool-to-agent-preview-feature-required/m-p/4514092#M1451</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We’ve recently run into an issue where we’re no longer able to add tools to our Foundry agent. This was previously working without problems in our development environment, but now every attempt results in the following error:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;“Failed to add tool to agent Request failed with status code 403.”&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;After inspecting the request in the browser’s developer console, we noticed an additional message:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;"This operation requires the following opt-in preview feature(s): AgentEndpoints=V1Preview. Include the 'Foundry-Features: AgentEndpoints=V1Preview' header in your request."&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can we opt in for this foundry preview feature? and when was this change introduced?&lt;BR /&gt;We are unsure if the issue is related the the preview feature missing, or some other forbidden issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be very much appreciated.&lt;/P&gt;&lt;P&gt;Kind regards,&lt;/P&gt;&lt;P&gt;Arne&lt;/P&gt;</description>
      <pubDate>Thu, 23 Apr 2026 15:09:40 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/failed-to-add-tool-to-agent-preview-feature-required/m-p/4514092#M1451</guid>
      <dc:creator>ArneVG</dc:creator>
      <dc:date>2026-04-23T15:09:40Z</dc:date>
    </item>
    <item>
      <title>Three tiers of Agentic AI - and when to use none of them</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/three-tiers-of-agentic-ai-and-when-to-use-none-of-them/ba-p/4510377</link>
      <description>&lt;H3&gt;Every enterprise has an AI agent. Almost none of them work in production.&lt;/H3&gt;
&lt;P&gt;Walk into any enterprise technology review right now and you will find the same thing. Pilots running. Demos recorded. Steering committees impressed. And somewhere in the background, a quiet acknowledgment that the thing does not actually work at scale yet.&lt;/P&gt;
&lt;P&gt;OutSystems surveyed nearly 1,900 global IT leaders and found that 96% of organizations are already running AI agents in some capacity. Yet only one in nine has those agents operating in production at scale. The experiments are everywhere. The production systems are not.&lt;/P&gt;
&lt;P&gt;That gap is not a capability problem. The infrastructure has matured. Tool calling is standard across all major models. Frameworks like LangGraph, CrewAI, and Microsoft Agent Framework abstract orchestration logic. Model Context Protocol standardizes how agents access external tools and data sources. Google's Agent-to-Agent protocol, now under Linux Foundation governance with over 50 enterprise technology partners (including Salesforce, SAP, ServiceNow, and Workday), standardizes how agents coordinate with each other. The protocols are in place. The frameworks are production ready.&lt;/P&gt;
&lt;P&gt;The gap is a selection and governance problem.&lt;/P&gt;
&lt;P&gt;Teams are building agents on problems that do not need them. Choosing the wrong tier for the ones that do. And treating governance as a compliance checkbox to add after launch, rather than an architectural input to design in from the start. The same OutSystems research found that 94% of organizations are concerned that AI sprawl is increasing complexity, technical debt, and security risk, and that only 12% have a centralized approach to managing it. Teams are deploying agents the way shadow IT spread through enterprises a decade ago: fast, fragmented, and without a shared definition of what production-ready actually means.&lt;/P&gt;
&lt;P&gt;I've built agentic systems across enterprise clients in logistics, retail, and B2B services. The failures I keep seeing are not technology failures. They are architecture and judgment failures: problems that existed before the first line of code was written, in the conversation where nobody asked the prior question.&lt;/P&gt;
&lt;P&gt;This article is the framework I use before any platform conversation starts.&lt;/P&gt;
&lt;H3&gt;What has genuinely shifted in the agentic landscape&lt;/H3&gt;
&lt;P&gt;Three changes are shaping how enterprise agent architecture should be designed today and they are not incremental improvements on what existed before.&lt;/P&gt;
&lt;P&gt;The first is the move from single agents to multi-agent systems. Databricks' State of AI Agents report, drawing on data from over 20,000 organizations (including more than 60% of the Fortune 500), found that multi-agent workflows on their platform grew 327% in just four months. This is not experimentation. It is production architecture shifting. A single agent handling everything (routing, retrieval, reasoning, execution) is being replaced by specialized agents coordinating through defined interfaces. A financial organization, for example, might run separate agents for intent classification, document retrieval, and compliance checking, each narrow in scope, each connected to the next through a standardized protocol rather than tightly coupled code.&lt;/P&gt;
&lt;P&gt;The second is protocol standardization. MCP handles vertical connectivity: how agents access tools, data sources, and APIs through a typed manifest and standardized invocation pattern. A2A handles horizontal connectivity: how agents discover peer agents, delegate subtasks, and coordinate workflows. Production systems today use both. The practical consequence is that multi-agent architectures can be composed and governed as a platform rather than managed as a collection of one-off integrations.&lt;/P&gt;
&lt;P&gt;The third is governance as the differentiating factor between teams that ship and teams that stall. Databricks found that companies using AI governance tools get over 12 times more AI projects into production compared to those without. The teams running production agents are not running more sophisticated models. They built evaluation pipelines, audit trails, and human oversight gates before scaling, not after the first incident.&lt;/P&gt;
&lt;H3&gt;Tier 1 - Low-code agents: fast delivery with a defined ceiling&lt;/H3&gt;
&lt;P&gt;The low-code tier is more capable than it was eighteen months ago. Copilot Studio, Salesforce Agentforce, and equivalent platforms now support richer connector libraries, better generative orchestration, and more flexible topic models. The ceiling is higher than it was. It is still a ceiling.&lt;/P&gt;
&lt;P&gt;The core pattern remains: a visual topic model drives a platform-managed LLM that classifies intent and routes to named execution branches. Connectors abstract credential management and API surface. A business team — analyst, citizen developer, IT operations — can build, deploy, and iterate without engineering involvement on every change. For bounded conversational problems, this is the fastest path from requirement to production.&lt;/P&gt;
&lt;P&gt;The production reality is documented clearly. Gartner data found that only 5% of Copilot Studio pilots moved to larger-scale deployment. A European telecom with dedicated IT resources and a full Microsoft enterprise agreement spent six months and did not deliver a single production agent. The visual builder works. The path from prototype to production (production-grade integrations, error handling, compliance logging, exception routing) is where most enterprises get stuck, because it requires Power Platform expertise that most business teams do not have.&lt;/P&gt;
&lt;P&gt;The platform ceiling shows up predictably at four points. Async processing: anything beyond a synchronous connector call, including approval chains, document pipelines, or batch operations, cannot be handled natively. Full payload audit logs: platform logs give conversation transcripts and connector summaries, not structured records of every API call and its parameters. Production volume: concurrency limits and message throughput budgets bind faster than planning assumptions suggest. Root cause analysis in production: you cannot inspect the LLM's confidence score or the alternatives it considered, which makes diagnosing misbehavior significantly harder than it should be.&lt;/P&gt;
&lt;P&gt;The correct diagnostic: can this use case be owned end-to-end by a business team, covered by standard connectors, with no latency SLA below three seconds and no payload-level compliance requirement? If yes, low-code is the correct tier, not a compromise. If no on any point, continue.&lt;/P&gt;
&lt;P&gt;If low-code is the right call for your use case: &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/fundamentals-get-started" target="_blank" rel="noopener"&gt;Copilot Studio quickstart&lt;/A&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Tier 2 - Pro-code agents: the architecture the current landscape demands&lt;/H3&gt;
&lt;P&gt;The defining pattern in production pro-code architecture today is multi-agent. Specialized agents per domain, coordinating through MCP for tool access and A2A for peer-to-peer delegation, with a governance layer spanning the entire system.&lt;/P&gt;
&lt;P&gt;What this looks like in practice: a financial organization handling incoming compliance queries runs separate agents for intent classification, document retrieval, and the compliance check itself. None of these agents tries to do all three jobs. Each has a narrow responsibility, a defined input/output contract typed against a JSON Schema, and a clear handoff boundary. The 327% growth in multi-agent workflows reflects production teams discovering that the failure modes of monolithic agents (topic collision, context overflow, degraded classification as scope expands) are solved by specialization, not by making a single agent more capable.&lt;/P&gt;
&lt;P&gt;The discipline that makes multi-agent systems reliable is identical to what makes single-agent systems reliable, just enforced across more boundaries: the LLM layer reasons and coordinates; deterministic tool functions enforce. In a compliance pipeline, no LLM decides whether a document satisfies a regulatory requirement. That evaluation runs in a deterministic tool with a versioned rule set, testable outputs, and an immutable audit log. The LLM orchestrates the sequence. The tool produces the compliance record. Mixing these (letting an LLM evaluate whether a rule passes) collapses the audit trail and introduces probabilistic outputs on questions that have regulatory answers.&lt;/P&gt;
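&lt;P&gt;A minimal sketch of the enforcement side of that split, assuming a toy rule set (rule names and document fields are invented for illustration):&lt;/P&gt;
&lt;PRE&gt;# The enforcement side of the split: deterministic, versioned, unit-testable.
# Rule names and document fields are invented for illustration.
RULESET_VERSION = "2026.04"

RULES = {
    "signature_present": lambda doc: doc.get("signed") is True,
    "icd10_codes_listed": lambda doc: bool(doc.get("icd10_codes")),
}

def evaluate_compliance(doc):
    """Same input always yields the same audit record -- no LLM in this path."""
    results = {name: rule(doc) for name, rule in RULES.items()}
    return {"ruleset_version": RULESET_VERSION,
            "results": results,
            "passed": all(results.values())}

# The LLM orchestrates the sequence; this function produces the compliance
# record that lands in the immutable audit log.
print(evaluate_compliance({"signed": True, "icd10_codes": ["I10"]}))
&lt;/PRE&gt;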
&lt;P&gt;MCP is the tool interface standard today. An MCP server exposes a typed manifest any compliant agent runtime can discover at startup. Tools are versioned, independently deployable, and reusable across agents without bespoke integration code. A2A extends this horizontally: agents advertise capability cards, discover peers, and delegate subtasks through a standardized protocol.&lt;/P&gt;
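&lt;P&gt;For orientation, this is roughly the shape of a single MCP tool advertisement, following the public MCP spec's tools listing (trimmed; the schema field is standard JSON Schema, and the tool name here is invented):&lt;/P&gt;
&lt;PRE&gt;# Rough shape of an MCP tool advertisement (per the public MCP spec's
# tools/list result; trimmed, with an invented tool name for illustration).
TOOL = {
    "name": "get_account_status",
    "description": "Deterministic lookup of account status by ID.",
    "inputSchema": {                       # standard JSON Schema
        "type": "object",
        "properties": {"account_id": {"type": "string"}},
        "required": ["account_id"],
    },
}
&lt;/PRE&gt;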
&lt;P&gt;Observability is the architectural element that separates teams shipping production agents from teams perpetually in pilot. Build evaluation pipelines, distributed traces across all agent boundaries, and human review gates before scaling. The teams that add these after the first production incident spend months retrofitting what should have been designed in.&lt;/P&gt;
&lt;P&gt;If pro-code is the right call for your use case: &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/foundry/agents/overview" target="_blank" rel="noopener"&gt;Foundry Agent Service&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;The hybrid pattern: still where production deployments land&lt;/H3&gt;
&lt;P&gt;The shift to multi-agent architecture does not change the hybrid pattern; it deepens it. Low-code at the conversational surface, pro-code multi-agent systems behind it, with a governance layer spanning both.&lt;/P&gt;
&lt;P&gt;On a logistics client engagement, the brief was a sales assistant for account managers: shipment status, account health, and competitive context inside Teams. The business team wanted everything in Copilot Studio. Engineering wanted a custom agent runtime. Both were wrong.&lt;/P&gt;
&lt;P&gt;What we built: Copilot Studio handled all high-frequency, low-complexity queries (shipment tracking, account status, open cases) through Power Platform connectors. Zero custom code. That covered roughly 78% of actual interaction volume. Requests requiring multi-source reasoning (competitive positioning on a specific lane, churn risk across an account portfolio, contract renewal analysis) were delegated via an authenticated HTTP action to a pro-code multi-agent service on Azure. A retrieval agent pulled deal history and market intelligence through MCP-exposed tools. A synthesis agent composed the recommendation with confidence scoring. Structured JSON came back to the low-code layer, rendered as an adaptive card in Teams.&lt;/P&gt;
&lt;P&gt;The HITL gate was non-negotiable and designed before deployment, not added after the first incident. No output reached a customer without a manager approval step. The agent drafts. A human sends.&lt;/P&gt;
&lt;P&gt;This boundary (low-code for conversational volume, pro-code for reasoning depth) maps directly to what the research shows separates teams that ship from teams that stall. The organizations running agents in production drew the line correctly between what the platform can own and what engineering needs to own. Then they built governance into both sides before scaling.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;The four gates - the prior question that still gets skipped&lt;/H3&gt;
&lt;P&gt;Run every candidate use case through these four checks before the platform conversation begins. None of the recent infrastructure improvements change what they are checking, because none of them change the fundamental cost structure of agentic reasoning.&lt;/P&gt;
&lt;P&gt;Gate 1 - is the logic fully deterministic? If every valid output for every valid input can be enumerated in unit tests, the problem does not need an LLM. A rules engine executes in microseconds at zero inference cost and cannot produce a plausible-but-wrong answer. NeuBird AI's production ops agents, which have resolved over a million alerts and saved enterprises over $2 million in engineering hours, work because alert triage logic that can be expressed as rules runs in deterministic code, and the LLM only handles cases where pattern-matching is insufficient. That boundary is not incidental to the system's reliability. It is the reason for it.&lt;/P&gt;
&lt;P&gt;Gate 2 - is zero hallucination tolerance required? With over 80% of databases now being built by AI agents, per Databricks' State of AI Agents report, the surface area for hallucination-induced data errors has grown significantly. In domains where a wrong answer is a compliance event (financial calculation, medical logic, regulatory determinations), irreducible LLM output uncertainty is disqualifying regardless of model version or prompt engineering effort. Exit to deterministic code or classical ML with bounded output spaces.&lt;/P&gt;
&lt;P&gt;Gate 3 - is a sub-100ms latency SLA required? LLM inference is faster than it was eighteen months ago. It is not fast enough for payment transaction processing, real-time fraud scoring, or live inventory management. A three-agent system with MCP tool calls has a P50 latency measured in seconds. These problems need purpose-built transactional architecture.&lt;/P&gt;
&lt;P&gt;Gate 4 - is regulatory explainability required? A2A enables complex agent coordination and delegation. It does not make LLM reasoning reproducible in a regulatory sense. Temperature above zero means the same input produces different outputs across invocations. Regulators in financial services, healthcare, and consumer credit require deterministic, auditable decision rationale. Exit to deterministic workflow with structured audit logging at every step.&lt;/P&gt;
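&lt;P&gt;As a reading aid (not code from any platform or framework), the four gates reduce to a short pre-platform check:&lt;/P&gt;
&lt;PRE&gt;# The four gates as a pre-platform checklist. A reading aid for this
# article, not code from any framework.
GATES = [
    ("logic fully deterministic",    "rules engine / classical code"),
    ("zero hallucination tolerance", "deterministic code or bounded ML"),
    ("sub-100ms latency SLA",        "purpose-built transactional architecture"),
    ("regulatory explainability",    "deterministic workflow + audit logging"),
]

def run_gates(applies):
    """applies[i] is True when gate i holds for the candidate use case."""
    for hit, (gate, exit_to) in zip(applies, GATES):
        if hit:
            return f"Exit at '{gate}': use {exit_to}"
    return "All gates passed: an agent may be warranted; now pick the tier"

print(run_gates([False, True, False, False]))
&lt;/PRE&gt;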
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Five production failure modes - one of them new&lt;/H3&gt;
&lt;P&gt;The four original anti-patterns are still showing up in production. A fifth has been added by scale.&lt;/P&gt;
&lt;P&gt;Routing data retrieval through a reasoning loop. A direct API call returns account status in under 10ms. Routing the same request through an LLM reasoning step adds hundreds of milliseconds, consumes tokens on every call, and introduces output parsing on data that is already structured. The agent calls a structured tool. The tool calls the API. The agent never acts as the integration layer.&lt;/P&gt;
&lt;P&gt;Encoding business rules in prompts. Rules expressed in prompt text drift as models update. They produce probabilistic output across invocations and fail in ways that are difficult to reproduce and diagnose. A rule that must evaluate correctly every time belongs in a deterministic tool function: unit-tested, version-controlled, and independently deployable via MCP.&lt;/P&gt;
&lt;P&gt;No approval gate on CRUD operations. CRUD operations without a human approval step will eventually misfire on the input that testing did not cover. The gate needs to be designed before deployment, not added after the first incident involving a financial posting, a customer-facing communication, or a data deletion.&lt;/P&gt;
&lt;P&gt;Monolithic agent for all domains. A single agent accumulating every domain leads predictably to topic collision, context overflow, and maintenance that becomes impossible as scope expands. Specialized agents per domain, coordinating through A2A, is the architecture that scales.&lt;/P&gt;
&lt;P&gt;Ungoverned agent sprawl. This is the new one and currently the most prevalent. OutSystems found 94% of organizations concerned about it, with only 12% having a centralized response. Teams building agents independently across fragmented stacks, without shared governance, evaluation standards, or audit infrastructure, produce exactly the same organizational debt that shadow IT created but with higher stakes, because these systems make autonomous decisions rather than just storing and retrieving data. The fix is treating governance as an architectural input before deployment, not a compliance requirement after something breaks.&lt;/P&gt;
&lt;img /&gt;
&lt;H3&gt;The infrastructure is ready. The judgment is not.&lt;/H3&gt;
&lt;P&gt;The tier decision sequence has not changed. Does the problem need natural language understanding or dynamic generation? No — deterministic system, stop. Can a business team own it through standard connectors with no sub-3-second latency SLA and no payload-level compliance requirement? Yes — low-code. Does it need custom orchestration, multi-agent coordination, or audit-grade observability? Yes — pro-code with MCP and A2A. Does it need both a conversational surface and deep backend reasoning? Hybrid, with a governance layer spanning both.&lt;/P&gt;
&lt;P&gt;What has changed is that governance is no longer optional infrastructure to add when you have time. The data is unambiguous. Companies with governance tools get over 12 times more AI projects into production than those without. Evaluation pipelines, distributed tracing across agent boundaries, human oversight gates, and centralized agent lifecycle management are not overhead. They are what converts experiments into production systems. The teams still stuck in pilot are not stuck because the technology failed them. They are stuck because they skipped this layer.&lt;/P&gt;
&lt;P&gt;The protocols are standardized. The frameworks are mature. The infrastructure exists. None of that is what is holding most enterprise agent programs back.&lt;/P&gt;
&lt;P&gt;What is holding them back is a selection problem disguised as a technology problem — teams building agents before asking whether agents are warranted, choosing platforms before running the four gates, and treating governance as a checkpoint rather than an architectural input.&lt;/P&gt;
&lt;P&gt;I have built agents that should have been workflow engines. Not because the technology was wrong, but because nobody stopped early enough to ask whether it was necessary. The four gates in this article exist because I learned those lessons at clients' expense, not mine. The most useful thing I can offer any team starting an agentic AI project is not a framework selection guide. It is permission to say no — and a clear basis for saying it. &lt;STRONG&gt;Take the four gates framework to your next architecture review&lt;/STRONG&gt;. If you have already shipped agents to production, I would like to hear what worked and what did not - comment below.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;What to do next&lt;/H3&gt;
&lt;P&gt;Three concrete steps depending on where you are right now.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;If you have pilots that have not reached production:&lt;/STRONG&gt; Run them through the four gates in this article before the next sprint. Gate 1 alone will eliminate a meaningful percentage of them. The ones that survive all four are your real candidates for production investment. Download the attached file for the gated checklist and take it into your next architecture review.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;If you are starting a new agent project:&lt;/STRONG&gt; Do not open a platform before you have answered the gate questions. Once you have confirmed an agent is warranted and identified the tier, start here: &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/fundamentals-get-started" target="_blank" rel="noopener"&gt;Copilot Studio guided setup&lt;/A&gt; for low-code scenarios, or &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/ai-services/agents/overview" target="_blank" rel="noopener"&gt;Foundry Agent Service&lt;/A&gt; for pro-code patterns with MCP and multi-agent coordination built in. Build governance infrastructure - evaluation pipeline, distributed tracing, HITL gates - before you scale, not after.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;If you have already shipped agents to production:&lt;/STRONG&gt; Share what worked and what did not in the &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-category" href="https://techcommunity.microsoft.com/category/azure-ai-services" target="_blank" rel="noopener" data-lia-auto-title="Azure AI Tech Community" data-lia-auto-title-active="0"&gt;Azure AI Tech Community&lt;/A&gt; — tag posts with #AgentArchitecture. The most useful signal for teams still in pilot is hearing from practitioners who have been through production, not vendors describing what production should look like.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;References&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;OutSystems — State of AI Development Report - &lt;A href="https://www.outsystems.com/1/state-ai-development-report" target="_blank" rel="noopener"&gt;https://www.outsystems.com/1/state-ai-development-report&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Databricks — State of AI Agents Report - &lt;A href="https://www.databricks.com/resources/ebook/state-of-ai-agents" target="_blank" rel="noopener"&gt;https://www.databricks.com/resources/ebook/state-of-ai-agents&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Gartner — 2025 Microsoft 365 and Copilot Survey - &lt;A href="https://www.gartner.com/en/documents/6548002" target="_blank" rel="noopener"&gt;https://www.gartner.com/en/documents/6548002&lt;/A&gt; (Paywalled primary source — publicly reported via &lt;A href="http://techpartner.news/" target="_blank" rel="noopener"&gt;techpartner.news&lt;/A&gt;: &lt;A href="https://www.techpartner.news/news/gartner-microsoft-copilot-hype-offset-by-roi-and-readiness-realities-618118" target="_blank" rel="noopener"&gt;https://www.techpartner.news/news/gartner-microsoft-copilot-hype-offset-by-roi-and-readiness-realities-618118&lt;/A&gt;)&lt;/P&gt;
&lt;P&gt;Anthropic — Model Context Protocol (MCP) - &lt;A href="https://modelcontextprotocol.io/" target="_blank" rel="noopener"&gt;https://modelcontextprotocol.io&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Google Cloud — Agent-to-Agent Protocol (A2A) - &lt;A href="https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability" target="_blank" rel="noopener"&gt;https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;NeuBird AI — Production Operations Deployment Announcement - &lt;A href="https://theaiinsider.tech/2026/04/07/neubird-ai-closes-19-3m-funding-round-to-scale-agentic-ai-across-enterprise-production-operations/" target="_blank" rel="noopener"&gt;NeuBird AI Closes $19.3M Funding Round to Scale Agentic AI Across Enterprise Production Operations&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al. - &lt;A href="https://arxiv.org/abs/2210.03629" target="_blank" rel="noopener"&gt;https://arxiv.org/abs/2210.03629&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Enterprise Integration Patterns — Gregor Hohpe &amp;amp; Bobby Woolf, Addison-Wesley - &lt;A href="https://www.enterpriseintegrationpatterns.com/" target="_blank" rel="noopener"&gt;https://www.enterpriseintegrationpatterns.com&lt;/A&gt;&lt;/P&gt;
      <pubDate>Wed, 22 Apr 2026 16:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/three-tiers-of-agentic-ai-and-when-to-use-none-of-them/ba-p/4510377</guid>
      <dc:creator>sgangaramani</dc:creator>
      <dc:date>2026-04-22T16:00:00Z</dc:date>
    </item>
    <item>
      <title>Introducing Kimi K2.6 in Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/introducing-kimi-k2-6-in-microsoft-foundry/ba-p/4513125</link>
      <description>&lt;H6 data-section-id="au10ix" data-start="204" data-end="311"&gt;We’re excited to welcome &lt;A class="lia-external-url" href="https://ai.azure.com/catalog/models/Kimi-K2.6" target="_blank"&gt;&lt;STRONG data-start="338" data-end="391"&gt;Moonshot AI’s Kimi K2.6&lt;/STRONG&gt;&lt;/A&gt; to Microsoft Foundry expanding the platform’s growing catalog of open and frontier models designed for real-world, production-grade AI systems.&lt;/H6&gt;
&lt;P data-start="537" data-end="784"&gt;Kimi K2.6 represents a new class of&amp;nbsp;&lt;STRONG data-start="573" data-end="603"&gt;agentic, multimodal models&lt;/STRONG&gt; built for long-horizon reasoning, coding, and autonomous execution—bringing developers closer to fully self-directed AI systems that can plan, act, and deliver outcomes end-to-end.&lt;/P&gt;
&lt;H5 data-section-id="8vce18" data-start="791" data-end="815"&gt;&lt;STRONG&gt;Why Kimi K2.6 matters&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P data-start="1040" data-end="1259"&gt;According to Moonshot AI, K2.6 is a&amp;nbsp;&lt;STRONG data-start="1082" data-end="1117"&gt;native multimodal agentic model&lt;/STRONG&gt; that advances capabilities in long-horizon coding, autonomous execution, and multi-agent orchestration.&lt;/P&gt;
&lt;P data-start="1261" data-end="1332"&gt;This means developers can go beyond prompts and build systems where AI:&lt;/P&gt;
&lt;UL data-start="1333" data-end="1540"&gt;
&lt;LI data-section-id="r89ggp" data-start="1333" data-end="1376"&gt;Plans and executes multi-step workflows&lt;/LI&gt;
&lt;LI data-section-id="vqplyo" data-start="1377" data-end="1426"&gt;Writes, debugs, and refactors large codebases&lt;/LI&gt;
&lt;LI data-section-id="1h7xsej" data-start="1427" data-end="1477"&gt;Generates full applications—from UI to backend&lt;/LI&gt;
&lt;LI data-section-id="byejy6" data-start="1478" data-end="1540"&gt;Orchestrates multiple sub-agents to solve complex problems&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1585" data-end="1665"&gt;What differentiates Kimi K2.6 is its focus on &lt;STRONG data-start="1631" data-end="1664"&gt;agentic intelligence at scale&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-start="1667" data-end="1745"&gt;Unlike traditional models optimized for single responses, K2.6 is designed to:&lt;/P&gt;
&lt;UL data-start="1746" data-end="2028"&gt;
&lt;LI data-section-id="11gn3mk" data-start="1746" data-end="1804"&gt;Handle &lt;STRONG data-start="1755" data-end="1777"&gt;long-running tasks&lt;/STRONG&gt; across hundreds of steps&lt;/LI&gt;
&lt;LI data-section-id="18sg1jm" data-start="1805" data-end="1860"&gt;Coordinate &lt;STRONG data-start="1818" data-end="1858"&gt;parallel sub-agents (“agent swarms”)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="rtz5m5" data-start="1861" data-end="1914"&gt;Combine reasoning with &lt;STRONG data-start="1886" data-end="1912"&gt;tool use and execution&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="emvnwu" data-start="1915" data-end="2028"&gt;Deliver &lt;STRONG data-start="1925" data-end="1945"&gt;complete outputs&lt;/STRONG&gt;—documents, apps, workflows—in a single run&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2030" data-end="2137"&gt;This aligns with the broader industry shift toward &lt;STRONG data-start="2081" data-end="2136"&gt;AI agents that operate more like systems than tools&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H5 data-section-id="dqcy7v" data-start="2144" data-end="2198"&gt;&lt;STRONG&gt;Built for developers: Coding, reasoning, and beyond&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P data-start="2200" data-end="2443"&gt;Kimi K2.6 builds on the Kimi K2 family, which introduced large-scale &lt;STRONG data-start="2269" data-end="2297"&gt;Mixture-of-Experts (MoE)&lt;/STRONG&gt; architectures with up to &lt;STRONG data-start="2323" data-end="2348"&gt;1 trillion parameters&lt;/STRONG&gt;, optimized for reasoning, coding, and agent workflows.&lt;/P&gt;
&lt;P data-start="2445" data-end="2496"&gt;With K2.6, those capabilities are extended further:&lt;/P&gt;
&lt;UL data-start="2498" data-end="2779"&gt;
&lt;LI data-section-id="1srk7zu" data-start="2498" data-end="2572"&gt;&lt;STRONG data-start="2500" data-end="2533"&gt;Deeper reasoning and planning&lt;/STRONG&gt; for complex, multi-file coding tasks&lt;/LI&gt;
&lt;LI data-section-id="1b4hpp6" data-start="2573" data-end="2646"&gt;&lt;STRONG data-start="2575" data-end="2607"&gt;Improved agent orchestration&lt;/STRONG&gt;, enabling cleaner task decomposition&lt;/LI&gt;
&lt;LI data-section-id="exm26b" data-start="2647" data-end="2712"&gt;&lt;STRONG data-start="2649" data-end="2682"&gt;Stronger tool-use reliability&lt;/STRONG&gt; across multi-step workflows&lt;/LI&gt;
&lt;LI data-section-id="wmy135" data-start="2713" data-end="2779"&gt;&lt;STRONG data-start="2715" data-end="2736"&gt;Multimodal inputs&lt;/STRONG&gt;, combining text and visual understanding&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2781" data-end="2840"&gt;The result is a model that is particularly well-suited for:&lt;/P&gt;
&lt;UL data-start="2841" data-end="3003"&gt;
&lt;LI data-section-id="1o1lnt2" data-start="2841" data-end="2881"&gt;Developer copilots and coding agents&lt;/LI&gt;
&lt;LI data-section-id="17x4exq" data-start="2882" data-end="2918"&gt;Document and knowledge workflows&lt;/LI&gt;
&lt;LI data-section-id="1d6njdf" data-start="2919" data-end="2965"&gt;Autonomous research and analysis pipelines&lt;/LI&gt;
&lt;LI data-section-id="lucwpb" data-start="2966" data-end="3003"&gt;End-to-end application generation&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5 data-section-id="raw78e" data-start="3010" data-end="3061"&gt;&lt;STRONG&gt;Open models meet enterprise-grade infrastructure&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P data-start="3063" data-end="3205"&gt;Kimi K2.6 is part of a growing trend toward &lt;STRONG data-start="3107" data-end="3140"&gt;open, high-performance models&lt;/STRONG&gt; that give developers flexibility without sacrificing capability.&lt;/P&gt;
&lt;P data-start="3207" data-end="3290"&gt;In Microsoft Foundry, you can combine this openness with enterprise-grade features:&lt;/P&gt;
&lt;UL data-start="3292" data-end="3562"&gt;
&lt;LI data-section-id="1qxlvuz" data-start="3292" data-end="3334"&gt;&lt;STRONG data-start="3294" data-end="3318"&gt;Unified API and SDKs&lt;/STRONG&gt; across models&lt;/LI&gt;
&lt;LI data-section-id="1115698" data-start="3335" data-end="3383"&gt;&lt;STRONG data-start="3337" data-end="3381"&gt;Model evaluation and observability tools&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="10ky0cp" data-start="3384" data-end="3431"&gt;&lt;STRONG data-start="3386" data-end="3429"&gt;Built-in safety and governance controls&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="6tsn92" data-start="3432" data-end="3498"&gt;&lt;STRONG data-start="3434" data-end="3465"&gt;Flexible deployment options&lt;/STRONG&gt; (global, regional, data zones)&lt;/LI&gt;
&lt;LI data-section-id="mq0iih" data-start="3499" data-end="3562"&gt;&lt;STRONG data-start="3501" data-end="3562"&gt;Integration with agent frameworks and orchestration tools&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3564" data-end="3678"&gt;This means you can experiment with Kimi K2.6 and seamlessly move to production—without re-architecting your stack.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Pricing&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Token Type&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Price per 1M tokens&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Input tokens&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;$0.95&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Output tokens&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;$4&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H5 data-section-id="1jhhzk6" data-start="4801" data-end="4819"&gt;&lt;STRONG&gt;Getting started&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P data-start="4821" data-end="4869"&gt;&lt;A class="lia-external-url" href="https://ai.azure.com/catalog/models/Kimi-K2.6" target="_blank"&gt;Kimi K2.6 is now available in Microsoft Foundry.&lt;/A&gt;&lt;/P&gt;
&lt;P data-start="4871" data-end="4879"&gt;You can:&lt;/P&gt;
&lt;UL data-start="4880" data-end="5118"&gt;
&lt;LI data-section-id="epri5z" data-start="4880" data-end="4924"&gt;Explore the model in the Foundry catalog&lt;/LI&gt;
&lt;LI data-section-id="10zqmuq" data-start="4925" data-end="4989"&gt;Benchmark it against other models using built-in evaluations&lt;/LI&gt;
&lt;LI data-section-id="zn8v30" data-start="4990" data-end="5051"&gt;Integrate it into your applications using the Foundry SDK&lt;/LI&gt;
&lt;/UL&gt;
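&lt;P&gt;To make that last step concrete, here is a minimal sketch of a chat call against a Kimi K2.6 deployment using the azure-ai-inference Python package. The endpoint, key, and deployment name are placeholders; check your Foundry project for the actual values.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Minimal sketch: chat completion against a Kimi K2.6 deployment in Foundry.
# Endpoint, key, and deployment name are placeholders, not real values.
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://YOUR-RESOURCE.services.ai.azure.com/models",
    credential=AzureKeyCredential("YOUR-API-KEY"),
)

response = client.complete(
    model="Kimi-K2.6",  # your deployment name from the Foundry catalog
    messages=[
        {"role": "system", "content": "You are a planning agent."},
        {"role": "user", "content": "Plan the steps to refactor a large codebase."},
    ],
)
print(response.choices[0].message.content)&lt;/CODE&gt;&lt;/PRE&gt;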
&lt;P data-start="5225" data-end="5399"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 15:56:21 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/introducing-kimi-k2-6-in-microsoft-foundry/ba-p/4513125</guid>
      <dc:creator>RashaudSavage</dc:creator>
      <dc:date>2026-04-22T15:56:21Z</dc:date>
    </item>
    <item>
      <title>Extracting BOMs from Electrical Drawings with AI: Azure OpenAI GPT-5.4 + Azure Document Intelligence</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/extracting-boms-from-electrical-drawings-with-ai-azure-openai/ba-p/4506891</link>
      <description>&lt;DIV class="container"&gt;
&lt;ARTICLE&gt;
&lt;H2 id="problem"&gt;The Problem&lt;/H2&gt;
&lt;P&gt;Electrical engineering drawings — especially single-line diagrams (SLDs) — are notoriously hard to parse programmatically. They combine dense symbols, small text, and complex geometric structures, all varying in style across documents and vendors.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 1.&lt;/STRONG&gt; A typical electrical single-line diagram. A single page contains multiple panel regions (HV, TR, LV, GCP), each with dozens of electrical components, connection lines, and specification text — all of which must be correctly identified and assigned to the right panel for BOM extraction.&lt;/P&gt;
&lt;P&gt;Traditionally, extracting a Bill of Materials (BOM) from these drawings has been a manual task — engineers would read through each diagram page by page and transcribe component lists by hand. It's time-consuming, error-prone, and doesn't scale.&lt;/P&gt;
&lt;P&gt;The obvious question: &lt;STRONG&gt;can a Vision Language Model automate this?&lt;/STRONG&gt; We had to rely primarily on the visual content of PDF-converted images alone — without guaranteed access to CAD vector data or metadata. That constraint shaped every technique described in this post.&lt;/P&gt;
&lt;DIV style="background: #FFF4CE; border-left: 4px solid #CA5010; padding: 16px 20px; border-radius: 0 4px 4px 0; margin: 18px 0;"&gt;
&lt;P style="margin: 0; color: #323130; font-size: 0.95rem;"&gt;&lt;STRONG style="color: #ca5010;"&gt;Warning:&lt;/STRONG&gt; Why naive inference fails: Feeding a full diagram page to a vision model and asking for a BOM fails catastrophically — too much visual complexity in a single context. Components get missed, hallucinated, or assigned to the wrong panel.&lt;/P&gt;
&lt;/DIV&gt;
&lt;P&gt;This post shares the practical techniques we discovered while building a pipeline to solve this problem. The methods are general — applicable to any task that requires extracting structured information from visually complex technical documents.&lt;/P&gt;
&lt;H2 id="architecture"&gt;Pipeline Architecture: Divide and Conquer&lt;/H2&gt;
&lt;P&gt;The core insight: &lt;STRONG&gt;the unit of analysis must be the panel, not the page.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A single diagram page can contain dozens of panels, each representing a distinct electrical cabinet with its own components. Asking a model to extract a complete BOM from an entire page is asking it to simultaneously locate, read, and count every component across all panels — a task that proved too complex even for GPT-5.4.&lt;/P&gt;
&lt;P&gt;The solution was to first identify and crop each panel as an independent region, then extract the BOM panel by panel. This transformed an intractable whole-page problem into a series of manageable, well-scoped sub-tasks.&lt;/P&gt;
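&lt;P&gt;In outline, the top-level loop is just a per-panel decomposition. The sketch below is illustrative; the function names are hypothetical, not the pipeline's actual API:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;# Illustrative top-level loop; helper names are hypothetical
def extract_bom(page_image):
    bom_rows = []
    panels = detect_and_crop_panels(page_image)   # Techniques 1–3
    for panel_name, panel_img in panels:
        # Each call sees one well-scoped panel, not the whole page
        rows = extract_bom_from_panel(panel_img)  # GPT-5.4 vision call
        for row in rows:
            row["panel"] = panel_name             # assign to the right panel
        bom_rows.extend(rows)
    return bom_rows&lt;/CODE&gt;&lt;/PRE&gt;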
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 2.&lt;/STRONG&gt; Five-stage pipeline architecture. Each stage is color-coded by its primary tool: rule-based (gray), Azure Document Intelligence (orange), hybrid Azure OpenAI+Document Intelligence (purple), and Azure OpenAI GPT-5.4 vision (blue). Tool tags on the right indicate specific components used at each stage.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR class="section-break" /&gt;
&lt;H2 id="technique1"&gt;Technique 1: Azure Document Intelligence-First Detection with Targeted Azure OpenAI Supplement&lt;/H2&gt;
&lt;P&gt;The first challenge is identifying all figure regions on each page — the panel diagrams that contain the actual electrical components. SLD pages typically mix diagrams with title blocks, revision tables, and legend boxes (often along the right edge or bottom). All of these must be located before we can isolate the panels.&lt;/P&gt;
&lt;P&gt;The key design decision: &lt;STRONG&gt;use Azure Document Intelligence (DI) as the primary detector — and reserve GPT-5.4 only for gaps that DI misses.&lt;/STRONG&gt; DI's prebuilt-layout model is fast, deterministic, and cheap compared to an Azure OpenAI vision call. By maximizing DI coverage first, we minimize the number of expensive Azure OpenAI invocations needed to achieve full detection.&lt;/P&gt;
&lt;H3&gt;Two-Pass Document Intelligence Layout Detection: Catching Occluded Regions&lt;/H3&gt;
&lt;P&gt;A single Azure Document Intelligence (DI) pass often misses figures that are visually occluded by larger detected regions — particularly smaller panels nested within or adjacent to large ones, and tables along the page edges. The solution:&amp;nbsp;&lt;STRONG&gt;white-fill detected regions and re-run DI&lt;/STRONG&gt; to reveal what was hidden underneath.&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;# Pass 1: Detect figures on original page
pass1 = analyze_page(di_client, "prebuilt-layout", image_path)
pass1_bboxes = [f["bbox"] for f in pass1["figures"]]

# Pass 2: White-fill Pass 1 regions → re-detect
if pass1_bboxes:
    white_fill_regions(image_path, pass1_bboxes, whitefill_path)
    pass2 = analyze_page(di_client, "prebuilt-layout", whitefill_path)

    # Merge &amp;amp; deduplicate by IoU
    all_figures = pass1["figures"] + pass2["figures"]
    deduped = _dedup_figure_bboxes(all_figures, iou_threshold=0.5)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This two-pass approach is especially effective at catching tables and annotation blocks along the right edge or bottom of the page that DI initially subsumes into a single large region.&lt;/P&gt;
&lt;H3&gt;Azure OpenAI GPT-5.4 Only for the Remaining Gaps&lt;/H3&gt;
&lt;P&gt;After two Azure Document Intelligence (DI) passes, most figure regions are covered. For any remaining gaps, Azure OpenAI GPT-5.4 is called&amp;nbsp;&lt;EM&gt;once&lt;/EM&gt; with the DI-detected regions marked in purple on the image. The model only needs to identify unmarked areas — a much simpler task than full-page detection from scratch.&lt;/P&gt;
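&lt;P&gt;The annotation step itself is simple. Here is a minimal sketch with Pillow, reusing the pipeline's &lt;CODE&gt;call_llm&lt;/CODE&gt; helper; the prompt wording and input variables are illustrative:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;from PIL import Image, ImageDraw

# Mark DI-detected regions in purple, then ask GPT-5.4 only for what's left.
# page_path, di_bboxes, annotated_path are assumed inputs from earlier stages.
img = Image.open(page_path).convert("RGB")
draw = ImageDraw.Draw(img)
for x1, y1, x2, y2 in di_bboxes:
    draw.rectangle([x1, y1, x2, y2], outline=(128, 0, 128), width=6)
img.save(annotated_path)

prompt = ("Regions outlined in purple are already detected. "
          "Return bounding boxes ONLY for figure regions that are NOT marked.")
extra = call_llm(llm_client, deployment, prompt, image_paths=[annotated_path])&lt;/CODE&gt;&lt;/PRE&gt;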
&lt;DIV style="background: #F3F9FF; border-left: 4px solid #0078D4; padding: 16px 20px; border-radius: 0 4px 4px 0; margin: 18px 0;"&gt;
&lt;P style="margin: 0; color: #323130; font-size: 0.95rem;"&gt;&lt;STRONG style="color: #0078d4;"&gt;Key finding&lt;/STRONG&gt;&lt;SPAN style="color: #0078d4;"&gt;: DI detection is ~10× faster and significantly cheaper per call than an Azure OpenAI GPT-5.4 vision request. By using DI as the primary detector and Azure OpenAI only for supplemental gap-filling, the pipeline achieves comprehensive coverage while keeping cost and latency low. The two-pass technique further reduces Azure OpenAI's burden by maximizing what DI can find on its own.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/DIV&gt;
&lt;HR class="section-break" /&gt;
&lt;H2 id="technique2"&gt;Technique 2: Hybrid Azure OpenAI GPT + Document Intelligence for Text Localization&lt;/H2&gt;
&lt;P&gt;To segment panel areas, we first need to know &lt;STRONG&gt;where panel names appear in the image&lt;/STRONG&gt;. Panel names act as anchor points — once we know their locations, we can use them as seeds to identify the boundaries of each panel region.&lt;/P&gt;
&lt;P&gt;Neither GPT-5.4 nor Azure Document Intelligence alone is sufficient:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;GPT-5.4:&lt;/STRONG&gt; identifies &lt;EM&gt;what&lt;/EM&gt; the text says, but is imprecise about exact pixel locations&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Document Intelligence:&lt;/STRONG&gt; precise coordinates for all text, but struggles with domain-specific abbreviations&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The solution: run both in parallel and cross-match results.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 3.&lt;/STRONG&gt; Two parallel tracks converge into cross-matching. &lt;STRONG&gt;Top:&lt;/STRONG&gt; page is split into overlapping tiles → Azure OpenAI GPT-5.4 extracts panel name candidates per tile → aggregated and deduplicated. &lt;STRONG&gt;Bottom:&lt;/STRONG&gt; Azure DI extracts all text with bounding boxes → rule + Azure OpenAI GPT-5.4 filters by type → cross-matching prioritizes rule-based alignment, with Azure OpenAI GPT-5.4 resolving unmatched cases.&lt;/P&gt;
&lt;H3&gt;Tile-Based Name Extraction&lt;/H3&gt;
&lt;P&gt;Rather than feeding the entire page to Azure OpenAI, we split it into overlapping vertical tiles (2000px wide, 400px overlap) and extract panel name candidates from each tile independently. This reduces visual complexity per call and improves recall.&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;# Split page into overlapping tiles for LLM name extraction
tiles = tile_page(image_path, tile_width=2000, overlap=400)

# Extract names from each tile independently
for tile_img, tile_coords in tiles:
    names = extract_names_from_tile(tile_img, llm_client, deployment)
    # LLM identifies SHORT CODES: "HV 1", "TR-2", "LV 3A"
    # Rejects: component labels, wire tags, terminal IDs&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 4.&lt;/STRONG&gt; Effect of tile-based extraction vs. full-page extraction. Overlapping tiles reduce visual complexity per LLM call, improving panel name recall — especially for names near page edges.&lt;/P&gt;
&lt;H3&gt;Hallucination Guard&lt;/H3&gt;
&lt;P&gt;The Azure OpenAI GPT-5.4 model sometimes fabricates panel names that don't exist in the image. We cross-validate all model-extracted names against the Azure Document Intelligence OCR text pool using fuzzy matching:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;def hallucination_guard(names, di_lines):
    verified = []
    for name in names:
        if _name_in_ocr(name, ocr_texts):  # 3 matching modes:
            verified.append(name)           #   1. Exact substring
            # 2. Same length, ≤1 char diff
            # 3. Alphanumeric stripping (ignore spaces/punctuation)
        else:
            print(f"  Dropped '{name}' — no OCR match")
    return verified&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;6-Pass Rule Matching Engine&lt;/H3&gt;
&lt;P&gt;Once names are verified, we locate their exact pixel positions via a cascading rule engine that matches panel names against Azure Document Intelligence OCR bounding boxes with decreasing confidence:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;def rule_match_panel_name(panel_name, di_lines, max_merge=3):
    # Pass 1: Exact match (case-insensitive)          → confidence 1.0
    # Pass 2: Alphanumeric (O/0, I/1 tolerance)       → confidence 0.9
    # Pass 3: Substring containment (≥3 chars)         → confidence 0.75
    # Pass 4: Multi-line merge (adjacent DI lines)     → confidence 0.7
    # Pass 5: OCR-confusable (1-char diff, v↔y, s↔5)  → confidence 0.6

    # Conflict resolution: shared bbox → keep highest confidence
    # LLM fallback: for still-unmatched names&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV style="background: #F3F9FF; border-left: 4px solid #0078D4; padding: 16px 20px; border-radius: 0 4px 4px 0; margin: 18px 0;"&gt;
&lt;P style="margin: 0; color: #323130; font-size: 0.95rem;"&gt;&lt;STRONG style="color: #0078d4;"&gt;Key finding:&lt;/STRONG&gt; GPT-5.4&amp;nbsp;identifies &lt;EM&gt;what&lt;/EM&gt; the panel names are (semantic), while Azure Document Intelligence provides &lt;EM&gt;where&lt;/EM&gt; they are (geometric). The rule engine bridges the two with OCR-aware fuzzy matching. This hybrid approach is significantly more robust than either system alone.&lt;/P&gt;
&lt;/DIV&gt;
&lt;HR class="section-break" /&gt;
&lt;H2 id="technique3"&gt;Technique 3: Iterative Locate → Verify with Oscillation Guard&lt;/H2&gt;
&lt;P&gt;With panel names located, the next challenge is identifying the full panel boundary. This is the hardest stage: panels can be irregularly shaped, share edges with neighbors, and have boundaries formed by a mix of dashed, solid, and implied lines.&lt;/P&gt;
&lt;P&gt;Rather than asking the Azure OpenAI GPT-5.4 model to find the boundary in one shot (which is unreliable), we implemented an &lt;STRONG&gt;iterative Locate → Verify correction loop&lt;/STRONG&gt; with up to 10 attempts per panel.&lt;/P&gt;
&lt;H3&gt;Visual Prompt Composition&lt;/H3&gt;
&lt;P&gt;Each iteration constructs a carefully composed image for Azure OpenAI GPT-5.4, providing spatial context through color-coded overlays:&lt;/P&gt;
&lt;DIV class="two-col"&gt;
&lt;DIV&gt;
&lt;H4&gt;Locate Input&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG style="color: var(--ms-blue);"&gt;Blue box&lt;/STRONG&gt; — target panel name (&lt;CODE&gt;NAME:{panel_name}&lt;/CODE&gt;)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG style="color: var(--green);"&gt;Green boxes&lt;/STRONG&gt; — other panels as spatial reference&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;DIV&gt;
&lt;H4&gt;Verify Input&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG style="color: var(--orange);"&gt;Orange box&lt;/STRONG&gt; — proposed panel bbox&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG style="color: var(--ms-blue);"&gt;Blue box&lt;/STRONG&gt; — panel name location&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG style="color: var(--green);"&gt;Green boxes&lt;/STRONG&gt; — neighboring panels&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;The verify step analyzes each of the four edges independently:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;// Verify response — per-edge analysis
{
  "valid": false,
  "edges": {
    "x1": { "status": "expand", "delta": -45, "corrected": 120 },
    "y1": { "status": "ok" },
    "x2": { "status": "shrink", "delta": -30, "corrected": 850 },
    "y2": { "status": "expand", "delta": 60, "corrected": 1200 }
  },
  "corrected_bbox": [120, 200, 850, 1200]
}&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Oscillation Detection&lt;/H3&gt;
&lt;P&gt;A critical failure mode: Azure OpenAI GPT-5.4 oscillates on an axis — expanding, then shrinking, then expanding — never converging. We detect this using a 3-value history per axis:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;# Track last 3 corrections per axis
axis_history = {axis: [] for axis in ["x1", "y1", "x2", "y2"]}

for attempt in range(1, max_tries + 1):
    # ... locate and verify ...
    for axis in AXES:
        hist = axis_history[axis]
        if len(hist) &amp;gt;= 3:
            a, b, c = hist[-3:]
            # Detect: (a &amp;lt; b &amp;gt; c) or (a &amp;gt; b &amp;lt; c) → oscillating
            if (a &amp;lt; b and b &amp;gt; c) or (a &amp;gt; b and b &amp;lt; c):
                corrected[axis] = hist[-2]  # Freeze at previous value&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV style="background: #F3F9FF; border-left: 4px solid #0078D4; padding: 16px 20px; border-radius: 0 4px 4px 0; margin: 18px 0;"&gt;
&lt;P style="margin: 0; color: #323130; font-size: 0.95rem;"&gt;The loop processes all panels on a page in &lt;STRONG&gt;batch mode&lt;/STRONG&gt; — a single &lt;CODE&gt;locate-all&lt;/CODE&gt; call positions all panels simultaneously, reducing per-panel LLM calls from N to 1.&lt;/P&gt;
&lt;/DIV&gt;
&lt;HR class="section-break" /&gt;
&lt;H2 id="technique4"&gt;Technique 4: Few-Shot Visual Prompting&lt;/H2&gt;
&lt;P&gt;Text prompts alone struggle to convey spatial concepts like "what a panel boundary looks like." The visual vocabulary of electrical drawings is too domain-specific to describe in words. The solution: provide GPT-5.4 with &lt;STRONG&gt;few-shot reference images&lt;/STRONG&gt; directly in the prompt.&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;def locate_and_verify_batch(..., guide_image_paths=None):
    guides = list(guide_image_paths or [])
    # Prepend guide images before the actual input:
    loc_raw = call_llm(
        llm_client, deployment, locate_prompt,
        image_paths=guides + [locate_img_path],  # Guides first
    )&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 5.&lt;/STRONG&gt; Few-shot reference for panel boundary recognition. Example 1: dashed rectangle enclosure. Example 2: mixed boundary with dashed lines, solid edge, and gap + vertical line as shared boundary. Example 3: non-rectangular region returns the full outer bounding box.&lt;/P&gt;
&lt;P&gt;The benefits:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Reduces prompt length&lt;/STRONG&gt; — no need to describe visual concepts in words&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Improves consistency&lt;/STRONG&gt; — the model interprets boundary types correctly across varying layouts&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;User-customizable&lt;/STRONG&gt; — swapping in new guide images adapts to new drawing styles without code changes&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR class="section-break" /&gt;
&lt;H2 id="technique5"&gt;Technique 5: Reasoning Mode — Performance vs. Cost&lt;/H2&gt;
&lt;P&gt;Azure OpenAI GPT-5.4's reasoning capability (&lt;CODE&gt;reasoning={"effort": "low|medium|high"}&lt;/CODE&gt;) significantly affects both accuracy and latency. We ran systematic experiments across all four reasoning levels (low, medium, high, and none) in two key pipeline stages: &lt;STRONG&gt;image area detection&lt;/STRONG&gt; and &lt;STRONG&gt;BOM extraction&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;Reasoning in Image Area Detection&lt;/H3&gt;
&lt;P&gt;Each reasoning level was tested 3 times on the region detection stage (Azure OpenAI GPT-5.4 detects and verifies figure boundaries).&lt;/P&gt;
&lt;H4&gt;Detection Quality&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 6.&lt;/STRONG&gt; Region detection output across reasoning levels (high, medium, low, none × 3 runs). All levels produced comparable quality; high converged in fewer iterations (2), while medium/low sometimes needed 3.&lt;/P&gt;
&lt;H4&gt;Verification Iterations&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 7.&lt;/STRONG&gt; Mean verification iterations per page (3 runs avg). high: 2.0, medium: 2.7, low: 3.0, none: 2.0. Lower reasoning needs slightly more iterations but converges to the same quality.&lt;/P&gt;
&lt;H4&gt;Processing Time&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 8.&lt;/STRONG&gt; Mean LLM time for detection + verification (3 runs avg). high/medium ~170s, low ~52s, none ~15s.&lt;/P&gt;
&lt;DIV style="background: #F3F9FF; border-left: 4px solid #0078D4; padding: 16px 20px; border-radius: 0 4px 4px 0; margin: 18px 0;"&gt;
&lt;P style="margin: 0; color: #323130; font-size: 0.95rem;"&gt;&lt;STRONG&gt;Key finding:&lt;/STRONG&gt;&amp;nbsp;All reasoning levels (low,&amp;nbsp;medium,&amp;nbsp;high) produced similar quality and noticeably better results than&amp;nbsp;none, but with increasing latency (~3× from&amp;nbsp;low&amp;nbsp;to&amp;nbsp;high). Since there was no meaningful quality difference among reasoning levels, we chose&amp;nbsp;low&amp;nbsp;as the optimal setting —&amp;nbsp;&lt;STRONG&gt;getting the benefit of reasoning at minimal latency cost&lt;/STRONG&gt;.&lt;/P&gt;
&lt;/DIV&gt;
&lt;H3&gt;Reasoning in BOM Extraction&lt;/H3&gt;
&lt;P&gt;For BOM extraction — reading component lists from cropped panel images — reasoning has a more pronounced accuracy impact:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 9.&lt;/STRONG&gt; Time vs. accuracy: Low (~86%, ~2,300s) vs. Medium (~89–91%, ~3,900s) reasoning across 3 runs each; High timed out.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Low: ~85% accuracy, ~2,200–2,400s&lt;/LI&gt;
&lt;LI&gt;Medium: ~85–91% accuracy, ~3,700–4,100s&lt;/LI&gt;
&lt;LI&gt;High: Timeout&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Recommended Configurations&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Pipeline Stage&lt;/th&gt;&lt;th&gt;Recommended&lt;/th&gt;&lt;th&gt;Rationale&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Image region detection&lt;/td&gt;&lt;td&gt;&lt;CODE&gt;low&lt;/CODE&gt;&lt;/td&gt;&lt;td&gt;Same quality at ~3× less cost&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Region verification&lt;/td&gt;&lt;td&gt;&lt;CODE&gt;low&lt;/CODE&gt;&lt;/td&gt;&lt;td&gt;Sufficient with rich visual context (color-coded overlays)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;BOM extraction&lt;/td&gt;&lt;td&gt;&lt;CODE&gt;medium&lt;/CODE&gt;&lt;/td&gt;&lt;td&gt;+3–5% accuracy; high causes timeouts&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;DIV style="background: #F3F9FF; border-left: 4px solid #0078D4; padding: 16px 20px; border-radius: 0 4px 4px 0; margin: 18px 0;"&gt;
&lt;P style="margin: 0; color: #323130; font-size: 0.95rem;"&gt;&lt;STRONG&gt;Key insight:&lt;/STRONG&gt; Different stages need different reasoning levels. Use &lt;CODE&gt;low&lt;/CODE&gt; for spatially-grounded tasks with rich visual context, &lt;CODE&gt;medium&lt;/CODE&gt; for semantically-demanding reading tasks. &lt;STRONG&gt;Optimize inputs before scaling compute.&lt;/STRONG&gt;&lt;/P&gt;
&lt;/DIV&gt;
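&lt;P&gt;In practice this is just a per-stage request parameter. A minimal sketch with the openai Python package against Azure OpenAI follows; the endpoint, key, API version, and deployment name are placeholders, and the prompt variables are assumed to come from earlier stages:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;from openai import AzureOpenAI

# Placeholders only: use your own endpoint, key, API version, and deployment
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-API-KEY",
    api_version="2025-04-01-preview",
)

def run_stage(prompt, effort):
    # Same model, different reasoning effort per pipeline stage
    return client.responses.create(
        model="gpt-5.4",              # deployment name
        reasoning={"effort": effort},
        input=prompt,
    )

detection = run_stage(detect_prompt, "low")  # spatially-grounded stage
bom = run_stage(bom_prompt, "medium")        # semantically-demanding stage&lt;/CODE&gt;&lt;/PRE&gt;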
&lt;HR class="section-break" /&gt;
&lt;H2 id="technique6"&gt;Technique 6: SVG Vector Boundary Detection&lt;/H2&gt;
&lt;P&gt;While the main pipeline relies on Azure OpenAI GPT-5.4's vision for panel boundary detection, we also explored a purely geometric approach that bypasses the GPT-5.4 model entirely — useful when CAD files can be exported as SVG vectors rather than raster images.&lt;/P&gt;
&lt;P&gt;The core idea: &lt;STRONG&gt;panel boundaries are closed geometric shapes formed by line segments in the vector data.&lt;/STRONG&gt; If we can extract meaningful line clusters and find closed cycles in the resulting graph, we can identify panel regions without any LLM calls.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 10.&lt;/STRONG&gt; Panel regions detected via SVG vector analysis — each color represents a distinct panel boundary identified through line clustering and cycle extraction, with no LLM involvement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;The Problem: Too Many Lines&lt;/H3&gt;
&lt;P&gt;A single SVG page can contain &lt;STRONG&gt;17,000+ line segments&lt;/STRONG&gt; — horizontal, vertical, and diagonal — mixing panel borders, component symbols, text strokes, and wiring lines. Attempting to work with raw segments directly is impractical.&lt;/P&gt;
&lt;/ARTICLE&gt;
&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 11.&lt;/STRONG&gt; Raw SVG line extraction from a single page: 15,742 horizontal + 1,106 vertical + 227 diagonal segments. The length distribution (right) shows most segments are very short (symbol strokes, text) — boundary lines are a tiny fraction of the total.&lt;/P&gt;
&lt;DIV class="container"&gt;
&lt;ARTICLE&gt;
&lt;H3&gt;Chain-Based Line Clustering&lt;/H3&gt;
&lt;P&gt;The solution is clustering collinear, nearby segments into coherent boundary lines. We first scan fragments in order of position. If the next fragment is close enough, extend the current group. If the gap is too large, start a new one. Discard any group that is too short or too sparse to be a real boundary.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, using a &lt;CODE&gt;DashDotBoundaryStyle&lt;/CODE&gt; strategy with 25px tolerance, 17,000+ raw segments collapse into just &lt;STRONG&gt;15 horizontal + 18 vertical clusters&lt;/STRONG&gt; — each representing a candidate panel edge.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 12.&lt;/STRONG&gt; After clustering: 15 horizontal lines (left) and 18 vertical lines (right), each color-coded by cluster. Legend shows pixel position and segment count per cluster. The 3,604 unassigned segments (gray) are symbol strokes and other non-boundary elements, filtered out.&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;# Group collinear, adjacent line segments into chains
# Lines sharing an endpoint (within tolerance) and same direction → one chain
chains = cluster_lines_to_chains(svg_lines, endpoint_tolerance=2.0)

# Filter by length — short chains are symbols, long chains are boundaries
boundary_chains = [c for c in chains if c.total_length &amp;gt; min_boundary_length]&lt;/CODE&gt;&lt;/PRE&gt;
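&lt;P&gt;For concreteness, the scan-and-group step described above can be as simple as the following sketch, which assumes same-orientation fragments given as (start, end) intervals sorted by position; the thresholds are illustrative:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;def group_fragments(fragments, gap_tol=25.0, min_length=200.0):
    # fragments: sorted (start, end) intervals along one axis;
    # gap_tol matches the 25px tolerance above, min_length is illustrative
    if not fragments:
        return []
    groups, current = [], [fragments[0]]
    for frag in fragments[1:]:
        if frag[0] - current[-1][1] &amp;lt;= gap_tol:  # close enough: extend
            current.append(frag)
        else:                                      # gap too large: new group
            groups.append(current)
            current = [frag]
    groups.append(current)
    # Discard groups too short to be a real boundary line
    return [g for g in groups if g[-1][1] - g[0][0] &amp;gt;= min_length]&lt;/CODE&gt;&lt;/PRE&gt;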
&lt;H3&gt;Density-Peak Clustering for Boundary Lines&lt;/H3&gt;
&lt;P&gt;Dotted lines have gaps too large to bridge by proximity. Instead, project all fragments onto a ruler and count how many land at each position. Wherever fragments pile up — a spike — that's a boundary line. Find the spikes, ignore the noise.&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;# Find density peaks in X-coordinates of vertical lines
# and Y-coordinates of horizontal lines
x_peaks = density_peak_cluster(vertical_lines, axis='x')
y_peaks = density_peak_cluster(horizontal_lines, axis='y')

# Each peak represents a candidate panel edge position&lt;/CODE&gt;&lt;/PRE&gt;
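&lt;P&gt;A minimal version of that spike-finding might look like this, assuming numpy and segments given as (x1, y1, x2, y2) tuples; the bin size and noise floor are illustrative:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;import numpy as np

def density_peak_cluster(lines, axis="x", bin_size=5, min_count=8):
    # Project each segment's midpoint onto the chosen axis
    coords = np.array([(l[0] + l[2]) / 2 if axis == "x" else (l[1] + l[3]) / 2
                       for l in lines])
    # Histogram the projections; spikes are candidate boundary positions
    bins = np.arange(coords.min(), coords.max() + 2 * bin_size, bin_size)
    counts, edges = np.histogram(coords, bins=bins)
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i &amp;gt; 0 else 0
        right = counts[i + 1] if i + 1 &amp;lt; len(counts) else 0
        # A spike: above the noise floor and locally maximal
        if c &amp;gt;= min_count and c &amp;gt;= left and c &amp;gt;= right:
            peaks.append((edges[i] + edges[i + 1]) / 2)
    return peaks&lt;/CODE&gt;&lt;/PRE&gt;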
&lt;H3&gt;BoundaryStyleStrategy Abstraction&lt;/H3&gt;
&lt;P&gt;Different drawings use different boundary conventions — solid lines, dashed lines, mixed styles. A strategy pattern allows the detection algorithm to adapt:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;# Pluggable strategy for what constitutes a "boundary line"
class BoundaryStyleStrategy:
    def is_boundary_line(self, line) -&amp;gt; bool: ...
    def merge_candidates(self, lines) -&amp;gt; List[Chain]: ...

# Implementations:
# - SolidLineStrategy: continuous lines above length threshold
# - DashedLineStrategy: periodic short segments with gaps
# - MixedStrategy: combines both heuristics&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Seed-Guided Minimal Cycle Extraction&lt;/H3&gt;
&lt;P&gt;The final step finds the actual closed regions. Using panel name locations as seeds (from Technique 2), we search for the minimal cycle in the planar graph that encloses each seed point:&lt;/P&gt;
&lt;PRE style="background: #E1DFDD; border: 1px solid #C8C6C4; border-radius: 6px; padding: 18px 22px; overflow-x: auto; margin: 14px 0 22px; font-size: 0.85rem; line-height: 1.65;"&gt;&lt;CODE&gt;# Build planar graph from boundary line segments
G = build_planar_graph(boundary_chains)

for panel_name, seed_point in panel_seeds:
    # Find minimal cycle enclosing the seed point
    cycle = find_minimal_enclosing_cycle(G, seed_point)
    if cycle:
        panel_regions[panel_name] = cycle_to_polygon(cycle)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV style="background: #F3F9FF; border-left: 4px solid #0078D4; padding: 16px 20px; border-radius: 0 4px 4px 0; margin: 18px 0;"&gt;
&lt;P style="margin: 0; color: #323130; font-size: 0.95rem;"&gt;&lt;STRONG&gt;Trade-off:&lt;/STRONG&gt; This approach is faster and more precise than LLM-based detection, but requires SVG vector access (not always available) and is sensitive to non-standard drawing conventions. In our pipeline, it serves as an alternative path when vector data is accessible.&lt;/P&gt;
&lt;/DIV&gt;
&lt;HR class="section-break" /&gt;
&lt;H2 id="results"&gt;Results &amp;amp; Error Analysis&lt;/H2&gt;
&lt;P&gt;The pipeline achieved &lt;STRONG&gt;94.21% accuracy&lt;/STRONG&gt; (277/294 materials correctly extracted) across 53 panels on 4 diagram pages. Processing time was ~62 minutes from pre-cropped panel images.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Overall accuracy&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;94.21%&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Panels processed&lt;/td&gt;&lt;td&gt;53&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Materials extracted&lt;/td&gt;&lt;td&gt;346&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Correctly identified&lt;/td&gt;&lt;td&gt;277 / 294&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Processing time&lt;/td&gt;&lt;td&gt;~62 min (panel images only)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;A &lt;STRONG&gt;+5.43% accuracy improvement&lt;/STRONG&gt; (88.78% → 94.21%) came from iterative prompt refinement based on domain expert review of extraction errors — identifying recurring patterns and translating them into prompt corrections.&lt;/P&gt;
&lt;/ARTICLE&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 13.&lt;/STRONG&gt; Iterative prompt refinement: GPT-5.4 results → expert review → error pattern analysis → prompt corrections → re-run.&lt;/P&gt;
&lt;DIV class="container"&gt;
&lt;ARTICLE&gt;&lt;HR class="section-break" /&gt;
&lt;H2 id="takeaways"&gt;Key Takeaways&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Decompose aggressively.&lt;/STRONG&gt; Stage-wise processing with scoped inputs is the difference between working and not working. Break the problem down until each sub-task is tractable for the model.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Visual context beats reasoning effort.&lt;/STRONG&gt; Color-coded overlays and few-shot images reduce reliance on expensive reasoning modes. Optimize inputs before scaling compute.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Verify, but stop early.&lt;/STRONG&gt; Self-correction loops improve accuracy — but accumulated context can confuse the model. Oscillation detection and early stopping are critical.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Hybrid always wins.&lt;/STRONG&gt; Azure Document Intelligence for precise coordinates, GPT-5.4 for semantics. Pure LLM solutions lose on precision-critical tasks.&lt;/P&gt;
&lt;HR class="section-break" /&gt;
&lt;H2&gt;What's Next?&lt;/H2&gt;
&lt;P&gt;The techniques here generalize well beyond SLDs. We're exploring several directions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Other drawing types&lt;/STRONG&gt; — P&amp;amp;ID, mechanical assembly, architectural floor plans. The core pipeline stages stay the same; only the few-shot guides and panel name patterns change per domain.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;ERP/PLM integration&lt;/STRONG&gt; — Feeding extracted BOMs directly into SAP, Oracle, or PTC Windchill to close the loop from drawing to purchase order.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Active learning from HITL corrections&lt;/STRONG&gt; — Using human corrections captured in the Streamlit demo as training signal to drive automatic prompt refinement.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Cost optimization at scale&lt;/STRONG&gt; — Batching Azure OpenAI calls, caching DI results for recurring templates, and leveraging SVG vector detection (Technique 6) whenever CAD exports are available.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Multi-modal verification&lt;/STRONG&gt; — Cross-referencing extracted BOMs against parts databases or previous drawing revisions to validate extraction accuracy in context.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Get Started&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Run the demo:&lt;/STRONG&gt; clone the GitHub repository and launch the Streamlit HITL demo with your own SLD drawings.&lt;/P&gt;
&lt;/ARTICLE&gt;
&lt;/DIV&gt;
&lt;DIV class="container"&gt;
&lt;ARTICLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Figure 14.&lt;/STRONG&gt; The Streamlit HITL demo showing Step 5 — BOM Extraction results. Each panel's cropped image is displayed alongside the extracted component table with device symbol, name, specification, quantity, and confidence level.&lt;/P&gt;
&lt;P style="text-align: center; margin: 24px 0;"&gt;&lt;A style="display: inline-block; background: #0078D4; color: #fff; padding: 12px 28px; border-radius: 6px; font-weight: 600; font-size: 1rem; text-decoration: none;" href="https://github.com/jihys/electrical-sld-bom-extraction" target="_blank" rel="noopener"&gt;View on GitHub →&lt;/A&gt;&lt;/P&gt;
&lt;/ARTICLE&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left" style="text-align: center; margin: 24px 0;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Explore the Services&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/ai-services/openai/" target="_blank" rel="noopener"&gt;Azure OpenAI Service documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/ai-services/document-intelligence/" target="_blank" rel="noopener"&gt;Azure Document Intelligence quickstart&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/ai-studio/" target="_blank" rel="noopener"&gt;Azure AI Foundry overview&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Join the Conversation&lt;/H3&gt;
&lt;P&gt;Have questions or built something similar? Share your experience in the comments below or connect with us on the &lt;A href="https://techcommunity.microsoft.com/category/azure-ai/" target="_blank" rel="noopener"&gt;Azure AI Tech Community&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Related Reading:&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/ai-services/openai/how-to/gpt-with-vision" target="_blank" rel="noopener"&gt;Vision capabilities in Azure OpenAI&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/azure/ai-services/document-intelligence/concept-model-overview" target="_blank" rel="noopener"&gt;Building document processing pipelines with Azure AI&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P data-testid="MessageSubject"&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/gpt-vision-capability-in-understanding-coordinates-how-gpt-5-4-transforms-spatia/4506726" target="_blank" rel="noopener" data-lia-auto-title="GPT Vision Capability in Understanding Coordinates: How GPT-5.4 Transforms Spatial Precision" data-lia-auto-title-active="0"&gt;GPT Vision Capability in Understanding Coordinates: How GPT-5.4 Transforms Spatial Precision&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-left" style="text-align: center; margin: 24px 0;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left" style="text-align: center; margin: 24px 0;"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 13:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/extracting-boms-from-electrical-drawings-with-ai-azure-openai/ba-p/4506891</guid>
      <dc:creator>jihyeseo</dc:creator>
      <dc:date>2026-04-22T13:00:00Z</dc:date>
    </item>
    <item>
      <title>Introducing OpenAI's GPT-image-2 in Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/introducing-openai-s-gpt-image-2-in-microsoft-foundry/ba-p/4500571</link>
      <description>&lt;P&gt;Take a small design team running a global social campaign. They have the creative vision to produce localized imagery for every market, but not the resources to reshoot, reformat, or outsource that scale. Every asset needs to fit a different platform, a different dimension, a different cultural context, and they all need to ship at the same time. This is where flexible image generation comes in handy.&lt;/P&gt;
&lt;P&gt;OpenAI's &lt;STRONG&gt;GPT-image-2&lt;/STRONG&gt; is now generally available and rolling out today to &lt;A class="lia-external-url" href="http://ai.azure.com" target="_blank" rel="noopener"&gt;Microsoft Foundry&lt;/A&gt;, introducing a step change in image generation. Developers and designers now get more control over image output, so a small team can execute with the reach and flexibility of a much larger one.&lt;/P&gt;
&lt;H1&gt;What is new in GPT-image-2?&lt;/H1&gt;
&lt;P&gt;&lt;STRONG&gt;GPT-image-2 &lt;/STRONG&gt;brings real world intelligence, multilingual understanding, improved instruction following, increased resolution support, and an intelligent routing layer giving developers the tools to scale image generation for production workflows.&lt;/P&gt;
&lt;H2&gt;Real world intelligence&lt;/H2&gt;
&lt;P&gt;GPT-image-2 has a knowledge cutoff of December 2025, meaning that it is able to give you more contextually relevant and accurate outputs. The model also comes with enhanced thinking capabilities that allow it to search the web, check its own outputs, and create multiple images from just one prompt. These enhancements shift image generation models away from being simple tools and turn them into creative sidekicks.&lt;/P&gt;
&lt;H2&gt;Multilingual understanding&lt;/H2&gt;
&lt;P&gt;GPT-image-2 includes increased language support across Japanese, Korean, Chinese, Hindi, and Bengali, as well as new thinking capabilities. This means the model can create images and render text that feels localized.&lt;/P&gt;
&lt;H2&gt;Increased resolution support&lt;/H2&gt;
&lt;P&gt;GPT-image-2 introduces 4K resolution support, giving developers the ability to generate rich, detailed, and photorealistic images at custom dimensions.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Resolution guidelines to keep in mind:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-color-21 lia-border-style-solid" border="1" style="width: 99.7222%; height: 231px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Constraint&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Detail&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Total pixel budget&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Maximum pixels in final image cannot exceed 8,294,400&lt;/P&gt;
&lt;P&gt;Minimum pixels in final image cannot be less than 655,360&lt;/P&gt;
&lt;P&gt;Requests exceeding this are automatically resized to fit.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Resolutions&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;4K, 1024x1024, 1536x1024, and 1024x1536&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Dimension alignment&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Each dimension must be a multiple of 16&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 49.9656%" /&gt;&lt;col style="width: 49.9656%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Note:&lt;/STRONG&gt; If your requested resolution exceeds the pixel budget, the service will automatically resize it down.&lt;/P&gt;
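&lt;P&gt;These constraints are easy to enforce client-side before sending a request. Here is a minimal sketch in Python; the constants come from the table above, and the helper itself is illustrative, not part of any SDK:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Snap requested dimensions to GPT-image-2's constraints (values from the
# table above); an illustrative client-side helper, not an official SDK call
MIN_PIXELS, MAX_PIXELS = 655_360, 8_294_400

def snap_dimensions(width: int, height: int) -&amp;gt; tuple[int, int]:
    # Each dimension must be a multiple of 16
    width, height = (width // 16) * 16, (height // 16) * 16
    pixels = width * height
    if pixels &amp;gt; MAX_PIXELS:    # scale down into the budget, floor to 16
        s = (MAX_PIXELS / pixels) ** 0.5
        width = (int(width * s) // 16) * 16
        height = (int(height * s) // 16) * 16
    elif pixels &amp;lt; MIN_PIXELS:  # scale up to the floor, round up to 16
        s = (MIN_PIXELS / pixels) ** 0.5
        width = ((int(width * s) + 15) // 16) * 16
        height = ((int(height * s) + 15) // 16) * 16
    return width, height

print(snap_dimensions(5000, 3000))  # (3712, 2224): inside the pixel budget&lt;/CODE&gt;&lt;/PRE&gt;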
&lt;H2&gt;Intelligent routing layer&lt;/H2&gt;
&lt;P&gt;GPT-image-2 also includes an expanded routing layer with two distinct modes, allowing the service to intelligently select the right generation configuration for a request without requiring an explicitly set size value.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Mode 1 — Legacy size selection&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;In Mode 1, the routing layer selects one of the three legacy size tiers to use for generation:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-color-21 lia-border-style-solid" border="1" style="width: 1075px; height: 171px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Size tier&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Description&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;smimage&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Small image output&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;image&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Standard image output&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;xlimage&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Large image output&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 537px" /&gt;&lt;col style="width: 537px" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;This mode is useful for teams already familiar with the legacy size tiers who want to benefit from automatic selection without making any manual changes.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Mode 2 — Token size bucket selection&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;In Mode 2, the routing layer selects from &lt;STRONG&gt;six token size buckets&lt;/STRONG&gt; — 16, 24, 36, 48, 64, 96 — which map roughly to the legacy size tiers:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-color-21 lia-border-style-solid" border="1" style="width: 1070px; height: 159px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr style="height: 39.2383px;"&gt;&lt;td class="lia-border-color-21" style="height: 39.2383px;"&gt;
&lt;P&gt;&lt;STRONG&gt;Token bucket&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21" style="height: 39.2383px;"&gt;
&lt;P&gt;&lt;STRONG&gt;Approximate legacy size&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39.2383px;"&gt;&lt;td class="lia-border-color-21" style="height: 39.2383px;"&gt;
&lt;P&gt;&lt;STRONG&gt;16, 24&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21" style="height: 39.2383px;"&gt;
&lt;P&gt;smimage&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39.2383px;"&gt;&lt;td class="lia-border-color-21" style="height: 39.2383px;"&gt;
&lt;P&gt;&lt;STRONG&gt;36, 48&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21" style="height: 39.2383px;"&gt;
&lt;P&gt;image&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 39.2383px;"&gt;&lt;td class="lia-border-color-21" style="height: 39.2383px;"&gt;
&lt;P&gt;&lt;STRONG&gt;64, 96&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21" style="height: 39.2383px;"&gt;
&lt;P&gt;xlimage&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 534px" /&gt;&lt;col style="width: 534px" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;This approach can allow for more flexibility in the number of tokens generated, which in turn helps to better optimize output quality and efficiency for a given prompt.&lt;/P&gt;
&lt;H2&gt;See it in action&lt;/H2&gt;
&lt;P&gt;GPT-image-2 shows improved image fidelity across visual styles, generating more detailed and refined images. But don’t just take our word for it; let's see the model in action with a few prompts and edits. Here is the example we used:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;STRONG&gt;Prompt:&lt;/STRONG&gt; Interior of an empty subway car (no people).&lt;BR /&gt;Wide-angle view looking down the aisle. Clean, modern subway car with seats, poles, route map strip, and ad frames above the windows.&lt;BR /&gt;Realistic lighting with a slight cool fluorescent tone, realistic materials (metal poles, vinyl seats, textured floor).&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-hidden" border="1" style="width: 1074px; height: 484px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr style="height: 489px;"&gt;&lt;td class="lia-align-center" style="height: 489px;"&gt;&lt;img&gt;Figure 1. Created with GPT-image-1&lt;/img&gt;&lt;/td&gt;&lt;td class="lia-align-center" style="height: 489px;"&gt;&lt;img&gt;Figure 2. Created with GPT-image-1.5&lt;/img&gt;&lt;/td&gt;&lt;td class="lia-align-center" style="height: 489px;"&gt;&lt;img&gt;Figure 3. Created with GPT-image-2&lt;/img&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you can see, when using the same base prompt, the image quality and realism improved with each model. Now let’s take a look at adding incremental changes to the same image:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Prompt: &lt;/STRONG&gt;Populate the ad frames with a cohesive ad campaign for&amp;nbsp;“Zava Flower Delivery”&lt;STRONG&gt; &lt;/STRONG&gt;and use an array of flower types.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;img&gt;Figure 4. Created with GPT-image-2&lt;/img&gt;
&lt;P&gt;And our subway is now full of ads for the new Zava flower delivery service. Let's ask for another small change:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Prompt: &lt;/STRONG&gt;In all Zava Flower Delivery advertisements, change the flowers shown to&amp;nbsp;roses (red and pink roses).&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img&gt;Figure 5. Created with GPT-image-2&lt;/img&gt;
&lt;P&gt;And in three simple prompts, we've created a mockup of a flower delivery ad. From marketing material to website creation to UX design, GPT-image-2 now allows developers to deliver production-grade assets for real business use cases.&lt;/P&gt;
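&lt;P&gt;For orientation, here is a minimal sketch of what that generate-then-edit loop can look like from Python. It assumes the standard OpenAI SDK pointed at an Azure/Foundry endpoint; the endpoint, key, API version, deployment name, and size value are placeholders, so check the documentation linked below for the exact parameters GPT-image-2 supports.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import base64
from openai import AzureOpenAI  # standard OpenAI Python SDK

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-KEY",                                       # placeholder
    api_version="2026-04-01-preview",                         # assumption: check docs
)

# 1) Generate the base image (first prompt above)
gen = client.images.generate(
    model="gpt-image-2",  # your deployment name
    prompt="Interior of an empty subway car (no people). Wide-angle view down the aisle...",
    size="1536x1024",     # illustrative size value
)
with open("subway.png", "wb") as f:
    f.write(base64.b64decode(gen.data[0].b64_json))

# 2) Edit the same image (second prompt above)
edit = client.images.edit(
    model="gpt-image-2",
    image=open("subway.png", "rb"),
    prompt='Populate the ad frames with a cohesive ad campaign for "Zava Flower Delivery".',
)
with open("subway_ads.png", "wb") as f:
    f.write(base64.b64decode(edit.data[0].b64_json))&lt;/CODE&gt;&lt;/PRE&gt;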
&lt;H1&gt;Image generation across industries&lt;/H1&gt;
&lt;P&gt;These new capabilities open the door to richer, more production-ready image generation workflows across a range of enterprise scenarios:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Retail &amp;amp; e-commerce:&lt;/STRONG&gt; Generate product imagery at exact platform-required dimensions, from square thumbnails to wide banners, without post-processing.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Marketing:&lt;/STRONG&gt; Produce crisp, richly colored campaign visuals and social assets localized to different markets.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Media &amp;amp; entertainment:&lt;/STRONG&gt; Generate storyboard panels and scenes at resolutions suited to production pipelines.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Education &amp;amp; training:&lt;/STRONG&gt; Create visual learning aids and course materials formatted to exact display requirements across devices.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;UI/UX design:&lt;/STRONG&gt; Accelerate mockup and prototype workflows by generating interface assets at the precise dimensions your design system requires.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H1&gt;Trust and safety&lt;/H1&gt;
&lt;P&gt;At Microsoft, our mission to empower people and organizations remains constant. As part of this commitment, models made available through Foundry undergo internal reviews and are deployed with safeguards designed to support responsible use at scale. &lt;A href="https://techcommunity.microsoft.com/t5/aka/ms/RAI" target="_blank" rel="noopener"&gt;Learn more about responsible AI at Microsoft.&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;For GPT-image-2, Microsoft applied an in-depth safety approach that addresses disallowed content and misuse while maintaining human oversight. The deployment combines OpenAI’s image generation safety mitigations with Azure AI Content Safety, including filters and classifiers for sensitive content.&lt;/P&gt;
&lt;H1&gt;Pricing&lt;/H1&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-color-21 lia-border-style-solid" border="1" style="width: 1063px; height: 180px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Offer type&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Pricing - Image&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;&lt;STRONG&gt;Pricing - Text&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;GPT-image-2&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Standard Global&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Input Tokens: $8&lt;/P&gt;
&lt;P&gt;Cached Input Tokens: $2&lt;/P&gt;
&lt;P&gt;Output Tokens: $30&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Input Tokens: $5&lt;/P&gt;
&lt;P&gt;Cached Input Tokens: $1.25&lt;/P&gt;
&lt;P&gt;Output Tokens: $10&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Note: All prices are per 1M tokens.&lt;/P&gt;
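&lt;P&gt;As a quick, hypothetical cost check against the rates above (the token counts below are invented for illustration), a request whose prompt consumes 1,000 text input tokens and whose image output consumes 10,000 tokens would cost about $0.31:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Per-1M-token rates from the pricing table above
TEXT_INPUT_PER_M = 5.00     # $ per 1M text input tokens
IMAGE_OUTPUT_PER_M = 30.00  # $ per 1M image output tokens

text_in, image_out = 1_000, 10_000  # illustrative token counts
cost = text_in / 1e6 * TEXT_INPUT_PER_M + image_out / 1e6 * IMAGE_OUTPUT_PER_M
print(f"${cost:.4f} per request")  # $0.3050&lt;/CODE&gt;&lt;/PRE&gt;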
&lt;H1&gt;Getting started&lt;/H1&gt;
&lt;P&gt;Whether you’re building a personalized retail experience, automating visual content pipelines, or accelerating design workflows, &lt;STRONG&gt;GPT-image-2&lt;/STRONG&gt; gives your team the resolution control and intelligent routing to generate images that fit your exact needs. Try &lt;STRONG&gt;GPT-image-2&lt;/STRONG&gt; in Microsoft Foundry today!&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;A class="lia-external-url" href="https://ai.azure.com/" target="_blank" rel="noopener"&gt;Deploy the model in Microsoft Foundry&lt;/A&gt;&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;A href="https://ai.azure.com/playground/" target="_blank" rel="noopener"&gt;Experiment with the model in the Image playground&lt;/A&gt;&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/dall-e?tabs=command-line%2Ckeyless%2Ctypescript-keyless&amp;amp;pivots=programming-language-studio" target="_blank" rel="noopener"&gt;Read the documentation to learn more&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 20:52:15 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/introducing-openai-s-gpt-image-2-in-microsoft-foundry/ba-p/4500571</guid>
      <dc:creator>Naomi Moneypenny</dc:creator>
      <dc:date>2026-04-21T20:52:15Z</dc:date>
    </item>
    <item>
      <title>Troubleshooting Microsoft Foundry Accessing On‑Premises APIs Over Private Networking</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/troubleshooting-microsoft-foundry-accessing-on-premises-apis/ba-p/4498549</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Audience:&lt;/STRONG&gt; Azure solution architects, network engineers, and AI practitioners deploying &lt;SPAN data-olk-copy-source="MessageBody"&gt;Microsoft Foundry&lt;/SPAN&gt; in enterprise, network‑isolated environments.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Scenario:&lt;/STRONG&gt; &lt;SPAN data-olk-copy-source="MessageBody"&gt;Foundry Agent Service&lt;/SPAN&gt;&lt;SPAN data-olk-copy-source="MessageBody"&gt;&amp;nbsp;&lt;/SPAN&gt;must call an on‑premises (or privately hosted) API over VPN or ExpressRoute using &lt;STRONG&gt;on‑premises corporate DNS&lt;/STRONG&gt;. Connectivity works from a virtual machine (VM) in the virtual network (VNet), but fails when the same call is made from a Foundry agent.&lt;/P&gt;
&lt;P&gt;This post consolidates common field patterns observed across customer engagements and maps them directly to official Microsoft guidance. It highlights a frequently missed prerequisite for private connectivity: &lt;STRONG&gt;Project and Agent Capability Hosts&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;The Repeating Enterprise Pattern&lt;/H2&gt;
&lt;P&gt;The architecture is familiar:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-olk-copy-source="MessageBody"&gt;Microsoft Foundry&lt;/SPAN&gt; account and project&lt;/LI&gt;
&lt;LI&gt;Customer‑managed VNet with VPN or ExpressRoute to on‑premises&lt;/LI&gt;
&lt;LI&gt;Corporate (on‑premises) DNS used for API name resolution&lt;/LI&gt;
&lt;LI&gt;A VM in the VNet can successfully resolve and call the on‑prem API&lt;/LI&gt;
&lt;LI&gt;Foundry agents are configured to call the same API (often via an OpenAPI tool)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Despite this, the agent fails with one or more of the following:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;DNS resolution failures&lt;/LI&gt;
&lt;LI&gt;Connection timeouts&lt;/LI&gt;
&lt;LI&gt;HTTP 401 or 403 responses&lt;/LI&gt;
&lt;LI&gt;Unexpected backend or proxy URLs appearing in logs&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The consistent question is:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why does this work from a VM in the VNet but not from the Foundry agents?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Key Principle: Foundry Agents Do Not Automatically Run in Your VNet&lt;/H2&gt;
&lt;P&gt;This is the most important mental model to reset.&lt;/P&gt;
&lt;P&gt;Creating a private endpoint for a &lt;SPAN data-olk-copy-source="MessageBody"&gt;Foundry Agent Service&lt;/SPAN&gt;&amp;nbsp;&lt;STRONG&gt;does not place agent runtime traffic into your VNet&lt;/STRONG&gt;. Private endpoints are &lt;STRONG&gt;inbound&lt;/STRONG&gt; constructs. Outbound connectivity from an agent only flows through your VNet when specific requirements are met.&lt;/P&gt;
&lt;P&gt;The most critical of those requirements is a &lt;STRONG&gt;Capability Host&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;What Is a Capability Host?&lt;/H2&gt;
&lt;P&gt;A &lt;STRONG&gt;Capability Host&lt;/STRONG&gt; defines &lt;EM&gt;where&lt;/EM&gt; Foundry capabilities are allowed to execute.&lt;/P&gt;
&lt;P&gt;In private networking scenarios, the capability host:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Binds a Foundry &lt;STRONG&gt;project&lt;/STRONG&gt; or &lt;STRONG&gt;agent&lt;/STRONG&gt; to a customer‑managed subnet&lt;/LI&gt;
&lt;LI&gt;Enables platform‑managed container injection into that subnet&lt;/LI&gt;
&lt;LI&gt;Ensures outbound traffic follows VNet routing, security controls, and DNS configuration&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Capability Host Scope&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Project Capability Host&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Associated at the project level&lt;/LI&gt;
&lt;LI&gt;Applies to all agents in the project&lt;/LI&gt;
&lt;LI&gt;Defines the customer subnet the project is allowed to use&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Agent Capability Host&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Associated at the individual agent level&lt;/LI&gt;
&lt;LI&gt;Can explicitly bind or override subnet placement&lt;/LI&gt;
&lt;LI&gt;Useful when agents require different isolation boundaries&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Key field insight:&lt;/STRONG&gt; If no capability host is associated, the agent runtime is not injected into the VNet—even if VPN, ExpressRoute, private endpoints, and on‑prem DNS are correctly configured.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Capability Host Lifecycle (Conceptual)&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp;&lt;SPAN data-olk-copy-source="MessageBody"&gt;Microsoft Foundry&lt;/SPAN&gt; Account&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ↓&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Project&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ↓ Capability Host&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ↓ Delegated Agent Subnet (VNet)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ↓ Agent Runtime (Container Injection)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Without the capability host step, the chain breaks and the agent executes outside the customer network boundary.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Why On‑Premises DNS Appears Correct—but Still Fails&lt;/H2&gt;
&lt;P&gt;DNS is where most investigations stall.&lt;/P&gt;
&lt;P&gt;Teams typically confirm:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;On‑premises DNS resolves the API hostname&lt;/LI&gt;
&lt;LI&gt;VNet DNS settings forward queries to on‑prem DNS&lt;/LI&gt;
&lt;LI&gt;A VM in the subnet resolves and reaches the API&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Yet the agent still fails.&lt;/P&gt;
&lt;P&gt;The reason is simple:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The VM is unquestionably inside the VNet&lt;/LI&gt;
&lt;LI&gt;The agent may not be&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Without a capability host, the agent runtime does &lt;STRONG&gt;not&lt;/STRONG&gt; inherit:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;VNet DNS server settings&lt;/LI&gt;
&lt;LI&gt;Corporate DNS forwarding rules&lt;/LI&gt;
&lt;LI&gt;On‑premises name resolution paths&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;As a result, DNS fails even though the &lt;STRONG&gt;DNS design itself is correct&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Secondary Symptom: HTTP 401 Errors&lt;/H2&gt;
&lt;P&gt;After subnet injection and DNS are corrected, some customers encounter HTTP 401 responses.&lt;/P&gt;
&lt;P&gt;This typically means:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The API is now reachable&lt;/LI&gt;
&lt;LI&gt;The request is successfully routed through the private path&lt;/LI&gt;
&lt;LI&gt;Authentication or authorization is failing&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;At this stage, troubleshooting moves from networking to identity:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Validate the credential or token configured in the Foundry connection&lt;/LI&gt;
&lt;LI&gt;Confirm expected headers, audience, or auth flow&lt;/LI&gt;
&lt;LI&gt;Account for managed proxy hops in the request path&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;A 401 at this point is progress—it confirms private connectivity is working.&lt;/P&gt;
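&lt;P&gt;A quick first step is to inspect (without verifying) the token presented on the failing request and compare its audience and issuer with what the on-prem API expects. A minimal sketch using the PyJWT package; the token value is a placeholder for whatever you captured from the request path:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import jwt  # pip install PyJWT

raw_token = "eyJ..."  # placeholder: token captured from the failing request

# Decode claims only; signature verification is deliberately skipped here
claims = jwt.decode(raw_token, options={"verify_signature": False})
print("aud:", claims.get("aud"))
print("iss:", claims.get("iss"))
print("appid:", claims.get("appid") or claims.get("azp"))&lt;/CODE&gt;&lt;/PRE&gt;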
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Permissions and Required Services for Capability Hosts&lt;/H2&gt;
&lt;P&gt;Creating capability hosts requires both the correct Azure role-based access control (RBAC) permissions and the presence of specific dependent services. These prerequisites are frequently overlooked and can silently block capability host creation or leave hosts in a failed provisioning state.&lt;/P&gt;
&lt;H3&gt;Required Permissions (RBAC)&lt;/H3&gt;
&lt;P&gt;Microsoft documentation explicitly calls out the following minimum permissions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Contributor&lt;/STRONG&gt; role on the &lt;SPAN data-olk-copy-source="MessageBody"&gt;Microsoft Foundry&lt;/SPAN&gt; account to create capability hosts.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;User Access Administrator&lt;/STRONG&gt; or &lt;STRONG&gt;Owner&lt;/STRONG&gt; role on the subscription or resource group to grant the Foundry project’s managed identity access to dependent Azure resources when using standard agent setup.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Without these roles, capability host creation may fail, or the host may be created without access to required downstream services. For details, see Role-based access control (RBAC) for Microsoft Foundry.&lt;/P&gt;
&lt;H3&gt;Required Azure Services and Connections&lt;/H3&gt;
&lt;P&gt;For standard and network-secured agent setups, capability hosts reference customer-owned Azure resources. The following services must exist and be connected to the Foundry project &lt;STRONG&gt;before&lt;/STRONG&gt; creating the capability host:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Storage&lt;/STRONG&gt; – for file uploads and artifacts&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure AI Search&lt;/STRONG&gt; – for vector stores and retrieval&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Cosmos DB&lt;/STRONG&gt; – for thread and conversation storage&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure AI Services / Azure OpenAI&lt;/STRONG&gt; – for model execution&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These services must be deployed in supported regions and, for private networking scenarios, in the same region as the virtual network. Capability hosts reference these resources through project connections, not raw resource IDs.&lt;/P&gt;
&lt;H3&gt;Networking-Specific Requirements&lt;/H3&gt;
&lt;P&gt;When using private networking:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The &lt;STRONG&gt;agent subnet&lt;/STRONG&gt; must already exist and be delegated appropriately.&lt;/LI&gt;
&lt;LI&gt;The capability host must reference the correct &lt;STRONG&gt;customer subnet&lt;/STRONG&gt; at creation time.&lt;/LI&gt;
&lt;LI&gt;Required private endpoints and DNS resolution must be in place for dependent services.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If networking or connections change, capability hosts cannot be updated in-place and must be deleted and recreated with the corrected configuration.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Practical Fix Patterns&lt;/H2&gt;
&lt;H3&gt;1. Create and Associate a Project Capability Host&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Bind the project to the intended delegated agent subnet&lt;/LI&gt;
&lt;LI&gt;Verify the customerSubnet reference (see the sketch after this list)&lt;/LI&gt;
&lt;LI&gt;Redeploy the agent after association&lt;/LI&gt;
&lt;/UL&gt;
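&lt;P&gt;For orientation, here is a hedged sketch of that association step as an ARM call. The resource path and the customerSubnet property follow the private-networking setup at a high level, but the api-version and exact schema below are assumptions; verify them against the Microsoft Learn article in the references before use.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import requests
from azure.identity import DefaultAzureCredential

SUB = "00000000-0000-0000-0000-000000000000"  # placeholder subscription id
RG, ACCOUNT, PROJECT = "my-rg", "my-foundry", "my-project"

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = (
    f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
    f"/providers/Microsoft.CognitiveServices/accounts/{ACCOUNT}"
    f"/projects/{PROJECT}/capabilityHosts/caphost1"
)
body = {
    "properties": {
        "capabilityHostKind": "Agents",
        # The delegated agent subnet the project should be bound to
        "customerSubnet": (
            f"/subscriptions/{SUB}/resourceGroups/{RG}/providers"
            "/Microsoft.Network/virtualNetworks/my-vnet/subnets/agent-subnet"
        ),
    }
}
resp = requests.put(
    url,
    params={"api-version": "2025-04-01-preview"},  # assumption: check the docs
    headers={"Authorization": f"Bearer {token}"},
    json=body,
)
resp.raise_for_status()&lt;/CODE&gt;&lt;/PRE&gt;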
&lt;P&gt;This aligns directly with the &lt;STRONG&gt;Standard Setup with Private Networking&lt;/STRONG&gt; model documented by Microsoft.&lt;/P&gt;
&lt;H3&gt;2. Validate Agent Placement and Network Inheritance&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Confirm the agent is associated with the expected capability host&lt;/LI&gt;
&lt;LI&gt;Verify the capability host references the correct subnet&lt;/LI&gt;
&lt;LI&gt;Ensure network and DNS settings are applied at the subnet level&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The agent inherits routing and DNS behavior only after successful subnet injection.&lt;/P&gt;
&lt;H3&gt;3. Validate DNS From the Agent Subnet&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Confirm VNet DNS settings point to on‑prem DNS&lt;/LI&gt;
&lt;LI&gt;Test name resolution from a VM in the &lt;STRONG&gt;same subnet&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Once injected, the agent uses the same DNS behavior as other resources in that subnet.&lt;/P&gt;
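&lt;P&gt;A simple way to run that check is a short script on a VM in the same delegated subnet (the hostname below is a placeholder). If this resolves from the VM but the agent still fails, revisit the capability host association rather than DNS:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import socket

api_host = "api.corp.contoso.local"  # placeholder: your on-prem API hostname

try:
    infos = socket.getaddrinfo(api_host, 443)
    print("resolved:", sorted({info[4][0] for info in infos}))
except socket.gaierror as exc:
    print("resolution failed:", exc)&lt;/CODE&gt;&lt;/PRE&gt;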
&lt;H3&gt;4. Use a Supported Foundry Experience&lt;/H3&gt;
&lt;P&gt;Be aware of documented constraints:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;End‑to‑end network isolation is &lt;STRONG&gt;not supported&lt;/STRONG&gt; in the newer Foundry portal experience&lt;/LI&gt;
&lt;LI&gt;Network‑isolated agent scenarios require the &lt;STRONG&gt;classic Foundry experience&lt;/STRONG&gt;, SDK, or CLI&lt;/LI&gt;
&lt;LI&gt;Hosted agents do not support full isolation&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Mismatch here can make a correct network design appear broken.&lt;/P&gt;
&lt;H2&gt;A Simple Checklist&lt;/H2&gt;
&lt;P&gt;When a Foundry agent cannot reach an on‑prem API:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Is a &lt;STRONG&gt;Project Capability Host&lt;/STRONG&gt; associated?&lt;/LI&gt;
&lt;LI&gt;Is the capability host bound to the correct subnet?&lt;/LI&gt;
&lt;LI&gt;Is on‑prem DNS reachable from that subnet?&lt;/LI&gt;
&lt;LI&gt;Is a supported Foundry experience in use?&lt;/LI&gt;
&lt;LI&gt;If reachable, is authentication configured correctly?&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;If items 1 and 2 are not satisfied, all other troubleshooting is premature.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Closing&lt;/H2&gt;
&lt;P&gt;Most private networking issues with &lt;SPAN data-olk-copy-source="MessageBody"&gt;Microsoft Foundry&lt;/SPAN&gt; are not caused by VPNs or DNS infrastructure. They result from an incomplete understanding of &lt;STRONG&gt;where the agent runtime actually executes&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Capability Hosts are the control point. When they are correctly configured, Foundry agents behave exactly as described in Microsoft guidance: they inherit VNet routing, DNS, and security controls and can securely access on‑premises systems over private connectivity.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;No capability host = no VNet injection = no on‑prem connectivity.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;References&lt;/H2&gt;
&lt;P&gt;The following Microsoft‑published resources were referenced and aligned with throughout this article:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Network‑secured agent setup (GitHub)&lt;/STRONG&gt; – Reference implementation demonstrating a network‑secured Foundry Agent Service deployment with a customer‑managed virtual network, delegated agent subnet, and private connectivity patterns. This notebook illustrates how agent runtimes inherit network behavior only after subnet injection: &lt;A href="https://github.com/ralacher/network-secured-agent" target="_blank" rel="noopener"&gt;ralacher/network-secured-agent&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/foundry/agents/how-to/virtual-networks?view=foundry-classic" target="_blank" rel="noopener"&gt;Set up private networking for Foundry Agent Service - Microsoft Foundry | Microsoft Learn&lt;/A&gt;&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Mon, 20 Apr 2026 13:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/troubleshooting-microsoft-foundry-accessing-on-premises-apis/ba-p/4498549</guid>
      <dc:creator>pankajag</dc:creator>
      <dc:date>2026-04-20T13:00:00Z</dc:date>
    </item>
    <item>
      <title>Need Guidance on cost breakdown of Microsoft Foundry Agent portal I created</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/need-guidance-on-cost-breakdown-of-microsoft-foundry-agent/m-p/4512815#M1449</link>
      <description>&lt;P&gt;I have developed a complaint handling portal for customers and employees using &lt;STRONG&gt;Azure AI Foundry&lt;/STRONG&gt;. The solution is built with Foundry agents, models from the catalog, input/output caching, agent logging/tracing, and other Foundry capabilities. The frontend and orchestration layer are deployed on &lt;STRONG&gt;Azure Container Apps&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;While Azure Cost Analysis provides an overview of spending, several parts remain unclear or act as a black box for accurate estimation, including:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Token consumption assumptions (input/output tokens across different models and agents)&lt;/LI&gt;&lt;LI&gt;User concurrency, sessions, and behavior patterns&lt;/LI&gt;&lt;LI&gt;Agent logging and observability costs&lt;/LI&gt;&lt;LI&gt;Impact of input/output caching&lt;/LI&gt;&lt;LI&gt;Detailed resource consumption and billing in Azure Container Apps&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What is the best way to accurately calculate or estimate the total running cost for such an Azure AI Foundry-based platform with Container Apps frontend?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Are there official Microsoft documentation, pricing guides, or reference architectures for cost breakdown? How do companies typically present costs for such AI platforms to attract customers (e.g., TCO models or per-user pricing)? I want to know how the platform costs are shown to customers.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 07:20:01 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/need-guidance-on-cost-breakdown-of-microsoft-foundry-agent/m-p/4512815#M1449</guid>
      <dc:creator>Tasmia_Monzoor</dc:creator>
      <dc:date>2026-04-20T07:20:01Z</dc:date>
    </item>
    <item>
      <title>Unable to add SharePoint site as a tool in Foundry Agent (403 – User does not have valid license)</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/unable-to-add-sharepoint-site-as-a-tool-in-foundry-agent-403/m-p/4512460#M1448</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I’m very new to&amp;nbsp;&lt;STRONG&gt;Foundry&lt;/STRONG&gt; and I’m trying to add a&amp;nbsp;&lt;STRONG&gt;SharePoint site as a tool&lt;/STRONG&gt; (SharePoint grounding) in a &lt;STRONG&gt;Foundry Agent&lt;/STRONG&gt;, but it fails with:&lt;/P&gt;&lt;P&gt;HTTP 403 – Forbidden Authorization Failed – User does not have valid license Tool: sharepoint_grounding&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Error{"error": "Tool_User_Error", "message": "[Sharepoint-tool] Request to Graph API failed with HTTP status 403, error-code: Forbidden and error-message: Authorization Failed - User does not have valid license. Client Request Id: 0000000000000000000000. Find out more troubleshooting details here - https://aka.ms/foundrysharepointtroubleshooting", "code": "sharepoint_grounding_tool_user_error", "tool": "sharepoint_grounding", "allow_retry": false, "extra_info": null}&lt;/P&gt;&lt;P&gt;Azure roles, Graph permissions, and SharePoint access are all correctly configured (Owner, Azure AI Admin/Developer/User), and the SharePoint site is accessible outside Foundry. Despite this, Foundry blocks the tool with a license error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help or guidance would be very much appreciated.&lt;BR /&gt;regards&lt;BR /&gt;Angela&lt;/P&gt;</description>
      <pubDate>Fri, 17 Apr 2026 13:45:11 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/unable-to-add-sharepoint-site-as-a-tool-in-foundry-agent-403/m-p/4512460#M1448</guid>
      <dc:creator>Angela2</dc:creator>
      <dc:date>2026-04-17T13:45:11Z</dc:date>
    </item>
    <item>
      <title>Claude Opus 4.7 is available on Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/claude-opus-4-7-is-available-on-microsoft-foundry/ba-p/4511759</link>
      <description>&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Anthropic continues to push the frontier of AI for real-world, production work with its new model&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Claude Opus 4.7.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Opus 4.7 is Anthropic’s most advanced generally available model.&amp;nbsp; Teams&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt; building on Azure can access &lt;A class="lia-external-url" href="https://ai.azure.com/catalog/models/claude-opus-4-7" target="_blank" rel="noopener"&gt;Claude Opus 4.7 through Microsoft Foundry &lt;/A&gt;&amp;nbsp;, today. This model is d&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;esigned for the&amp;nbsp;workflows&amp;nbsp;enterprises&amp;nbsp;run&amp;nbsp;and delivers meaningful gains across&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;agentic coding, long-running autonomous tasks, and professional work&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;For organizations already building with Claude Opus 4.6, Opus 4.7&amp;nbsp;represents&amp;nbsp;a powerful upgrade bringing stronger instruction following, better vision and office outputs, and improved memory capabilities to enable enhanced performance across complex, multi-step&amp;nbsp;workflows..&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:240,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;What Microsoft Foundry brings to the table&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Microsoft Foundry is Microsoft's unified platform for building, deploying, and governing AI applications at scale. It gives enterprise teams a single control plane for model selection, data connections,&amp;nbsp;observability&amp;nbsp;and access controls all backed by Azure's global infrastructure and security layer.&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Bringing Claude Opus 4.7 into that environment means enterprises get frontier model capability without&amp;nbsp;compromising&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;governance&amp;nbsp;properties their security and legal teams&amp;nbsp;require.&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Developers call Claude Opus 4.7 through Foundry's standard&amp;nbsp;model&amp;nbsp;APIs, which means existing tool chains, SDKs, and prompt harnesses require minimal changes. For teams already running Opus 4.6 in Microsoft Foundry, this is a direct upgrade path.&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;What Opus 4.7 adds for enterprise workloads&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:480,&amp;quot;335559739&amp;quot;:140}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;These improvements land directly in production workflows. Enterprises running Opus 4.6 today should see measurable gains in long-running tasks, coding,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;visual capabilities, instruction following, and reasoning. .&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;The model reasons through underspecified requests, making sensible assumptions and stating them clearly rather than stalling on clarifying questions&amp;nbsp; a practical improvement for any workflow that touches real-world data.&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp; Check out &lt;A class="lia-external-url" href="https://www.anthropic.com/news/claude-opus-4-7" target="_blank" rel="noopener"&gt;Anthropic blog &lt;/A&gt;for benchmark table&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:15594741,&amp;quot;335559739&amp;quot;:360,&amp;quot;335572083&amp;quot;:13,&amp;quot;335572084&amp;quot;:0,&amp;quot;335572085&amp;quot;:14514047,&amp;quot;469789810&amp;quot;:&amp;quot;single&amp;quot;}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Built-in enterprise governance&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:480,&amp;quot;335559739&amp;quot;:140}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Running Opus 4.7 through Microsoft Foundry means every inference request inherits Foundry's enterprise controls: Azure Active Directory for identity and access, private networking via&amp;nbsp;VNet&amp;nbsp;and private endpoints,&amp;nbsp;logging&amp;nbsp;and audit trails through Azure Monitor.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Teams&amp;nbsp;don't&amp;nbsp;need to build a separate governance layer around the model. Foundry handles it, and security teams already know how to audit it.&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Key use cases by vertical&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559738&amp;quot;:480,&amp;quot;335559739&amp;quot;:140}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Getting started&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:160}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;&lt;A class="lia-external-url" href="https://ai.azure.com/catalog/models/claude-opus-4-7" target="_blank" rel="noopener"&gt;Claude Opus 4.7 is available today in Microsoft Foundry's model catalog&lt;/A&gt;. Teams already using Anthropic models on Microsoft Foundry can upgrade through the catalog with no infrastructure changes. New teams can deploy through the standard Foundry provisioning flow.&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Claude Code is available via the Anthropic API routed through Microsoft Foundry; teams can point their existing Claude Code installations at their Foundry endpoint with a single configuration change. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;GitHub Copilot CLI with Claude is also available to GitHub Copilot Enterprise subscribers today.&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt; &lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:240}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{&amp;quot;335557856&amp;quot;:16777215,&amp;quot;335559739&amp;quot;:240}"&gt;&lt;SPAN data-contrast="none"&gt;Anthropic has provided a &lt;A class="lia-external-url" href="https://platform.claude.com/docs/en/about-claude/models/migration-guide#migrating-to-claude-opus-4-7" target="_blank" rel="noopener"&gt;migration guide &lt;/A&gt;covering the key differences between Opus 4.6 and Opus 4.7 to help teams get the most out of the new model with minimal harness changes.&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2026 16:20:43 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/claude-opus-4-7-is-available-on-microsoft-foundry/ba-p/4511759</guid>
      <dc:creator>amar_badal</dc:creator>
      <dc:date>2026-04-16T16:20:43Z</dc:date>
    </item>
    <item>
      <title>Introducing MAI-Image-2-Efficient: Faster, More Efficient Image Generation</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/introducing-mai-image-2-efficient-faster-more-efficient-image/ba-p/4510918</link>
      <description>&lt;H4&gt;&lt;STRONG&gt;Building on our momentum&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Just last week, we celebrated a major milestone: the public preview launch of three new first-party Microsoft AI models in Microsoft Foundry: &lt;STRONG&gt;MAI-Image-2&lt;/STRONG&gt;, &lt;STRONG&gt;MAI-Voice-1&lt;/STRONG&gt;, and &lt;STRONG&gt;MAI-Transcribe-1&lt;/STRONG&gt;. Together, they represent a comprehensive multimedia AI stack purpose-built for developers that spans image generation, natural speech synthesis, and enterprise-grade transcription across 25 languages.&lt;/P&gt;
&lt;P&gt;The response from the developer community has been incredible, and we're not slowing down.&lt;/P&gt;
&lt;P&gt;Fast on the heels of that launch, we're thrilled to introduce the next addition to the MAI image generation family: &lt;STRONG&gt;MAI-Image-2-Efficient – or Image-2e for short. &lt;/STRONG&gt;It’s now available in public preview in &lt;A class="lia-external-url" href="https://aka.ms/mai-image-2e-foundrycard" target="_blank" rel="noopener"&gt;Microsoft Foundry&lt;/A&gt; and &lt;A href="https://msi-playground.microsoft.com/chat" target="_blank" rel="noopener"&gt;MAI Playground&lt;/A&gt;.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;What makes MAI-Image-2-Efficient unique?&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;MAI-Image-2-Efficient is built on the same architecture as MAI-Image-2, the model that debuted at #3 on the Arena.ai leaderboard for image model families. Based on customer feedback, we’ve now improved it and engineered it for speed and efficiency.&lt;/P&gt;
&lt;P&gt;It’s up to 22%&amp;nbsp;faster&amp;nbsp;with&amp;nbsp;4x more efficiency&amp;nbsp;compared to MAI-Image-2 when&amp;nbsp;normalized by latency and GPU usage&lt;SUP&gt;1&lt;/SUP&gt;.&amp;nbsp;It also outpaces leading text-to-image models by 40% on average&lt;SUP&gt;2&lt;/SUP&gt;.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;In short, MAI-Image-2-Efficient gives developers more output for less compute&lt;/STRONG&gt;, unlocking a whole new category of use cases.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Who is MAI-Image-2-Efficient for?&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;MAI-Image-2-Efficient is designed for builders who need &lt;STRONG&gt;high-quality image generation at speed and scale&lt;/STRONG&gt;. Here are the top use cases where Image-2-Efficient shines:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;High-volume production workflows&lt;/STRONG&gt;:&lt;STRONG&gt; &lt;/STRONG&gt;E-commerce platforms, media companies, and marketing teams often need to generate thousands of images per day for targeted advertisements, concept art, and mood boards. MAI-Image-2-Efficient's superior efficiency means larger batches at lower GPU cost, so your team can think and iterate as fast as you want and reach the end product faster.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Real-time and conversational experiences: &lt;/STRONG&gt;When users expect images to appear mid-conversation (in a chatbot, a creative copilot, or an AI-powered design tool), every&amp;nbsp;millisecond counts. Thanks to its lower latency, MAI-Image-2-Efficient serves as an excellent backbone for interactive applications that require fast response times.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Rapid prototyping and creative iteration:&lt;/STRONG&gt; MAI-Image-2-Efficient enables your team to quickly and affordably test new pipelines, experiment with creative ideas, or refine prompts. You don't need the complete model to validate a concept; what you need is speed, and that's exactly what MAI-Image-2-Efficient provides.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;&lt;STRONG&gt;MAI-Image-2 vs. MAI-Image-2-Efficient — which should you use?&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;MAI-Image-2-Efficient and MAI-Image-2 are built for different strengths, so choosing the right model depends on the needs of your workflow.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;MAI-Image-2-Efficient&lt;/STRONG&gt; is the ideal choice for high-volume workflows where latency and speed are priorities. If your pipeline needs to generate images quickly and at scale, MAI-Image-2-Efficient delivers without compromise.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;MAI-Image-2&lt;/STRONG&gt; is the recommended&amp;nbsp;option&amp;nbsp;when your images require precise, detailed text rendering, or when scenes demand the deepest photorealistic contrast and smoothness.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The two models also have distinct visual signatures:&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;MAI-Image-2-Efficient renders with sharpness and defined lines, making it a strong choice for illustration, animation, and photoreal images designed to grab attention.&lt;/LI&gt;
&lt;LI&gt;MAI-Image-2 delivers smoother, more nuanced contrast, making it the go-to for photorealistic imagery that prioritizes depth and subtlety.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;&lt;STRONG&gt;Try it today&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;MAI-Image-2-Efficient is available now in &lt;A class="lia-external-url" href="https://aka.ms/mai-image-2e-foundrycard" target="_blank" rel="noopener"&gt;Microsoft Foundry&lt;/A&gt; and &lt;A href="https://msi-playground.microsoft.com/chat" target="_blank" rel="noopener"&gt;MAI Playground&lt;/A&gt;. For builders in Foundry, MAI-Image-2-Efficient starts at $5 USD per 1M tokens for text input and $19.50 USD per 1M tokens for image output.&lt;/P&gt;
&lt;P&gt;And this is just the beginning. We have more exciting announcements lined up; stay tuned for what we're bringing to &lt;STRONG&gt;Microsoft Build&lt;/STRONG&gt; &lt;STRONG&gt;2026&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;References:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;EM&gt;As tested on April 13, 2026. Compared to MAI-Image-2 when normalized by latency and GPU usage. Throughput per GPU vs MAI-Image-2 on NVIDIA H100 at 1024×1024; measured with optimized batch sizes and matched latency targets. Results vary with batch size, concurrency, and latency constraints.&lt;/EM&gt;&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;As tested on April 13, 2026. Compared to Gemini 3.1 Flash (high reasoning), Gemini 3.1 Flash Image and Gemini 3 Pro Image: Measured at p50 latency via AI Studio API (1:1, 1K images; minimal reasoning unless noted; web search disabled). MAI-Image-2, MAI-Image-2-Efficient, GPT-Image-1.5-High: Measured at p50 latency via Foundry API.&amp;nbsp;&lt;/EM&gt;&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
</description>
      <pubDate>Tue, 14 Apr 2026 16:00:17 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/introducing-mai-image-2-efficient-faster-more-efficient-image/ba-p/4510918</guid>
      <dc:creator>Naomi Moneypenny</dc:creator>
      <dc:date>2026-04-14T16:00:17Z</dc:date>
    </item>
    <item>
      <title>Gemma 4 now available in Microsoft Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/gemma-4-now-available-in-microsoft-foundry/ba-p/4510984</link>
      <description>&lt;P&gt;Experimenting with open-source models has become a core part of how innovative AI teams stay competitive: experimenting with the latest architectures and often fine-tuning on proprietary data to achieve lower latencies and cost.&lt;/P&gt;
&lt;P&gt;Today, we’re happy to announce that the &lt;STRONG&gt;Gemma 4 family&lt;/STRONG&gt;, Google DeepMind’s newest model family, is now available in &lt;A href="https://techcommunity.microsoft.com/t5/ai.azure.com" target="_blank" rel="noopener"&gt;Microsoft Foundry&lt;/A&gt; via the &lt;A href="https://aka.ms/hf/foundry-models" target="_blank" rel="noopener"&gt;Hugging Face collection.&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Azure customers can now discover, evaluate, and deploy Gemma 4 &lt;STRONG&gt;inside their Azure environment&lt;/STRONG&gt; with the same policies they rely on for every other workload. Foundry is the only hyperscaler platform where developers can access OpenAI, Anthropic, Gemma, and more than 11,000 other models under a single control plane. Through our close collaboration with&lt;STRONG&gt; Hugging Face,&lt;/STRONG&gt; Gemma 4 joining that collection continues Microsoft’s push to bring customers the widest selection of models from any cloud, and fits in line with our enhanced investments in open-source development.&lt;/P&gt;
&lt;H1&gt;&lt;STRONG&gt;Frontier Intelligence, open-source weights&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P&gt;Released by Google DeepMind on April 2, 2026, Gemma 4 is built from the same research foundation as Gemini 3 and packaged as open weights under an &lt;STRONG&gt;Apache 2.0 license&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Key capabilities across the Gemma 4 family:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Native multimodal:&lt;/STRONG&gt; Text + image + video inputs across all sizes; analyze video by processing sequences of frames; audio input on edge models (E2B, E4B)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Enhanced reasoning &amp;amp; coding capabilities:&lt;/STRONG&gt; Multi-step planning, deep logic, and improvements in math and instruction-following enabling autonomous agents&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Trained for global deployment:&lt;/STRONG&gt; Pretrained on 140+ languages with support for 35+ languages out of the box&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Long context:&lt;/STRONG&gt; Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B) allow developers to reason across extensive codebases, lengthy documents, or multi-session histories&lt;/LI&gt;
&lt;/UL&gt;
&lt;H1&gt;&lt;STRONG&gt;Why choose Foundry?&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P&gt;Foundry is built to give developers breadth: access to models from major model providers, open and proprietary, under one roof.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Stay within Azure to work with leading models.&lt;/STRONG&gt; When you deploy through Foundry, models run inside your Azure environment and are subject to the same network policies, identity controls, and audit processes your organization already has in place.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Managed online endpoints &lt;/STRONG&gt;handle serving, scaling, and monitoring without manually setting up and managing the underlying infrastructure (see the sketch after this list).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Serverless deployment with Azure Container Apps &lt;/STRONG&gt;allows developers to deploy and run containerized applications while reducing infrastructure management and saving costs.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Gated model access&lt;/STRONG&gt; integrates directly with Hugging Face user tokens, so models that require license acceptance stay compliant and can be accessed without manual approvals.&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://www.foundrylocal.ai/" target="_blank" rel="noopener"&gt;Foundry Local&lt;/A&gt; lets you run optimized Hugging Face models directly on your own hardware using the same model catalog and SDK patterns as your cloud deployments. Read the documentation here: &lt;A class="lia-external-url" href="https://aka.ms/foundrylocal" target="_blank" rel="noopener"&gt;https://aka.ms/foundrylocal&lt;/A&gt; and &lt;A class="lia-external-url" href="https://aka.ms/HF/foundrylocal" target="_blank" rel="noopener"&gt;https://aka.ms/HF/foundrylocal&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Microsoft’s approach to&amp;nbsp;&lt;A class="lia-external-url" href="https://aka.ms/RAI" target="_blank" rel="noopener"&gt;Responsible AI&lt;/A&gt;&lt;/STRONG&gt; is grounded in our AI principles of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy new models responsibly in production environments.&lt;/LI&gt;
&lt;/UL&gt;
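&lt;P&gt;To show what calling a deployed model looks like, here is a minimal sketch using the azure-ai-inference Python package against a chat-capable Gemma 4 deployment. The endpoint URL, key, and deployment name are placeholders; depending on how you deploy, Microsoft Entra ID credentials can be used instead of a key.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders: substitute your Foundry endpoint and key
client = ChatCompletionsClient(
    endpoint="https://YOUR-ENDPOINT.inference.ai.azure.com",
    credential=AzureKeyCredential("YOUR-KEY"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a concise multilingual assistant."),
        UserMessage(content="Summarize this contract clause in Portuguese."),
    ],
    model="gemma-4",  # illustrative deployment name
)
print(response.choices[0].message.content)&lt;/CODE&gt;&lt;/PRE&gt;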
&lt;H1&gt;&lt;STRONG&gt;What are teams building with Gemma 4 in Foundry&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P&gt;Gemma 4’s combination of multimodal input, agentic function calling, and long context offers a wide range of production use cases:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Document intelligence:&lt;/STRONG&gt; Processing PDFs, charts, invoices, and complex tables using native vision capabilities&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Multilingual enterprise apps:&lt;/STRONG&gt; 140+ natively trained languages, ideal for multinational customer support and content platforms, as well as language-learning tools for grammar correction and writing practice&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Long-context analytics:&lt;/STRONG&gt; Reasoning across entire codebases, legal documents, or multi-session conversation histories&lt;/LI&gt;
&lt;/UL&gt;
&lt;H1&gt;&lt;STRONG&gt;Getting started&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P&gt;Try &lt;A class="lia-external-url" href="https://aka.ms/hf/foundry-models" target="_blank" rel="noopener"&gt;Gemma 4&lt;/A&gt; in &lt;A class="lia-external-url" href="http://ai.azure.com" target="_blank" rel="noopener"&gt;Microsoft Foundry&lt;/A&gt; today. New models from Hugging Face continue to roll out to Foundry on a regular basis through our ongoing collaboration. If there's a model you want to see added, &lt;A class="lia-external-url" href="https://github.com/huggingface/Microsoft-Azure/issues/new" target="_blank" rel="noopener"&gt;let us know here&lt;/A&gt;. Stay connected to our developer community on &lt;A class="lia-external-url" href="https://discord.com/invite/microsoftfoundry" target="_blank" rel="noopener"&gt;Discord&lt;/A&gt; and stay up to date on what is new in Foundry through the &lt;A class="lia-external-url" href="https://aka.ms/hf/model-mondays" target="_blank" rel="noopener"&gt;Model Mondays series&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2026 16:05:26 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/gemma-4-now-available-in-microsoft-foundry/ba-p/4510984</guid>
      <dc:creator>vaidyas</dc:creator>
      <dc:date>2026-04-14T16:05:26Z</dc:date>
    </item>
    <item>
      <title>Building Real-Time Speech Translation with AI Avatars with Azure Speech Services</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/building-real-time-speech-translation-with-ai-avatars-with-azure/ba-p/4490972</link>
      <description>&lt;ARTICLE&gt;
&lt;H2&gt;Introduction&lt;/H2&gt;
&lt;P&gt;Language barriers remain one of the biggest challenges in communication. Whether you are holding an all-hands meeting for a globally distributed team, consulting with non-native-speaking patients, or teaching students across continents, seamless, real-time translation makes or breaks effective communication. Traditional translation tools feel impersonal and disconnected. Text captions scroll across screens while speakers continue in their native tongue, creating a disjointed experience. What if your audience could see and hear an AI avatar speaking directly to them in their own language, with natural lip-sync and human-like expressions?&lt;/P&gt;
&lt;P&gt;  We can use&lt;STRONG&gt; Azure Speech Translation and Avatar&lt;/STRONG&gt; to address this: a speaker talks in one language, and listeners watch an AI avatar deliver the translated speech in their chosen language. Imagine a CEO in Tokyo delivering a quarterly update. Employees in Munich, São Paulo, and Mumbai each see an AI avatar speaking to them in German, Portuguese, and Hindi respectively—all in real-time, with synchronized lip movements and natural speech patterns. The speaker focuses on their message; the technology handles the rest.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this blog, we will discuss a sample implementation that uses Azure Speech, Translation, and Avatar capabilities.&lt;/P&gt;
&lt;H2&gt;How It Works&lt;/H2&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;📚 &lt;STRONG&gt;Ready to build your own real-time translation avatar application? Grab the complete source code and documentation from GitHub &lt;/STRONG&gt;: &lt;A href="https://github.com/l-sudarsan/avatar-translation" target="_blank" rel="noopener"&gt;github.com/l-sudarsan/avatar-translation&lt;/A&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The application uses a &lt;STRONG&gt;session-based Speaker/Listener architecture&lt;/STRONG&gt; to separate the presenter's control interface from the audience's viewing experience. The speaker can create and configure a session based on requirements.&lt;/P&gt;
&lt;H5&gt;Speaker Mode&lt;/H5&gt;
&lt;P&gt;The speaker interface gives presenters full control over the translation session:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Session Management&lt;/STRONG&gt;: Create sessions and generate shareable listener URLs&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Language Configuration&lt;/STRONG&gt;: Select source language (what you speak) and target language (what listeners hear)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Avatar Selection&lt;/STRONG&gt;: Choose from prebuilt or custom avatars for the translation output&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Real-time Feedback&lt;/STRONG&gt;: View live transcription of your speech and monitor listener count&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;No Avatar Display&lt;/STRONG&gt;: The interface intentionally hides the avatar video/audio to prevent microphone feedback loops&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;Listener Mode&lt;/H5&gt;
&lt;P&gt;The listener interface delivers an immersive, distraction-free viewing experience:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Easy Access&lt;/STRONG&gt;: Join via a simple URL containing the session code (e.g., &lt;CODE&gt;/listener/123456&lt;/CODE&gt;)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Avatar Video&lt;/STRONG&gt;: Watch the AI avatar with synchronized lip movements matching the translated speech&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Translated Audio&lt;/STRONG&gt;: Hear the avatar speak the translation in the target language&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Caption Display&lt;/STRONG&gt;: Read real-time translation text alongside the avatar&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Translation History&lt;/STRONG&gt;: Scroll through all translations from the session&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Data Flow &amp;amp; Solution Components&lt;/H3&gt;
&lt;P&gt;The diagram below shows data flow and how the components interact. The Flask server acts as the central hub, coordinating communication between the speaker's browser, Azure Speech Services, and multiple listener clients.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="architecture-flow"&gt;
&lt;DIV class="flow-box"&gt;
&lt;DIV class="flow-box-title"&gt;&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;HR /&gt;
&lt;H2&gt;Implementation Deep Dive&lt;/H2&gt;
&lt;P&gt;You can check the complete source code in the &lt;A href="https://github.com/l-sudarsan/avatar-translation" target="_blank" rel="noopener"&gt;GitHub repository&lt;/A&gt;.&lt;/P&gt;
&lt;H3&gt;Core Components&lt;/H3&gt;
&lt;P&gt;Five main technical components power the application, each handling a specific part of the translation pipeline.&lt;/P&gt;
&lt;H4&gt;1. Backend: Flask + Socket.IO&lt;/H4&gt;
&lt;P&gt;The server uses &lt;STRONG&gt;Flask&lt;/STRONG&gt; and &lt;STRONG&gt;Flask-SocketIO&lt;/STRONG&gt; with the &lt;STRONG&gt;Eventlet&lt;/STRONG&gt; async worker for WebSocket support. This combination delivers:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;HTTP endpoints&lt;/STRONG&gt; for session management and avatar connection&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;WebSocket rooms&lt;/STRONG&gt; for real-time translation broadcasting&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Session storage&lt;/STRONG&gt; for managing multiple concurrent translation sessions&lt;/LI&gt;
&lt;/UL&gt;
&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN class="code-comment"&gt;# Session structure&lt;/SPAN&gt;
sessions = {
    &lt;SPAN class="code-string"&gt;"123456"&lt;/SPAN&gt;: {
        &lt;SPAN class="code-string"&gt;"name"&lt;/SPAN&gt;: &lt;SPAN class="code-string"&gt;"Q1 Townhall"&lt;/SPAN&gt;,
        &lt;SPAN class="code-string"&gt;"source_language"&lt;/SPAN&gt;: &lt;SPAN class="code-string"&gt;"en-US"&lt;/SPAN&gt;,
        &lt;SPAN class="code-string"&gt;"target_language"&lt;/SPAN&gt;: &lt;SPAN class="code-string"&gt;"ja-JP"&lt;/SPAN&gt;,
        &lt;SPAN class="code-string"&gt;"avatar"&lt;/SPAN&gt;: &lt;SPAN class="code-string"&gt;"lisa"&lt;/SPAN&gt;,
        &lt;SPAN class="code-string"&gt;"listeners"&lt;/SPAN&gt;: &lt;SPAN class="code-keyword"&gt;set&lt;/SPAN&gt;(),
        &lt;SPAN class="code-string"&gt;"is_translating"&lt;/SPAN&gt;: &lt;SPAN class="code-keyword"&gt;False&lt;/SPAN&gt;
    }
}&lt;/CODE&gt;&lt;/PRE&gt;
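&lt;P&gt;To make the room mechanics concrete, here is a minimal sketch of how a listener could be joined to its session's Socket.IO room; the event name and handler below are illustrative, not taken from the repository:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN class="code-comment"&gt;# Minimal sketch (illustrative event and handler names)&lt;/SPAN&gt;
&lt;SPAN class="code-keyword"&gt;from&lt;/SPAN&gt; flask &lt;SPAN class="code-keyword"&gt;import&lt;/SPAN&gt; Flask, request
&lt;SPAN class="code-keyword"&gt;from&lt;/SPAN&gt; flask_socketio &lt;SPAN class="code-keyword"&gt;import&lt;/SPAN&gt; SocketIO, join_room

app = &lt;SPAN class="code-function"&gt;Flask&lt;/SPAN&gt;(__name__)
socketio = &lt;SPAN class="code-function"&gt;SocketIO&lt;/SPAN&gt;(app, async_mode=&lt;SPAN class="code-string"&gt;"eventlet"&lt;/SPAN&gt;)

@socketio.on(&lt;SPAN class="code-string"&gt;"joinSession"&lt;/SPAN&gt;)
&lt;SPAN class="code-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="code-function"&gt;handle_join&lt;/SPAN&gt;(data):
    session_id = data[&lt;SPAN class="code-string"&gt;"sessionId"&lt;/SPAN&gt;]
    &lt;SPAN class="code-function"&gt;join_room&lt;/SPAN&gt;(session_id)  &lt;SPAN class="code-comment"&gt;# one Socket.IO room per session&lt;/SPAN&gt;
    sessions[session_id][&lt;SPAN class="code-string"&gt;"listeners"&lt;/SPAN&gt;].add(request.sid)  &lt;SPAN class="code-comment"&gt;# sessions dict from above&lt;/SPAN&gt;

&lt;SPAN class="code-comment"&gt;# Anything emitted with room=session_id now reaches every listener in that room:&lt;/SPAN&gt;
&lt;SPAN class="code-comment"&gt;# socketio.emit('translationResult', payload, room=session_id)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;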
&lt;H4&gt;2. Audio Streaming: Browser to Server&lt;/H4&gt;
&lt;P&gt;Instead of relying on server-side microphone access, the browser captures audio directly using the &lt;STRONG&gt;Web Audio API&lt;/STRONG&gt;:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN class="code-comment"&gt;// Speaker captures microphone at 16kHz&lt;/SPAN&gt;
&lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; audioContext = &lt;SPAN class="code-keyword"&gt;new&lt;/SPAN&gt; &lt;SPAN class="code-function"&gt;AudioContext&lt;/SPAN&gt;({ sampleRate: &lt;SPAN class="code-string"&gt;16000&lt;/SPAN&gt; });
&lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; mediaStream = &lt;SPAN class="code-keyword"&gt;await&lt;/SPAN&gt; navigator.mediaDevices.&lt;SPAN class="code-function"&gt;getUserMedia&lt;/SPAN&gt;({ audio: &lt;SPAN class="code-keyword"&gt;true&lt;/SPAN&gt; });

&lt;SPAN class="code-comment"&gt;// Route the microphone through a ScriptProcessorNode to read raw PCM frames&lt;/SPAN&gt;
&lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; source = audioContext.&lt;SPAN class="code-function"&gt;createMediaStreamSource&lt;/SPAN&gt;(mediaStream);
&lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; processor = audioContext.&lt;SPAN class="code-function"&gt;createScriptProcessor&lt;/SPAN&gt;(&lt;SPAN class="code-string"&gt;4096&lt;/SPAN&gt;, &lt;SPAN class="code-string"&gt;1&lt;/SPAN&gt;, &lt;SPAN class="code-string"&gt;1&lt;/SPAN&gt;);
source.&lt;SPAN class="code-function"&gt;connect&lt;/SPAN&gt;(processor);
processor.&lt;SPAN class="code-function"&gt;connect&lt;/SPAN&gt;(audioContext.destination);

&lt;SPAN class="code-comment"&gt;// Convert each frame to 16-bit PCM and send via Socket.IO&lt;/SPAN&gt;
processor.onaudioprocess = (event) =&amp;gt; {
    &lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; pcmData = &lt;SPAN class="code-function"&gt;convertToPCM16&lt;/SPAN&gt;(event.inputBuffer);
    socket.&lt;SPAN class="code-function"&gt;emit&lt;/SPAN&gt;(&lt;SPAN class="code-string"&gt;'audioData'&lt;/SPAN&gt;, { sessionId, audioData: pcmData });
};&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This approach works seamlessly across different deployment environments without requiring server microphone permissions.&lt;/P&gt;
&lt;H4&gt;3. Azure Speech Translation&lt;/H4&gt;
&lt;P&gt;The server receives audio chunks and feeds them to Azure's &lt;STRONG&gt;TranslationRecognizer&lt;/STRONG&gt; via a &lt;STRONG&gt;PushAudioInputStream&lt;/STRONG&gt;:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN class="code-keyword"&gt;import&lt;/SPAN&gt; azure.cognitiveservices.speech &lt;SPAN class="code-keyword"&gt;as&lt;/SPAN&gt; speechsdk

&lt;SPAN class="code-comment"&gt;# Configure translation (SPEECH_KEY / SPEECH_REGION come from the .env file)&lt;/SPAN&gt;
translation_config = speechsdk.translation.&lt;SPAN class="code-function"&gt;SpeechTranslationConfig&lt;/SPAN&gt;(
    subscription=SPEECH_KEY,
    region=SPEECH_REGION
)
translation_config.speech_recognition_language = &lt;SPAN class="code-string"&gt;"en-US"&lt;/SPAN&gt;
translation_config.&lt;SPAN class="code-function"&gt;add_target_language&lt;/SPAN&gt;(&lt;SPAN class="code-string"&gt;"ja"&lt;/SPAN&gt;)

&lt;SPAN class="code-comment"&gt;# Push audio stream&lt;/SPAN&gt;
push_stream = speechsdk.audio.&lt;SPAN class="code-function"&gt;PushAudioInputStream&lt;/SPAN&gt;()
audio_config = speechsdk.audio.&lt;SPAN class="code-function"&gt;AudioConfig&lt;/SPAN&gt;(stream=push_stream)

&lt;SPAN class="code-comment"&gt;# Handle recognition results&lt;/SPAN&gt;
&lt;SPAN class="code-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="code-function"&gt;on_recognized&lt;/SPAN&gt;(evt):
    translation = evt.result.translations[&lt;SPAN class="code-string"&gt;"ja"&lt;/SPAN&gt;]
    socketio.&lt;SPAN class="code-function"&gt;emit&lt;/SPAN&gt;(&lt;SPAN class="code-string"&gt;'translationResult'&lt;/SPAN&gt;, {
        &lt;SPAN class="code-string"&gt;'original'&lt;/SPAN&gt;: evt.result.text,
        &lt;SPAN class="code-string"&gt;'translated'&lt;/SPAN&gt;: translation
    }, room=session_id)

&lt;SPAN class="code-comment"&gt;# Wire the recognizer to the push stream and start continuous recognition&lt;/SPAN&gt;
recognizer = speechsdk.translation.&lt;SPAN class="code-function"&gt;TranslationRecognizer&lt;/SPAN&gt;(
    translation_config=translation_config, audio_config=audio_config)
recognizer.recognized.&lt;SPAN class="code-function"&gt;connect&lt;/SPAN&gt;(on_recognized)
recognizer.&lt;SPAN class="code-function"&gt;start_continuous_recognition&lt;/SPAN&gt;()&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H4&gt;4. Avatar Synthesis with WebRTC&lt;/H4&gt;
&lt;P&gt;Each listener establishes a &lt;STRONG&gt;WebRTC&lt;/STRONG&gt; connection to Azure's Avatar Service:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;ICE Token Exchange&lt;/STRONG&gt;: Server provides TURN server credentials&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;SDP Negotiation&lt;/STRONG&gt;: Browser and Azure exchange session descriptions&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Avatar Connection&lt;/STRONG&gt;: Listener sends local SDP offer, receives remote answer&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Video Stream&lt;/STRONG&gt;: Avatar video flows directly to listener via WebRTC&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN class="code-comment"&gt;// Listener connects to avatar&lt;/SPAN&gt;
&lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; peerConnection = &lt;SPAN class="code-keyword"&gt;new&lt;/SPAN&gt; &lt;SPAN class="code-function"&gt;RTCPeerConnection&lt;/SPAN&gt;(iceConfig);
&lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; offer = &lt;SPAN class="code-keyword"&gt;await&lt;/SPAN&gt; peerConnection.&lt;SPAN class="code-function"&gt;createOffer&lt;/SPAN&gt;();
&lt;SPAN class="code-keyword"&gt;await&lt;/SPAN&gt; peerConnection.&lt;SPAN class="code-function"&gt;setLocalDescription&lt;/SPAN&gt;(offer);

&lt;SPAN class="code-comment"&gt;// Send to Azure Avatar Service&lt;/SPAN&gt;
&lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; response = &lt;SPAN class="code-keyword"&gt;await&lt;/SPAN&gt; &lt;SPAN class="code-function"&gt;fetch&lt;/SPAN&gt;(&lt;SPAN class="code-string"&gt;'/api/connectListenerAvatar'&lt;/SPAN&gt;, {
    method: &lt;SPAN class="code-string"&gt;'POST'&lt;/SPAN&gt;,
    headers: { &lt;SPAN class="code-string"&gt;'session-id'&lt;/SPAN&gt;: sessionId },
    body: JSON.&lt;SPAN class="code-function"&gt;stringify&lt;/SPAN&gt;({ sdp: offer.sdp })
});

&lt;SPAN class="code-keyword"&gt;const&lt;/SPAN&gt; { sdp: remoteSdp } = &lt;SPAN class="code-keyword"&gt;await&lt;/SPAN&gt; response.&lt;SPAN class="code-function"&gt;json&lt;/SPAN&gt;();
&lt;SPAN class="code-keyword"&gt;await&lt;/SPAN&gt; peerConnection.&lt;SPAN class="code-function"&gt;setRemoteDescription&lt;/SPAN&gt;({ type: &lt;SPAN class="code-string"&gt;'answer'&lt;/SPAN&gt;, sdp: remoteSdp });&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H4&gt;5. Real-Time Broadcasting&lt;/H4&gt;
&lt;P&gt;When the speaker talks, translations flow to all listeners simultaneously:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[Diagram: a single speaker stream fanned out to every listener in the session]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Each listener maintains their own WebRTC connection to the Avatar Service, ensuring independent video streams while receiving synchronized translation text.&lt;/P&gt;
&lt;/ARTICLE&gt;
&lt;H3&gt;WebRTC Avatar Connection Flow&lt;/H3&gt;
&lt;P&gt;The avatar video streaming uses WebRTC for low-latency delivery. Each listener establishes their own peer connection to Azure's Avatar Service through a multi-step handshake process.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[Sequence diagram: listener, Flask server, and Azure Avatar Service exchanging ICE credentials and SDP offer/answer]&lt;/EM&gt;&lt;/P&gt;
&lt;ARTICLE&gt;
&lt;H3&gt;Key Design Decisions&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Browser audio capture&lt;/STRONG&gt;: Works in any environment without requiring server microphone permissions&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Session-based rooms&lt;/STRONG&gt;: Isolates translation streams and supports multiple concurrent sessions&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Separate speaker/listener UIs&lt;/STRONG&gt;: Prevents audio feedback and optimizes each user's experience&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Socket.IO for broadcasts&lt;/STRONG&gt;: Delivers reliable real-time messaging with automatic reconnection&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;WebRTC for avatar&lt;/STRONG&gt;: Provides low-latency video streaming with peer-to-peer efficiency&lt;/LI&gt;
&lt;/UL&gt;
&lt;/ARTICLE&gt;
&lt;ARTICLE&gt;&lt;HR /&gt;
&lt;H2&gt;Application Areas&lt;/H2&gt;
&lt;P&gt;Real-time speech translation with AI avatars unlocks transformative possibilities across industries. Here are key sectors where this technology drives significant impact.&lt;/P&gt;
&lt;H3&gt;&lt;SPAN class="section-icon"&gt;🏢&lt;/SPAN&gt; Enterprise &amp;amp; Corporate&lt;/H3&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Internal Townhalls &amp;amp; All-Hands Meetings&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Global organizations deliver executive communications where every employee hears the message in their native language—not through subtitles, but through an avatar speaking directly to them.&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Sales Conversations&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Sales teams engage international prospects without language barriers. The avatar builds a more personal connection than text translation while preserving the original speaker's authenticity.&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Training &amp;amp; Onboarding&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Standardized training content reaches employees worldwide, with each viewer experiencing the material in their preferred language through an engaging avatar presenter.&lt;/P&gt;
&lt;/DIV&gt;
&lt;H3&gt;&lt;SPAN class="section-icon"&gt;🏥&lt;/SPAN&gt; Healthcare&lt;/H3&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Patient Communication&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Healthcare providers consult with patients who speak different languages, while the avatar delivers medical information clearly and accurately in the patient's native tongue.&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Telehealth&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Remote healthcare consultations reach non-native speakers effectively, improving health outcomes by ensuring patients fully understand their care instructions.&lt;/P&gt;
&lt;/DIV&gt;
&lt;H3&gt;&lt;SPAN class="section-icon"&gt;🎓&lt;/SPAN&gt; Education&lt;/H3&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Online Learning&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Educational institutions expand their global reach, offering lectures and courses in multiple languages through avatar presenters.&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Interactive Lessons&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Engaging avatar presenters captivate students while delivering content in their native language.&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Museum Tours&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Cultural institutions offer multilingual guided experiences where visitors receive personalized tours in their language of choice.&lt;/P&gt;
&lt;/DIV&gt;
&lt;H3&gt;&lt;SPAN class="section-icon"&gt;📺&lt;/SPAN&gt; Media &amp;amp; Entertainment&lt;/H3&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Broadcasting&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="use-case-title"&gt;News organizations and content creators deliver content to international audiences with localized avatar presenters, keeping viewers engaged while breaking language barriers.&lt;/P&gt;
&lt;/DIV&gt;
&lt;DIV class="use-case-section"&gt;
&lt;P class="use-case-title"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="use-case-title"&gt;&lt;STRONG&gt;Live Events&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Conferences, product launches, and presentations reach global audiences with real-time translated avatar streams for each language group.&lt;/P&gt;
&lt;/DIV&gt;
&lt;HR /&gt;
&lt;H2&gt;Custom Avatars: Your Brand, Your Voice&lt;/H2&gt;
&lt;P&gt;While prebuilt avatars work great for many scenarios, organizations can build &lt;STRONG&gt;custom avatars&lt;/STRONG&gt; that represent their brand identity. This section covers the creation process and important ethical considerations.&lt;/P&gt;
&lt;H3&gt;The Process&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Request Access&lt;/STRONG&gt;: Submit &lt;A href="https://aka.ms/customneural" target="_blank" rel="noopener"&gt;Microsoft's intake form&lt;/A&gt; for custom avatar approval&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Record Training Data&lt;/STRONG&gt;: Capture at least 10 minutes of video featuring your avatar talent&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Obtain Consent&lt;/STRONG&gt;: Record the talent acknowledging use of their likeness&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Train the Model&lt;/STRONG&gt;: Use Microsoft Foundry Portal to train your custom avatar&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Deploy&lt;/STRONG&gt;: Deploy the trained model to your Azure Speech resource&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3&gt;Responsible AI Considerations&lt;/H3&gt;
&lt;P&gt;Building synthetic representations of people carries ethical responsibilities:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Explicit Written Consent&lt;/STRONG&gt;: Always get permission from the talent&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Informed Consent&lt;/STRONG&gt;: Make sure talent understands how the technology works&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Usage Transparency&lt;/STRONG&gt;: Share intended use cases with the talent&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Prohibited Uses&lt;/STRONG&gt;: Never use for deception, misinformation, or impersonation&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Microsoft publishes comprehensive &lt;A href="https://learn.microsoft.com/en-us/azure/ai-foundry/responsible-ai/speech-service/text-to-speech/disclosure-voice-talent" target="_blank" rel="noopener"&gt;Responsible AI guidelines&lt;/A&gt; that you must follow when creating custom avatars.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Getting Started&lt;/H2&gt;
&lt;P&gt;Ready to build your own real-time translation avatar application? Grab the complete source code and documentation from GitHub.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;📚 &lt;STRONG&gt;Full Documentation&lt;/STRONG&gt;: &lt;A href="https://github.com/l-sudarsan/avatar-translation/tree/master/docs" target="_blank" rel="noopener"&gt;github.com/l-sudarsan/avatar-translation/docs&lt;/A&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H3&gt;Prerequisites&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Python 3.8+&lt;/LI&gt;
&lt;LI&gt;Azure Speech Service subscription&lt;/LI&gt;
&lt;LI&gt;Modern browser (Chrome, Edge, Firefox)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Quick Start&lt;/H3&gt;
&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN class="code-comment"&gt;# Clone the repository&lt;/SPAN&gt;
git clone https://github.com/l-sudarsan/avatar-translation.git
cd avatar-translation

&lt;SPAN class="code-comment"&gt;# 1. Create and activate virtual environment&lt;/SPAN&gt;
python -m venv venv
.\venv\Scripts\Activate  &lt;SPAN class="code-comment"&gt;# Windows; on macOS/Linux: source venv/bin/activate&lt;/SPAN&gt;

&lt;SPAN class="code-comment"&gt;# 2. Install dependencies&lt;/SPAN&gt;
pip install -r requirements.txt

&lt;SPAN class="code-comment"&gt;# 3. Configure Azure credentials&lt;/SPAN&gt;
cp .env.example .env
&lt;SPAN class="code-comment"&gt;# Edit .env with your SPEECH_REGION and SPEECH_KEY&lt;/SPAN&gt;

&lt;SPAN class="code-comment"&gt;# 4. Run the application&lt;/SPAN&gt;
python -m flask run --host=0.0.0.0 --port=5000&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;Demo Sequence&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;Open &lt;CODE&gt;http://localhost:5000/speaker&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;Configure session (name, source language, target language, avatar)&lt;/LI&gt;
&lt;LI&gt;Click &lt;STRONG&gt;Create Session&lt;/STRONG&gt; → Copy the listener URL&lt;/LI&gt;
&lt;LI&gt;Open the listener URL in another browser/device&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Wait&lt;/STRONG&gt; for the avatar to connect (video appears)&lt;/LI&gt;
&lt;LI&gt;Start speaking → Listeners see the avatar + translations&lt;/LI&gt;
&lt;/OL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Tip&lt;/STRONG&gt;: For the best demo experience, open the listener URL on a separate device to avoid audio feedback from the avatar's output being picked up by the speaker's microphone.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;HR /&gt;
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;P&gt;Real-time speech translation with AI avatars marks a significant leap forward in cross-language communication. By combining Azure's powerful Speech Translation, Text-to-Speech, and Avatar Synthesis services, you can build experiences that feel personal and engaging—not just functional.&lt;/P&gt;
&lt;P&gt;The future of multilingual communication isn't about reading subtitles. It's about having someone speak directly to you in your language.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Resources&lt;/H2&gt;
&lt;H3&gt;Project Repository&lt;/H3&gt;
&lt;UL class="resources-list"&gt;
&lt;LI&gt;💻 &lt;A href="https://github.com/l-sudarsan/avatar-translation" target="_blank" rel="noopener"&gt;GitHub: avatar-translation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;📚 &lt;A href="https://github.com/l-sudarsan/avatar-translation/tree/master/docs" target="_blank" rel="noopener"&gt;Project Documentation&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Azure Documentation&lt;/H3&gt;
&lt;UL class="resources-list"&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/" target="_blank" rel="noopener"&gt;Azure Speech Service Documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/what-is-text-to-speech-avatar" target="_blank" rel="noopener"&gt;Text-to-Speech Avatar Overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create" target="_blank" rel="noopener"&gt;Custom Avatar Creation Guide&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/ai-foundry/responsible-ai/speech-service/text-to-speech/disclosure-voice-talent" target="_blank" rel="noopener"&gt;Responsible AI for Voice &amp;amp; Avatar&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=speech-translation" target="_blank" rel="noopener"&gt;Language Support&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/ARTICLE&gt;</description>
      <pubDate>Tue, 14 Apr 2026 15:45:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/building-real-time-speech-translation-with-ai-avatars-with-azure/ba-p/4490972</guid>
      <dc:creator>sudarsan</dc:creator>
      <dc:date>2026-04-14T15:45:00Z</dc:date>
    </item>
    <item>
      <title>Simplifying Image Classification with Azure AutoML for Images: A Practical Guide</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/simplifying-image-classification-with-azure-automl-for-images-a/ba-p/4479632</link>
      <description>&lt;H1&gt;&lt;STRONG&gt;1. The Challenge of Traditional Image Classification&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P data-selectable-paragraph=""&gt;Anyone who has worked with computer vision knows the drill: you need to classify images, so you dive into TensorFlow or PyTorch, spend days architecting a convolutional neural network, experiment with dozens of hyperparameters, and hope your model generalizes well. It’s time-consuming, requires deep expertise, and often feels like searching for a needle in a haystack.&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;What if there was a better way?&lt;/P&gt;
&lt;H1&gt;&lt;STRONG&gt;2. Enter Azure AutoML for Images&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P data-selectable-paragraph=""&gt;Azure AutoML for Images is a game-changer in the computer vision space. It’s a feature within Azure Machine Learning that automatically builds high-quality vision models from your image data with minimal code. Think of it as having an experienced ML engineer working alongside you, handling all the heavy lifting while you focus on your business problem.&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;What Makes AutoML for Images Special?&lt;/H2&gt;
&lt;H3 data-selectable-paragraph=""&gt;1. Automatic Model Selection&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;Instead of manually choosing between&amp;nbsp;&lt;STRONG&gt;ResNet, EfficientNet, or dozens of other architectures&lt;/STRONG&gt;,&amp;nbsp;&lt;STRONG&gt;AutoML for Images (Azure ML)&amp;nbsp;&lt;/STRONG&gt;evaluates multiple state-of-the-art deep learning models and selects the best one for your specific dataset. It’s like having access to an entire model zoo with an intelligent curator.&lt;/P&gt;
&lt;H3 data-selectable-paragraph=""&gt;2. Intelligent Hyperparameter Tuning&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;The system doesn’t just pick a model — it optimizes it. Learning rates, batch sizes, augmentation strategies, and more are automatically tuned to squeeze out the best possible performance. What would take weeks of manual experimentation happens in hours.&lt;/P&gt;
&lt;H3 data-selectable-paragraph=""&gt;3. Built-in Best Practices&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;Data preprocessing, augmentation techniques, and training strategies that would require extensive domain knowledge are pre-configured and applied automatically. You get enterprise-grade ML without needing to be an ML expert.&lt;/P&gt;
&lt;H2&gt;Key Capabilities&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;The repository demonstrates several powerful features:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Multi-class and Multi-label Classification:&amp;nbsp;&lt;/STRONG&gt;Whether you need to classify an image into a single category or tag it with multiple labels, AutoML manages both scenarios seamlessly.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Format Flexibility:&lt;/STRONG&gt;&amp;nbsp;Works with standard image formats including JPEG and PNG, making it easy to integrate with existing datasets.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Full Transparency&lt;/STRONG&gt;: Unlike black-box solutions, you maintain complete visibility and control over the training process. You can monitor metrics, understand model decisions, and fine-tune as needed.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Production-Ready Deployment:&lt;/STRONG&gt;&amp;nbsp;Once trained, models can be easily deployed to Azure endpoints, ready to serve predictions at scale.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Real-World Applications&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;The practical applications are vast:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;E-commerce&lt;/STRONG&gt;: Automatically categorize product images for better search and recommendations.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Healthcare&lt;/STRONG&gt;: Classify medical images for diagnostic support.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Manufacturing&lt;/STRONG&gt;: Detect defects in production line images.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Agriculture&lt;/STRONG&gt;: Identify crop diseases or estimate yield from aerial imagery.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Content Moderation:&lt;/STRONG&gt;&amp;nbsp;Automatically flag inappropriate visual content.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H1&gt;&lt;STRONG&gt;3. A Practical Example: Metal Defect Detection&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P data-selectable-paragraph=""&gt;The repository includes a complete end-to-end example of detecting defects in metal surfaces — a critical quality control task in manufacturing. The notebooks demonstrate how to:&lt;/P&gt;
&lt;OL&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Download and organize image&amp;nbsp;&lt;/STRONG&gt;data from sources like Kaggle,&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Create&amp;nbsp;&lt;STRONG&gt;training&amp;nbsp;&lt;/STRONG&gt;and&amp;nbsp;&lt;STRONG&gt;validation splits&amp;nbsp;&lt;/STRONG&gt;with proper directory structure,&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Upload data to Azure ML&lt;/STRONG&gt;&amp;nbsp;as versioned datasets,&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Configure GPU&lt;/STRONG&gt;&amp;nbsp;compute that scales based on demand,&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Train multiple models&lt;/STRONG&gt;&amp;nbsp;with&amp;nbsp;&lt;STRONG&gt;automated hyperparameter tuning,&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Evaluate results&amp;nbsp;&lt;/STRONG&gt;with comprehensive metrics and visualizations,&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Deploy the best model&amp;nbsp;&lt;/STRONG&gt;as a production-ready REST API,&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Export to ONNX for edge&amp;nbsp;&lt;/STRONG&gt;deployment scenarios.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-selectable-paragraph=""&gt;The metal defect use case is particularly instructive because it mirrors real industrial applications where quality control is critical but expertise is scarce. The notebooks show how a small team can build production-grade computer vision systems without a dedicated ML research team.&lt;/P&gt;
&lt;H2 data-selectable-paragraph=""&gt;Getting Started: What You Need&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;The prerequisites are straightforward:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;An Azure subscription (free tier available for experimentation)&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;An Azure Machine Learning workspace&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Python 3.7 or later&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-selectable-paragraph=""&gt;That’s it. No local GPU clusters to configure, no complex deep learning frameworks to master.&lt;/P&gt;
&lt;H2 data-selectable-paragraph=""&gt;Repository Structure&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;The repository is thoughtfully organized into three progressive notebooks:&lt;/P&gt;
&lt;OL&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Downloading images.ipynb&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;Shows how to acquire and prepare image datasets&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Demonstrates proper directory structure for classification tasks&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Includes data exploration and visualization techniques&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-selectable-paragraph=""&gt;&lt;A href="https://github.com/retkowsky/image-classification-azure-automl-for-images/blob/main/1.%20Downloading%20images.ipynb" target="_blank" rel="noopener"&gt;image-classification-azure-automl-for-images/1. Downloading images.ipynb at main · retkowsky/image-classification-azure-automl-for-images&lt;/A&gt;&lt;/P&gt;
&lt;OL start="2"&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt; Azure ML AutoML for Images.ipynb&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;The core workflow: connect to Azure ML, upload data, configure training&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Covers both simple model training and advanced hyperparameter tuning&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Shows how to evaluate models and select the best performing one&lt;BR /&gt;Demonstrates deployment to managed online endpoints&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-selectable-paragraph=""&gt;&lt;A href="https://github.com/retkowsky/image-classification-azure-automl-for-images/blob/main/2.%20Azure%20ML%20AutoML%20for%20Images.ipynb" target="_blank" rel="noopener"&gt;image-classification-azure-automl-for-images/2. Azure ML AutoML for Images.ipynb at main · retkowsky/image-classification-azure-automl-for-images&lt;/A&gt;&lt;/P&gt;
&lt;OL start="3"&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt; Edge with ONNX local model.ipynb&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;Exports trained models to ONNX format&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Shows how to run inference locally without cloud connectivity&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Perfect for edge computing and IoT scenarios&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-selectable-paragraph=""&gt;&lt;A href="https://github.com/retkowsky/image-classification-azure-automl-for-images/blob/main/3.%20Edge%20with%20ONNX%20local%20model.ipynb" target="_blank" rel="noopener"&gt;image-classification-azure-automl-for-images/3. Edge with ONNX local model.ipynb at main · retkowsky/image-classification-azure-automl-for-images&lt;/A&gt;&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;Each Python notebook is self-contained with clear explanations, making it easy to understand each step of the process. You can run them sequentially to build a complete solution, or jump to specific sections relevant to your use case.&lt;/P&gt;
&lt;H2 data-selectable-paragraph=""&gt;The Developer Experience&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;What sets this approach apart is the developer experience. The repository provides Python notebooks that guide you through the entire workflow. You’re not just reading documentation — you’re working with practical, runnable examples that demonstrate real scenarios.&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;Let’s walk through the code to see how straightforward this actually is.&lt;/P&gt;
&lt;H2 data-selectable-paragraph=""&gt;Use-case description&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;This image classification model is designed to&amp;nbsp;&lt;STRONG&gt;identify and classify defects on metal surfaces&lt;/STRONG&gt;&amp;nbsp;in a manufacturing context.&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;We want to classify defective images into&amp;nbsp;&lt;STRONG&gt;Crazing, Inclusion, Patches, Pitted, Rolled &amp;amp; Scratches.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[Sample images of the six metal surface defect classes]&lt;/EM&gt;&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;All code and images are available here: &lt;A href="https://github.com/retkowsky/image-classification-azure-automl-for-images" target="_blank" rel="noopener"&gt;retkowsky/image-classification-azure-automl-for-images: Azure AutoML for images — Image classification&lt;/A&gt;&lt;/P&gt;
&lt;H3&gt;Step 1: Connect to Azure ML Workspace&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;First, establish connection to your&amp;nbsp;&lt;STRONG&gt;Azure ML workspace&lt;/STRONG&gt;&amp;nbsp;using Azure credentials:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;print("Connection to the Azure ML workspace…")
credential = DefaultAzureCredential()

ml_client = MLClient(
  credential,
  os.getenv("subscription_id"),
  os.getenv("resource_group"),
  os.getenv("workspace")
)

print("✅ Done")&lt;/LI-CODE&gt;
&lt;P data-selectable-paragraph=""&gt;That’s it.&lt;/P&gt;
&lt;H3&gt;Step 2: Upload Your Dataset&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;Upload your&amp;nbsp;&lt;STRONG&gt;image dataset to Azure ML.&lt;/STRONG&gt;&amp;nbsp;The code handles this elegantly:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;my_images = Data(
  path=TRAIN_DIR,
  type=AssetTypes.URI_FOLDER,
  description="Metal defects images for images classification",
  name="metaldefectimagesds",
)

uri_folder_data_asset = ml_client.data.create_or_update(my_images)

print("🖼️ Informations:")
print(uri_folder_data_asset)
print("\n🖼️ Path to folder in Blob Storage:")
print(uri_folder_data_asset.path)&lt;/LI-CODE&gt;
&lt;P data-selectable-paragraph=""&gt;Your local images are now versioned data assets in Azure, ready for training.&lt;/P&gt;
&lt;H3&gt;Step 3: Create GPU Compute Cluster&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;&lt;STRONG&gt;AutoML needs compute power&lt;/STRONG&gt;. Here’s how you create a GPU cluster that auto-scales:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;compute_name = "gpucluster"

try:
  _ = ml_client.compute.get(compute_name)
  print("✅ Found existing Azure ML compute target.")

except ResourceNotFoundError:
  print(f"🛠️ Creating a new Azure ML compute cluster '{compute_name}'…")
  compute_config = AmlCompute(
   name=compute_name,
   type="amlcompute",
   size="Standard_NC16as_T4_v3", # GPU VM
   idle_time_before_scale_down=1200,
   min_instances=0, # Scale to zero when idle
   max_instances=4,
)
 
ml_client.begin_create_or_update(compute_config).result()
print("✅ Done")&lt;/LI-CODE&gt;
&lt;P data-selectable-paragraph=""&gt;The cluster scales from 0 to 4 instances based on workload, so you only pay for what you use.&lt;/P&gt;
&lt;H3&gt;Step 4: Configure AutoML Training&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;Now comes the magic. Here’s the entire configuration for an AutoML image classification job using a specific model (here a resnet34). It is possible as well to access all the available models from the image classification AutoML library.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?view=azureml-api-2&amp;amp;tabs=python#supported-model-architectures" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?view=azureml-api-2&amp;amp;tabs=python#supported-model-architectures&lt;/A&gt;&lt;SPAN data-selectable-paragraph=""&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;image_classification_job = automl.image_classification(
  compute=compute_name,
  experiment_name=exp_name,
  training_data=my_training_data_input,
  validation_data=my_validation_data_input,
  target_column_name="label",
)

# Set training parameters
image_classification_job.set_limits(timeout_minutes=60)
image_classification_job.set_training_parameters(model_name="resnet34")&lt;/LI-CODE&gt;
&lt;P data-selectable-paragraph=""&gt;That’s approximately 10 lines of code to configure what would traditionally require hundreds of lines and deep expertise.&lt;/P&gt;
&lt;H3&gt;Step 5: Hyperparameter Tuning (Optional)&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;Want to explore multiple models and configurations?&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;image_classification_job = automl.image_classification(
    compute=compute_name,  # Compute cluster
    experiment_name=exp_name,  # Azure ML job
    training_data=my_training_data_input,  # Training
    validation_data=my_validation_data_input,  # Validation
    target_column_name="label",  # Target
    primary_metric=ClassificationPrimaryMetrics.ACCURACY,  # Metric
    tags={"usecase": "metal defect", "type" : "computer vision", 
"product" : "azure ML", "ai": "image classification", "hyper": "YES"},
)

image_classification_job.set_limits(
    timeout_minutes=60,  # Timeout in min
    max_trials=5,  # Max model number
    max_concurrent_trials=2,  # Concurrent training
)

image_classification_job.extend_search_space([
    SearchSpace(
        model_name=Choice(["vitb16r224", "vits16r224"]),
        learning_rate=Uniform(0.001, 0.01),  # LR
        number_of_epochs=Choice([15, 30]),  # Epoch
    ),
    SearchSpace(
        model_name=Choice(["resnet50"]),
        learning_rate=Uniform(0.001, 0.01),  # LR
        layers_to_freeze=Choice([0, 2]),  # Layers to freeze
    ),
])

image_classification_job.set_sweep(
    sampling_algorithm="Random",  # Random sampling to select combinations of hyperparameters. 
    early_termination=BanditPolicy(evaluation_interval=2,  # The model is evaluated every 2 iterations.
                                   slack_factor=0.2,  # If a run’s performance is 20% worse than the best run so far, it may be terminated.
                                   delay_evaluation=6),  # The policy waits until 6 iterations have completed before starting to 
                                                         # evaluate and potentially terminate runs.
)&lt;/LI-CODE&gt;
&lt;P data-selectable-paragraph=""&gt;AutoML will now automatically try different model architectures, learning rates, and augmentation strategies to find the best configuration.&lt;/P&gt;
&lt;H3&gt;Step 6: Launch Training&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;Submit the job and monitor progress:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Submit the job
returned_job = ml_client.jobs.create_or_update(image_classification_job)
print(f"✅ Created job: {returned_job}")

# Stream the logs in real-time
ml_client.jobs.stream(returned_job.name)&lt;/LI-CODE&gt;
&lt;P data-selectable-paragraph=""&gt;While training runs, you can monitor metrics, view logs, and track progress through the&amp;nbsp;&lt;STRONG&gt;Azure ML Studio UI&lt;/STRONG&gt;&amp;nbsp;or&amp;nbsp;&lt;STRONG&gt;programmatically&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;Step 7: Results&lt;/H3&gt;
&lt;P&gt;&lt;EM&gt;[Screenshot: trial metrics for the completed AutoML runs in Azure ML Studio]&lt;/EM&gt;&lt;/P&gt;
&lt;H3&gt;Step 8: Deploy to Production&lt;/H3&gt;
&lt;P data-selectable-paragraph=""&gt;Once training completes, deploy the best model as a REST endpoint:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Create endpoint configuration
online_endpoint_name = "metal-defects-classification"

endpoint = ManagedOnlineEndpoint(
  name=online_endpoint_name,
  description="Metal defects image classification",
  auth_mode="key",
  tags={
    "usecase": "metal defect", 
    "type": "computer vision"
   },
)

# Deploy the endpoint
ml_client.online_endpoints.begin_create_or_update(endpoint).result()&lt;/LI-CODE&gt;
&lt;P data-selectable-paragraph=""&gt;&lt;STRONG&gt;Your model is now a production API endpoint, ready to classify images at scale.&lt;/STRONG&gt;&lt;/P&gt;
&lt;H2&gt;Beyond the Cloud: Edge Deployment with ONNX&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;One of the most powerful aspects of this approach is flexibility in deployment. The repository includes a third notebook demonstrating how to export your trained model to&amp;nbsp;&lt;STRONG&gt;ONNX&amp;nbsp;&lt;/STRONG&gt;(&lt;EM&gt;Open Neural Network Exchange&lt;/EM&gt;) format for edge deployment.&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;This means you can:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Deploy models on IoT devices&lt;/STRONG&gt;&amp;nbsp;for real-time inference without cloud connectivity&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Reduce latency&amp;nbsp;&lt;/STRONG&gt;by processing images locally on edge hardware&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Lower costs&amp;nbsp;&lt;/STRONG&gt;by eliminating constant cloud API calls&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Ensure privacy&lt;/STRONG&gt;&amp;nbsp;by keeping sensitive images on-premises&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-selectable-paragraph=""&gt;The&amp;nbsp;&lt;STRONG&gt;ONNX&amp;nbsp;&lt;/STRONG&gt;export process is straightforward and integrates seamlessly with the AutoML workflow. Your cloud-trained model can run anywhere ONNX Runtime is supported — from Raspberry Pi devices to industrial controllers.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import onnxruntime

# Load the ONNX model
session = onnxruntime.InferenceSession("model.onnx")

# Run inference locally
results = session.run(None, {input_name: image_data})&lt;/LI-CODE&gt;
&lt;P data-selectable-paragraph=""&gt;This cloud-to-edge workflow is particularly valuable for manufacturing, retail, and remote monitoring scenarios where edge processing is essential.&lt;/P&gt;
&lt;H2&gt;Interactive webapp for image classification&lt;/H2&gt;
&lt;P&gt;&lt;EM&gt;[Screenshots: interactive web app submitting images to the deployed endpoint]&lt;/EM&gt;&lt;/P&gt;
&lt;H2 data-selectable-paragraph=""&gt;Interpreting model predictions&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;Deployed endpoint returns base64 encoded image string if both model_explainability and visualizations are set to True.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[Example explainability visualization returned by the endpoint]&lt;/EM&gt;&lt;/P&gt;
&lt;H2 data-selectable-paragraph=""&gt;Why This Matters?&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;In the AI era, the competitive advantage isn’t about who can build the most complex models — it’s about who can deploy effective solutions fastest. Azure AutoML for Images democratizes computer vision by making sophisticated ML accessible to a broader audience.&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;Small teams can now accomplish what previously required dedicated ML specialists. Prototypes that took months can be built in days. And the quality? Often on par with or better than manually crafted solutions, thanks to AutoML’s systematic approach and access to cutting-edge techniques.&lt;/P&gt;
&lt;H2 data-selectable-paragraph=""&gt;What the Code Reveals&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;Looking at the actual implementation reveals several important insights:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Minimal Boilerplate:&lt;/STRONG&gt;&amp;nbsp;The entire training pipeline — from data upload to model deployment — requires less than 50 lines of meaningful code. Compare this to traditional PyTorch or TensorFlow implementations that often exceed several hundred lines.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Built-in Best Practices:&amp;nbsp;&lt;/STRONG&gt;Notice how the code automatically manages concerns like data versioning, experiment tracking, and compute auto-scaling. These aren’t afterthoughts — they’re integral to the platform.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Production-Ready from Day One:&amp;nbsp;&lt;/STRONG&gt;The deployed endpoint isn’t a prototype. It includes authentication, scaling, monitoring, and all the infrastructure needed for production workloads. You’re building production systems, not demos.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Flexibility Without Complexity:&amp;nbsp;&lt;/STRONG&gt;The simple API hides complexity without sacrificing control. Need to specify a particular model architecture? One parameter. Want hyperparameter tuning? Add a few lines. The abstraction level is perfectly calibrated.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Observable and Debuggable:&amp;nbsp;&lt;/STRONG&gt;The `.stream()` method and comprehensive logging mean you’re never in the dark about what’s happening. You can monitor training progress, inspect metrics, and debug issues — all critical for real projects.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-selectable-paragraph=""&gt;The Cost of Complexity&lt;/H2&gt;
&lt;P data-selectable-paragraph=""&gt;Traditional ML projects fail not because of technology limitations but because of complexity. The learning curve is steep, the iteration cycles are long, and the resource requirements are high. By abstracting away this complexity, AutoML for Images changes the economics of computer vision projects.&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;You can now:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Validate ideas quickly:&lt;/STRONG&gt;&amp;nbsp;Test whether image classification solves your problem before committing significant resources&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Iterate faster:&amp;nbsp;&lt;/STRONG&gt;Experiment with different approaches in hours rather than weeks&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Scale expertise:&amp;nbsp;&lt;/STRONG&gt;Enable more team members to work with computer vision, not just ML specialists&lt;/LI&gt;
&lt;/UL&gt;
&lt;H1&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P data-selectable-paragraph=""&gt;Image classification is a fundamental building block for countless AI applications.&amp;nbsp;&lt;STRONG&gt;Azure AutoML for Images&amp;nbsp;&lt;/STRONG&gt;makes it accessible, practical, and production-ready. Whether you’re a seasoned data scientist looking to accelerate your workflow or a developer taking your first steps into computer vision, this approach offers a compelling path forward.&lt;/P&gt;
&lt;P data-selectable-paragraph=""&gt;The future of ML isn’t about writing more complex code — it’s about writing smarter code that leverages powerful platforms to deliver business value faster. This repository shows you exactly how to do that.&lt;/P&gt;
&lt;H1&gt;&lt;STRONG&gt;Practical Tips from the Code&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P data-selectable-paragraph=""&gt;After reviewing the notebooks, here are some key takeaways for your own projects:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Start with a Single Model:&amp;nbsp;&lt;/STRONG&gt;The basic configuration with `model_name=”resnet34"` is perfect for initial experiments. Only move to hyperparameter sweeps once you’ve validated your data and use case.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Use Tags Strategically:&amp;nbsp;&lt;/STRONG&gt;The code demonstrates adding tags to jobs and endpoints (e.g., `”usecase”: “metal defect”`). This becomes invaluable when managing multiple experiments and models in production.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Leverage Auto-Scaling:&lt;/STRONG&gt;&amp;nbsp;The compute configuration with `min_instances=0` means you’re not paying for idle resources. The cluster scales up when needed and scales down to zero when idle.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Monitor Training Live:&amp;nbsp;&lt;/STRONG&gt;The `ml_client.jobs.stream()` method is your best friend during development. You see exactly what’s happening and can catch issues early.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Version Your Data:&amp;nbsp;&lt;/STRONG&gt;Creating named data assets (`name=”metaldefectimagesds”`) means your experiments are reproducible. You can always trace back which data version produced which model.&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;&lt;STRONG&gt;Think Cloud-to-Edge:&lt;/STRONG&gt;&amp;nbsp;Even if you’re deploying to the cloud initially, the ONNX export capability gives you flexibility for future edge scenarios without retraining.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H1&gt;&lt;STRONG&gt;Resources&lt;/STRONG&gt;&lt;/H1&gt;
&lt;UL&gt;
&lt;LI data-selectable-paragraph=""&gt;Azure ML:&amp;nbsp;&lt;A href="https://azure.microsoft.com/en-us/products/machine-learning" target="_blank" rel="noopener"&gt;https://azure.microsoft.com/en-us/products/machine-learning&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Demos notebooks:&amp;nbsp;&lt;A href="https://github.com/retkowsky/image-classification-azure-automl-for-images" target="_blank" rel="noopener"&gt;https://github.com/retkowsky/image-classification-azure-automl-for-images&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;AutoML for Images documentation:&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Available models:&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?view=azureml-api-2&amp;amp;tabs=python#supported-model-architectures" target="_blank" rel="noopener"&gt;Set up AutoML for computer vision — Azure Machine Learning | Microsoft Learn&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-selectable-paragraph=""&gt;Connect with the author:&amp;nbsp;&lt;A href="https://www.linkedin.com/in/serger/" target="_blank" rel="noopener"&gt;https://www.linkedin.com/in/serger/&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 14 Apr 2026 14:03:39 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/simplifying-image-classification-with-azure-automl-for-images-a/ba-p/4479632</guid>
      <dc:creator>Serge_Retkowsky</dc:creator>
      <dc:date>2026-04-14T14:03:39Z</dc:date>
    </item>
    <item>
      <title>Microsoft Foundry Agent via Responses API rejects local image input as Base64 data URL / byte array</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/microsoft-foundry-agent-via-responses-api-rejects-local-image/m-p/4511076#M1447</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;we are seeing an issue with the new Microsoft Foundry Agents via the Responses API when sending a local image as part of the user message.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What works&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;text-only input&lt;/LI&gt;&lt;LI&gt;image by public URL&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;What fails&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;local PNG passed as Base64 data URL&lt;/LI&gt;&lt;LI&gt;local PNG passed as raw byte array through SDK methods&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Example failing image part:&lt;/P&gt;&lt;P&gt;{ "type": "input_image", "image_url": "data:image/png;base64,...", "detail": "auto" }&lt;/P&gt;&lt;P&gt;Returned error:&lt;/P&gt;&lt;P&gt;{ "code": "invalid_payload", "message": "The provided data does not match the expected schema", "param": "/", "type": "invalid_request_error", "details": [] }&lt;/P&gt;&lt;P&gt;We reproduced this in:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;C#&lt;/LI&gt;&lt;LI&gt;Python&lt;/LI&gt;&lt;LI&gt;raw REST&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;So this does not appear to be limited to one SDK.&lt;/P&gt;&lt;P&gt;Also important: the same pattern is used in the sample repo for the Foundry Agent Web App, and this scenario worked for us about one week ago:&lt;BR /&gt;https://github.com/microsoft-foundry/foundry-agent-webapp&lt;/P&gt;&lt;P&gt;Could you confirm whether local image input is currently supported for Foundry Agents through the Responses API, or whether this is a regression?&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2026 07:59:35 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/microsoft-foundry-agent-via-responses-api-rejects-local-image/m-p/4511076#M1447</guid>
      <dc:creator>twennemann</dc:creator>
      <dc:date>2026-04-14T07:59:35Z</dc:date>
    </item>
    <item>
      <title>Now in Foundry: Microsoft Harrier and NVIDIA EGM-8B</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/now-in-foundry-microsoft-harrier-and-nvidia-egm-8b/ba-p/4510851</link>
      <description>&lt;P&gt;This week's &lt;STRONG&gt;Model Monday&lt;/STRONG&gt;s edition highlights three models that share a common thread: each achieves results comparable to larger leading models, as a result of targeted training strategies rather than scale. &lt;STRONG&gt;Microsoft Research's harrier-oss-v1-0.6b&lt;/STRONG&gt; from achieves state-of-the-art results on the Multilingual MTEB v2 embedding benchmark at 0.6B parameters through contrastive learning and knowledge distillation.&lt;STRONG&gt; NVIDIA's EGM-8B &lt;/STRONG&gt;scores 91.4 average IoU on the RefCOCO visual grounding benchmark by training a small Vision Language Model (VLM) with reinforcement learning to match the output quality of much larger models.&lt;/P&gt;
&lt;P&gt;Together they represent a practical argument for efficiency-first model development: the gap between small and large models continues to narrow when training methodology is the focus rather than parameter count alone.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;Models of the week&lt;/STRONG&gt;&lt;/H2&gt;
&lt;H3&gt;Microsoft Research: harrier-oss-v1-0.6b&lt;/H3&gt;
&lt;P&gt;Model Specs&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-level="1"&gt;Parameters / size: 0.6B&lt;/LI&gt;
&lt;LI aria-level="1"&gt;Context length: 32,768 tokens&lt;/LI&gt;
&lt;LI aria-level="1"&gt;Primary task: Text embeddings (retrieval, semantic similarity, classification, clustering, reranking)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Why it's interesting&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-level="1"&gt;&lt;STRONG&gt;State-of-the-art on Multilingual MTEB v2 from Microsoft Research&lt;/STRONG&gt;: harrier-oss-v1-0.6b is a new embedding model released by Microsoft Research, achieving a 69.0 score on the Multilingual MTEB v2 (Massive Text Embedding Benchmark) leaderboard—placing it at the top of its size class at release. It is part of the harrier-oss family spanning harrier-oss-v1-270m (66.5 MTEB v2), harrier-oss-v1-0.6b (69.0), and harrier-oss-v1-27b (74.3), with the 0.6B variant further trained with knowledge distillation from the larger family members. Benchmarks: &lt;A href="https://huggingface.co/spaces/mteb/leaderboard" target="_blank" rel="noopener"&gt;Multilingual MTEB v2 Leaderboard&lt;/A&gt;.&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;STRONG&gt;Decoder-only architecture with task-instruction queries:&lt;/STRONG&gt; Unlike most embedding models that use encoder-only transformers, harrier-oss-v1-0.6b uses a decoder-only architecture with last-token pooling and L2 normalization. Queries are prefixed with a one-sentence task instruction (e.g., "Instruct: Retrieve relevant passages that answer the query\nQuery: ...") while documents are encoded without instructions—allowing the same deployed model to be specialized for retrieval, classification, or similarity tasks through the prompt alone.&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;STRONG&gt;Broad task coverage across six embedding scenarios: &lt;/STRONG&gt;The model is trained and evaluated on retrieval, clustering, semantic similarity, classification, bitext mining, and reranking—making it suitable as a general embedding backbone for multi-task pipelines rather than a single-use retrieval model. One endpoint, consistent embeddings across the stack.&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;STRONG&gt;100+ language support:&lt;/STRONG&gt; Trained on a large-scale mixture of multilingual data covering Arabic, Chinese, Japanese, Korean, and 100+ additional languages, with strong cross-lingual transfer for tasks that span language boundaries.&lt;/LI&gt;
&lt;/UL&gt;
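&lt;P&gt;The instruction-prefix pattern above is easy to get wrong, so here is a small runnable sketch of the asymmetric encoding and dot-product ranking. The &lt;CODE&gt;embed()&lt;/CODE&gt; stub stands in for a call to a deployed harrier-oss-v1-0.6b endpoint and returns random unit vectors purely so the ranking logic executes:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Sketch: asymmetric query/document encoding with an instruction prefix.
# embed() is a stand-in for the deployed embedding endpoint; it returns
# random L2-normalized vectors so the ranking below actually runs.
import numpy as np

rng = np.random.default_rng(0)

def embed(texts):
    vecs = rng.normal(size=(len(texts), 8))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

instruction = ("Instruct: Retrieve relevant passages that answer the query\n"
               "Query: ")
query_vec = embed([instruction + "How do I reset my VPN access?"])[0]

# Documents are encoded WITHOUT the instruction prefix
doc_vecs = embed(["VPN reset procedure ...", "Quarterly travel policy ..."])

# On unit vectors, cosine similarity reduces to a dot product
scores = doc_vecs @ query_vec
print("best doc:", int(scores.argmax()))&lt;/CODE&gt;&lt;/PRE&gt;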
&lt;P&gt;Try it&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-color-21 lia-border-style-solid" border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Use Case&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Prompt Pattern&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Multilingual semantic search&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Prepend task instruction to query; encode documents without instruction; rank by cosine similarity&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Cross-lingual document clustering&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Embed documents across languages; apply clustering to group semantically related content&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Text classification with embeddings&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Encode labeled examples + new text; classify by nearest-neighbor similarity in embedding space&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Bitext mining&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Encode parallel corpora in source and target languages; align segments by embedding similarity&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
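&lt;P&gt;As a quick illustration of the classification row above, a nearest-neighbor sketch reusing the hypothetical embed() helper from the earlier snippet (the example texts and labels are invented):&lt;/P&gt;
&lt;PRE&gt;import numpy as np

examples = ["refund request", "bug report", "feature idea"]  # labeled texts
labels = ["billing", "support", "product"]
ex_vecs = embed(examples)  # encoded without instruction, like documents

def classify(text):
    v = embed([text])[0]
    return labels[int(np.argmax(ex_vecs @ v))]  # nearest labeled neighbor

print(classify("my payment was charged twice"))  # expected: "billing"&lt;/PRE&gt;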
&lt;P&gt;Sample prompt for a global enterprise knowledge base deployment:&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;You are building a multilingual internal knowledge base for a global professional services firm. Using the harrier-oss-v1-0.6b endpoint deployed in Microsoft Foundry, encode all internal documents—policy guides, project case studies, and technical documentation—across English, French, German, and Japanese. At query time, prepend the task instruction to each employee query: "Instruct: Retrieve relevant internal documents that answer the employee's question\nQuery: {question}". Retrieve the top-5 most similar documents by cosine similarity and pass them to a language model with the instruction: "Using only the provided documents, answer the question and cite the source document title for each claim. If no document addresses the question, say so."&lt;/P&gt;
&lt;H3&gt;NVIDIA: EGM-8B&lt;/H3&gt;
&lt;P&gt;Model Specs&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-level="1"&gt;Parameters / size: ~8.8B&lt;/LI&gt;
&lt;LI aria-level="1"&gt;Context length: 262,144 tokens&lt;/LI&gt;
&lt;LI aria-level="1"&gt;Primary task: Image-text-to-text (visual grounding)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Why it's interesting&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-level="1"&gt;&lt;STRONG&gt;Preforms well on visual grounding compared to larger models even at its small size: &lt;/STRONG&gt;EGM-8B achieves 91.4 average Intersection over Union (IoU) on the RefCOCO benchmark—the standard measure of how accurately a model localizes a described region within an image. Compared to its base model Qwen3-VL-8B-Thinking (87.8 IoU), EGM-8B achieves a +3.6 IoU gain through targeted Reinforcement Learning (RL) fine-tuning. Benchmarks: &lt;A href="https://nvlabs.github.io/EGM" target="_blank" rel="noopener"&gt;EGM Project Page&lt;/A&gt;.&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;STRONG&gt;5.9x faster than larger models at inference&lt;/STRONG&gt;: EGM-8B achieves 737ms average latency. The research demonstrates that test-time compute can be scaled horizontally across small models—generating many medium-quality responses and selecting the best—rather than relying on a single expensive forward pass through a large model.&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;STRONG&gt;Two-stage training:&lt;/STRONG&gt;&amp;nbsp;EGM-8B is trained first with Supervised Fine-Tuning (SFT) on detailed chain-of-thought reasoning traces generated by a proprietary VLM, then refined with Group Relative Policy Optimization (GRPO) using a reward function combining IoU accuracy and task success. The intermediate SFT checkpoint is available as nvidia/EGM-8B-SFT for developers who want to experiment with the intermediate stage.&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;STRONG&gt;Addresses a root cause of small model grounding errors&lt;/STRONG&gt;: The EGM research identifies that 62.8% of small model errors on visual grounding stem from complex multi-relational descriptions—where a model must reason about spatial relationships, attributes, and context simultaneously. By focusing test-time compute on reasoning through these complex prompts, EGM-8B closes the gap without increasing the underlying model size.&lt;/LI&gt;
&lt;/UL&gt;
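&lt;P&gt;For reference, the IoU metric behind both the benchmark and the GRPO reward reduces to a few lines for axis-aligned boxes; this sketch assumes (x1, y1, x2, y2) corner coordinates:&lt;/P&gt;
&lt;PRE&gt;def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(round(iou((10, 10, 50, 50), (30, 30, 70, 70)), 3))  # 0.143&lt;/PRE&gt;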
&lt;P&gt;Try it&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-color-21 lia-border-style-solid" border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Use Case&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Prompt Pattern&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Object localization&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Submit image + natural language description; receive bounding box coordinates&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Document region extraction&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Provide scanned document image + field description; extract specific regions&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Visual quality control&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Submit product image + defect description; localize defect region for downstream classification&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Retail shelf analysis&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21"&gt;
&lt;P&gt;Provide shelf image + product description; return location of specified SKU&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Sample prompt for a retail and logistics deployment:&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;You are building a visual inspection system for a logistics warehouse. Using the EGM-8B endpoint deployed in Microsoft Foundry, submit each incoming package scan image along with a natural language grounding query describing the region of interest: "Please provide the bounding box coordinate of the region this sentence describes: {description}". For example: "the label on the upper-left side of the box", "the barcode on the bottom face", or "the damaged corner on the right side". Use the returned bounding box coordinates to route each package to the appropriate inspection station based on the identified region.&lt;/P&gt;
&lt;H2&gt;Getting started&lt;/H2&gt;
&lt;P&gt;You can deploy open-source&amp;nbsp;&lt;A href="https://aka.ms/hf/foundry-models" target="_blank" rel="noopener"&gt;Hugging Face models&lt;/A&gt;&amp;nbsp;directly in&amp;nbsp;&lt;A href="http://ai.azure.com/" target="_blank" rel="noopener"&gt;Microsoft Foundry&lt;/A&gt;&amp;nbsp;by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model, choose "Deploy on Microsoft Foundry", and you land straight in Azure with secure, scalable inference already configured. Learn how to discover and deploy models in the Microsoft Foundry documentation:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://aka.ms/hf/model-mondays" target="_blank" rel="noopener"&gt;Follow along the Model Mondays series and access the GitHub to stay up to date on the latest&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://aka.ms/hf/docs/microsoft-azure" target="_blank" rel="noopener"&gt;Read Hugging Face on Azure docs&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://aka.ms/hf/docs/microsoft-azure/one-click-deploy" target="_blank" rel="noopener"&gt;Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://aka.ms/hf/foundry-models" target="_blank" rel="noopener"&gt;Explore models in Microsoft Foundry&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Mon, 13 Apr 2026 20:01:16 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/now-in-foundry-microsoft-harrier-and-nvidia-egm-8b/ba-p/4510851</guid>
      <dc:creator>Osi</dc:creator>
      <dc:date>2026-04-13T20:01:16Z</dc:date>
    </item>
    <item>
      <title>When Should You Use RAG vs Fine-Tuning in Microsoft Foundry?</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/when-should-you-use-rag-vs-fine-tuning-in-microsoft-foundry/m-p/4510099#M1444</link>
      <description>&lt;P&gt;If you've been working with&amp;nbsp;&lt;STRONG&gt;Microsoft Foundry&lt;/STRONG&gt;, you've likely come across this question:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Should I use RAG or Fine-Tuning?&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;The answer becomes much simpler when you focus on the &lt;STRONG&gt;core goal&lt;/STRONG&gt; of your solution. Here's a straightforward way to think about it.&lt;/P&gt;&lt;H5&gt;&lt;STRONG&gt;What is RAG (Retrieval-Augmented Generation)?&lt;/STRONG&gt;&lt;/H5&gt;&lt;P&gt;RAG allows your model to &lt;STRONG&gt;retrieve relevant information from your data sources&lt;/STRONG&gt; before generating a response.&lt;/P&gt;&lt;P&gt;Instead of relying only on what the model already knows, it:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Searches your documents or knowledge base&lt;/LI&gt;&lt;LI&gt;Retrieves relevant content&lt;/LI&gt;&lt;LI&gt;Uses that context to generate grounded, cited answers&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Use RAG when:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;✅ Your data changes frequently&lt;/P&gt;&lt;P&gt;✅ You need answers based on real documents&lt;/P&gt;&lt;P&gt;✅ You have a large, evolving document library&lt;/P&gt;&lt;P&gt;✅ You are building "chat with your data" experiences&lt;/P&gt;&lt;H5&gt;&lt;STRONG&gt;What is Fine-Tuning?&lt;/STRONG&gt;&lt;/H5&gt;&lt;P&gt;Fine-tuning &lt;STRONG&gt;customizes how the model behaves&lt;/STRONG&gt; by training it on task-specific examples.&lt;/P&gt;&lt;P&gt;It helps the model:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Produce consistent and structured outputs&lt;/LI&gt;&lt;LI&gt;Follow a specific tone, format, or brand voice&lt;/LI&gt;&lt;LI&gt;Align with business rules, compliance policies, and workflows&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Use Fine-Tuning when:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;✅ You need consistent and predictable responses&lt;/P&gt;&lt;P&gt;✅ You want a specific tone, format, or behavior&lt;/P&gt;&lt;P&gt;✅ Your task is stable and well-defined&lt;/P&gt;&lt;P&gt;✅ You are operating at massive scale&lt;/P&gt;&lt;H5&gt;&lt;STRONG&gt;Visual Overview&lt;/STRONG&gt;&lt;/H5&gt;&lt;P&gt;Below is a quick visual summary to help compare both approaches:&lt;/P&gt;&lt;img /&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H5&gt;&lt;STRONG&gt;A Simple Way to Decide&lt;/STRONG&gt;&lt;/H5&gt;&lt;P&gt;Ask yourself:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Is my problem about &lt;STRONG&gt;accessing the right data&lt;/STRONG&gt;, or &lt;STRONG&gt;controlling how the model behaves&lt;/STRONG&gt;?&lt;/EM&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;If it's about &lt;STRONG&gt;data&lt;/STRONG&gt; → use &lt;STRONG&gt;RAG&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;If it's about &lt;STRONG&gt;behavior&lt;/STRONG&gt; → use &lt;STRONG&gt;Fine-Tuning&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H5&gt;&lt;STRONG&gt;Quick Comparison&lt;/STRONG&gt;&lt;/H5&gt;&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;What You Need&lt;/th&gt;&lt;th&gt;RAG&lt;/th&gt;&lt;th&gt;Fine-Tuning&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Data changes often&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;td&gt;❌ Not ideal&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Change model behavior/style&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Fast to get started&lt;/td&gt;&lt;td&gt;✅ Faster&lt;/td&gt;&lt;td&gt;❌ Needs 
training&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;High-volume, stable queries&lt;/td&gt;&lt;td&gt;⚠️ Token costs grow&lt;/td&gt;&lt;td&gt;✅ Predictable cost&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Brand voice / compliance&lt;/td&gt;&lt;td&gt;⚠️ Limited&lt;/td&gt;&lt;td&gt;✅ Built into model&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Large, evolving document library&lt;/td&gt;&lt;td&gt;✅ Perfect fit&lt;/td&gt;&lt;td&gt;❌ Hard to maintain&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;&lt;H5&gt;&lt;STRONG&gt;Can You Use Both?&lt;/STRONG&gt;&lt;/H5&gt;&lt;P&gt;In many real-world scenarios, the best teams do exactly that:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;RAG&lt;/STRONG&gt; brings in the right, up-to-date information&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Fine-Tuning&lt;/STRONG&gt; ensures consistent behavior and output quality&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Think of RAG as giving your model the &lt;EM&gt;right books to read&lt;/EM&gt;, and Fine-Tuning as teaching it &lt;EM&gt;how to think and respond&lt;/EM&gt;. Together, they cover both sides of the equation.&lt;/P&gt;&lt;P&gt;I'd love to hear from others in the community:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Are you using &lt;STRONG&gt;RAG&lt;/STRONG&gt;, &lt;STRONG&gt;Fine-Tuning&lt;/STRONG&gt;, or &lt;STRONG&gt;both&lt;/STRONG&gt; in your Foundry projects?&lt;/LI&gt;&lt;LI&gt;What use cases are you solving?&lt;/LI&gt;&lt;LI&gt;What challenges or trade-offs have you experienced along the way?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Looking forward to your insights. Let's learn from each other! 🚀&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 01:23:43 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-discussions/when-should-you-use-rag-vs-fine-tuning-in-microsoft-foundry/m-p/4510099#M1444</guid>
      <dc:creator>SajedaSultana</dc:creator>
      <dc:date>2026-04-10T01:23:43Z</dc:date>
    </item>
    <item>
      <title>Bringing GigaTIME to Microsoft Foundry: Unlocking Tumor Microenvironment Insights with Multimodal AI</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/bringing-gigatime-to-microsoft-foundry-unlocking-tumor/ba-p/4509452</link>
      <description>&lt;H2 data-section-id="pljofu" data-start="507" data-end="559"&gt;Expanding Microsoft Foundry for Scientific AI Workloads&lt;/H2&gt;
&lt;P class="lia-align-justify" data-start="561" data-end="843"&gt;AI is increasingly being applied to model complex real-world systems, from climate and industrial processes to human biology. In healthcare, one of the biggest challenges is translating routinely available data into deeper biological insight at scale.&lt;/P&gt;
&lt;P class="lia-align-justify" data-start="845" data-end="1000"&gt;&lt;STRONG data-start="845" data-end="924"&gt;&lt;A class="lia-external-url" href="https://ai.azure.com/catalog/models/GigaTIME" target="_blank"&gt;GigaTIME&lt;/A&gt; is now available in Microsoft Foundry&lt;/STRONG&gt;, bringing advanced multimodal capabilities to healthcare and life sciences. This brings advanced multimodal capabilities into Foundry, with Foundry Labs enabling early exploration and the Foundry platform supporting scalable deployment aligned to the model’s intended use.&lt;/P&gt;
&lt;P class="lia-align-justify" data-start="1172" data-end="1259"&gt;Read on to understand how GigaTIME works and how you can start exploring it in Foundry.&lt;/P&gt;
&lt;H2 data-section-id="134wdxj" data-start="1266" data-end="1319"&gt;From Routine Slides to Deep Biological Insight&lt;/H2&gt;
&lt;P class="lia-align-justify" data-start="1321" data-end="1566"&gt;Understanding how tumors interact with the immune system is central to modern cancer research. While techniques like multiplex immunofluorescence provide these insights, they are expensive and difficult to scale across large patient populations.&lt;/P&gt;
&lt;P class="lia-align-justify" data-start="1568" data-end="1862"&gt;GigaTIME addresses this challenge by translating widely available hematoxylin and eosin pathology slides into spatially resolved protein activation maps. This allows researchers to infer biological signals such as immune activity, tumor growth, and cellular interactions at a much deeper level. Developed in collaboration with Providence and the University of Washington, GigaTIME enables analysis of tumor microenvironments across diverse cancer types, helping accelerate discovery and improve understanding of disease biology.&lt;/P&gt;
&lt;H4 data-section-id="7uuvl3" data-start="2110" data-end="2139"&gt;&lt;STRONG data-start="2113" data-end="2139"&gt;Use cases for GigaTIME&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-start="2141" data-end="2257"&gt;GigaTIME is designed to support research and evaluation workflows across a range of real-world scientific scenarios:&lt;/P&gt;
&lt;UL&gt;
&lt;LI data-start="1267" data-end="1467"&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG data-start="1267" data-end="1320"&gt;Population-Scale Tumor Microenvironment Analysis:&lt;/STRONG&gt; Enable large-scale analysis of tumor–immune interactions by generating virtual multiplex immunofluorescence outputs from routine pathology slides.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI data-start="1469" data-end="1648"&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG data-start="1469" data-end="1505"&gt;Biomarker Association Discovery:&lt;/STRONG&gt; Identify relationships between protein activation patterns and clinical attributes such as mutations, biomarkers, and disease characteristics.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI data-start="1650" data-end="1838"&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG data-start="1650" data-end="1697"&gt;Patient Stratification and Cohort Analysis:&lt;/STRONG&gt; Segment patient populations across cancer types and subtypes using spatial and combinatorial signals for research and hypothesis generation.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Clinical Trial Retrospective Analysis:&lt;/STRONG&gt; Apply GigaTIME to H&amp;amp;E archives from completed clinical trials to retrospectively characterize tumor microenvironment features associated with treatment outcomes, enabling new insights from existing trial data without additional tissue processing.&amp;nbsp;&lt;/LI&gt;
&lt;LI data-start="1840" data-end="2024"&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG data-start="1840" data-end="1878"&gt;Tumor–Immune Interaction Analysis:&lt;/STRONG&gt; Assess whether immune cells are infiltrating tumor regions or being excluded by analyzing spatial relationships between tumor and immune signals.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI data-start="2026" data-end="2200"&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG data-start="2026" data-end="2071"&gt;Immune System Structure Characterization:&lt;/STRONG&gt; Understand how immune cell populations are organized within tissue to evaluate coordination or fragmentation of immune response.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI data-start="2202" data-end="2368"&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG data-start="2202" data-end="2247"&gt;Immune Checkpoint Context Interpretation:&lt;/STRONG&gt; Examine how immune activity may be locally regulated by analyzing overlap between immune markers and checkpoint signals.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI data-start="2370" data-end="2503"&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG data-start="2370" data-end="2403"&gt;Tumor Proliferation Analysis:&lt;/STRONG&gt; Identify actively growing tumor regions by combining proliferation signals with tumor localization.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI data-start="2505" data-end="2650"&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG data-start="2505" data-end="2552"&gt;Stromal and Vascular Context Understanding:&lt;/STRONG&gt;&amp;nbsp;Analyze how tissue architecture, such as vascular density and desmoplastic stroma, shapes immune cell access to tumor regions, helping characterize mechanisms of immune exclusion or infiltration.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-section-id="1ujvefq" data-start="2099" data-end="2144"&gt;Start with Exploration, Then Go Deeper&lt;/H2&gt;
&lt;H4 data-section-id="kia921" data-start="2146" data-end="2177"&gt;&lt;STRONG&gt;Explore in Foundry Labs&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P class="lia-align-justify" data-start="2179" data-end="2419"&gt;&lt;A class="lia-external-url" href="https://labs.ai.azure.com/" target="_blank" rel="noopener"&gt;Foundry Labs&lt;/A&gt; provides a lightweight environment for early exploration of emerging AI capabilities. It allows developers and researchers to quickly understand how models like GigaTIME behave before integrating them into production workflows.&lt;/P&gt;
&lt;P class="lia-align-justify" data-start="2421" data-end="2598"&gt;With &lt;A class="lia-external-url" href="https://labs.ai.azure.com/projects/gigatime/" target="_blank" rel="noopener"&gt;GigaTIME in Foundry Labs&lt;/A&gt;, you can engage with real-world healthcare scenarios and explore how multimodal models translate pathology data into meaningful biological insight.&lt;/P&gt;
&lt;P class="lia-align-justify" data-start="2600" data-end="2637"&gt;Through curated experiences, you can:&lt;/P&gt;
&lt;UL class="lia-align-justify" data-start="2638" data-end="2778"&gt;
&lt;LI data-section-id="l8hods" data-start="2638" data-end="2675"&gt;Run inference on pathology images&lt;/LI&gt;
&lt;LI data-section-id="yvfjkc" data-start="2676" data-end="2725"&gt;Visualize spatial protein activation patterns&lt;/LI&gt;
&lt;LI data-section-id="1x5j20d" data-start="2726" data-end="2778"&gt;Explore tumor and immune interactions in context&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" data-start="2780" data-end="2880"&gt;This helps you build intuition and evaluate how the model can be applied to your specific use cases.&lt;/P&gt;
&lt;H4 data-section-id="1p3jw29" data-start="2887" data-end="2925"&gt;&lt;STRONG&gt;Go Deeper with GitHub Examples&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P class="lia-align-justify" data-start="2927" data-end="3168"&gt;For advanced scenarios, you can access the underlying notebooks and workflows via GitHub. These examples provide flexibility to customize pipelines, extend workflows, and integrate GigaTIME into broader research and application environments. Together, Foundry Labs and GitHub provide a path from guided exploration to deeper customization.&lt;/P&gt;
&lt;H2 data-section-id="1dcw9se" data-start="3274" data-end="3324"&gt;Discover and Deploy GigaTIME in Microsoft Foundry&lt;/H2&gt;
&lt;H4 data-section-id="ui7s9x" data-start="3326" data-end="3365"&gt;&lt;STRONG&gt;Discover in the Foundry Catalog&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-start="3367" data-end="3533"&gt;GigaTIME is available in the &lt;A class="lia-external-url" href="https://ai.azure.com/catalog/models/GigaTIME" target="_blank" rel="noopener"&gt;Foundry model catalog&lt;/A&gt; alongside a growing set of domain-specific models across healthcare, geospatial intelligence, physical systems and more.&lt;/P&gt;
&lt;img /&gt;
&lt;H4 data-section-id="s4blo0" data-start="3540" data-end="3564"&gt;&lt;STRONG&gt;Deploy for Research Workflows&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P class="lia-align-justify" data-start="3566" data-end="3709"&gt;For advanced usage, GigaTIME can be deployed as an endpoint within Foundry to support research and evaluation workflows such as:&lt;/P&gt;
&lt;UL class="lia-align-justify" data-start="3710" data-end="3792"&gt;
&lt;LI data-section-id="15d2sec" data-start="3710" data-end="3733"&gt;Biomarker discovery&lt;/LI&gt;
&lt;LI data-section-id="tdzamb" data-start="3734" data-end="3760"&gt;Patient stratification&lt;/LI&gt;
&lt;LI data-section-id="118fqn" data-start="3761" data-end="3792"&gt;Clinical research pipelines&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" data-start="3794" data-end="3971"&gt;You can start with early exploration in Foundry Labs and transition to scalable deployment on the Foundry platform using the tools and workflows designed for each stage, in line with the intended use of the model.&lt;/P&gt;
&lt;H2 data-section-id="1seug6d" data-start="3978" data-end="4027"&gt;A New Class of AI for Scientific Discovery&lt;/H2&gt;
&lt;P class="lia-align-justify" data-start="4029" data-end="4120"&gt;GigaTIME reflects a broader shift toward AI systems designed to model real-world phenomena. These systems are multimodal, deeply tied to domain-specific data, and designed to produce spatial and contextual outputs. They rely on workflows that combine data processing, model inference, and interpretation, which requires platforms that support the full lifecycle from exploration to production. Microsoft Foundry is built to support this evolution.&lt;/P&gt;
&lt;H4 data-section-id="xfp7aw" data-start="4485" data-end="4518"&gt;&lt;STRONG&gt;Learn More and Get Started&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-start="4520" data-end="4555"&gt;To explore GigaTIME in more detail:&lt;/P&gt;
&lt;UL data-start="4557" data-end="4794"&gt;
&lt;LI data-section-id="1f0z166" data-start="4557" data-end="4640"&gt;Read the &lt;A class="lia-external-url" href="https://www.microsoft.com/en-us/research/blog/gigatime-scaling-tumor-microenvironment-modeling-using-virtual-population-generated-by-multimodal-ai/" target="_blank" rel="noopener"&gt;Microsoft Research&lt;/A&gt; blog on the underlying research, and population-scale findings.&lt;/LI&gt;
&lt;LI data-section-id="4vn8z4" data-start="4641" data-end="4683"&gt;Try hands-on scenarios in &lt;A class="lia-external-url" href="https://labs.ai.azure.com/projects/gigatime/" target="_blank" rel="noopener"&gt;Foundry Labs&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-section-id="1usrfw7" data-start="4684" data-end="4733"&gt;Access &lt;A class="lia-external-url" href="https://aka.ms/gigatime-sample" target="_blank" rel="noopener"&gt;GitHub examples&lt;/A&gt; for advanced workflows&lt;/LI&gt;
&lt;LI data-section-id="6h5wwb" data-start="4734" data-end="4794"&gt;Explore and deploy the model through the &lt;A class="lia-external-url" href="https://ai.azure.com/catalog/models/GigaTIME" target="_blank" rel="noopener"&gt;Foundry catalog&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-section-id="34b6ff" data-start="4801" data-end="4821"&gt;Looking Ahead&lt;/H2&gt;
&lt;P class="lia-align-justify" data-start="4823" data-end="5002"&gt;As AI continues to expand into domains like healthcare, climate science, and industrial systems, the ability to connect models, data, and workflows becomes increasingly important. GigaTIME highlights what is possible when these elements come together, transforming routinely available data into actionable scientific insight.&lt;/P&gt;
&lt;P data-start="5151" data-end="5193"&gt;We are excited to see what you build next.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2026 11:36:56 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/bringing-gigatime-to-microsoft-foundry-unlocking-tumor/ba-p/4509452</guid>
      <dc:creator>Saumil-Shrivastava</dc:creator>
      <dc:date>2026-04-14T11:36:56Z</dc:date>
    </item>
  </channel>
</rss>

