Introducing OpenAI's GPT-image-2 in Microsoft Foundry
Take a small design team running a global social campaign. They have the creative vision to produce localized imagery for every market, but not the resources to reshoot, reformat, or outsource at that scale. Every asset needs to fit a different platform, a different dimension, a different cultural context, and they all need to ship at the same time. This is where flexible image generation comes in.

OpenAI's GPT-image-2 is now generally available and rolling out today to Microsoft Foundry, introducing a step change in image generation. Developers and designers now get more control over image output, so a small team can execute with the reach and flexibility of a much larger one.

What is new in GPT-image-2?

GPT-image-2 brings real-world intelligence, multilingual understanding, improved instruction following, increased resolution support, and an intelligent routing layer, giving developers the tools to scale image generation for production workflows.

Real world intelligence

GPT-image-2 has a knowledge cutoff of December 2025, meaning it can give you more contextually relevant and accurate outputs. The model also comes with enhanced thinking capabilities that allow it to search the web, check its own outputs, and create multiple images from just one prompt. These enhancements shift image generation models away from being simple tools and turn them into creative sidekicks.

Multilingual understanding

GPT-image-2 includes increased language support across Japanese, Korean, Chinese, Hindi, and Bengali, as well as new thinking capabilities. This means the model can create images and render text that feels localized.

Increased resolution support

GPT-image-2 introduces 4K resolution support, giving developers the ability to generate rich, detailed, and photorealistic images at custom dimensions. Resolution guidelines to keep in mind:

| Constraint | Detail |
|---|---|
| Total pixel budget | Maximum pixels in the final image cannot exceed 8,294,400; minimum cannot be less than 655,360. Requests exceeding the budget are automatically resized to fit. |
| Resolutions | 4K, 1024x1024, 1536x1024, and 1024x1536 |
| Dimension alignment | Each dimension must be a multiple of 16 |

Note: If your requested resolution exceeds the pixel budget, the service will automatically resize it down.

Intelligent routing layer

GPT-image-2 also includes an expanded routing layer with two distinct modes, allowing the service to intelligently select the right generation configuration for a request without requiring an explicitly set size value.

Mode 1 — Legacy size selection

In Mode 1, the routing layer selects one of the three legacy size tiers to use for generation:

| Size tier | Description |
|---|---|
| smimage | Small image output |
| image | Standard image output |
| xlimage | Large image output |

This mode is useful for teams already familiar with the legacy size tiers who want to benefit from automatic selection without making any manual changes.

Mode 2 — Token size bucket selection

In Mode 2, the routing layer selects from six token size buckets — 16, 24, 36, 48, 64, 96 — which map roughly to the legacy size tiers:

| Token bucket | Approximate legacy size |
|---|---|
| 16, 24 | smimage |
| 36, 48 | image |
| 64, 96 | xlimage |

This approach allows more flexibility in the number of tokens generated, which in turn helps optimize output quality and efficiency for a given prompt.
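If you plan to request custom dimensions, it can help to check them against the published limits up front rather than relying on automatic resizing. The sketch below is a minimal, illustrative helper based on the constraint table above; the function name and the pre-validation pattern are ours, not part of the GPT-image-2 API.

```python
# Minimal sketch: check a requested resolution against the published limits
# before sending a generation request, so you know when the service would
# resize the output. Constants come from the constraint table above; the
# helper itself is illustrative and not part of any SDK.

MAX_PIXELS = 8_294_400   # total pixel budget
MIN_PIXELS = 655_360     # minimum pixel count

def validate_resolution(width: int, height: int) -> None:
    if width % 16 or height % 16:
        raise ValueError("each dimension must be a multiple of 16")
    pixels = width * height
    if pixels > MAX_PIXELS:
        raise ValueError(f"{pixels:,} px exceeds the {MAX_PIXELS:,} px budget; the service would downscale")
    if pixels < MIN_PIXELS:
        raise ValueError(f"{pixels:,} px is below the {MIN_PIXELS:,} px minimum")

validate_resolution(3840, 2160)   # 4K UHD is exactly 8,294,400 px, right at the budget
validate_resolution(1536, 1024)   # one of the supported standard resolutions
```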
See it in action

GPT-image-2 shows improved image fidelity across visual styles, generating more detailed and refined images. But don't just take our word for it; let's see the model in action with a few prompts and edits. Here is the example we used:

Prompt: Interior of an empty subway car (no people). Wide-angle view looking down the aisle. Clean, modern subway car with seats, poles, route map strip, and ad frames above the windows. Realistic lighting with a slight cool fluorescent tone, realistic materials (metal poles, vinyl seats, textured floor).

As you can see, when using the same base prompt, the image quality and realism improved with each model. Now let's take a look at adding incremental changes to the same image:

Prompt: Populate the ad frames with a cohesive ad campaign for "Zava Flower Delivery" and use an array of flower types.

And our subway is now full of ads for the new ZAVA flower delivery service. Let's ask for another small change:

Prompt: In all Zava Flower Delivery advertisements, change the flowers shown to roses (red and pink roses).

And in three simple prompts, we've created a mockup of a flower delivery ad. From marketing material to website creation to UX design, GPT-image-2 now allows developers to deliver production-grade assets for real business use cases.

Image generation across industries

These new capabilities open the door to richer, more production-ready image generation workflows across a range of enterprise scenarios:

- Retail & e-commerce: Generate product imagery at exact platform-required dimensions, from square thumbnails to wide banners, without post-processing.
- Marketing: Produce crisp, richly colored campaign visuals and social assets localized to different markets.
- Media & entertainment: Generate storyboard panels and scenes at resolutions suited to production pipelines.
- Education & training: Create visual learning aids and course materials formatted to exact display requirements across devices.
- UI/UX design: Accelerate mockup and prototype workflows by generating interface assets at the precise dimensions your design system requires.

Trust and safety

At Microsoft, our mission to empower people and organizations remains constant. As part of this commitment, models made available through Foundry undergo internal reviews and are deployed with safeguards designed to support responsible use at scale. Learn more about responsible AI at Microsoft.

For GPT-image-2, Microsoft applied an in-depth safety approach that addresses disallowed content and misuse while maintaining human oversight. The deployment combines OpenAI's image generation safety mitigations with Azure AI Content Safety, including filters and classifiers for sensitive content.

Pricing

| Model | Offer type | Pricing - Image | Pricing - Text |
|---|---|---|---|
| GPT-image-2 | Standard Global | Input Tokens: $8, Cached Input Tokens: $2, Output Tokens: $30 | Input Tokens: $5, Cached Input Tokens: $1.25, Output Tokens: $10 |

Note: All prices are per 1M tokens.

Getting started

Whether you're building a personalized retail experience, automating visual content pipelines, or accelerating design workflows, GPT-image-2 gives your team the resolution control and intelligent routing to generate images that fit your exact needs. Try GPT-image-2 in Microsoft Foundry today!

- Deploy the model in Microsoft Foundry
- Experiment with the model in the Image playground
- Read the documentation to learn more
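As a starting point, here is a minimal, hedged sketch of what a call against a Foundry deployment could look like. The deployment name, API version, and response handling are assumptions modeled on earlier gpt-image deployments on the Azure OpenAI-compatible images endpoint; verify the exact parameters against the documentation linked above.

```python
# Untested sketch: generate an image from a GPT-image-2 deployment through the
# Azure OpenAI-compatible images endpoint. The deployment name, API version, and
# base64 response handling are assumptions and may differ for your Foundry project.
import base64
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2025-04-01-preview",  # assumption: use the version listed in the docs
)

result = client.images.generate(
    model="gpt-image-2",  # assumption: the name of your Foundry deployment
    prompt="Interior of an empty subway car (no people), wide-angle view down the aisle.",
    size="1536x1024",
)

# gpt-image deployments typically return base64 data; adjust if your response differs.
with open("subway.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```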
MC1269241 - Can't turn off Anthropic for agent mode

Hi, I'm trying to figure out how to turn off the new feature where Anthropic is enabled by default for everyone in the EU for agent mode in Excel and PowerPoint, as described in the Message Center post referenced in the title. It is not possible to press Save after unchecking the setting unless you also consent to the terms of use; the button stays greyed out. Since we want the feature turned off, we do not want to agree to any terms of use. Has anyone found a solution? This seems to be misconfigured.
Stop Experimenting, Start Building: AI Apps & Agents Dev Days Has You Covered

The AI landscape has shifted. The question is no longer "Can we build AI applications?" but "Can we build AI applications that actually work in production?" Demos are easy. Reliable, scalable, resilient AI systems that handle real-world complexity? That's where most teams struggle. If you're an AI developer, software engineer, or solution architect who's ready to move beyond prototypes and into production-grade AI, there's a series built specifically for you.

What Is AI Apps & Agents Dev Days?

AI Apps & Agents Dev Days is a monthly technical series from Microsoft Reactor, delivered in partnership with Microsoft and NVIDIA. You can explore the full series at https://developer.microsoft.com/en-us/reactor/series/s-1590/

This isn't a slide deck marathon. The series tagline says it best: "It's not about slides, it's about building." Each session tackles real-world challenges, shares patterns that actually work, and digs into what's next in AI-driven app and agent design. You bring your curiosity, your code, and your questions. You leave with something you can ship.

The sessions are led by experienced engineers and advocates from both Microsoft and NVIDIA, people like Pamela Fox, Bruno Capuano, Anthony Shaw, Gwyneth Peña-Siguenza, and solutions architects from NVIDIA's Cloud AI team. These aren't theorists; they're practitioners who build and ship the tools you use every day.

What You'll Learn

The series covers the full spectrum of building AI applications and agent-based systems. Here are the key themes:

Building AI Applications with Azure, GitHub, and Modern Tooling

Sessions walk through how to wire up AI capabilities using Azure services, GitHub workflows, and the latest SDKs. The focus is always on code-first learning; you'll see real implementations, not abstract architecture diagrams.

Designing and Orchestrating AI Agents

Agent development is one of the series' strongest threads. Sessions cover how to build agents that orchestrate long-running workflows, persist state automatically, recover from failures, and pause for human-in-the-loop input, without losing progress. For example, the session "AI Agents That Don't Break Under Pressure" demonstrates building durable, production-ready AI agents using the Microsoft Agent Framework, running on Azure Container Apps with NVIDIA serverless GPUs.

Scaling LLM Inference and Deploying to Production

Moving from a working prototype to a production deployment means grappling with inference performance, GPU infrastructure, and cost management. The series covers how to leverage NVIDIA GPU infrastructure alongside Azure services to scale inference effectively, including patterns for serverless GPU compute.

Real-World Architecture Patterns

Expect sessions on container-based deployments, distributed agent systems, and enterprise-grade architectures. You'll learn how to use services like Azure Container Apps to host resilient AI workloads, how Foundry IQ fits into agent architectures as a trusted knowledge source, and how to make architectural decisions that balance performance, cost, and scalability.

Why This Matters for Your Day Job

There's a critical gap between what most AI tutorials teach and what production systems actually require. This series bridges that gap:

- Production-ready patterns, not demos. Every session focuses on code and architecture you can take directly into your projects. You'll learn patterns for state persistence, failure recovery, and durable execution — the things that break at 2 AM.
- Enterprise applicability. The scenarios covered — travel planning agents, multi-step workflows, GPU-accelerated inference — map directly to enterprise use cases. Whether you're building internal tooling or customer-facing AI features, the patterns transfer.
- Honest trade-off discussions. The speakers don't shy away from the hard questions: When do you need serverless GPUs versus dedicated compute? How do you handle agent failures gracefully? What does it actually cost to run these systems at scale?

Watch On-Demand, Build at Your Own Pace

Every session is available on-demand. You can watch, pause, and build along at your own pace, no need to rearrange your schedule. The full playlist is available from the series page linked above.

This is particularly valuable for technical content. Pause a session while you replicate the architecture in your own environment. Rewind when you need to catch a configuration detail. Build alongside the presenters rather than just watching passively.

What You'll Walk Away With

After working through the series, you'll have:

- Practical agent development skills — how to design, orchestrate, and deploy AI agents that handle real-world complexity, including state management, failure recovery, and human-in-the-loop patterns
- Production architecture patterns — battle-tested approaches for deploying AI workloads on Azure Container Apps, leveraging NVIDIA GPU infrastructure, and building resilient distributed systems
- Infrastructure decision-making confidence — a clearer understanding of when to use serverless GPUs, how to optimise inference costs, and how to choose the right compute strategy for your workload
- Working code and reference implementations — the sessions are built around live coding and sample applications (like the Travel Planner agent demo), giving you starting points you can adapt immediately
- A framework for continuous learning — with new sessions each month, you'll stay current as the AI platform evolves and new capabilities emerge

Start Building

The AI applications that will matter most aren't the ones with the flashiest demos — they're the ones that work reliably, scale gracefully, and solve real problems. That's exactly what this series helps you build. Whether you're designing your first AI agent system or hardening an existing one for production, the AI Apps & Agents Dev Days sessions give you the patterns, tools, and practical knowledge to move forward with confidence.

Explore the series at https://developer.microsoft.com/en-us/reactor/series/s-1590/ and start watching the on-demand sessions at the link above. The best time to level up your AI engineering skills was yesterday. The second-best time is right now, and these sessions make it easy to start.
GPT Capability in Understanding Coordinates: How GPT-5.4 Transforms Spatial Precision

Why I Ran This Experiment

This work started not as a benchmarking exercise, but as a practical problem: I needed to automatically extract panel regions from PDF-format electrical Single-Line Diagram (SLD) drawings using OpenAI models. All experiments were conducted using OpenAI models in Microsoft Foundry, Microsoft's unified platform for building generative AI applications. The downstream goal was a pipeline that combines a GPT model with Azure Document Intelligence to generate Bills of Materials (BOMs) — a project I wrote about separately in Extracting BOMs from Electrical Drawings with AI: Azure OpenAI GPT-5 + Azure Document Intelligence Pipeline.

Before building that pipeline, I needed a clear-eyed answer to a deceptively simple question: how well can GPT actually understand and return pixel-level coordinates from an image? If the model can't reliably locate a panel bounding box, the rest of the pipeline doesn't matter.

When I first ran these tests against GPT-5.2, the results were mixed — good enough to be promising, but inconsistent enough to leave clear room for improvement. I tried many workarounds: feeding image dimensions explicitly, overlaying coordinate grids, enabling extended reasoning, and building iterative self-correction loops. Each helped, but required deliberate engineering effort. Then GPT-5.4 was released. Re-running the same benchmark revealed that most of those workarounds were no longer necessary.

Context: All coordinate experiments use a fixed CAD-style test image (847 × 783 px) with a known ground-truth bounding box at [135, 165, 687, 619]. Accuracy is measured by Intersection over Union (IoU) — a score of 1.0 is a perfect match. Every test was run 5 times and averaged.
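For reference, the sketch below shows one way to compute IoU for the [x0, y0, x1, y1] pixel boxes used here. It is an illustrative helper, not code taken from the benchmark itself.

```python
# Illustrative IoU helper for [x0, y0, x1, y1] pixel boxes, matching the
# ground-truth format above. Not taken from the benchmark code.

def iou(box_a: list[float], box_b: list[float]) -> float:
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    return inter / (area_a + area_b - inter)

ground_truth = [135, 165, 687, 619]
print(iou(ground_truth, ground_truth))                     # 1.0, a perfect match
print(round(iou(ground_truth, [150, 180, 700, 630]), 3))   # a slightly offset prediction
```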
The Experiment Design

I designed experiments across two axes: prompt strategy (how spatial information is presented to the model) and reasoning mode (standard vs. extended reasoning). GPT-5.2 and GPT-5.4 were each tested under two reasoning modes (None vs. High), resulting in four conditions per test.

Single-Shot Strategies (Tests 1–5)

These tests have no iterative validation loop — the model gets one prompt and returns its answer. Each test was run 5 times and the results averaged, so the scores reflect consistency, not a single lucky attempt. The differences between tests lie in how spatial information is framed in the prompt.

Test 1 is a simple sanity check: can the model understand percentage-based coordinates at all? The model receives the clean image (no overlay) and is asked: "return the pixel coordinate at 30% width, 50% height." The expected answer is (254, 392). GPT-5.2 gets the X coordinate roughly right (~254–260), but the Y coordinate scatters wildly — predictions range from 260 to 322, consistently 100+ pixels above the correct position. GPT-5.4 returns (254, 392) on every single run, essentially pixel-perfect. Even on this simple sanity check, the gap is stark: GPT-5.4 is pixel-perfect from the start, while GPT-5.2 shows a clear Y-axis bias. But a single-point test doesn't tell us how well the models handle real spatial tasks. The next question: can they detect a full bounding box?

Tests 2–5: Bounding Box Detection with Increasing Prompt Richness

Tests 2–5 move to the real task: detecting a bounding box drawn on the image. Each test sends a different version of the same base image, with progressively richer spatial context in the prompt (for example, Test 3 adds the image dimensions and Test 4 adds a grid overlay, as discussed in the results below).

Feedback Loop Strategies (Tests 6A–7B)

These tests add an iterative validation loop: the model's predicted bounding box is overlaid on the image and sent back for self-correction — up to 5 iterations (early stop at IoU ≥ 0.99). All feedback tests share the same two-phase structure: an init step (first prediction) and a validation loop (iterative correction). They all use the same two images (init + validation overlay), but differ in prompt strategy and color assignment. Image-wise, they fall into two groups:

- Group A — Orange GT (Tests 6A, 6C, 7A)
- Group B — Color Bias / Blue GT (Tests 6B, 6D, 7B)

Figure 4b — Feedback loop input images. Group B (bottom): colors swapped to test color-role priors.

What differs between tests in the same group: the images are identical, but the prompt changes. 6A/6B use holistic comparison ("compare and correct"). 6C/6D additionally send the full history of past predictions as multi-image input. 7A/7B ask for per-edge directional judgments ("move left/right/up/down/none" for each edge independently).

Results

1. Model version is the single biggest factor

Across every test, GPT-5.4 dramatically outperforms GPT-5.2. The gap is not incremental — it's the difference between a bounding box that roughly overlaps the target and one that is essentially pixel-perfect. GPT-5.4 achieved an IoU of 0.99 or above on its very first attempt on tests where GPT-5.2 had only scored between 0.76 and 0.88. GPT-5.4 (green bars) consistently hits ≥0.99 regardless of prompt strategy or reasoning mode; GPT-5.2 (blue bars) ranges from 0.76 to 0.92.

2. GPT-5.2 is inconsistent; GPT-5.4 locks in

Raw averages only tell half the story. GPT-5.2 is unpredictable: on the exact same test with the exact same prompt and image, results fluctuate wildly between runs. The standard deviation on Test 2 is ±0.084, meaning a single run could land anywhere from 0.66 to 0.88. GPT-5.4 stays within ±0.003. The scatter plots below make this viscerally clear. Each dot is one API call — notice how GPT-5.2 dots spray across the IoU range while GPT-5.4 dots stack on top of each other. Wide scatter on simpler prompts (Test 2: 0.66–0.88); reasoning mode (orange) provides a lift that shrinks with richer prompts (Δmean shown below each panel).

Production implication: With GPT-5.2, you couldn't rely on a single inference call — building a reliable pipeline would require multiple calls and majority voting, multiplying latency and cost. With GPT-5.4, a single call is sufficient.

3. Reasoning mode reduced variance for GPT-5.2; GPT-5.4 didn't need it

For GPT-5.2, enabling extended reasoning (reasoning: high) provided a meaningful boost, especially when the prompt was sparse. On Test 2 (bare image, no spatial context), reasoning added +0.076 IoU and visibly tightened the spread of results across runs. As prompts got richer, the benefit shrank: with a grid overlay (Test 4), reasoning added only +0.007. In other words, reasoning mode acted as a compensating mechanism, filling in the gaps when the prompt alone didn't provide enough spatial scaffolding.

For GPT-5.4, reasoning mode offered no additional benefit on this class of task. The base model already achieves 0.99+ IoU, so there was simply no room for improvement. In a few cases the reasoning runs showed marginal regressions (−0.005 to −0.015), likely within noise.
The takeaway isn't that reasoning mode is harmful in general, but rather that a spatial-coordinate task at this complexity level doesn't require it when the underlying model already has strong coordinate understanding.

Figure 7 — Effect of reasoning mode: GPT-5.2 gains +0.04–0.08 from reasoning (blue bars), largest on sparse prompts. GPT-5.4 shows no meaningful gain (green bars near zero).

4. Richer prompts close the gap (but only for GPT-5.2)

For GPT-5.2, providing more spatial context in the prompt made a big difference: from 0.765 (Test 2, no info) to 0.910 (Test 4, grid overlay) — a +0.145 IoU gain just from adding visual reference rulers to the image. Telling the model the image dimensions (Test 3) was a "free win" that cost nothing. For GPT-5.4, all prompt variants produce essentially the same result (0.989–0.997). The model already understands spatial coordinates well enough that extra scaffolding adds no value: GPT-5.2 improves as richer spatial context is provided, while GPT-5.4 is flat at ≥0.99 regardless.

If you're still on GPT-5.2: always inject image dimensions into the prompt (free), and use grid overlays for the biggest single-shot gain (+0.145 IoU). With GPT-5.4, none of this is needed.

5. Validation loops: essential for GPT-5.2, optional for GPT-5.4

The feedback loop tests (6A–7B) showed that iterative self-correction genuinely helped GPT-5.2 improve from its initial prediction. For example, in Test 7A (directional feedback), GPT-5.2 improved from an init IoU of 0.926 to a best of 0.969 over 5 iterations. For GPT-5.4, every single run hit IoU ≥ 0.99 on iteration 1 and early-stopped immediately. There was nothing left to correct. The validation loop infrastructure — overlay rendering, multi-turn prompting, iteration logic — becomes dead code you can remove from your pipeline. GPT-5.4 (green) starts at ≥0.99 and early-stops at iteration 1.

6. Prompt instruction matters: holistic vs directional feedback

Comparing 6A/6B (holistic: "compare the two boxes and correct") with 7A/7B (directional: "for each edge, decide which direction to move"), the directional approach consistently reached a higher best IoU for GPT-5.2. The per-edge structured output forced the model to reason about each boundary independently rather than making a holistic guess.

Separately, the color bias tests (6B, 7B — GT drawn in blue instead of orange) revealed that swapping GT/prediction colors drops the initial accuracy significantly. In 6A (orange GT) the init IoU was 0.937, but in 6B (blue GT) it dropped to 0.850. This suggests GPT models have learned color-role priors — orange is "expected" as the ground truth color. However, the validation loop largely recovers this gap: after 5 iterations, 6A and 6B converge to similar best IoU (~0.96). The directional variants (7A, 7B) show the same pattern but converge faster. Left: initial accuracy drops when GT is drawn in blue. Right: after the validation loop, the gap closes. Directional feedback (7A/7B) shows the same pattern.

For GPT-5.4: color bias has no measurable effect. All variants (6A/6B/7A/7B) hit 0.994–0.998 IoU on iteration 1 regardless of color assignment.

Summary: What Changed from GPT-5.2 to GPT-5.4

The story of this benchmark is really about engineering workarounds that became unnecessary. Here's what we built for GPT-5.2 and whether you still need it:

- Grid overlays & image dimensions in prompt — gave +0.05–0.15 IoU for GPT-5.2. Not needed for GPT-5.4 (already ≥0.99 without it).
- Extended reasoning mode — gave +0.04–0.08 IoU for GPT-5.2. No benefit for GPT-5.4 on this task (already at ceiling without it).
- Validation loops (iterative self-correction) — improved GPT-5.2 by +0.02–0.10 IoU over 5 iterations. Unnecessary for GPT-5.4 (early-stops at iteration 1).
- Multiple runs & voting — required for GPT-5.2 due to ±0.08 variance. Not needed for GPT-5.4 (±0.003 variance, a single call is sufficient).
- Color convention management — GPT-5.2 showed color bias (−0.09 IoU when colors swapped). No effect on GPT-5.4.

GPT-5.4 doesn't just perform better — it makes entire categories of pipeline engineering unnecessary. For clean, CAD-style images like the ones tested here, GPT-5.4 dramatically reduces prompt engineering overhead: grid overlays, image dimension injection, reasoning mode, and validation loops — all of which required deliberate effort with GPT-5.2 — are no longer necessary. This translates directly to simpler pipelines, lower latency, and lower cost. That said, for more complex scenarios — multiple overlapping panels, cluttered backgrounds, or ambiguous region boundaries — iterative validation loops could still prove valuable, and we plan to explore this in future work.

This benchmark started as a sanity check and turned into a clear signal: GPT-5.4 represents a genuine leap in spatial coordinate understanding, not just a marginal iteration. The gap between 0.765 and 0.997 IoU on an identical task is the difference between a prototype experiment and a production-ready component.

Try It Yourself

Ready to explore GPT-5.4's spatial precision capabilities? Here are ways to get started:

- Sample notebooks for the bounding box extraction tests: GitHub
- Read the companion post: Extracting BOMs from Electrical Drawings with AI: Azure OpenAI GPT-5 + Azure Document Intelligence — see how this benchmark informed a production pipeline
Automate Prior Authorization with AI Agents - Now Available as a Foundry Template

By Amit Mukherjee, Principal Solutions Engineer, Microsoft Health & Life Sciences; Lindsey Craft-Goins, Technology Leader - Cloud & AI Platforms, Health & Life Sciences; and Joel Borellis, Director Solutions Engineering - Cloud & AI Platforms, Health & Life Sciences

Prior authorization (PA) is one of the most expensive bottlenecks in U.S. healthcare. Physicians complete an average of 39 PA requests per week, spending roughly 13 hours of physician-and-staff time on PA-related work (AMA 2024 Prior Authorization Physician Survey). Turnaround averages 5–14 business days, and PA alone accounts for an estimated $35 billion in annual administrative spending (Sahni et al., Health Affairs Scholar, 2024).

The regulatory clock is now ticking. CMS-0057-F mandates electronic PA with 72-hour urgent response starting in 2026. Forty-nine states plus DC already have PA laws on the books, and at least half of all U.S. state legislatures introduced new PA reform bills this year, including laws specifically targeting AI use in PA decisions (KFF Health News, April 2026).

Today we're making the Prior Authorization Multi-Agent Solution Accelerator available as a Microsoft Foundry template. Health plan payers can deploy a working, four-agent PA review pipeline to Azure using the Azure Developer CLI ("azd") with a single command in supported environments, then customize it to their policies, workflows, and EHR environment.

Try it now: Find the template in the Foundry template gallery, or clone directly from github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator

What the template delivers

The accelerator deploys four specialist Foundry hosted agents (Compliance, Clinical Reviewer, Coverage, and Synthesis), each independently containerized and managed by Foundry. In internal testing with synthetic demo cases, the pipeline reduced the review workflow, from start to completion, to under 5 minutes per case.

| Agent | Role | Key capability |
|---|---|---|
| Compliance | Documentation check | 10-item checklist with blocking/non-blocking flags |
| Clinical Reviewer | Clinical evidence | ICD-10 validation, PubMed + ClinicalTrials.gov search |
| Coverage | Policy matching | CMS NCD/LCD lookup, per-criterion MET/NOT_MET mapping |
| Synthesis | Decision rubric | 3-gate APPROVE/PEND with weighted confidence scoring |

Compliance and Clinical run in parallel. Coverage runs after clinical findings are ready. Synthesis evaluates all three outputs through a three-gate rubric. The result is a structured recommendation with per-criterion confidence scores and a full audit trail, not a black-box answer.

Solution architecture

The accelerator runs entirely on Azure. The frontend and backend deploy as Azure Container Apps. The four specialist agents are hosted by Microsoft Foundry. Real-time healthcare data flows through third-party MCP servers.

Figure 1: Azure solution architecture

How the pipeline works

The four agents execute in a structured parallel-then-sequential pipeline. Compliance and Clinical run simultaneously in Phase 1. Coverage runs after clinical findings are ready. The Synthesis agent applies a three-gate decision rubric over all prior outputs.

Figure 2: Agentic architecture, hosted agent pipeline

Compliance and Clinical run in parallel via asyncio.gather, since neither depends on the other. Coverage runs sequentially after Clinical because it needs the structured clinical profile for criterion mapping. Synthesis evaluates all three outputs through a three-gate rubric (Provider, Codes, Medical Necessity) with weighted confidence scoring: 40% coverage criteria + 30% clinical extraction + 20% compliance + 10% policy match. The total pipeline time is bound by the slowest parallel agent plus the sequential agents, not the sum. In internal testing with synthetic demo cases, this architecture indicated materially reduced processing time compared to sequential manual workflows.
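To make the orchestration shape concrete, here is a minimal sketch of the parallel-then-sequential pattern and the weighted score described above. The function names, return fields, and stub values are illustrative stand-ins for the hosted agent calls; they are not the accelerator's actual code.

```python
# Illustrative sketch of the parallel-then-sequential pipeline described above.
# The run_* functions are stand-ins for calls to the hosted agents; their names
# and return shapes are assumptions, not the accelerator's actual API.
import asyncio


async def run_compliance(case: dict) -> dict:
    return {"score": 0.9}                                           # stub: documentation checklist result


async def run_clinical(case: dict) -> dict:
    return {"extraction_score": 0.8}                                # stub: clinical evidence extraction


async def run_coverage(case: dict, clinical: dict) -> dict:
    return {"criteria_score": 0.85, "policy_match_score": 0.7}      # stub: per-criterion policy mapping


async def run_synthesis(case, compliance, clinical, coverage) -> dict:
    return {"decision": "PEND"}                                     # stub: three-gate APPROVE/PEND rubric


async def review_case(case: dict) -> dict:
    # Phase 1: Compliance and Clinical are independent, so run them concurrently.
    compliance, clinical = await asyncio.gather(run_compliance(case), run_clinical(case))
    # Phase 2: Coverage needs the structured clinical profile, so it runs afterwards.
    coverage = await run_coverage(case, clinical)
    # Phase 3: Synthesis evaluates all prior outputs.
    synthesis = await run_synthesis(case, compliance, clinical, coverage)
    # Weighted confidence mirroring the 40/30/20/10 split described above.
    confidence = (0.40 * coverage["criteria_score"] + 0.30 * clinical["extraction_score"]
                  + 0.20 * compliance["score"] + 0.10 * coverage["policy_match_score"])
    return {"recommendation": synthesis["decision"], "confidence": round(confidence, 2)}


print(asyncio.run(review_case({"case_id": "demo-001"})))
```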
Under the hood

For the architect in the room, here are four design decisions worth knowing about:

- Foundry hosted agents: Each agent is independently containerized, versioned, and managed by Foundry's runtime. The FastAPI backend is a pure HTTP dispatcher. All reasoning happens inside the agent containers, and there are no code changes between local (Docker Compose) and production (Foundry); an environment variable is the only switch.
- Structured output: Every agent uses MAF's response_format enforcement to produce typed Pydantic schemas at the token level. No JSON parsing, no malformed fences, no free-form text. The orchestrator receives typed Python objects; the frontend receives a stable API contract.
- Keyless security: DefaultAzureCredential throughout, so no API keys are stored anywhere. Managed Identity handles production; azd tokens handle local development. Role assignments are provisioned automatically by Bicep at deploy time.
- Observability: All agents emit OpenTelemetry traces to Azure Application Insights. The Foundry portal shows per-agent spans correlated by case ID. End-to-end latency, per-agent contribution, and error rates are visible from day one with no additional configuration.

For the full architecture documentation, agent specifications, Pydantic schemas, and extension guides, see the GitHub repository.
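To illustrate the structured-output point, here is a minimal sketch of the kind of typed schema an agent could be constrained to emit. The class and field names are ours, not the accelerator's actual Pydantic models, and the sample values are synthetic.

```python
# Illustrative sketch (Pydantic v2) of a typed agent output, in the spirit of the
# structured-output design decision above. Names and values are assumptions,
# not the accelerator's actual schemas.
from typing import Literal

from pydantic import BaseModel, Field


class CriterionFinding(BaseModel):
    criterion: str
    status: Literal["MET", "NOT_MET"]
    confidence: float = Field(ge=0.0, le=1.0)
    evidence: str


class CoverageResult(BaseModel):
    policy_id: str
    findings: list[CriterionFinding]
    recommendation: Literal["APPROVE", "PEND"]


# Because the schema is enforced at generation time, the orchestrator can
# validate and consume the agent's answer as a typed object.
raw = {
    "policy_id": "sample-policy-001",
    "findings": [
        {"criterion": "Provider enrolled", "status": "MET",
         "confidence": 0.95, "evidence": "NPI registry lookup"},
    ],
    "recommendation": "PEND",
}
print(CoverageResult.model_validate(raw).recommendation)
```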
Why this matters now

Human-in-the-loop by design. The system runs in LENIENT mode by default: it produces only APPROVE or PEND and is not designed to produce automated DENY outcomes in its default configuration. Every recommendation requires a clinician to Accept or Override with documented rationale before the decision is finalized. Override records flow to the audit PDF, notification letters, and downstream systems. This directly addresses the emerging wave of state legislation governing AI use in PA decisions.

Domain experts own the rules. Agent behavior is defined in markdown skill files, not Python code. When CMS updates a coverage determination or a plan changes its commercial policy, a clinician or compliance officer edits a text file and redeploys. No engineering PR required.

Real-time healthcare data via MCP. Agents connect to five MCP servers for real-time data: ICD-10 codes, NPI Registry, CMS Coverage policies, PubMed, and ClinicalTrials.gov. This incorporates real-time clinical reference data sources to inform agent recommendations. Third-party MCP servers are included for demonstration with synthetic data only. Their inclusion does not constitute an endorsement by Microsoft. See the GitHub repository for production migration guidance.

Audit-ready from day one. Every case generates an 8-section audit justification PDF with per-criterion evidence, data source attribution, timestamps, and confidence breakdowns. Clinician overrides are recorded in Section 9. Notification letters (approval and pend) are generated automatically. These artifacts are designed to support CMS-0057-F documentation requirements.

Deploy in under 15 minutes

From the Foundry template gallery or from the command line:

```
git clone https://github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator
cd Prior-Authorization-Multi-Agent-Solution-Accelerator
azd up
```

That single command provisions Foundry, Azure Container Registry, Container Apps, builds all Docker images, registers the four agents, and runs health checks. The demo is live with a synthetic sample case as soon as deployment completes.

| What's included | What you customize |
|---|---|
| 4 Foundry hosted agents | Payer-specific coverage policies |
| FastAPI orchestrator + Next.js frontend | EHR/FHIR integration for clinical notes |
| 5 MCP healthcare data connections | Self-hosted MCP servers for production PHI |
| Audit PDF + notification letter generation | Authentication (Microsoft Entra ID) |
| Full Bicep infrastructure-as-code | Persistent storage (Cosmos DB / PostgreSQL) |
| OpenTelemetry + App Insights observability | Additional agents (Pharmacy, Financial) |

Built on Microsoft Foundry + Foundry hosted agents · Microsoft Agent Framework (MAF) · Azure OpenAI gpt-5.4 · Azure Container Apps · Azure Developer CLI + Bicep · OpenTelemetry + Azure Application Insights · DefaultAzureCredential (keyless, no secrets)

Full architecture documentation, agent specifications, and extension guides are in the GitHub repository.

Get started

- Foundry template gallery: search "AI-Powered Prior Authorization for Healthcare" in the Foundry template section
- GitHub: github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator

Disclaimers

- Not a medical device. This solution accelerator is not a medical device, is not FDA-cleared, and is not intended for autonomous clinical decision-making. All AI recommendations require qualified clinical review before any authorization decision is finalized.
- Not production-ready software. This is an open-source reference architecture (MIT License), not a supported Microsoft product. Customers are solely responsible for testing, validation, regulatory compliance, security hardening, and production deployment.
- Performance figures are illustrative. Metrics cited (including processing time reductions) are based on internal testing with synthetic demo data. Actual results will vary based on case complexity, infrastructure, and configuration.
- Third-party services are included for demonstration only and are not endorsed by Microsoft. Customers should evaluate providers against their compliance and data residency requirements.
- The demo uses synthetic data only. Customers deploying real patient data are responsible for HIPAA compliance and establishing appropriate Business Associate Agreements.
- This accelerator is intended to help customers align documentation workflows with CMS-0057-F requirements but has not been independently validated or certified for regulatory compliance.
Microsoft 365 & Power Platform Community call

💡 The Microsoft 365 & Power Platform Development bi-weekly community call focuses on different use cases and features within Microsoft 365 and the Power Platform - across Microsoft 365 Copilot, Copilot Studio, SharePoint, Power Apps and more. Demos in this call are presented by community members. 👏 Looking to catch up on the latest news and updates, including cool community demos? This call is for you!

📅 On the 30th of April we'll have the following agenda:

- Latest on SharePoint Framework (SPFx)
- Latest on Copilot prompt of the week
- PnPjs
- CLI for Microsoft 365
- Dev Proxy
- Reusable Controls for SPFx
- SPFx Toolkit VS Code extension
- PnP Search Solution

Demos this time:

- Adam Wójcik (Hitachi Energy) – SPFx Toolkit Showcase - Manage your SharePoint Online tenant and SPFx projects using SPFx Toolkit Language Model Tools
- Valentin Mazhar (MazAura) – Better Govern Agents with the Copilot Studio Monitor
- Reshmee Auckloo (Avanade) – Connect Power Automate to Agents Toolkit

📅 Download the recurrent invite from https://aka.ms/community/m365-powerplat-dev-call-invite
📞 & 📺 Join the Microsoft Teams meeting live at https://aka.ms/community/m365-powerplat-dev-call-join

💡 Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc)? We are always looking for presenters - volunteer for a community call demo at https://aka.ms/community/request/demo

👋 See you in the call!

📖 Resources:

- Previous community call recordings and demos from the Microsoft Community Learning YouTube channel at https://aka.ms/community/youtube
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

🧡 Sharing is caring!
Microsoft 365 & Power Platform product updates call

💡 The Microsoft 365 & Power Platform product updates call concentrates on the different use cases and features within Microsoft 365 and the Power Platform. The call includes topics like Microsoft 365 Copilot, Copilot Studio, Microsoft Teams, Power Platform, Microsoft Graph, Microsoft Viva, Microsoft Search, Microsoft Lists, SharePoint, Power Automate, Power Apps and more. 👏 The weekly Tuesday call is for all community members to see Microsoft PMs, engineering and Cloud Advocates showcasing the art of the possible with Microsoft 365 and Power Platform.

📅 On the 28th of April we'll have the following agenda:

- News and updates from Microsoft
- Together mode group photo
- Ayça Baş – Vision Analysis and Policy Search in Custom Engine Agents
- Sébastien Levert – Use coding agents to build your Copilot Chat declarative agent
- April Dunnam – Transforming your Power Apps to Copilot Agents

📞 & 📺 Join the Microsoft Teams meeting live at https://aka.ms/community/ms-speakers-call-join
🗓️ Download the recurrent invite for this weekly call from https://aka.ms/community/ms-speakers-call-invite

👋 See you in the call!

💡 Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc)? We are always looking for presenters - volunteer for a community call demo at https://aka.ms/community/request/demo

📖 Resources:

- Previous community call recordings and demos from the Microsoft Community Learning YouTube channel at https://aka.ms/community/youtube
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

🧡 Sharing is caring!
From access to adoption: Unlock the power of Microsoft 365

Change is the only constant, but adapting to change isn't always easy. Considering the speed at which technology is evolving, keeping up with every trending tool can be daunting. An ecosystem like Microsoft 365, designed to simplify work by bringing many tools into one connected workspace, can also feel overwhelming. That is why you are invited to join us at the Microsoft 365 Community Conference, where you can connect with Microsoft leaders, product experts, and peers to drive adoption forward and deliver business results.

Rolling out Microsoft 365 is only the beginning. Real success happens when people understand the value, adopt new ways of working, and feel confident using the tools every day. That's why Adoption and Change Management sessions are a cornerstone of the Microsoft 365 Community Conference. These sessions focus on the human side of transformation—helping organizations move from deployment to sustained impact through practical strategies, real-world lessons, and community-tested approaches. Whether you're leading adoption for Microsoft 365 Copilot, driving collaboration change with Teams and SharePoint, or building Champions across your organization, these sessions are designed to meet you where you are. We know from our research that organizations that have a strong Champions program and invest in peer learning see accelerated business results.

At the Microsoft 365 Community Conference, adoption sessions go beyond theory. They're built around:

- What actually works in the real world
- Lessons learned from failed or stalled rollouts
- How to scale change across large, distributed organizations
- How community-led approaches drive long-term success

You'll hear directly from practitioners, Microsoft experts, and community leaders who have led adoption efforts inside enterprises, public sector organizations, and fast-moving teams.

Check out these can't-miss adoption and change management sessions:

- Adoption power skills – Building a Champion Program with Karuana Gatimu, Director, Customer Advocacy - AI & Collaboration and Tiffany Lee, Customer Experience Product Manager, Microsoft
- Collaboration 2026 – The next generation of Teams experiences with Nicole Enders, Microsoft 365 MVP, Managing Consultant - Microsoft Solutions, CONET Solutions GmbH
- Community in the age of AI – Humans at the center of Copilot adoption with Allison Michels, Sr. Program Manager, Viva Engage, Microsoft, Sarah Lundy, Sr. Customer Experience Program Manager, Viva Engage, Microsoft, and Alex Snyder, Product Manager, Copilot Adoption, Footlocker
- Demystifying Copilot and AI experiences on Windows with Anupam Pattnaik, Microsoft
- How Microsoft does IT: Driving adoption of M365 Copilot and agents across Microsoft with Cadie Kneip, Senior Business Program Director and Copilot Champ Community Lead, Microsoft and Stephan Kerametlian, Senior Director, Microsoft Digital, Microsoft
- From pilot to copilot: Building a scalable AI adoption framework with Tiffany Songvilay, AI Workforce Lead, Avanade
- Leading workforce transformation: The art and science of skilling your people with Karuana Gatimu, Director, Customer Advocacy - AI & Collaboration and Jessie Hwang, Customer Experience PM, Microsoft 365 Customer Advocacy Group, AI & Collaboration, Microsoft
- OneDrive, SharePoint, Viva Engage, and Teams… Oh my! Understanding the many collaboration solutions with David Drever, Microsoft 365 MVP - Compliance, Data Protection, and M365 Specialist, Protiviti
- Start your adoption journey with adoption.microsoft.com with Jessie Hwang, Customer Experience PM, Microsoft 365 Customer Advocacy Group, AI & Collaboration, Microsoft
- Tools and best practices for accelerating Copilot & agent adoption with Jojo Wright, AI Business Solutions Architect, Microsoft and Karuana Gatimu, Director, Customer Advocacy - AI & Collaboration

Be sure to check the Whova app for other meetups and experiences around Adoption and Microsoft 365 Champions.

Master the art of adoption at the Microsoft 365 Community Conference

Boost your adoption strategy by joining subject matter experts in Orlando, FL, for the Microsoft 365 Community Conference on April 21–23, 2026. Discover how Microsoft is shaping the future of work by empowering teams to achieve more than ever. The world is moving quickly, and with the right technology, we want you to lead the way!
Claim your IQ Series: Foundry IQ badge

The IQ Series kicked off with three Foundry IQ episodes, each paired with a hands-on cookbook. If you've worked through all three, or you're planning to, there's now a digital badge waiting for you to claim!

What the badge represents

The IQ Series: Foundry IQ badge recognizes developers who've completed the full Foundry IQ curriculum end-to-end: not just watched the episodes, but deployed the Azure resources, run every notebook, and built working knowledge bases against live data. Earners have:

- Deployed AI Search, Azure OpenAI, a Foundry project, and Azure Blob Storage with seeded sample data
- Connected structured and unstructured sources into Foundry IQ
- Built and queried multi-source AI knowledge bases
- Grounded agent responses in permission-aware enterprise knowledge

Badges are issued by the Global AI Community, so you'll want an account there before you submit.

What the three episodes cover

- Episode 1 — Unlocking Knowledge for Your Agents. Introduces Foundry IQ and the core ideas behind it. The episode explains how AI agents work with knowledge and walks through the main components of Foundry IQ that support knowledge-driven applications.
- Episode 2 — Building the Data Pipeline with Knowledge Sources. Focuses on Knowledge Sources and how different types of content flow into Foundry IQ across SharePoint, Fabric, OneLake, Azure Blob Storage, Azure AI Search, and the web.
- Episode 3 — Querying the Multi-Source AI Knowledge Bases. Dives into Knowledge Bases and how multiple knowledge sources can be organized behind a single endpoint. The episode demonstrates how AI systems query across these sources and synthesize information to answer complex questions.

Each episode is paired with a cookbook so you can learn hands-on, and each of them reuses the same Azure deployment, so you set up once and build across all three.

How to claim the badge

Four steps, in order:

1. Fork the IQ Series repo and work through all three episode cookbooks in your fork. Commit your notebooks with cell outputs saved! That's the proof of completion.
2. Capture a final output screenshot for each episode. Your GitHub username or Azure resource name needs to be visible in the screenshot.
3. Submit a badge request issue. The template walks you through fork URLs, screenshots, and one brief technical takeaway per episode.
4. Complete the badge form. This step is required. Without the form, we can't issue the badge.

Why this badge is worth your time

The IQ Series recognizes your hands-on learning with real infrastructure, real indexed data, real agents and queries. If you're working on enterprise AI (grounding, retrieval, knowledge-aware agents), this is a concrete artifact that says: I've built this, end to end, on the actual platform. Work IQ and Fabric IQ are coming next, and each phase will have its own badge. Foundry IQ is your head start on the full IQ Series.

👉 Start with the episodes, or jump straight to the cookbooks if you prefer to learn by doing. Questions along the way? Create an issue in the repo or drop into our Discord. The Foundry IQ team and community are there to help.
From Playwright Automation to Agent Driven Testing (GHCP in Action)

What Is Agent-Driven Testing?

Agent-driven testing represents a revolutionary shift from traditional, hardcoded test automation to intelligent, adaptive testing powered by AI agents. Unlike conventional Playwright tests that rely on static selectors and predefined workflows, agent-driven testing leverages GitHub Copilot (GHCP) agents with the Model Context Protocol (MCP) to dynamically analyze web pages, discover elements intelligently, and create self-healing tests that adapt to UI changes.

Traditional vs Agent-Driven Approach

| Traditional Playwright | Agent-Driven (MCP-Enhanced) |
|---|---|
| Hardcoded selectors | AI-discovered elements |
| Static test scripts | Dynamic, adaptive tests |
| Breaks with UI changes | Self-healing automation |
| Manual element analysis | Intelligent page exploration |
| Rule-based logic | Context-aware decisions |
| Limited fallback options | Intelligent cascading strategies |

```typescript
// ❌ Traditional approach - brittle and static
const searchInput = page.locator('input[name="q"]');

// ✅ Agent-driven approach - intelligent and adaptive, uses AI discovery
const searchResult = await this.mcpClient.callTool({
  name: 'playwright_find_element',
  arguments: {
    element_type: 'search_input',
    page_url: await this.page.url(),
    confidence_threshold: 0.8,
    generate_multiple_selectors: true,
  },
});
```

The agent doesn't just execute tests—it thinks about them, analyzing page structure, scoring element reliability, and making intelligent decisions about the best interaction strategies.

How Does Agent-Driven Testing Work with MCP?

Agent-driven testing operates through a sophisticated Model Context Protocol (MCP) workflow that mimics human intelligence. Here's how the MCP server analyzes pages and makes intelligent decisions:

🔬 1. Intelligent Page Analysis

The agent first explores the target website like a human tester would (MCP-enhanced exploration from the sample implementation).

🧠 2. Dynamic Element Discovery with Confidence Scoring

The sample implementation shows how MCP uses confidence scoring to intelligently identify elements (intelligent scoring from the sample workflow.page.ts).

🎯 3. MCP Server Page Analysis

The MCP server analyzes page content and provides intelligent insights (MCP snapshot and analysis from the sample implementation).

⚡ 4. Adaptive Fallback Strategies

When primary strategies fail, the agent intelligently cascades through alternatives. The implementation's intelligent fallback system:

```typescript
async getSearchInput() {
  console.log('🔎 MCP: Using intelligently discovered search input...');
  // 1. Try the MCP-discovered element first (highest reliability)
  // 2. Fall back to real-time dynamic discovery (adaptive)
  // 3. Fall back to traditional selectors (safety net)
}
```

How to Implement Agent-Driven Testing: Step-by-Step Guide

Step 1: Create GitHub Agent Configuration

Create the agent configuration that enables MCP capabilities. Under your project directory:

```
mkdir -p .github/agents
```

Then create .github/agents/playwright-agent.md with your agent configuration.

Step 2: Select Agent in GitHub Copilot Chat

1. Open GitHub Copilot Chat in VS Code
2. Click the agent selector at the top of the chat
3. Choose Playwright Tester Mode from the dropdown
4. The agent will now use MCP-enhanced capabilities

Step 3: Create Test Using Natural Language Prompts

Now you can create tests using natural language prompts to the agent. Prompt to the GHCP agent:

> Create a test case that navigates to www.google.com, searches for 'playwright tutorial', and navigates to the Playwright homepage. Use MCP analysis to discover elements intelligently.
The agent will generate a test along the lines of the sample implementation.

Step 4: Execute and Monitor Agent Intelligence

Run your MCP-enhanced tests:

```
npm install @playwright/test
npx playwright install
npx playwright test --headed
```

Watch the intelligent decision-making in action:

```
🚀 Starting MCP-Enhanced Test Journey...
🔬 Using MCP to explore and navigate to Google...
🧠 MCP: Analyzing target URL: https://www.google.com
📸 MCP: Taking page snapshot for element analysis...
🔍 MCP: Analyzing page elements dynamically...
🎯 MCP: Discovered search input: input[name="q"]
🔎 MCP: Using intelligently discovered search input...
✅ MCP: Using discovered selector: input[name="q"]
✅ MCP: Search executed using discovered elements
```

Sample Test Case Results - Google to Playwright Navigation

Based on the actual implementation, here's what the agent accomplishes.

Test Execution Flow:

1. 🔬 MCP Page Analysis
   - MCP: Analyzing target URL: https://www.google.com
   - MCP: Taking page snapshot for element analysis
   - MCP: Analyzing page elements dynamically
2. 🎯 Intelligent Element Discovery
   - MCP: Discovered search input: input[name="q"]
   - MCP: Discovered search button: input[value="Google Search"]
3. 🔍 Confidence-Based Search Execution
   - MCP: Using intelligently discovered search input
   - MCP: Search executed using discovered elements
4. 🧠 Adaptive Link Detection

```typescript
// From the discoverPlaywrightLinks implementation
if (href.includes('playwright.dev')) confidence += 50;
if (fullText.includes('playwright')) confidence += 20;
```

Outcome Details

🎯 Performance Results

Based on the test execution summary:

| Metric | Traditional Approach | MCP-Enhanced Approach |
|---|---|---|
| Element Discovery | Static, breaks easily | 95% success with confidence scoring |
| Maintenance Effort | High (manual updates) | 90% reduction (self-healing) |
| Bot Detection Handling | Basic fallback | Intelligent adaptive strategies |
| Test Reliability | 60-70% (UI changes) | 85-90% (AI adaptation) |
| Debugging Time | 2-4 hours per failure | 20-30 minutes (intelligent insights) |

🚀 Key Benefits Achieved

Self-Healing Tests
- Tests adapt to UI changes automatically
- Confidence scoring prevents false positives
- Intelligent fallback strategies improve reliability

Intelligent Element Discovery
- No more hardcoded selectors that break
- Instead, AI-powered discovery with scoring:

```typescript
if (name === 'q') score += 10;
if (role === 'combobox') score += 7;
if (placeholder?.includes('search')) score += 5;
```

Enhanced Debugging & Insights

```
✅ MCP: Using discovered selector: input[name="q"]
🧠 MCP: Found 18 potential Playwright links
```

Natural Language Test Creation
- Write tests using prompts instead of code
- Agent generates optimized, intelligent automation
- Built-in best practices and error handling

🔮 The Future of Testing is Intelligent

Agent-driven testing with GitHub Copilot and MCP represents the evolution from brittle, maintenance-heavy automation to intelligent, self-healing test suites. The implementation demonstrates how AI can:

- Think about element discovery instead of hardcoding selectors
- Adapt to UI changes through confidence scoring
- Learn from page analysis to improve over time
- Heal automatically when traditional approaches fail

The result? Tests that improve themselves, dramatically reducing maintenance overhead while increasing reliability and providing intelligent insights into application behavior. Start your journey from traditional Playwright automation to intelligent agent-driven testing today—your future self (and your QA team) will thank you!
🚀 Implementation Checklist

✅ Quick Start Checklist

- Create the .github/agents/playwright-agent.md configuration file
- Select the "Playwright Tester Mode" agent in GitHub Copilot Chat
- Install Playwright: `npm install @playwright/test`
- Create an MCP-enhanced Page Object Model with confidence scoring
- Configure `playwright.config.ts` with proper reporting
- Write tests using natural language prompts to the agent
- Run tests and observe intelligent decision-making: `npx playwright test --headed`
- Review MCP insights in console output and test reports

🎯 Success Metrics

You'll know agent-driven testing is working when you see:

- Console logs showing MCP analysis: "MCP: Analyzing page elements dynamically..."
- Confidence scoring in action: "MCP: Found 18 potential Playwright links"
- Adaptive behavior: "MCP: Using discovered selector: input[name='q']"
- Self-healing: tests passing even when UI changes occur
- Reduced maintenance: 90% fewer test fix cycles