healthcare
532 TopicsDriving AI‑Powered Healthcare: A Data & AI Webinar and Workshop Series
Across these sessions, you’ll learn how healthcare organizations are using Microsoft Fabric, advanced analytics, and AI to unify fragmented data, modernize analytics, and enable intelligent, scalable solutions, from enterprise reporting to AI‑powered use cases. Whether you’re just getting started or looking to accelerate adoption, these sessions offer practical guidance, real‑world examples, and hands‑on learning to help you build a strong data foundation for AI in healthcare. Date Topic Details Location Registration Link May 6 Webinar: Microsoft Fabric Foundations - A Simple Path to Modern Analytics and AI Discover how Microsoft Fabric consolidates fragmented analytics into a single integrated data platform, making it easier to deliver trusted insights and adopt AI without added complexity. Virtual Register May 13 Webinar: Reduce BI Sprawl, Cut Cost and Build an AI-Ready Analytics Foundation Learn how Power BI enables enterprise BI consolidation, consistent metrics, and secure, scalable analytics that support both operational reporting and emerging AI use cases. Virtual Register May 19-20 In Person Workshop: Driving AI‑Powered Healthcare: Advanced Analytics, AI, and Real‑World Impact Attend this two‑day, in‑person event to learn how healthcare organizations use Microsoft Fabric to unify data, accelerate AI adoption, and deliver measurable clinical and operational value. Day 1 focuses on strategy, architecture, and real‑world healthcare use cases, while Day 2 offers hands‑on workshops to apply those concepts through guided labs and agent‑powered solutions. Chicago Register May 27 Webinar: Unified Data Foundation for AI & Analytics - Leveraging OneLake and Microsoft Fabric This session shows how organizations can simplify fragmented data architectures by using Microsoft Fabric and OneLake as a single, governed foundation for analytics and AI. Virtual Register June 3-4 In Person Workshop: Driving AI‑Powered Healthcare: Advanced Analytics, AI, and Real‑World Impact Attend this two‑day, in‑person event to learn how healthcare organizations use Microsoft Fabric to unify data, accelerate AI adoption, and deliver measurable clinical and operational value. Day 1 focuses on strategy, architecture, and real‑world healthcare use cases, while Day 2 offers hands‑on workshops to apply those concepts through guided labs and agent‑powered solutions. New York Register June 10 Webinar: From Data to Decisions: How AI Data Agents in Microsoft Fabric Redefine Analytics Join us to learn how Fabric Data Agents enable users to interact with enterprise data through AI‑powered, governed agents that understand both data and business context. Virtual Register June 17 Webinar: Building the Intelligent Factory: A Unified Data and AI Approach to Life Science Manufacturing Discover how life science & MedTech manufacturers use Microsoft Fabric to integrate operational, quality, and enterprise data and apply AI‑powered analytics for smarter, faster manufacturing decisions. Virtual Register June 23-24 In Person Workshop: Driving AI‑Powered Healthcare: Advanced Analytics, AI, and Real‑World Impact Attend this two‑day, in‑person event to learn how healthcare organizations use Microsoft Fabric to unify data, accelerate AI adoption, and deliver measurable clinical and operational value. Day 1 focuses on strategy, architecture, and real‑world healthcare use cases, while Day 2 offers hands‑on workshops to apply those concepts through guided labs and agent‑powered solutions. Dallas RegisterThe Agent Era Has Already Arrived in Healthcare. Are You Ready to Govern It?
Start here. Answer honestly. Right now, how many AI agents are running inside your organization? Who built them? Which patient data, claims information, or proprietary research are they configured to access? If your CISO walked into your office tomorrow and asked for a complete inventory of every agent in your enterprise, including each one's owner, the systems it is permitted to access, and the policies that govern how it operates, could you produce that inventory before lunch? When the analyst who built that clinical summarization agent moves to a new role next quarter, what happens to the agent? Does its access continue? Does anyone notice? If a regulator opened an audit tomorrow, could you prove that every AI agent operating in your environment is subject to the same lifecycle controls, identity standards, and data protection policies you apply to your human workforce? Could you disable a compromised agent enterprise-wide with a single click, the same way you would revoke a lost access credential? If those questions made you hesitate, you are not alone. Almost no healthcare or life sciences organization can answer them confidently today. And that gap is exactly where the next decade of risk, and the next decade of competitive advantage, will be decided. The quiet crisis nobody talks about yet Healthcare and life sciences leaders are caught in a paradox. You need AI to survive the operational pressures squeezing your organization from every direction. Physician burnout is at crisis levels, with 45.2% of US physicians reporting symptoms in recent Mayo Clinic research. Revenue cycle complexity continues to climb, and McKinsey now estimates that the cost to collect consumes 30 to 60 percent of net patient revenue at many provider organizations. Prior authorization backlogs delay care. Clinical trial timelines stretch into years. Documentation burden eats hours that belong to patients. So you started piloting Microsoft 365 Copilot. You experimented with agents in Copilot Studio. Maybe a clinical team built an agent to draft discharge summaries. A revenue cycle group spun up an agent to triage denials. A medical affairs team built one to comb through literature. Each one delivered value. Each one was approved on its own merits. And then a quiet thing happened. You lost track of how many agents you have. According to KPMG's AI Quarterly Pulse Survey, 88 percent of organizations are now exploring or piloting AI agents. IDC projects that 1.3 billion agents will be in operation by 2028. Inside your own walls, the number is climbing fast. Each new agent is a digital identity that authenticates into your environment, accesses your data, and executes work on behalf of your business. Most have no formal owner. Most have no documented access scope. Most have no decommissioning plan. Most have never been reviewed by Compliance. Microsoft's 2024 Data Security Index found that 84 percent of organizations lack confidence in their AI data security posture, and 40 percent have already experienced an AI related data security incident. That is not a future problem. That is a now problem. If shadow IT was the defining governance challenge of the last decade, agent sprawl is the defining challenge of this one. And in healthcare and life sciences, where ePHI, member PII, and proprietary clinical trial data are at stake, the consequences are not theoretical. They are existential. The reframe that changes everything Here is the counterintuitive truth that separates HLS organizations that scale AI from those stuck in pilot purgatory. Governance is not the brake on AI adoption. Governance is the accelerator. When security, identity, and agent oversight are engineered in from day one, your teams stop tiptoeing. They build with confidence because the guardrails are real. They expand into clinical use cases because Compliance trusts the foundation. They scale wall-to-wall because IT can prove every agent is accounted for. The organizations that lead with trust end up moving faster in the long run, not slower. This is the bet behind Microsoft Agent 365 and Microsoft 365 E7. What Agent 365 and Microsoft 365 E7 actually are Microsoft 365 E7, announced March 6, 2026 and now generally available, is the Frontier Suite. It is Microsoft's answer to a single question that every healthcare CIO, CISO, and COO is wrestling with: how do you run AI safely, at scale, across an entire organization? E7 is not another SKU on top of your existing stack. It is one cohesive platform that brings together four essential capabilities: Microsoft 365 E5 for your enterprise productivity, collaboration, and security foundation, including Microsoft Defender, Microsoft Purview, and Microsoft Intune. Microsoft 365 Copilot for AI grounded in your organizational data through Work IQ, embedded in the flow of work for clinicians, researchers, operations teams, and administrators. Microsoft Entra Suite for identity governance, Conditional Access, and Zero Trust network access, extended consistently across users, applications, and AI agents. Microsoft Agent 365 as the centralized control plane to observe, govern, and secure every AI agent, whether built by Microsoft, your internal teams, or external partners. Agent 365 is also available as a standalone capability. But the magic happens when it works alongside the rest of E7, because that is where AI, identity, security, and governance stop being separate disciplines and become one operating system for the agentic era. The mental model that unlocks everything: agents are first-class digital identities Here is the simplest way to understand what Agent 365 does. Microsoft 365 governs your enterprise identities. Agent 365 governs your agent identities. The same control plane disciplines apply to both. Think about the rigor you apply to any privileged identity in your environment, whether a service account, an API integration, or a third-party application connector. You issue it a unique identity in Microsoft Entra. You assign a human owner who is accountable. You scope its access to least privilege. You apply DLP, sensitivity labels, and Conditional Access. You monitor for anomalous behavior. You have a documented decommissioning path. Identities that no one watches over become identities that get exploited. Now ask yourself how the last AI agent in your environment was created. The honest answer at most organizations: someone opened Copilot Studio, pointed it at a SharePoint library of clinical protocols, gave it a name, and moved on. No documented owner. No access review. No retirement plan. Compliance was never consulted. You would never stand up a privileged service account that way. Yet that is exactly how most organizations are standing up the fastest-growing class of digital identities in their environment. Agent 365 closes that gap by extending the identity, security, and lifecycle controls you already trust for users and applications so they apply with the same rigor to AI agents. Every agent receives a unique Entra Agent ID, a first-class identity in Azure AD with the same governance primitives as any other privileged identity. Every agent has a designated human owner who is accountable for its scope and behavior. Access is granted explicitly through Conditional Access and policy templates, so each agent operates only against the resources its purpose requires. Microsoft Purview DLP and sensitivity labels govern which data the agent is permitted to read, generate, or share. Microsoft Defender monitors agent activity for anomalies and surfaces alerts the same way it does for any other identity-driven risk. Lifecycle rules flag or auto-retire agents that are dormant, orphaned, or risky, eliminating the unowned automations that quietly accumulate in every enterprise. This is not metaphor. It is the actual architecture. The fastest path to governing agents is to extend the identity infrastructure you already trust. The three pillars of Agent 365: Observe, Govern, Secure Pillar 1: Observe. Know what is actually happening. You cannot govern what you cannot see. The first job of Agent 365 is to give you complete, continuous visibility into every AI agent operating in your environment. The Agent Registry is the single authoritative inventory of every agent, whether built by Microsoft, custom developed by your team, deployed by a partner, or discovered as a shadow agent operating without oversight. Each entry shows the owner, purpose, capabilities, lifecycle status, and business context. Agent Analytics tracks adoption, quality, performance, and business impact. Agent Map visualizes how agents connect with other agents, people, tools, and data sources, surfacing dependencies and risk concentrations you would never spot in a spreadsheet. Real time monitoring flows directly into Microsoft Defender, so unusual agent behavior generates alerts the same way unusual user behavior does today. For a health system CISO, that means finally being able to answer the question: which agents are touching ePHI, and is every one of them authorized? For a life sciences compliance officer, it means audit ready visibility into every AI system operating across R&D, regulatory affairs, and commercial. For a payer operations leader, it means knowing which claims processing agents are actually delivering accuracy and throughput, and which are quietly underperforming. Pillar 2: Govern. Set the rules. Control the lifecycle. Visibility is the start. Control is what turns visibility into outcomes. Agent 365 ensures that every agent is approved, compliant, and accountable from creation through retirement. IT led onboarding workflows make sure each agent launches with the right identity, access, and ownership before it ever touches data. Policy templates enforce data handling, permission, and usage rules consistently from day one through Defender, Entra, and Purview. Rules based agent management gives admins an automated If This Then That interface. If an agent is unused for 90 days, auto retire it. If an agent is flagged as risky, block it and alert the security operations team. No human in the loop required for the routine cases, full alerting and override for the exceptions. Ownership enforcement requires every agent to have a designated human owner. When that owner leaves the organization, the platform flags the orphaned agent for bulk reassignment, so nothing operates without clear accountability. The Tools Gateway brokers and audits tool access for agents, enabling least privilege at the action level, not just the identity level. For HLS specifically, that translates to outcomes you can take to your board. A hospital CIO can ensure any agent touching Epic or Cerner goes through standardized approval. A pharma IT director can enforce that clinical trial matching agents only touch de identified data unless elevated permissions are explicitly granted and documented. A payer compliance team can automatically retire agents tied to a completed open enrollment campaign instead of letting them silently expand the attack surface. Pillar 3: Secure. Protect agents and data with the stack you already trust. The final pillar is what makes Agent 365 production grade for healthcare and life sciences. Security and compliance are not bolted on. They are the same proven Microsoft security stack you already run for your users, extended natively to agents. Microsoft Purview, your data security and compliance backbone: Data Security Posture Management for AI gives visibility into how agents interact with sensitive data and detects risky usage patterns. Data Loss Prevention stops agents from accessing or processing files labeled Highly Confidential, even when a user prompts them to. Sensitivity labels are inherited automatically by agent outputs, governing how data is viewed, extracted, or shared downstream. Insider Risk Management detects risky behavior by users interacting with agents, such as unusual prompt patterns or excessive access to sensitive data. Communication Compliance monitors AI driven interactions for regulatory or ethical violations and unauthorized disclosures. eDiscovery and Audit logs every agent interaction, giving legal, compliance, and IT teams the transparency required for HIPAA, GDPR, and FDA 21 CFR Part 11. Oversharing Assessments run weekly checks for sensitive data exposure across SharePoint sites and agent access patterns. Microsoft Entra, your identity control plane: Entra Agent ID gives every agent a unique identity in Azure AD, so Conditional Access, role based access, and risk based policies apply individually. Conditional Access for agents enforces policies like only allow this prior authorization agent to access claims data from approved devices and locations during business hours. Identity Governance provides access packages for agents with reduced scope permissions and least privilege defaults. Block at Scale lets you instantly disable all high-risk agents from Entra in a single action. Microsoft Defender, your threat protection layer: Security Posture Management identifies and remediates agent misconfigurations, such as agents running with no authentication. Threat Detection and Blocking monitors suspicious agent activity, generates alerts, and blocks unauthorized tool invocations. Threat Investigation and Hunting collects unified agent observability logs so SOC teams can forensically trace every action an agent took. One Click Kill Switch instantly disables any agent and surfaces the complete audit trail of every action it took before being stopped. For a hospital security operations team, that means the same DLP policies protecting patient records in email and Teams now protect agents that summarize clinical notes. For a life sciences data protection officer, it means agents accessing proprietary compound data respect the same sensitivity labels as human researchers. For a payer CISO, it means an anomalous claims agent can be killed in seconds, with a complete forensic record of every member record it touched. Why this only works as an integrated platform Individual capabilities are useful. Integration is what makes them transformative. Here is the contrast HLS leaders feel today versus what changes the moment E7 lights up. Without an integrated platform, you operate with: Fragmented tools for identity, security, compliance, and AI, each with its own console and its own gaps. No centralized agent inventory, forcing your IT and security teams to track bots and automations in spreadsheets. Inconsistent policy enforcement across agents, creating compliance gaps every audit team will eventually find. Blind spots where agents access data, invoke tools, or interact with other agents without any oversight. Manual triage when an incident hits, because nothing connects user identity, agent identity, and data classification in one view. With Microsoft 365 E7, you gain: A Unified Agent Registry providing a single source of truth for every agent, whether Microsoft built, custom developed, partner deployed, or shadow discovered. Entra Agent ID giving each agent a unique identity, so Conditional Access, role based access, and risk based policies apply at the individual agent level. Full lifecycle governance with standardized onboarding, periodic review, ownership transfers, auto retirement of dormant agents, and structured offboarding. Policy by design, where Purview DLP, sensitivity labels, and compliance rules extend to all agent interactions through pre built templates applied consistently from day one. One click disable to instantly freeze any agent, with Defender threat detection extended to agents and full audit trails for forensic investigation. Expanded threat coverage that addresses agent sprawl, overprivileged access, tool misuse, misconfiguration, and inter agent risk patterns no legacy tool was designed to see. Shared registry and controls that let IT, Security, and Compliance reference the same authoritative inventory across Defender, Entra, and Purview, eliminating the silos that slow incident response. This is the reason E7 exists as a platform, not a bundle. AI, identity, security, and governance stop being separate disciplines and start operating as one system. What this is actually worth: the Forrester numbers Microsoft commissioned Forrester to conduct a Total Economic Impact study of Microsoft 365 Copilot, published in March 2025. The composite organization in that study, modeled on real customer interviews, achieved: 132 percent three-year ROI with payback in under one year. 9 hours saved per Copilot user per month through automation of routine work like drafting, summarizing, and analysis. Up to 2.6 percent top line revenue lift through better qualified opportunities, improved win rates, and stronger retention in customer facing teams. 25 percent acceleration in new employee onboarding as new hires ramp faster on summarized institutional knowledge. Those are the verified numbers. The bigger story for HLS is what they look like when applied to clinical, claims, and research workflows where every reclaimed hour is an hour that goes back to patients, members, or science. AI is already defending AI The same agentic capabilities transforming clinical and operational workflows are now embedded in your security stack. Microsoft Security Copilot agents work alongside human analysts inside Defender, Entra, Purview, and Intune, accelerating threat response and absorbing the manual load that today drowns most security operations teams. Independent benchmarks back the impact. In a 162 admin randomized study published in 2025, the Conditional Access Optimization Agent in Microsoft Entra completed configuration tasks 43 percent faster and produced 48 percent more accurate Conditional Access policies than admins working without it. Security triage, alert investigation, and identity hygiene are following the same trajectory. For HLS security teams already stretched thin, that is hours reclaimed every week to focus on the threats that actually matter, with the same Agent 365 governance applying to the security agents themselves. The defenders are governed by the same rules as the workforce they defend. How HLS organizations are putting Agent 365 to work Here is how the value shows up across the three biggest HLS segments. For providers: reclaiming time for care The challenge: clinicians spend more time on documentation than on patients. Care coordination is fragmented. Burnout is gutting retention. The strategy: deploy agents that absorb administrative load while Agent 365 ensures every one of them respects ePHI boundaries. Clinical documentation agents integrated with Microsoft Dragon Copilot structure dictation against EHR requirements, apply billing codes, and flag missing elements before submission. Care coordination agents generate care plans, allocate tasks, and surface relevant patient context during multidisciplinary rounds, optimized for HL7 FHIR interoperability. Patient intake and scheduling agents built in Copilot Studio handle appointment booking, reminders, eligibility verification, and referral management. Handoff and shift summary agents pull from multiple systems to generate complete handoff summaries for nurses and physicians transitioning between shifts, reducing communication gaps that drive adverse events. The aha moment: applied across a 10,000 employee health system, nine hours per user per month is more than one million reclaimed hours a year. That is the equivalent of hundreds of full time clinicians, returned to direct patient care, with every agent governed under the same Conditional Access and DLP policies your IT team already manages today. For payers: transforming revenue cycle and member experience The challenge: prior auth backlogs delay care. Denial rates climb. Member services teams drown in volume. The strategy: agentic AI rewires the most expensive, most manual workflows in your operation while Agent 365 keeps every agent inside the lines on member PII. Prior authorization agents autonomously gather clinical documentation, cross reference medical policy, determine approval criteria, and route decisions, accelerating turnaround from days to hours. Claims processing agents automate billing and denial management. With cost to collect running 30 to 60 percent of net patient revenue at many organizations, even modest automation produces material margin recovery. Denial resolution and appeals agents analyze denial patterns, surface root causes, generate appeal documentation, and track success rates over time, turning a cost center into a continuous improvement engine. Member services agents integrated with Microsoft 365 Copilot Chat handle benefits inquiries, claims status, and self service triage, deflecting call volume and improving first contact resolution. Fraud detection and risk adjustment agents scan claims data for anomalies and optimize coding accuracy for Medicare Advantage and ACA populations. The aha moment: a payer CISO can disable an anomalous prior auth agent in one click and produce a complete forensic record of every member record it accessed, while Compliance simultaneously confirms the agent never violated DLP. That is regulatory readiness that legacy automation cannot deliver. For life sciences and pharma: accelerating discovery and commercialization The challenge: clinical trials take years. Regulatory submissions consume teams. Medical affairs cannot keep up with literature volume. The strategy: orchestrate agents across R&D, regulatory, medical, and commercial, with Agent 365 enforcing the data classification rules that proprietary IP and clinical data demand. Clinical trial matching agents scan patient profiles and eligibility criteria to surface trial opportunities, accelerating recruitment. Regulatory document preparation agents assemble submissions, cross reference data across modules, and ensure consistency in FDA, EMA, and global filings. Medical research and literature review agents powered by Microsoft GraphRAG retrieve research backed insights with verified source references, giving medical science liaisons trustworthy synthesis on demand. Pharmacovigilance agents monitor safety databases, flag potential adverse events, and generate timely case reports. Commercial insights and launch planning agents synthesize market data, payer policy, and HCP sentiment for sharper launch and field strategy. The aha moment: cutting even three months off a regulatory cycle on a single high revenue product can mean tens of millions in additional sales, while Purview sensitivity labels guarantee every agent accessing proprietary compound data respects the same data classification as your senior researchers. A phased path that actually works in regulated industries In regulated industries, a big bang AI rollout is a recipe for incidents. The HLS organizations getting this right are following a five-phase pattern that builds expertise and validates governance before scale. Establish. Form a cross-functional champion team across IT, Compliance, Clinical Operations, and Research. Define what risks you are mitigating and what outcomes you are unlocking. Inventory the agents already in flight. Configure. Stand up identity, DLP, and policy templates in Microsoft 365 Admin Center, Power Platform Admin Center, and Microsoft Purview. Enforce that any agent handling PHI runs in a secure environment with audit logging on by default. Pilot. Choose a small group of makers in a controlled environment. Start with non-critical workflows like internal reporting or scheduling before moving to clinical or member facing use cases. Run weekly reviews with Compliance and Security. Empower. Launch role specific training for clinicians, researchers, makers, and IT. Stand up a Center of Excellence to provide templates, best practices, and reusable patterns. Promote success stories internally to build momentum. Scale. Expand agent development across departments with governance as a guardrail, not a gate. Use pay as you go metering to track usage and optimize licensing. Refine policies continuously based on Purview signals and audit results. The strategic insight: organizations that lead with governance reach scale faster than those that lead with experimentation. Trust is the unlock, not the obstacle. Governance is a team sport Here is the pattern we see again and again. The HLS organizations that succeed with AI at scale are not the ones with the smartest IT shop or the boldest Compliance officer. They are the ones whose IT, Security, Compliance, Clinical, Research, and Operations leaders sit at the same table on agent strategy from week one. Agent 365 was designed for that table. The Agent Registry is the shared truth. Purview policies satisfy your Compliance officer. Entra controls reassure your CISO. The lifecycle workflows give your CIO confidence. The clinical and research outcomes give your COO and Chief Medical Officer the business case. Everyone gets the view they need from the same single source. Stand up an agent governance council. Meet every two weeks. Use the Agent Registry as your standing agenda. Make decisions in plain sight. The organizations that do this consistently outperform on both speed and safety. The ones that try to keep AI inside a single function fall behind on both. Who contributes what Think back to the mental model. You would never let a single function authorize, configure, and oversee a new privileged system on its own, not when it touches ePHI, claims, or proprietary research. Security, IT, Compliance, Clinical, and the relevant business owner all weigh in because the stakes are too high for any one seat to carry alone. Agent governance demands the same multidisciplinary scrutiny, and the council is where that happens. Each seat brings something the others cannot. CIO. Owns the agent strategy and the platform investment. Translates board-level AI ambition into an operating model the rest of the organization can execute against. CISO and Security Operations. Define agent identity standards, Conditional Access policies, and incident response playbooks. Without this seat, an anomalous agent touching ePHI becomes a breach instead of a contained event. Chief Compliance Officer and Privacy. Translate HIPAA, GDPR, FDA 21 CFR Part 11, and state regulations into Purview policies and audit requirements. This is the seat that keeps you out of an OCR investigation or a 483 letter. Chief Medical Officer and Clinical Operations. Validate that clinical agents are safe, accurate, and aligned with care standards. Own the clinical risk review for any agent that touches patient care, the same way you would for a new clinical protocol. Chief Research Officer or Head of R&D. Govern how agents interact with proprietary trial data, compound libraries, and scientific IP. The seat that protects the next decade of pipeline value. COO and Revenue Cycle Leadership. Prioritize the operational workflows where agents will move the needle on cost to collect, denial rates, and throughput, and own the business outcomes that justify the investment. Center of Excellence Lead. Maintains templates, reusable patterns, and maker enablement. Turns every council decision into a guardrail builders can actually use the next morning. Frontline champions. Clinicians, claims specialists, and researchers who pilot, give feedback, and carry credibility back to their peers. The seat that decides whether agents get adopted or quietly ignored. When every one of these voices is in the room, your governance council operates like a tumor board for AI. Different lenses, one shared decision, full accountability. That is how regulated industries make complex calls safely, and it is exactly the muscle Agent 365 was built to support. Seven questions to bring to your next leadership meeting If you want to know whether your organization is ready, run through these together. The places you hesitate are exactly where Agent 365 and E7 deliver the most value. Visibility. Do you know which AI agents, bots, and automations are running in your environment today, who built them, what they have access to, and whether they are still needed? Control. If someone on your team builds a new AI agent tomorrow, what is the actual process to make sure it is approved and secured? Or could they deploy it with wide open access? Security. What prevents an AI agent from reading or transmitting patient data it should not? Do you have a way to detect and stop a rogue or compromised agent? Accountability. Who owns the outputs of an AI agent's actions? What is the offboarding process when the agent or its creator leaves? Scale. Six months from now, you may have a hundred agents deployed across departments. Are your oversight and compliance structures ready for that volume? Cross-functional alignment. How are your IT, Security, and Compliance teams partnering on AI today? Governance is a team sport. Data readiness. How confident are you that your data estate is clean, labeled, and governed well enough for AI to surface accurate answers and not outdated or conflicting information? If you hesitated on even one of those, you have just identified where Agent 365 and Microsoft 365 E7 will pay for themselves the fastest. The path forward Here is the honest truth. The healthcare and life sciences organizations that lead in the next decade will not be the ones that adopted AI first. They will be the ones that adopted AI safely, compliantly, and at scale, with intelligence and trust woven into every layer. Microsoft Agent 365 and Microsoft 365 E7 give you the only integrated platform that brings AI, identity, security, and governance into one cohesive system, running in the flow of work you already use. This is not about adding another tool to your stack. It is about extending the investments you have already made in Microsoft 365, Entra, Defender, and Purview to cover the fastest-growing class of digital identities in your environment. The agent era has already arrived. The question is whether you will govern it with confidence or chase it with anxiety. We would love to help you lead. Take the next step Explore Microsoft Agent 365: The Control Plane for Agents Microsoft Entra Agent ID: aka.ms/EntraAgentID Learn more about Microsoft 365 E7, the Frontier Suite: Introducing Microsoft 365 E7 See Microsoft 365 Copilot in action: Microsoft 365 Copilot Read the Forrester TEI study: The Total Economic Impact of Microsoft 365 CopilotHealthcare Agent Orchestrator: Multi-agent Framework for Domain-Specific Decision Support
At Microsoft Build, we introduced the Healthcare Agent Orchestrator, now available in Azure AI Foundry Agent Catalog . In this blog, we unpack the science: how we structured the architecture, curated real tumor board data, and built robust agent coordination that brings AI into real healthcare workflows. Healthcare Agent Orchestrator assisting a simulated tumor board meeting. Introduction Healthcare is inherently collaborative. Critical decisions often require input from multiple specialists—radiologists, pathologists, oncologists, and geneticists—working together to deliver the best outcomes for patients. Yet most AI systems today are designed around narrow tasks or single-agent architectures, failing to reflect the real-world teamwork that defines healthcare practice. That’s why we developed the Healthcare Agent Orchestrator: an orchestrator and code sample built around Microsoft’s industry-leading healthcare AI models, designed to support reasoning and multidisciplinary collaboration -- enabling modular, interpretable AI workflows that mirror how healthcare teams actually work. The orchestrator brings together Microsoft healthcare AI models—such as MedImageParse for image recognition, CXRReportGen for automated radiology reporting, and MedImageInsight for retrieval and similarity analysis—into a unified, task-aware system that enables developers to build an agent that reflects real-word healthcare decision making pattern. This work was led by Yu (Aiden) Gu, Principal Applied Scientist at Microsoft Research, who conceived the study, defined the research direction, and led the design and development of the Healthcare Agent Orchestrator proof-of-concept. Healthcare Is Naturally Multi-Agent Healthcare decision-making often requires synthesizing diverse data types—radiologic images, pathology slides, genetic markers, and unstructured clinical narratives—while reconciling differing expert perspectives. In a molecular tumor board, for instance, a radiologist might highlight a suspicious lesion on CT imaging, a pathologist may flag discordant biopsy findings, and a geneticist could identify a mutation pointing toward an alternate treatment path. Effective collaboration in these settings hinges not on isolated analysis, but on structured dialogue—where evidence is surfaced, assumptions are challenged, and hypotheses are iteratively refined. To support the development of healthcare agent orchestrator, we partnered with a leading healthcare provider organization, who independently curated and de-identified a proprietary dataset comprising longitudinal patient records and real tumor board transcripts—capturing the complexity of multidisciplinary discussions. We provided guidance on data types most relevant for evaluating agent coordination, reasoning handoffs, and task alignment in collaborative settings. We then applied LLM-based structuring techniques to convert de-identified free-form transcripts into interpretable units, followed by expert review to ensure domain fidelity and relevance. This dataset provides a critical foundation for assessing agent coordination, reasoning handoffs, and task alignment in simulated collaborative settings. Why General-Purpose LLMs Fall Short for Healthcare Collaboration While general-purpose large language models have delivered remarkable results in many domains, they face key limitations in high-stakes healthcare environments: Precision is critical: Even small hallucinations or inconsistencies can compromise safety and decision quality Multi-modal integration is required: Many healthcare decisions involve interpreting and correlating diverse data types—images, reports, structured records—much of which is not available in public training sets Transparency and traceability matter: Users must understand how conclusions are formed and be able to audit intermediate steps The Healthcare Agent Orchestrator addresses these challenges by pairing general reasoning capabilities with specialized agents that operate over imaging, genomics, and structured EHRs—ensuring grounded, explainable results aligned with clinical expectations. Each agent contributes domain-specific expertise, while the orchestrator ensures coherence, oversight, and explainability—resulting in outputs that are both grounded and verifiable. Architecture: Coordinating Specialists Through Orchestration Healthcare Agent Orchestrator. Healthcare Agent Orchestrator’s multi-agent framework is built on modular AI infrastructure, designed for secure, scalable collaboration: Semantic Kernel: A lightweight, open-source development kit for building AI agents and integrating the latest AI models into C#, Python, or Java codebases. It acts as efficient middleware for rapidly delivering enterprise-grade solutions—modular, extensible, and designed to support responsible AI at scale. Model Context Protocol (MCP): an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools. Magentic-One: Microsoft’s generalist multi-agent system for solving open-ended web and file-based tasks across domains—built on Microsoft AutoGen, our popular open-source framework for developing multi-agent applications. Each agent is orchestrated within the system and integrated via Semantic Kernel’s group chat infrastructure, with support for communication and modular deployment via Azure. This orchestration ensures that each model—whether interpreting a lung nodule, analyzing a biopsy image, or summarizing a genomic variant—is applied precisely where its expertise is most relevant, without overloading a single system with every task. The modularity of the framework also future-proofs: as new health AI models and tools emerge, they can be seamlessly incorporated into the ecosystem without disrupting existing workflows—enabling continuous innovation while maintaining clinical stability. Microsoft’s healthcare AI models at the Core Healthcare agent orchestrator also enables developers to explore the capabilities of Microsoft’s latest healthcare AI models: CXRReportGen: Integrates multimodal inputs—including current and prior X-ray images and report context—to generate grounded, interpretable radiology reports. The model has shown improved accuracy and transparency in automated chest X-ray interpretation, evaluated on both public and private data. MedImageParse 3 : A biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition across 9 imaging modalities. MedImageInsight 4 : Facilitates fast retrieval of clinically similar cases, supports disease classification across broad range of medical image modalities, accelerating second opinion generation and diagnostic review workflows. Each model has the ability to act as a specialized agent within the system, contributing focused expertise while allowing flexible, context-aware collaboration orchestrated at the system level. CXRReportGen is included in the initial release and supports the development and testing of grounded radiology report generation. Other Microsoft healthcare models such as MedImageParse and MedImageInsight are being explored in internal prototypes to expand the orchestrator’s capabilities across segmentation, detection, and image retrieval tasks. Seamless Integration with Microsoft Teams Rather than creating new silos, Healthcare Agent Orchestrator integrates directly into the tools clinicians already use—specifically Microsoft Teams. Developers are investigating how clinicians can engage with agents through natural conversation, asking questions, requesting second opinions, or cross-validating findings—all without leaving their primary collaboration environment. This approach minimizes friction, improves user experience, and brings cutting-edge AI into real-world care settings. Building Toward Robust, Trustworthy Multi-Agent Collaboration Think of the orchestrator as managing a secure, structured group chat. Each participant is a specialized AI agent—such as a ‘Radiology’ agent, ‘PatientHistory’ agent, or 'ClinicalTrials‘ agent. At the center is the ‘Orchestrator’ agent, which moderates the interaction: assigning tasks, maintaining shared context, and resolving conflicting outputs. Agents can also communicate directly with one another, exchanging intermediate results or clarifying inputs. Meanwhile, the user can engage either with the orchestrator or with specific agents as needed. Each agent is configured with instructions (the system prompt that guides its reasoning), and a description (used by both the UI and the orchestrator to determine when the agent should be activated). For example, the Radiology agent is paired with the cxr_report_gen tool, which wraps Microsoft’s CXRReportGen model for generating findings from chest X-ray images. Tools like this are declared under the agent’s tools field and allow it to call foundation models or other capabilities on demand—such as the clinical_trials tool 5 for querying ClinicalTrials.gov. Only one agent is marked as facilitator, designating it as the moderator of the conversation; in this scenario, the Orchestrator agent fills that role. Early observations highlight that multi-agent orchestration introduces new complexities—even as it improves specialization and task alignment. To address these emergent challenges, we are actively evolving the framework across several dimensions: Mitigating Error Propagation Across Agents: Ensuring that early-stage errors by one agent do not cascade unchecked through subsequent reasoning steps. This includes introducing critical checkpoints where outputs from key agents are verified before being consumed by others. Optimizing Agent Selection and Specialization: Recognizing that more agents are not always better. Adding unnecessary or redundant agents can introduce noise and confusion. We’ve implemented a systematic framework that emphasizes a few highly suited agents per task —dynamically selected based on case complexity and domain needs—while continuously tracking performance gains and catching regressions early. Improving Transparency and Hand-off Clarity: Structuring agent interactions to make intermediate outputs and rationales visible, enabling developers (and the system itself) to trace how conclusions were reached, catch inconsistencies early, and intervene when necessary. Adapting General Frameworks for Healthcare Complexity Generic orchestration frameworks like Semantic Kernel provide a strong foundation—but healthcare demands more. The stakes are higher, the data more nuanced, and the workflows require precision, traceability, and regulatory compliance. Here’s how we’ve extended and adapted these systems to help address healthcare demands: Precision and Safety: We introduced domain-aware verification checkpoints and task-specific agent constraints to reduce inappropriate tool usage—supporting more reliable reasoning. To help uphold the high standards required in healthcare, we defined two complementary metric systems (Check Healthcare Agent Orchestrator Evaluation for more details): Core Metrics: monitor health agents selection accuracy, intent resolution, contextual relevance, and information aggregation RoughMetric: a composite score based on ROUGE that helps quantify the precision of generated outputs and conversation reliability. TBFact: A modified version of RadFact 2 that measures factuality of claims in agents' messages and helps identifying omissions and hallucination Domain-Specific Tool Planning: Healthcare agents must reason across multimodal inputs—such as chest X-rays, CT slices, pathology images, and structured EHRs. We’ve customized Semantic Kernel’s tool invocation and planning modules to reflect clinical workflows, not generic task chains. These infrastructure-level adaptations are designed to complement Microsoft Healthcare AI models—such as CXRReportGen, MedImageParse, and MedImageInsight—working together to enable coordinated, domain-aware reasoning across complex healthcare tasks. Enabling Collaborative, Trustworthy AI in Healthcare Healthcare demands AI systems that are as collaborative, adaptive, and trustworthy as the clinical teams they aim to support. The Healthcare Agent Orchestrator is a concrete step toward that vision—pairing specialized health AI models with a flexible, multi-agent coordination framework, purpose-built to reflect the complexity of real clinical decision-making. By aligning with existing healthcare workflows and enabling transparent, role-specific collaboration, this system shows promise to empower clinicians to work more effectively—with AI as a partner, not a replacement. Healthcare Multi-Agent Orchestrator and the Microsoft healthcare AI models are intended for research and development use. Healthcare Multi-Agent Orchestrator and the healthcare AI models not designed or intended to be deployed in clinical settings as-is nor is it intended for use in the diagnosis or treatment of any health or medical condition, and its performance for such purposes has not been established. You bear sole responsibility and liability for any use of Healthcare Multi-Agent Orchestrator or the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. 1 arXiv, Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale, February 2, 2025 2 arXiv, MAIRA-2: Grounded Radiology Report Generation, June 6, 2024 3 Nature Method, A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities, Nov 18, 2024 4 arXiv, Medimageinsight: An open-source embedding model for general domain medical imaging, Oct 9, 2024 5 Machine Learning for Healthcare Conference, Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology, August 4, 20237.9KViews2likes1CommentManaging data sharing and access in healthcare systems
I am looking for general guidance on how healthcare teams manage data sharing and user access across different systems. I am interested in understanding common approaches for keeping data secure while still allowing the right staff to access what they need. This is more about best practices and real-world experience rather than a specific product issue. Any insights from similar healthcare environments would be helpful.217Views0likes3CommentsModernizing Digital Health Record Governance with Microsoft Entra Identity Governance
With Entra Identity Governance Microsoft provides cloud-driven identity lifecycle automation, application provisioning, entitlement management, and access reviews that can be applied to users, guests, agents, groups, and enterprise applications—including EHR systems like Epic, Oracle Health (Cerner), and Meditech.Implementing Disaster Recovery for Azure App Service Web Applications
Starting March 31, 2025, Microsoft will no longer automatically place Azure App Service web applications in disaster recovery mode in the event of a regional disaster. This change emphasizes the importance of implementing robust disaster recovery (DR) strategies to ensure the continuity and resilience of your web applications. Here’s what you need to know and how you can prepare. Understanding the Change Azure App Service has been a reliable platform for hosting web applications, REST APIs, and mobile backends, offering features like load balancing, autoscaling, and automated management. However, beginning March 31, 2025, in the event of a regional disaster, Azure will not automatically place your web applications in disaster recovery mode. This means that you, as a developer or IT professional, need to proactively implement disaster recovery techniques to safeguard your applications and data. Why This Matters Disasters, whether natural or technical, can strike without warning, potentially causing significant downtime and data loss. By taking control of your disaster recovery strategy, you can minimize the impact of such events on your business operations. Implementing a robust DR plan ensures that your applications remain available and your data remains intact, even in the face of regional outages. Common Disaster Recovery Techniques To prepare for this change, consider the following commonly used disaster recovery techniques: Multi-Region Deployment: Deploy your web applications across multiple Azure regions. This approach ensures that if one region goes down, your application can continue to run in another region. You can use Azure Traffic Manager or Azure Front Door to route traffic to the healthy region. Multi-region load balancing with Traffic Manager and Application Gateway Highly available multi-region web app Regular Backups: Implement regular backups of your application data and configurations. Azure App Service provides built-in backup and restore capabilities that you can schedule to run automatically. Back up an app in App Service How to automatically backup App Service & Function App configurations Active-Active or Active-Passive Configuration: Set up your applications in an active-active or active-passive configuration. In an active-active setup, both regions handle traffic simultaneously, providing high availability. In an active-passive setup, the secondary region remains on standby and takes over only if the primary region fails. About active-active VPN gateways Design highly available gateway connectivity Automated Failover: Use automated failover mechanisms to switch traffic to a secondary region seamlessly. This can be achieved using Azure Site Recovery or custom scripts that detect failures and initiate failover processes. Add Azure Automation runbooks to Site Recovery recovery plans Create and customize recovery plans in Azure Site Recovery Monitoring and Alerts: Implement comprehensive monitoring and alerting to detect issues early and respond promptly. Azure Monitor and Application Insights can help you track the health and performance of your applications. Overview of Azure Monitor alerts Application Insights OpenTelemetry overview Steps to Implement a Disaster Recovery Plan Assess Your Current Setup: Identify all the resources your application depends on, including databases, storage accounts, and networking components. Choose a DR Strategy: Based on your business requirements, choose a suitable disaster recovery strategy (e.g., multi-region deployment, active-active configuration). Configure Backups: Set up regular backups for your application data and configurations. Test Your DR Plan: Regularly test your disaster recovery plan to ensure it works as expected. Simulate failover scenarios to validate that your applications can recover quickly. Document and Train: Document your disaster recovery procedures and train your team to execute them effectively. Conclusion While the upcoming change in Azure App Service’s disaster recovery policy may seem daunting, it also presents an opportunity to enhance the resilience of your web applications. By implementing robust disaster recovery techniques, you can ensure that your applications remain available and your data remains secure, no matter what challenges come your way. Start planning today to stay ahead of the curve and keep your applications running smoothly. Recover from region-wide failure - Azure App Service Reliability in Azure App Service Multi-Region App Service App Approaches for Disaster Recovery Feel free to share your thoughts or ask questions in the comments below. Let's build a resilient future together! 🚀Security Copilot Clinic: AI‑Driven Agentic Defense for Healthcare
Healthcare security teams are operating under unprecedented pressure. Ransomware continues to target clinical environments, identity‑based attacks are increasing in sophistication, and the risk of PHI exposure remains a constant concern — all while SOC teams face chronic staffing shortages. Microsoft Security Copilot is now available for organizations using Microsoft 365 E5, bringing generative AI assistance directly into the security tools healthcare teams already rely on. This clinic series is designed to show how Security Copilot changes day‑one operations — turning noisy alerts into clear, actionable investigations and faster containment. Why attend this clinic For healthcare CISOs, SOC leaders, and security architects, Security Copilot represents more than an AI assistant — it’s a shift in how investigations are conducted across endpoint, identity, email, data, and cloud workloads. In this session, you’ll see how Security Copilot helps healthcare security teams: Move faster with confidence by summarizing complex evidence across security signals Reduce investigation fatigue by standardizing analyst workflows Communicate risk clearly by translating technical findings into leadership‑ready insights Protect patient data without adding new tools or headcount All examples and demonstrations are grounded in real healthcare security scenarios. What we’ll explore See the full incident picture in one place Microsoft‑built Security Copilot agents embedded across Defender, Entra, Intune, and Purview automatically correlate signals from endpoint, identity, email, data, and cloud applications into a single investigation view — eliminating manual pivoting between tools. Move from alert to action faster Embedded agents analyze related signals in real time and surface prioritized investigation paths along with recommended containment actions directly in the analyst workflow. Standardize investigations and reduce noise Agent‑driven prompts and investigation structure help standardize analyst response, reduce alert fatigue, and create repeatable workflows that scale in lean SOC environments. Protect PHI and communicate risk with confidence Security Copilot uses embedded data and threat intelligence to produce leadership‑ready summaries that clearly articulate potential PHI exposure, attack progression, and business impact. Session format and audience Format 60‑minute live session End‑to‑end demo Interactive Q&A Who should attend CISOs and Security Leaders SOC Managers and Analysts Security and Cloud Architects Clinical IT and Infrastructure Leaders Upcoming sessions Date Time (ET) Registration March 13, 2026 12:00 – 1:00 PM Session #1 March 20, 2026 12:00 – 1:00 PM Session #2 March 27, 2026 12:00 – 1:00 PM Session #3 Secure healthcare — together Security Copilot enables healthcare organizations to respond faster, investigate smarter, and communicate risk more effectively — all within the Microsoft security ecosystem teams already trust. If you’re evaluating how AI‑driven, agentic defense can support your healthcare SOC, this clinic will give you practical insight you can apply immediately.Image Search Series Part 4: Advancing Wound Care with Foundation Models and Context-Aware Retrieval
Introduction Wound assessment and management are central tasks in clinical practice, requiring accurate documentation and timely decision-making. Clinicians and nurses often rely on visual inspection to evaluate wound characteristics such as size, color, tissue composition, and healing progress. However, when seeking comparable cases (e.g., to inform treatment choices, validate assessments, or support education), existing search methods have significant limitations. Traditional keyword-based systems require precise terminology, which may not align with the way wounds are described in practice. Moreover, textual descriptors cannot fully capture the variability of visual wound features, resulting in incomplete or imprecise retrieval. Recent advances in computer vision offer new opportunities to address these challenges through both image classification and image retrieval. Automated classification of wound images into clinically meaningful categories (e.g., wound type, tissue condition, infection status) can support standardized documentation and assist clinicians in making more consistent assessments. In parallel, image retrieval systems enable search based on visual similarity rather than textual input alone, allowing clinicians to query databases directly with wound images and retrieve cases with similar characteristics. Together, these AI-based functionalities have the potential to improve case comparison, facilitate consistent monitoring, and enhance clinical training by providing immediate access to relevant examples and structured decision support. The Data The WoundcareVQA dataset is a new multimodal multilingual dataset for Wound Care Visual Question Answering. The WoundcareVQA dataset is available at https://osf.io/xsj5u/ [1] Table 1 summarizes dataset statistics. WoundcareVQA contains 748 images associated with 447 instances (each instance/query includes one or more images). The dataset is split into training (279 instances, 449 images), validation (105 instances, 147 images), and test (93 instances, 152 images). The training set was annotated by a single expert, the validation set by two annotators, and the test set by three medical doctors. Each query is also labeled with wound metadata, covering seven categories: anatomic location (41 classes), wound type (8), wound thickness (6), tissue color (6), drainage amount (6), drainage type (5), and infection status (3). Table 1: Statistics about the WoundcareVQA Dataset We selected two tasks with the highest inter-annotator agreement: Wound Type Classification and Infection Detection (cf. Table 2). Table 3 lists the classification labels for these tasks. Table 2: Inter-Annotator Agreement in the WoundcareVQA Dataset Table 3: Classification Labels for the Tasks: Infection Detection & Wound Type Classification Methods 1. Foundation-Model-based Image Search This approach relies on an image similarity-based retrieval mechanism using a medical foundation model, MedImageInsight [2-3]. Specifically, it employs a k-nearest neighbors (k-NN) search to identify the top k training images most visually similar to a given query image. The image search system operates in two phases: Index Construction: Embeddings are extracted from all training images using a pretrained vision encoder (MedImageInsight). These embeddings are then indexed to enable efficient and scalable similarity search during retrieval. Query and Retrieval: At inference time, the test image is encoded to produce a query embedding. The system computes the Euclidean distances between this query vector and all indexed embeddings, retrieving the k nearest neighbors with the smallest distances. To address the computational demands of large-scale image datasets, the method leverages FAISS (Facebook AI Similarity Search), an open-source library designed for fast and scalable similarity search and clustering of high-dimensional vectors. 2. Vision-Language Models (VLMs) & Retrieval-Augmented Generation (RAG) We leverage vision-language models (e.g., GPT-4o, GPT-4.1), a recent class of multimodal foundation models capable of jointly reasoning over visual and textual inputs. These models can be used for wound assessment tasks due to their ability to interpret complex visual patterns in medical images while simultaneously understanding medical terminology. We evaluate three settings: Zero-shot: The model predicts directly from the query input without additional examples. Few-shot Prompting: A small number of examples (5) from the training dataset are randomly selected and embedded into the input prompt. These paired images and labels provide contextual cues that guide the model's interpretation of new inputs. Retrieval-Augmented Generation (RAG): The system first retrieves the Top-k visually similar wound images using the MedImageInsight-based image search described above. The language model then reasons over the retrieved examples and their labels to generate the final prediction. The implementation of the MedImageInsight-based image search and the RAG method for the infection detection task is available in our Samples Repository: https://aka.ms/healthcare-ai-examples rag_infection_detection.ipynb Evaluation We computed accuracy scores to evaluate the image search methods (Top-1 and Top-5 with majority vote), GPT-4o and GPT-4.1 models (zero-shot), as well as 5-shot and RAG-based methods. Table 4 reports accuracy for wound type classification and infection detection. Figure 1 presents examples of correct and incorrection predictions. Accuracy Image Search Top-1 Image Search Top-5 + majority vote GPT-4o (2023-07-01) GPT-4o (2024-11-20) GPT4.1 (2025-04-14) GPT4.1 5-shot Prompting GPT-4.1- RAG-5 Wound Type 0.7933 0.8333 0.4671 0.4803 0.5066 0.6118 0.7533 Infection 0.6800 0.7267 0.3947 0.3882 0.375 0.7237 0.7697 Table 4: Accuracy Scores for Wound Type Classification & Infection Detection Figure 1: Examples of Correct and Incorrection Predictions (GPT-4.1-RAG-5 Method) For wound type classification, image search with MedImageInsight embeddings performs best, achieving 0.7933 (Top-1) and 0.8333 (Top-5 + majority vote). GPT models alone perform substantially worse (0.4671-0.6118), while GPT-4.1 with retrieval augmentation (RAG-5), which uses the same MedImageInsight-based image search method to retrieve the Top-5 similar cases, narrows the gap (0.7533) but does not surpass direct image search. This suggests that categorical wound type is more effectively captured by visual similarity than by case-based reasoning with vision-language models. For infection detection, the trend reverses. Image search reaches 0.7267 (Top-5 + majority vote), while RAG-5 achieves the highest accuracy at 0.7697. In this case, the combination of visually similar cases with VLM-based reasoning outperforms both standalone image search and GPT prompting. This indicates that infection assessment depends on contextual or clinical cues that may not be fully captured by visual similarity alone but can be better interpreted when enriched with contextual reasoning over retrieved cases and their associated labels. Overall, these findings highlight complementary strengths: foundation-model-based image search excels at categorical visual classification (wound type), while retrieval-augmented VLMs leverage both visual similarity and contextual reasoning to improve performance on more nuanced tasks (infection detection). A hybrid system integrating both approaches may provide the most robust clinical support. Conclusion This study demonstrates the complementary roles of vision-language models in wound assessment. Image search using foundation-model embeddings shows strong performance on categorical tasks such as wound type classification, where visual similarity is most informative. In contrast, retrieval-augmented generation (RAG-5), which combines image search with case-based reasoning by a vision-language model, achieves the best results for infection detection, highlighting the value of integrating contextual interpretation with visual features. These findings suggest that a hybrid approach, leveraging both direct image similarity and retrieval-augmented reasoning, provides the most robust pathway for clinical decision support in wound care. Image Search Series: Blog Posts & Jupyter Notebooks Image Search Series Part 1: Chest X-ray lookup with MedImageInsight | Microsoft Community Hub 2d_image_search.ipynb Image Search Series Part 2: AI Methods for the Automation of 3D Image Retrieval in Radiology | Microsoft Community Hub 3d_image_search.ipynb Image Search Series Part 3: Foundation Models and Retrieval-Augmented Generation in Dermatology | Microsoft Community Hub Image Search Series Part 4: Advancing Wound Care with Foundation Models and Context-Aware Retrieval | Microsoft Community Hub rag_infection_detection.ipynb Image Search Series Part V: Building Histopathology Image Search with Prov-GigaPath | Microsoft Community Hub 2d_pathology_image_search.ipynb The Microsoft healthcare AI models, including MedImageInsight, are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models’ performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. References Wen-wai Yim, Asma Ben Abacha, Robert Doerning, Chia-Yu Chen, Jiaying Xu, Anita Subbarao, Zixuan Yu, Fei Xia, M Kennedy Hall, Meliha Yetisgen. Woundcarevqa: A Multilingual Visual Question Answering Benchmark Dataset for Wound Care. Journal of Biomedical Informatics, 2025. Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaría-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie L. Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur, Vijay Aski, Jenq-Neng Hwang, Thomas Lin, Ivan Tarapov, Matthew P. Lungren, Mu Wei: MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging. CoRR abs/2410.06542 (2024) Model catalog and collections in Azure AI Foundry portal https://learn.microsoft.com/en-us/azure/ai-studio/how-to/model-catalog-overviewImage Search Series Part 2: AI Methods for the Automation of 3D Image Retrieval in Radiology
Introduction As the use of diagnostic 3D images increases, effective management and analysis of these large volumes of data grows in importance. Medical 3D image search systems can play a vital role by enabling clinicians to quickly retrieve relevant or similar images and cases based on the anatomical features and pathologies present in a query image. Unlike traditional 2D imaging, 3D imaging offers a more comprehensive view for examining anatomical structures from multiple planes with greater clarity and detail. This enhanced visualization has potential to assist doctors with improved diagnostic accuracy and more precise treatment planning. Moreover, advanced 3D image retrieval systems can support evidence-based and cohort-based diagnostics, demonstrating an opportunity for more accurate predictions and personalized treatment options. These systems also hold significant potential for advancing research, supporting medical education, and enhancing healthcare services. This blog offers guidance on using Azure AI Foundry and the recently launched healthcare AI models to design and test a 3D image search system that can retrieve similar radiology images from a large collection of 3D images. Along with this blog, we share a Jupyter Notebook with the the 3D image search system code, which you may use to reproduce the experiments presented here or start you own solution. 3D Image Search Notebook: http://aka.ms/healthcare-ai-examples-mi2-3d-image-search It is important to highlight that the models available on the AI Foundry Model Catalog are not designed to generate diagnostic-quality results. Developers are responsible for further developing, testing, and validating their appropriateness for specific tasks and eventually integrating these models into complete systems. The objective of this blog is to demonstrate how this can be achieved efficiently in terms of data and computational resources. The Problem Generally, the problem of 3D image search can be posed as retrieving cross-sectional (CS) imaging series (3D image results) that are similar to a given CS imaging series (query 3D image). Once posited this way, the key question becomes how to define such similarity? In the previous blog of this series, we worked with radiographs of the chest which constrained the notion of "similar" to the similarity between two 2D images, and a certain class of anatomy. In the case of 3D images, we are dealing with a volume of data, and a lot more variations of anatomy and pathologies, which expands the dimensions to consider for similarity; e.g., are we looking for similar anatomy? Similar pathology? Similar exam type? In this blog, we will discuss a technique to approximate the 3D similarity problem through a 2D image embedding model and some amount of supervision to constrain the problem to a certain class of pathologies (lesions) and cast it as "given cross-sectional MRI image , retrieve series with similar grade of lesions in similar anatomical regions". To build a search system for 3D radiology images using a foundation model (MedImageInsight) designed for 2D inputs, we explore the generation of representative 3D embedding vectors for the volumes with the foundation model embeddings of 2D slices to create a vector index from a large collection of 3D images. Retrieving relevant results for a given 3D image then consists in generating a representative 3D image embedding vector for the query image and searching for similar vectors in the index. An overview of this process is illustrated in Figure 1. Figure 1: Overview of the 3D image search process. The Data In the sample notebook that is provided alongside this blog, we use 3D CT images from the Medical Segmentation Decathlon (MSD) dataset [2-3] and annotations from the 3D-MIR benchmark [4]. The 3D-MIR benchmark offers four collections (Liver, Colon, Pancreas, and Lung) of positive and negative examples created from the MSD dataset with additional annotations related to the lesion flag (with/without lesion), and lesion group (1, 2, 3). The lesion grouping focuses on lesion morphology and distribution and considers the number, length, and volume of the lesions to define the three groups. It also adheres to the American Joint Committee on Cancer's Tumor, Node, Metastasis classification system’s recommendations for classifying cancer stages and provides a standardized framework for correlating lesion morphology with cancer stage. We selected the 3D-MIR Pancreas collection. 3D-MIR Benchmark: https://github.com/abachaa/3D-MIR Since the MSD collections only include unhealthy/positive volumes, each 3D-MIR collection was augmented with volumes randomly selected from the other datasets to integrate healthy/negative examples in the training and test splits. For instance, the Pancreas dataset was augmented using volumes from the Colon, Liver, and Lung datasets. The input images consist of CT volumes and associated 2D slices. The training set is used to create the index, and the test set is used to query and evaluate the 3D search system. 3D Image Retrieval Our search strategy, called volume-based retrieval, relies on aggregating the embeddings of the 2D slices of a volume to generate one representative 3D embedding vector for the whole volume. We describe additional search strategies in our 3D-MIR paper [4]. The 2D slice embeddings are generated using the MedImageInsight foundation model [5-6] from Azure AI Foundry model catalog [1]. In the search step, we generate the embeddings of the 3D query volumes according to the selected Aggregation method (Agg) and search for the top-k similar volumes/vectors in the corresponding 3D (Agg) index. We use the Median aggregation method to generate the 3D vectors and create the associated 3D index. We construct a 3D (Median) index using the training slices/volumes from the 3D-MIR Pancreas collection. Three other aggregation methods are available in the 3D image search notebook: Max Pooling, Average Pooling, and Standard Deviation. The search is performed following the k-Nearest Neighbors algorithm (or k-NN search) to find the k nearest neighbors of a given vector by calculating the distances between the query vector and all other vectors in the collection, then selecting the K vectors with the shortest distances. If the collection is large, the computation can be expensive, and it is recommended to use specific libraries for optimization. We use the FAISS (Facebook AI Similarity Search) library, an open-source library for efficient similarity search and clustering of high-dimensional vectors. Evaluation of the search results The 3D-MIR Pancreas test set consists of 32 volumes: 4 volumes with no lesion (lesion flag/group= -1) 3 volumes with lesion group 1 19 volumes with lesion group 2 6 volumes with lesion group 3 The training set consists of 269 volumes (with and without lesions) and was used to create the index. We evaluate the 3D search system by comparing the lesion group/category of the query volume and the top 10 retrieved volumes. We then compute Precision@k (P@k). Table 1 presents the P@1, P@3, P@5, P@10, and overall Precision. Table 1: Evaluation results on the 3D-MIR Pancreas test set The system accurately recognizes Healthy cases, consistently retrieving the correct label in test scenarios involving non-lesion pancreas images. However, performance varies for different lesion groups, reflecting challenges in precisely identifying smaller lesions (Group 1) or more advanced lesions (Group 3). This discrepancy highlights the complexity of lesion detection and underscores the importance of carefully tuning embeddings or adjusting the vector index to improve retrieval accuracy for specific lesion sizes. Visualization Figure 2 presents four different test queries from the Pancreas test set and the top 5 nearest neighbors retrieved by the volume-based search method. In each row, the first image is the query, followed by the retrieved images ranked by similarity. The visual overlays help in assessing retrieval accuracy; Blue indicates the pancreas organ boundaries, and Red highlights the mark regions corresponding to the pancreas tumor. Figure 2: Top 5 results for different queries from the Pancreas test set Table 2 presents additional results of the volume-based retrieval system [4] on other 3D-MIR datasets/organs (Liver, Colon, and Lung) using additional foundation models: BiomedCLIP [7], Med-Flamingo [8], and BiomedGPT [9]. When considering the macro-average across all datasets, MedImageInsight-based retrieval outperforms substantially other foundation models. Table 2: Evaluation Results on the 3D-MIR benchmark (Liver, Colon, Pancreas, and Lung) These results mirror a use case akin to lesion detection and severity measurement in a clinical context. In real-world applications—such as diagnostic support or treatment planning—it may be necessary to optimize the model to account for particular goals (e.g., detecting critical lesions early) or accommodate different imaging protocols. By refining search criteria, integrating more domain-specific data, or adjusting embedding methods, practitioners can enhance retrieval precision and better meet clinical requirements. Conclusion The integration of 3D image search systems in clinical environment can enhance and accelerate the retrieval of similar cases and provide better context to clinicians and researchers for accurate complex diagnoses, cohort selection, and personalized patient care. This 3D radiology image search blog and related notebook offers a solution based on 3D embedding generation for building and evaluating a 3D image search system using the MedImageInsight foundation model from Azure AI Foundry model catalog. Image Search Series: Blog Posts & Jupyter Notebooks Image Search Series Part 1: Chest X-ray lookup with MedImageInsight | Microsoft Community Hub 2d_image_search.ipynb Image Search Series Part 2: AI Methods for the Automation of 3D Image Retrieval in Radiology | Microsoft Community Hub 3d_image_search.ipynb Image Search Series Part 3: Foundation Models and Retrieval-Augmented Generation in Dermatology | Microsoft Community Hub Image Search Series Part 4: Advancing Wound Care with Foundation Models and Context-Aware Retrieval | Microsoft Community Hub rag_infection_detection.ipynb Image Search Series Part V: Building Histopathology Image Search with Prov-GigaPath | Microsoft Community Hub 2d_pathology_image_search.ipynb The Microsoft healthcare AI models, including MedImageInsight, are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models’ performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. References Model catalog and collections in Azure AI Foundry portal https://learn.microsoft.com/en-us/azure/ai-studio/how-to/model-catalog-overview Michela Antonelli et al. The medical segmentation decathlon. Nature Communications, 13(4128), 2022 https://www.nature.com/articles/s41467-022-30695-9 MSD: http://medicaldecathlon.com/ Asma Ben Abacha, Alberto Santamaría-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P. Lungren, Thomas Lin, Noel C. F. Codella, Ivan Tarapov: 3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology. CoRR abs/2311.13752, 2023 https://arxiv.org/abs/2311.13752 Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaría-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur, Vijay Aski, Jenq-Neng Hwang, Thomas Lin, Ivan Tarapov, Matthew P. Lungren, Mu Wei: MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging. CoRR abs/2410.06542, 2024 https://arxiv.org/abs/2410.06542 MedImageInsight: https://aka.ms/mi2modelcard Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Angela Crabtree, Brian Piening, Carlo Bifulco, Matthew P. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon. BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. NEJM AI 2025; 2(1) https://ai.nejm.org/doi/full/10.1056/AIoa2400640 Moor, M., Huang, Q., Wu, S., Yasunaga, M., Dalmia, Y., Leskovec, J., Zakka, C., Reis, E.P., Rajpurkar, P.: Med-flamingo: a multimodal medical few-shot learner. Machine Learning for Health, ML4H@NeurIPS 2023, 10 December 2023, New Orleans, Louisiana, USA. Proceedings of Machine Learning Research, vol. 225, pp. 353–367. PMLR, (2023) https://proceedings.mlr.press/v225/moor23a.html Zhang, K., Zhou, R., Adhikarla, E., Yan, Z., Liu, Y., Yu, J., Liu, Z., Chen, X., Davison, B.D., Ren, H., et al.: A generalist vision–language foundation model for diverse biomedical tasks. Nature Medicine, 1–13 (2024) https://www.nature.com/articles/s41591-024-03185-2Image Search Series Part 3: Foundation Models and Retrieval-Augmented Generation in Dermatology
Introduction Dermatology is inherently visual, with diagnosis often relying on morphological features such as color, texture, shape, and spatial distribution of skin lesions. However, the diagnostic process is complicated by the large number of dermatologic conditions, with over 3,000 identified entities, and the substantial variability in their presentation across different anatomical sites, age groups, and skin tones. This phenotypic diversity presents significant challenges, even for experienced clinicians, and can lead to diagnostic uncertainty in both routine and complex cases. Image-based retrieval systems represent a promising approach to address these challenges. By enabling users to query large-scale image databases using a visual example, these systems can return semantically or visually similar cases, offering useful reference points for clinical decision support. However, dermatology image search is uniquely demanding. Systems must exhibit robustness to variations in image quality, lighting, and skin pigmentation while maintaining high retrieval precision across heterogeneous datasets. Beyond clinical applications, scalable and efficient image search frameworks provide valuable support for research, education, and dataset curation. They enable automated exploration of large image repositories, assist in selecting challenging examples to enhance model robustness, and promote better generalization of machine learning models across diverse populations. In this post, we continue our series on using healthcare AI models in Azure AI Foundry to create efficient image search systems. We explore the design and implementation of such a system for dermatology applications. As a baseline, we first present an adapter-based classification framework for dermatology images by leveraging fixed embeddings from the MedImageInsight foundation model, available in the Azure AI Foundry model catalog. We then introduce a Retrieval-Augmented Generation (RAG) method that enhances vision-language models through similarity-based in-context prompting. We use the MedImageInsight foundation model to generate image embeddings and retrieve the top-k visually similar training examples via FAISS. The retrieved image-label pairs are included in the Vision-LLM prompt as in-context examples. This targeted prompting guides the model using visually and semantically aligned references, enhancing prediction quality on fine-grained dermatological tasks. It is important to highlight that the models available on the AI Foundry Model Catalog are not designed to generate diagnostic-quality results. Developers are responsible for further developing, testing, and validating their appropriateness for specific tasks and eventually integrating these models into complete systems. The objective of this blog is to demonstrate how this can be achieved efficiently in terms of data and computational resources. The Data The DermaVQA-IIYI [2] dermatology image dataset is a de-identified, diverse collection of nearly 1,000 patient records and nearly 3,000 dermatological images, created to support research in skin condition recognition, classification, and visual question answering. DermaVQA-IIYI dataset: https://osf.io/72rp3/files/osfstorage (data/iiyi) The dataset is split into three subsets: Training Set: 2,474 images associated with 842 patient cases Validation Set: 157 images associated with 56 cases Test Set: 314 images associated with 100 cases Total Records: 2,945 images (998 patient cases) Patient Demographics: Out of 998 patient cases: Sex – F: 218, M: 239, UNK: 541 Age (available for 398 patients): Mean: 31 yrs | Min: 0.08 yrs | Max: 92 yrs This wide range supports studies across all age groups, from infants to the elderly. A total of 2,945 images are associated with the patient records, with an average of 2.9 images per patient. This multiplicity enables the study of skin conditions from different perspectives and at various stages. Image Count per Entry: 1 image: 225 patients 2 images: 285 patients 3 images: 200 patients 4 or more images: 288 patients The dataset includes additional annotations for anatomic location, comprising 39 distinct labels (e.g., back, fingers, fingernail, lower leg, forearm, eye region, unidentifiable). Each image is associated with one or multiple labels. We use these annotations to evaluate the performance of various methods across different anatomical regions. Image Embeddings We generate image embeddings using the MedImageInsight foundation model [1] from the Azure AI Foundry model catalog [3]. We apply Uniform Manifold Approximation and Projection (UMAP) to project high-dimensional image embeddings produced by the MedImageInsight model into two dimensions. The visualization is generated using embeddings extracted from both the DermaVQA training and test sets, which covers 39 anatomical regions. For clarity, only the most frequent anatomical labels are displayed in the projection. Figure 1. UMAP projection of image embeddings produced by the MedImageInsight Model on the DermaVQA dataset. The resulting projection reveals that the MedImageInsight model captures meaningful anatomical distinctions: visually distinct regions such as fingers, face, fingernail, and foot form well-separated clusters, indicating high intra-class consistency and inter-class separability. Other anatomically adjacent or visually similar regions, such as back, arm, and abdomen, show moderate overlap, which is expected due to shared visual features or potential labeling ambiguity. Overall, the embeddings exhibit a coherent and interpretable organization, suggesting that the model has learned to encode both local and global anatomical structures. This supports the model’s effectiveness in capturing anatomy-specific representations suitable for downstream tasks such as classification and retrieval. Enhancing Visual Understanding We explore two strategies for enhancing visual understanding through foundation models. I. Training an Adapter-based Classifier We build an adapter-based classification framework designed for efficient adaptation to medical imaging tasks (see our prior posts for introduction into the topic of adapters: Unlocking the Magic of Embedding Models: Practical Patterns for Healthcare AI | Microsoft Community Hub). The proposed adapter model builds upon fixed visual features extracted from the MedImageInsight foundation model, enabling task-specific fine-tuning without requiring full model retraining. The architecture consists of three main components: MLP Adapter: A two-layer feedforward network that projects 1024-dimensional embeddings (generated by the MedImageInsight model) into a 512-dimensional latent space. This module utilizes GELU activation and Layer Normalization to enhance training stability and representational capacity. As a bottleneck adapter, it facilitates parameter-efficient transfer learning. Convolutional Retrieval Module: A sequence of two 1D convolutional layers with GELU activation, applied to the output of the MLP adapter. This component refines the representations by modeling local dependencies within the transformed feature space. Prediction Head: A linear classifier that maps the 512-dimensional refined features to the task-specific output space (e.g., 39 dermatology classes). The classifier is trained for 10 epochs (approximately 48 seconds) using only CPU resources. Built on fixed image embeddings extracted from the MedImageInsight model, the adapter efficiently tailors these representations for downstream classification tasks with minimal computational overhead. By updating only the adapter components, while keeping the MedImageInsight backbone frozen, the model significantly reduces computational and memory overhead. This design also mitigates overfitting, making it particularly effective in medical imaging scenarios with limited or imbalanced labeled data. A Jupyter Notebook detailing the construction and training of an MedImageInsight -based adapter model is available in our Samples Repository: https://aka.ms/healthcare-ai-examples-mi2-adapter Figure 3: MedImageInsight-based Adapter Model II. Boosting Vision-Language Models with in-Context Prompting We leverage vision-language models (e.g., GPT-4o, GPT-4.1), which represent a recent class of multimodal foundation models capable of jointly reasoning over visual and textual inputs. These models are particularly promising for dermatology tasks due to their ability to interpret complex visual patterns in medical images while simultaneously understanding domain-specific medical terminology. 1. Few-shot Prompting In this setting, a small number of examples from the training dataset are randomly selected and embedded into the input prompt. These examples, consisting of paired images and corresponding labels, are intended to guide the model's interpretation of new inputs by providing contextual cues and examples of relevant dermatological features. 2. MedImageInsight-based Retrieval-Augmented Generation (RAG) This approach enhances vision-language model performance by integrating a similarity-based retrieval mechanism rooted in MedImageInsight (Medical Image-to-Image) comparison. Specifically, it employs a k-nearest neighbors (k-NN) search to identify the top k dermatological training images that are most visually similar to a given query image. The retrieved examples, consisting of dermatological images and their corresponding labels, are then used as in-context examples in the Vision-LLM prompt. By presenting visually similar cases, this approach provides the model with more targeted contextual references, enabling it to generate predictions grounded in relevant visual patterns and associated clinical semantics. As illustrated in Figure 2, the system operates in two phases: Index Construction: Embeddings are extracted from all training images using a pretrained vision encoder (MedImageInsight). These embeddings are then indexed to enable efficient and scalable similarity search during retrieval. Query and Retrieval: At inference time, the test image is encoded similarly to produce a query embedding. The system computes the Euclidean distance between this query vector and all indexed embeddings, retrieving the k nearest neighbors with the smallest distances. To handle the computational demands of large-scale image datasets, the method leverages FAISS (Facebook AI Similarity Search), an open-source library designed for fast and scalable similarity search and clustering of high-dimensional vectors. The implementation of the image search method is available in our Samples Repository: https://aka.ms/healthcare-ai-examples-mi2-2d-image-search Figure 2: MedImageInsight-based Retrieval-Augmented Generation Evaluation Table 1 presents accuracy scores for anatomic location prediction on the DermaVQA-iiyi test set using the proposed modeling approaches. The adapter model achieves a baseline accuracy of 31.73%. Vision-language models perform better, with GPT-4o (2024-11-20) achieving an accuracy of 47.11%, and GPT-4.1 (2025-04-14) improving to 50%. However, incorporating few-shot prompting with five randomly selected in-context examples (5-shot) slightly reduces GPT-4.1’s performance to 48.72%. This decline suggests that unguided example selection may introduce irrelevant or low-quality context, potentially reducing the effectiveness of the model’s predictions for this specialized task. The best performance among the vision-language approaches is achieved using the retrieval-augmented generation (RAG) strategy. In this setup, GPT-4.1 is prompted with five nearest-neighbor examples retrieved using the MedImageInsight-based search method (RAG-5), leading to a notable accuracy increase to 51.60%. This improvement over GPT-4.1’s 50% accuracy without retrieval showcases the relevance of the MedImageInsight-based RAG method. We expect larger performance gains when using a more extensive dermatology dataset, compared to the relatively small dataset used in this example -- a collection of 2,474 images associated with 842 patient cases which served as the basis for selecting relevant cases and similar images. Dermatology is a particularly challenging domain, marked by a high number of distinct conditions and significant variability in skin tone, texture, and lesion appearance. This diversity makes robust and representative example retrieval especially critical for enhancing model performance. The results underscore the importance of example relevance in few-shot prompting, demonstrating that similarity-based retrieval can effectively guide the model toward more accurate predictions in complex visual reasoning tasks. Table 1: Comparative Accuracy of Anatomic Location Prediction on DermaVQA-iiyi Figure 2: Confusion Matrix of Anatomical Location Predictions by the trained MLP adapter: The matrix illustrates the model's performance in classifying wound images across 39 anatomical regions. Strong diagonal values indicate correct classifications, while off-diagonal entries highlight common misclassifications, particularly among anatomically adjacent or visually similar regions such as 'lowerback' vs. 'back' and 'hand' vs. 'fingers'. Figure 3. Examples of correct anatomical predictions by the RAG approach. Each image depicts a case where the model's predicted anatomical region exactly matches the ground truth. Shown are examples from visually and anatomically distinct areas including the eye region, lips, lower leg, and neck. Figure 4. Examples of misclassifications by the RAG approach. Each image displays a case where the predicted anatomical label differs from the ground truth. In several examples, predictions are anatomically close to the correct regions (e.g., hand vs. hand-back, lower leg vs. foot, palm vs. fingers), suggesting that misclassifications often occur between adjacent or visually similar areas. These cases highlight the challenge of precise localization in fine-grained anatomical classification and the importance of accounting for anatomical ambiguity in both modeling and evaluation. Conclusion Our exploration of scalable image retrieval and advanced prompting strategies demonstrates the growing potential of vision-language models in dermatology. A particularly challenging task we address is anatomic location prediction, which involves 39 fine-grained classes of dermatology images, imbalanced training data, and frequent misclassifications between adjacent or visually similar regions. By leveraging Retrieval-Augmented Generation (RAG) with similarity-based example selection using image embeddings from the MedImageInsight foundation model, we show that relevant contextual guidance can significantly improve model performance in such complex settings. These findings underscore the importance of intelligent image retrieval and prompt construction for enhancing prediction accuracy in fine-grained medical tasks. As vision-language models continue to evolve, their integration with retrieval mechanisms and foundation models holds substantial promise for advancing clinical decision support, medical research, and education at scale. In the next blog of this series, we will shift focus to the wound care subdomain of dermatology, and we will release accompanying Jupyter notebooks for the adapter-based and RAG-based methods to provide a reproducible reference implementation for researchers and practitioners. The Microsoft healthcare AI models, including MedImageInsight, are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models’ performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. Image Search Series: Blog Posts & Jupyter Notebooks Image Search Series Part 1: Chest X-ray lookup with MedImageInsight | Microsoft Community Hub 2d_image_search.ipynb Image Search Series Part 2: AI Methods for the Automation of 3D Image Retrieval in Radiology | Microsoft Community Hub 3d_image_search.ipynb Image Search Series Part 3: Foundation Models and Retrieval-Augmented Generation in Dermatology | Microsoft Community Hub Image Search Series Part 4: Advancing Wound Care with Foundation Models and Context-Aware Retrieval | Microsoft Community Hub rag_infection_detection.ipynb Image Search Series Part V: Building Histopathology Image Search with Prov-GigaPath | Microsoft Community Hub 2d_pathology_image_search.ipynb The Microsoft healthcare AI models, including MedImageInsight, available in the Microsoft Foundry model catalog, are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models’ performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. References Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaría-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie L. Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur, Vijay Aski, Jenq-Neng Hwang, Thomas Lin, Ivan Tarapov, Matthew P. Lungren, Mu Wei: MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging. CoRR abs/2410.06542 (2024) Wen-wai Yim, Yujuan Fu, Zhaoyi Sun, Asma Ben Abacha, Meliha Yetisgen, Fei Xia: DermaVQA: A Multilingual Visual Question Answering Dataset for Dermatology. MICCAI (5) 2024: 209-219 Model catalog and collections in Azure AI Foundry portal https://learn.microsoft.com/en-us/azure/ai-studio/how-to/model-catalog-overview