Healthcare Agent Orchestrator: Multi-agent Framework for Domain-Specific Decision Support
At Microsoft Build, we introduced the Healthcare Agent Orchestrator, now available in the Azure AI Foundry Agent Catalog. In this blog, we unpack the science: how we structured the architecture, curated real tumor board data, and built robust agent coordination that brings AI into real healthcare workflows.
Healthcare Agent Orchestrator assisting a simulated tumor board meeting.
Introduction
Healthcare is inherently collaborative. Critical decisions often require input from multiple specialists—radiologists, pathologists, oncologists, and geneticists—working together to deliver the best outcomes for patients. Yet most AI systems today are designed around narrow tasks or single-agent architectures, failing to reflect the real-world teamwork that defines healthcare practice. That's why we developed the Healthcare Agent Orchestrator: an orchestrator and code sample built around Microsoft's industry-leading healthcare AI models, designed to support reasoning and multidisciplinary collaboration—enabling modular, interpretable AI workflows that mirror how healthcare teams actually work. The orchestrator brings together Microsoft healthcare AI models—such as MedImageParse for image recognition, CXRReportGen for automated radiology reporting, and MedImageInsight for retrieval and similarity analysis—into a unified, task-aware system that enables developers to build an agent that reflects real-world healthcare decision-making patterns. This work was led by Yu (Aiden) Gu, who defined the research direction and led a cross-functional team in designing and developing the Healthcare Agent Orchestrator proof-of-concept.
Healthcare Is Naturally Multi-Agent
Healthcare decision-making often requires synthesizing diverse data types—radiologic images, pathology slides, genetic markers, and unstructured clinical narratives—while reconciling differing expert perspectives. In a molecular tumor board, for instance, a radiologist might highlight a suspicious lesion on CT imaging, a pathologist may flag discordant biopsy findings, and a geneticist could identify a mutation pointing toward an alternate treatment path. Effective collaboration in these settings hinges not on isolated analysis, but on structured dialogue—where evidence is surfaced, assumptions are challenged, and hypotheses are iteratively refined. To support the development of the Healthcare Agent Orchestrator, we partnered with a leading healthcare provider organization, which independently curated and de-identified a proprietary dataset comprising longitudinal patient records and real tumor board transcripts—capturing the complexity of multidisciplinary discussions. We provided guidance on the data types most relevant for evaluating agent coordination, reasoning handoffs, and task alignment in collaborative settings. We then applied LLM-based structuring techniques to convert de-identified free-form transcripts into interpretable units, followed by expert review to ensure domain fidelity and relevance. This dataset provides a critical foundation for assessing agent coordination, reasoning handoffs, and task alignment in simulated collaborative settings.
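The exact structuring pipeline isn't published in this post, but a minimal sketch of the general idea—prompting an LLM to turn a de-identified transcript excerpt into structured, interpretable units—might look like the following Python snippet. The prompt, the output schema, and the deployment name are illustrative assumptions, not the pipeline used for the dataset described above.

import json
from openai import AzureOpenAI  # openai>=1.0 SDK; assumes an Azure OpenAI deployment

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-key>",                                       # placeholder
    api_version="2024-06-01",
)

SYSTEM_PROMPT = (
    "You convert de-identified tumor board transcript excerpts into structured units. "
    "Return JSON with a 'units' list; each unit has speaker_role, modality, finding, "
    "and recommendation fields."
)

def structure_transcript(excerpt: str) -> dict:
    # Ask the model to rewrite a free-form excerpt as interpretable units.
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical deployment name
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": excerpt},
        ],
    )
    return json.loads(response.choices[0].message.content)

In practice, expert review of the structured output (as described above) remains essential to ensure domain fidelity.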
Why General-Purpose LLMs Fall Short for Healthcare Collaboration
While general-purpose large language models have delivered remarkable results in many domains, they face key limitations in high-stakes healthcare environments:
Precision is critical: Even small hallucinations or inconsistencies can compromise safety and decision quality.
Multi-modal integration is required: Many healthcare decisions involve interpreting and correlating diverse data types—images, reports, structured records—much of which is not available in public training sets.
Transparency and traceability matter: Users must understand how conclusions are formed and be able to audit intermediate steps.
The Healthcare Agent Orchestrator addresses these challenges by pairing general reasoning capabilities with specialized agents that operate over imaging, genomics, and structured EHRs—ensuring grounded, explainable results aligned with clinical expectations. Each agent contributes domain-specific expertise, while the orchestrator ensures coherence, oversight, and explainability—resulting in outputs that are both grounded and verifiable.
Architecture: Coordinating Specialists Through Orchestration
Healthcare Agent Orchestrator.
Healthcare Agent Orchestrator's multi-agent framework is built on modular AI infrastructure, designed for secure, scalable collaboration:
Semantic Kernel: A lightweight, open-source development kit for building AI agents and integrating the latest AI models into C#, Python, or Java codebases. It acts as efficient middleware for rapidly delivering enterprise-grade solutions—modular, extensible, and designed to support responsible AI at scale.
Model Context Protocol (MCP): An open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools.
Magentic-One: Microsoft's generalist multi-agent system for solving open-ended web and file-based tasks across domains—built on Microsoft AutoGen, our popular open-source framework for developing multi-agent applications.
Each agent is orchestrated within the system and integrated via Semantic Kernel's group chat infrastructure, with support for communication and modular deployment via Azure. This orchestration ensures that each model—whether interpreting a lung nodule, analyzing a biopsy image, or summarizing a genomic variant—is applied precisely where its expertise is most relevant, without overloading a single system with every task. The modularity of the framework also future-proofs it: as new health AI models and tools emerge, they can be seamlessly incorporated into the ecosystem without disrupting existing workflows—enabling continuous innovation while maintaining clinical stability.
Microsoft's healthcare AI models at the Core
The Healthcare Agent Orchestrator also enables developers to explore the capabilities of Microsoft's latest healthcare AI models:
CXRReportGen: Integrates multimodal inputs—including current and prior X-ray images and report context—to generate grounded, interpretable radiology reports. The model has shown improved accuracy and transparency in automated chest X-ray interpretation, evaluated on both public and private data.
MedImageParse 3: A biomedical foundation model for image parsing that can jointly conduct segmentation, detection, and recognition across 9 imaging modalities.
MedImageInsight 4: Facilitates fast retrieval of clinically similar cases and supports disease classification across a broad range of medical image modalities, accelerating second opinion generation and diagnostic review workflows.
Each model can act as a specialized agent within the system, contributing focused expertise while allowing flexible, context-aware collaboration orchestrated at the system level. CXRReportGen is included in the initial release and supports the development and testing of grounded radiology report generation. Other Microsoft healthcare models such as MedImageParse and MedImageInsight are being explored in internal prototypes to expand the orchestrator's capabilities across segmentation, detection, and image retrieval tasks.
Seamless Integration with Microsoft Teams
Rather than creating new silos, Healthcare Agent Orchestrator integrates directly into the tools clinicians already use—specifically Microsoft Teams. Developers are investigating how clinicians can engage with agents through natural conversation, asking questions, requesting second opinions, or cross-validating findings—all without leaving their primary collaboration environment. This approach minimizes friction, improves user experience, and brings cutting-edge AI into real-world care settings.
Building Toward Robust, Trustworthy Multi-Agent Collaboration
Think of the orchestrator as managing a secure, structured group chat. Each participant is a specialized AI agent—such as a 'Radiology' agent, 'PatientHistory' agent, or 'ClinicalTrials' agent. At the center is the 'Orchestrator' agent, which moderates the interaction: assigning tasks, maintaining shared context, and resolving conflicting outputs. Agents can also communicate directly with one another, exchanging intermediate results or clarifying inputs. Meanwhile, the user can engage either with the orchestrator or with specific agents as needed. Each agent is configured with instructions (the system prompt that guides its reasoning) and a description (used by both the UI and the orchestrator to determine when the agent should be activated). For example, the Radiology agent is paired with the cxr_report_gen tool, which wraps Microsoft's CXRReportGen model for generating findings from chest X-ray images. Tools like this are declared under the agent's tools field and allow it to call foundation models or other capabilities on demand—such as the clinical_trials tool 5 for querying ClinicalTrials.gov. Only one agent is marked as facilitator, designating it as the moderator of the conversation; in this scenario, the Orchestrator agent fills that role.
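As a concrete illustration of this configuration, here is a minimal sketch expressed as Python data. The field names (instructions, description, tools, facilitator) and the agent and tool names come from the description above; the exact schema and values used by the released sample may differ.

# Illustrative agent configuration only; the released sample's actual schema may differ.
agents = [
    {
        "name": "Orchestrator",
        "facilitator": True,  # moderates the group chat
        "instructions": "Assign tasks to specialist agents, maintain shared context, "
                        "and reconcile conflicting outputs before answering the user.",
        "description": "Coordinates the tumor board discussion.",
        "tools": [],
    },
    {
        "name": "Radiology",
        "instructions": "Interpret chest X-ray studies and report grounded findings.",
        "description": "Use for questions about imaging findings.",
        "tools": ["cxr_report_gen"],  # wraps the CXRReportGen model
    },
    {
        "name": "ClinicalTrials",
        "instructions": "Match the patient against open studies on ClinicalTrials.gov.",
        "description": "Use for trial-eligibility questions.",
        "tools": ["clinical_trials"],
    },
    {
        "name": "PatientHistory",
        "instructions": "Summarize the parts of the longitudinal record relevant to the question.",
        "description": "Use for questions about prior history.",
        "tools": [],
    },
]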
Early observations highlight that multi-agent orchestration introduces new complexities—even as it improves specialization and task alignment. To address these emergent challenges, we are actively evolving the framework across several dimensions:
Mitigating Error Propagation Across Agents: Ensuring that early-stage errors by one agent do not cascade unchecked through subsequent reasoning steps. This includes introducing critical checkpoints where outputs from key agents are verified before being consumed by others.
Optimizing Agent Selection and Specialization: Recognizing that more agents are not always better. Adding unnecessary or redundant agents can introduce noise and confusion. We've implemented a systematic framework that emphasizes a few highly suited agents per task—dynamically selected based on case complexity and domain needs—while continuously tracking performance gains and catching regressions early.
Improving Transparency and Hand-off Clarity: Structuring agent interactions to make intermediate outputs and rationales visible, enabling developers (and the system itself) to trace how conclusions were reached, catch inconsistencies early, and intervene when necessary.
Adapting General Frameworks for Healthcare Complexity
Generic orchestration frameworks like Semantic Kernel provide a strong foundation—but healthcare demands more. The stakes are higher, the data more nuanced, and the workflows require precision, traceability, and regulatory compliance. Here's how we've extended and adapted these systems to help address healthcare demands:
Precision and Safety: We introduced domain-aware verification checkpoints and task-specific agent constraints to reduce inappropriate tool usage—supporting more reliable reasoning. To help uphold the high standards required in healthcare, we defined two complementary metric systems (see Healthcare Agent Orchestrator Evaluation for more details):
Core Metrics: monitor healthcare agent selection accuracy, intent resolution, contextual relevance, and information aggregation.
RoughMetric: a composite score based on ROUGE that helps quantify the precision of generated outputs and conversation reliability.
TBFact: a modified version of RadFact 2 that measures the factuality of claims in agents' messages and helps identify omissions and hallucinations.
Domain-Specific Tool Planning: Healthcare agents must reason across multimodal inputs—such as chest X-rays, CT slices, pathology images, and structured EHRs. We've customized Semantic Kernel's tool invocation and planning modules to reflect clinical workflows, not generic task chains.
These infrastructure-level adaptations are designed to complement Microsoft healthcare AI models—such as CXRReportGen, MedImageParse, and MedImageInsight—working together to enable coordinated, domain-aware reasoning across complex healthcare tasks.
Enabling Collaborative, Trustworthy AI in Healthcare
Healthcare demands AI systems that are as collaborative, adaptive, and trustworthy as the clinical teams they aim to support. The Healthcare Agent Orchestrator is a concrete step toward that vision—pairing specialized health AI models with a flexible, multi-agent coordination framework, purpose-built to reflect the complexity of real clinical decision-making. By aligning with existing healthcare workflows and enabling transparent, role-specific collaboration, this system shows promise to empower clinicians to work more effectively—with AI as a partner, not a replacement. Healthcare Multi-Agent Orchestrator and the Microsoft healthcare AI models are intended for research and development use. Healthcare Multi-Agent Orchestrator and the healthcare AI models are not designed or intended to be deployed in clinical settings as-is, nor are they intended for use in the diagnosis or treatment of any health or medical condition, and their performance for such purposes has not been established.
You bear sole responsibility and liability for any use of Healthcare Multi-Agent Orchestrator or the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals.
1 arXiv, Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale, February 2, 2025
2 arXiv, MAIRA-2: Grounded Radiology Report Generation, June 6, 2024
3 Nature Methods, A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities, Nov 18, 2024
4 arXiv, MedImageInsight: An open-source embedding model for general domain medical imaging, Oct 9, 2024
5 Machine Learning for Healthcare Conference, Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology, August 4, 2023
Introducing the Dragon Copilot experience for nurses
Nursing is at a crossroads. Rising patient acuity, persistent staffing shortages, and high turnover have pushed health systems to draw a hard line against digital workflows that distract nurses from care delivery. The solution isn’t layering AI onto outdated workflows—it’s reinventing how nurses work, making technology invisible and care effortless. Just as the shift from paper to digital introduced a generational opportunity for redesign, bringing ambient AI to nursing introduces a rare window for fundamental transformation: one that aligns technology with high value care delivery and gives nurses the same efficiency gains physicians have realized. Microsoft Dragon Copilot brings that vision to life—empowering nurses to “care out loud” and capturing documentation seamlessly while technology fades into the background. Nurses aren’t just the most trusted professionals; they deliver most of the direct patient care. Every interaction is a chance to engage, apply critical thinking, and improve outcomes. By reducing time spent navigating screens and freeing nurses to focus on presence and compassion, Dragon Copilot unlocks their full potential—creating powerful opportunities to elevate care at the bedside. How Dragon Copilot supports nursing staff Dragon Copilot turns real-time bedside conversations and observations into the clinical documentation nurses need—without pulling focus from the patient. Nurses speak and the interaction is ambiently recorded on a mobile EHR app; Dragon Copilot structures, summarizes, and surfaces what matters next, reducing clicks and cognitive load. What it delivers: Flowsheet documentation drafted from captured interactions and mapped to each hospital’s unique schema without needing to specify the template up front, making it easy for nurses to review in workflow, edit as needed, and file into the EHR. Nurse notes for incidents and status changes, ready to copy into the EHR or communication systems, saves nurses time on narrative documentation. Concise summaries of major findings and next steps to keep nurses organized amid constant interruptions. Beyond documentation: Query transcripts for details not typically captured in EHR documentation. Ask clinical questions and get answers from organization-trusted sources like FDA and MedlinePlus—supporting patient education at the bedside. Integrating into complex nursing workflows Transforming nursing workflow isn’t just about adding new technology—it’s about addressing the realities of nursing practice. That means supporting diverse workflows and preferences, managing the complexity of structured flowsheet documentation, and navigating the human side of change for an already strained workforce. Real impact requires more than layering large language models onto existing applications or repurposing physician-focused solutions—it demands a transformation that reduces administrative burden and aligns with how nurses deliver their best care. Here’s how we’ve met these challenges—and continue to refine the experience: Flexible interaction styles Every nurse works differently, and patient needs vary. Dragon Copilot adapts seamlessly by capturing documentation during patient conversations or when nurses record thoughts outside the room—without requiring extra steps or mode selection. 
Through the Epic Rover mobile app, Dragon Copilot supports: Conversational recordings captured naturally during patient interactions Stream-of-consciousness voice recordings captured outside the room This flexibility supports nurses in capturing observations in the way that fits their workflow without adding friction. Advanced AI purpose-built for nursing Dragon Copilot converts even ambiguous voice recordings into accurate, structured flowsheet entries--correctly mapping to the appropriate row and value out of thousands of possibilities. Our AI is adapted to nursing vocabulary and optimized for speed and accuracy. The result? Performance far beyond what generic frontier models can deliver. Seamless integration with existing flowsheet templates No need to migrate to a standard flowsheet format. Dragon Copilot works with your organization’s existing flowsheet schema. In addition, during the setup process, the solution: Automatically surfaces flowsheet optimization opportunities—such as duplicate rows—within the Dragon Admin Center, helping administrators enhance usability and improve AI accuracy during implementation. Enables rapid administrator validation of AI outputs, before go-live or after template change, so potential problems are resolved proactively without disrupting workflows In-app support for documentation integrity For every AI-extracted flowsheet value, nurses can: See the exact transcript section that supports it Access organization-configured guides during recording Review a preview of AI results upon pausing, with gaps highlighted, so they can complete documentation before sending values to the EHR Nurses stay in control Only AI-generated flowsheet values reviewed and accepted by the nurse become part of the medical record, giving nurses control of their EHR documentation. Scalable deployment through automation The Dragon admin center now supports the Dragon Copilot experience for nurses, providing a unified platform to manage deployments, configure product settings, create EHR instances, and access analytics for adoption tracking and change management. Administrators can define which flowsheet templates and rows are ambiently enabled, as well as manage user access to them. The platform also includes licensing and organization lifecycle management, giving organizations a comprehensive oversight of their Dragon Copilot implementation Transformation services Transformation isn’t just about technology—it’s about enabling change. Our services approach supports organizations in implementing change management practices proven by early customers to drive results. Designing with nurses at the helm Our collaboration with nurse leaders and frontline teams, paired with continuous investment in nurse-driven design, remains central to our approach to delivering meaningful transformation. From innovation to impact The Dragon Copilot experience for nurses is part of our Dragon Copilot AI clinical assistant—developed based on decades of clinical expertise with Microsoft’s scale, security, and relentless commitment to innovation. Microsoft was first to bring ambient solutions to physicians, and now we’re delivering the first ambient solution purpose-built for nursing—shaped through years of collaboration with nurse leaders and frontline teams. Our mission is simple yet bold: empower nurses to deliver exceptional care by reducing administrative burden and honoring human connection at the heart of care. 
Fulfilling this mission demands collaboration, relentless innovation, and a shared commitment to getting it right—because nurses deserve nothing less. Join us on this journey in shaping the future of nursing. Learn more at aka.ms/dragoncopilot and Microsoft Learn.
Ushering in the Next Era of Cloud-Native AI Capabilities for Radiology
Introducing Dragon Copilot, your AI companion for PowerScribe One
For radiologists, the reporting workflow of the future is here. At RSNA 2025, in Chicago, we're showcasing Dragon Copilot, a cloud-native companion for PowerScribe One. Currently in preview, Dragon Copilot builds on the trusted capabilities of PowerScribe One to accelerate innovation and modernize reporting workflows while unlocking extensibility for radiology teams and partners.
Why we built it: Technical drivers for a new era
With growing demand for imaging services coupled with a workforce shortage, healthcare professionals face increased workloads and burnout while patients experience greater wait times. With our breadth of healthcare industry experience combined with our AI expertise and development at Microsoft, we immediately understood how we could help address these challenges. For radiologists, we sought to plug into existing reporting workflows with rapid innovation, scalable AI, and open extensibility.
How we built it: Modern architecture and extensibility
By delivering Dragon Copilot as a cloud-native solution built on Azure, we can enable new services globally. We apply the full capabilities of Azure for compute, storage, and security for high availability and compliance. Our modular architecture enables fast delivery of new features with APIs at the core to allow seamless integration, extensibility, and partner innovation. To imbue the workflow with AI through our platform, we harness the latest generative, multimodal, and agentic AI (both internal and through our partners) to support clinical reporting, workflow automation, and decision support.
Key architectural highlights:
AI services: Integrated large language models (LLMs) and vision-language models (VLMs) for multimodal data processing.
API-first design: RESTful APIs expose core functions (draft report content generation, prior summarization, quality checks and chat), enabling partners and developers to build extensions and custom workflows.
Extensibility framework: Open platform for 1st- and 3rd-party extensions, supporting everything from custom AI models to workflow agents.
Inside the innovation
Dragon Copilot alongside PowerScribe provides a unified AI experience. Radiologists can take advantage of the latest AI advancements without disruption to their workflows. They do not need another widget taking up room on their desktop. Instead, they need AI that fits seamlessly into existing workflows, connecting their data to the cloud. Our cloud-first approach brings increased reliability, stability, and performance to a radiologist's workflow. I'm thrilled to highlight the key capabilities of this dynamic duo: PowerScribe One with Dragon Copilot.
Prior report summary: Automatically summarizes relevant prior reports, surfacing key findings and context for the current study.
AI-generated draft reports and quality checks: The most transformative aspect of Dragon Copilot is its open, extensible architecture for AI integration. We don't limit radiology teams to a single set of AI tools. We enable seamless plug-ins for AI apps & agents from both Microsoft and our growing ecosystem of 3rd parties. We provide a single surface for all your AI needs. This approach will enable radiology departments to discover, acquire, & deploy new AI-powered extensions. We're enthusiastic about embarking on this journey with partners.
We're also excited about collaborations with developers and academic innovators to bring their own AI models and services directly into the Dragon Copilot experience.
Integrated chat experience with credible knowledge sources and medical safeguards: This chat interface connects radiologists to credible, clinically validated sources from Radiopaedia and Radiology Assistant. It enables agentic orchestration and safeguards provided by Azure's Healthcare Agent Services for PHI and clinical accuracy. In the future, we expect to have a variety of other sources for radiology customers to choose from, as well as the ability for organizations to add their own approved policies and protocols. This chat is designed to route questions to the right agent, provide evidence for claims, and filter responses for clinical validity. Over time, it will include extensions with custom agents powered by Copilot Studio.
Help us shape what's next
As we continue to evolve Dragon Copilot alongside PowerScribe One, we invite innovators, developer partners, and academics to join us in shaping the future of radiology workflow. Dragon Copilot is more than a product; it's a solution for rapid, responsible innovation in radiology. By combining cloud-native architecture, advanced AI capabilities, and open extensibility, we're enabling radiology teams to work smarter, faster, and with greater confidence. Ready to see it in action? Visit us at RSNA 2025 (November 30–December 4), booth #1311 South Hall. Or contact our team to join the journey.
Protect patient privacy across languages with the de-identification service's preview expansion
Machine learning and analytics are transforming healthcare by streamlining clinical workflows, powering AI models and unlocking new insights from patient data. These innovations are fueled by textual data rich in Protected Health Information (PHI). To be used for research, innovation and operational improvements, this data must be responsibly de-identified to protect patient privacy. Manual de-identification can be slow, expensive, and error-prone, creating bottlenecks that delay progress and limit collaboration. De-identification is more than a compliance standard; it is the key to unlocking healthcare data's full potential while maintaining patient privacy and trust. Today, we are excited to announce the expansion of the Azure Health Data Services de-identification service to support five new preview language-locale combinations:
Spanish (United States)
German (Germany)
French (France)
French (Canada)
English (United Kingdom)
This language expansion enables global healthcare organizations to unlock insights from data beyond English while continuing to adhere to regulatory standards.
Why Language Support Matters
Healthcare data is generated in many languages around the world, and each one comes with its own linguistic structure, formatting, and privacy considerations. By expanding support to multiple preview languages such as Spanish, French, German, and English, our de-identification service allows organizations to unlock data from a broader range of countries and regions. But language alone isn't the whole story. Different locales within the same language (French in France vs. Canada, or English in the UK vs. the US) often format PHI in unique ways. Addresses, medical institutions, and identifiers can all look different depending on the region. Our service is designed to recognize and accurately de-identify these locale-specific patterns, supporting privacy and compliance wherever the data originates.
How It Works
The Azure Health Data Services de-identification service empowers healthcare organizations to protect patient data through three key operations:
TAG detects and annotates PHI from unstructured text.
REDACT obfuscates PHI to prevent exposure.
SURROGATE replaces PHI with realistic, synthetic surrogates, preserving data utility while ensuring privacy.
Our service leverages state-of-the-art machine learning models to identify and handle sensitive information, supporting compliance with HIPAA's Safe Harbor standards and unlinked pseudonymization aligned with GDPR principles. By maintaining entity consistency and temporal relationships, organizations can use de-identified data for research, analytics, and machine learning without compromising patient privacy.
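To make the SURROGATE operation and the idea of entity consistency concrete, here is a toy Python sketch—not the service's implementation, which uses ML models to TAG PHI first—showing how each detected name can map to one stable synthetic surrogate across a document:

import random

SURROGATE_POOL = ["Jordan Lee", "Alex Morgan", "Sam Rivera"]

def surrogate_names(text: str, detected_names: list) -> str:
    # Toy illustration only: the "detected" names are passed in by hand here,
    # whereas the real service identifies PHI automatically.
    mapping = {}
    for name in detected_names:
        if name not in mapping:
            mapping[name] = random.choice(SURROGATE_POOL)
        text = text.replace(name, mapping[name])
    return text

note = "Maria Silva was seen on 12 March. Maria Silva reports improvement."
print(surrogate_names(note, ["Maria Silva"]))
# Both mentions receive the same synthetic name, preserving entity consistency.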
Unlocking New Use Cases
By expanding the service's language support, organizations can now address some of the most pressing data challenges in healthcare:
Reduce organizational liability by meeting evolving privacy standards.
Enable secure data sharing across institutions and regions.
Unlock AI opportunities by training models on multilingual, de-identified data.
Share de-identified data across institutions to create larger, more diverse datasets.
Conduct longitudinal research while preserving patient privacy.
Proven Accuracy
Researchers at the University of Oxford recently conducted a comprehensive comparative study evaluating multiple automated de-identification systems across 3,650 UK hospital records. Their analysis compared both task-specific transformer models and general-purpose large language models. The Azure Health Data Services de-identification service achieved the highest overall performance among the 9 evaluated tools, demonstrating a recall score of 0.95. The study highlights how robust de-identification enables large-scale, privacy-preserving EHR research and supports the responsible use of AI in healthcare. Read the full study here: Benchmarking transformer-based models for medical record deidentification
Preview: Your Feedback Matters
This multilingual feature is now available in preview. We invite healthcare organizations, research institutions, and clinicians to:
Try it out: Overview of the de-identification service in Azure Health Data Services | Microsoft Learn.
Provide feedback to help refine the service: Azure Health Data Services multilingual de-identification service feedback – fill out the form.
Join us in shaping the future of privacy-preserving healthcare innovation. At Microsoft, we are committed to helping healthcare providers, payors, researchers, and life sciences companies unlock the value of data while maintaining the highest standards of patient privacy. The Azure Health Data Services de-identification service empowers organizations to accelerate AI and analytics initiatives safely, supporting innovation and improving patient outcomes across the healthcare ecosystem. Explore Azure Health Data Services to see how our solutions help organizations transform care, research, and operational efficiency.
Fine-Tuning Healthcare AI Models: Custom Segmentation for Your Healthcare Data
This post is part of our healthcare AI fine-tuning series: MedImageInsight Fine-Tuning - Embeddings and classification MedImageParse Fine-Tuning - Segmentation and spatial understanding (you are here) CXRReportGen Fine-Tuning - Clinical findings generation Introduction MedImageParse now supports fine-tuning, allowing you to adapt Microsoft’s open-source biomedical foundation model to your healthcare use cases and data. Adapting this model can take as little as an hour to add new segmentation targets, add new modalities or boost performance significantly on your data. We’ll demonstrate how we achieved large performance gains across multiple metrics on a public dataset. Biomedical clinical apps often need highly specialized models, but training one from scratch is expensive and data-intensive. Traditional approaches require thousands of annotated images, weeks of compute time, and deep machine learning expertise just to get started. Fine-tuning offers a practical alternative. By starting with a strong foundation model and adapting it to your specific domain, you can achieve production-ready performance with hundreds of examples and hours of training time. Everything you need to start finetuning is available now, including a ready-to-use AzureML pipeline, complete workflow notebooks, and deployment capabilities. We fine-tuned MedImageParse on the CDD-CESM mammography dataset (specialized CESM modality for lesion segmentation) to demonstrate domain adaptation on data under‑represented in pre-training. Follow along: The complete example is in our GitHub repository as a ready-to-run notebook. What is MedImageParse? MedImageParse (MIP) is Microsoft’s open-source implementation of BiomedParse that comes with a permissive MIT license and is designed for integration into commercial products. It is a powerful and flexible foundation model for text-prompted medical imaging segmentation. MIP accepts an image and one or more prompts (e.g. “neoplastic cells in breast pathology” or “inflammatory cells,”) then accurately identifies and segments the corresponding structures within the input image. Trained on a wide range of biomedical imaging datasets and tasks, MIP captures robust feature representations that are highly transferrable to new domains. Furthermore, it operates efficiently on a single GPU, making it a practical tool for research laboratories without extensive computational resources. Built with adaptability in mind, the model can be fine-tuned using your own datasets to refine segmentation targets, accommodate unique imaging modalities, or improve performance on local data distributions. Its modest computational footprint, paired with this flexibility, positions MIP as a strong starting point for custom medical imaging solutions. When to Fine-tune (and When NOT to) Fine-tuning can transform MedImageParse into your own clinical asset that's aligned with your institution’s needs. But how do you know if that’s the right approach for your use case? Fine-tuning makes sense when you’re working with specialized imaging protocols (custom equipment or acquisition parameters), rare structures not well-represented in general datasets, or when you need high precision for quantitative analysis. You’ll need some high-quality annotated examples to see meaningful improvements; more is better, but thousands aren’t required. Simpler approaches might work instead if the pre-trained model already performs reasonably well on standard anatomies and common pathologies. 
If you're still in exploratory mode figuring out what to measure, start with the base model first to establish a strong baseline for your use case. Our example shows how fine-tuning can deliver significant performance gains even with modest resources. With about one hour of GPU time and 200-500 annotated images, fine-tuning showed a significant improvement across multiple metrics.
The Fine-tuning Pipeline: From Data to Deployed Model
To demonstrate fine-tuning in action, we used the CDD-CESM mammography dataset: a collection of Contrast-Enhanced Spectral Mammography (CESM) images with expert-annotated breast lesion masks. CESM is a specialized imaging modality that wasn't well represented in MedImageParse's original training data. The dataset 1 (which can be downloaded from our HuggingFace location or from its original TCIA page) includes predefined splits with high-quality segmentation annotations.
Why AzureML Pipelines?
Before diving into the workflow, it's worth understanding why we use AzureML pipelines for this process. Every experiment is tracked with full versioning; you always know exactly what you ran and can reproduce results months later. The pipeline handles multi-GPU distribution automatically without code changes, making it easy to scale up. The modular design lets you mix and match components for your specific needs—swap data preprocessing, adjust training parameters, or change deployment strategies independently. Training metrics, validation curves, and resource utilization are logged automatically, giving you full visibility into the process. Learn more about Azure ML pipelines.
Fine-Tuning Workflow
Setup: Upload data and configure compute
The first step uploads your training data and configuration to AzureML as versioned assets. You'll configure a GPU compute cluster (H100 or A100 instances recommended) that will handle the training workload.

# Create and upload training data folder
training_data = Data(
    path="CDD-CESM",
    type=AssetTypes.URI_FOLDER,
    description=f"{name} training data",
    name=f"{name}-training_data",
)
training_data = ml_client.data.create_or_update(training_data)

# Create and upload parameters file
parameters = Data(
    path="parameters.yaml",
    type=AssetTypes.URI_FILE,
    description=f"{name} parameters",
    name=f"{name}-parameters",
)
parameters = ml_client.data.create_or_update(parameters)

Fine-tuning: The medimageparse_finetune component
The fine-tuning component takes three inputs:
The pre-trained MedImageParse model (foundation weights)
Your annotated dataset
Training configuration (learning rate, batch size, augmentation settings)
During training the pipeline applies augmentation, tracks validation metrics, and checkpoints periodically. The output is an MLflow-packaged model—a portable artifact that includes the model weights and preprocessing code and is ready to deploy in AzureML or AI Foundry. The pipeline uses parameter-efficient fine-tuning techniques to adapt the model while preserving the broad knowledge from pre-training. This means you get specialized performance without catastrophic forgetting of the base model's capabilities.
# Get the pipeline component
finetune_pipeline_component = ml_registry.components.get(
    name="medimageparse_finetune", label="latest"
)

# Get the latest MIP model
model = ml_registry.models.get(name="MedImageParse", label="latest")


# Create the pipeline
@pipeline(name="medimageparse_finetuning" + str(random.randint(0, 100000)))
def create_pipeline():
    mip_pipeline = finetune_pipeline_component(
        pretrained_mlflow_model=model.id,
        data=data_assets["training_data"].id,
        config=data_assets["parameters"].id,
    )
    return {"mlflow_model_folder": mip_pipeline.outputs.mlflow_model_folder}


# Submit the pipeline
pipeline_object = create_pipeline()
pipeline_object.compute = compute.name
pipeline_object.settings.continue_on_step_failure = False
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_object, experiment_name="medimageparse_finetune_experiment"
)

Deployment: Register and serve the model
After training, the model can be registered in your AzureML workspace with version tracking. From there, deployment to a managed online endpoint takes a single command. The endpoint provides a scalable REST API backed by GPU compute for optimal inference performance.

# Register the Model
run_model = Model(
    path=f"azureml://jobs/{pipeline_job.name}/outputs/mlflow_model_folder",
    name=f"MIP-{name}-{pipeline_job.name}",
    description="Model created from run.",
    type=AssetTypes.MLFLOW_MODEL,
)
run_model = ml_client.models.create_or_update(run_model)

# Create endpoint and deployment with the fine-tuned model
endpoint = ManagedOnlineEndpoint(name=name)
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name=name,
    endpoint_name=endpoint.name,
    model=run_model.id,
    instance_type="Standard_NC40ads_H100_v5",
    instance_count=1,
)
deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

Testing: Text-prompted inference
With the endpoint deployed, you can send test images along with text prompts describing what to segment. For the CDD-CESM example, we use the text prompt "neoplastic cells in breast pathology & inflammatory cells". The model returns multiple segmentation masks for different detected regions. Text-prompting lets you switch focus on the fly (e.g., "tumor boundary" vs. "inflammatory infiltration") without retraining or reconfiguring the model.
Results
Fine-tuning made a huge difference in how well the model works. The Dice Score, which shows how closely the model's results match the actual regions, more than doubled, from 0.198 to 0.486. The IoU, another measure of overlap, nearly tripled, going from 0.139 to 0.383. Sensitivity jumped from 0.251 to 0.535, which means the model found more real positives.

Metric        Base    Fine-tuned   Δ Abs     Δ Rel
Dice (F1)     0.198   0.486        +0.288    +145%
IoU           0.139   0.383        +0.244    +176%
Sensitivity   0.251   0.535        +0.284    +113%
Specificity   0.971   0.987        +0.016    +1.6%
Accuracy      0.936   0.963        +0.027    +2.9%

These improvements really matter in practice. When the Dice and IoU scores go up, it means the model is better at outlining the exact shape and size of problem areas, which helps doctors get more accurate measurements and track changes over time. The jump in sensitivity means the model is finding more actual lesions, while keeping specificity above 98% makes sure there aren't a lot of false alarms. The improvement in accuracy is notable, but the larger gains in overlap and recall matter most for getting precise results in medical images.
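As a companion to the testing step above, here is a hedged sketch of calling the deployed endpoint with a base64-encoded image and a text prompt. The generic AzureML scoring pattern (scoring URI plus key) is standard, but the exact input columns and response format are defined by the MLflow model signature produced by the pipeline—treat the field names below as assumptions and confirm them against the sample notebook.

import base64
import json
import requests

# Hypothetical values — use your endpoint's actual scoring URI and key
# (shown on the endpoint's Consume tab in AzureML).
SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<endpoint-key>"

with open("test_case.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Assumed input schema; the real column names come from the MLflow signature.
payload = {
    "input_data": {
        "columns": ["image", "text"],
        "data": [[image_b64, "neoplastic cells in breast pathology & inflammatory cells"]],
    }
}

response = requests.post(
    SCORING_URI,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    data=json.dumps(payload),
    timeout=120,
)
response.raise_for_status()
print(response.json())  # segmentation masks, typically returned in encoded form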
Try It on Your Own Data
To successfully implement this solution in your organization, focus first on the core requirements and resources that will ensure a seamless transition. The following section outlines these essential steps so you can move efficiently from planning to deployment and set your team up for optimal results.
Dataset size: Start with 200-500 annotated images. This is enough to see meaningful performance improvements without requiring massive data collection efforts. More data generally helps, but you don't need thousands of examples to get started.
Annotation quality: High-quality segmentation masks are critical. Invest in precise boundary delineations (pixel-level accuracy where possible), consistent annotation protocols across all images, and quality control reviews to catch and correct errors.
Annotation effort: Budget enough time per image for careful annotation. Apply active learning approaches to focus effort on the most informative samples and start with a smaller pilot dataset (100-150 images) to validate the approach before scaling up.
Training compute: A100 or H100 recommended (one device with multiple GPUs is sufficient for a few hundred image runs). For the CDD-CESM dataset, we used NC-series VMs (single-node) with 8 GPUs, and training on 300 images took around 30 minutes for 10 epochs. If you're training on larger datasets (thousands of images), consider upgrading to ND-series VMs, which offer better multi-node performance and allow you to train on large volumes of data faster.
Where to Go from Here?
So, what does this mean for your workflows and clinical teams? Foundation models like MedImageParse provide significant power and performance. They're flexible, with text-prompted multi-task capabilities that can integrate into existing workflows without retooling, and are relatively cheap to use for inference. This means faster review, more precise assessments, and independence from vendor development timelines. These models are not adapted to your institution and use cases out of the box, yet developing a foundation model from scratch is prohibitively expensive. Fine-tuning bridges that gap: you can boost performance on your data and adapt it to your use case at a fraction of the cost. You control what the model learns, how it fits your workflow, and its validation for your context. We've provided the complete tools to do that: the fine-tuning notebook walks through the entire process, from data preparation to deployment. By following this workflow and collecting annotated data from your institution (see "Try It on Your Own Data" above for requirements), you can deploy MedImageParse tailored to your institution and use cases.
References
Khaled R., Helal M., Alfarghaly O., Mokhtar O., Elkorany A., El Kassas H., Fahmy A. Categorized Digital Database for Low energy and Subtracted Contrast Enhanced Spectral Mammography images [Dataset]. (2021) The Cancer Imaging Archive. DOI: 10.7937/29kw-ae92 https://www.cancerimagingarchive.net/collection/cdd-cesm/
Azure OpenAI GPT model to review Pull Requests for Azure DevOps
In recent months, the use of Generative Pre-trained Transformer (GPT) models for natural language processing (NLP) has gained significant traction. GPT models, which are based on the Transformer architecture, can generate text from arbitrary sources of input data and can be trained to identify errors and detect anomalies in text. As such, GPT models are increasingly being used for a variety of applications, ranging from natural language understanding to text summarization and question-answering. In the software development world, developers use pull requests to submit proposed changes to a codebase. However, reviews by other developers can sometimes take a long time and may not be accurate, and in some cases these reviews can introduce new bugs and issues. To reduce this risk, during my research I found that integrating GPT models is possible: we can add the Azure OpenAI service as a pull request reviewer for the Azure Pipelines service. The GPT models are trained on developer codebases and are able to detect potential coding issues such as typos, syntax errors, style inconsistencies and code smells. In addition, they can also assess code structure and suggest improvements to the overall code quality. Once the GPT models have been trained, they can be integrated into the Azure Pipelines service so that they can automatically review pull requests and provide feedback. This helps to reduce the time taken for code reviews, as well as reduce the likelihood of introducing bugs and issues.
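The post describes the idea without code; as an illustrative sketch (not the author's actual extension), a pull-request build step could collect the diff, ask an Azure OpenAI deployment for review feedback, and post the result back through the Azure DevOps pull request threads REST API. The deployment name and prompt are assumptions, and System.AccessToken must be mapped into the job environment for the final call to work.

import os
import subprocess
import requests
from openai import AzureOpenAI  # openai>=1.0 SDK

# Assumed to run inside an Azure Pipelines PR build, where these predefined
# variables are exposed as environment variables.
org_url = os.environ["SYSTEM_COLLECTIONURI"]           # e.g. https://dev.azure.com/myorg/
project = os.environ["SYSTEM_TEAMPROJECT"]
repo_id = os.environ["BUILD_REPOSITORY_ID"]
pr_id = os.environ["SYSTEM_PULLREQUEST_PULLREQUESTID"]
target = os.environ["SYSTEM_PULLREQUEST_TARGETBRANCH"].replace("refs/heads/", "")

# 1. Collect the diff for the pull request.
subprocess.run(["git", "fetch", "origin", target], check=True)
diff = subprocess.check_output(["git", "diff", f"origin/{target}...HEAD"], text=True)

# 2. Ask an Azure OpenAI deployment for review feedback.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-06-01",
)
review = client.chat.completions.create(
    model="gpt-4o",  # hypothetical deployment name
    messages=[
        {"role": "system", "content": "You are a code reviewer. Point out bugs, "
                                      "style issues, and risky changes in this diff."},
        {"role": "user", "content": diff[:100_000]},  # stay within context limits
    ],
).choices[0].message.content

# 3. Post the feedback as a comment thread on the pull request
#    (requires System.AccessToken mapped as SYSTEM_ACCESSTOKEN in the pipeline).
threads_url = (
    f"{org_url}{project}/_apis/git/repositories/{repo_id}"
    f"/pullRequests/{pr_id}/threads?api-version=7.0"
)
requests.post(
    threads_url,
    headers={"Authorization": f"Bearer {os.environ['SYSTEM_ACCESSTOKEN']}"},
    json={"comments": [{"parentCommentId": 0, "content": review, "commentType": 1}],
          "status": 1},
    timeout=30,
).raise_for_status()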
Fine-Tuning Healthcare AI Models: Discovering the Power of Finetuning MedImageInsight on Your Data
This post is part of our healthcare AI fine-tuning series:
MedImageInsight Fine-Tuning - Embeddings and classification (you are here)
MedImageParse Fine-Tuning - Segmentation and spatial understanding
CXRReportGen Fine-Tuning - Clinical findings generation
Introduction
MedImageInsight (MI2) is Microsoft's open-source foundation model that's revolutionizing medical imaging analysis. Developed by Microsoft Health and Life Sciences, MedImageInsight is designed as a "generalist" foundation model, offering capabilities across diverse medical imaging fields. MI2 achieves state-of-the-art or human expert-level results in tasks like classification, image search, and 3D medical image retrieval. Its features include:
Multi-domain versatility: Trained on medical images from fourteen different domains such as X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography.
State-of-the-art (SOTA) performance: Achieves SOTA or human expert-level results in tasks like classification, image-image search, and fine-tuning on public datasets, with proven excellence in CT 3D medical image retrieval, disease classification for chest X-ray, dermatology, OCT imaging, and even bone age estimation.
Regulatory-ready features: When used on downstream tasks, MI2 allows for sensitivity/specificity adjustments to meet clinical regulatory requirements.
Transparent decision-making: Provides evidence-based decision support through image-image search and image-text search, enhancing explainability.
Efficient report generation: When paired with a text decoder, it delivers near state-of-the-art report generation using only 7% of the parameters compared to similar models.
3D capability: Leverages 3D image-text pre-training to achieve state-of-the-art performance for 3D medical image retrieval.
Fairness: Out-performs other models in AI fairness evaluations across age and gender in independent clinical assessments.
MI2 is available now through the Azure AI Foundry model catalog (docs) and has already demonstrated its value across numerous applications. We've made it even easier for you to explore its capabilities with our repository full of examples and code for you to try. It covers:
Outlier detection: Encoding CT/MR series to spot anomalies.
Zero-shot classification with text labels: Identifying conditions without prior training.
Adapter training: Specializing in specific classification tasks.
Exam parameter detection: Normalizing MRI series and extracting critical details.
Multimodal adapter analysis: Merging insights from radiology and pathology.
Image search: Finding similar cases to aid diagnosis using both 2D images and 3D volumes (cross-sectional imaging).
Model monitoring: Ensuring consistent performance over time (code coming soon).
While these capabilities are impressive on their own, the true potential of MI2 lies in its adaptability. This is where fine-tuning comes in: the ability to customize this powerful foundation model for specific clinical applications at your institution. Fine-tuning, currently available in public preview, can transform this foundation model into production-ready, clinical-grade assets tailored to your specific needs and workflow while maintaining regulatory compliance. Note: This blog post demonstrates how MedImageInsight can be fine-tuned for new data. This example is illustrative; however, the same process can be used to develop production-ready clinical assets when following appropriate regulatory guidelines.
Teaching an Old (Actually New) AI New Tricks
MedImageInsight's architecture offers distinct advantages for fine-tuning:
Lightweight design: MI2 utilizes a DaViT image encoder (360M parameters) and a language encoder (252M parameters).
Efficient scale: With a total of only 0.61B parameters compared to multi-billion parameter alternatives, MI2 requires significantly fewer computational resources than comparable models.
Training flexibility: The model supports both image-text and image-label pairs for different training approaches.
Solid foundation: Pre-trained on 3.7M+ diverse medical images, MI2 starts with robust domain knowledge.
MI2 is ideal for fine-tuning to specific medical imaging domains, allowing for clinical applications that integrate into healthcare workflows after validation. The model maintains its strengths while adapting to specialized patterns and requirements.
Using AzureML Pipelines for an MI2 Glow Up
The Azure Machine Learning (AzureML) pipeline streamlines the fine-tuning process for MI2. This end-to-end workflow, available now as a public preview, manages everything from data preparation to model registration in a reproducible manner, with distributed training on GPU clusters. We've released five components into public preview to enable you to fine-tune MI2 and simplify related processes like generating a classifier model:
MedImageInsight model finetuning core component (component) is the core component of the fine-tuning process that trains the MedImageInsight model. It requires four separate TSV files as input—an image TSV and a text TSV for training, plus the same two files for evaluation—along with a TSV file of all the possible text strings and a training configuration YAML file. This component supports distributed training on a multi-GPU cluster.
MedImageInsight embedding generation component (component) creates embeddings from images using the MedImageInsight model. It allows customization of image quality and dimensions, and outputs a pickled NumPy array containing embeddings for all processed images.
MedImageInsight adapter finetune component (component) takes NumPy arrays of training and validation data along with their associated text labels (from TSV). It trains a specialized 3-layer model designed for classification tasks and optimizes performance for specific domains while maintaining MI2's core capabilities.
MedImageInsight image classifier assembler component (component) combines your fine-tuned embedding model with a label file into a deployable image classifier. This component takes the fine-tuned MI2 embedding model, text labels, and an optional adapter model, then packages them into a unified MLflow model ready for deployment. The resulting model package can operate in either zero-shot mode or with a custom adapter model.
MedImageInsight pipeline component (component) provides an end-to-end pipeline component that integrates all components into one workflow. It is a simple pipeline that trains, evaluates, and outputs the embedding and classification models.
Example Dataset: GastroVision
To demonstrate MI2's fine-tuning capabilities, we're using the GastroVision dataset [1] as a real-world example. It's important to note that our goal here is not to build the ultimate gastroenterology classifier, but rather to showcase how MI2 can be effectively fine-tuned.
The techniques demonstrated can be applied to your institution's data to create customized embedding models that support not only classification, but all the applications we've mentioned, from zero shot classification and outlier detection to image search and multimodal analysis. The GastroVision dataset offers an excellent test case for several reasons:
Open-access dataset: 8,000 endoscopy images collected from two hospitals in Norway and Sweden
Diverse classes: Spans 27 distinct classes of gastrointestinal findings with significant class imbalance
Real-world challenges: High similarity between certain classes, multi-center variability, and rare findings with limited examples
Recent publication: Published in 2023 and wasn't included in MI2's original training data.
With approximately 8,000 endoscopic images labeled across 27 different classes, this dataset provides a practical context for fine-tuning MI2's embedding capabilities. By demonstrating how MI2 can adapt to new data, we illustrate how you might fine-tune the model on your own data to create production-ready, clinical-grade specialized embedding models tailored to your unique imaging environments, equipment, and patient populations.
The Code: Getting the Data Prep'd
The first step in fine-tuning MI2 is preparing your dataset. For the GastroVision dataset, we need to preprocess the images and structure the data in a format suitable for training:

def gastro_folder_to_text(folder_name):
    label = folder_to_label[folder_name]
    view = labels_to_view[label]
    return f"endoscopy gastrointestinal {view} {label}"


gastrovision_root_directory = "/home/azureuser/data/Gastrovision"

text_to_label = {}
folders = os.listdir(gastrovision_root_directory)
for folder in folders:
    label = folder_to_label[folder]
    text = gastro_folder_to_text(folder)
    text_to_label[text] = label

data = []
files = list(glob.glob(os.path.join(gastrovision_root_directory, "**/*.jpg"), recursive=True))
for file_path in tqdm(files, ncols=120):
    folder = os.path.basename(os.path.dirname(file_path))
    filename = os.path.basename(file_path)
    text = gastro_folder_to_text(folder)
    with Image.open(file_path) as img:
        img = img.resize((512, 512)).convert("RGB")
        buffered = BytesIO()
        img.save(buffered, format="JPEG", quality=95)
        img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    data.append(
        [f"{folder}/{filename}-{os.path.basename(file_path)}", img_str, text]
    )

df = pd.DataFrame(data, columns=["filename", "image", "text"])

This preprocessing pipeline does several important tasks:
Resizing and standardizing images to 512x512 pixels
Converting images to base64 encoding for efficient storage.
Then we convert the encoded images into TSV files:

# Function to format text as JSON
def format_text_json(row):
    return json.dumps(
        {
            "class_id": text_index[row["text"]],
            "class_name": row["text"],
            "source": "gastrovision",
            "task": "classification",
        }
    )


# Filter the dataframe to only include the top 22 text captions
df_filtered = df[df["text"].isin(df["text"].value_counts().index[:22])].reset_index(
    drop=True
)

# Get unique texts from the filtered dataframe
unique_texts = df_filtered["text"].unique()

# Save the unique texts to a text file
with open("unique_texts.txt", "w") as f:
    for text in unique_texts:
        f.write(text + "\n")

# Create a dictionary to map text labels to indices
text_index = {label: index for index, label in enumerate(unique_texts)}

# Apply the formatting function to the text column
df_filtered["text"] = df_filtered.apply(format_text_json, axis=1)

# Split the dataframe into training, validation, and test sets
train_df, val_test_df = train_test_split(
    df_filtered, test_size=0.4, random_state=42, stratify=df_filtered["text"]
)
validation_df, test_df = train_test_split(
    val_test_df, test_size=0.5, random_state=42, stratify=val_test_df["text"]
)

# Create separate dataframes for images and labels and save the dataframes to TSV files
def split_and_save_tsvs(aligned_df, prefix):
    image_df = aligned_df[["filename", "image"]]
    text_df = aligned_df[["filename", "text"]]
    text_df.to_csv(
        f"{prefix}_text.tsv",
        sep="\t",
        index=False,
        header=False,
        quoting=csv.QUOTE_NONE,
    )
    image_df.to_csv(f"{prefix}_images.tsv", sep="\t", index=False, header=False)


split_and_save_tsvs(train_df, "train")
split_and_save_tsvs(validation_df, "validation")
split_and_save_tsvs(test_df, "test")

This conversion step covers the remaining preparation tasks:
Filtering to include only classes with sufficient samples.
Creating label mappings for classification.
Splitting data into training, validation, and test sets.
Exporting processed data as TSV files for AzureML.
After preparing the datasets, we need to upload them to AzureML as data assets:

name = "gastrovision"
assets = {
    "image_tsv": "train_images.tsv",
    "text_tsv": "train_text.tsv",
    "eval_image_tsv": "validation_images.tsv",
    "eval_text_tsv": "validation_text.tsv",
    "label_file": "unique_texts.txt",
}

data_assets = {
    key: Data(
        path=value,
        type=AssetTypes.URI_FILE,
        description=f"{name} {key}",
        name=f"{name}-{key}",
    )
    for key, value in assets.items()
}

for key, data in data_assets.items():
    data_assets[key] = ml_client.data.create_or_update(data)
    print(
        f"Data asset {key} created or updated.",
        data_assets[key].name,
        data_assets[key].version,
    )

These uploaded assets are versioned in AzureML, allowing for reproducibility and tracking of which specific data was used for each training run.
The Code: Cue the Training Montage
In the notebook, we demonstrate a straightforward example of fine-tuning using the pipeline component, but you can integrate these components into larger pipelines that train more complex downstream tasks such as exam parameter classification, report generation, analysis of 3D scans, etc.
The Code: Cue the Training Montage

In the notebook, we demonstrate a straightforward example of fine-tuning using the pipeline component, but you can integrate these components into larger pipelines that train more complex downstream tasks, such as exam parameter classification, report generation, or analysis of 3D scans.

```python
import random

from azure.ai.ml.dsl import pipeline

# Register the training configuration file as a data asset as well.
conf_file = "train-gastrovision.yaml"
data = Data(
    path=conf_file,
    type=AssetTypes.URI_FILE,
    description=f"{name} conf_files",
    name=f"{name}-conf_files",
)
data_assets["conf_files"] = ml_client.data.create_or_update(data)

# ml_registry (the azureml registry client) and compute (the training cluster)
# are assumed to be defined earlier in the notebook.

# Get the pipeline component
finetune_pipeline_component = ml_registry.components.get(
    name="medimage_insight_ft_pipeline", label="latest"
)

# Get the latest MI2 model
model = ml_registry.models.get(name="MedImageInsight", label="latest")

@pipeline(name="medimage_insight_ft_pipeline_job" + str(random.randint(0, 100000)))
def create_pipeline():
    mi2_pipeline = finetune_pipeline_component(
        mlflow_embedding_model_path=model.id,
        compute_finetune=compute.name,
        instance_count=8,
        **data_assets,
    )
    return {
        "classification_model": mi2_pipeline.outputs.classification_mlflow_model,
        "embedding_model": mi2_pipeline.outputs.embedding_mlflow_model,
    }

pipeline_object = create_pipeline()
pipeline_object.compute = compute.name
pipeline_object.settings.continue_on_step_failure = False

pipeline_job = ml_client.jobs.create_or_update(pipeline_object, experiment_name=name)
pipeline_job_run_id = pipeline_job.name
pipeline_job
```

This pipeline approach offers several advantages:

- Access to modular components (you can use only parts of the pipeline if needed)
- Distributed training across multiple compute instances
- Built-in monitoring and logging
- Seamless integration with the AzureML model registry

The Code: Saving and Deploying your Models

After the training job is completed, we register the model in the AzureML model registry and deploy it as an online endpoint:

```python
from azure.ai.ml.entities import (
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
    Model,
)

# Create a Model to register
run_model = Model(
    path=f"azureml://jobs/{pipeline_job.name}/outputs/classification_model",
    name=f"classifier-{name}-{pipeline_job.name}",
    description="Model created from run.",
    type=AssetTypes.MLFLOW_MODEL,
)

# Register the Model
run_model = ml_client.models.create_or_update(run_model)

# Create endpoint and deployment with the classification model
endpoint = ManagedOnlineEndpoint(name=name)
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name=name,
    endpoint_name=endpoint.name,
    model=run_model.id,
    instance_type="Standard_NC6s_v3",
    instance_count=1,
)
deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()
```

This deployment process creates a scalable API endpoint that can be integrated into your workflows, with built-in monitoring and scaling capabilities.

Results and Making Sure It Works

After fine-tuning MI2 on the GastroVision dataset, we can validate the quality of the resulting embeddings by evaluating their performance on a classification task.

| Method | Macro Prec. | Macro Recall | Macro F1 | Micro Prec. | Micro Recall | Micro F1 | MCC | mAUC |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 [2] | 0.437 | 0.437 | 0.433 | 0.681 | 0.681 | 0.681 | 0.641 | - |
| Pre-trained DenseNet-121 [3] | 0.738 | 0.623 | 0.650 | 0.820 | 0.820 | 0.820 | 0.798 | - |
| Greedy Soup (GenAI Aug) [4] | 0.675 | 0.600 | 0.615 | 0.812 | 0.812 | 0.812 | 0.790 | - |
| Greedy Soup (Basic Aug) [4] | 0.762 | 0.639 | 0.666 | 0.832 | 0.830 | 0.830 | 0.809 | - |
| MI Finetune | 0.736 | 0.772 | 0.740 | 0.834 | 0.860 | 0.847 | 0.819 | 0.990 |

Using a KNN classifier on the fine-tuned embeddings, we achieve an impressive mAUC of 0.990 and state-of-the-art results on the other metrics. Though our goal was not to create the ultimate gastroenterology classifier, these results demonstrate that with minimal fine-tuning, MI2 produces embeddings that can power a state-of-the-art classifier using nothing more than KNN.
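If you want to reproduce this kind of check on your own embeddings, the sketch below shows one way to run a KNN evaluation with scikit-learn. It assumes you have already computed `train_embeddings`/`test_embeddings` (arrays of image embeddings from the fine-tuned model) and the matching integer labels; the variable names and the choice of k are illustrative, not taken from the notebook.

```python
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.neighbors import KNeighborsClassifier

# train_embeddings / test_embeddings: (n_samples, embedding_dim) arrays produced by the
# fine-tuned embedding model; train_labels / test_labels: integer class ids.
knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn.fit(train_embeddings, train_labels)

pred = knn.predict(test_embeddings)
proba = knn.predict_proba(test_embeddings)

print("MCC :", matthews_corrcoef(test_labels, pred))
print("mAUC:", roc_auc_score(test_labels, proba, multi_class="ovr", average="macro"))
```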
The real potential here goes far beyond classification. Imagine applying this same fine-tuning approach to your institution's specific imaging data. The resulting domain-adapted model would provide enhanced performance across all of MI2's capabilities:

- More accurate outlier detection in your specific patient population
- More precise image retrieval for similar cases in your database
- Better multimodal analysis combining your radiology and pathology data
- Enhanced report generation tailored to your clinical workflows

MI2's efficient architecture (0.36B/0.25B parameters for the image/text encoder, respectively) can be effectively adapted to specialized domains while maintaining its full range of capabilities. The classification metrics validate that the fine-tuning process has successfully adapted the embedding space to better represent your specific medical domain.

Your Turn to Build!

Fine-tuning MedImageInsight represents a significant opportunity to extend the capabilities of this powerful foundation model into specialized medical imaging domains and subspecialties. Through our demonstration with the GastroVision dataset, we have shown how MI2’s architecture, with just 0.36B and 0.25B parameters for the image and text encoder respectively, can be efficiently adapted to new tasks with competitive or superior performance compared to traditional approaches. The key features of fine-tuning MI2 include:

- Efficiency: Achieving high performance with minimal data and computational resources
- Versatility: Adapting to specialized domains while preserving multi-domain capabilities
- Practicality: A streamlined workflow from training to deployment using AzureML

The fine-tuning process described here provides a pathway for healthcare institutions to develop production-ready, clinical-grade AI assets. By fine-tuning MedImageInsight and incorporating appropriate validation, testing, and regulatory compliance measures, the model can be transformed from a foundation model into specialized clinical tools optimized for your specific use cases and patient populations. With your fine-tuned model, you gain several distinct advantages:

- Enhanced domain adaptation: Models that better understand the unique characteristics of your patient population and imaging equipment
- Improved rare condition detection: Higher sensitivity for conditions specific to your specialty or patient demographics
- Reduced false positives: Better differentiation between similar-appearing conditions common in your practice
- Customized explanations: More relevant evidence-based decisions through image-to-image search against your own database

As healthcare institutions increasingly adopt AI for medical imaging analysis, the ability to fine-tune models for specific patient populations, imaging equipment, and clinical specialties becomes crucial. MedImageInsight’s efficient architecture and adaptability make it an ideal foundation for building specialized medical imaging solutions that can be deployed in resource-constrained environments.

We encourage you to try fine-tuning MedImageInsight with your own specialized datasets, using our sample Jupyter Notebook as your starting point. The combination of MI2’s regulatory-friendly features with domain-specific adaptations opens new possibilities for transparent, efficient, and effective AI-assisted medical imaging analysis.

[1] Jha, D. et al. (2023). GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection. ICML Workshop on Machine Learning for Multimodal Healthcare Data (ML4MHD 2023).
[2] He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
[3] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q. (2017). Densely Connected Convolutional Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708.
[4] Fahad, M. et al. (2025). Deep insights into gastrointestinal health. Biomedical Signal Processing and Control, 102, 107260.

Microsoft Azure continues to expand scalability for Healthcare EHR Workloads
Microsoft Azure has reached a new milestone for Epic Chronicles Operational Database (ODB) scalability with the Standard_M416bs_v3 (Mbv3) VM. It can now scale up to 110 million GRefs/s (Global References per second) in the ECP configuration and up to 39 million GRefs/s in the SMP configuration, improving upon the previous Azure benchmarks of 65 million GRefs/s and 20 million GRefs/s, respectively. Microsoft Azure can now host 96% of the Epic customer base, enabling healthcare organizations to run their EHR systems on Azure.

New VM Size Purpose-Built for Epic’s Chronicles ODB

The Standard_M416bs_v3 VM, newly added to Azure’s Mbv3 series, is purpose-built to meet the growing performance and scalability demands of large healthcare EHR environments. With higher CPU capacity, expanded memory, and improved remote storage throughput, it delivers the reliability needed for mission-critical workloads at scale. Key specifications include:

- Mbv3 Processor Performance: Built on 4th Gen Intel® Xeon® Scalable processors, the Mbv3 series is optimized for high memory and storage performance, supporting workloads with up to 4 TB of RAM and an NVMe interface for faster remote disk access.
- Compute Capacity: The Standard_M416bs_v3 delivers 416 vCPUs, more than twice the capacity of previous Mbv3 sizes.
- Storage Performance: Achieves up to 550,000 IOPS and 10 GBps of remote disk bandwidth using Azure Ultra Disk.
- Performance Optimization: Enhanced by Azure Boost, the M416bs_v3 provides low-latency, high-throughput remote storage, making it ideal for storage-throughput-intensive applications such as Epic ODB, relational databases, and analytics workloads.
- Available Regions: M416bs_v3 is available in 4 regions - East US, East US 2, Central US, and West US 2.

Explore Epic on Azure to learn more. Epic and Chronicles are trademarks of Epic Systems Corporation.
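If you want to confirm whether the new size is visible to your own subscription in a given region before planning a deployment, a quick programmatic check is possible. The sketch below is illustrative only and is not part of the announcement; the region, the placeholder subscription ID, and the use of the azure-mgmt-compute VM sizes API are assumptions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Hypothetical check: list the VM sizes offered in East US 2 and look for the new SKU.
# Results depend on your subscription's quota and regional availability.
subscription_id = "<your-subscription-id>"
compute_client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

for size in compute_client.virtual_machine_sizes.list(location="eastus2"):
    if size.name == "Standard_M416bs_v3":
        print(size.name, size.number_of_cores, "vCPUs,", size.memory_in_mb, "MB RAM")
```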
