Azure OpenAI GPT model to review Pull Requests for Azure DevOps
In recent months, the use of Generative Pre-trained Transformer (GPT) models for natural language processing (NLP) has gained significant traction. GPT models, which are based on the Transformer architecture, can generate text from arbitrary sources of input data and can be trained to identify errors and detect anomalies in text. As such, GPT models are increasingly being used for a variety of applications, ranging from natural language understanding to text summarization and question answering. In the software development world, developers use pull requests to submit proposed changes to a codebase. However, reviews by other developers can sometimes take a long time and be inaccurate, and in some cases these reviews can introduce new bugs and issues. To reduce this risk, I researched integrating GPT models and found that the Azure OpenAI service can be added as a pull request reviewer for the Azure Pipelines service. The GPT models are trained on developer codebases and are able to detect potential coding issues such as typos, syntax errors, style inconsistencies, and code smells. In addition, they can also assess code structure and suggest improvements to the overall code quality. Once the GPT models have been trained, they can be integrated into the Azure Pipelines service so that they can automatically review pull requests and provide feedback. This helps to reduce the time taken for code reviews, as well as the likelihood of introducing bugs and issues.
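The article does not include the integration code itself, but the idea can be sketched in a few lines. The snippet below is an illustrative outline, not the author's implementation: it assumes environment variables for the Azure OpenAI endpoint, key, and deployment name, collects the pull request diff with git, and asks a chat deployment for review comments. In a real Azure Pipelines setup, the script would run as a pipeline step and the returned review could be posted back to the pull request through the Azure DevOps REST API.

# Illustrative sketch: ask an Azure OpenAI chat deployment to review a pull request diff.
# Endpoint, key, and deployment name are placeholders supplied via environment variables.
import os
import subprocess
import requests

AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]   # e.g. https://<resource>.openai.azure.com
AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_KEY"]
DEPLOYMENT = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o")  # your chat model deployment name


def get_pr_diff(target_branch: str = "origin/main") -> str:
    """Collect the diff between the PR branch (HEAD) and its target branch."""
    result = subprocess.run(
        ["git", "diff", f"{target_branch}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def review_diff(diff: str) -> str:
    """Send the diff to the Azure OpenAI chat completions API and return the review text."""
    # The api-version may need updating for your Azure OpenAI resource.
    url = (f"{AZURE_OPENAI_ENDPOINT}/openai/deployments/{DEPLOYMENT}"
           f"/chat/completions?api-version=2023-05-15")
    body = {
        "messages": [
            {"role": "system", "content": "You are a code reviewer. Point out bugs, "
                                          "style issues, and code smells in the following diff."},
            {"role": "user", "content": diff[:100_000]},  # truncate very large diffs
        ],
        "temperature": 0,
    }
    response = requests.post(url, headers={"api-key": AZURE_OPENAI_KEY}, json=body, timeout=120)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(review_diff(get_pr_diff()))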
Fine-Tuning Healthcare AI Models: Discovering the Power of Finetuning MedImageInsight on Your Data

This post is part of our healthcare AI fine-tuning series:
- MedImageInsight Fine-Tuning - Embeddings and classification (you are here)
- MedImageParse Fine-Tuning - Segmentation and spatial understanding
- CXRReportGen Fine-Tuning - Clinical findings generation

Introduction

MedImageInsight (MI2) is Microsoft's open-source foundation model for medical imaging analysis. Developed by Microsoft Health and Life Sciences, MedImageInsight is designed as a "generalist" foundation model, offering capabilities across diverse medical imaging fields. MI2 achieves state-of-the-art or human expert-level results in tasks like classification, image search, and 3D medical image retrieval. Its features include:
- Multi-domain versatility: Trained on medical images from fourteen different domains such as X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography.
- State-of-the-art (SOTA) performance: Achieves SOTA or human expert-level results in tasks like classification, image-image search, and fine-tuning on public datasets, with proven excellence in CT 3D medical image retrieval, disease classification for chest X-ray, dermatology, OCT imaging, and even bone age estimation.
- Regulatory-ready features: When used on downstream tasks, MI2 allows for sensitivity/specificity adjustments to meet clinical regulatory requirements.
- Transparent decision-making: Provides evidence-based decision support through image-image search and image-text search, enhancing explainability.
- Efficient report generation: When paired with a text decoder, it delivers near state-of-the-art report generation using only 7% of the parameters compared to similar models.
- 3D capability: Leverages 3D image-text pre-training to achieve state-of-the-art performance for 3D medical image retrieval.
- Fairness: Outperforms other models in AI fairness evaluations across age and gender in independent clinical assessments.

MI2 is available now through the Azure AI Foundry model catalog (docs) and has already demonstrated its value across numerous applications. We've made it even easier for you to explore its capabilities with our repository full of examples and code for you to try. It covers:
- Outlier detection: Encoding CT/MR series to spot anomalies.
- Zero-shot classification with text labels: Identifying conditions without prior training (a minimal sketch of this idea appears at the end of this section).
- Adapter training: Specializing in specific classification tasks.
- Exam parameter detection: Normalizing MRI series and extracting critical details.
- Multimodal adapter analysis: Merging insights from radiology and pathology.
- Image search: Finding similar cases to aid diagnosis using both 2D images and 3D volumes (cross-sectional imaging).
- Model monitoring: Ensuring consistent performance over time (code coming soon).

While these capabilities are impressive on their own, the true potential of MI2 lies in its adaptability. This is where fine-tuning comes in: the ability to customize this powerful foundation model for specific clinical applications at your institution. Fine-tuning, currently available in public preview, can transform this foundation model into production-ready, clinical-grade assets tailored to your specific needs and workflow while maintaining regulatory compliance.

Note: This blog post demonstrates how MedImageInsight can be fine-tuned for new data. This example is illustrative; however, the same process can be used to develop production-ready clinical assets when following appropriate regulatory guidelines.
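As one illustration of the zero-shot idea listed above, the sketch below shows how image and text-label embeddings returned by a MedImageInsight endpoint could be compared with cosine similarity. It is a minimal, hypothetical example: the embedding arrays and label strings are assumed inputs, and obtaining the embeddings themselves is covered in the examples repository.

# Illustrative zero-shot classification with MedImageInsight embeddings.
# `image_embedding` and `label_embeddings` are assumed to come from a deployed
# MedImageInsight endpoint (see the examples repository); shapes are hypothetical.
import numpy as np


def zero_shot_classify(image_embedding: np.ndarray,
                       label_embeddings: np.ndarray,
                       labels: list[str]) -> str:
    """Return the text label whose embedding is most similar to the image embedding."""
    img = image_embedding / np.linalg.norm(image_embedding)
    txt = label_embeddings / np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    scores = txt @ img                      # cosine similarity per candidate label
    return labels[int(np.argmax(scores))]


labels = ["chest x-ray with pleural effusion", "chest x-ray with no finding"]
# image_embedding: (d,) vector; label_embeddings: (len(labels), d) matrix
# predicted = zero_shot_classify(image_embedding, label_embeddings, labels)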
Teaching an Old (Actually New) AI New Tricks

MedImageInsight's architecture offers distinct advantages for fine-tuning:
- Lightweight design: MI2 utilizes a DaViT image encoder (360M parameters) and language encoder (252M parameters).
- Efficient scale: With a total of only 0.61B parameters compared to multi-billion parameter alternatives, MI2 requires significantly less computational resources than comparable models.
- Training flexibility: The model supports both image-text and image-label pairs for different training approaches.
- Solid foundation: Pre-trained on 3.7M+ diverse medical images, MI2 starts with robust domain knowledge.

MI2 is ideal for fine-tuning to specific medical imaging domains, allowing for clinical applications that integrate into healthcare workflows after validation. The model maintains its strengths while adapting to specialized patterns and requirements.

Using AzureML Pipelines for an MI2 Glow Up

The Azure Machine Learning (AzureML) pipeline streamlines the fine-tuning process for MI2. This end-to-end workflow, available now as a public preview, manages everything from data preparation to model registration in a reproducible manner. To fine-tune MI2, we use an AzureML pipeline with distributed training on GPU clusters. We've released five components into public preview to enable you to fine-tune MI2 and simplify related processes like generating a classifier model:
- MedImageInsight model finetuning core component (component) is the core component of the fine-tuning process that trains the MedImageInsight model. This component requires four separate TSV files as input (an image TSV and a text TSV for training, plus the same two files for evaluation), a TSV file of all the possible text strings, and a training configuration YAML file. This component supports distributed training on a multi-GPU cluster.
- MedImageInsight embedding generation component (component) creates embeddings from images using the MedImageInsight model. It allows customization of image quality and dimensions, and outputs a pickled NumPy array containing embeddings for all processed images.
- MedImageInsight adapter finetune component (component) takes NumPy arrays of training and validation data along with their associated text labels (from TSV). It trains a specialized 3-layer model designed for classification tasks and optimizes performance for specific domains while maintaining MI2's core capabilities.
- MedImageInsight image classifier assembler component (component) combines your fine-tuned embedding model with a label file into a deployable image classifier. This component takes the fine-tuned MI2 embedding model, text labels, and an optional adapter model, then packages them into a unified MLFlow model ready for deployment. The resulting model package can operate in either zero-shot mode or with a custom adapter model.
- MedImageInsight pipeline component (component) provides an end-to-end pipeline component that integrates all of the above into one workflow. It is a simple pipeline that trains, evaluates, and outputs the embedding and classification models.

Example Dataset: GastroVision

To demonstrate MI2's fine-tuning capabilities, we're using the GastroVision dataset [1] as a real-world example. It's important to note that our goal here is not to build the ultimate gastroenterology classifier, but rather to showcase how MI2 can be effectively fine-tuned.
The techniques demonstrated can be applied to your institution's data to create customized embedding models that support not only classification, but all the applications we've mentioned, from zero-shot classification and outlier detection to image search and multimodal analysis. The GastroVision dataset offers an excellent test case for several reasons:
- Open-access dataset: 8,000 endoscopy images collected from two hospitals in Norway and Sweden.
- Diverse classes: Spans 27 distinct classes of gastrointestinal findings with significant class imbalance.
- Real-world challenges: High similarity between certain classes, multi-center variability, and rare findings with limited examples.
- Recent publication: Published in 2023 and not included in MI2's original training data.

With approximately 8,000 endoscopic images labeled across 27 different classes, this dataset provides a practical context for fine-tuning MI2's embedding capabilities. By demonstrating how MI2 can adapt to new data, we illustrate how you might fine-tune the model on your own data to create production-ready, clinical-grade specialized embedding models tailored to your unique imaging environments, equipment, and patient populations.

The Code: Getting the Data Prep'd

The first step in fine-tuning MI2 is preparing your dataset. For the GastroVision dataset, we need to preprocess the images and structure the data in a format suitable for training:

import base64
import glob
import os
from io import BytesIO

import pandas as pd
from PIL import Image
from tqdm import tqdm

# folder_to_label and labels_to_view are assumed to be defined earlier in the notebook:
# they map GastroVision folder names to class labels and labels to anatomical views.
def gastro_folder_to_text(folder_name):
    label = folder_to_label[folder_name]
    view = labels_to_view[label]
    return f"endoscopy gastrointestinal {view} {label}"

gastrovision_root_directory = "/home/azureuser/data/Gastrovision"

text_to_label = {}
folders = os.listdir(gastrovision_root_directory)
for folder in folders:
    label = folder_to_label[folder]
    text = gastro_folder_to_text(folder)
    text_to_label[text] = label

data = []
files = list(glob.glob(os.path.join(gastrovision_root_directory, "**/*.jpg"), recursive=True))
for file_path in tqdm(files, ncols=120):
    folder = os.path.basename(os.path.dirname(file_path))
    filename = os.path.basename(file_path)
    text = gastro_folder_to_text(folder)
    with Image.open(file_path) as img:
        img = img.resize((512, 512)).convert("RGB")
        buffered = BytesIO()
        img.save(buffered, format="JPEG", quality=95)
        img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    data.append(
        [f"{folder}/{filename}-{os.path.basename(file_path)}", img_str, text]
    )

df = pd.DataFrame(data, columns=["filename", "image", "text"])

This preprocessing pipeline does several important tasks:
- Resizing and standardizing images to 512x512 pixels.
- Converting images to base64 encoding for efficient storage.
Then we convert the encoded images into the TSV files:

import csv
import json

from sklearn.model_selection import train_test_split

# Function to format text as JSON
def format_text_json(row):
    return json.dumps(
        {
            "class_id": text_index[row["text"]],
            "class_name": row["text"],
            "source": "gastrovision",
            "task": "classification",
        }
    )

# Filter the dataframe to only include the top 22 text captions
df_filtered = df[df["text"].isin(df["text"].value_counts().index[:22])].reset_index(
    drop=True
)

# Get unique texts from the filtered dataframe
unique_texts = df_filtered["text"].unique()

# Save the unique texts to a text file
with open("unique_texts.txt", "w") as f:
    for text in unique_texts:
        f.write(text + "\n")

# Create a dictionary to map text labels to indices
text_index = {label: index for index, label in enumerate(unique_texts)}

# Apply the formatting function to the text column
df_filtered["text"] = df_filtered.apply(format_text_json, axis=1)

# Split the dataframe into training, validation, and test sets
train_df, val_test_df = train_test_split(
    df_filtered, test_size=0.4, random_state=42, stratify=df_filtered["text"]
)
validation_df, test_df = train_test_split(
    val_test_df, test_size=0.5, random_state=42, stratify=val_test_df["text"]
)

# Create separate dataframes for images and labels and save the dataframes to TSV files
def split_and_save_tsvs(aligned_df, prefix):
    image_df = aligned_df[["filename", "image"]]
    text_df = aligned_df[["filename", "text"]]
    text_df.to_csv(
        f"{prefix}_text.tsv",
        sep="\t",
        index=False,
        header=False,
        quoting=csv.QUOTE_NONE,
    )
    image_df.to_csv(f"{prefix}_images.tsv", sep="\t", index=False, header=False)

split_and_save_tsvs(train_df, "train")
split_and_save_tsvs(validation_df, "validation")
split_and_save_tsvs(test_df, "test")

This step handles the remaining preparation tasks:
- Filtering to include only classes with sufficient samples.
- Creating label mappings for classification.
- Splitting data into training, validation, and test sets.
- Exporting processed data as TSV files for AzureML.

After preparing the datasets, we need to upload them to AzureML as data assets:

from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

# ml_client is an existing MLClient connected to your AzureML workspace.
name = "gastrovision"
assets = {
    "image_tsv": "train_images.tsv",
    "text_tsv": "train_text.tsv",
    "eval_image_tsv": "validation_images.tsv",
    "eval_text_tsv": "validation_text.tsv",
    "label_file": "unique_texts.txt",
}

data_assets = {
    key: Data(
        path=value,
        type=AssetTypes.URI_FILE,
        description=f"{name} {key}",
        name=f"{name}-{key}",
    )
    for key, value in assets.items()
}

for key, data in data_assets.items():
    data_assets[key] = ml_client.data.create_or_update(data)
    print(
        f"Data asset {key} created or updated.",
        data_assets[key].name,
        data_assets[key].version,
    )

These uploaded assets are versioned in AzureML, allowing for reproducibility and tracking of which specific data was used for each training run.

The Code: Cue the Training Montage

In the notebook, we demonstrate a straightforward example of fine-tuning using the pipeline component, but you can integrate these components into larger pipelines that train more complex downstream tasks such as exam parameter classification, report generation, analysis of 3D scans, etc.
import random

from azure.ai.ml.dsl import pipeline

# ml_registry (a registry-scoped client) and compute (the GPU cluster) are assumed
# to be created earlier in the notebook.
conf_file = "train-gastrovision.yaml"
data = Data(
    path=conf_file,
    type=AssetTypes.URI_FILE,
    description=f"{name} conf_files",
    name=f"{name}-conf_files",
)
data_assets["conf_files"] = ml_client.data.create_or_update(data)

# Get the pipeline component
finetune_pipline_component = ml_registry.components.get(
    name="medimage_insight_ft_pipeline", label="latest"
)

# Get the latest MI2 model
model = ml_registry.models.get(name="MedImageInsight", label="latest")

@pipeline(name="medimage_insight_ft_pipeline_job" + str(random.randint(0, 100000)))
def create_pipeline():
    mi2_pipeline = finetune_pipline_component(
        mlflow_embedding_model_path=model.id,
        compute_finetune=compute.name,
        instance_count=8,
        **data_assets,
    )
    return {
        "classification_model": mi2_pipeline.outputs.classification_mlflow_model,
        "embedding_model": mi2_pipeline.outputs.embedding_mlflow_model,
    }

pipeline_object = create_pipeline()
pipeline_object.compute = compute.name
pipeline_object.settings.continue_on_step_failure = False
pipeline_job = ml_client.jobs.create_or_update(pipeline_object, experiment_name=name)
pipeline_job_run_id = pipeline_job.name
pipeline_job

This pipeline approach offers several advantages:
- Access to modular components (you can use only parts of the pipeline if needed)
- Distributed training across multiple compute instances
- Built-in monitoring and logging
- Seamless integration with the AzureML model registry

The Code: Saving and Deploying your Models

After the training job is completed, we register the model in the AzureML registry and deploy it as an online endpoint:

from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint, Model

# Create a Model to register
run_model = Model(
    path=f"azureml://jobs/{pipeline_job.name}/outputs/classification_model",
    name=f"classifier-{name}-{pipeline_job.name}",
    description="Model created from run.",
    type=AssetTypes.MLFLOW_MODEL,
)

# Register the Model
run_model = ml_client.models.create_or_update(run_model)

# Create endpoint and deployment with the classification model
endpoint = ManagedOnlineEndpoint(name=name)
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name=name,
    endpoint_name=endpoint.name,
    model=run_model.id,
    instance_type="Standard_NC6s_v3",
    instance_count=1,
)
deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

This deployment process creates a scalable API endpoint that can be integrated into your workflows, with built-in monitoring and scaling capabilities.

Results and Making Sure It Works

After fine-tuning MI2 on the GastroVision dataset, we can validate the quality of the resulting embeddings by evaluating their performance on a classification task.

Method | Macro Prec. | Macro Recall | Macro F1 | Micro Prec. | Micro Recall | Micro F1 | MCC | mAUC
ResNet-50 [2] | 0.437 | 0.437 | 0.433 | 0.681 | 0.681 | 0.681 | 0.641 | -
Pre-trained DenseNet-121 [3] | 0.738 | 0.623 | 0.650 | 0.820 | 0.820 | 0.820 | 0.798 | -
Greedy Soup (GenAI Aug) [4] | 0.675 | 0.600 | 0.615 | 0.812 | 0.812 | 0.812 | 0.790 | -
Greedy Soup (Basic Aug) [4] | 0.762 | 0.639 | 0.666 | 0.832 | 0.830 | 0.830 | 0.809 | -
MI Finetune | 0.736 | 0.772 | 0.740 | 0.834 | 0.860 | 0.847 | 0.819 | 0.990

Using a KNN classifier, we achieve an impressive mAUC of 0.990 and SOTA results on the other metrics. Though our goal was not to create the ultimate gastroenterology classifier, these results demonstrate that with minimal fine-tuning, MI2 produces embeddings that can power a state-of-the-art classifier using only a simple KNN. The real potential here goes far beyond classification. Imagine applying this same fine-tuning approach to your institution's specific imaging data.
The resulting domain-adapted model would provide enhanced performance across all of MI2's capabilities:
- More accurate outlier detection in your specific patient population
- More precise image retrieval for similar cases in your database
- Better multimodal analysis combining your radiology and pathology data
- Enhanced report generation tailored to your clinical workflows

MI2's efficient architecture (0.36B/0.25B parameters for the image/text encoder respectively) can be effectively adapted to specialized domains while maintaining its full range of capabilities. The classification metrics validate that the fine-tuning process has successfully adapted the embedding space to better represent your specific medical domain.

Your Turn to Build!

Fine-tuning MedImageInsight represents a significant opportunity to extend the capabilities of this powerful foundation model into specialized medical imaging domains and subspecialties. Through our demonstration with the GastroVision dataset, we have shown how MI2's architecture, with just 0.36B and 0.25B parameters for the image and text encoder respectively, can be efficiently adapted to new tasks with competitive or superior performance compared to traditional approaches. The key features of fine-tuning MI2 include:
- Efficiency: Achieving high performance with minimal data and computational resources
- Versatility: Adapting to specialized domains while preserving multi-domain capabilities
- Practicality: Streamlined workflow from training to deployment using AzureML

The fine-tuning process described here provides a pathway for healthcare institutions to develop production-ready, clinical-grade AI assets. By fine-tuning MedImageInsight and incorporating appropriate validation, testing, and regulatory compliance measures, the model can be transformed from a foundation model into specialized clinical tools optimized for your specific use cases and patient populations. With your fine-tuned model, you gain several distinct advantages:
- Enhanced domain adaptation: Models that better understand the unique characteristics of your patient population and imaging equipment
- Improved rare condition detection: Higher sensitivity for conditions specific to your specialty or patient demographics
- Reduced false positives: Better differentiation between similar-appearing conditions common in your practice
- Customized explanations: More relevant evidence-based decisions through image-image search from your own database

As healthcare institutions increasingly adopt AI for medical imaging analysis, the ability to fine-tune models for specific patient populations, imaging equipment, and clinical specialties becomes crucial. MedImageInsight's efficient architecture and adaptability make it an ideal foundation for building specialized medical imaging solutions that can be deployed in resource-constrained environments. We encourage you to try fine-tuning MedImageInsight with your own specialized datasets using our sample Jupyter Notebook as your starting point. The combination of MI2's regulatory-friendly features with domain-specific adaptations opens new possibilities for transparent, efficient, and effective AI-assisted medical imaging analysis.

[1] Jha, D. et al. (2023). GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection. ICML Workshop on Machine Learning for Multimodal Healthcare Data (ML4MHD 2023).
[2] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). pp. 770–778 (2016)
[3] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4700–4708 (2017)
[4] Fahad, M. et al. (2025). Deep insights into gastrointestinal health. Biomedical Signal Processing and Control, 102, 107260.

Microsoft Azure continues to expand scalability for Healthcare EHR Workloads
Microsoft Azure has reached a new milestone for Epic Chronicles Operational Database (ODB) scalability with the Standard_M416bs_v3 (Mbv3) VM. It can now scale up to 110 million GRefs/s (Global References per second) in the ECP configuration and up to 39 million GRefs/s in the SMP configuration, improving upon the previous Azure benchmarks of 65 million GRefs/s and 20 million GRefs/s respectively. Microsoft Azure can now host 96% of the Epic customer base, enabling healthcare organizations to run their EHR systems on Azure.

New VM Size Purpose-Built for Epic's Chronicles ODB

The Standard_M416bs_v3 VM, newly added to Azure's Mbv3 series, is purpose-built to meet the growing performance and scalability demands of large healthcare EHR environments. With higher CPU capacity, expanded memory, and improved remote storage throughput, it delivers the reliability needed for mission-critical workloads at scale. Key specifications include:
- Mbv3 Processor Performance: Built on 4th Gen Intel® Xeon® Scalable processors, the Mbv3 series is optimized for high memory and storage performance, supporting workloads up to 4 TB of RAM with an NVMe interface for faster remote disk access.
- Compute Capacity: The Standard_M416bs_v3 delivers 416 vCPUs - more than twice the capacity of previous Mbv3 sizes - delivering stronger performance.
- Storage Performance: Achieves up to 550,000 IOPS and 10 GBps remote disk bandwidth using Azure Ultra Disk.
- Performance Optimization: Enhanced by Azure Boost, the M416bs_v3 provides low-latency, high remote storage performance, making it ideal for storage throughput-intensive applications such as Epic ODB, relational databases, and analytics workloads.
- Available Regions: M416bs_v3 is available in 4 regions - East US, East US 2, Central US, and West US 2.

Explore Epic on Azure to learn more. Epic and Chronicles are trademarks of Epic Systems Corporation.

Transforming Clinical Workflows with Microsoft Dragon Copilot's Partner-Driven AI Innovations
Healthcare software integration is notoriously complex, demanding technical rigor, regulatory compliance, organizational alignment, and workflow transformation. Clinicians face an overwhelming cognitive load—during a typical primary care visit, a physician may juggle over 150 discrete data points and make a dozen crucial decisions in less than 15 minutes. In this environment, AI must do more than offer intelligence—it must deliver the right information, at the right time, without increasing administrative burden or disrupting the clinician's focus on patient care.

Microsoft is expanding Dragon Copilot by enabling an open ecosystem of developer-built AI solutions for specific tasks and specialties, allowing healthcare organizations to access clinical intelligence at the point of care while further streamlining workflows, reducing administrative burden, boosting productivity, and enhancing revenue integrity. Dragon Copilot's partner-driven AI extensibility empowers technology providers to deliver specialized solutions directly into clinical workflows, accelerating AI adoption at the point of care and enabling better outcomes for patients and clinicians. By opening Dragon Copilot, Microsoft enables the creation of seamless, scalable solutions that help clinicians reduce administrative tasks and improve patient care. Developers gain an opportunity to collaborate with industry leaders, shape healthcare technology, and deliver impactful AI applications for clinicians and patients.

Partner Success Spotlight: Canary Speech's Innovative Solution

Canary Speech is a Utah-based, AI-powered voice biomarker health tech company, utilizing patented real-time vocal analysis to screen for mental health and neurological disorders. Canary's technology swiftly captures and analyzes speech data, supporting clinicians in identifying cognitive and behavioral changes. Recently, Canary Speech launched Canary Ambient™, an API-first solution for real-time voice analysis in healthcare and contact centers. This software provides actionable insights from patient-clinician conversations by tracking speech patterns for real-time assessments of cognitive and behavioral health conditions. Canary Speech advances speech and language applications across health systems, payers, and pharmaceutical markets.

The collaboration with Dragon Copilot is seamless and secure. Canary Speech receives patient audio—ideally during the encounter—via infrastructure designed to enable HIPAA compliance and newly developed endpoints. Their AI processes the audio, then surfaces summarized cognitive and behavioral assessments using Microsoft's Adaptive Card framework. These interactive cards appear natively within Dragon Copilot, enabling clinicians to quickly review findings and to initiate follow-up assessments or treatment plans—all without leaving their workflow. The Canary Speech solution with Dragon Copilot is currently being validated with real clinicians and patients. It demonstrates how Dragon Copilot can share not only encounter audio, but also clinical notes, patient context, and relevant history with trusted partners—expanding the universe of AI-powered insights available at the point of care. A sketch of the kind of card payload involved is shown below.
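To make the card-based integration concrete, here is an illustrative sketch of an Adaptive Card payload a partner endpoint might return. The field values, titles, and action names are hypothetical; the card structure follows the public Adaptive Cards schema (adaptivecards.io) rather than any Dragon Copilot-specific contract.

# Illustrative Adaptive Card payload for surfacing a summarized assessment.
# Field values and action data are hypothetical examples.
import json


def build_assessment_card(summary: str, follow_up_label: str) -> str:
    card = {
        "type": "AdaptiveCard",
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "version": "1.5",
        "body": [
            {"type": "TextBlock", "text": "Cognitive & behavioral assessment", "weight": "Bolder"},
            {"type": "TextBlock", "text": summary, "wrap": True},
        ],
        "actions": [
            {"type": "Action.Submit", "title": follow_up_label, "data": {"action": "order_follow_up"}},
        ],
    }
    return json.dumps(card)


print(build_assessment_card("Speech patterns within expected range for age.", "Order follow-up assessment"))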
How AI Applications Work: Technical Enablement for Continuous Innovation

Dragon Copilot's extensibility framework supports partners in surfacing AI insights and enrichments at multiple points in the clinical workflow:
- Pre-encounter – Provide AI-generated insights to clinicians before they see the patient, supporting preparation and personalized care planning.
- During encounter – Surface real-time, contextual recommendations while the provider is examining the patient, ensuring decisions are timely and informed.
- Post-encounter – Deliver follow-up insights after the clinical note is generated, enabling comprehensive care coordination and outcome tracking.

Microsoft's Adaptive Card framework makes it easy for developers to build interactive, platform-agnostic content blocks that integrate directly into Dragon Copilot. These cards support rich text, reference links, action buttons, and in-card feedback, creating a unified and intuitive user experience for clinicians. The following diagram shows the data flow between Microsoft Dragon Copilot and partner endpoints, which gives extensions such as Canary access to the encounter audio data via infrastructure designed to enable HIPAA compliance. In addition to audio, other data elements such as the turn transcript and the clinical note can also be made available to extensions.

Driving Innovation Together through Partnership

Our commitment to driving innovation through collaboration is sustained by actively engaging with partners and continually analyzing the market to keep our solutions effective and aligned with evolving needs. At this time, we have a growing list of early partners that have signaled interest in bringing their solutions to the point of care through Dragon Copilot. These partners represent the leading edge of healthcare technology, each bringing specialized expertise and transformative solutions to the table. Their work exemplifies how collaborative innovation can address real-world challenges across the care continuum.

Artisight's Smart Hospital Platform brings hands-free, device-free documentation directly to the bedside. By pairing a clinician's unique voice profile with their secure RTLS badge, Artisight eliminates logins and shared device interactions to allow care teams to focus on patients, not paperwork. Ambient documentation runs seamlessly in the background, generating accurate clinical notes while freeing clinicians to deliver more meaningful, face-to-face care. This integration streamlines workflows, reduces administrative burden, and drives measurable improvements in efficiency, compliance, and the overall patient experience.

Atropos Health is the leader in translating real-world clinical data into personalized real-world evidence (RWE) and insights. Atropos Health is the developer of GENEVA OS®, the operating system for rapid healthcare evidence across a robust network of real-world data. Healthcare and life science organizations work with Atropos Health to close evidence gaps from bench to bedside, improving individual patient outcomes with data-driven care, expediting research that advances the field of medicine, and more. We aim to transform healthcare with timely, relevant real-world evidence.

Cohere Health's prior authorization application automates care requests, eliminating the administrative burden for clinicians and enabling real-time care approvals. Cohere's solution provides critical transparency right at the point of care while collecting relevant clinical documentation, patient summaries, and orders.
It shares prior authorization requirements and offers in-the-moment guidance to physicians, helping them secure faster approvals and ensure patients receive the care they need more quickly.

Elsevier's ClinicalKey AI delivers fast, reliable clinical insights at the point of care—combining proprietary medical content, leading textbooks, top scientific journals, government publications, and clinical guidelines into one trusted platform. Powered by advanced AI and backed by expert human review, it ensures answers are clinically relevant and backed by research. With privacy built in and citations clinicians can trust, ClinicalKey AI helps healthcare professionals make confident, informed decisions faster.

Ensemble's revenue cycle intelligence engine, EIQ, helps health systems proactively prevent payment denials, improve revenue integrity, and accelerate cash flow. Drawing on insights from more than 80,000 denial audit letters and 80 million annual claim transactions, EIQ analyzes clinical documentation against historic payer patterns to detect nuanced issues that could trigger DRG downgrades or medical necessity rejections. It flags issues for operator intervention with context-aware prompts designed to ensure diagnoses are backed by precise clinical evidence and meet payer documentation requirements. Beyond denial prevention, EIQ strengthens the documentation foundation needed for downstream audits, ensuring that every claim is defensible and complete.

hellocare.ai is a leading provider of AI-assisted virtual care solutions. Headquartered in Clearwater, FL, the company supports more than 80 health systems across the U.S. and is rapidly expanding globally. hellocare.ai helps health systems deliver high-quality, patient-centered care while improving clinical efficiency and staff wellbeing. Its fully integrated platform includes AI-Assisted Virtual Nursing, Virtual Sitting, Patient Engagement, Digital Whiteboard, Digital Room Signage, Ambient Documentation, Hospital-at-Home, Remote Patient Monitoring (RPM), and Digital Clinic, seamlessly embedding into existing healthcare EHRs, infrastructure, and care delivery models to power the next generation of healthcare.

Humata Health is revolutionizing clinical clearance, moving beyond the narrow confines of traditional prior authorization to address the entire patient journey, from referrals and specialty drugs to post-acute care. By bringing providers and payers together on a single, shared infrastructure, its technology uses powerful AI and automation to streamline fragmented approvals. This collaborative approach unlocks faster, smarter, and more confident decisions, ensuring patients receive the appropriate care they deserve, exactly when they need it. Driven by a vision to see care move forward, Humata Health is built for yes.

Lightbeam Health Solutions helps healthcare organizations manage and scale population health programs to improve performance within risk-based contracts. Leveraging clinical, health plan, and social determinants of health data, Lightbeam identifies and delivers actionable insights to clinicians and patients for proactive intervention. Proven outcomes include quality improvement, risk adjustment accuracy, reduced cost of care and avoidable utilization, and a better experience for clinicians and patients alike.

Optum is collaborating with Microsoft to integrate ambient listening technology into clinical settings with Optum Real, helping physicians reduce documentation time and focus more on patient care.
This integration supports smarter, real-time clinical workflows through AI-powered automation.

Pangaea Data's AI platform helps close care gaps by finding untreated and under-treated patients – including those who are undiagnosed, misdiagnosed, or miscoded – across hard-to-diagnose conditions. Seamlessly integrated with electronic health record (EHR) systems, the platform processes patients' medical records along with their conversation with the treating clinician at the point of care, empowering clinicians to address care gaps without disrupting workflows. This enables better outcomes, lower costs through earlier diagnosis, smarter triaging, and more appropriate levels of revenue through faster pre-authorizations and treatment initiation. Built and deployed on Microsoft Azure, the platform ensures full compliance with privacy standards.

Press Ganey turns patient-clinician conversations into actionable insights. By analyzing both content and tone alongside patient experience data, the platform helps organizations address concerns early and gives clinicians a more complete understanding of their patients.

As AI reshapes healthcare, Regard is working towards its mission to ensure every patient receives exceptional care while giving physicians more time with their patients. Regard's Proactive Documentation platform reviews all data in the medical record, and from physician-patient conversations, to recommend diagnoses to providers and surface critical clinical context. This diagnosis-first approach delivers accurate documentation at the point of care, supporting both clinical and revenue cycle teams by ensuring no critical information is missed.

Rhyme eliminates prior authorization through touchless workflows and a new sustainable approach to gold carding. They help 89 of the largest hospital systems and over 300 payers work together to process over 5 million auths per year, save millions of dollars, reduce claim denials, lower patient costs, and accelerate time to care.

RhythmX AI enables clinicians to deliver hyper-personalized treatments that expand access and capture measurable value. The platform optimizes medical economics in both fee-for-service and value-based care settings. Built on more than $1B in R&D and validated through coding reviews and 25,000+ clinical assessments, it unifies 8+ data sources – including clinical, financial, payer, and formulary data – and can incorporate health system-specific guidelines and pathways into real-time care recommendations. RhythmX AI drives care orchestration across primary, specialty, and emergency settings, enabling comprehensive diagnoses, optimized specialist referrals, and measurable ROI (e.g., $57M annually for 200 PCPs). With access to 300M patient records, 4.4B annual claims, 1.8M clinicians, and 300K facilities, the platform provides broad data coverage, consistent accuracy in suspected condition identification, and scalable precision-care capabilities across diverse care environments.

Through these strategic collaborations, Dragon Copilot continues to expand its reach and deepen its impact, supporting a dynamic environment where developers, clinicians, and organizations can thrive. The collective expertise of our partners is not only shaping the future of healthcare technology but also driving meaningful change for patients and clinicians everywhere.

Join Us in Shaping the Future of Healthcare AI

Enabling partner-driven AI applications in Dragon Copilot ushers in a transformative period of collaborative innovation in healthcare.
Through Dragon Copilot's extensibility framework, we're accelerating the delivery of clinical intelligence, reducing friction, and ensuring that every provider has the insights they need to deliver the highest quality care. Whether you're building the next breakthrough AI application or seeking to bring transformative solutions to clinicians worldwide, we invite you to join our community. Ready to shape the future of healthcare AI? Explore our developer resources, connect with our team, and become a Dragon Copilot extensibility partner today.
- Read through our documentation on how extensions for Dragon Copilot work and how to build your own: https://learn.microsoft.com/en-us/industry/healthcare/dragon-copilot/extensions
- Check out the sample repo with sample code, and more: https://github.com/microsoft/dragon-copilot-extension-samples
- Contact dragon_extensions@microsoft.com

Medical device disclaimer: Microsoft products and services (1) are not designed, intended or made available as a medical device, and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment. Customers/partners are responsible for ensuring solutions comply with applicable laws and regulations.

Generative AI Disclaimer: Generative AI does not always provide accurate or complete information. AI outputs do not reflect the opinions of Microsoft. Customers/partners will need to thoroughly test and evaluate whether an AI tool is fit for the intended use and identify and mitigate any risks to end users associated with its use. Customers/partners should thoroughly review the product documentation for each tool.

Healthcare agent service in Microsoft Copilot Studio is now Generally Available
Healthcare organizations continue to face immense challenges: workforce shortages, rising costs, and growing demands for patient care. The clinical staff is overburdened, leading to stress, burnout, and staff shortages. Generative AI presents a powerful opportunity when it can automate administrative workflows, surface relevant insights, and assist the clinical staff with contextual, credible, and up-to-date information. With that opportunity, we are excited to announce General Availability (GA) of healthcare agent service in Microsoft Copilot Studio.

Building responsible, AI-powered healthcare agents

With healthcare agent service, organizations can create healthcare-specialized AI applications that use generative AI within a framework that promotes trust, compliance, and real-world clinical scenarios. Agents combine built-in credible medical sources, such as FDA, CDC, MedlinePlus, MSD Manuals, DailyMed, and more, with the organization's own knowledge sources and plugins, while leveraging healthcare-specific actions. Customers can define the intended healthcare roles, such as healthcare professionals or patients, so the behavior is relevant and appropriate for the audience and use case. Pre-built use cases include clinical documentation assistance, patient self-service, helping healthcare professionals triage by organizing information, finding medication information, accessing recent clinical guidelines information, and more.

Because responsible AI in healthcare is a top priority, healthcare agent service is infused with safeguards that are reinforced by a healthcare-adapted orchestrator optimized for safety. Clinical, chat, and compliance safeguards help keep interactions evidence-based and trustworthy, increasing the reliability and accuracy of generated responses and adherence to the highest standards of safety, privacy, and regulatory compliance. Healthcare agent service underscores our ongoing commitment to responsible AI in healthcare, by offering customers a reliable, production-ready foundation for healthcare solutions that can be used to help support patients and medical professionals.

Extending Dragon Copilot with conversational solutions

Healthcare agent service provides a framework for building conversational AI applications that can be integrated directly into Dragon Copilot, giving partners and healthcare organizations the ability to extend its functionality in a scalable, compliant way. Today, Information Assist in Dragon Copilot, built on healthcare agent service, delivers safeguarded generative AI answers grounded in trusted sources and enriched with patient history and context, ensuring clinicians receive accurate, timely, and context-aware insights. Clinicians can effortlessly access a broad range of clinical topics directly within their workflow using natural language, surfacing insights from leading, trusted healthcare content partners that promote more informed clinical decisions with less administrative work. Partners and healthcare organizations can use healthcare agent service to create tailored solutions with built-in safeguards that help ensure output meets healthcare standards and supports safe decision-making at the point of care. These solutions can be integrated directly into Dragon Copilot to enhance both clinical and financial performance.

Real-world impact with customers

Healthcare organizations are already adopting healthcare agent service to bring generative AI into real-world care settings.
Early adopters are seeing meaningful impact in reducing administrative burden, improving patient experience, and empowering clinicians with trusted information.

Bayer Pharmaceuticals has recently worked with Microsoft to enable new agentic AI workflows for drug submission using healthcare agent service in Copilot Studio: "We have collaborated with Microsoft to build an AI-powered multi-agent decision board using the healthcare agent service in Copilot Studio. This multi-agent decision board revolutionizes how we strategize drug submissions, pricing, and patient targeting for global market access. By simulating expert board discussions and synthesizing diverse data—from regulatory approvals to health economics and real-world evidence—the system streamlines the complex process of securing drug reimbursement. Healthcare agent service helped us get results quicker, empowering teams to make smarter, data-driven decisions without replacing human expertise, which would enable better access to life-changing therapies for patients worldwide. Importantly, this tool is not limited to pharmaceutical companies. It also supports decision-making for health authorities, NGOs, and other stakeholders across the healthcare ecosystem—enabling more informed, collaborative, and impactful choices that benefit public health at large." — Shay Zohar, local Market Access Director and member of Bayer Pharmaceutical's global Early Access team

Allgemeines Krankenhaus (AKH) Wien, the largest hospital in Vienna, Austria, and the Medical University of Vienna collaborated with Microsoft to extend Dragon Copilot with healthcare agent service to automate pre-anesthesia intake: "Transforming pre-anesthesia assessments with AI agents for greater efficiency has great potential to decrease the administrative burden on anesthesiologists. In this project we used healthcare agent service to extend Dragon Copilot with AI-powered agents that automate pre-anesthesia intake to enhance clinical documentation, significantly reducing the administrative workload for anesthesiologists. By orchestrating conversational and workflow agents, the solution interacts with patients, completes assessments, checks for data conflicts, and generates clinical notes, all consolidated for physician review in Dragon Copilot." — Dr. Oliver Kimberger, Professor for Perioperative Information Management at the Department of General Anesthesia and Intensive Care Medicine, AKH Wien

Empowering healthcare innovation

Healthcare agent service offers a low-code interface for building and deploying custom AI solutions with chat, compliance, and clinical safeguards that support safety and accuracy in generative AI. With seamless integration and the ability to extend the capabilities of Dragon Copilot, you gain the flexibility to tailor solutions to your organization's evolving needs.
- Learn more in healthcare agent service in Copilot Studio documentation
- Explore the possibilities with Microsoft Copilot Studio
- Expand your knowledge about Microsoft for Healthcare
- Discover how we are shaping the future of health with cutting-edge solutions and collaborative efforts here

Medical Device Disclaimer: Microsoft products and services (1) are not designed, intended or made available as a medical device, and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment.
Customers/partners are responsible for ensuring solutions comply with applicable laws and regulations.

Generative AI Disclaimer: Generative AI does not always provide accurate or complete information. AI outputs do not reflect the opinions of Microsoft. Customers/partners will need to thoroughly test and evaluate whether an AI tool is fit for the intended use and identify and mitigate any risks to end users associated with its use.

Agentic AI in Healthcare
Healthcare organizations are at a crossroads where rising patient loads, complex data, and administrative burdens demand new solutions. Agentic AI – AI systems capable of autonomous action – is emerging as a catalyst for transformation, promising to act not just as tools but as collaborative digital team members. Microsoft's ecosystem of AI technologies provides a robust foundation to harness agentic AI in healthcare. This report offers a comprehensive overview of agentic AI, distinguishes it from traditional AI, and explores its role in clinical workflows, administrative efficiency, patient engagement, and data governance. It also examines how Microsoft's offerings (Microsoft 365 Copilot, Azure Health Data Services, Microsoft Fabric, Copilot Studio, and more) enable these advances responsibly and in compliance with healthcare regulations like HIPAA.

Copilot Chat: Prompting
To start a new prompt, head over to Copilot Chat and hit the blue chat button in the upper right corner.

🔄 When should I start a new chat?
A good rule of thumb: hit that button whenever you're switching contexts or subject areas. This helps keep Copilot focused and prevents information from getting muddled. 🍸

🧪 How do I improve my prompts?
To get the best results, use the GCSE Formula—that's:
- Goal: What do you want Copilot to do?
- Context: What background info will help?
- Source: Where should Copilot pull from?
- Expectations: What kind of output do you want?

🧩 Example
Here's a basic prompt: Give me a concise summary of recent news about Pfizer.
Now let's expand it using the GCSE Formula: Summarize the latest news about Pfizer from reputable sources like Reuters or Bloomberg. Focus on developments in their vaccine pipeline and financial performance. Keep it concise—under 150 words.

🎯 Challenge
Try using the GCSE Formula in your next prompt and compare it to using just the goal. See how your results stack up!

Copilot Chat: Downloads
On PC and Mac: Follow the download links below to install the Copilot Chat desktop app. Double-click the installer when prompted, and you're in.
- Windows: Microsoft 365 Copilot - Free download and install on Windows | Microsoft Store
- MacOS: Microsoft 365 Copilot on the App Store

On Mobile: Scan the QR code to download the app to your device.

In Your Browser: Prefer not to download anything? You can also access Copilot Chat from Microsoft 365 Copilot Chat. Once you're in, try starting a conversation in the prompt box. Not sure where to begin? No worries—use or tweak one of the suggested prompts to get going. Here are a few other handy entry points:

Image Search Series Part 2: AI Methods for the Automation of 3D Image Retrieval in Radiology
Introduction

As the use of diagnostic 3D images increases, effective management and analysis of these large volumes of data grows in importance. Medical 3D image search systems can play a vital role by enabling clinicians to quickly retrieve relevant or similar images and cases based on the anatomical features and pathologies present in a query image. Unlike traditional 2D imaging, 3D imaging offers a more comprehensive view for examining anatomical structures from multiple planes with greater clarity and detail. This enhanced visualization has the potential to assist doctors with improved diagnostic accuracy and more precise treatment planning. Moreover, advanced 3D image retrieval systems can support evidence-based and cohort-based diagnostics, demonstrating an opportunity for more accurate predictions and personalized treatment options. These systems also hold significant potential for advancing research, supporting medical education, and enhancing healthcare services.

This blog offers guidance on using Azure AI Foundry and the recently launched healthcare AI models to design and test a 3D image search system that can retrieve similar radiology images from a large collection of 3D images. Along with this blog, we share a Jupyter Notebook with the 3D image search system code, which you may use to reproduce the experiments presented here or start your own solution.

3D Image Search Notebook: http://aka.ms/healthcare-ai-examples-mi2-3d-image-search

It is important to highlight that the models available on the AI Foundry Model Catalog are not designed to generate diagnostic-quality results. Developers are responsible for further developing, testing, and validating their appropriateness for specific tasks and eventually integrating these models into complete systems. The objective of this blog is to demonstrate how this can be achieved efficiently in terms of data and computational resources.

The Problem

Generally, the problem of 3D image search can be posed as retrieving cross-sectional (CS) imaging series (3D image results) that are similar to a given CS imaging series (query 3D image). Once posited this way, the key question becomes how to define such similarity. In the previous blog of this series, we worked with radiographs of the chest, which constrained the notion of "similar" to the similarity between two 2D images and a certain class of anatomy. In the case of 3D images, we are dealing with a volume of data and many more variations of anatomy and pathologies, which expands the dimensions to consider for similarity; e.g., are we looking for similar anatomy? Similar pathology? Similar exam type?

In this blog, we will discuss a technique to approximate the 3D similarity problem through a 2D image embedding model and some amount of supervision to constrain the problem to a certain class of pathologies (lesions) and cast it as "given a cross-sectional MRI image, retrieve series with similar grades of lesions in similar anatomical regions".

To build a search system for 3D radiology images using a foundation model (MedImageInsight) designed for 2D inputs, we explore the generation of representative 3D embedding vectors for the volumes from the foundation model embeddings of 2D slices, to create a vector index from a large collection of 3D images. Retrieving relevant results for a given 3D image then consists of generating a representative 3D image embedding vector for the query image and searching for similar vectors in the index. An overview of this process is illustrated in Figure 1.
Figure 1: Overview of the 3D image search process.

The Data

In the sample notebook that is provided alongside this blog, we use 3D CT images from the Medical Segmentation Decathlon (MSD) dataset [2-3] and annotations from the 3D-MIR benchmark [4]. The 3D-MIR benchmark offers four collections (Liver, Colon, Pancreas, and Lung) of positive and negative examples created from the MSD dataset with additional annotations related to the lesion flag (with/without lesion) and lesion group (1, 2, 3). The lesion grouping focuses on lesion morphology and distribution and considers the number, length, and volume of the lesions to define the three groups. It also adheres to the American Joint Committee on Cancer's Tumor, Node, Metastasis classification system's recommendations for classifying cancer stages and provides a standardized framework for correlating lesion morphology with cancer stage. We selected the 3D-MIR Pancreas collection.

3D-MIR Benchmark: https://github.com/abachaa/3D-MIR

Since the MSD collections only include unhealthy/positive volumes, each 3D-MIR collection was augmented with volumes randomly selected from the other datasets to integrate healthy/negative examples in the training and test splits. For instance, the Pancreas dataset was augmented using volumes from the Colon, Liver, and Lung datasets. The input images consist of CT volumes and associated 2D slices. The training set is used to create the index, and the test set is used to query and evaluate the 3D search system.

3D Image Retrieval

Our search strategy, called volume-based retrieval, relies on aggregating the embeddings of the 2D slices of a volume to generate one representative 3D embedding vector for the whole volume. We describe additional search strategies in our 3D-MIR paper [4]. The 2D slice embeddings are generated using the MedImageInsight foundation model [5-6] from the Azure AI Foundry model catalog [1]. In the search step, we generate the embeddings of the 3D query volumes according to the selected aggregation method (Agg) and search for the top-k similar volumes/vectors in the corresponding 3D (Agg) index. We use the Median aggregation method to generate the 3D vectors and create the associated 3D index, constructed from the training slices/volumes of the 3D-MIR Pancreas collection. Three other aggregation methods are available in the 3D image search notebook: Max Pooling, Average Pooling, and Standard Deviation.

The search is performed following the k-Nearest Neighbors algorithm (k-NN search), which finds the k nearest neighbors of a given vector by calculating the distances between the query vector and all other vectors in the collection, then selecting the k vectors with the shortest distances. If the collection is large, the computation can be expensive, and it is recommended to use specific libraries for optimization. We use the FAISS (Facebook AI Similarity Search) library, an open-source library for efficient similarity search and clustering of high-dimensional vectors. A minimal code sketch of this aggregation-and-search flow is shown below, after the evaluation setup.

Evaluation of the search results

The 3D-MIR Pancreas test set consists of 32 volumes:
- 4 volumes with no lesion (lesion flag/group = -1)
- 3 volumes with lesion group 1
- 19 volumes with lesion group 2
- 6 volumes with lesion group 3

The training set consists of 269 volumes (with and without lesions) and was used to create the index. We evaluate the 3D search system by comparing the lesion group/category of the query volume and the top 10 retrieved volumes. We then compute Precision@k (P@k).
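Before looking at the results, here is a minimal sketch of the volume-based retrieval flow described above: per-slice embeddings are median-aggregated into one vector per volume, indexed with FAISS, and queried with k-NN. The slice-embedding arrays and variable names are illustrative assumptions; in the notebook they come from the MedImageInsight embedding model.

# Minimal sketch of volume-based retrieval: aggregate per-slice embeddings into one
# vector per volume (median), build a FAISS index, and run a k-NN search.
# train_volume_slice_embeddings: assumed list of (num_slices, d) NumPy arrays, one per training volume.
# query_slice_embeddings: assumed (num_slices, d) NumPy array for the query volume.
import numpy as np
import faiss


def volume_embedding(slice_embeddings: np.ndarray) -> np.ndarray:
    """Collapse a (num_slices, d) array into a single d-dimensional volume vector."""
    return np.median(slice_embeddings, axis=0)


# Build the index from the training volumes.
train_vectors = np.stack(
    [volume_embedding(v) for v in train_volume_slice_embeddings]
).astype("float32")
index = faiss.IndexFlatL2(train_vectors.shape[1])   # exact L2 search
index.add(train_vectors)

# Query with a test volume and retrieve the 10 nearest neighbors.
query = volume_embedding(query_slice_embeddings).astype("float32")[None, :]
distances, neighbor_ids = index.search(query, 10)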
Table 1 presents the P@1, P@3, P@5, P@10, and overall precision.

Table 1: Evaluation results on the 3D-MIR Pancreas test set

The system accurately recognizes healthy cases, consistently retrieving the correct label in test scenarios involving non-lesion pancreas images. However, performance varies across lesion groups, reflecting the challenge of precisely identifying smaller lesions (Group 1) or more advanced lesions (Group 3). This discrepancy highlights the complexity of lesion detection and underscores the importance of carefully tuning the embeddings or adjusting the vector index to improve retrieval accuracy for specific lesion sizes.

Visualization

Figure 2 presents four different test queries from the Pancreas test set and the top 5 nearest neighbors retrieved by the volume-based search method. In each row, the first image is the query, followed by the retrieved images ranked by similarity. The visual overlays help in assessing retrieval accuracy: blue indicates the pancreas organ boundaries, and red marks the regions corresponding to the pancreatic tumor.

Figure 2: Top 5 results for different queries from the Pancreas test set

Table 2 presents additional results of the volume-based retrieval system [4] on the other 3D-MIR datasets/organs (Liver, Colon, and Lung) using additional foundation models: BiomedCLIP [7], Med-Flamingo [8], and BiomedGPT [9]. When considering the macro-average across all datasets, MedImageInsight-based retrieval substantially outperforms the other foundation models.

Table 2: Evaluation results on the 3D-MIR benchmark (Liver, Colon, Pancreas, and Lung)

These results mirror a use case akin to lesion detection and severity measurement in a clinical context. In real-world applications, such as diagnostic support or treatment planning, it may be necessary to optimize the model to account for particular goals (e.g., detecting critical lesions early) or to accommodate different imaging protocols. By refining search criteria, integrating more domain-specific data, or adjusting embedding methods, practitioners can enhance retrieval precision and better meet clinical requirements.

Conclusion

The integration of 3D image search systems into clinical environments can enhance and accelerate the retrieval of similar cases and provide better context to clinicians and researchers for accurate complex diagnoses, cohort selection, and personalized patient care. This 3D radiology image search blog and the related notebook offer a solution based on 3D embedding generation for building and evaluating a 3D image search system using the MedImageInsight foundation model from the Azure AI Foundry model catalog.

References

1. Model catalog and collections in Azure AI Foundry portal. https://learn.microsoft.com/en-us/azure/ai-studio/how-to/model-catalog-overview
2. Michela Antonelli et al. The Medical Segmentation Decathlon. Nature Communications, 13(4128), 2022. https://www.nature.com/articles/s41467-022-30695-9
3. MSD: http://medicaldecathlon.com/
4. Asma Ben Abacha, Alberto Santamaría-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P. Lungren, Thomas Lin, Noel C. F. Codella, Ivan Tarapov: 3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology. CoRR abs/2311.13752, 2023. https://arxiv.org/abs/2311.13752
5. Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaría-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur, Vijay Aski, Jenq-Neng Hwang, Thomas Lin, Ivan Tarapov, Matthew P. Lungren, Mu Wei: MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging. CoRR abs/2410.06542, 2024. https://arxiv.org/abs/2410.06542
6. MedImageInsight: https://aka.ms/mi2modelcard
7. Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Angela Crabtree, Brian Piening, Carlo Bifulco, Matthew P. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon: BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. NEJM AI 2025; 2(1). https://ai.nejm.org/doi/full/10.1056/AIoa2400640
8. Moor, M., Huang, Q., Wu, S., Yasunaga, M., Dalmia, Y., Leskovec, J., Zakka, C., Reis, E.P., Rajpurkar, P.: Med-Flamingo: a multimodal medical few-shot learner. Machine Learning for Health, ML4H@NeurIPS 2023, 10 December 2023, New Orleans, Louisiana, USA. Proceedings of Machine Learning Research, vol. 225, pp. 353–367, PMLR, 2023. https://proceedings.mlr.press/v225/moor23a.html
9. Zhang, K., Zhou, R., Adhikarla, E., Yan, Z., Liu, Y., Yu, J., Liu, Z., Chen, X., Davison, B.D., Ren, H., et al.: A generalist vision–language foundation model for diverse biomedical tasks. Nature Medicine, 1–13, 2024. https://www.nature.com/articles/s41591-024-03185-2

Image Search Series

Image Search Series Part 1: Chest X-ray lookup with MedImageInsight | Microsoft Community Hub
Image Search Series Part 2: AI Methods for the Automation of 3D Image Retrieval in Radiology | Microsoft Community Hub
Image Search Series Part 3: Foundation Models and Retrieval-Augmented Generation in Dermatology | Microsoft Community Hub
Image Search Series Part 4: Advancing Wound Care with Foundation Models and Context-Aware Retrieval | Microsoft Community Hub

The Microsoft healthcare AI models are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models' performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals.

Image Search Series Part 3: Foundation Models and Retrieval-Augmented Generation in Dermatology
Introduction

Dermatology is inherently visual, with diagnosis often relying on morphological features such as color, texture, shape, and spatial distribution of skin lesions. However, the diagnostic process is complicated by the large number of dermatologic conditions, with over 3,000 identified entities, and the substantial variability in their presentation across different anatomical sites, age groups, and skin tones. This phenotypic diversity presents significant challenges, even for experienced clinicians, and can lead to diagnostic uncertainty in both routine and complex cases.

Image-based retrieval systems represent a promising approach to address these challenges. By enabling users to query large-scale image databases using a visual example, these systems can return semantically or visually similar cases, offering useful reference points for clinical decision support. However, dermatology image search is uniquely demanding: systems must be robust to variations in image quality, lighting, and skin pigmentation while maintaining high retrieval precision across heterogeneous datasets. Beyond clinical applications, scalable and efficient image search frameworks provide valuable support for research, education, and dataset curation. They enable automated exploration of large image repositories, assist in selecting challenging examples to enhance model robustness, and promote better generalization of machine learning models across diverse populations.

In this post, we continue our series on using healthcare AI models in Azure AI Foundry to create efficient image search systems, exploring the design and implementation of such a system for dermatology applications. As a baseline, we first present an adapter-based classification framework for dermatology images built on fixed embeddings from the MedImageInsight foundation model, available in the Azure AI Foundry model catalog. We then introduce a Retrieval-Augmented Generation (RAG) method that enhances vision-language models through similarity-based in-context prompting: we use the MedImageInsight foundation model to generate image embeddings, retrieve the top-k visually similar training examples via FAISS, and include the retrieved image-label pairs in the Vision-LLM prompt as in-context examples. This targeted prompting guides the model using visually and semantically aligned references, enhancing prediction quality on fine-grained dermatological tasks.

It is important to highlight that the models available in the AI Foundry model catalog are not designed to generate diagnostic-quality results. Developers are responsible for further developing, testing, and validating their appropriateness for specific tasks and eventually integrating these models into complete systems. The objective of this blog is to demonstrate how this can be achieved efficiently in terms of data and computational resources.

The Data

The DermaVQA-IIYI [2] dermatology image dataset is a de-identified, diverse collection of nearly 1,000 patient records and nearly 3,000 dermatological images, created to support research in skin condition recognition, classification, and visual question answering.
DermaVQA-IIYI dataset: https://osf.io/72rp3/files/osfstorage (data/iiyi)

The dataset is split into three subsets:
- Training Set: 2,474 images associated with 842 patient cases
- Validation Set: 157 images associated with 56 cases
- Test Set: 314 images associated with 100 cases
- Total Records: 2,945 images (998 patient cases)

Patient demographics (out of 998 patient cases):
- Sex – F: 218, M: 239, UNK: 541
- Age (available for 398 patients): Mean: 31 yrs | Min: 0.08 yrs | Max: 92 yrs

This wide range supports studies across all age groups, from infants to the elderly. A total of 2,945 images are associated with the patient records, with an average of 2.9 images per patient. This multiplicity enables the study of skin conditions from different perspectives and at various stages.

Image count per entry:
- 1 image: 225 patients
- 2 images: 285 patients
- 3 images: 200 patients
- 4 or more images: 288 patients

The dataset includes additional annotations for anatomic location, comprising 39 distinct labels (e.g., back, fingers, fingernail, lower leg, forearm, eye region, unidentifiable). Each image is associated with one or multiple labels. We use these annotations to evaluate the performance of various methods across different anatomical regions.

Image Embeddings

We generate image embeddings using the MedImageInsight foundation model [1] from the Azure AI Foundry model catalog [3]. We apply Uniform Manifold Approximation and Projection (UMAP) to project the high-dimensional image embeddings produced by the MedImageInsight model into two dimensions. The visualization is generated using embeddings extracted from both the DermaVQA training and test sets, which cover 39 anatomical regions. For clarity, only the most frequent anatomical labels are displayed in the projection.

Figure 1. UMAP projection of image embeddings produced by the MedImageInsight model on the DermaVQA dataset.

The resulting projection reveals that the MedImageInsight model captures meaningful anatomical distinctions: visually distinct regions such as fingers, face, fingernail, and foot form well-separated clusters, indicating high intra-class consistency and inter-class separability. Other anatomically adjacent or visually similar regions, such as back, arm, and abdomen, show moderate overlap, which is expected due to shared visual features or potential labeling ambiguity. Overall, the embeddings exhibit a coherent and interpretable organization, suggesting that the model has learned to encode both local and global anatomical structures. This supports the model's effectiveness in capturing anatomy-specific representations suitable for downstream tasks such as classification and retrieval.
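For reference, a minimal sketch of the projection step is shown below. The UMAP parameters and the random placeholder arrays are illustrative rather than the exact settings used to produce Figure 1; in practice, the embeddings come from the deployed MedImageInsight endpoint and the labels from the DermaVQA anatomic-location annotations.

```python
import numpy as np
import umap
import matplotlib.pyplot as plt

# Placeholders: 2,788 images (training + test) with 1024-dimensional MedImageInsight
# embeddings and one anatomic-location label per image.
rng = np.random.default_rng(0)
embeddings = rng.random((2788, 1024), dtype=np.float32)
labels = rng.choice(["fingers", "face", "back", "foot", "arm"], size=2788)

# Project the high-dimensional embeddings to 2D for visualization.
reducer = umap.UMAP(n_components=2, metric="cosine", random_state=42)
points = reducer.fit_transform(embeddings)

plt.figure(figsize=(8, 6))
for name in np.unique(labels):
    mask = labels == name
    plt.scatter(points[mask, 0], points[mask, 1], s=4, label=name)
plt.legend(markerscale=3)
plt.title("UMAP projection of MedImageInsight embeddings")
plt.show()
```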
Enhancing Visual Understanding

We explore two strategies for enhancing visual understanding through foundation models.

I. Training an Adapter-based Classifier

We build an adapter-based classification framework designed for efficient adaptation to medical imaging tasks (see our prior posts for an introduction to the topic of adapters: Unlocking the Magic of Embedding Models: Practical Patterns for Healthcare AI | Microsoft Community Hub). The proposed adapter model builds upon fixed visual features extracted from the MedImageInsight foundation model, enabling task-specific fine-tuning without requiring full model retraining. The architecture consists of three main components:

- MLP Adapter: A two-layer feedforward network that projects the 1024-dimensional embeddings generated by the MedImageInsight model into a 512-dimensional latent space. This module uses GELU activation and Layer Normalization to enhance training stability and representational capacity. As a bottleneck adapter, it facilitates parameter-efficient transfer learning.
- Convolutional Retrieval Module: A sequence of two 1D convolutional layers with GELU activation, applied to the output of the MLP adapter. This component refines the representations by modeling local dependencies within the transformed feature space.
- Prediction Head: A linear classifier that maps the 512-dimensional refined features to the task-specific output space (e.g., 39 dermatology classes).

The classifier is trained for 10 epochs (approximately 48 seconds) using only CPU resources. Built on fixed image embeddings extracted from the MedImageInsight model, the adapter efficiently tailors these representations for downstream classification tasks with minimal computational overhead. By updating only the adapter components while keeping the MedImageInsight backbone frozen, the model significantly reduces computational and memory requirements. This design also mitigates overfitting, making it particularly effective in medical imaging scenarios with limited or imbalanced labeled data. A Jupyter Notebook detailing the construction and training of a MedImageInsight-based adapter model is available in our Samples Repository: https://aka.ms/healthcare-ai-examples-mi2-adapter

Figure 2: MedImageInsight-based Adapter Model
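A compact PyTorch sketch of such an adapter is shown below. The overall structure (two-layer MLP bottleneck with GELU and LayerNorm, two 1D convolutions, and a linear head over 512-dimensional features) follows the description above, but details such as the hidden widths, convolution channels, and kernel sizes are assumptions; see the linked notebook for the reference implementation.

```python
import torch
import torch.nn as nn

class MedImageInsightAdapter(nn.Module):
    """Illustrative adapter head over frozen MedImageInsight embeddings.
    Layer sizes beyond those stated in the text are assumptions."""

    def __init__(self, in_dim: int = 1024, hidden_dim: int = 512, num_classes: int = 39):
        super().__init__()
        # MLP adapter: two-layer bottleneck projecting 1024-d embeddings to 512-d.
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.GELU(),
            nn.LayerNorm(hidden_dim),
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.LayerNorm(hidden_dim),
        )
        # Convolutional module: two 1D convolutions over the 512-d feature axis.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(16, 1, kernel_size=3, padding=1),
            nn.GELU(),
        )
        # Prediction head: linear classifier over the refined 512-d features.
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        x = self.mlp(embeddings)                  # (batch, 512)
        x = self.conv(x.unsqueeze(1)).squeeze(1)  # (batch, 512) after 1D convolutions
        return self.head(x)                       # (batch, num_classes)

# The frozen MedImageInsight embeddings are the only inputs; only the adapter is trained.
model = MedImageInsightAdapter()
logits = model(torch.randn(8, 1024))  # placeholder batch of 8 pre-computed embeddings
```

Because the inputs are pre-computed embeddings rather than raw images, training reduces to a standard cross-entropy loop over small tensors, which is consistent with the few-second CPU training time reported above.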
II. Boosting Vision-Language Models with in-Context Prompting

We leverage vision-language models (e.g., GPT-4o, GPT-4.1), a recent class of multimodal foundation models capable of jointly reasoning over visual and textual inputs. These models are particularly promising for dermatology tasks due to their ability to interpret complex visual patterns in medical images while simultaneously understanding domain-specific medical terminology.

1. Few-shot Prompting

In this setting, a small number of examples from the training dataset are randomly selected and embedded into the input prompt. These examples, consisting of paired images and corresponding labels, are intended to guide the model's interpretation of new inputs by providing contextual cues and examples of relevant dermatological features.

2. MedImageInsight-based Retrieval-Augmented Generation (RAG)

This approach enhances vision-language model performance by integrating a similarity-based retrieval mechanism rooted in MedImageInsight image-to-image comparison. Specifically, it employs a k-nearest neighbors (k-NN) search to identify the top-k dermatological training images that are most visually similar to a given query image. The retrieved examples, consisting of dermatological images and their corresponding labels, are then used as in-context examples in the Vision-LLM prompt. By presenting visually similar cases, this approach provides the model with more targeted contextual references, enabling it to generate predictions grounded in relevant visual patterns and associated clinical semantics. As illustrated in Figure 3, the system operates in two phases:

- Index Construction: Embeddings are extracted from all training images using a pretrained vision encoder (MedImageInsight). These embeddings are then indexed to enable efficient and scalable similarity search during retrieval.
- Query and Retrieval: At inference time, the test image is encoded in the same way to produce a query embedding. The system computes the Euclidean distance between this query vector and all indexed embeddings, retrieving the k nearest neighbors with the smallest distances.

To handle the computational demands of large-scale image datasets, the method leverages FAISS (Facebook AI Similarity Search), an open-source library designed for fast and scalable similarity search and clustering of high-dimensional vectors. The implementation of the image search method is available in our Samples Repository: https://aka.ms/healthcare-ai-examples-mi2-2d-image-search

Figure 3: MedImageInsight-based Retrieval-Augmented Generation
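The sketch below illustrates how retrieved neighbors can be assembled into an in-context prompt for a GPT-4.1 deployment. The helper names (`to_image_part`, `build_rag_messages`, `train_paths`, `train_labels`), the prompt wording, and the commented Azure OpenAI call are hypothetical and shown only as one possible way to construct and send the messages; refer to the linked samples for the actual implementation.

```python
import base64

def to_image_part(path: str) -> dict:
    """Encode an image file as a base64 data-URL content part for a chat-completions call."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

def build_rag_messages(query_path, query_embedding, index, train_paths, train_labels, k=5):
    """Retrieve the k most similar training images from a FAISS index built over
    MedImageInsight embeddings and assemble them as in-context examples."""
    _, neighbors = index.search(query_embedding.reshape(1, -1).astype("float32"), k)
    content = [{"type": "text",
                "text": "Identify the anatomic location shown in the last image. "
                        "Here are visually similar cases with their labels:"}]
    for i in neighbors[0]:
        content.append(to_image_part(train_paths[i]))
        content.append({"type": "text", "text": f"Label: {train_labels[i]}"})
    content.append({"type": "text", "text": "Query image:"})
    content.append(to_image_part(query_path))
    return [{"role": "user", "content": content}]

# Illustrative call against an Azure OpenAI GPT-4.1 deployment (names are placeholders):
# from openai import AzureOpenAI
# client = AzureOpenAI(azure_endpoint="<endpoint>", api_key="<key>", api_version="<api-version>")
# messages = build_rag_messages(query_path, query_embedding, index, train_paths, train_labels, k=5)
# response = client.chat.completions.create(model="<gpt-4.1-deployment>", messages=messages)
# print(response.choices[0].message.content)
```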
Evaluation

Table 1 presents accuracy scores for anatomic location prediction on the DermaVQA-IIYI test set using the proposed modeling approaches. The adapter model achieves a baseline accuracy of 31.73%. Vision-language models perform better, with GPT-4o (2024-11-20) achieving an accuracy of 47.11% and GPT-4.1 (2025-04-14) improving to 50%. However, incorporating few-shot prompting with five randomly selected in-context examples (5-shot) slightly reduces GPT-4.1's performance to 48.72%. This decline suggests that unguided example selection may introduce irrelevant or low-quality context, potentially reducing the effectiveness of the model's predictions for this specialized task.

The best performance among the vision-language approaches is achieved with the retrieval-augmented generation (RAG) strategy. In this setup, GPT-4.1 is prompted with five nearest-neighbor examples retrieved using the MedImageInsight-based search method (RAG-5), leading to a notable accuracy increase to 51.60%. This improvement over GPT-4.1's 50% accuracy without retrieval demonstrates the relevance of the MedImageInsight-based RAG method. We expect larger performance gains with a more extensive dermatology dataset than the relatively small collection used in this example (2,474 images associated with 842 patient cases), which served as the basis for selecting relevant cases and similar images. Dermatology is a particularly challenging domain, marked by a high number of distinct conditions and significant variability in skin tone, texture, and lesion appearance. This diversity makes robust and representative example retrieval especially critical for enhancing model performance. The results underscore the importance of example relevance in few-shot prompting, demonstrating that similarity-based retrieval can effectively guide the model toward more accurate predictions in complex visual reasoning tasks.

Table 1: Comparative Accuracy of Anatomic Location Prediction on DermaVQA-IIYI

Figure 4: Confusion matrix of anatomical location predictions by the trained MLP adapter. The matrix illustrates the model's performance in classifying dermatology images across 39 anatomical regions. Strong diagonal values indicate correct classifications, while off-diagonal entries highlight common misclassifications, particularly among anatomically adjacent or visually similar regions such as 'lowerback' vs. 'back' and 'hand' vs. 'fingers'.

Figure 5: Examples of correct anatomical predictions by the RAG approach. Each image depicts a case where the model's predicted anatomical region exactly matches the ground truth. Shown are examples from visually and anatomically distinct areas including the eye region, lips, lower leg, and neck.

Figure 6: Examples of misclassifications by the RAG approach. Each image displays a case where the predicted anatomical label differs from the ground truth. In several examples, predictions are anatomically close to the correct regions (e.g., hand vs. hand-back, lower leg vs. foot, palm vs. fingers), suggesting that misclassifications often occur between adjacent or visually similar areas. These cases highlight the challenge of precise localization in fine-grained anatomical classification and the importance of accounting for anatomical ambiguity in both modeling and evaluation.

Conclusion

Our exploration of scalable image retrieval and advanced prompting strategies demonstrates the growing potential of vision-language models in dermatology. A particularly challenging task we address is anatomic location prediction, which involves 39 fine-grained classes of dermatology images, imbalanced training data, and frequent misclassifications between adjacent or visually similar regions. By leveraging Retrieval-Augmented Generation (RAG) with similarity-based example selection using image embeddings from the MedImageInsight foundation model, we show that relevant contextual guidance can significantly improve model performance in such complex settings. These findings underscore the importance of intelligent image retrieval and prompt construction for enhancing prediction accuracy in fine-grained medical tasks. As vision-language models continue to evolve, their integration with retrieval mechanisms and foundation models holds substantial promise for advancing clinical decision support, medical research, and education at scale. In the next blog of this series, we will shift focus to the wound care subdomain of dermatology and release accompanying Jupyter notebooks for the adapter-based and RAG-based methods to provide a reproducible reference implementation for researchers and practitioners.

The Microsoft healthcare AI models, including MedImageInsight, are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models' performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals.

References

1. Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaría-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie L. Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur, Vijay Aski, Jenq-Neng Hwang, Thomas Lin, Ivan Tarapov, Matthew P. Lungren, Mu Wei: MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging. CoRR abs/2410.06542, 2024.
2. Wen-wai Yim, Yujuan Fu, Zhaoyi Sun, Asma Ben Abacha, Meliha Yetisgen, Fei Xia: DermaVQA: A Multilingual Visual Question Answering Dataset for Dermatology. MICCAI (5) 2024: 209-219.
3. Model catalog and collections in Azure AI Foundry portal. https://learn.microsoft.com/en-us/azure/ai-studio/how-to/model-catalog-overview

Image Search Series

Image Search Series Part 1: Chest X-ray lookup with MedImageInsight | Microsoft Community Hub
Image Search Series Part 2: AI Methods for the Automation of 3D Image Retrieval in Radiology | Microsoft Community Hub
Image Search Series Part 3: Foundation Models and Retrieval-Augmented Generation in Dermatology | Microsoft Community Hub
Image Search Series Part 4: Advancing Wound Care with Foundation Models and Context-Aware Retrieval | Microsoft Community Hub