Optimizing Azure Healthcare Multimodal AI Models for Intel CPU Architecture
Alexander Mehmet Ersoy, Principal Product Manager, Microsoft HLS AI; Abhishek Khowala, Principal AI Engineer, Intel; Ravi Panchumarthy, AI Framework Engineer, Intel; Srinarayan Srikanthan, AI Framework Engineer, Intel; Ekaterina Aidova, AI Frameworks Engineer, Intel; Alberto Santamaria-Pang, Principal Applied Data Scientist, Microsoft HLS AI and Adjunct Faculty at Johns Hopkins Medicine; Peter Lee, Applied Scientist, Microsoft HLS AI and Adjunct Assistant Professor at Vanderbilt University; Ivan Tarapov, Sr. Director, Microsoft HLS AI; Pradeep Sakhamoori, Sr. SW Engineer, Microsoft

The Rise of Multimodal AI in Healthcare

The healthcare sector is witnessing a surge in the adoption of multimodal AI models, which are crucial for applications ranging from diagnostics to personalized treatment plans. These models combine data from various sources, such as medical images, patient records, and genomic data, to provide comprehensive insights. Microsoft's Azure AI Foundry model catalog of multimodal healthcare foundation models is at the forefront of this change. Recently launched models (such as MedImageInsight, MedImageParse, CXRReportGen [8], and many others) are designed to help healthcare organizations rapidly build and deploy AI solutions tailored to their specific needs, while minimizing the extensive compute and data requirements typically associated with building multimodal models from scratch. Real-world examples of multimodal AI adoption from our industry partners are highlighted in the article "Unlocking next-generation AI capabilities with healthcare AI models".

Challenges and Opportunities in Hardware Optimization

As models grow more complex, as is the case with the foundation model trend, the demands on hardware rise. While GPUs remain the platform of choice for minimizing model execution times, CPUs present substantial optimization opportunities, especially for inference workloads.
We believe that providing a framework for efficient CPU-based environments holds huge potential for many production scenarios where speed can be traded off for cost savings. With multimodal healthcare AI, the complexity of handling different data modalities and ensuring efficient inference requires innovative solutions and collaboration between industry leaders. Companies are increasingly looking toward hardware-specific optimizations to enhance model efficiency and reduce latency while keeping costs at bay. Intel, with its robust suite of AI tools and extensions for frameworks like PyTorch, is pioneering this optimization effort. For instance, the Intel® Distribution of OpenVINO™ toolkit has been instrumental in accelerating the development of computer vision and deep learning applications in healthcare [1]. You can learn about our recent collaboration with Intel on AI optimizations to advance medical innovations in the article "Empower Medical Innovations: Intel Accelerates PadChest & fMRI Models on Microsoft Azure* Machine Learning".

The demand for AI applications in healthcare is rapidly increasing. Multimodal AI models, which can process and analyze complex datasets, are essential for tasks such as early disease detection, treatment planning, and patient monitoring. While optimizing these models to perform efficiently on specific hardware is important, it is not necessarily a barrier to adoption. Models optimized with CUDA for NVIDIA GPUs often deliver optimal performance and run faster than on any other hardware. The benefit of CPUs, however, lies in the tradeoff they offer: you can optimize for speed by running your model on a GPU, or you can optimize for cost by accepting a slower run time. That is the proposition here: the option to run the model more slowly on readily accessible CPUs, which can be advantageous in scenarios where speed is not the primary concern but access to GPU hardware is limited.
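This cost/speed proposition can be made concrete with a back-of-the-envelope sketch. The prices and run times below are hypothetical placeholders, not Azure list prices or measured latencies; the point is only that a slower, cheaper instance can still win on cost per inference:

```python
def cost_per_1k_inferences(price_per_hour: float, seconds_per_run: float) -> float:
    """Cost of 1,000 sequential inference runs at a given hourly instance price."""
    hours = 1000 * seconds_per_run / 3600
    return price_per_hour * hours

# Purely hypothetical figures for illustration; not measured prices or run times.
gpu_cost = cost_per_1k_inferences(price_per_hour=3.00, seconds_per_run=2.0)  # fast, pricier
cpu_cost = cost_per_1k_inferences(price_per_hour=0.50, seconds_per_run=8.0)  # slower, cheaper
print(f"GPU: ${gpu_cost:.2f}, CPU: ${cpu_cost:.2f} per 1,000 runs")
```

In practice the comparison also depends on batch size, parallelism, and instance utilization, so this is a starting point for reasoning about the tradeoff rather than a sizing tool.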
The Intel® oneAPI Deep Neural Network Library (oneDNN) has proven effective in reducing the dependency on GPUs and accelerating time to market for AI solutions [2]. Both the Intel® Extension for PyTorch (IPEX) and OpenVINO use oneDNN to accelerate deep learning operations, taking advantage of the underlying hardware features. IPEX optimizes existing PyTorch workflows with minimal code changes; OpenVINO provides cross-platform deep learning optimization for deployment flexibility. In this blog post, a custom deployment was implemented using CXRReportGen along with both IPEX and OpenVINO optimizations, demonstrating how these techniques can support different deployment scenarios and technical requirements. This optimization is accessible through Azure's compute services and Intel's technology.

Benchmarking and Performance Acceleration

To address these challenges, our new collaboration with Intel focuses on leveraging Intel's advanced AI tools and hardware capabilities to optimize multimodal AI models for greater healthcare access. By utilizing the Intel Extension for PyTorch and other optimization techniques, we aim to achieve the best possible model run time on CPUs. While CPU inference may still be slower than GPU inference, the main benefit is addressing the problem of GPU hardware scarcity. This partnership not only underscores the importance of hardware-specific optimizations but also sets a new standard for AI model deployment in real-world healthcare applications. Both IPEX and OpenVINO are built on a common foundation: oneDNN, a high-performance library designed specifically for deep learning applications and optimized for Intel architecture.
oneDNN leverages specialized hardware instructions available in Intel processors, such as Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) [3] on Intel CPUs, as well as Intel® Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs.

Figure 1: oneDNN library.

IPEX [4] extends PyTorch* with the latest performance optimizations for Intel hardware [5]. It leverages oneDNN under the hood to provide optimized implementations of key operations. This allows developers to stay within their existing PyTorch code with minimal changes, making it an excellent choice for teams already comfortable with the PyTorch ecosystem who want to quickly optimize their models for Intel hardware.

```python
import torch

############## Import IPEX ##############
import intel_extension_for_pytorch as ipex

model = Model()
model.eval()

############## Optimize with IPEX ##############
model = ipex.optimize(model, dtype=torch.bfloat16)

# Continue with inference as normal
```

Figure 2: Intel Extension for PyTorch.

The Intel® Distribution of OpenVINO™ toolkit is a powerful solution for optimizing and deploying deep learning models across a wide range of Intel hardware [6]. Like IPEX, it leverages oneDNN under the hood, but it takes a different approach, offering cross-platform optimization and flexible deployment options. OpenVINO supports two main workflows: a convenience workflow, where you run models directly with minimal setup, and a performance workflow, recommended for production, where models are first converted offline into the OpenVINO Intermediate Representation (IR). This one-time conversion step enables highly optimized inference and allows the final application to remain lightweight and efficient. Here's a simple example using OpenVINO for inference with a pre-converted IR model.
Refer to the OpenVINO Notebooks repo for more samples:

```python
import openvino as ov

core = ov.Core()

############## Load the OpenVINO IR model ##############
compiled_model = core.compile_model("model.xml", "CPU")

############## Run inference ##############
infer_request = compiled_model.create_infer_request()
results = infer_request.infer({input_tensor_name: input_tensor})
```

Figure 3: OpenVINO toolkit overview.

IPEX and OpenVINO are supported on all Intel architectures. For optimal performance, however, Intel recommends instances powered by 4th Gen Intel® Xeon® Scalable processors or newer, which feature AMX and other hardware acceleration capabilities, such as Azure's v6 series (e.g., Standard_E48s_v6) [7].

Results

We conducted a detailed performance benchmark using CXRReportGen, a state-of-the-art foundation model designed to generate a list of radiological findings from chest X-rays, on Standard_E48s_v6 hardware (48 vCPUs, 384 GiB RAM) with and without IPEX and OpenVINO optimizations. We observed up to a 70% improvement in CXRReportGen run time when applying IPEX optimizations, and similarly substantial gains with OpenVINO, compared to the non-optimized baseline on the same CPU hardware. This significant improvement highlights the potential of leveraging Intel's performance optimizations to make critical healthcare AI models more cost-efficient and accessible. Such advancements enable healthcare providers to deploy advanced diagnostic tools even in resource-constrained environments, ultimately improving patient care and operational efficiency.

| SKU | Run Type (100 runs) | Mean Run Time (s) | Std. Dev. of Run Time (s) |
| --- | --- | --- | --- |
| Standard_E48s_v6 (48 vCPUs, 384 GiB RAM) | No optimization | 22.47 | 0.1061 |
| Standard_E48s_v6 (48 vCPUs, 384 GiB RAM) | IPEX | 8.21 | 0.2375 |
| Standard_E48s_v6 (48 vCPUs, 384 GiB RAM) | OpenVINO | 7.01 | 0.0569 |

Table 1: Performance comparison of the CXRReportGen model across 100 runs on CPU.
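The mean and standard-deviation figures above come from repeated timed runs. A minimal sketch of such a timing harness is shown below; `run_inference` is a stand-in placeholder, not the actual CXRReportGen invocation:

```python
import time
import statistics

def run_inference():
    # Placeholder for a real model call (e.g., a CXRReportGen forward
    # pass on a preprocessed chest X-ray); sleeps briefly to simulate work.
    time.sleep(0.001)

def benchmark(fn, runs=100, warmup=3):
    """Time fn over `runs` iterations; returns (mean, stdev) in seconds."""
    for _ in range(warmup):  # warm-up iterations are excluded from timing
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

mean_s, std_s = benchmark(run_inference)
print(f"mean: {mean_s:.4f} s, stdev: {std_s:.4f} s")
```

Running the same harness against the unoptimized, IPEX-optimized, and OpenVINO-compiled variants of a model yields directly comparable statistics.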
Future Prospects and Innovations

Our benchmarks with Intel optimizations, using both IPEX and OpenVINO, show great potential for decreasing the run time of our foundation models and increasing scalability via CPUs. This positions Intel CPUs as a viable deployment target. It not only increases deployment options but also offers opportunities to reduce cloud costs with CPU-based instances, and even to consider deploying these workflows on existing compute headroom at the edge. For custom deployments, the setup described in this blog post is now available on the provided compute instances in Azure, together with optimization software from Intel, so developers can optimize inference workloads while taking advantage of the large memory pools available on CPU instances to handle large batch workloads. Our advancements with Intel in model runtime optimization are being considered for availability in the Azure AI model catalog; please stay tuned for further updates. As we continue to innovate and optimize, the potential for AI to transform healthcare and improve patient outcomes becomes increasingly attainable. We are now more equipped than ever to make it easier for our partners and customers to create connected experiences at every point of care, empower their healthcare workforce, and unlock the value of their data using data standards that are important to the healthcare industry.

References

[1] Intel OpenVINO Optimizes Deep Learning Performance for Healthcare Imaging
[2] Accelerating Healthcare Diagnostics with Intel oneAPI and AI Tools
[3] Intel Advanced Matrix Extensions
[4] Intel Extension for PyTorch
[5] Accelerate with Intel Extension to PyTorch
[6] Intel Accelerates PadChest and fMRI Models on Azure ML
[7] Azure's first 5th Gen Intel® Xeon® processor instances are now available and we're excited!
[8] CXRReportGen Model Card in Azure AI Foundry

The healthcare AI models in Azure AI Foundry are intended for research and model development exploration.
The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models' performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals.

When “Wrong” Looks “Right”: The Challenge of Evaluating AI in Healthcare
Choosing the right evaluation metrics is crucial for ensuring patient safety and clinical accuracy when integrating AI into healthcare. Traditional text-comparison metrics like F1, BLEU, ROUGE, and METEOR often fail to distinguish between clinically accurate and inaccurate responses. Advanced methods such as BERTScore, ClinicalBERT, and MoverScore show better results but still have limitations. In this blog post, we present a compelling case for investing in more advanced evaluation methods, even when they require additional computational resources. When patient safety is at stake, the ability to reliably distinguish between clinically accurate and inaccurate content isn't just nice to have; it's essential.

A specialty-specific approach with Microsoft Dragon Copilot
Clinicians are at the heart of patient care, and the documentation they create shapes how that care is delivered, interpreted, and continued. Nearly 70% of the global medical workforce (around 9 million practitioners, according to recent World Health Organization data) are specialists whose work spans a wide range of disciplines and care settings. As these specialties evolve, so do their documentation needs, which must support the highest-quality, most accurate care.

Each specialty brings its own documentation requirements. Orthopedics relies heavily on imaging reports, physical exam findings, and procedural notes. Preventive medicine, on the other hand, focuses on understanding the breadth of the patient's conditions and on proactive measures to promote health. Documentation requirements also vary across care settings, from outpatient clinics to emergency departments to inpatient units. Accurate, specialty-specific documentation supports not only improved patient outcomes but also the broader healthcare ecosystem, from ensuring appropriate reimbursement to enabling clinical research and the development of more targeted treatments. When designed to meet the needs of specialists, documentation becomes more than a requirement: it becomes a tool for delivering better care.

Purpose-built with clinicians

Microsoft Dragon Copilot enhances the clinician experience by streamlining the creation of medical notes tailored to each specialty's unique requirements. Powered by advanced natural language processing, Dragon Copilot recognizes and adapts to the specific needs of disparate medical fields. This enables clinicians to focus more on patient care and less on administrative work, enhancing both efficiency and satisfaction.
Built for continuous learning and adaptation, Dragon Copilot helps specialists keep pace with evolving clinical guidelines, medical standards, and billing requirements, supported by Microsoft's dedicated team of medical professionals, including MDs (Doctors of Medicine), RNs (Registered Nurses), and APPs (Advanced Practice Providers). In a field shaped by constant change, this agility helps ensure documentation stays accurate and relevant.

At the core of this innovation is Microsoft's deep, daily engagement with clinicians using Dragon Copilot. Through a diverse network of physicians, advanced care practitioners, coders, and other healthcare professionals, Microsoft works directly with those on the front line of care. This network, and other early access participants, work alongside Microsoft's in-house clinical experts and researchers to co-design, test, and refine Dragon Copilot. This close partnership brings real-world insight into the development process, helping us ensure Dragon Copilot aligns with the practical, specialty-specific needs of medical professionals. By embedding clinical knowledge and clinician feedback into each iteration, we deliver a solution that is not only clinically accurate but also intuitive and trusted. This is about more than building a better product experience: it is about fostering trust and ownership among clinicians with technology that fits naturally into their everyday practice, supports their expertise, and helps them deliver the highest quality care.

The power of a specialty-optimized approach

Dragon Copilot is built on a trusted core medical model, fine-tuned on millions of real-world patient encounters. From this core medical foundation, the model is then adapted and optimized for each specialty, integrating clinical experts' knowledge and research, national association recommendations, and feedback from clinicians.
This layered approach supports outputs that are not only medically accurate but also aligned with the documentation standards and workflows clinicians depend on in their daily practice. The system evolves with changes in clinical guidelines and input from practicing specialists. This iterative process keeps Dragon Copilot current, relevant, and reflective of both specialty-specific requirements and real-world practice. By aligning note content with specialty-specific standards, Dragon Copilot helps reduce cognitive load, minimize documentation errors, and shorten the time needed to complete and sign notes. The result is a more efficient workflow that enhances both the quality of care and patient data processing.

"By teaming up with specialty providers, Microsoft has elevated the quality and accuracy of notes—making my documentation clearer, more robust yet concise, and significantly improving readability for both patients and fellow providers. Additionally, this update also greatly improved the capture of physical exam findings." Eric Alford, M.D., Baylor Scott & White Health

Consider ophthalmology: clinical guidelines in this specialty require documentation of complex decision-making, such as discussing lens implant options in a way that balances clinical appropriateness with individual patient preferences. Dragon Copilot helps capture both, generating documentation that is structured and personalized to each unique patient encounter. Or take psychiatry: the mental status exam is a crucial component of the evaluation that informs decisions about the patient's treatment. Dragon Copilot supports clinicians by capturing this comprehensive assessment, which is essential for tracking the patient's progress over time.

Customization that reflects the art of medicine

Specialty-specific notes are only part of the solution: clinician satisfaction and adoption rely on meaningful customization. Documentation is personal, and no two clinicians document the same way.
Microsoft Dragon Copilot is designed with that in mind, offering customizable templates and flexible styles that align with individual preferences and workflows.

"I think the potential of Dragon Copilot is going to be even greater as we start to bring in local vernacular, and the ability to help each doctor tune their note to their appropriate desires because one person's note that is too brief is another one that's too long for someone else." R. Hal Baker, MD, SVP and CIO, WellSpan Health

This level of personalization preserves each clinician's unique voice while enhancing the accuracy, completeness, and efficiency of documentation. By bridging standardized requirements with specialty-specific content and individual style, Dragon Copilot supports a more seamless and effective documentation experience. Tailoring technology to meet the diverse needs of clinicians not only enhances satisfaction and adoption but also contributes to better care delivery across the healthcare system.

Trustworthy AI by design

Microsoft Dragon Copilot is built on a secure data estate and incorporates healthcare-adapted clinical, chat, and compliance safeguards for accurate and safe AI outputs. Dragon Copilot also aligns with Microsoft's responsible AI principles, which guide AI development and use: transparency, reliability and safety, fairness, inclusiveness, accountability, privacy, and security. We invest in technical performance through regular assessments, building trust with medical professionals. This process looks for potential biases and errors, enabling timely corrections and continuous improvements across specialties. With a strong focus on inclusiveness, Dragon Copilot supports a wide range of medical practices and specialties, reflecting the diverse needs of clinicians and patients. By upholding these principles, Microsoft drives innovation while helping to safeguard the interests of both patients and healthcare providers.
These commitments set a high standard for trustworthy AI in healthcare.

Looking ahead

Clinical documentation should tell the complete story of a patient's care, clearly and comprehensively, for the stakeholders involved. We are excited to keep innovating around specialty-specific clinical documentation and beyond, and we want you to be part of it. Your feedback fuels our progress. Together, we can improve clinician well-being and keep the focus where it belongs: on patient care.

Learn more

Watch an on-demand demo
Take a deeper look at Dragon Copilot
Explore the latest with our new health AI models and integrations

YOTNM Ep. 2: March is Colorectal Cancer Awareness Month
Hear a personal journey that underscores the importance of timely health screenings in proactively managing your health and wellbeing. Kathleen McGrow, DNP, CNIO, RN, MS, discusses Colorectal Cancer Awareness Month in Episode 2 of the Year of the Nurse & Midwife series.