Healthcare data solutions
Integrating remote patient monitoring solutions with healthcare data solutions in Microsoft Fabric
Co-Authors: Kemal Kepenek, Mustafa Al-Durra PhD, Matt Dearing, Jason Foerch, Manoj Kumar

Introduction

Remote patient monitoring solutions rely on connected devices, wearable technology, and advanced software platforms to collect and transmit patient health data. They facilitate monitoring of vital signs, chronic conditions, and behavioral patterns. Healthcare data solutions in Microsoft Fabric offers a secure, scalable, and interoperable data platform as part of Microsoft for Healthcare. Such a unified data platform is crucial for integrating disparate data sources and generating actionable health insights.

This article provides a reference architecture and the steps to integrate remote patient monitoring solutions with healthcare data solutions in Fabric. The integration is aimed at satisfying low data resolution use cases. By low data resolution, we mean infrequent (hourly, daily, or less frequent) transfer of aggregated or point-in-time-snapshot device data into healthcare data solutions in Fabric, consumed in a batch fashion to generate analytical insights. Integration steps for high data resolution use cases, which necessitate high-frequency transfer of highly granular medical device data (for example, data from EKGs or ECGs) as input to batch or (near) real-time analytics processing and consumption, are a candidate for a future article.

There are several methods, solutions, and partners available in the marketplace today that will allow you to integrate a remote patient monitoring solution with healthcare data solutions in Fabric. In this article, we leveraged the solution from Life365 (a Microsoft partner). The integration approach discussed here is applicable to most remote patient monitoring solutions whose integration logic (code) can run inside a platform that can programmatically access Microsoft Fabric (for example, through REST API calls).
In our approach, the integration platform chosen is the Function App service within Microsoft Azure. In the subsequent sections of this article, we cover the integration approach in two phases:

- Interoperability phase, which illustrates how the data from medical devices (used by the remote patient monitoring solution) can be converted into a format suitable for transferring into healthcare data solutions in Fabric.
- Analytical processing and consumption phase, which provides the steps to turn the medical device data into insights that can be easily accessed through Fabric.

Integration Approach

Interoperability Phase

Step 1 of this phase performs the transfer of proprietary device data. As part of this step, datasets are collected from medical devices and transferred (typically, in the form of files) to an integration platform or service. In our reference architecture, the datasets are transferred to the Function App (inside an Azure Resource Group) that is responsible for the integration function. It is important for these datasets to contain information about (at least) three concepts or entities:

- Medical device(s) from which the datasets are collected.
- Patient(s) to whom the datasets belong.
- Reading(s) obtained from the medical device(s) throughout the time that the patients utilize these devices. Medical device readings may be point-in-time data captures, metrics, measures, calculations, collections, or similar data points.

Information about the entities listed above will be used in the later step of the interoperability phase (discussed below), when we convert this information into FHIR® resources to be handed to the second phase, which performs analytical processing and consumption.

In step 2, to maintain the mapping between proprietary device data and FHIR® resources, you can use transformation templates, or follow a programmatic approach, to convert datasets received from medical devices into appropriate FHIR® resources.
Using the entities mentioned in the previous step, the conversion takes place as follows:

- Medical device information is converted to the Device resource in FHIR®.*
- Patient information is converted to the Patient resource in FHIR®.
- Device reading information is converted to the Observation resource in FHIR®.

* Currently, healthcare data solutions in Fabric supports the FHIR® Release 4 (R4) standard. Consequently, the FHIR® resources created as part of this step should follow the same standard.

Transformation and mapping activities are under the purview of each specific remote patient monitoring integration solution and are not reviewed in detail in this article. As an example, we provide below the high-level steps that one of the Microsoft partners (Life365) followed to integrate their remote patient monitoring solution with healthcare data solutions in Fabric:

The Life365 team developed a cloud-based transformation service that translates internal device data into standardized FHIR® (Fast Healthcare Interoperability Resources) Observations to enable compatibility with healthcare data solutions in Microsoft Fabric and other health data ecosystems. This service is implemented in Microsoft Azure and designed to ingest structured payloads from Life365-connected medical devices (including blood pressure monitors, weight scales, and pulse oximeters) and convert them into FHIR®-compliant formats in real time. When a reading is received:

- The service identifies relevant clinical metrics (e.g., systolic/diastolic blood pressure, heart rate, weight, SpO₂).
- These metrics are mapped to FHIR® Observation resources using industry-standard LOINC codes and units.
- Each Observation is enriched with references to the associated patient and device, formatted in NDJSON to meet the ingestion requirements in healthcare data solutions in Fabric.
- The resulting FHIR®-compliant data is securely transmitted to the Fabric instance using token-based authentication.
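The conversion pattern just described can be sketched in code. The sketch below is illustrative only and is not Life365's actual service: the incoming payload shape, its field names, and the helper function are hypothetical assumptions, while LOINC 8480-6 is the standard code for systolic blood pressure.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload from a connected blood pressure monitor
reading = {
    "patient_id": "d3281621-1584-4631-bc82-edcaf49fda96",
    "device_id": "5a934020-c2c4-4e92-a0c5-2116e29e757d",
    "systolic_mmhg": 120,
    "taken_at": "2025-05-15T15:35:04Z",
}

def to_fhir_observation(r):
    """Map a proprietary device reading to a FHIR R4 Observation dict."""
    return {
        "resourceType": "Observation",
        "status": "final",
        # meta.lastUpdated is required by the downstream ingestion step
        "meta": {"lastUpdated": datetime.now(timezone.utc).isoformat()},
        # LOINC 8480-6 = systolic blood pressure
        "code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
        # References to the associated patient and device
        "subject": {"reference": f"Patient/{r['patient_id']}"},
        "device": {"reference": f"Device/{r['device_id']}"},
        "effectiveDateTime": r["taken_at"],
        "valueQuantity": {
            "value": r["systolic_mmhg"],
            "unit": "mmHg",
            "system": "http://unitsofmeasure.org",
            "code": "mm[Hg]",
        },
    }

obs = to_fhir_observation(reading)
# NDJSON: the whole resource serialized onto a single line
ndjson_line = json.dumps(obs)
```

A production mapping would of course follow the vendor's actual payload schema and cover additional vitals, unit conversions, and error handling.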
This implementation provides a consistent, standards-aligned pathway for Life365 device data to integrate with downstream FHIR®-based platforms while abstracting the proprietary structure of the original device payloads.

For examples from the public domain, you can use the following open-source projects as references:

- https://github.com/microsoft/fit-on-fhir
- https://github.com/microsoft/healthkit-to-fhir
- https://github.com/microsoft/FitbitOnFHIR
- https://github.com/microsoft/FHIR-Converter

Please note that the above open-source repositories might not be up to date. While they may not provide a complete (end-to-end) solution to map medical device data to FHIR®, they may still be helpful as a starting point. If you decide to incorporate them into your remote patient monitoring integration solution, validate their functionality and make the necessary changes to meet your solution's requirements.

For the resulting FHIR® resources to be successfully consumed by the analytics processing later (within healthcare data solutions in Fabric), they need to satisfy the requisites listed below:

- Each FHIR® resource, in its entirety, needs to be saved as a single row in an NDJSON-formatted file. We recommend creating one NDJSON file per FHIR® resource type. That means creating Device.ndjson, Patient.ndjson, and Observation.ndjson files for the three entities we reviewed above.
- Each FHIR® resource needs to have a meta segment populated with a lastUpdated value. As an example: "meta": {"lastUpdated": "2025-05-15T15:35:04.218Z", "profile": ["http://hl7.org/fhir/us/core/StructureDefinition/us-core-documentreference"]}
- Cross references between Observation and Patient, as well as between Observation and Device FHIR® resources, need to be represented correctly, either through formal FHIR® identifiers or logical identifiers.
As an example, the subject and device attributes of the Observation FHIR® resource need to refer to the Patient and Device FHIR® resources, respectively, in this manner:

"subject": {"reference": "Patient/d3281621-1584-4631-bc82-edcaf49fda96"}
"device": {"reference": "Device/5a934020-c2c4-4e92-a0c5-2116e29e757d"}

For the Patient FHIR® resource, if MRN is used as the identifier, it is important to represent the MRN value according to the FHIR® standard. The Patient identifier is a critical attribute that is used to establish cross-FHIR®-resource relationships throughout the analytics processing and consumption phase. We will review that phase later in this article. At a minimum, a Patient identifier that uses MRN coding as its identifier type needs to have its value, system, type.coding.system, and type.coding.code (with value "MR") attributes populated correctly. See the example below. You can also refer to a Patient FHIR® resource example from hl7.org.

"reference": null,
"type": "Patient",
"identifier": {
    "extension": null,
    "use": null,
    "value": "4e7e5bf8-2823-8ec1-fe37-eba9c9d69463",
    "system": "urn:oid:1.2.36.146.595.217.0.1",
    "type": {
        "extension": null,
        "id": null,
        "coding": [
            {
                "extension": null,
                "id": null,
                "system": "http://terminology.hl7.org/CodeSystem/v2-0203",
                "version": null,
                "code": "MR",
                "display": null,
                "userSelected": null
            }
        ],
        "text": null
    }
},
...

With step 3, to perform the transfer of the FHIR® resource NDJSON files to healthcare data solutions in Fabric, ensure that the integration platform (Azure Function App, in our case) has permission to transfer (upload) files to healthcare data solutions in Fabric. First, find the managed identity or the service principal that the Azure Function App is running under:

- Navigate to the Azure portal and find your Function App within your resource group.
- In the Function App's navigation pane, under "Settings," select "Identity".
Identify the managed identity (if enabled):

- If a system-assigned managed identity is enabled, you'll see information about it, including its object ID and principal ID.
- If a user-assigned managed identity is linked, the details of that identity will be displayed. You can also add user-assigned identities here if needed.
- Service principal (if applicable): if the Function App is configured to use a service principal, you'll need to look for the service principal within Microsoft Entra ID (formerly Azure Active Directory). You can find this by searching for "Enterprise Applications" within Microsoft Entra ID and looking for the application associated with the Function App.

Grant the Azure Function App's identity access to upload files:

- Having logged into Fabric with an administrator account, navigate to the Fabric workspace where your healthcare data solutions instance is deployed.
- Click on the "Manage Access" button on the top right.
- Click on "Add People or Groups".
- Add the managed identity or the service principal associated with your Azure Function App with Contributor access by selecting "Contributor" from the dropdown list.

Using a coding environment, similar to the Python example provided below, you can manage the OneLake content programmatically. This includes the ability to transfer (upload) the NDJSON-formatted files, which were created earlier, to the destination OneLake folder.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeFileClient

# Replace with your OneLake URI
onelake_uri = "https://your-account-name.dfs.core.windows.net"

# Replace with the destination path to your file
file_path = "/<full path to destination folder (see below)>/<entity name>.ndjson"

# Get the credential
credential = DefaultAzureCredential()

# Create a DataLakeFileClient
file_client = DataLakeFileClient(
    url=f"{onelake_uri}{file_path}",
    credential=credential
)

# Upload the file
with open("<entity name>.ndjson", "rb") as f:
    file_client.upload_data(f, overwrite=True)

print(f"File uploaded successfully: {file_path}")

The destination OneLake folder to use for the remote patient monitoring solution integration into healthcare data solutions in Fabric is determined as follows:

- Navigate to the bronze lakehouse created with the healthcare data solutions instance inside the Fabric workspace. The lakehouse is named "healthcare1_msft_bronze". The "healthcare1" segment in the name of the lakehouse points to the name of the healthcare data solutions instance deployed in the workspace. You might see a different name in your Fabric workspace; however, the rest of the lakehouse name ("_msft_bronze") remains unchanged.
- The unified folder structure of healthcare data solutions is located inside the bronze lakehouse. Within that folder structure, create a subfolder named after the remote patient monitoring solution you are integrating with. See the screenshot below. This subfolder is referred to as a namespace in the healthcare data solutions documentation and is used to uniquely identify the source of incoming (to-be-uploaded) data.
- The NDJSON files generated during the interoperability phase will be transferred (uploaded) into that subfolder.
The full path of the destination OneLake folder to use in your file transfer (upload) code is:

healthcare1_msft_bronze.Lakehouse\Files\Ingest\Clinical\FHIR-NDJSON\<Solution-Name-as-Namespace>

Analytics Processing and Consumption Phase

Step 1 of this phase connects the interoperability phase discussed earlier with the analytics processing and consumption phase. As part of this step, you can simply verify that the NDJSON files have been uploaded to the remote patient monitoring solution subfolder inside the unified folder structure in the bronze lakehouse of healthcare data solutions in Fabric. The path to that subfolder is provided earlier in this article.

After the upload of the files has been completed, you are ready to run the data pipeline that will perform data ingestion and transformation so that the device readings data may be used for analytics purposes. In the Fabric workspace where the healthcare data solutions instance is deployed, find and open the data pipeline named "healthcare1_msft_omop_analytics". As is the case with the bronze lakehouse name, the "healthcare1" segment in the name of the data pipeline points to the name of the healthcare data solutions instance deployed in the workspace. You might see a different name in your Fabric workspace depending on your own instance. This data pipeline executes four activities, the first of which copies the transferred files into another subfolder within the unified folder structure so that they can be input to the ingestion step next. The subsequent pipeline activities perform steps 2 through 4 as illustrated in the analytics processing and consumption phase diagram further above.

Step 2 ingests the content from the transferred (NDJSON) files into the ClinicalFHIR delta table of the bronze lakehouse. Step 3 transforms the content from the ClinicalFHIR delta table of the bronze lakehouse into flattened FHIR® data model content inside the silver lakehouse.
Step 4 transforms the flattened FHIR® content of the silver lakehouse into OMOP data model content inside the gold lakehouse. As part of step 5, you can develop your own gold lakehouse(s) by transforming content from the silver lakehouse into data model(s) best suited for your custom analytics use cases. Device data, once transformed into a gold lakehouse, may be used for analytics or reporting in several ways, some of which are discussed briefly below.

In step 6, Power BI reports and dashboards can be built inside Fabric that offer a visual and interactive canvas to analyze the data in detail (Overview of Power BI - Microsoft Fabric | Microsoft Learn). As part of step 7, the Fabric data share feature can be used to grant teams within external organizations (that you collaborate with) access to the data (External data sharing in Microsoft Fabric - Microsoft Fabric | Microsoft Learn). Finally, step 8 enables you to utilize the discover and build cohorts capability of healthcare data solutions in Fabric. With this capability, you can submit natural language queries to explore the data and build patient cohorts that fit the criteria your use cases are aiming for (Build patient cohorts with generative AI in discover and build cohorts (preview) - Microsoft Cloud for Healthcare | Microsoft Learn).

Conclusion

When integrated with healthcare data solutions in Fabric, remote patient monitoring solutions can enable transformative potential in enhancing patient outcomes, optimizing care coordination, and streamlining healthcare system operations. If your organization would like to explore the next steps in such a journey, please contact your Microsoft account team.

Orchestrate multimodal AI insights within your healthcare data estate (Public Preview)
In today's healthcare landscape, there is an increasing emphasis on leveraging artificial intelligence (AI) to extract meaningful insights from diverse datasets to improve patient care and drive clinical research. However, incorporating AI into your healthcare data estate often brings significant costs and challenges, especially when dealing with siloed and unstructured data. Healthcare organizations produce and consume data that is not only vast but also varied in format, ranging from structured EHR entries to unstructured clinical notes and imaging data. Traditional methods require manual effort to prepare and harmonize this data for AI, specify the AI output format, set up API calls, store the AI outputs, integrate the AI outputs, and analyze the AI outputs for each AI model or service you decide to use.

Orchestrate multimodal AI insights is designed to streamline and scale healthcare AI within your data estate by building off of the data transformations in healthcare data solutions in Microsoft Fabric. This capability provides a framework to generate AI insights by connecting your multimodal healthcare data to an ecosystem of AI services and models and integrating structured AI-generated insights back into your data estate. When you combine these AI-generated insights with the existing healthcare data in your data estate, you can power advanced analytics scenarios for your organization and patient population.

Key features:

- Metadata store lakehouse acts as a central repository for the metadata for AI orchestration to effectively capture and manage enrichment definitions, view definitions, and contextual information for traceability purposes.
- Execution notebooks define the enrichment view and enrichment definition based on the model configuration and input mappings. They also specify the model processor and transformer.
- The model processor calls the model API, and the transformer produces the standardized output while saving the output in the bronze lakehouse in the Ingest folder.
- Transformation pipeline ingests AI-generated insights through the healthcare data solutions medallion lakehouse layers and persists the insights in an enrichment store within the silver layer.

Conceptual architecture: The data transformations in healthcare data solutions in Microsoft Fabric allow you to ingest, store, and analyze multimodal data. With the orchestrate multimodal AI insights capability, this standardized data serves as the input for healthcare AI models. The model results are stored in a standardized format and provide new insights from your data. The diagram below shows the flow of integrating AI-generated insights into the data estate, starting as raw data in the bronze lakehouse and being transformed to delta tables in the silver lakehouse.

This capability simplifies AI integration across modalities for data-driven research and care, currently supporting:

- Text Analytics for health in Azure AI Language, to extract medical entities such as conditions and medications from unstructured clinical notes. This utilizes the data in the DocumentReference FHIR® resource.
- MedImageInsight healthcare AI model in Azure AI Foundry, to generate medical image embeddings from imaging data. This model leverages the data in the ImagingStudy FHIR® resource.
- MedImageParse healthcare AI model in Azure AI Foundry, to enable segmentation, detection, and recognition from imaging data across numerous object types and imaging modalities. This model uses the data in the ImagingStudy FHIR® resource.

By using orchestrate multimodal AI insights to leverage the data in healthcare data solutions for these models and integrate the results into the data estate, you can analyze your existing data alongside AI enrichments.
This allows you to explore use cases such as creating image segmentations and combining them with your existing imaging metadata and clinical data to enable quick insights and disease progression trends for clinical research at the patient level.

Get started today!

This capability is now available in public preview, and you can use the in-product sample data to test this feature with any of the three models listed above. For more information and to learn how to deploy the capability, please refer to the product documentation. We will dive deeper into more detailed aspects of the capability, such as the enrichment store and custom AI use cases, in upcoming blogs.

Medical device disclaimer: Microsoft products and services (1) are not designed, intended or made available as a medical device, and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment. Customers/partners are responsible for ensuring solutions comply with applicable laws and regulations.

FHIR® is the registered trademark of HL7 and is used with permission of HL7.

Elevating care management analytics with Copilot for Power BI
The healthcare data solutions care management analytics capability offers a comprehensive template using the medallion lakehouse architecture to unify and analyze diverse data sets for meaningful insights. This enables enhanced care coordination, improved patient outcomes, and scalable, sustainable insights. As the healthcare industry faces rising costs and growing demand for personalized care, data and AI are becoming critical tools. Copilot for Power BI leads this shift, blending AI-driven insights with advanced visualization to revolutionize care delivery.

What is Copilot for Power BI?

Copilot is an AI-powered assistant embedded directly into Power BI, Microsoft's interactive data visualization platform. By leveraging natural language processing and machine learning, Copilot helps users interact with their data more intuitively, whether by asking questions in plain English, generating complex calculations, or uncovering patterns that might otherwise go unnoticed. Copilot for Power BI is embedded within healthcare data solutions, allowing care management, one of its core capabilities, to harness these AI-driven insights. In the context of care management analytics, this means turning a sea of clinical, claims, and operational data into actionable insights without needing to write a single line of code. This empowers teams across all technical levels to gain value from data.

Driving better outcomes through intelligent insights in care management analytics

The care management analytics solution, built on the healthcare data solutions platform, leverages Power BI with Copilot embedded directly within it. Here's how Copilot for Power BI is revolutionizing care management:

Enhancing decision-making with AI

Traditionally, deriving insights from healthcare data required technical expertise and hours of analysis.
Copilot simplifies this by allowing care managers and clinicians to ask questions like "Analyze which medical conditions have the highest cost and prevalence in low-income regions." The AI interprets these queries and responds with visualizations, trends, and predictions, empowering faster, data-driven decisions.

Proactive care planning

By analyzing historical and real-time data, Copilot helps identify at-risk patients before complications arise. This enables care teams to intervene earlier, design more personalized care plans, and ultimately improve outcomes while reducing unnecessary hospitalizations.

Operational efficiency

From staffing models to resource allocation, Copilot provides visibility into operational metrics that can drive significant efficiency gains. Healthcare leaders can quickly identify bottlenecks, monitor key performance indicators (KPIs), and simulate "what-if" scenarios, enabling more informed, data-backed decisions on care delivery models.

Reducing costs without compromising quality

Cost containment is a constant challenge in healthcare. By highlighting areas of high spend and correlating them with clinical outcomes, Copilot empowers organizations to optimize care pathways and eliminate inefficiencies, ensuring patients receive the right care at the right time, without waste.

Democratizing data access

Perhaps one of the most transformative aspects of Copilot is how it democratizes access to analytics. Non-technical users, from care coordinators to nurse managers, can interact with dashboards, explore data, and generate insights independently. This cultural shift encourages a more data-literate workforce and fosters collaboration across teams.

Real-world impact

Consider a healthcare system leveraging Power BI and Copilot to manage chronic disease populations more effectively.
By combining claims data, social determinants of health (SDoH) indicators, and patient-reported outcomes, care teams can gain a comprehensive view of patient needs, enabling more coordinated care and proactively identifying care gaps. With these insights, organizations can launch targeted outreach initiatives that reduce avoidable emergency department (ED) visits, improve medication adherence, and ultimately enhance outcomes.

The future is here

The integration of Copilot for Power BI marks a pivotal moment for healthcare analytics. It bridges the gap between data and action, bringing AI to the frontlines of care. As the industry continues to embrace value-based care models, tools like Copilot will be essential in achieving the triple aim: better care, lower costs, and improved patient experience. Copilot is more than a tool; it is a strategic partner in your care transformation journey.

Deployment of care management analytics

Showcasing how a Population Health Director uncovers actionable insights through Copilot. Note: To fully leverage the capabilities of the solution, please follow the deployment steps provided and use the sample data included with the healthcare data solution.

For more information on care management analytics, please review our detailed documentation and get started with transforming your healthcare data landscape today:

- Overview of care management analytics - Microsoft Cloud for Healthcare | Microsoft Learn
- Deploy and analyze using Care management analytics - Training | Microsoft Learn

Medical device disclaimer: Microsoft products and services (1) are not designed, intended or made available as a medical device, and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment.
Customers/partners are responsible for ensuring solutions comply with applicable laws and regulations.

Microsoft Fabric healthcare data model querying and identifier harmonization
The healthcare data model in Healthcare data solutions (HDS) in Microsoft Fabric is the silver layer of the medallion and is based on the FHIR® R4 standard. Native FHIR® can be challenging to query using SQL because its reference properties (foreign keys) often follow varying formats, complicating query writing. One of the benefits of the silver healthcare data model is harmonizing these ids to create a simpler and more consistent query experience. Today we will walk through writing spark SQL and T-SQL queries against the silver healthcare data model. Supporting both spark SQL and T-SQL gives users the flexibility to use the compute engine they are most comfortable with and that is most suitable for the use case.

The examples below leverage the synthetic sample data set that is included with Healthcare data solutions. The spark SQL queries can be written and run from a Fabric notebook, while the T-SQL queries can be run from the SQL analytics endpoint of the silver lakehouse or a T-SQL Fabric notebook.

Simple query

Let's look at a simple query: finding the first instance of a Patient named "Andy".

Example spark-SQL query

SELECT *
FROM Patient
WHERE name[0].given[0] = 'Andy'
LIMIT 1

Example T-SQL query

SELECT TOP(1) *
FROM Patient
WHERE JSON_VALUE(name_string, '$[0].given[0]') = 'Andy'

Beyond syntax differences between SQL dialects, a key distinction is that T-SQL uses JSON functions to interpret complex fields, while spark SQL can directly interact with complex types. (Note: complex types are columns of type struct, list, or map, as opposed to primitive types such as string or integer.) Part of the silver transformations is adding a _string-suffixed column for each complex column to support querying this data from the T-SQL endpoint. Without the _string columns, these complex columns would not be surfaced for T-SQL to query.
You can see above that the T-SQL version uses the name_string column, while the spark SQL version uses name directly. Note: in the example above, we are looking at the first name element, but the queries could be updated to search for the first "official" name, for example, rather than relying on an index.

Keys and references

Part of the value proposition of the healthcare data model is key harmonization. FHIR® resources have ids that are unique, should not change, and can logically be thought of as a primary key for the resource. FHIR® resources can relate to each other through references, which can logically be thought of as foreign keys. FHIR® references can refer to the related FHIR® resource through FHIR® id or through business identifiers, which include a system for the identifier as well as a value (e.g., reference by MRN instead of FHIR® id). Note: although ids and references can logically be thought of as primary keys and foreign keys, respectively, there is no actual key constraint enforcement in the lakehouse. In healthcare data solutions in Microsoft Fabric these resource-level FHIR® ids are hashed to ensure uniqueness across multiple source systems.
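To see why hashing over the source helps ids stay unique across systems, consider the sketch below. The exact inputs HDS feeds into its hash are an implementation detail not documented here; this illustration simply assumes the hash covers a source-system label together with the original FHIR id.

```python
import hashlib

def harmonized_id(source_system: str, fhir_id: str) -> str:
    # Hashing the source system together with the original id guarantees
    # that identical FHIR ids arriving from different systems stay distinct,
    # while the same (system, id) pair always maps to the same silver id.
    return hashlib.sha256(f"{source_system}|{fhir_id}".encode("utf-8")).hexdigest()

same_id = "904d247a-0fc3-773a-b564-7acb6347d02c"
id_a = harmonized_id("ehr-system-a", same_id)
id_b = harmonized_id("ehr-system-b", same_id)

# Same source id, different origin systems -> different silver-layer ids
print(id_a != id_b)  # True
```

The hash is also deterministic, so repeated ingestions of the same record resolve to the same harmonized id.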
FHIR® references go through a harmonization process, outlined with the example below, to make querying in a SQL syntax simpler.

Example raw observation reference field from a sample ndjson file

"subject": {
    "reference": "Patient/904d247a-0fc3-773a-b564-7acb6347d02c"
},

Example of the observation's harmonized subject reference in silver

"subject": {
    "type": "Patient",
    "identifier": {
        "value": "904d247a-0fc3-773a-b564-7acb6347d02c",
        "system": "FHIR-HDS",
        "type": {
            "coding": [
                {
                    "system": "http://terminology.hl7.org/CodeSystem/v2-0203",
                    "code": "fhirId",
                    "display": "FHIR Id"
                }
            ],
            "text": "FHIR Id"
        }
    },
    "id": "828dda871b817035c42d7f1ecb2f1d5f10801c817d69063682ff03d1a80cadb5",
    "idOrig": "904d247a-0fc3-773a-b564-7acb6347d02c",
    "msftSourceReference": "Patient/904d247a-0fc3-773a-b564-7acb6347d02c"
}

You'll notice the subject reference contains more data in silver than in the raw representation. You can see a full description of what is added here. This reference harmonization makes querying from a SQL syntax easier, as you don't need to parse FHIR® references like "Patient/<id>" or include joins on both FHIR® ids and business identifiers in your query. If your source data only uses FHIR® ids, the id property can be used directly in joins. If your source data uses a mixture of FHIR® ids and business identifiers, you can query by business identifier consistently because, as shown above, even when a FHIR® id is used, HDS adds a FHIR® business identifier to the reference. Note: you can see examples of business identifier-based queries in the Observational Medical Outcomes Partnership (OMOP) dmfAdapter.json file, which queries resources by business identifier.

Here are two example queries looking for the top 5 body weight observations of male patients by FHIR® id.
Example Spark SQL query:

    SELECT o.id
    FROM observation o
    INNER JOIN patient p ON o.subject.id = p.id
    WHERE p.gender = 'male'
      AND ARRAY_CONTAINS(o.code.coding.code, '29463-7')
    LIMIT 5

Example T-SQL query:

    SELECT TOP 5 o.id
    FROM Observation o
    INNER JOIN Patient p ON JSON_VALUE(o.subject_string, '$.id') = p.id
    WHERE p.gender = 'male'
      AND EXISTS (
          SELECT 1
          FROM OPENJSON(o.code_string, '$.coding')
          WITH (code NVARCHAR(MAX) '$.code')
          WHERE code = '29463-7'
      )

You’ll notice the T-SQL query uses JSON functions to interact with the string fields, while the Spark SQL query can natively handle the complex types, as in the previous query. The joins themselves, though, use the id property directly, since we know that in this case only FHIR ids are being used. By using the id property, we do not need to parse a string representation like “Patient/<id>” to do the join. Overall, we’ve shown how either Spark SQL or T-SQL can be used to query the same set of silver data, and how key harmonization helps when writing SQL-based queries. We welcome your questions and feedback in the comments section at the end of this post!

Helpful links

To start building your own queries, explore these helpful resources:

- Healthcare data solutions in Microsoft Fabric
- FHIR References
- T-SQL JSON functions
- T-SQL surface area in Fabric

FHIR® is the registered trademark of HL7 and is used with permission of HL7.

A scalable and efficient approach for ingesting medical imaging data using DICOM data transformation
The transformation of Digital Imaging and Communications in Medicine (DICOM®) data is a crucial capability in healthcare data solutions (HDS). This feature allows healthcare providers to bring their DICOM® data into Fabric OneLake, enabling the ingestion, storage, and analysis of imaging metadata from various modalities such as X-rays, Computerized Tomography (CT) scans, and Magnetic Resonance Imaging (MRI) scans. By leveraging this capability, healthcare organizations can enhance their digital infrastructure, improve patient care, and streamline their data management processes.

Ingestion patterns

The capability provides various ingestion mechanisms for processing medical images based on the use case. For processing small datasets, comprising no more than 10 million medical images at once, customers can either upload their data to Fabric's OneLake or connect an external storage account to Fabric. Let’s look at the rationale behind the 10 million image limit. Both of the ingestion options above set up Spark structured streaming on the input DICOM® files. During file listing, one of the steps before the actual processing starts, Spark gathers metadata about the input files, such as file paths and the timestamps associated with the files. When dealing with millions of files, the file listing process itself is split across multiple executor nodes. Once the metadata is fetched by the executor nodes, it is collected back at the driver node to build the execution plan. The memory allocated for storing file listing results is controlled by the Spark property spark.driver.maxResultSize. The default value for this configuration can vary by platform; Spark in Microsoft Fabric defaults it to 4 GB. Users can estimate the size of the results collected at the driver for an input dataset by understanding the input folder structure for file paths and keeping some buffer for overhead.
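That estimate can be sketched with back-of-the-envelope arithmetic in plain Python. The per-file overhead below is an assumed allowance for timestamps, sizes, and object overhead on top of the path string itself, not an official figure; tune it for your own folder structure:

```python
def estimate_listing_gb(num_files, avg_path_len_bytes, per_file_overhead_bytes=150):
    """Rough estimate of the file-listing result collected at the driver.

    per_file_overhead_bytes is an assumption covering per-file metadata
    beyond the path string; adjust it for your own dataset.
    """
    total_bytes = num_files * (avg_path_len_bytes + per_file_overhead_bytes)
    return total_bytes / (1024 ** 3)

# Under these assumptions, 10 million files with ~200-byte paths stay
# within a 4 GB spark.driver.maxResultSize.
estimate = estimate_listing_gb(10_000_000, 200)
print(f"{estimate:.2f} GB")  # 3.26 GB
```

Note that spark.driver.maxResultSize is typically set when the Spark session is configured (for example, through Spark properties in a Fabric environment) rather than changed at runtime.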
They need to make sure the file listing result is no larger than the allocated space (4 GB) to avoid out-of-memory (OOM) issues. Based on our experiments, a limit of 10 million input files gives a reliable and successful run within that memory. Now, spark.driver.maxResultSize is a Spark configuration, and it can be set to a higher value to increase the space allocated for collecting file listing results. In the Spark architecture, driver memory is split into multiple portions for various purposes, so users need to be careful when increasing this property, as it can interfere with other functions of the driver. Refer to the table below for more details on tuning the property appropriately in Microsoft Fabric.

Note: The reference table below illustrates how various node sizes and configurations impact the file listing capacity and result sizes. It presents rough estimates based on a test dataset deployed in the default unified folder structure in HDS. These values can serve as an initial reference but may vary depending on specific dataset characteristics; users should not expect identical numbers in their own datasets or use cases.

| Node size | Available memory per node (GB) | Driver node vCores | Executor node vCores | driver.maxResultSize (GB) | File paths size (GB) | Files listed (millions) |
|-----------|-------------------------------|--------------------|----------------------|---------------------------|----------------------|-------------------------|
| Medium    | 56                            | 8                  | 8                    | 4                         | 3.38                 | 10.8                    |
| Large     | 112                           | 8                  | 16                   | 8                         | 6.75                 | 21.6                    |
| XL        | 224                           | 8                  | 32                   | 16                        | 12.81                | 41                      |
| XXL       | 400                           | 16                 | 64                   | 24                        | 15.94                | 51                      |

Emergence of inventory-based ingestion

Microsoft Fabric provides a variety of compute nodes which can be utilized for different use cases. The highest node configuration is XX-Large, with 512 GB memory and 64 vCores. Even with such a node configuration, driver.maxResultSize can only be increased up to a certain limit, which restricts the dataset size that can be ingested in a single run.
One way to tackle this problem is to segregate the entire dataset into smaller chunks, so that the file listing result for a single chunk stays within the limits of the allocated memory. This is exactly the purpose of the unified folder structure in HDS, where data is segmented by default into date folders. However, it might not always be feasible to make such changes at the data source. This is where HDS inventory-based ingestion comes into play, enabling the scalable ingestion of DICOM® imaging files into Fabric. Inventory-based ingestion is built on the idea of separating the file listing step from the core processing logic. Given the file metadata, also known as inventory files, which is analogous to a file listing result, users don’t need to set up Spark streaming on the folder containing DICOM® files directly; instead, they can consume the metadata from the inventory files and initiate the core processing logic. This way we avoid the OOM issues arising from file listing. If your data resides in an Azure Data Lake Storage Gen2 account, there is an out-of-the-box service called Azure Storage blob inventory that generates inventory files in parquet format. Inventory-based ingestion supports other storage systems as well, but users then need to provide the inventory files in the required format and follow some minimal configuration changes.

Capability configurations

This capability includes several configuration levers, set by updating the deployment parameters configuration, to tune the ingestion process for better throughput.

max_files_per_trigger – an interface for using maxFilesPerTrigger in Spark structured streaming. It defaults to 100K. For inventory-based ingestion, it is advisable to lower this number to 1 or 2, based on the number of records contained in each parquet file.

max_bytes_per_trigger – an interface for using maxBytesPerTrigger in Spark structured streaming.
This option doesn’t apply to all input file sources; it works with parquet files as the source and thus becomes relevant when using inventory-based ingestion. It defaults to 1 GB.

rows_per_partition – this option is specifically designed for inventory-based ingestion, where the default number of partitions might not fully utilize the available resources. In a given execution batch, the number of input files is divided by this number to repartition the dataframe. The default value is 250. For example, if the current batch size is 10 million files, it would create 10,000,000 / 250 = 40,000 partitions, which translates to 40,000 Spark tasks.

DICOM® is the registered trademark of the National Electrical Manufacturers Association (NEMA) for its Standards publications relating to digital communications of medical information.

Medical device disclaimer: Microsoft products and services (1) are not designed, intended or made available as a medical device, and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment. Customers/partners are responsible for ensuring solutions comply with applicable laws and regulations.

Building Healthcare Research Data Platform using Microsoft Fabric
Co-Authors: Manoj Kumar, Mustafa Al-Durra PhD, Kemal Kepenek, Matt Dearing, Praneeth Sanapathi, Naveen Valluri

Overview

Research data platforms in healthcare providers, academic medical centers (AMCs), and research institutes support research, clinical decision making, and innovation. They consolidate data from various sources, making it accessible for comprehensive analysis and fostering collaboration among research teams. These platforms automate data collection, processing, and delivery, reducing the time and effort needed for data management. This allows researchers to focus on their core activities while ensuring data security and regulatory compliance. The ability to work with multimodal data encourages interdisciplinary and interorganizational collaboration, uniting experts to address complex healthcare challenges.

Current challenges

Researchers face many common challenges as they work with multimodal healthcare data:

Data integration and curation: Integrating various data types, such as clinical notes, imaging data, genomic information, and sensor data, presents significant challenges due to differences in formats, standards, and sources. Each AMC employs unique methods for data curation, with some utilizing on-premises solutions and others adopting hybrid cloud systems. No standardized approach currently exists for data curation, necessitating considerable organizational effort to ensure data consistency and quality. Furthermore, data deidentification is often required to safeguard patient privacy.

Data discovery and building cohorts: The lack of a unified multimodal data platform leads to the segregation of data across different modalities. Cohort discovery for each modality is performed separately and often lacks a self-service option, necessitating additional human resources to assist researchers in the data discovery process.
This issue is particularly significant because researchers who require Institutional Review Board (IRB) approval cannot access the data beforehand but still need an effective method to identify and explore cohorts.

Data delivery: Sensitive patient data, after IRB approval, must be handled in compliance with privacy regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), requiring secure transfer to prevent breaches. The data, sourced from various systems, needs processing to become research ready. Delivering unified data from modalities like imaging, genomics, and health records is challenging. Typically, research IT teams curate cohort data and deliver it to a SQL database or a file share, which researchers access via secure virtual machines. This method often leads to data duplication, creating significant overhead given the number of ongoing research projects.

Cost management: Research projects are funded by government grants and private organizations, and managing these costs is challenging. Research IT departments often implement chargebacks for transparency and accountability in resource use. However, there is a disconnect between funding models and operations. Research teams favor capital expenditure (CapEx), with upfront funding for long-term resources, while cloud platforms operate on operational expenditure (OpEx), incurring ongoing costs based on usage. This shift can lead to concerns about unpredictable costs and budgeting difficulties. Bridging this gap requires careful planning, communication, and hybrid financial strategies to align research needs with cloud-based systems.

Compliance with regulations: Healthcare research uses sensitive patient data, requiring strict adherence to HIPAA and GDPR. Transparency in data handling is essential but complex. Researchers must document disclosures thoroughly, detailing who accessed the data and for what purpose.
However, tracking and auditing are often fragmented due to inconsistent systems. Variability in disclosure requirements from different agencies adds to compliance challenges. Balancing an auditable trail with privacy and manageable administrative tasks is crucial.

Research data platform requirements

- Ability to curate multimodal data into the research data platform
- Ability for researchers to identify cohorts (without seeing data) to submit data requests to the IRB
- Automated data delivery after the IRB workflow approves the request to access relevant data
- Tools for researchers as part of the same platform
- Secure and regulatory-compliant environment for research

An approach to building a research data platform using Microsoft Fabric

This article serves as a guide for healthcare organizations, offering a point of view and prescriptive guidance on building a research data platform using Microsoft Fabric. The solution uses several features from healthcare data solutions in Microsoft Fabric, including its discover and build cohorts capability, and features from the Fabric platform.

Microsoft Fabric: A unified, AI-powered data platform designed to simplify data management and analytics. It integrates various tools and services to handle every stage of the data lifecycle, including ingestion, preparation, storage, analysis, and visualization. Fabric is built on a Software as a Service (SaaS) foundation, offering a seamless experience for organizations to make data-driven decisions. For additional details, refer to the following link: What is Microsoft Fabric - Microsoft Fabric | Microsoft Learn

Healthcare data solutions in Fabric: Healthcare data solutions in Fabric help you accelerate time to value by addressing the critical need to efficiently transform healthcare data into a suitable format for analysis. With these solutions, you can conduct exploratory analysis, run large-scale analytics, and power generative AI with your healthcare data.
By using intuitive tools such as data pipelines and transformations, you can easily navigate and process complex datasets, overcoming the inherent challenges associated with unstructured data formats. For additional details, refer to the following link: Healthcare data solutions in Microsoft Fabric - Microsoft Cloud for Healthcare | Microsoft Learn

Discover and build cohorts: The discover and build cohorts (preview) capability in healthcare data solutions enables healthcare organizations to efficiently analyze and query healthcare data from multiple sources and formats. It simplifies the preparation of data for health trend studies, clinical trials, quality assessments, historical research, and AI development. It supports natural language queries for multimodal data exploration and cohort building, making it ideal for research and AI-driven projects. For additional details, refer to the following link: Overview of discover and build cohorts (preview) - Microsoft Cloud for Healthcare | Microsoft Learn

The proposed research data platform architecture builds upon the following foundational premises:

- Recognition of Fabric as the all-in-one data storage, processing, management, and analytics platform with enterprise-level features for security, availability, and self-service.
- Adoption of Fabric workspaces as the security boundary (a secure logical container) for maintaining data platform items (data storage and processing assets). Fabric workspaces may be provisioned for and used by different research data platform stakeholders (groups of users) with different requirements around use cases, data privacy, data sensitivity, and access security.
- Use of healthcare data solutions in Fabric as the core capability to maintain healthcare data assets in a standard (interoperable) manner.
Healthcare data solutions enable the storage and processing of several healthcare data modalities and formats that follow industry standards (for example, the clinical modality’s FHIR® NDJSON format and the clinical imaging modality’s DICOM® format). Industry standards make it easier for research data platform stakeholders to share (exchange) data and insights within their own organization as well as, when needed, with other organizations they collaborate with.

- Use of Fabric native capabilities to address requirements that may not (yet) have been implemented for healthcare-specific needs. This provides research data platform stakeholders with the flexibility to develop various data storage and processing workloads easily in a low-code (or no-code) manner.

Fig – Conceptual architecture of research data platform in Microsoft Fabric

Note: This diagram is an architectural pattern and does not constitute a one-to-one mapping of existing Microsoft products.

Organizing source data in a data workspace (One Data Hub in the above diagram)

Organize your enterprise data into a data workspace that can be leveraged for research purposes. This acts as a ‘One Data Hub’ for the research data platform. Multiple Lakehouses can be present in this workspace; at least one Lakehouse should organize data using the ‘unified folder structure’ best practice.

Convert data from non-supported formats to healthcare data solutions supported formats to leverage out-of-the-box transformation for multimodal data:

- For healthcare data solutions supported modalities: implement custom transformations to convert data to supported modalities/formats.
- For unsupported modalities: implement extensions to the bronze Lakehouse to accommodate additional data modalities.

Epic data availability: Epic supports FHIR data export using Bulk FHIR APIs. If your dataset meets the use cases of Epic Bulk Data, you can store the resulting FHIR resources in One Data Hub for further transformation.
Avoid data content duplication: Data duplication cannot be totally avoided; however, the same file and the same content are never duplicated. There will be situations when data needs to be transformed to suit the needs of existing transformation pipelines, accelerating research data platform development. Additionally, OneLake in Fabric, where the Lakehouses are maintained, uses file compression. Healthcare data solutions in Fabric can compress raw files to zip and always write structured data to delta parquet, a highly compressed format. More information can be found here: Data architecture and management in healthcare data solutions - Microsoft Cloud for Healthcare | Microsoft Learn

Curating data for research (One Analytics workspace in the above diagram)

Implement and extend the Silver Lakehouse: A flattened FHIR® data model is provided by healthcare data solutions out of the box within the Silver Lakehouse. The existing data model can be extended by adding new columns to existing tables or by adding new tables in the Silver Lakehouse. If there is a need to introduce a different data model altogether, it is best to implement it in a separate Lakehouse.

Implement and extend the Gold Lakehouse:

- Deploy and extend the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM): Deploy OMOP CDM 5.4 out of the box with the healthcare data solutions deployment. Extend OMOP CDM to accommodate additional modalities. For example, implement gene sequencing, variant occurrence, and variant annotation tables to add the genomics modality to OMOP CDM, or implement medical imaging data on OMOP CDM as described here: Development of Medical Imaging Data Standardization for Imaging-Based Observational Research: OMOP Common Data Model Extension - PubMed
- Implement custom Gold Lakehouse(s): Implement other custom Gold Lakehouses using Fabric tools that run your transformation logic from Silver to Gold.
These Lakehouses cannot be connected to the discover and build cohorts capability within healthcare data solutions. Customers that need access to custom data can connect their custom cohort browsers to the SQL analytics endpoint(s) of their custom Gold Lakehouse(s).

Enable data de-identification: Microsoft provides several solutions that can be combined into the comprehensive de-identification solution customers expect. Refer to the articles below for details.

- Dynamic data masking in Fabric Data Warehouse - Microsoft Fabric | Microsoft Learn
- Row-level security in Fabric data warehousing - Microsoft Fabric | Microsoft Learn
- Column-level security in Fabric data warehousing - Microsoft Fabric | Microsoft Learn
- Announcing a de-identification service for Health and Life Sciences | Microsoft Community Hub

Cohort discovery using a cohort builder tool

Microsoft’s cohort browser: Today, discover and build cohorts supports eyes-on cohort discovery. This is an out-of-the-box solution that is part of healthcare data solutions in Fabric. When eyes-off discovery is supported, researchers as well as research IT can benefit from both eyes-off and eyes-on discovery and cohort building.

3rd-party cohort browser (e.g., OHDSI Atlas): Most 3rd-party cohort browsers (e.g., OHDSI Atlas) and home-grown cohort browsers support connection to a SQL endpoint. The Microsoft Fabric platform can expose a SQL endpoint from a Lakehouse, which can be connected to a 3rd-party cohort browser to perform cohort discovery.

Automated data delivery

Creating research workspaces with the cohort needed for research: Create separate workspaces for different research projects to keep Fabric items distinct and project specific, using the Fabric APIs.

Assign workspaces to a Fabric capacity: Note: When needed, and if the organization has more than one Fabric capacity provisioned, workspace assignment can be spread across different capacities to help manage cost and performance.
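As a sketch of what automating the workspace-per-project step might look like, the snippet below only builds the REST requests for creating a research workspace and assigning it to a capacity; it does not send them. The endpoint paths and body fields reflect the Fabric REST API surface but should be treated as assumptions to verify against the current API reference, and the project name and capacity id are hypothetical:

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_workspace_requests(project_name, capacity_id):
    """Build (method, url, body) tuples for provisioning a per-project
    research workspace and assigning it to a Fabric capacity.
    Actually sending them requires a Microsoft Entra ID bearer token."""
    create = ("POST", f"{FABRIC_API}/workspaces",
              {"displayName": project_name})
    # The workspace id comes back from the create call; '{workspaceId}'
    # is a placeholder to fill in before sending the second request.
    assign = ("POST",
              f"{FABRIC_API}/workspaces/{{workspaceId}}/assignToCapacity",
              {"capacityId": capacity_id})
    return [create, assign]

for method, url, body in build_workspace_requests("IRB-2024-017", "cap-123"):
    print(method, url, json.dumps(body))
```

Workspace access for the IRB-approved team members can then be granted in a similar fashion through the workspace role-assignment APIs.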
Next, set up a Lakehouse and provide access for team members (as per the IRB approval list). This ensures both access and security at the workspace level.

Export data to the research workspace (in the format desired by researchers): Currently, discover and build cohorts exports data as CSV/JSON files stored in a Lakehouse within the same workspace. Shortcut the destination Lakehouse into the research workspace to preserve the integrity of the cohort data.

Tools for researchers: Fabric provides several data engineering and data science tools out of the box that researchers can leverage. The following documents can help customers enable researchers with the tools of their choice:

- Data science in Microsoft Fabric - Microsoft Fabric | Microsoft Learn
- Create, configure, and use an environment in Fabric - Microsoft Fabric | Microsoft Learn
- Migrate libraries and properties to a default environment - Microsoft Fabric | Microsoft Learn

Chargeback: Fabric compute pricing depends on the chosen Fabric capacity SKU. Assigning different Fabric capacities to different projects or groups within the same cost center can facilitate chargeback. See the step mentioned above on assigning a workspace to a Fabric capacity during workspace creation.

Manage historic data migration to the research data platform on Fabric

In most instances, customers already possess a research data platform and seek to transition to this proposed solution without disrupting their current research data flow and obligations. Follow this approach to migrate or use data from the existing platform in the new one: Use your current research data platform as a Lakehouse or a data warehouse in Fabric (taking advantage of the shortcut and mirroring features available in Fabric). Fabric offers cross-database querying, allowing you to query and join multiple Lakehouses and data warehouses in a single query.
Customers can choose how and where to implement such queries to augment the healthcare data solutions datasets with their existing datasets, all natively in Fabric. A bridge/mapping layer can be built to link the old and the new in a cross-relational way. Conceptually, this approach also ties to the Bring Your Own Database (BYO-DB) requirement: the ability to bring a custom-defined format and still be able to easily convert it to the healthcare data solutions format.

Other workflow integration

Integrate the research data platform with the IRB workflow: IRB workflows depend on the tools utilized, for instance, the eIRB solution from Huron. While there is currently no direct integration between IRB workflows and the research data platform on Fabric, it is possible to develop a connector using Power Platform integration with Fabric. Specific details are not available at this time, as this remains an exploratory initiative. Another approach is to use the Fabric REST APIs (as a pro-code method), which can enable richer integration between Fabric and the 3rd-party system, and a better user experience for consumers.

Capture logs necessary for “accounting of disclosures”: Logs in Fabric can be captured at the event level. It is up to the customer to decide the level and type of logs that need to be captured for accounting of disclosures; this will need some custom implementation. One Fabric capability that can be used is: Track user activities in Microsoft Fabric - Microsoft Fabric | Microsoft Learn

FHIR® is a registered trademark of Health Level Seven International, registered in the U.S. Trademark Office and is used with their permission.

DICOM® is the registered trademark of the National Electrical Manufacturers Association (NEMA) for its Standards publications relating to digital communications of medical information.
If you are a Microsoft customer needing further information, support, or guidance related to the content in this blog, we recommend you reach out to your Microsoft account team to set up a discussion with the authors.

Healthcare data solutions in Microsoft Fabric ALM support
Healthcare data solutions in Microsoft Fabric has released support for Application Lifecycle Management (ALM). This new capability provides common tooling for your team to manage and deploy their work as part of enterprise development and release processes.

General Availability - Medical imaging DICOM® in healthcare data solutions in Microsoft Fabric
As part of healthcare data solutions in Microsoft Fabric, the DICOM® (Digital Imaging and Communications in Medicine) data transformation is now generally available. Our healthcare and life sciences customers and partners can now ingest, store, transform, and analyze DICOM® imaging datasets from various modalities, such as X-rays, CT scans, and MRIs, directly within Microsoft Fabric. This is made possible by a purpose-built data pipeline built on top of the medallion Lakehouse architecture. The imaging data transformation capabilities enable seamless transformation of DICOM® (imaging) data into tabular formats that persist in the lake in FHIR® (Fast Healthcare Interoperability Resources) (Silver) and OMOP (Observational Medical Outcomes Partnership) (Gold) formats, facilitating exploratory analysis as well as large-scale imaging analytics and radiomics.

Establishing a true multi-modal biomedical Lakehouse in Microsoft Fabric

Along with other capabilities in healthcare data solutions in Microsoft Fabric, the DICOM® data transformation empowers clinicians and researchers to interpret imaging findings in the appropriate clinical context by making imaging pixel data and metadata available alongside the clinical history and laboratory data. By integrating DICOM® pixels and metadata with clinical history and laboratory data, our customers and partners can achieve more with their multi-modal biomedical data estate, including:

Unify your medical imaging and clinical data estate for analytics

Establish a regulated hub to centralize and organize all your multi-modal healthcare data, creating a foundation for predictive and clinical analytics. Built natively on well-established industry data models, including DICOM®, FHIR®, and OMOP.

Build fit-for-purpose analytics models

Start constructing ML and AI models on a connected foundation of EHR and pixel data.
Enable researchers, data scientists, and health informaticians to perform analysis on large volumes of multi-modal datasets to achieve higher accuracy in diagnosis and prognosis and improved patient outcomes.¹

Advance research, collaboration, and sharing of de-identified imaging

Build longitudinal views of patients’ clinical history and related imaging studies, with the ability to apply complex queries to identify patient cohorts for research and collaboration. Apply text and imaging de-identification to enable in-place sharing of research datasets with role-based access control.

Reduce the cost of archival storage and recovery

Take advantage of cost-effective, HIPAA-compliant, and reliable cloud-based storage to back up your medical imaging data from the redundant storage of on-premises PACS and VNA systems. Improve your security posture with 100% off-site cloud archival of your imaging datasets in case of unplanned data loss.

Employ AI models to recognize pixel-level markers and patterns

Deploy existing precision AI models such as Microsoft’s Project InnerEye and NVIDIA’s MONAI to enable automated segmentation of 3D radiology imaging, which can help expedite the planning of radiotherapy treatments and reduce waiting times for oncology patients.

Conceptual architecture

The DICOM® data transformation capabilities in Microsoft Fabric continue to offer our customers and partners the flexibility to choose the ingestion pattern that best meets their existing data volume and storage needs. At a high level, there are three patterns for ingesting DICOM® data into healthcare data solutions in Microsoft Fabric. Depending on the chosen ingestion pattern, there are up to eight end-to-end execution steps to consider, from the ingestion of the raw DICOM® files to the transformation of the Gold Lakehouse into the OMOP CDM format, as depicted in the conceptual architecture diagram below.
To review the eight end-to-end execution steps, please refer to the public preview announcement of DICOM® data ingestion in Microsoft Fabric.

Conceptual architecture and ingestion patterns of the DICOM® data ingestion capability in Microsoft Fabric

You can find more details about each of those three ingestion patterns in our public documentation: Use DICOM® data ingestion - Microsoft Cloud for Healthcare | Microsoft Learn

Enhancements in the DICOM® data transformation in Microsoft Fabric

We received great feedback from our public preview customers and partners. This feedback provided an objective signal for our product group to improve and iterate on features and the product roadmap, making the DICOM® data transformation capabilities more capable and intuitive. As a result, several new features and improvements in DICOM® data transformation are now generally available, as described in the following sections.

All DICOM® Metadata (Tags) are now accessible in the Silver Lakehouse

We acknowledge the importance and practicality of making all DICOM® metadata, i.e. tags, available in the Silver Lakehouse, closer to the clinical and ImagingStudy FHIR® resources. This makes it easier to explore any existing DICOM® tags from within the Silver Lakehouse. It also helps position the DICOM® staging table in the bronze Lakehouse (ImagingDICOM) as a transient store: after the DICOM® metadata is processed and transformed from the bronze Lakehouse to the Silver Lakehouse, the data in the bronze staging table can be considered ready to be purged. This ensures cost and storage efficiency and reduces data redundancy between source files and staging tables in the bronze Lakehouse.

Unified Folder Structure

OneLake in Microsoft Fabric offers a logical data lake for your organization. Healthcare data solutions in Microsoft Fabric provide a unified folder structure that helps organize data across various modalities and formats.
This structure streamlines data ingestion and processing while maintaining data lineage at the source-file and source-system levels in the Bronze Lakehouse. A complete set of unified folders, including the Imaging modality and DICOM® format, is now deployed as part of the healthcare data foundations deployment experience in the healthcare data solutions in Microsoft Fabric.

Purpose-built DICOM® data transformation pipeline
Healthcare data foundations offer ready-to-run data pipelines that are designed to efficiently structure data for analytics and AI/machine learning modeling. We introduce an imaging data pipeline to streamline the end-to-end execution of all activities in the DICOM® data transformation capabilities. The DICOM® data transformation in the imaging data pipeline consists of the following stages:

1. The pipeline ingests and persists the raw DICOM® imaging files, present in the native DCM format, in the Bronze Lakehouse.
2. It then extracts the DICOM® metadata (tags) from the imaging files and inserts it into the ImagingDICOM table in the Bronze Lakehouse.
3. The data in the ImagingDICOM table is converted to FHIR® ImagingStudy NDJSON files, stored in OneLake.
4. The data in the ImagingStudy NDJSON files is transformed to relational FHIR® format and ingested into the ImagingStudy delta table in the Silver Lakehouse.

Compression-by-design
Healthcare data solutions in Microsoft Fabric support compression-by-design across the medallion Lakehouse design. Data ingested into the delta tables across the medallion Lakehouse is stored in a compressed, columnar format using Parquet files. In the ingest pattern, when the files move from the Ingest folder to the Process folder, they are compressed by default after successful processing. You can configure or disable the compression as needed. The imaging data transformation pipeline can also process the DICOM® files in a raw format, i.e. DCM files, and/or in a compressed format, i.e.
ZIP archives of DCM files or folders.

Global configuration
The admin Lakehouse was introduced in this release to manage cross-Lakehouse configuration, global configuration, status reporting, and tracking for healthcare data solutions in Microsoft Fabric. The admin Lakehouse system-configurations folder centralizes the global configuration parameters. The three configuration files contain preconfigured values for the default deployment of all healthcare data solutions capabilities. You can use the global configuration to repoint the data ingestion pipeline to any source folder other than the unified folder configured by default. You can also configure any of the input parameters for each activity in the imaging data transformation pipeline.

Sample Data
In this release, more comprehensive sample data is provided to help you run the data pipelines in DICOM® data transformation end to end and explore the data processing at each step through the medallion Lakehouse layers: Bronze, Silver, and Gold. The imaging sample data may not be clinically meaningful, but it is technically complete and comprehensive enough to demonstrate the full DICOM® data transformation capabilities 2. In total, the sample data for DICOM® data transformation contains 340 DICOM® studies, 389 series, and 7,739 instances. One of those DCM files is an invalid DICOM® file, intentionally provided to showcase how the pipeline manages files that do not conform to the DICOM® format. The sample DICOM® studies are related to 302 patients, and those patients are also included in the sample data for the clinical ingestion pipeline. Thus, when you ingest the sample data for the DICOM® data transformation and clinical data ingestion, you will have a complete view that depicts how the clinical and imaging data would appear in a real-world scenario.
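The bronze-to-silver conversion described in the pipeline stages above turns extracted DICOM® metadata into FHIR® ImagingStudy resources serialized as NDJSON (one resource per line). The following minimal sketch shows the general shape of that conversion; the input row layout and the field subset are illustrative assumptions, not the pipeline's actual schema:

```python
import json

def to_imaging_study(row: dict) -> dict:
    """Build a minimal FHIR ImagingStudy resource from extracted DICOM metadata.
    The input row layout is an assumption for illustration."""
    return {
        "resourceType": "ImagingStudy",
        "id": row["study_uid"],
        "status": "available",
        "subject": {"reference": f"Patient/{row['patient_id']}"},
        "series": [
            {
                "uid": s["series_uid"],
                "modality": {
                    "system": "http://dicom.nema.org/resources/ontology/DCM",
                    "code": s["modality"],
                },
                "numberOfInstances": s["instances"],
            }
            for s in row["series"]
        ],
    }

def to_ndjson(rows) -> str:
    """Serialize one resource per line (NDJSON), as the pipeline stages describe."""
    return "\n".join(json.dumps(to_imaging_study(r)) for r in rows)

study = {
    "study_uid": "1.2.840.113619.2.55.3",
    "patient_id": "P-302",
    "series": [{"series_uid": "1.2.840.113619.2.55.3.1",
                "modality": "CT", "instances": 120}],
}
resource = to_imaging_study(study)
```

The actual capability persists these NDJSON files in OneLake before flattening them into the ImagingStudy delta table in the Silver Lakehouse.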
Enhanced data lineage and traceability
All delta tables in the Healthcare Data Model in the Silver Lakehouse now have the following columns to ensure lineage and traceability at the record and file level:

msftCreatedDatetime: the datetime at which the record was first created in the respective delta table in the Silver Lakehouse.
msftModifiedDatetime: the datetime at which the record was last modified in the respective delta table in the Silver Lakehouse.
msftFilePath: the full path to the source file in the Bronze Lakehouse (including shortcut folders).
msftSourceSystem: the source system of this record. It corresponds to the [Namespace] that was specified in the unified folder structure.

To ensure that lineage and traceability extend to the entire medallion Lakehouse, the following columns are added to the OMOP delta tables in the Gold Lakehouse:

msftSourceRecordId: the original record identifier from the respective source delta table in the Silver Lakehouse. This is important because OMOP records have newly generated IDs. More details are provided here.
msftSourceTableName: the name of the source delta table in the Silver Lakehouse. Due to the specifics of FHIR®-to-OMOP mappings, there are cases where several OMOP tables in the Gold Lakehouse are sourced from a single FHIR® table in the Silver Lakehouse; for example, the OBSERVATION and MEASUREMENT OMOP delta tables in the Gold Lakehouse are both sourced from the Observation FHIR® delta table in the Silver Lakehouse. There is also the case where a single delta table in the Gold Lakehouse may be sourced from many delta tables in the Silver Lakehouse, such as the LOCATION OMOP table, which could be sourced from either the Patient or the Organization FHIR® table.
msftModifiedDatetime: the datetime at which the record was last modified in the respective delta table in the Silver Lakehouse.
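To make the Gold-layer lineage columns concrete, the sketch below routes a Silver Observation record to either the OBSERVATION or MEASUREMENT OMOP table and stamps the lineage columns described above. The category-based routing rule and the input layout are simplifying assumptions for illustration; the real FHIR®-to-OMOP mapping is considerably more nuanced:

```python
from datetime import datetime, timezone

# Simplified routing rule (an assumption): lab results and vital signs go to
# MEASUREMENT, everything else to OBSERVATION.
MEASUREMENT_CATEGORIES = {"laboratory", "vital-signs"}

def route_observation(fhir_row: dict) -> dict:
    """Route a Silver-layer Observation record to an OMOP target table and
    stamp the lineage columns. The input layout is an assumption."""
    target = ("MEASUREMENT" if fhir_row["category"] in MEASUREMENT_CATEGORIES
              else "OBSERVATION")
    return {
        "targetTable": target,
        "msftSourceRecordId": fhir_row["id"],   # original Silver record id
        "msftSourceTableName": "Observation",   # source delta table name
        "msftModifiedDatetime": datetime.now(timezone.utc).isoformat(),
    }
```

With these columns stamped on every Gold record, a newly generated OMOP ID can always be traced back to its Silver source row and table.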
In summary, this article provides comprehensive details on how the DICOM® data transformation capabilities in the healthcare data solutions in Microsoft Fabric offer a robust, all-encompassing solution for unifying and analyzing medical imaging data in harmony with the clinical dataset. We also listed major enhancements to these capabilities that are now generally available for all our healthcare and life sciences customers and partners. For more details, please refer to our public documentation: Overview of DICOM® data ingestion - Microsoft Cloud for Healthcare | Microsoft Learn

1 S. Kevin Zhou, Hayit Greenspan, Christos Davatzikos, James S. Duncan, Bram van Ginneken, Anant Madabhushi, Jerry L. Prince, Daniel Rueckert, and Ronald M. Summers, "A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises," arXiv:2008.09104.

2 Microsoft provides the Sample Data in the Healthcare data solutions in Microsoft Fabric on an "as is" basis. This data is provided to test and demonstrate the end-to-end execution of data pipelines provided within the Healthcare data solutions in Microsoft Fabric. This data is not intended or designed to train real-world or production-level AI/ML models, or to develop any clinical decision support systems. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental, or punitive, resulting from your use of this data.
The Sample Data in the Healthcare data solutions in Microsoft Fabric is provided under the Community Data License Agreement – Permissive – Version 2.0.

DICOM® is the registered trademark of the National Electrical Manufacturers Association (NEMA) for its Standards publications relating to digital communications of medical information. FHIR® is a registered trademark of Health Level Seven International, registered in the U.S. Trademark Office, and is used with their permission.

Driving Better Patient Outcomes with Care Management Analytics in Healthcare data solutions
In today's rapidly evolving healthcare landscape, effective data-driven decision-making is more crucial than ever. The ability to analyze, manage, and optimize patient care processes relies on the seamless integration of diverse data sources such as clinical data, claims data, and social determinants of health. Leveraging the innovative medallion Lakehouse architecture, the care management analytics template capabilities provide a robust platform for organizations to derive actionable insights and drive better patient outcomes.

The Medallion Lakehouse for Care Management analytics
Care management analytics is built on the foundation of the healthcare data solutions in Microsoft Fabric, which utilize the medallion Lakehouse architecture. This architecture consists of three foundational layers, each playing a critical role in transforming raw data into actionable insights.

Bronze: The Raw Zone
The Bronze layer serves as the raw data zone, storing all data in its original format. This data includes various sources such as patient encounters, conditions, treatment adherence records, and other relevant care management information. By maintaining this data in its raw form, organizations ensure the integrity and completeness of the dataset, providing a solid foundation for subsequent processing and analysis.

Silver: The Enriched Zone
In the Silver layer, data from the Bronze Lakehouse is enriched and transformed into a standardized format for analysis. This layer stores metadata and file references based on healthcare interoperability standards such as FHIR® (Fast Healthcare Interoperability Resources). The enriched data provides a holistic view of the patient record, integrating the different modalities in healthcare data solutions that are critical for comprehensive care analysis.

Gold: The Curated Zone
The Gold layer represents the curated zone, where data is refined and structured for advanced analytics and reporting.
By building a comprehensive data model, the data is optimized for predictive analytics and for reporting dashboards that can provide deep insights into care quality, patient outcomes, and operational efficiency.

Conceptual Architecture
Care management analytics involves integrating and analyzing diverse datasets, including clinical, claims, and social determinants of health data. The medallion Lakehouse architecture in Microsoft Fabric offers the flexibility to ingest and process these data types at scale. The data flows from raw data ingestion to transformation into the Gold Lakehouse format.

End-to-end execution steps
Step 1: Create a workspace and add the healthcare data solutions capability.
Step 2: Set up healthcare data solutions in your Fabric workspace. Follow the guidance from the deployment wizard and add sample data if needed.
Step 3: Select the Care Management analytics capability and click Deploy.
Step 4: Copy the downloaded sample data into the bronze lakehouse under the Process\Clinical\FHIR-HDS folder.
Step 5: Run the care management analytics data pipeline to transform the data from the bronze lakehouse to the gold lakehouse.
Step 6: Once the above steps are completed, access the Power BI dashboards to view detailed visualizations of the clinical and claims data.

Transforming Care Management analytics with healthcare data solutions
The healthcare data solutions care management analytics capability provides a comprehensive template solution for customers and partners to unify and analyze diverse data. By leveraging the medallion Lakehouse architecture, healthcare organizations can unlock the potential of their data, enhance care coordination, and drive better patient outcomes. The seamless integration of raw, enriched, and curated data layers ensures that insights are not only actionable but also scalable and sustainable.
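To make the Bronze-to-Silver-to-Gold flow concrete, here is a minimal pure-Python sketch: raw FHIR® Encounter resources (bronze) are flattened into tabular rows (silver) and then curated into a simple utilization measure (gold). The field subset and the measure are illustrative assumptions, not the template's actual data model:

```python
import json
from collections import Counter

def bronze_to_silver(ndjson_lines):
    """Flatten raw FHIR Encounter resources (bronze) into tabular rows (silver).
    The field subset is illustrative."""
    rows = []
    for line in ndjson_lines:
        r = json.loads(line)
        rows.append({
            "encounter_id": r["id"],
            "patient_id": r["subject"]["reference"].split("/")[-1],
            "class": r.get("class", {}).get("code"),
        })
    return rows

def silver_to_gold(rows):
    """Curate a simple utilization measure (gold): encounters per patient."""
    return dict(Counter(row["patient_id"] for row in rows))

bronze = [
    json.dumps({"resourceType": "Encounter", "id": "e1",
                "subject": {"reference": "Patient/P1"}, "class": {"code": "AMB"}}),
    json.dumps({"resourceType": "Encounter", "id": "e2",
                "subject": {"reference": "Patient/P1"}, "class": {"code": "IMP"}}),
    json.dumps({"resourceType": "Encounter", "id": "e3",
                "subject": {"reference": "Patient/P2"}, "class": {"code": "AMB"}}),
]
silver = bronze_to_silver(bronze)
gold = silver_to_gold(silver)
```

In the actual capability these steps run as Spark notebooks over delta tables, but the layered shape of the transformation is the same.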
For more information on how Healthcare data solutions can revolutionize your care management analytics, please review our detailed documentation and get started with transforming your healthcare data landscape today: https://go.microsoft.com/fwlink/?linkid=2284603

FHIR® is a registered trademark of Health Level Seven International, registered in the U.S. Trademark Office, and is used with their permission.

Medical device disclaimer: Microsoft products and services (1) are not designed, intended or made available as a medical device, and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment. Customers/partners are responsible for ensuring solutions comply with applicable laws and regulations.

Seamlessly use social determinants of health data in healthcare data solutions in Microsoft Fabric
Social determinants of health (SDOH) are the social conditions that contribute to an individual's or a population group's health outcomes, like place of birth, median household income, and access to transportation. Research and real-world evidence have established that SDOH information can complement medical information. This helps healthcare organizations understand their patients' health profiles more comprehensively and facilitate tailored care interventions. However, a fundamental challenge in leveraging SDOH data arises from the lack of a standard data collection and exchange mechanism. To simplify this process, we are thrilled to announce the public preview of SDOH datasets transformations in healthcare data solutions in Microsoft Fabric. It fuels large-scale analytics by enabling the unification of social determinants of health data with core healthcare domains like clinical and claims.

Key features
SDOH information comes in two forms: public datasets that contain social determinant details aggregated at a geographic level, and patient-level SDOH data that depicts those characteristics of an individual that might pose health risks. This release focuses on the public SDOH datasets and comes with:

A simple and intuitive data preparation mechanism to ready the datasets for ingestion into healthcare data solutions. The supported data formats are .csv and .xlsx.
A set of powerful pipelines and notebooks that allow effortless transformation of the datasets into tabular shapes.
Eight sample datasets across various SDOH domains that you can readily leverage for your use cases.

As the data progresses through the medallion Lakehouse, it is persisted within a robust data model, custom-built for the SDOH modality. This eases the process of combining SDOH data with other modalities, unlocking use cases such as care management analytics, risk stratification, and population health.
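As an illustration of what the data preparation mechanism involves, the sketch below checks that a dataset (modeled as a mapping of sheet names to rows) carries the metadata sheets the ingestion step expects. The sheet names and columns here are hypothetical placeholders; the shipped sample datasets show the exact expected layout:

```python
# Hypothetical placeholder names for the three required metadata sheets
# (publisher information, column descriptions, location information).
REQUIRED_METADATA_SHEETS = {"PublisherInfo", "ColumnDescriptions", "LocationInfo"}

def missing_metadata_sheets(workbook: dict) -> list:
    """Return the metadata sheets still missing from a prepared SDOH dataset,
    modeled here as a sheet-name -> rows mapping."""
    return sorted(REQUIRED_METADATA_SHEETS - set(workbook))

prepared = {
    "Data": [{"county_fips": "04013", "median_income": 72944}],
    "PublisherInfo": [{"publisher": "Example Agency"}],
    "ColumnDescriptions": [{"column": "median_income",
                            "description": "Median household income"}],
}
# "LocationInfo" has not been added yet:
print(missing_metadata_sheets(prepared))  # → ['LocationInfo']
```

A check like this could run before uploading a prepared .xlsx file to the landing zone, catching incomplete preparation early.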
How it works
The SDOH capability follows three simple steps to transform the disparate datasets into a unified data model:

Data preparation and ingestion: As there are no established standards to collect and exchange the information captured in these datasets, it is necessary to unify them into a common shape before they can be ingested. This step requires you to add three sheets to your original dataset to capture key details like publisher information, a description of the data columns, and location information. The shipped sample datasets are pre-populated with all the necessary information.

Landing zone to bronze: Once the datasets are prepared, they can be uploaded into the landing zone. The bronze notebook then populates all the key details in the bronze lake in delta table format.

Bronze to silver: This notebook normalizes the data from the bronze lake into the custom SDOH data model in the silver lake by creating dedicated tables and establishing relationships between them. It preserves the context of the source tables to help you easily identify or query the data.

You can trigger the SDOH pipeline to run all the steps after data preparation in one go and thereafter utilize the normalized silver lake data to build your analytical scenarios.

Get started today
The SDOH public preview is available in healthcare data solutions for teams to start using today. For a more detailed overview of the capability and the necessary configurations needed to deploy it, please check out the official documentation.

Medical device disclaimer: Microsoft products and services (1) are not designed, intended or made available as a medical device, and (2) are not designed or intended to be a substitute for professional medical advice, diagnosis, treatment, or judgment and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment.
Customers/partners are responsible for ensuring solutions comply with applicable laws and regulations.