This blog is part of a series that explores the recent announcement of the public preview of healthcare data solutions in Microsoft Fabric. Healthcare data solutions in Microsoft Fabric is a comprehensive, end-to-end analytics SaaS platform that allows you to ingest, store, and analyze healthcare data from a variety of sources, including electronic health records and picture archiving and communication systems. With this platform, you can unlock new insights and drive value from your healthcare data.
With Fabric's healthcare data solutions, your organization can bring together data from various sources and generate insights through a unified architecture and experience. These solutions provide data models, transformation activities, and analytical tools that help customers create a multi-modal lakehouse, offering a secure and governed way to connect, unify, analyze, and visualize data across their organization.
By using Microsoft's healthcare data solutions, customers can unlock a wealth of insights, including the following:
- Clinical summarization
- Medication adherence dashboards
- Quality metric reporting
- Risk predictions
- Clinical research (using OMOP)
To see how healthcare data solutions in Microsoft Fabric provides a unified healthcare analytics solution, watch the following overview:
In this multi-part blog series, we will look at all the capabilities that are part of the healthcare data solutions in Microsoft Fabric. In this first installment, we will explore the foundational roles of the healthcare lakehouse and the healthcare data model.
Establishing a healthcare lakehouse in Microsoft Fabric
A foundational element of the healthcare data solutions in Microsoft Fabric is the healthcare lakehouse. This modern data architecture leverages industry standards such as FHIR® (Fast Healthcare Interoperability Resources) and DICOM (Digital Imaging and Communications in Medicine) and is designed to support the multi-modal data needs found in healthcare, such as claims, genomics, patient experience data (PED), and Social Determinants of Health (SDOH). Healthcare data solutions currently supports the following types of data:
| Data Modality | Status |
| --- | --- |
| Structured Clinical | Public Preview |
| Unstructured Clinical | Public Preview |
| Medical Imaging | Private Preview |
The lakehouse combines the low cost, scalability, and flexibility of a cloud data lake with the performance and governance of a data warehouse. This design helps healthcare and life sciences organizations overcome the challenges of complex, heterogeneous, and siloed data by providing a unified platform for data management and analytics. With the healthcare lakehouse, organizations can organize all their health data at scale, deliver on their data analytics use cases, and accelerate time-to-value by unifying their data at every layer.
Unifying healthcare data within the lakehouse
A key component of the healthcare lakehouse is the healthcare data model. It provides a common data language that enables data analysts, data scientists, and developers to collaborate and build data-driven solutions that improve patient outcomes and business performance. The model aims to support data from across healthcare business domains such as clinical, administrative, financial, and social. Its core is designed to capture data defined by the FHIR standard by making FHIR resources available as tables and columns within the healthcare lakehouse. Because the FHIR information is flattened into Delta Parquet tables, you can use familiar tools like T-SQL and Spark SQL to explore and analyze the data. For data domains that the FHIR standard does not cover, the model uses schemas from the Azure Synapse database templates. This makes it possible to bring in non-clinical information, such as patient engagement data, and join it back to the core patient profile. The result is a unified, validated, and enriched version of the healthcare data that can be used for downstream analytics.
The healthcare data model is designed to support business domains across healthcare.
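To make the idea of flattening FHIR resources into tables and columns more concrete, here is a minimal Python sketch. It is only an illustration and not the transformation that healthcare data solutions actually runs: the `flatten` helper, the underscore-based column naming, and the sample patient are all assumptions made for this example, and the real silver tables may keep nested structures rather than fully flattened columns.

```python
def flatten(value, prefix=""):
    """Recursively flatten nested dicts and lists into a single-level dict.

    Illustrative naming only: nested keys are joined with '_' and list
    items get their index, so {"name": [{"family": "Rivera"}]} becomes
    {"name_0_family": "Rivera"}.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}_"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}_"))
    else:
        # Leaf value: drop the trailing '_' separator from the column name.
        flat[prefix[:-1]] = value
    return flat


# Hypothetical FHIR Patient resource (made-up values, not from the sample data).
patient = {
    "resourceType": "Patient",
    "id": "example-1",
    "gender": "female",
    "birthDate": "1980-04-12",
    "name": [{"family": "Rivera", "given": ["Ana", "Maria"]}],
}

row = flatten(patient)
for column, cell in row.items():
    print(f"{column}: {cell}")
```

Each key in `row` plays the role of a column name and each value the role of a cell, which is what makes a resource queryable with ordinary SQL once it is stored as a table.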
Exploring the healthcare lakehouse and data model
Let’s start exploring both the healthcare lakehouse and the healthcare data model. To do this, you will need a Fabric subscription. If you don’t have one, you can start a trial by following the instructions in the "Fabric trial" article on Microsoft Learn.
Now that we have access to Fabric, we will run through the steps below:
- Deploy the healthcare data foundations and sample data
- Configure the workspace and notebooks
- Run the notebooks (Bronze Ingestion & Silver Flattening)
- Explore the data within the lakehouse
To view the details of the steps above, watch the following walk-through:
Capability deployment
We will begin by deploying the healthcare data foundations and sample data capabilities to our workspace within Fabric. The healthcare data foundations capability deploys the Fabric items used in the rest of this walk-through, including the bronze and silver lakehouses and the notebooks.
Workspace & notebook configuration
After we have completed the deployment of the capabilities, the next step is to configure the workspace Spark settings. Healthcare data solutions in Fabric requires that the workspace Spark runtime version is set to 1.2.
After setting the Spark runtime to version 1.2, the next step is to update the key vault setting within the healthcare1_msft_config_notebook. This notebook stores all the configuration settings related to healthcare data solutions. The key vault setting is used in conjunction with the FHIR data ingestion and ensures that the transformations access services and data in a secure manner. Because we aren’t planning to use the FHIR data ingestion capability for this exercise, we will leave this value blank and save the notebook.
Running the notebooks
Now that we have updated the workspace settings and configured the notebooks, we can go ahead and run the notebooks. The two notebooks that we will be running are:
- Healthcare1_msft_raw_bronze_ingestion – Responsible for ingesting the FHIR sample data (NDJSON files) into parquet tables within the bronze lakehouse.
- Healthcare1_msft_Bronze_silver_flatten – Responsible for normalizing the data in bronze and transforming it to relational FHIR within the healthcare data model (silver lakehouse).
Together, these notebooks hydrate the bronze and silver lakehouses within healthcare data solutions with the clinical data. In this case, we only have sample data in our workspace, so the data in bronze and silver will reflect this sample set.
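As a rough illustration of what ingesting NDJSON involves, the sketch below parses newline-delimited JSON and groups the records by their FHIR `resourceType`, one in-memory "table" per resource type. It does not reproduce the ingestion notebook: the `ingest_ndjson` function and the sample records are hypothetical, and plain Python lists stand in for the parquet tables in the bronze lakehouse.

```python
import json
from collections import defaultdict


def ingest_ndjson(lines):
    """Group NDJSON records by FHIR resourceType (one list per type)."""
    tables = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # NDJSON files may contain blank lines; skip them
        resource = json.loads(line)  # each non-blank line is one JSON document
        tables[resource["resourceType"]].append(resource)
    return dict(tables)


# Hypothetical sample records; real FHIR resources carry many more fields.
sample = """
{"resourceType": "Patient", "id": "p1", "gender": "female"}
{"resourceType": "Patient", "id": "p2", "gender": "male"}
{"resourceType": "Observation", "id": "o1", "subject": {"reference": "Patient/p1"}}
"""

bronze = ingest_ndjson(sample.splitlines())
print({name: len(rows) for name, rows in bronze.items()})
# prints {'Patient': 2, 'Observation': 1}
```

Keeping one table per resource type at this stage means a later flattening step can treat each resource type independently.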
Viewing the data model
With the lakehouses hydrated, we can now open them to view the tables, columns, and data within. When viewing the lakehouses, you can choose either the lakehouse viewer or the SQL endpoint. This gives you the flexibility of accessing the tables through notebooks or through SQL queries.
Delta parquet tables in the bronze lakehouse view.
If we open the silver lakehouse (healthcare1_msft_silver) with the SQL endpoint, we can start exploring the healthcare data model using SQL. The healthcare data model lets you work with FHIR resources as SQL tables.
Querying the FHIR resources within the healthcare data model using T-SQL.
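The kind of query this enables can be sketched with a small, self-contained example. The sketch uses Python's built-in `sqlite3` module as a stand-in for the SQL endpoint, so the SQL dialect is SQLite rather than T-SQL, and the `Patient` and `Observation` tables, their columns, and their rows are simplified assumptions rather than the actual silver lakehouse schema.

```python
import sqlite3

# In-memory SQLite database standing in for the silver lakehouse SQL endpoint.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patient (id TEXT PRIMARY KEY, gender TEXT, birthDate TEXT);
CREATE TABLE Observation (id TEXT PRIMARY KEY, patient_id TEXT, code TEXT, value REAL);
INSERT INTO Patient VALUES
  ('p1', 'female', '1980-04-12'),
  ('p2', 'male', '1975-09-30');
INSERT INTO Observation VALUES
  ('o1', 'p1', 'heart-rate', 72.0),
  ('o2', 'p1', 'heart-rate', 80.0),
  ('o3', 'p2', 'heart-rate', 65.0);
""")

# Join observations back to the patient and aggregate per patient.
query = """
SELECT p.id, p.gender, COUNT(o.id) AS observation_count, AVG(o.value) AS avg_value
FROM Patient AS p
JOIN Observation AS o ON o.patient_id = p.id
GROUP BY p.id, p.gender
ORDER BY p.id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because each resource type is its own table, joining observations to patients is an ordinary relational join, which is the main benefit of exposing FHIR resources this way.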
In this article, we shared how the healthcare data solutions in Microsoft Fabric offer a robust and all-encompassing solution for unifying and analyzing healthcare data. We examined the healthcare lakehouse and explored the significance of the healthcare data model. In our upcoming articles, we will take a closer look at the transformation activities and their crucial role in hydrating the healthcare data model. To discover more about getting started with the healthcare data solutions in Microsoft Fabric, review our documentation.
FHIR® is a registered trademark of Health Level Seven International, registered in the U.S. Trademark Office, and is used with their permission.