Analyzing FHIR data with Azure Synapse Analytics

Microsoft

Sep 04, 2021

This blog has been authored by Ranvijay Kumar, Principal Program Manager, Microsoft Health & Life Sciences

HL7 Fast Healthcare Interoperability Resources (FHIR®) is quickly becoming the de facto standard for persisting and exchanging healthcare data. FHIR specifies a high-fidelity and extensible information model for capturing details of healthcare entities and events.

This article will teach you a simple approach to creating analytical data marts by exporting, transforming, and copying data from Azure API for FHIR to Azure Synapse Analytics, which is a limitless analytics service designed for data warehousing and big data workloads. You can complete your Business Intelligence (to Artificial Intelligence (AI) analytics with Synapse due to the deep integration with Power BI, Azure Machine Learning, and Azure Cognitive services.

In this approach, as illustrated in the diagram, you will use the $export operation in Azure API for FHIR to export FHIR resources in NDJSON format (newline delimited JSON) to Azure storage. You will then use T-SQL from any of the serverless or the dedicated SQL pools in Synapse to query against those NDJSON files and optionally save the results into tables for further analysis.

Exporting FHIR data to Azure storage

Azure API for FHIR implements the $export operation defined by the FHIR spec to export all – or a filtered subset – of FHIR data in NDJSON format. It also supports de-identified export to enable secondary use of healthcare data. You can configure the server to export the data to any kind of Azure Storage account; however, we recommend exporting to ADLS Gen 2 for best alignment with Synapse.

Let’s consider a scenario in which data scientists want to analyze clinical data of patients who are former smokers. For the study, data scientists need an initial copy of data from the FHIR server followed by incremental data for the same set of patients every month for the next two years.

The first step to get this data is to identify the patients in the FHIR server who are former smokers. The following GET call searches the FHIR server using the LOINC code 72166-2 (Tobacco smoking status) for Observation, and SNOMED code 8517006 (Former smoker) for Observation value-concept to get subjects of the observations who are former smokers. You may need to use different codes depending on how your data is coded.

https://{{fhirserverurl}}/Observation?code=72166-2&value-concept=8517006&_elements=subject

You need to save this list of patients to enable exporting their clinical data monthly. There are a few options to manage a collection of resources in FHIR. Since Group is supported by the $export operation, you will manage the collection of patient resource IDs as a Group. Use the results from the above search query to create a person-type Group.

{

"resourceType": "Group", "id": "1","type": "person", "actual": true,

"member": [{"entity": {"reference": "Patient/44f6f10e-96c2-4802-b857-4861f1802522"}},

... other patient entities from the result ...

]

}

Once you have a Group, you can export all the data related to the patients in the Group with the following async REST call:

Note: Azure API for FHIR takes an optional container name to simplify the organization of exported data.

https://{{fhirserverurl}}/Group/{{GroupId}}/$export?_container={{BlobContainer}}

You can also use _type and _typefilter parameters in the $export call to restrict the resources we you want to export. Finally, you can use _since parameter in the $export call to do incremental exports every month for two years to meet your original requirement. This parameter restricts export to the resources that have been created or updated since the supplied time.

https://{{fhirserverurl}}/Group/{{GroupId}}/$export?_container={{BlobContainer}}&_since=2021-02-06T01:09:53.526+00:00

Now that you have data in ADLS Gen 2, let’s talk about Synapse and see how you can load it to Synapse.

About Azure Synapse Analytics

Azure Synapse Analytics is a high-scale analytics service that is ideal for analyzing big data. Among other components, Synapse offers two different types of SQL pools. The serverless SQL pool gives the flexibility of querying data directly in the blob using the simplicity of SQL. There is no resource to provision. If you need to add this data with industry leading security, high performance and concurrency, workload management, and other enterprise data warehousing capabilities, use the dedicated SQL pool. Also, the two pools have different levels of Transact-SQL support that you should review before picking the one that suits your needs. You may decide to use both as Synapse allows you to have multiple serverless and dedicated SQL pools in a Synapse workspace.

One of the challenges in analyzing FHIR data is flattening it into tabular schema. FHIR resources are composed of heavily nested and sometimes infinitely recursive datatypes. These datatypes allow capturing healthcare data in detail, but an attempt to indiscriminately flatten all the FHIR resources results in thousands of related tables that would be impractical to use. One practical approach is to retain a copy of FHIR resources in a staging area and project required data into tabular format on need basis using the JSON query ability of Synapse T-SQL. The Azure API for FHIR documentation provides a guide to load exported data from Azure storage to both serverless as well as the dedicated Synapse SQL pool using T-SQL. This approach also enables you to create views or store the results of queries for subsequent use in regular tables or external tables in the dedicated or serverless pool, respectively.

Once you have your data in Azure Synapse workspace, you can complete your end-to-end BI to AI analytics solutions thanks to its deep integration with Power BI, Azure Machine Learning, and Azure Cognitive services.

Create a pipeline

You can use a variety of REST clients such as Postman to export the data from the FHIR server and use Synapse Studio or any other SQL client to run the above T-SQL statements. However, it is a good idea to convert these steps into a robust data movement pipeline using Synapse Pipelines. You can use the Synapse Web activity for triggering the export, and the Stored procedure activity to run the T-SQL statements in the pipeline.

Conclusion

You can use the FHIR $export API and T-SQL to transform and move all or a filtered subset of data from FHIR server to Synapse Analytics. After the initial data load, the _since parameter in the $export operation can be used to do incremental data load. An ETL pipeline with the steps mentioned in this article can be used to keep the data in the FHIR server and the Synapse Analytics in sync.

®FHIR is registered trademark of Health Level Seven International, registered in the U.S. Trademark Office and is used with their permission.

Updated Oct 15, 2021

Version 2.0

Microsoft

Joined June 10, 2020

View Profile

Healthcare and Life Sciences Blog

Follow this blog board to get notified when there's new activity