FHIR is a JSON format which makes it very good at capturing relationships, especially nested relationships. Deep nested relationships work very well in JSON. Although this works as a storage format and a data interchange format, it makes analysis and querying of the data more complicated.
For reporting and analytics, users will need to convert JSON data into a more structured format. A common pattern is to transform the JSON data into a tabular format and store the data in a relational database like Azure SQLDB. This works well for downstream analysis for tools that understand SQL. Business analysts can use SQL to analyze and manipulate the data for analytics needs. However, this format is not great if you have to do big data analytics or machine learning processing. For this type of workloads columnar type storage tends to be better suited.
This requires a file format that gives the users more flexibility to handle either SQL, big data, or ML workloads. Parquet is a good format for this. Parquet allows data to be stored in a columnar format which makes it easier to use for big data analytics and ML workloads. SQL based tools have good support for Parquet making it easier to use for SQL type workloads.
This is the pattern used in the FHIR Analytics Pipeline project which we’ll explore. For the rest of this blog, I’ll highlight some of the key aspects covered in this project and show how you can use this to build a robust analytics platform.
We’ll focus on the part of the project that sets up a pipeline to synchronize FHIR data to a Datalake and build a SQL based model to analyze the data. FHIR to Datalake synchronization is performed using either an Azure Container App or an Azure Function. I suggest experimenting with both options and picking the one that you feel most comfortable supporting. For simplicity, I used the Azure Container App option. Once deployed, this app does the heavy lifting of converting the FHIR/JSON data into Parquet files and stores the data in the storage account that you specify.
I also recommend reviewing the files in the Scripts folder to see how the Parquet files are mapped to tables and views.
You should also review the PowerShell script. It sets up the components that allow Synapse to connect and use the Parquet files in the storage account. This PowerShell script creates the specified database in your Synapse workspace and maps the Parquet files to the corresponding SQL tables and views. By default, this will use the Serverless Pool in Azure Synapse.
After successful completion, the SQL environment will be available for you to use. Because this is a SQL endpoint, you can connect to the database and use SQL to analyze and manipulate the data. You can also use PowerBI connected to your Synapse instance to build reports and dashboards for your users.
PowerBI has built in support for a variety of data sources. Because Synapse SQL pools expose a SQL endpoint, you can connect to them just like other SQL data sources.
The connector has support for both SQL authentication as well as Azure AD authentication. I recommend using Azure AD because you can use it for ACL based controls on your data stored in the storage account. One you connect to your SQL endpoint; you can now use your data like any other PowerBI data source. You can build models, transform data, and build visualizations.
Notice in the image above, we are able to join the Patient table to the Patient Address view and pull a flattened patient address instead of a nested JSON object that we’d have to parse and split in PowerBI. The CurrentAge in the above image shows another example, where we used a custom view with patient details to simplify the processing of the data. Instead of relying on PowerBI to do this custom logic, we created a view in Synapse using the underlying Patient Parquet file to calculate the patient age.
Hopefully this shows some of the flexibility that this analytics pattern provides. One of the biggest advantages is that fact that we don’t have to copy our data over and over. We can use custom views over the stored Parquet data to customize how we view the data.