Blog Post

Healthcare and Life Sciences Blog
5 MIN READ

Microsoft Fabric healthcare data model querying and identifier harmonization

mdearing06's avatar
mdearing06
Icon for Microsoft rankMicrosoft
May 12, 2025

The healthcare data model in Healthcare data solutions (HDS) in Microsoft Fabric is the silver layer of the medallion and is based on the FHIR R4 standard. Native FHIR® can be challenging to query using SQL because  its reference properties (foreign keys) often follow varying formats, complicating query writing. One of the benefits of the silver healthcare data model is harmonizing these ids to create a simpler and more consistent query experience.

Today we will walk through writing spark SQL and T-SQL queries against the silver healthcare data model. Supporting both spark SQL and T-SQL provides flexibility to users to use the compute engine they are most comfortable with and that is most suitable for the use case. The examples below leverage the synthetic sample data set that is included with  Healthcare data solutions. The spark SQL queries can be written and run from a Fabric notebook while the T-SQL queries can be run from the SQL Analytics endpoint of the silver lakehouse or a T-SQL Fabric notebook.

Simple query

Let’s look at a simple query: finding the first instance of a Patient named “Andy”.

Example spark-SQL query

SELECT *
FROM Patient
WHERE name[0].given[0] = 'Andy'
LIMIT 1

 

Example T-SQL query

SELECT TOP(1) *
FROM Patient
WHERE JSON_VALUE(name_string, '$[0].given[0]') = 'Andy'

 

 Beyond syntax differences between SQL dialects, a key distinction is that T-SQL uses JSON functions to interpret complex fields, while Spark SQL can directly interact with complex types (Note: complex types are those columns of type: struct, list, or map vs. primitive types whose columns or of types like string or integer). Part of the silver transformations include adding _string suffixed column for each complex column to support querying this data from the T-SQL endpoint. Without the _string columns these complex columns would not be surfaced for T-SQL to query. You can see above that in the T-SQL version the column name_string is used while in the spark SQL version name is used directly. Note: in the example above, we are looking at the first name element, but the queries could be updated to search for the first “official” name, for example, vs. relying on an index.

Keys and references

Part of the value proposition of the healthcare data model is key harmonization. FHIR resources have ids that are unique, should not change, and can be logically thought of like a primary key for the resource. FHIR resources can relate to each other through references which can be logically thought of as foreign keys. FHIR references can refer to the related FHIR resource through FHIR id or through business identifiers which include a system for the identifier as well as a value (e.g. reference by MRN instead of FHIR id). Note: although ids and references can logically be thought of as primary keys and foreign keys, respectively, there is no actual key constraint enforcement in the lakehouse.

In  healthcare data solutions in Microsoft Fabric these resource level FHIR ids are hashed to ensure uniqueness across multiple source systems. FHIR references go through a harmonization process outlined with the example below to make querying in a SQL syntax simpler:

Example raw observation reference field from sample ndjson file

"subject": {
    "reference": "Patient/904d247a-0fc3-773a-b564-7acb6347d02c"
},

 

Example of the observation’s harmonized subject reference in silver

"subject":{
    "type": "Patient",
    "identifier": {
        "value": "904d247a-0fc3-773a-b564-7acb6347d02c",
        "system": "FHIR-HDS",
        "type": {
            "coding": [
                {
                    "system": "http://terminology.hl7.org/CodeSystem/v2-0203",
                    "code": "fhirId",
                    "display": "FHIR Id"
                }
            ],
            "text": "FHIR Id"
        }
    },
    "id": "828dda871b817035c42d7f1ecb2f1d5f10801c817d69063682ff03d1a80cadb5",
    "idOrig": "904d247a-0fc3-773a-b564-7acb6347d02c",
    "msftSourceReference": "Patient/904d247a-0fc3-773a-b564-7acb6347d02c"
}

 

You’ll notice the subject reference contains more data in silver than the raw representation. You can see a full description of what is added here. This reference harmonization makes querying from a SQL syntax easier as you don’t need to parse FHIR references like “Patient/<id>” or include joins on both FHIR ids and business identifiers in your query. If your source data only uses FHIR ids, the id property can be used directly in joins. If your source data uses a mixture of FHIR ids and business identifiers you can query by business identifier consistently as you see even when FHIR id is used, HDS adds a FHIR business identifier to the reference. NOTE: you can see examples of business identifier-based queries in the Observational Medical Outcomes Partnership (OMOP) dmfAdapter.json file which queries resources by business identifier.

Here are 2 example queries looking for the top 5 body weight observations of male patients by FHIR id.

Example spark-SQL query

SELECT o.id
FROM observation o
INNER JOIN patient p on o.subject.id = p.id
WHERE p.gender = 'male' AND ARRAY_CONTAINS (o.code.coding.code, '29463-7')
LIMIT 5

 

Example T-SQL query

SELECT TOP 5 o.id
FROM Observation o
INNER JOIN Patient p ON JSON_VALUE(o.subject_string, '$.id') = p.id
WHERE p.gender = 'male'
    AND EXISTS (
    SELECT 1
    FROM OPENJSON(o.code_string, '$.coding') WITH (code NVARCHAR(MAX) '$.code')
    WHERE code = '29463-7'
)

 

You’ll notice the T-SQL query is using JSON functions to interact with the string fields while the spark SQL query can natively handle the complex types like the previous query. The joins themselves though are using the id property directly as we know in this case only FHIR ids are being used. By using the id property, we do not need to parse a string representation like “Patient/<id>” to do the join.

Overall we’ve shown how either spark SQL or T-SQL can be used to query the same set of silver data and also how key harmonization helps when writing SQL based queries.

We welcome your questions and feedback in the comments section at the end of this post!

Helpful links

For more details of to start building your own queries, explore these helpful resources:

Healthcare data solutions in Microsoft Fabric

FHIR References

T-SQL JSON functions

T-SQL surface area in Fabric

 

FHIR® is the registered trademark of HL7 and is used with permission of HL7.

Updated May 12, 2025
Version 2.0
No CommentsBe the first to comment