This post was authored by Bruce Nelson, Senior Solutions Architect, and Clinton Ford, Staff Partner Marketing Manager, both at Databricks.
Healthcare organizations are improving the patient experience and delivering better health outcomes with analytic dashboards and machine learning models on top of existing electronic health records (EHR), digital medical images and streaming data from medical devices and wearables. Azure Databricks and Delta Lake make it easier to work with large clinical datasets to identify top patient conditions.
Using Delta Lake to build a comorbidity dashboard
The simulated EHR data cover roughly 10,000 patients in Massachusetts and were generated with the Synthea patient simulator. Our ETL notebook ingests and de-identifies the data, then prepares it for our visualization notebook. There we create visualizations and a simple dashboard that show the top conditions (comorbidities) in our real-world data and analyze the correlation between any two conditions specified by the user.
Extract, transform and load (ETL)
To begin, we use PySpark to read the EHR data from comma-separated values (CSV) files, de-identify patient personally identifiable information (PII) and write the results to Delta Lake for analysis. Delta Lake is a best practice for ingestion, ETL and stream processing: it is an open-source format that supports ACID transactions, offers faster processing with Delta Engine and integrates easily with other Azure services for additional use cases.
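As a minimal sketch of the de-identification step, shown outside of Spark for clarity: a salted hash replaces a patient identifier with a stable pseudonymous key, and direct identifiers are dropped. The field names, salt and record shape below are illustrative, not taken from the actual notebook; in the real pipeline, equivalent logic runs over a PySpark DataFrame before writing to Delta Lake.

```python
import hashlib

# Hypothetical salt -- in practice this would come from a secret scope,
# never from source code.
SALT = "example-salt"

def pseudonymize(value: str, salt: str = SALT) -> str:
    """Replace a PII value with a deterministic, non-reversible token."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

# Illustrative raw record; real Synthea CSVs have many more columns.
record = {"patient_id": "12345", "ssn": "999-12-3456", "condition": "Hypertension"}

deidentified = {
    "patient_key": pseudonymize(record["patient_id"]),  # stable join key
    "condition": record["condition"],                   # clinical fields kept
}
# Direct identifiers (ssn, name, address) are simply dropped.
```

Because the hash is deterministic, the same patient always maps to the same key, so records can still be joined across tables without exposing the underlying identifier.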
EHR data analysis and comorbidity dashboard
In this notebook, we visualize the top conditions in the database and create a simple dashboard to analyze the correlation between any two conditions specified by the user. You can share this notebook as a dashboard by following these instructions.
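A minimal, pure-Python sketch of the two analyses behind the dashboard: counting the most frequent conditions, and computing the correlation (phi coefficient) between two user-chosen conditions. The sample rows and function names are hypothetical; the notebook performs the equivalent aggregations with Spark over the Delta table.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (patient_id, condition) rows standing in for the Synthea
# conditions table after ETL.
rows = [
    ("p1", "Hypertension"), ("p1", "Diabetes"),
    ("p2", "Hypertension"), ("p2", "Diabetes"),
    ("p3", "Hypertension"),
    ("p4", "Diabetes"),
    ("p5", "Asthma"),
]

# Top conditions (the bar-chart half of the dashboard).
top_conditions = Counter(cond for _, cond in rows).most_common(3)

def condition_correlation(rows, cond_a, cond_b):
    """Phi coefficient between two conditions across patients (-1 to 1)."""
    patients = defaultdict(set)
    for pid, cond in rows:
        patients[pid].add(cond)
    n = len(patients)
    has_a = [cond_a in conds for conds in patients.values()]
    has_b = [cond_b in conds for conds in patients.values()]
    n11 = sum(a and b for a, b in zip(has_a, has_b))  # patients with both
    na, nb = sum(has_a), sum(has_b)                   # marginal counts
    denom = math.sqrt(na * (n - na) * nb * (n - nb))
    return 0.0 if denom == 0 else (n * n11 - na * nb) / denom
```

The phi coefficient is simply the Pearson correlation of the two binary "has condition" indicators, which makes it a natural fit for a dashboard widget where the user picks any pair of conditions.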