In the earlier PatientHub: Leveraging AI to enhance an end-to-end healthcare application blog post, we provided an overview of the application, its potential use cases and high-level architecture. In this post, we’ll dive deeper into the architecture and how everything works together from end to end.
Please note: The PatientHub end-to-end solution is intended to demonstrate how you might build an application to enable healthcare practitioners and patients to access knowledge about the care and risks associated with ailments. References to patients and healthcare providers in this discussion are only to illustrate the design and potential use case for applications such as PatientHub. PatientHub is not to be used in production environments as-is and is not intended for use by patients or healthcare providers in clinical decision-making or for any other clinical use.
As shown in the architecture diagram above, PatientHub is composed of three tiers:
- The PatientHub front-end app
- A backend data lake for persistent storage
- An ML model publishing and serving layer
The front-end app provides all GUI functionality for end users and includes all the app-specific business logic. Our previous blog post included various screenshots of the app demonstrating different business flows for different user personas. In this blog, we’ll focus on what happens under the hood, from a technical architecture perspective, behind each of these business flows.
Starting from the top left-hand side of the diagram above, when the data scientist/IT user logs into the Patient Hub Marketplace portal, they can upload and publish their model through a simple GUI. They can train the model using Azure AutoML or any ML package/platform they choose. Behind the scenes, the front-end app calls the Azure ML (AML) service using the AML Python SDK and kicks off a series of actions on the backend, as detailed below.
- First, the front-end app will connect to the Azure ML workspace specified by the user.
- The model uploaded by the user will then be registered in the AML workspace using the AML Python SDK (a minimal sketch follows the comparison table below). Registering models in AML keeps track of model versioning and management with attributes such as name, version, date, description, and tags.
- Once the model is registered, the app will kick off two processes in parallel to publish a real-time scoring API and a batch scoring API. The combined usage of real-time and batch scoring is quite common among healthcare customers and partners:
| Real-time scoring API scenario | Batch scoring API scenario |
| --- | --- |
| 1) Real-time what-if analysis: check how the prediction changes as inputs change. | 1) Most EMR history data is fairly static, so it’s quite common to do batch scoring on a nightly basis. |
| 2) When patient data changes very frequently, real-time scoring is needed. | 2) Pre-generating predictions speeds up the user experience and saves unnecessary infrastructure cost from repeating the same prediction. |
| | 3) Model explanation can often be very time-consuming to run, so it needs to be executed as a batch process. |
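To make the first two backend steps concrete, here is a minimal sketch of connecting to the workspace and registering the model with the AML Python SDK. The workspace details, model path, and tags below are illustrative placeholders, not the actual PatientHub values:

```python
from azureml.core import Workspace
from azureml.core.model import Model

# Connect to the AML workspace specified by the user
# (subscription, resource group, and workspace name are placeholders).
ws = Workspace.get(
    name="patienthub-ws",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
)

# Register the uploaded model so AML tracks its name, version,
# description, and tags.
model = Model.register(
    workspace=ws,
    model_path="outputs/patient_risk_model.pkl",  # local path to the uploaded model
    model_name="patient-risk-model",
    description="Predicts patient risk from EMR attributes",
    tags={"source": "PatientHub Marketplace"},
)
print(model.name, model.version)
```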
The PatientHub: Developing and deploying a healthcare machine learning model blog post covered the model deployment process for both real-time and batch scoring in detail. Note that the real-time scoring API uses Azure Kubernetes Service (AKS) as the underlying compute resource (either a new or an existing AKS cluster in the Azure subscription), whereas the batch scoring API runs on AML compute.
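As a rough sketch of the real-time deployment path (see the blog post above for the actual steps), deploying the registered model to AKS via the AML SDK looks roughly like this. The entry script, conda spec, cluster name, and sizing below are assumptions for illustration:

```python
from azureml.core import Environment, Workspace
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AksWebservice

ws = Workspace.from_config()
model = Model(ws, name="patient-risk-model")

# score.py and environment.yml are hypothetical entry script / conda files.
inference_config = InferenceConfig(
    entry_script="score.py",
    environment=Environment.from_conda_specification("scoring-env", "environment.yml"),
)

# Attach to an existing AKS cluster (a new one could also be provisioned).
aks_target = AksCompute(ws, "patienthub-aks")

deployment_config = AksWebservice.deploy_configuration(
    cpu_cores=1, memory_gb=2, autoscale_enabled=True, auth_enabled=True
)

service = Model.deploy(
    ws, "patienthub-realtime", [model],
    inference_config, deployment_config, deployment_target=aks_target,
)
service.wait_for_deployment(show_output=True)
```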
As shown in the lower left-hand side of the architecture diagram above, the doctor/patient can log into PatientHub to view patient records, pre-calculated prediction scores, and model explanation data (e.g., how the patient’s attributes contributed to the prediction), all of which is stored in the backend data lake storage layer. The batch scoring API reads the patient records from the data lake, generates the prediction scores and model explanation data, and writes it all back into the data lake. The batch scoring API runs on AML Compute, a managed compute infrastructure that lets the user easily create single- or multi-node compute on demand at run time or as a persistent resource. The compute supports normal and low-priority VMs, scales up automatically when a job is submitted, and can be placed in an Azure Virtual Network. A backend job scheduler calls the batch scoring API (through AAD-based authentication) to refresh the data on a regular basis.
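The scheduler’s call boils down to posting to the batch scoring endpoint with an AAD token. A minimal sketch, assuming the batch API is published as an AML pipeline endpoint and the scheduler authenticates with a service principal (all identifiers and the URL below are placeholders):

```python
import requests
from azureml.core.authentication import ServicePrincipalAuthentication

# Service principal credentials used by the job scheduler (placeholders).
sp_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",
    service_principal_id="<client-id>",
    service_principal_password="<client-secret>",
)

# REST endpoint of the published batch scoring pipeline (placeholder URL).
rest_endpoint = "https://<region>.api.azureml.ms/pipelines/v1.0/.../PipelineSubmit/<pipeline-id>"

# The AAD bearer token goes in the Authorization header.
response = requests.post(
    rest_endpoint,
    headers=sp_auth.get_authentication_header(),
    json={"ExperimentName": "patienthub-batch-scoring"},
)
response.raise_for_status()
print("Submitted run:", response.json().get("Id"))
```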
We’ll have a future blog on generating the model explanation data using the AML Interpretability SDK.
A doctor could also perform what-if analysis by changing certain attributes of a patient and seeing how the prediction score changes in real time. Behind the scenes, the app calls the real-time scoring API running on AKS, through key-based authentication. The real-time scoring API powered by AKS is well suited for high-scale production deployments, providing fast response times and autoscaling of the deployed service at the Kubernetes pod level based on traffic volume. Note that cluster node-level autoscaling is not supported through the AML SDK at the moment, but the AKS cluster autoscaler can be leveraged if node-level scaling is needed. AML has also integrated the real-time scoring service with Azure Application Insights (AppInsights), providing monitoring metrics such as request rates, response times, failure rates, and exceptions. More details are documented here.
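For illustration, calling the real-time scoring API with key-based authentication is a plain HTTPS POST. The URI, key, and payload schema below are placeholders, since the actual schema depends on the model’s entry script:

```python
import json
import requests

scoring_uri = "http://<aks-endpoint>/api/v1/service/patienthub-realtime/score"  # placeholder
api_key = "<service-key>"  # retrievable via service.get_keys() in the AML SDK

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

# What-if payload: a patient record with one attribute tweaked (illustrative schema).
payload = {"data": [{"age": 64, "bmi": 31.2, "smoker": False}]}

response = requests.post(scoring_uri, headers=headers, data=json.dumps(payload))
response.raise_for_status()
print("Updated prediction:", response.json())
```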
In addition to monitoring the real-time scoring API health status via AppInsights, AML also recently rolled out a Data Drift service designed to monitor potential model performance decay by detecting drift (i.e., a change in the distribution of data) between the scoring data (i.e., data submitted by the app to get a real-time prediction) and the training data used to train the model. We’ll demonstrate this monitoring capability in PatientHub as well, since continuously monitoring model performance and adapting to data drift is important for models deployed in production.
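The Data Drift SDK was in preview at the time of writing, so the exact API surface may have changed; a sketch of setting up a dataset-based monitor with the azureml-datadrift package might look like the following, where the dataset names, compute target, frequency, and threshold are all assumptions:

```python
from azureml.core import Dataset, Workspace
from azureml.datadrift import DataDriftDetector

ws = Workspace.from_config()

# Baseline = training data; target = scoring data collected from the service
# (dataset names are illustrative).
baseline = Dataset.get_by_name(ws, "patient-training-data")
target = Dataset.get_by_name(ws, "patient-scoring-data")

monitor = DataDriftDetector.create_from_datasets(
    ws,
    name="patienthub-drift-monitor",
    baseline_data_set=baseline,
    target_data_set=target,
    compute_target="cpu-cluster",   # existing AML compute (assumption)
    frequency="Day",                # check for drift daily
    drift_threshold=0.3,            # alert when drift exceeds this level
)
monitor.enable_schedule()
```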
In this blog post, you saw how multiple Microsoft AI services were woven together to enable data scientists to create and deploy models that empower practitioners and patients to get more from their data. We invite you to engage with us in the comments to let us know what you think and get your questions answered. You can also follow us on Twitter at @AzureaiM.