@Alberto Santamaria-Pang, @Ivan Tarapov, @Yonas Woldesenbet, @Sam Preston, @Rahul Sharma, @Nishanth Chandran, @Divya Gupta, @Kashish Mittal, and @Ajay Manchepalli.
Machine learning models are useful for analyzing patient data, detecting diseases early, and helping clinicians create personalized treatments. However, using these models in healthcare is challenging because it requires accessing and processing sensitive patient data while protecting patient privacy and complying with strict regulations.
Traditional encryption methods protect data while it is stored or transmitted, but not while it is being used for computation. One way to compute on encrypted data is to decrypt it inside a trusted region such as a secure enclave, which is the approach taken by Microsoft’s Azure Confidential Computing. There is also a cryptographic technique that operates directly on encrypted data without the need for decryption: Secure Multi-party Computation (SMPC). SMPC helps ensure that sensitive healthcare data remains secure while enabling healthcare professionals to perform the computations they need on that data to provide better care for patients.
Traditional encryption vs. SMPC
While both traditional encryption methods and Secure Multi-Party Computation (SMPC) offer similar levels of data security, SMPC has the added capability of allowing computations on encrypted data. For instance, to run model inference on an encrypted DICOM image, SMPC can use the encrypted image directly. The additional computational overhead of SMPC depends on the specific function being computed on the encrypted data.
| Comparison criteria | Traditional encryption methods | Secure Multi‑Party Computation (SMPC) |
| --- | --- | --- |
| Data exposure | Raw data needs to be decrypted for analysis or use. | Computation is performed on encrypted data. |
| Inference speed | Encryption and decryption overhead is minimal. | Joint computation on encrypted data can introduce overhead in latency. |
| Trust assumptions | Rely on trusted third‑party or secure infrastructure. | Distributed computation with privacy assurance. |
Figure 1 Traditional encryption methods vs. Secure Multi‑Party Computation (SMPC).
SMPC transforms healthcare data analysis and ML
SMPC provides a solution that allows multiple parties to work together on their data without revealing any sensitive information. It helps healthcare providers and researchers securely analyze patient data and use ML models while maintaining patient privacy.
Here are some key benefits of SMPC in the healthcare sector:
- Privacy preservation. SMPC protects individual patient data during the computation process. Each party only sees their own data, and the others’ data is hidden. This lets healthcare providers and researchers work together and use more data without risking privacy.
- Collaborative research. SMPC facilitates collaborative research among healthcare institutions, enabling them to pool their data resources without compromising privacy. Multiple parties can train ML models together on their combined data while keeping patient records and information safe. This helps improve the ML models in healthcare by using more and different data sources and larger samples.
- Secure data sharing. SMPC helps enable healthcare providers to more securely share specific information from their datasets with other authorized parties. For example, when studying rare diseases, healthcare organizations may be able to share some patient data points or features while helping preserve their identity and privacy. This controlled sharing mechanism helps enhance research and contributes to the advancement of medical knowledge.
Privacy‑preserving ML to improve the security of fMRI data analysis in healthcare
In this blog we explore the application of SMPC to medical image analysis via machine learning techniques for a specific use case of functional Magnetic Resonance Imaging (fMRI) analysis. Applying ML to fMRI data has the potential to revolutionize healthcare by providing insights into brain function and diagnosing neurological disorders. However, the sensitive nature of fMRI data raises significant privacy concerns. To address these challenges, one may employ privacy‑preserving ML techniques, such as data anonymization, secure data encryption, federated learning, and differential privacy, which would allow leveraging the benefits of ML in fMRI analysis while maintaining patient confidentiality and adhering to regulatory requirements.
Before diving into the details of how OnnxBridge (an end-to-end compiler for converting ONNX models to secure cryptographic backends) enables secure machine learning for fMRI data, it is important to understand how fMRI is relevant for neuroscience research. Functional magnetic resonance imaging (fMRI) is a technique that measures brain activity by detecting changes in blood flow. By using fMRI, researchers can identify which brain regions are involved in different cognitive functions, such as memory, language, or emotion. This is known as functional localization. However, fMRI data is often sensitive and confidential, as it can reveal personal information about the participants’ health, preferences, or personality. Therefore, it is essential to protect the privacy and security of fMRI data when performing machine learning analysis on it.
In the rest of this blog post, we cover these topics:
- What resting‑state fMRI (rs‑fMRI) is and how it measures brain activity by detecting changes in blood flow.
- How SMPC protects the privacy and security of fMRI data when performing machine learning analysis using EzPC‑OnnxBridge, a crucial part of the EzPC project from Microsoft Research India (MPC-MSRI, 2021).
- How to use EzPC‑OnnxBridge for rs‑fMRI to identify brain regions involved in different cognitive functions.
What is rs‑fMRI and how is it used to localize brain networks?
Unlike traditional fMRI, which captures brain activity during specific tasks or stimuli, rs‑fMRI delves into the spontaneous fluctuations of the brain when it is in a state of rest or free thinking. It explores the intricate networks of communication among different brain regions, shedding light on the underlying functional architecture that forms the foundation of our cognition.
The power of rs‑fMRI lies in its ability to measure and analyse blood‑oxygen‑level‑dependent (BOLD) signals. By detecting changes in blood flow and oxygenation, rs‑fMRI provides a window into the brain's dynamic activity during rest. These fluctuations in the BOLD signal, known as resting‑state connectivity, are like whispers of communication between various regions of the brain, even when we are not consciously engaged in any cognitive task.
Through advanced computational algorithms and sophisticated statistical analysis, researchers can map and visualize these functional connections within the brain. However, it is important to note that rs‑fMRI is not without its challenges and limitations. The interpretation of resting‑state connectivity requires careful consideration, as it represents correlations between brain regions rather than direct causality. Moreover, factors such as participant motion, physiological noise, and data pre‑processing methods can influence the results and must be rigorously addressed to help ensure data quality and reliability. This is where ML algorithms can help neuro‑radiologists efficiently map and visualize brain networks for a range of clinical applications. In this blog, we provide an example of how to use SMPC to automatically identify and localize brain networks using work published in [3].
Figure 2 Visualization of brain networks from 3D dual regression volumes.
How SMPC works using EzPC-OnnxBridge
We begin with an overview of how secure multi‑party computation (SMPC) works and then walk through the steps for using EzPC‑OnnxBridge in the application described above. EzPC‑OnnxBridge allows using SMPC without any knowledge of cryptography.
SMPC is a cryptographic primitive introduced in the 1980s [4,5] that helps enable two or more parties who have private data to collaborate (or compute joint functions) on their private/secret data, without sharing it in the clear with any entity. This is done through an interactive cryptographic protocol: each party performs computations on its own data and iteratively exchanges seemingly random messages with the other parties. At the end of such an interaction, the parties learn only the output of the joint function. As an example, if two parties A and B have private inputs a and b and wish to compute the function y = f(a,b), which outputs 1 if a > b and 0 otherwise, they can run an SMPC protocol that computes exactly y and reveals nothing else. SMPC protocols have been studied extensively in the cryptography community over the last four decades, and recent research, such as the EzPC technology [6,7,8,9], is making SMPC practical for large‑scale ML models.
In the application of secure machine learning for fMRI data, we have two parties: one that holds the machine learning model and the other that holds an input data point for inference. For the first party, the weights of the ML model are private, while for the second party, the input data point is private. In typical applications, including ours, the model architecture is public and known to both parties.
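To make the idea of exchanging random-looking messages concrete, here is a minimal sketch of additive secret sharing, one common building block of SMPC protocols. It is an illustration only, not the protocol used by EzPC: the ring size `MODULUS` and the helper names `share` and `reconstruct` are assumptions made for this example, and both parties are simulated in one process. It computes a sum rather than the comparison from the example above, because comparison needs a more involved protocol.

```python
import secrets
from typing import Tuple

MODULUS = 2 ** 64  # all arithmetic is done modulo this ring size (assumed)


def share(value: int) -> Tuple[int, int]:
    """Split value into two additive shares that sum to value mod MODULUS."""
    share_0 = secrets.randbelow(MODULUS)   # uniformly random mask
    share_1 = (value - share_0) % MODULUS  # the value, hidden by the mask
    return share_0, share_1


def reconstruct(share_0: int, share_1: int) -> int:
    """Recombine two additive shares into the value they hide."""
    return (share_0 + share_1) % MODULUS


# Party A holds a, party B holds b; neither sends its input in the clear.
a, b = 42, 100
a_for_A, a_for_B = share(a)  # A keeps a_for_A and sends a_for_B to B
b_for_A, b_for_B = share(b)  # B keeps b_for_B and sends b_for_A to A

# Each party adds the shares it holds, without talking to the other party.
sum_share_A = (a_for_A + b_for_A) % MODULUS
sum_share_B = (a_for_B + b_for_B) % MODULUS

# Only the final exchange of the two sum shares reveals the output a + b.
result = reconstruct(sum_share_A, sum_share_B)
print(result)  # 142
```

Note that in a two-party sum the output itself lets each party deduce the other's input, so this sketch only shows the mechanics of sharing and local computation. For functions such as model inference, the output reveals far less about the inputs.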
1. Identify sensitive data
We first identify the data involved in a single inference between two parties:
- Machine Learning Model (Model Weights + Model Architecture).
- Input data for inference.
Image by author using [2].
Of these, the secret (or private) data of the two parties are:
- Model weights (obtained after training a publicly available model architecture on private data) for one party.
- Input data for the other party.
Image by author using [2].
Typically, model architectures are openly available and do not hold any proprietary data of any of the parties.
2. Strip ML model of weights
Now that we know what the secret data involved in an inference are, the next step is to strip the ML model of its model weights so that the model architecture can be shared. This is shown in the figure below.
Image by author using [2].
The above step helps us confirm that the secret data is in no way involved in generating crypto protocols, and gives us full control over our data, which we input only at the time of secure inference.
In the above image we can see the mlp.onnx model before and after its secret data (i.e., the weights and biases of all layers) is stripped and represented as input values, which means the model architecture does not contain any secret data and expects it at runtime.
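The stripping step can be sketched on a toy model representation. The dictionary layout, the layer names, and the `strip_weights` helper below are hypothetical and exist only for this illustration; they are not the ONNX format or the OnnxBridge API. The sketch shows the idea described above: parameter values move out of the model and are replaced by named inputs with known shapes.

```python
from typing import Any, Dict, List, Tuple

# Toy stand-in for a model file (hypothetical layout, not the ONNX format).
model = {
    "layers": [
        {"name": "fc1", "op": "Gemm",
         "weight": [[0.2, -0.5], [0.7, 0.1]], "bias": [0.0, 0.3]},
        {"name": "relu1", "op": "Relu"},
        {"name": "fc2", "op": "Gemm", "weight": [[1.5, -0.4]], "bias": [0.05]},
    ]
}


def shape_of(values: Any) -> List[int]:
    """Return the shape of a nested list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0]
    return shape


def strip_weights(full_model: Dict) -> Tuple[Dict, Dict]:
    """Separate the public architecture from the secret parameter values."""
    architecture = {"layers": []}
    weights = {}
    for layer in full_model["layers"]:
        public_layer = {"name": layer["name"], "op": layer["op"], "inputs": []}
        for key in ("weight", "bias"):
            if key in layer:
                param_name = f'{layer["name"]}.{key}'
                weights[param_name] = layer[key]  # stays with the model owner
                # The architecture keeps only the name and shape of the input.
                public_layer["inputs"].append(
                    {"name": param_name, "shape": shape_of(layer[key])}
                )
        architecture["layers"].append(public_layer)
    return architecture, weights


architecture, weights = strip_weights(model)
print(architecture["layers"][0])
print(sorted(weights))
```

The `architecture` dictionary can be shared, while `weights` is kept in a separate file by the model owner and supplied only at inference time.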
3. Generate SMPC protocols from architecture
Once we have the model architecture without weights, we need to convert it into cryptographically secure protocols that run on the secret data and produce output as if the model were run without any cryptography or security guarantees involved. This is done through EzPC‑OnnxBridge and is depicted below.
Image by author using [2].
4. Secure inference on private data
Finally, we need to run the crypto protocols generated above for each of the two parties involved. These protocols take the secret data as input and exchange encrypted (masked) pieces of data with each other. These messages come with strong mathematical assurances that, at any point, the data being communicated does not reveal any information about the secret data.
At the end of the computation, the output is revealed to the specified parties (one or both) involved in the computation.
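The claim that a masked message reveals nothing about the secret can be checked exhaustively on a toy example. The sketch below assumes additive masking over a deliberately tiny ring (`MODULUS = 8`, chosen only so that every mask can be enumerated); it is not the masking scheme of any specific EzPC backend. For every possible secret, it counts which messages the other party could see when the mask is chosen uniformly at random.

```python
from collections import Counter

MODULUS = 8  # tiny ring so that every possible mask can be enumerated


def messages_seen(secret: int) -> Counter:
    """Distribution of the masked message over all equally likely masks."""
    return Counter((secret - mask) % MODULUS for mask in range(MODULUS))


# Whatever the secret is, the other party sees every value equally often,
# so a single masked message carries no information about the secret.
dist_3 = messages_seen(3)
dist_5 = messages_seen(5)
print(dist_3 == dist_5)  # True
print(all(messages_seen(s) == dist_3 for s in range(MODULUS)))  # True
```

Because the distribution of the message is identical for every secret, observing the message does not help an observer distinguish one secret from another. Practical protocols use much larger rings, where the same argument holds.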
Using EzPC OnnxBridge for rs-fMRI
EzPC offers an inference‑app that serves as a front-end for SMPC operations. This application presents users with a graphical user interface (GUI) through which they can upload images and obtain results securely. Next, we’ll walk through the steps required to get the app running.
Internally, the application utilizes OnnxBridge, an end‑to‑end compiler, to convert ONNX files to SMPC cryptographic protocols. The compiler helps with the removal of confidential data from models before converting them to Secure Multi‑Party Computation (SMPC) protocols. Thus, EzPC provides a user‑friendly interface that facilitates a more secure compilation and execution of machine learning models.
Let’s take a look at the practical implementation of OnnxBridge to conduct secure inference using the mlp.onnx model specifically designed for rs‑fMRI (resting‑state functional magnetic resonance imaging) images.
The setup steps from the EzPC GitHub repo will help us get the inference‑app running. The steps are executed in the following order:
1. Install dependencies for:
- Cryptographic backend
- Compiler OnnxBridge
2. Set up server (model owner and model processing).
- Extract the MLP model from the JHU GitHub repository.
- Strip the model of its weights and save them in a file.
- Load the stripped model architecture.
- Generate the secure backend code for the model architecture.
- Share the stripped model architecture with the dealer and the client.
3. Set up dealer.
- Compile the model architecture received from the server.
- Compute and share pre‑generated randomness with the server and the client, which drastically reduces communication and speeds up inference.
- Note: No secret data is involved in generating this randomness.
4. Set up client (acting as image owner).
- Compile the model architecture received from the server.
5. Set up inference app.
- Encrypt the input image and send it to the client VM, which starts inference. See screenshots below.
Step 1: Upload the image.
Step 2: Receive encryption from dealer.
Step 3: Encrypt the image.
Step 4: Start secure inference.
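The roles of dealer, server, and client in the setup above can be sketched with a toy dot product between private weights and a private input. The sketch uses Beaver multiplication triples, a standard form of pre-generated correlated randomness, as a stand-in; the EzPC backends may rely on different techniques. All names (`dealer_triples`, `secure_dot`), the ring size, and the toy values are assumptions for this illustration, and all three roles are simulated in one process.

```python
import secrets
from typing import List, Tuple

MODULUS = 2 ** 64  # ring size, an assumption for this sketch

Shares = Tuple[int, int]  # (server's share, client's share)


def share(value: int) -> Shares:
    """Split value into two additive shares modulo MODULUS."""
    first = secrets.randbelow(MODULUS)
    return first, (value - first) % MODULUS


def dealer_triples(count: int) -> List[Tuple[Shares, Shares, Shares]]:
    """Offline phase: needs only the layer size, never the secret data."""
    triples = []
    for _ in range(count):
        u, v = secrets.randbelow(MODULUS), secrets.randbelow(MODULUS)
        triples.append((share(u), share(v), share(u * v % MODULUS)))
    return triples


def secure_dot(weights: List[int], pixels: List[int], triples) -> Shares:
    """Online phase: returns additive shares of sum(w * x)."""
    acc = [0, 0]
    for w, x, (u, v, uv) in zip(weights, pixels, triples):
        w_sh, x_sh = share(w), share(x)  # server shares w, client shares x
        # Both parties open d = w - u and e = x - v; the masks u and v make
        # these opened values look uniformly random.
        d = (w_sh[0] - u[0] + w_sh[1] - u[1]) % MODULUS
        e = (x_sh[0] - v[0] + x_sh[1] - v[1]) % MODULUS
        for party in (0, 1):
            term = uv[party] + d * v[party] + e * u[party]
            if party == 0:
                term += d * e  # public term, added by one party only
            acc[party] = (acc[party] + term) % MODULUS
    return acc[0], acc[1]


weights = [3, 1, 4]  # server's private model weights (toy values)
pixels = [2, 7, 1]   # client's private input (toy values)
triples = dealer_triples(len(weights))  # generated before any data is seen
server_share, client_share = secure_dot(weights, pixels, triples)
output = (server_share + client_share) % MODULUS
print(output)  # 17
```

The dealer's work depends only on the number of multiplications, which follows from the public architecture, so it can be done ahead of time. Real models also need fixed-point encoding and non-linear layers, which this sketch leaves out.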
With the above, we can see how EzPC provides an interface and cryptographic backends for running SMPC, with the secret data supplied only at the time of secure inference.
References
- MPC-MSRI. (2021). EzPC: Easy Secure Multi-party Computation. GitHub. Retrieved from https://github.com/MPC-MSRI/EzPC.
- AmmarPL. (2021). fMRI Classification JHU. GitHub. Retrieved from https://github.com/AmmarPL/fMRI-Classification-JHU.
- [1] Empower Medical Innovations: Intel Accelerates PadChest & fMRI Models on Microsoft Azure* Machine Learning. https://www.intel.com/content/www/us/en/developer/articles/technical/intel-accelerates-padchest-fmri-models-on-azure-ml.html
- [2] Dsouza, Trevor. Machine Learning Icon, distributed under CC BY 3.0.
- [3] Ghate, S., Santamaria-Pang, A., Tarapov, I., Sair, H., Jones, C. (2022). Deep Labeling of fMRI Brain Networks Using Cloud Based Processing. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2022. Lecture Notes in Computer Science, vol 13598. Springer, Cham. https://doi.org/10.1007/978-3-031-20713-6_21.
- [4] Yao, A. (1982). Protocols for Secure Computations. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science (pp. 160-164). IEEE.
- [5] Goldreich, O., Micali, S., & Wigderson, A. (1987). How to play any mental game or A completeness theorem for protocols with honest majority. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing (pp. 218-229). ACM.
- [6] Kumar, N., Rathee, M., Chandran, N., Gupta, D., Rastogi, A., & Sharma, R. (2020). CrypTFlow: Secure TensorFlow Inference. In Proceedings of the 41st IEEE Symposium on Security and Privacy (pp. 1247-1264). IEEE.
- [7] Rathee, D., Rathee, M., Kumar, N., Chandran, N., Gupta, D., Rastogi, A., & Sharma, R. (2020). CrypTFlow2: Practical 2 Party Secure Inference. In Proceedings of the 27th ACM Conference on Computer and Communications Security (pp. 1639-1656). ACM.
- [8] Chandran, N., Gupta, D., Rastogi, A., Sharma, R., & Tripathi, S. (2019). EzPC: Programmable and Efficient Secure Two-Party Computation for Machine Learning. In Proceedings of the 4th IEEE European Symposium on Security and Privacy (pp. 123-138). IEEE.
- [9] Gupta, K., Kumaraswamy, D., Chandran, N., Gupta, D. (2022). LLAMA: A Low Latency Math Library for Secure Inference. In Proceedings of the Privacy Enhancing Technologies Symposium (PoPETS).