Typical security measures may protect data at rest and in transit but can fall short of fully protecting data while it is actively used in memory. Intel® Software Guard Extensions (Intel® SGX) provides a protected hardware environment to secure data in use. For confidential computing, users can create virtual machines (VMs) with Intel® SGX to secure their applications during computation. However, building an end-to-end confidential computing application is not only knowledge-intensive; it also requires a sound understanding of the application, Intel® SGX, and other security components.
This blog introduces you to a confidential computing solution for Privacy-Preserving Machine Learning (PPML) made available by Occlum and BigDL on the Azure cloud. This blog demonstrates the solution using a sample analytics application built for the NYTaxi dataset. This sample application leverages Azure confidential computing (ACC) components such as SGX Nodes for Azure Kubernetes Service (AKS), Microsoft Azure Attestation, Azure Key Vault (AKV), etc., as well as Occlum LibOS and BigDL PPML.
Let’s first review a typical PPML workflow on a Kubernetes cluster, as illustrated below. The Azure solution is built by applying the same workflow on the Azure cloud using ACC components.
Users can follow the steps in the diagram above to walk through the PPML flow:
Now, let’s apply this same workflow on the Azure cloud. The diagram below illustrates the Azure PPML solution built with ACC components, Occlum LibOS and BigDL PPML.
In this Azure PPML solution, the User Application is a Spark application that can be written in Scala or Java. For our sample application, it’s a simple Spark application for querying the NYTaxi dataset. The Spark Driver takes the Spark jobs submitted by the Spark Client, distributes and schedules the work across the Spark Executors, and responds to the User Application. The Spark Executors execute the code assigned to them and report the state of the computation back to the Spark Driver.
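To make the sample concrete: the NYTaxi application issues an ordinary SQL count/aggregation through Spark SQL. The snippet below sketches the same kind of query over a few hypothetical NYTaxi-style rows using Python's built-in sqlite3, purely for illustration; the column names and values are made up, and the real application runs its SQL through Spark against Azure storage.

```python
import sqlite3

# Hypothetical stand-in for the NYTaxi query: count trips and average fare.
# The real application issues equivalent SQL via Spark SQL on the AKS cluster.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nytaxi (vendor_id INTEGER, passenger_count INTEGER, fare_amount REAL)"
)
rows = [(1, 1, 9.5), (2, 3, 14.0), (1, 2, 7.25)]
conn.executemany("INSERT INTO nytaxi VALUES (?, ?, ?)", rows)

count, avg_fare = conn.execute(
    "SELECT COUNT(*), AVG(fare_amount) FROM nytaxi"
).fetchone()
print(count, avg_fare)  # 3 10.25
```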
Intel® BigDL, the core enabler behind the end-to-end, distributed AI processing, works together with the Occlum LibOS to enable the User Application, Spark Driver, and Spark Executors to run on an SGX-enabled AKS cluster. Microsoft Azure Attestation fulfills the attestation process, Azure Data Lake Storage hosts the data to be processed, and Azure Key Vault serves as the KMS in the end-to-end workflow.
We use a sample NYTaxi dataset analytics application to demonstrate the PPML deployment procedure on an ACC cluster. The steps to deploy the solution are as follows:
Step 0: Deploy the Azure cloud services
Note that the NYTaxi data is not encrypted on storage, so secret key provisioning is not included in this demo. In a real-world deployment, it's recommended to encrypt data on storage (encryption at rest), and to set up a key management service (e.g., Azure Key Vault) and secret key provisioning as part of the deployment.
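For context, encryption at rest is usually combined with key provisioning in an envelope pattern: the data is encrypted with a data key, and that key is stored in (or wrapped by) the KMS, here Azure Key Vault. The sketch below shows only the flow, with a deliberately toy cipher; the XOR keystream is a placeholder, not real cryptography, and a real deployment should use an authenticated cipher such as AES-GCM with keys held in Azure Key Vault.

```python
import hashlib
import secrets

# Toy envelope-encryption sketch: a per-dataset data key encrypts the data;
# in production that key would itself be stored or wrapped in Azure Key Vault.
# The XOR keystream below is a PLACEHOLDER, not real cryptography.
def xor_stream(key: bytes, data: bytes) -> bytes:
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

data_key = secrets.token_bytes(32)            # data encryption key
plaintext = b"NYTaxi trip records"
ciphertext = xor_stream(data_key, plaintext)  # "encrypt" at rest
assert xor_stream(data_key, ciphertext) == plaintext  # same op decrypts
```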
Step 1: Build the sample application
Create the NYTaxi query sample application using standard Spark SQL with the Azure Storage data source.
Step 2: Submit job to the AKS Cluster
On the Azure VM, submit the NYTaxi query to AKS by running:
git clone https://github.com/intel-analytics/BigDL-PPML-Azure-Occlum-Example.git
bash run_nytaxi_k8s.sh
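Under the hood, the script submits the query as a Spark-on-Kubernetes job. As a rough, hypothetical sketch of the kind of invocation such a script performs (the image name, class, and paths below are placeholders; see run_nytaxi_k8s.sh for the actual flags):

```shell
# Illustrative sketch only -- consult run_nytaxi_k8s.sh for the real command.
spark-submit \
  --master k8s://https://<aks-api-server>:443 \
  --deploy-mode cluster \
  --name nytaxi-sgx \
  --conf spark.kubernetes.container.image=<bigdl-ppml-occlum-image> \
  --conf spark.executor.instances=2 \
  --class <main-class-of-the-nytaxi-query> \
  local:/path/to/nytaxi-sample.jar
```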
Step 3: Execute the job on the AKS Cluster
The job is executed on the AKS cluster: in the Spark driver/executor pods, the Microsoft Azure Attestation service runs the attestation process to verify the trustworthiness of the platform and the integrity of the binaries running inside it. Upon completion of the attestation process, the Spark executors run the data analysis with the Spark SQL query.
Step 4: Review the results
Upon successful completion, you should see the NYTaxi dataframe count and the aggregation duration.
To evaluate the performance of this solution, we ran a simple benchmark based on our sample application. The benchmark runs in both SGX and non-SGX environments to give an intuitive performance comparison.
Scenarios:
| Scenario | Description |
| --- | --- |
| No Intel SGX | The driver and the executors are running without SGX support using regular Spark image (vanilla Spark). |
| Occlum | The driver and the executors are encrypted and run on Intel CPUs with SGX support using BigDL Occlum image. |
Cluster info:
Results:
We ran the benchmark multiple times with 1, 2, and 3 executors, and plotted the average duration in the chart above.
The run time of the sample application consists of Initialization Time and Execution Time. For this specific sample application, the Execution Time of BigDL PPML on Occlum is 130% of vanilla Spark when running on 1 executor, and drops to 116% of vanilla Spark when running on 3 executors. This indicates that BigDL PPML on Occlum has a very limited performance impact (at most 30%) on existing Spark applications, and that this overhead shrinks as more executors are added.
The Initialization Time can be considered fixed for this SGX environment: it takes around 50 seconds regardless of whether 1 or 3 executors are used. This Initialization Time is related to the SGX enclave size; the larger the enclave, the longer initialization takes. In the near future, Initialization Time will be greatly reduced by SGX Enclave Dynamic Memory Management (EDMM). For real-world Big Data or AI applications with longer execution times, the performance impact of the initialization time is correspondingly smaller.
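The amortization effect can be shown with a little arithmetic. The sketch below combines a fixed ~50 s initialization (as measured above) with the 1.30 execution-time ratio from the 1-executor run; the vanilla execution times of 100 s and 1000 s are hypothetical, chosen only to contrast a short job with a long one.

```python
# Illustrative only: a fixed SGX initialization cost amortizes over longer runs.
INIT_S = 50.0  # roughly constant enclave initialization time (seconds)

def total_overhead_ratio(vanilla_exec_s: float, sgx_exec_ratio: float) -> float:
    """Total SGX runtime (init + execution) divided by vanilla runtime."""
    sgx_total = INIT_S + vanilla_exec_s * sgx_exec_ratio
    return sgx_total / vanilla_exec_s

# Short job: 100 s vanilla execution, 1.30 SGX execution ratio (hypothetical)
short = total_overhead_ratio(vanilla_exec_s=100.0, sgx_exec_ratio=1.30)
# Long job: same ratio, ten times more work (hypothetical)
long = total_overhead_ratio(vanilla_exec_s=1000.0, sgx_exec_ratio=1.30)
print(short, long)  # 1.8 vs 1.35: the init cost fades as the job grows
```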
These key components have been leveraged to build the end-to-end confidential computing workflow.
Intel® SGX helps protect data in use via application isolation technology. By protecting selected code and data from modification, developers can partition their applications into hardened enclaves or trusted execution modules to help increase application security.
Occlum is a memory-safe, multi-process library OS (LibOS) for Intel SGX. As a LibOS, it enables legacy applications to run on Intel® SGX with little to no modifications of source code, thus protecting the confidentiality and integrity of user workloads transparently.
Here is the high-level overview of Occlum.
Occlum also has a unique “Occlum -> init -> application” boot flow. In general, all operations that are required but are not part of the application, such as remote attestation, can be placed in the “init” process. This feature makes Occlum highly compatible with any remote attestation solution without requiring application changes. For example, to support Azure Attestation, Occlum provides the boot flow below.
This design offloads the remote attestation burden from the application. For more details, please refer to the Occlum MAA init demo and the Occlum GitHub repo.
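For reference, packaging an unmodified binary for this boot flow is done with the Occlum command-line tool; a typical sequence looks roughly like the following (the instance directory and application name are illustrative):

```shell
# Hedged sketch of the usual Occlum packaging flow (paths are illustrative).
occlum new occlum_instance        # create a new LibOS instance directory
cd occlum_instance
cp ../my_app image/bin/           # copy the unmodified application into the image
occlum build                      # build and sign the enclave image
occlum run /bin/my_app            # boot flow: Occlum -> init -> application
```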
BigDL PPML provides a distributed platform for securing and protecting the end-to-end Big Data AI pipeline, including data ingestion, data analysis, machine learning, and deep learning. In addition, it extends the single-node Trusted Execution Environment (TEE) to a Trusted Cluster Environment, allowing unmodified Big Data analysis and ML/DL programs to run securely on a private or public cloud. The diagram and tasks below show the work behind BigDL PPML:
Please refer to the BigDL GitHub repository and documentation site for more details.