Frictionless Collaborative Analytics and AI/ML on Confidential Data

Copper Contributor

Oct 27, 2022

Secure enclaves protect data from attack and unauthorized access, but confidential computing still presents significant obstacles to performing analytics and machine learning at scale across teams and organizational boundaries. Because they cannot securely run collaborative analytics and machine learning on data owned by multiple parties, organizations have had to restrict data access, eliminate data sets, mask specific data fields, or prevent data sharing outright. As a result, organizations struggle to execute on numerous use cases across verticals even as the urgency to get answers from the data increases. Use cases that have proved challenging include collaborating to identify and prevent money laundering in financial services, confidentially sharing patient information for clinical trials, sharing sensor data and manufacturing information to perform preventive maintenance, and dozens of other business-critical use cases.

The hurdles to adopting confidential computing at scale have kept organizations from getting value faster from data secured in enclaves and confidential VMs. There is an urgent need to overcome these challenges and unlock the data to deliver on key business use cases. Doing so requires innovation that includes the following capabilities:

Protecting data by processing it in secure enclaves (https://docs.microsoft.com/en-us/azure/confidential-computing/confidential-computing-enclaves).
Securely setting up a cluster of enclaves - including enabling secure key distribution, enclave integrity, and secure inter-enclave communication.
End-to-end security from disparate sources into the enclaves: encrypting data at rest and in transit and protecting data in use.
Eliminating dependencies on highly specialized in-house skills to create analytics applications and adapting machine learning (ML) frameworks to leverage enclaves.
Secure data sharing and secure collaborative analytics across multiple parties on encrypted data (inter- and intra-company).
Adaptability to regulatory compliance policies, for example those governing personal data, while sharing data and executing collaborative analytics across entities.

The Challenge

Securing data and preventing cyberattacks pose many challenges for organizations today. Encrypting data at rest and in transit is effective but incomplete. Organizations also need to verify the integrity of the code to help prevent unauthorized access and exploits. While data must be protected, it should also be effectively and appropriately shared and analyzed within and across organizations.

What’s Needed

Addressing these challenges requires a comprehensive, integrated platform that enables analytics at scale on encrypted data and secure collaborative data sharing within and across organizations. Such a solution must secure data at rest, in motion, and during processing at scale, while also supporting confidential access and enabling advanced analytics and ML within and across company boundaries.

The Opaque Platform

The Opaque Platform (https://opaque.co/) is based on technology created at UC Berkeley by world-renowned computer scientists. The original innovations were released as open source and deployed by global corporations in banking, healthcare, and other industries. Opaque Systems was founded by the creators of the MC2 open-source project (https://github.com/mc2-project/mc2) to turn it into an enterprise-ready platform, enabling analytics and AI/ML on encrypted data without exposing it unencrypted. One of the major benefits of the Opaque Platform is its support for collaboration and data sharing, which allows multiple teams of data owners to collaborate, whether inside a large organization or across companies and third parties. The Opaque Platform is a scalable confidential computing platform for collaborative analytics, AI, and data sharing that lets users or entities collaboratively analyze confidential data while still keeping the data and the analytical outcomes private to each party.

The MC2 Open-Source Project

MC2, which stands for Multi-party Collaboration and Coopetition, enables computation and collaboration on confidential data. It enables rich analytics and machine learning on encrypted data, helping ensure that data remains protected even while being processed on Azure VMs. The data in use remains hidden from the server running the job, allowing confidential workloads to be offloaded to untrusted third parties. In addition to helping protect confidential data from breaches, it enables secure collaboration, in which multiple parties - typically data owners - can jointly run analytics or ML on their collective dataset, without revealing their confidential data to anyone else.

A Software Stack that Powers Azure Secure Enclaves

MC2 can seamlessly run popular analytics and machine learning frameworks such as Apache Spark and XGBoost within enclaves securely and efficiently. End-users can focus on data analysis instead of mastering the complexities of writing enclave code.

One approach to leveraging secure enclave technology is to simply load the entire application into the enclave. This, however, hurts both the security and the efficiency of the enclave application; memory-intensive applications, for example, will perform poorly. MC2 instead partitions the application so that only the components that need to operate directly on the sensitive data are loaded into the enclave on Azure confidential computing VMs, such as the DCsv3-series (https://docs.microsoft.com/en-us/azure/virtual-machines/dcv3-series). Other elements, including those responsible for network communication and task scheduling, are executed outside of the enclave. This reduces the potential attack surface by minimizing the amount of code that runs within the enclave.
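To make the partitioning idea concrete, here is a minimal Python sketch. It is not MC2 code: the `Enclave` class, `untrusted_scheduler`, and the `seal`/`unseal` helpers are hypothetical names invented for illustration, and the cipher is a toy stand-in (a SHA-256 keystream plus an HMAC tag) for the authenticated encryption a real deployment would use. The point is only the shape of the split: the scheduler outside the enclave routes opaque blobs and never holds the key, while the small trusted component is the only code that sees plaintext.

```python
import hashlib
import hmac
import json
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt then MAC. A stand-in for real authenticated encryption."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    """Check the tag first, then decrypt. Raises ValueError on tampering."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext failed integrity check")
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class Enclave:
    """Hypothetical trusted partition: the only code that ever sees plaintext."""

    def __init__(self, data_key: bytes) -> None:
        self._key = data_key

    def sum_rows(self, sealed_rows: bytes) -> bytes:
        rows = json.loads(unseal(self._key, sealed_rows))
        return seal(self._key, json.dumps(sum(rows)).encode())


def untrusted_scheduler(enclave: Enclave, sealed_partitions: list) -> list:
    """Hypothetical untrusted partition: schedules work on opaque blobs only."""
    return [enclave.sum_rows(blob) for blob in sealed_partitions]


# The data owner seals each partition before handing it to the scheduler.
key = secrets.token_bytes(32)
partitions = [seal(key, json.dumps(rows).encode()) for rows in ([1, 2, 3], [10, 20])]
results = untrusted_scheduler(Enclave(key), partitions)
totals = [json.loads(unseal(key, blob)) for blob in results]
```

In this sketch the scheduler could be arbitrarily large and complex without widening the trusted code base, which is the attack-surface argument made above.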

MC2 also fortifies the enclave components through cryptographic techniques that provide stronger security guarantees. This is achieved in two ways:

To verify the integrity of jobs with distributed execution characteristics, MC2 leverages a variety of built-in measures, such as distributed integrity verification.
To protect against side-channel attacks, MC2 uses data-oblivious techniques, which help ensure that memory access patterns reveal no information about the sensitive data being accessed.
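As an illustration of what data obliviousness means for memory access patterns, the following Python sketch reads one element of a list by touching every element and selecting the wanted one with a bit mask. This is a generic textbook technique, not MC2's implementation, and the function name and `trace` parameter are assumptions made for the example. Python gives no constant-time guarantees, so the sketch demonstrates only the access pattern, not a hardened implementation.

```python
from typing import List, Optional


def oblivious_read(values: List[int], secret_index: int,
                   trace: Optional[List[int]] = None) -> int:
    """Return values[secret_index] while visiting every position in order.

    An observer who sees only which positions are touched learns nothing
    about secret_index, because the visited positions are always 0..n-1.
    """
    result = 0
    for i, value in enumerate(values):
        if trace is not None:
            trace.append(i)          # record the position an observer would see
        mask = -int(i == secret_index)  # -1 (all one bits) on a match, else 0
        result |= value & mask          # keeps value on a match, adds 0 otherwise
    return result


data = [7, 13, 42, 99]
trace_a: List[int] = []
trace_b: List[int] = []
first = oblivious_read(data, 2, trace_a)
second = oblivious_read(data, 0, trace_b)
```

A direct `data[2]` lookup would touch only position 2 and so reveal the index. Here both traces are identical, at the cost of a full scan per read. Practical oblivious algorithms aim to reduce that overhead.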

High-Performance Analytics and AI/ML on Encrypted Data

The Opaque Platform extends MC2 and adds capabilities essential for enterprise deployments. It allows you to run analytics and ML at scale on hardware-protected data while collaborating securely within and across organizational boundaries. Using our platform, you can upload encrypted data or connect to disparate encrypted sources. You can then edit and execute high-performance SQL queries, analytics jobs, and AI/ML models using familiar notebooks and analytical tools. Verifying cluster deployments via remote attestation becomes a single-click process.
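For readers unfamiliar with remote attestation, the sketch below shows the kind of check a verifier performs before trusting an enclave. It is a simplified model, not the Opaque Platform's or Azure's API. The `Quote` type and both functions are hypothetical, and an HMAC under a shared key stands in for the hardware-rooted asymmetric signature and certificate chain that real attestation relies on. The verifier confirms three things: the quote is authentic, the enclave is running the expected code (its measurement), and the quote is fresh (it echoes the verifier's nonce).

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical attestation quote produced by the enclave hardware."""
    measurement: bytes  # hash of the code loaded into the enclave
    nonce: bytes        # freshness challenge supplied by the verifier
    signature: bytes    # stand-in for a hardware-rooted signature


def sign_quote(hardware_key: bytes, measurement: bytes, nonce: bytes) -> Quote:
    """Simulate the hardware signing a quote (HMAC used as a stand-in)."""
    signature = hmac.new(hardware_key, measurement + nonce, hashlib.sha256).digest()
    return Quote(measurement, nonce, signature)


def verify_quote(quote: Quote, hardware_key: bytes,
                 expected_measurement: bytes, nonce: bytes) -> bool:
    """Accept only an authentic, fresh quote for the expected code."""
    expected_signature = hmac.new(
        hardware_key, quote.measurement + quote.nonce, hashlib.sha256
    ).digest()
    return (
        hmac.compare_digest(quote.signature, expected_signature)
        and hmac.compare_digest(quote.measurement, expected_measurement)
        and hmac.compare_digest(quote.nonce, nonce)
    )


# Fixed example values so the sketch is reproducible.
hardware_key = b"k" * 32
approved_code = hashlib.sha256(b"approved enclave build").digest()
modified_code = hashlib.sha256(b"modified enclave build").digest()
challenge = b"nonce-0001"

good_quote = sign_quote(hardware_key, approved_code, challenge)
bad_quote = sign_quote(hardware_key, modified_code, challenge)
```

Only after a check like this succeeds would a data owner release keys or encrypted data to the cluster.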

The platform makes it easy to establish confidential collaboration workspaces across multiple users and teams and combine encrypted data sets without exposing data across team boundaries. It removes the hassle of setting up and scaling enclave clusters and automates orchestration and cluster management. On top of that, the Opaque Platform leverages multiple layers of security to provide defense in depth and fortify enclave components with cryptographic techniques, using only NIST-approved encryption.

About Opaque Systems

Opaque makes confidential data useful by enabling secure analytics and AI directly on encrypted data from one or more data sources, allowing customers to share and collaborate on confidential data within their business ecosystem. We are actively working with financial institutions and healthcare companies to facilitate confidential data collaboration across teams and companies, helping them extract better insights on customers, assess risk, detect fraud, and combat financial crime.

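Watch the on-demand webinar "Making Confidential Computing and Secure Collaboration Frictionless," presented by Opaque Systems: https://info.microsoft.com/CO-HCS-WBNR-FY22-07Jul-13-Making-Confidential-Computing-and-Secure-Collaboration-Frictionless-Presented-by-Opaque-Systems-7410_LP02-On-Demand-Registration---Form-in-Body.html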

Contact us at hello@opaque.co to learn more about our platform and technology, request a demo, or discuss a proof-of-concept deployment.

Updated Nov 11, 2022

Azure Confidential Computing Blog
