Enabling data clean rooms with confidential computing
Published Jan 03 2024 10:09 AM
Microsoft

For many companies, today’s data collaboration landscape offers exciting opportunities to harness valuable insights that drive market advantage. Powered by rapid data clean room adoption, enterprises across industries are moving quickly to launch and expand their data collaboration initiatives.

 

However, this same data collaboration landscape can look like a minefield of potential data breaches and liability for heavily regulated industries that routinely handle highly sensitive data and personally identifiable information (PII). To confidently pursue the data collaboration opportunities before them, enterprises in these industries (financial services, healthcare, telecom, and more) require the highest levels of data security in their collaboration infrastructure.

 

Confidential computing provides the additional security of a hardware-based, trusted execution environment (TEE) to prevent unauthorized access to data or code during processing. Confidential computing uses attestation to verify the integrity of the computing environment (including hardware and software) before allowing access to sensitive data.

 

Potential Use Cases

Data clean rooms backed by confidential computing have applications in any industry, but these regulated industries stand to benefit the most. For example, in financial services, a data clean room can support several critical use cases, such as:

  • Improved fraud detection and anti-money laundering tactics. To effectively expose and mitigate financial fraud, institutions must analyze a wide range of data to identify patterns, anomalies, and suspicious activities. Data clean rooms make more data available for analysis, which can make fraud detection faster and more accurate.
  • Deeper insights into portfolio risk. Data clean rooms give access to external data that adds broader context, such as economic indicators, market trends, and other relevant information, which enriches the assessment of potential risks and improves the precision of risk management strategies.
  • Refined insurance rate calculations. Access to more data from inside a data clean room enables insurers to understand and assess risk factors, incorporating a comprehensive range of variables such as demographics, behavior patterns, and external market trends. This allows for more accurate risk predictions, leading to fairer and more customized insurance pricing.
  • Tailored recommendations of adjacent products. Data clean rooms enable banks to understand customer behavior, preferences, and financial patterns, and tailor recommendations of adjacent products that align with individual needs and circumstances. This data-driven approach enhances customer engagement and satisfaction by offering personalized and relevant financial products and services.

 

Within the healthcare ecosystem, data clean rooms empower research teams to access sensitive datasets, such as real-world data, for sophisticated analytics while protecting confidential patient information. Example use cases in healthcare include:

  • Collaborative clinical trial recruitment. Run proprietary models on patient data to identify likely candidates who meet clinical trial enrollment criteria, without revealing model weights.
  • Post-trial analysis with RWE and SDOH data. Leverage Habu’s strong network of owners of social determinants of health (SDOH) data to enable collaborative exploration of joint datasets, including real-world evidence (RWE), for post-trial insights without exposing PII or compromising trust.
  • Improved collaboration with consortiums. Collaborations between public and private entities often face challenges due to misaligned incentives. Data clean rooms streamline collaboration by mitigating legal and ethical complexities, fostering a more cooperative and prosperous environment.

 

Architecture

 

[Figure: architecture diagram of the solution]

 

 

Workflow

The solution involves the following steps:

  • Using AES symmetric encryption, the owner of the data clean room encrypts its datasets, wraps the AES key with the public key of an asymmetric RSA key pair, and stores the wrapped encryption key in its Azure Key Vault.
  • The partner follows the above approach with their datasets.
  • Both parties configure their Azure Key Vault to grant Habu access to the requisite data encryption keys.
  • The owner and Habu co-author a program that determines the results and runs it in a TEE provided by Habu and backed by AMD SEV-SNP Confidential VMs.
  • In this TEE, Habu obtains an attestation token and presents it to the owner’s and the partner’s Azure Key Vaults, which then release their respective private keys.
  • Habu decrypts both datasets in the TEE and writes the result to a shared storage account in Habu.
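
The key-release steps above can be sketched as a small simulation. This is a minimal illustration of the control flow only, not the Azure Key Vault or Habu API: the `SimulatedKeyVault` class, the claim names (`tee_type`, `compliance_status`), and their values are all hypothetical, and no real cryptography or attestation is performed.

```python
from dataclasses import dataclass

# Hypothetical claim names and values, chosen for illustration only.
REQUIRED_CLAIMS = {
    "tee_type": "sevsnpvm",
    "compliance_status": "compliant",
}


class KeyReleaseDenied(Exception):
    """Raised when attestation claims do not satisfy a release policy."""


@dataclass
class SimulatedKeyVault:
    """Stand-in for one party's key vault; not the Azure Key Vault API."""
    owner: str
    key_material: bytes      # placeholder for the protected key
    release_policy: dict     # claims the TEE must present

    def release_key(self, attestation_claims: dict) -> bytes:
        # Release only if every claim required by the policy matches.
        for claim, expected in self.release_policy.items():
            if attestation_claims.get(claim) != expected:
                raise KeyReleaseDenied(
                    f"{self.owner}: claim {claim!r} not satisfied"
                )
        return self.key_material


def collect_keys(vaults, attestation_claims):
    """Ask each party's vault for its key; fail if any vault refuses."""
    return {v.owner: v.release_key(attestation_claims) for v in vaults}


vaults = [
    SimulatedKeyVault("owner", b"owner-key", REQUIRED_CLAIMS),
    SimulatedKeyVault("partner", b"partner-key", REQUIRED_CLAIMS),
]

# Claims as a trusted TEE and an untrusted environment might present them.
trusted = {"tee_type": "sevsnpvm", "compliance_status": "compliant"}
untrusted = {"tee_type": "none", "compliance_status": "unknown"}

released = collect_keys(vaults, trusted)
```

Calling `collect_keys(vaults, untrusted)` raises `KeyReleaseDenied`, which mirrors the intent of the workflow: each party's key leaves its vault only when the environment requesting it can prove it is the expected TEE.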

 

Components

  • Azure Kubernetes Service supports the addition of confidential computing VM nodes (as agent pools in a cluster) to allow sensitive workloads to run within a hardware-based trusted execution environment.
  • Microsoft Azure Attestation uses a combination of hardware- and software-based attestation to verify the identity and integrity of the trusted execution environment before a sensitive workload is launched. It confirms that the workload runs on trusted hardware that is not compromised, verifies the integrity and authenticity of software running on VMs or containers, and ensures keys are only released to trusted software components.
  • Azure Key Vault allows the data clean room owner and its collaboration partners to safely store and manage cryptographic keys, certificates, and other secrets used for encryption and authentication.
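
Attestation results are typically conveyed as a signed token in JSON Web Token (JWT) format. As a rough sketch for inspecting such a token during debugging, the snippet below decodes the payload claims using only the Python standard library. It deliberately does not verify the signature, so it must never be used to make trust decisions. The helper names, the sample token, and the `tee_type` claim are hypothetical and built here purely for demonstration.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # base64url segments in JWTs omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# A fake, unsigned token for demonstration; claim names are illustrative.
sample_claims = {"iss": "https://example.invalid/attest", "tee_type": "sevsnpvm"}
sample_token = ".".join([_b64url({"alg": "none"}), _b64url(sample_claims), "sig"])

claims = decode_jwt_claims(sample_token)
```

In a real deployment, the relying party (here, each Key Vault) validates the token's signature and issuer before evaluating any claim; this sketch only shows how the claims are laid out inside the token.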

 

Considerations

This architecture implements the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see the Microsoft Azure Well-Architected Framework.  

Security

This solution uses Habu data clean rooms enabled by confidential computing to shield data from being read or modified by any code outside a trusted execution environment. With the combined privacy capabilities of a Habu data clean room, Microsoft Azure confidential computing environment, and AMD EPYC™ processors with Infinity Guard featuring Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP) technology, organizations can be sure they’re collaborating with their partners without compromising sensitive data.

Cost optimization

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see the guidance for the cost optimization pillar.

For a deployment in a single region, example pricing information is available in the Pricing calculator.

 

Deploy this scenario

Habu data clean rooms are available as a low-code/no-code SaaS offering. Learn more about Habu in the Azure Marketplace.  

 

Last update: Jan 29 2024 10:43 AM