Combining Azure confidential computing capabilities with Sarus unlocks new possibilities for bringing together sensitive data from multiple parties. Working with multiple banks, we demonstrated how a joint solution can track financial crime by pooling transaction data. It significantly improved predictive power compared to siloed data while achieving the highest standards of data security and privacy.
Confidential computing solutions are increasingly popular for processing sensitive data. They bring unprecedented levels of security to the processing of personal data in a cloud environment. Bringing data from multiple parties onto a confidential computing platform protects against leakage to the cloud vendor and to other parties. But data is only useful when it can be queried, and confidential computing alone provides little protection for the output of queries. How can one guarantee that the outputs themselves do not leak confidential information?
This problem is commonly referred to as output privacy and has been a field of intense research over the past decades. In 2006, Cynthia Dwork, then a researcher at Microsoft, introduced the concept of Differential Privacy as a mathematical definition of the leakage risk of personal information in a computation’s output.
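To make the idea concrete, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. It is a textbook illustration and not how Sarus implements differential privacy; the function name `dp_count`, the sample amounts, and the threshold are all hypothetical. Adding or removing one record changes a count by at most 1, so noise drawn from a Laplace distribution with scale 1/epsilon is enough to satisfy epsilon-differential privacy for this query.

```python
import random


def dp_count(values, predicate, epsilon, rng):
    """Return a noisy count of the items in `values` that satisfy `predicate`.

    A count has sensitivity 1 (one record changes it by at most 1), so
    Laplace noise with scale 1/epsilon yields epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    # The difference of two independent Exponential(rate=epsilon) draws
    # follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical transaction amounts, for illustration only.
amounts = [120.0, 15_000.0, 87.5, 22_000.0, 430.0, 9_999.0]
rng = random.Random(0)
noisy = dp_count(amounts, lambda a: a > 10_000, epsilon=1.0, rng=rng)
print(f"noisy count of large transactions: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger protection; a larger epsilon means a more accurate answer. A production implementation would also need to track the cumulative privacy budget across queries and guard against floating-point side channels, which this sketch ignores.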
The universality of differential privacy makes it the perfect tool for scaling data access while minimizing leakage risk. It can be applied to any data processing workflow without requiring bespoke compliance processes. Sarus implements it in all data manipulation so that outputs never reveal sensitive information to data consumers.
In our work, data scientists built models to track and predict criminal activities in financial transactions. The flexibility of Sarus enabled them to experiment with many advanced detection approaches, both rules-based and machine learning-based, and to successfully ship powerful detection models. At no point during the analysis were they able to see transaction data.
The following services were provisioned:
The Data Science VM was the only entry point to the architecture. It was accessible from the Internet via Azure Bastion, and a network security group (NSG) restricted its access within the environment to the Sarus VM on port 80: it could not reach any other resource in the Azure-hosted virtual environment. It also had outbound access to the Internet.
The data scientists used the Sarus Private Learning SDK to build a comprehensive data processing pipeline and design detection models. The SDK wraps the most common data manipulation libraries (e.g., NumPy, pandas, scikit-learn). It sends calls to those libraries to the Sarus API, where the Sarus software first compiles the requested computation into a differentially private version and then executes it on the real data, so that the output is safe.
The data scientists progressively built up their data pipeline while interacting with the remote data through Sarus. They were able to design models that achieved the same performance as if they had been allowed to download the entire dataset to their workstations. They used the same tools and wrote identical code. The crucial difference is that the entire project was carried out without granting access to a single bit of user-level information.
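The general pattern described here, a client-side wrapper that records operations and a remote side that rewrites and executes them, can be sketched in a few lines. This is a toy model under stated assumptions and not the Sarus SDK or API: `RemoteColumn`, `ToyServer`, and their methods are hypothetical names, and the only supported computation is a clipped sum protected with Laplace noise.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RemoteColumn:
    """Hypothetical client-side handle: records operations, holds no data."""
    name: str
    ops: list = field(default_factory=list)

    def clip(self, lower, upper):
        return RemoteColumn(self.name, self.ops + [("clip", lower, upper)])

    def sum(self):
        return RemoteColumn(self.name, self.ops + [("sum",)])


class ToyServer:
    """Hypothetical stand-in for the remote side that holds the real data."""

    def __init__(self, table, epsilon, seed=None):
        self._table = table
        self._epsilon = epsilon
        self._rng = random.Random(seed)

    def execute(self, request):
        # Only a clipped sum is accepted in this toy: clipping bounds each
        # row's influence, which lets the server calibrate the noise.
        if [op[0] for op in request.ops] != ["clip", "sum"]:
            raise ValueError("unsupported computation in this toy example")
        _, lower, upper = request.ops[0]
        if not lower < upper:
            raise ValueError("clip bounds must satisfy lower < upper")
        clipped = [min(max(v, lower), upper) for v in self._table[request.name]]
        # Adding or removing one row changes the sum by at most this much.
        sensitivity = max(abs(lower), abs(upper))
        rate = self._epsilon / sensitivity
        # Difference of two exponentials = Laplace noise, scale sensitivity/epsilon.
        noise = self._rng.expovariate(rate) - self._rng.expovariate(rate)
        return sum(clipped) + noise


# Hypothetical data; the analyst only ever handles `request` and the noisy result.
server = ToyServer({"amount": [120.0, 15_000.0, 87.5]}, epsilon=1.0, seed=0)
request = RemoteColumn("amount").clip(0, 1_000).sum()
print(request.ops)
print(server.execute(request))
```

The point of the design is that the analyst writes ordinary-looking method chains while the raw rows never leave the server; only the recorded request travels one way and the protected result travels back.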
Data science Python code sample that is captured by the SDK and executed on the remote data.
We demonstrated that data collaboration on sensitive data can be delivered at scale using modern cloud architectures provided by Azure. The end-to-end data protection does not get in the way of the creativity of data scientists and analysts to solve new challenges. The applications go well beyond financial crime, with an obvious fit in insurance, healthcare, smart cities, or mobility.
About Sarus: Sarus is a privacy startup that enables research and data science on confidential data. It implements the most recent research in privacy technologies and brings it directly to the data warehouse. It was part of the Y Combinator W22 batch and is a Microsoft Partner available on the Azure Marketplace.