At the launch in early December, we gave you a sneak peek of the ability to manage multicloud data sources with Azure Purview. Today, I'm happy to announce that you can now use Azure Purview to discover, manage, and govern data residing in Amazon Web Services S3, in public preview.
A few reasons why you should consider governing your AWS S3 data with Azure Purview:
Discover data stored in Amazon S3 buckets: Scan data in Amazon S3 buckets in a matter of clicks with the Azure Purview scanner, without deploying software or performing complex configurations in your AWS environment. The Azure Purview scanner is an automated scanning and classification agent that reads metadata and sample content to capture each data asset and classify it.
Security insights: Discover the types of sensitive data you have stored in Amazon S3 buckets, and pinpoint where it is located. Verify that your sensitive data is not stored in unexpected locations, avoid data leakage, and follow compliance regulations.
Ensure data isolation and compliance: The Purview scanner that runs in the Microsoft account in AWS features full data isolation and complies with the highest Microsoft standards for data privacy. The Purview scanner does not store any customer data.
Simplified billing: The billing model to scan and classify AWS S3 sources is the same as for any Azure data source. For details, refer to http://aka.ms/Purviewpricing
Now, let's dive into what you can achieve with this feature!
1. Scan Amazon S3 buckets
Azure Purview now provides a managed, built-in solution to explore and govern data across your data estate, including both Azure storage services and Amazon S3 buckets.
Azure Purview uses unique technology to scan Amazon S3 bucket data, including an easy setup and configuration process and the highest Microsoft standards for data privacy:
The Purview scanner is deployed in a Microsoft account in AWS.
Scans are initiated by a simple configuration in Azure Purview, and do not require manual service deployments or maintenance.
Scanner access to the organization’s S3 buckets is granted by a dedicated role in AWS.
The Purview scanning setup ensures full data privacy by scanning Amazon S3 data locally in AWS. The scanning service uses full data isolation and does not store any data in AWS. Only the scanning results and metadata are sent to the Azure Purview data map, where they are displayed for administrators together with the scanning results from Azure services.
The Purview roadmap includes additions for even more non-Azure storage services and aims to strengthen Azure Purview’s multicloud capabilities, empowering data administrators to maximize the value of their data with a single view across their clouds.
2. Configure Amazon S3 in Azure Purview
Similar to the Azure data sources in Purview, you first need to register the Amazon S3 bucket as a Purview data source, and then initiate your scan.
You can either register a single Amazon S3 bucket, to scan just that bucket, or register an AWS account, to scan all S3 buckets in the account.
When setting up the scan of an Amazon S3 bucket or AWS account, you need to provide credentials that allow the Purview scanner to access the organization’s S3 buckets.
To grant this access, you first need to create a role in AWS Identity & Access Management. This role requires read-only access to the S3 buckets you wish to scan. If the buckets are KMS-encrypted, the role also needs a decrypt permission.
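As a rough illustration of what such a role's permissions might look like, here is a minimal Python sketch that builds an IAM policy document with read-only S3 access and an optional KMS decrypt statement. The function name, the bucket name, and the exact list of S3 actions are assumptions made for this example; the announcement only states that read-only access (plus decrypt for KMS-encrypted buckets) is required, so check the Purview documentation for the permissions it actually expects.

```python
import json


def build_read_only_policy(bucket_names, kms_key_arns=None):
    """Build an IAM policy document granting read-only access to S3 buckets.

    Hypothetical helper: the action list below is an assumed minimal
    read-only set, not a list taken from the Purview documentation.
    """
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself (listing)
        resources.append(f"arn:aws:s3:::{name}/*")  # the objects inside it

    statements = [{
        "Sid": "ReadOnlyS3",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
        "Resource": resources,
    }]

    # Only add the decrypt statement when the buckets are KMS-encrypted.
    if kms_key_arns:
        statements.append({
            "Sid": "DecryptKms",
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": list(kms_key_arns),
        })

    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # "my-data-bucket" is a placeholder bucket name.
    print(json.dumps(build_read_only_policy(["my-data-bucket"]), indent=2))
```

The resulting JSON can be pasted into the policy editor when you attach permissions to the role.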
To keep your buckets secure and ensure this new role can only be used for your Purview scanning, use these configurations when creating the role:
Microsoft account ID – to allow access to the buckets from the Microsoft account only
External ID – a unique identifier for your Purview account used for accessing the bucket, for an additional layer of security
You get both the Microsoft account ID and the external ID values when you create a Purview credential object. Copy and paste them into the AWS Identity & Access Management role creation screens.
Once the role is created, copy the role ARN value from AWS, and paste it into the Purview credential object in the Purview portal. Then use the credential object to initiate a scan on your Amazon S3 bucket or AWS account.
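For readers who prefer to see the role's trust relationship as JSON, the sketch below shows how a trusted account ID and an external ID typically combine in a standard IAM trust policy. The function name and both sample values are placeholders invented for this example; use the values shown in your own Purview credential object.

```python
import json


def build_trust_policy(microsoft_account_id, external_id):
    """Build an IAM trust policy that lets one AWS account assume the role,
    and only when the caller presents the expected external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Only principals in this AWS account may assume the role.
            "Principal": {"AWS": f"arn:aws:iam::{microsoft_account_id}:root"},
            "Action": "sts:AssumeRole",
            # The external ID must match, which adds a second layer of security.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


if __name__ == "__main__":
    # Placeholder values, not real Microsoft or Purview identifiers.
    trust = build_trust_policy("111122223333", "example-external-id")
    print(json.dumps(trust, indent=2))
```

The AWS console generates an equivalent policy when you choose another AWS account as the trusted entity and require an external ID.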
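Because the role ARN is copied by hand between two portals, a quick sanity check can catch paste errors before you start a scan. The following is a small sketch, assuming the usual IAM role ARN layout (partition, 12-digit account ID, role path and name); the function name and the sample ARN are made up for illustration.

```python
import re

# Typical IAM role ARN layout: arn:<partition>:iam::<12-digit account>:role/<path/name>
_ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::(\d{12}):role/([\w+=,.@/-]+)$", re.ASCII)


def parse_role_arn(arn):
    """Return (account_id, role_name) or raise ValueError for a malformed ARN."""
    match = _ROLE_ARN.match(arn.strip())  # strip stray whitespace from copy-paste
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # Hypothetical ARN for illustration only.
    print(parse_role_arn("arn:aws:iam::123456789012:role/PurviewScannerRole"))
```

The check only validates the shape of the string; it does not confirm that the role exists or that its trust policy is correct.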
3. Enable discovery of Amazon S3 data by your data consumers in Azure Purview Data Catalog
4. Get granular insights into sensitive data within your AWS S3 sources:
In the insight reports, see a unified view of all scanned data, including AWS S3.