Azure Purview is a unified data governance tool that helps you manage and govern your on-prem, Azure, multi-cloud, and SaaS data. One of its multi-cloud features lets customers scan data stored in their AWS S3 buckets to discover which sensitive information types it contains.
This blog post shows how to configure such a scan in a few steps.
We assume you already have a Purview account. If you do not, please follow the instructions in our documentation.
Step 1:
First, create an AWS role that allows Purview to scan the S3 buckets. An AWS role is an AWS identity with permission policies that determine what the identity can do in AWS.
Follow the instructions for creating the AWS role.
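To make the idea of a role with permission policies more concrete, here is a minimal Python sketch of the trust policy document such a role might carry. It assumes the role is a cross-account role that Purview assumes using an account ID and an external ID. The function name `build_trust_policy` and both example values are hypothetical placeholders, not values taken from the linked instructions.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets another AWS account assume a role.

    Both arguments are placeholders (assumptions); use the values shown
    to you when you set up the role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Requiring an external ID restricts who can assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Example with made-up values: print the policy as JSON.
print(json.dumps(build_trust_policy("123456789012", "example-external-id"), indent=2))
```

The trust policy only controls who may assume the role. What the role can do, such as reading S3 objects, is granted separately through the permission policies attached to it.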
Step 2:
Here you’ll find how to retrieve the ARN of the AWS role created in the previous step.
Afterwards, create a Purview credential for accessing AWS S3. In general, a credential is information that Azure Purview uses to authenticate to registered data sources. Follow these instructions for creating the credential.
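Because the credential depends on the role ARN being copied correctly, a quick sanity check can help before pasting it. The following sketch assumes the standard IAM role ARN layout `arn:<partition>:iam::<account-id>:role/<role-name>`. The helper name `parse_role_arn` and the example ARN are hypothetical.

```python
import re
from typing import Tuple

# Assumed layout: arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
_ROLE_ARN = re.compile(r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)")


def parse_role_arn(arn: str) -> Tuple[str, str]:
    """Return (account_id, role_name_with_path) or raise ValueError."""
    match = _ROLE_ARN.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return match.group(2), match.group(3)


# Example with a made-up ARN.
account_id, role_name = parse_role_arn("arn:aws:iam::123456789012:role/purview-scan-role")
print(account_id, role_name)
```

This only checks the shape of the string. It does not confirm that the role exists or that Purview is allowed to assume it.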
Optional Step 3:
This step is only necessary if any of the S3 buckets are encrypted; otherwise, it may be skipped.
Please follow this guideline to ensure Purview is able to scan encrypted S3 buckets.
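As an illustration of what such a configuration can involve, the sketch below builds a permission policy that grants decrypt access on one KMS key. It assumes the buckets are encrypted with an AWS KMS key and that the scanning role needs the `kms:Decrypt` action on that key; follow the linked guideline for the authoritative steps. The function name `build_kms_decrypt_policy` and the key ARN are hypothetical.

```python
import json


def build_kms_decrypt_policy(key_arn: str) -> dict:
    """Build a permission policy allowing decryption with a single KMS key.

    key_arn is a placeholder (assumption) for the key that encrypts the bucket.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                # Scoping to one key is narrower than allowing all keys ("*").
                "Resource": [key_arn],
            }
        ],
    }


# Example with a made-up key ARN.
example_key = "arn:aws:kms:us-east-1:123456789012:key/00000000-0000-0000-0000-000000000000"
print(json.dumps(build_kms_decrypt_policy(example_key), indent=2))
```

Such a policy would be attached to the role from Step 1, alongside its S3 read permissions.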
Step 4:
This step applies if you intend to register and scan a single Amazon S3 bucket as a Purview resource. If you plan to scan many S3 buckets of an AWS account, proceed to Step 5 of this blog post instead.
Retrieve your AWS S3 bucket name as documented here.
Afterwards, add a single Amazon S3 bucket as a Purview resource according to the following instructions. This ensures the bucket can be scanned by Purview.
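When registering the bucket, a malformed name is an easy mistake to make. The sketch below checks a bucket name against a subset of the S3 naming rules and formats it as a URL. It assumes that the registration form expects the `s3://<bucketName>` syntax; the helper name `bucket_url` and the example name are hypothetical.

```python
import re

# Subset of the S3 naming rules: 3 to 63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_url(bucket_name: str) -> str:
    """Return s3://<bucket_name> after a basic name check, else raise ValueError."""
    if _BUCKET_NAME.fullmatch(bucket_name) is None or ".." in bucket_name:
        raise ValueError(f"invalid S3 bucket name: {bucket_name!r}")
    return f"s3://{bucket_name}"


# Example with a made-up bucket name.
print(bucket_url("my-data-bucket"))
```

The check is deliberately incomplete (for example, it does not reject names formatted like IP addresses), so treat it as a first filter rather than full validation.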
Optional Step 5:
Perform this step if you plan to scan all S3 buckets of an AWS account instead of a single AWS S3 bucket. Adding an AWS account as a data source ensures that the S3 buckets contained in the AWS account can be scanned by Azure Purview.
Here you’ll find guidance on how to add an AWS account as a Purview resource.
When following the instructions, note that the layout of the AWS IAM dashboard has changed, so it may differ from what the instructions show.
Step 6:
Depending on whether you completed Step 4 or Step 5, this step scans either a single AWS S3 bucket or all S3 buckets of an AWS account. By scanning an S3 bucket, Azure Purview learns which sensitive data is stored there.
If you intend to scan only selected file types or to limit scanning to a subset of classification rules, consider creating a custom scan rule set.
Follow the instructions to create a scan for one or more AWS S3 buckets.
Step 7:
In the last step, we explore the Purview scanning results. As illustrated here, you can drill down into the Purview data sources.