Video Transcript:
-The data you work with every day lives across multiple clouds, services and on-premises locations.
-Rather than using multiple tool sets to get a handle on it, especially in the age of GPT and generative AI, today I’ll show you the latest updates to Microsoft Purview, which gives you a single unified solution for discovering, understanding and protecting sensitive information of different types, both structured and unstructured, across your multicloud data estate at scale, whether it resides in Microsoft 365, Azure, non-Microsoft clouds, SaaS services, or even on-premises in your data center.
-Importantly, the data classifications and protections that you define with Microsoft Purview persist in place without you having to migrate your data into the Microsoft Cloud.
-Over time, Microsoft Purview derives insights into your protection posture and trends to help prioritize locations where data in your organization might be exposed to risk. And from one place you can access data security, data governance, data compliance, and more.
-Additionally, under Information Protection, you can find proactive recommendations and valuable insights around data you’ll want to secure right from this unified experience. And the policies you put in place with Microsoft Purview will protect your data wherever it is, even as you work with it using native tools in different clouds.
-Let me show you. Here I’m a data engineer and my tool of choice is Azure Data Factory for ETL and pipelining to help generate reports. I’m working on customer billing and related reporting. We’re pulling in data from multiple sources on the left. The customer profile data is coming from Azure Data Lake Storage and it’s classified as confidential.
-And the department data is also coming from Azure Data Lake Storage. There are also three other data sources for payment info, transactions and fiscal year. These are sitting in Amazon S3 buckets.
-The payment info data is also classified as confidential, as is the transactions data, even though they reside in non-Microsoft cloud locations. All these curated sources feed into a SQL table called Customers. The confidential sensitivity label here shows that the data classifications have also been applied to its contents. This table in turn feeds our Power BI reports. So now let’s look at my data access experience.
-Here in SQL Server Management Studio, I’ll run a top 10 query with a few sensitive data fields. You’ll see that it returns all of the results, including sensitive information like bank routing numbers and credit cards. That’s because in my role as a data engineer, I have the right permissions to curate, access, and perform operations on sensitive data.
-Now, let’s compare the experience if I was an external vendor with limited data access privileges. Here I’m in the Purview data catalog and I’m able to see the data schema for this table with column names and data types. This lets me quickly see which columns are available in the table.
-Now, as I query data for my report, notice that I’m unable to connect, because a few of the fields are classified as confidential. On the other hand, when I update and run the report with non-sensitive data fields, I can access the data and continue my work. And in case you’re wondering how these protections apply outside of Microsoft tooling, I happen to also have access to the source data in the S3 bucket using the AWS Management Console.
-In this case, I’m using a CSV file containing sensitive information, and I’ll query it. When I run a query against it here as well, my access is denied. The protections persist. This level of automated and unified classification and protection from one common control plane is unparalleled.
-And if you’re a SQL user and have been applying manual protections, this makes it a lot easier. Once data sources are registered and scanned in Microsoft Purview, they’re classified based on hundreds of built-in sensitive information types or the ones you define, and labels are auto-applied.
-And in the future, as you register additional sources or re-scan previously registered data, anything meeting your conditions will be classified and labeled. Policy-defined protections like access controls are then enforced based on these labels. For example, our data engineer had full access to confidential information, whereas our external vendor was limited to working with only non-confidential data.
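To make the label-based enforcement concrete, here is a minimal Python sketch of the behavior seen in the demo: a query is allowed only if the user may read every labeled column it touches. This is an illustration under stated assumptions, not how Microsoft Purview is implemented; the column names, group names and the `can_query` helper are all hypothetical.

```python
CONFIDENTIAL = "Confidential"

# Hypothetical columns of the Customers table and their sensitivity labels
# (None means the column carries no label).
COLUMN_LABELS = {
    "customer_id": None,
    "fiscal_year": None,
    "routing_number": CONFIDENTIAL,
    "credit_card": CONFIDENTIAL,
}

# Hypothetical mapping from label to the groups allowed to read data with it.
ALLOWED_GROUPS = {CONFIDENTIAL: {"data-engineers"}}


def can_query(user_groups, columns):
    """Return True only if every labeled column is readable by the user."""
    for col in columns:
        label = COLUMN_LABELS[col]
        if label is not None and not (set(user_groups) & ALLOWED_GROUPS[label]):
            return False  # one protected column is enough to deny the query
    return True


if __name__ == "__main__":
    engineer = {"data-engineers"}
    vendor = {"external-vendors"}
    print(can_query(engineer, ["customer_id", "credit_card"]))
    print(can_query(vendor, ["customer_id", "credit_card"]))
    print(can_query(vendor, ["customer_id", "fiscal_year"]))
```

In this sketch the vendor's query is rejected as a whole when it includes a confidential column, and succeeds once it is limited to unlabeled columns, which mirrors the report experience shown earlier.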
-Now let’s switch gears and look more closely at the admin experience and what made it possible to extend data classifications and corresponding protections across your multicloud estate. Once all of your data sources are connected to Microsoft Purview, let me show you how you can use labels and policies to protect your sensitive information.
-First, Microsoft Purview leverages our extensive and continuously growing list of global sensitive information types. These are managed centrally here in Microsoft Purview and are used pervasively across files and messages in Microsoft 365, as well as schematized data, which is anything not in Microsoft 365.
-Likewise, your sensitivity labels are also managed in one place. And if you’ve already defined these for your Microsoft 365 environment, your existing labels can now extend to connected Microsoft and non-Microsoft data across your estate.
-Here I’m in Information Protection under sensitivity labels. I’m going to edit this confidential sensitivity label and extend the label scope. Notice we now have a new option for adding schematized data assets. And you’ll see in the description the locations where this label can be applied, including Azure, AWS, and many more not listed here, including on-premises and other clouds.
-So I’ll select it and you’ll also see that I can allow Microsoft Purview to auto-label discovered information based on defined sensitive information types. And from here I can choose the ones that I want to trigger auto-labeling. These are the same items we saw earlier in the list of sensitive information types. I’ll choose three: routing number, credit card number, and US social security number. Then I’ll hit add, and I’m done. So now we have our classification labels defined and auto-applied.
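As a rough illustration of auto-labeling, the Python sketch below flags a column as Confidential when any sampled value matches one of the three chosen sensitive information types. The patterns and checksums here are simplified assumptions for illustration only; the built-in Microsoft Purview detectors are more elaborate, and the `detect_types` and `auto_label` helpers are hypothetical.

```python
import re

CARD_RE = re.compile(r"\b\d{13,19}\b")        # bare card numbers only
ROUTING_RE = re.compile(r"\b\d{9}\b")         # 9-digit ABA routing numbers
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # dashed SSN format only

TRIGGER_TYPES = {"Routing number", "Credit card number",
                 "US social security number"}


def luhn_valid(digits):
    """Luhn checksum, as used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0


def aba_valid(digits):
    """Weighted checksum (3, 7, 1 repeating) for ABA routing numbers."""
    weights = (3, 7, 1) * 3
    return sum(int(c) * w for c, w in zip(digits, weights)) % 10 == 0


def detect_types(value):
    """Return the set of sensitive information types found in one value."""
    found = set()
    if any(luhn_valid(m) for m in CARD_RE.findall(value)):
        found.add("Credit card number")
    if any(aba_valid(m) for m in ROUTING_RE.findall(value)):
        found.add("Routing number")
    if SSN_RE.search(value):
        found.add("US social security number")
    return found


def auto_label(columns):
    """Map column name -> 'Confidential' when any sample triggers a match."""
    return {name: "Confidential"
            for name, samples in columns.items()
            if any(detect_types(s) & TRIGGER_TYPES for s in samples)}


if __name__ == "__main__":
    sample = {"card": ["4111111111111111"], "routing": ["021000021"],
              "ssn": ["123-45-6789"], "city": ["Seattle"]}
    print(auto_label(sample))
```

The checksums reduce false positives from arbitrary digit runs, which is one reason pattern matching alone is usually not enough for this kind of classification.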
-Next, let me show you how I’m able to implement user-specific, differentiated access controls based on labeled data, right down to the individual column level. I’ll start again in Information Protection under policies, and I’ll create a new protection policy. Once I give it a name and a description, I can define what to protect. From here, I need to add a sensitivity label, and I’ll choose the confidential label that we just extended.
-Next in “Where to Apply”, you’ll see a list of root data sources where this policy will be natively enforced. Currently that’s Azure Storage, Azure SQL, and Amazon S3. And this list will continue to grow to include, for example, Microsoft Fabric. And for each of these locations, I can define the specific resource. Here for example, for Azure SQL, I’ll select it, then choose the location where I want this policy to apply. I can even get a quick view of impacted asset counts, and I’ll confirm.
-Then for Amazon S3, I’ll add the S3 bucket, then choose the one I want to scope. I’ll get a similar view of impact and confirm. Then in “How to Protect”, I can choose which users or groups will not have read access to the discovered sensitive data.
-Here, I’ll only allow data engineers access to the sensitive information, which will exclude everyone else, including our external vendor that we saw before. Next, I can decide if I want to turn this policy on immediately or keep it off, which helps as I collect feedback on this policy and validate that I have the right data, users, and groups in scope.
-And from there I can create my policy. And now everything in those scoped locations will be protected for those scoped users. And even as new data, files, tables or databases are added to these locations, the same protections automatically apply to them.
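The policy just created can be pictured as a small data structure: a label, a set of scoped locations, the groups allowed to read, and an on/off switch. The Python sketch below models that under assumptions; the `ProtectionPolicy` class, the location strings and the group names are hypothetical and are not a Microsoft Purview API.

```python
from dataclasses import dataclass


@dataclass
class ProtectionPolicy:
    name: str
    label: str                 # sensitivity label the policy protects
    scopes: list               # location prefixes the policy applies to
    allowed_groups: set        # groups that keep read access
    enabled: bool = False      # policies can be created switched off

    def covers(self, asset_path):
        """True if the asset sits under any scoped location."""
        return any(asset_path.startswith(scope) for scope in self.scopes)

    def allows_read(self, asset_path, asset_label, user_groups):
        """Deny reads of in-scope labeled assets unless the user is allowed."""
        applies = (self.enabled and asset_label == self.label
                   and self.covers(asset_path))
        if not applies:
            return True        # this policy has no opinion on the asset
        return bool(set(user_groups) & self.allowed_groups)


if __name__ == "__main__":
    policy = ProtectionPolicy(
        name="Protect confidential billing data",
        label="Confidential",
        scopes=["azuresql://billing-server/customers-db/",   # hypothetical
                "s3://billing-bucket/"],                     # hypothetical
        allowed_groups={"data-engineers"},
        enabled=True,
    )
    # A file added later under a scoped bucket is covered without edits.
    new_file = "s3://billing-bucket/2024/payments.csv"
    print(policy.allows_read(new_file, "Confidential", {"external-vendors"}))
    print(policy.allows_read(new_file, "Confidential", {"data-engineers"}))
```

Because scope is expressed as a location prefix in this sketch, anything added under a scoped location later is covered without changing the policy, and leaving `enabled` off lets the scope be reviewed before enforcement begins.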
-So that was a quick overview of how Microsoft Purview now gives you a single, unified solution for discovering, understanding and protecting sensitive information of different types across your multicloud data estate, especially in the era of GPT and generative AI.
-As our multicloud journey continues, you’ll of course see us increase the services we can classify and protect in the future. And to find out more and for details on how to get started, check out aka.ms/MicrosoftPurviewDocs. And keep watching Microsoft Mechanics for the latest tech updates. Thanks for watching.