Take control of your data by discovering sensitive information across every file type and location with Microsoft Purview Information Protection.
Classify your data, apply clear labels, and enforce protections that automatically adapt to human and AI interactions so you can reduce risk without slowing down workflows. Proactively monitor, assess, and respond to risk in real time. Use labeling and layered policies to stop accidental sharing, manage AI access, and maintain consistent protection across your organization.
Matt McSpirit, Microsoft Mechanics expert, joins Jeremy Chapman to share how to turn scattered data into actionable security that moves as fast as your team and AI.
Scan your environment beyond standard detection.
Identify gaps where AI or big files might expose sensitive data. Get started with Microsoft Purview Information Protection.
Reduce the risk of accidental sharing.
Label sensitive data, including proprietary and hard-to-detect content, to enforce access controls instantly. See how DLP and IRM work.
Act before exposures become incidents.
Identify data risks early, prioritize what matters most, and take action to reduce exposure with Microsoft Purview DSPM.
QUICK LINKS:
00:00 — Microsoft Purview data protection
01:04 — Data Loss Prevention
03:36 — Layered approach in addition to DLP
04:13 — Unified classification
04:27 — How sensitive data is determined
06:23 — Create trainable classifiers
07:06 — Distinction between classification and labeling
08:06 — Configure policy protections
09:12 — DLP in action
10:10 — IRM in action
10:51 — See how protections show up
13:37 — Move from reactive to proactive protection
15:00 — Wrap up
Link References
For deeper guidance, go to https://aka.ms/PurviewInformationProtection
Unfamiliar with Microsoft Mechanics?
As Microsoft’s official video series for IT, Microsoft Mechanics lets you watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
Video Transcript:
- If you don’t understand your data, what it is, where it lives, and how sensitive it is, you can’t protect it. And it’s easy to assume that you’re covered, maybe you’ve already got data loss prevention, or DLP, running with near real-time detection, which is helpful, yes, but it’s not enough. Protecting data today means going beyond what traditional text scanning can catch and making sure that those harder-to-parse file types are covered too. And it also requires a layered approach with instant risk insights, starting with consistent and automatic classification, so everyone’s clear on what’s actually sensitive; labels that make sensitive content easier to interpret and trigger automatic policies; and Adaptive Protection that responds to the risk level of each user, whether human or non-human, and how they engage with the data. In fact, this matters even more with AI that can now bring hidden or long-forgotten information to the surface in just seconds. Now to walk us through all of this, I’m joined by a Microsoft Mechanics expert, Matt McSpirit.
- Thanks, it’s great to be back.
- Okay, so before we get into solutions, why don’t we unpack this a bit more. So for a lot of people, even as they adopt AI, there’s this notion that maybe DLP is good enough. It’s finding things like credit cards, it’s also looking at things like financial information, identity numbers, addresses, et cetera, even if you aren’t paying attention, by the way, to where that information is stored. So is it even worth the extra effort in doing something else?
- Well, these are all fair points, and DLP is one powerful piece of the puzzle. And part of its appeal is that it works without the need to label or add any metadata to your content. It’s also rule-based and can look for sensitive information types as they’re being written, read, or sent, and then use what it finds to apply corresponding protections to prevent sharing or contain its sharing radius.
- Okay, so what you just said sounds like all upsides. So the policies are relatively easy to configure, they work by default with all your Microsoft 365 and Office apps and your managed devices, as long as people are signed in with them, regardless, really, of where that file goes as well. So what’s the downside?
- Well, depending on the scenario, there are a few areas. First, there’s speed of detection and response. Now in this case, I’ll show you an example of DLP in action. I’ll paste in a few thousand words from my clipboard into this Word document. And now DLP will compare it with hundreds of sensitive information types like bank numbers or IDs, dozens of trainable classifiers like contracts or patent applications, and do cross look-ups against exact data match, and more, which based on physics, orchestration, and query speeds, takes time. And it’s only when the policy tip appears whether I choose to apply the recommendation or not, that the content is protected. As you can see, I can’t now share this file externally because DLP has found sensitive information. So there’s a window of time based on a number of factors for DLP to find sensitive information and apply protection. Next, breadth of coverage is another area. You might have file types that can’t be scanned for text easily, like these files synced on my OneDrive location. These are proprietary file types from line of business apps as well as 3D CAD files. So in this case, you’d need a different way to identify the sensitivity of these files and protect the container of the files themselves, like you can see with this rights-protected document using the ARC Add File extension.
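To make the detection step above concrete, here is a minimal sketch of how a pattern-based sensitive information type could work in principle: a regular expression finds candidate card numbers, and a Luhn checksum filters out false positives. This is an illustration only, not Purview's implementation; the pattern and the function names `luhn_valid` and `find_card_numbers` are assumptions made for this example.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers in text that also pass the checksum."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Even this single check walks the whole text, which hints at why running many detectors over a long document takes measurable time.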
- And that makes a lot of sense. You know, even though compute and detection are getting faster, if you’ve got, say, a hundred-page document or a massive spreadsheet with passport numbers or similar things buried in it, it’s going to take significant time to find that sensitive info.
- Right, and if we add AI to the picture, which needs to orchestrate access to data across multiple data sources to respond in milliseconds, this isn’t the optimal approach when speed of response counts. And that’s where a layered approach comes in. In addition to your policy engines like DLP, it’s important to augment what you’re doing with unified data classification. It gives you a broader, persistent understanding of sensitive data across your environment so that it’s easy to assess your data risk and then add sensitivity labeling to your data security strategy. This way, DLP can immediately act on an existing signal rather than having to evaluate everything from scratch each time.
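The idea of acting on an existing signal instead of rescanning can be sketched as a fast path: if a blocking label is already present, the decision is immediate, and the slower content scan only runs for items without one. The label names, the `Document` type, and `may_share_externally` are hypothetical and stand in for whatever a real policy engine does.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Document:
    name: str
    text: str
    label: Optional[str] = None  # None means the item has no sensitivity label


# Hypothetical set of labels that a policy treats as "do not share externally".
BLOCKING_LABELS = {"Confidential", "Highly Confidential"}


def may_share_externally(doc: Document, deep_scan: Callable[[str], bool]) -> bool:
    """Return True if external sharing is allowed.

    deep_scan is the slow path: it returns True when it finds sensitive content.
    """
    if doc.label in BLOCKING_LABELS:
        return False  # fast path: the label alone decides, no scan needed
    return not deep_scan(doc.text)  # slow path: evaluate the content
```

In this sketch the scan function is never called for a labeled document, which is the whole point of the layered approach described above.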
- Okay, so why don’t we go deeper then on unified classification as part of this layered approach.
- So this actually gets to the heart of the problem. Over time, as data keeps growing and shifting, different teams and tools have ended up defining sensitive data in their own ways, and it’s hard to know where all that data lives. No one really intends for the inconsistency, it just happens and you’re left with a patchwork view of your data instead of one clear picture. And that’s why the first step is giving everything that works with your data, whether that’s your users, AI, or your apps and policy engines, a single consistent way to recognize what’s important. So here in Data Explorer, Microsoft Purview has already identified sensitive data across my environment automatically. This reflects a unified data classification approach that discovers your sensitive data wherever it lives. I didn’t build any rules for this. This discovery happens automatically. And if I drill in, I can see exactly where these files are, even preview the content to see the content in question and easily understand why they were identified as sensitive.
- And there’s really a lot to it that’s powering this classification. So what is Purview then looking at to determine if there’s sensitive information there?
- Right, there’s a lot happening under the covers. Purview uses two main built-in classification methods. First, sensitive information types that detect specific regulated data such as credentials, IDs, or financial numbers with more than 300 built-in detection patterns for regulated data. And second, more than a hundred pre-trained classifiers that understand broader categories of content like budgets, HR files, or source code. These classifiers are built using Microsoft’s domain expertise and training data sets to recognize common business content categories. Additionally, how fresh your data is also matters to Purview. Purview evaluates new and modified content, automatically analyzing the data with the latest classifications and policies so that your most recent data is well understood and has the latest protections. And if you want to evaluate data that hasn’t been accessed recently, you can run on-demand classification to scan data at rest, helping you uncover sensitive data that might otherwise be overlooked.
- And building on what you said, Matt, you know, you can also teach Purview to recognize content that’s unique to your organization. For example, you can create your own trainable classifiers by providing real sample content. You just have to point it to a SharePoint site with 50 to 500 files of matching content. Or you can use exact data match for structured data comparisons against exact text strings. Think of things like code names, or maybe specific customer, partner, or competitor names, and more. And Purview also supports fingerprinting for things like standard forms or templates so that they’re recognized even if the wording changes. Of course, classifications can trigger protections once they’re paired with active policies.
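As a rough illustration of matching against exact text strings, the sketch below stores only salted hashes of the sensitive terms and compares hashed words from a document against that index. It is loosely in the spirit of exact data match and is not how Purview implements it; the salt, the code name "Bluebird", and all function names are assumptions, and it only handles single-word terms.

```python
import hashlib
import re


def _digest(value: str, salt: str) -> str:
    """Hash a normalized value so the index never stores the raw term."""
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def build_index(sensitive_terms: list[str], salt: str) -> set[str]:
    """Build a set of salted hashes for the sensitive terms."""
    return {_digest(term, salt) for term in sensitive_terms}


def find_exact_matches(text: str, index: set[str], salt: str) -> list[str]:
    """Return the words in text whose salted hash appears in the index."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [word for word in words if _digest(word, salt) in index]
```

For example, an index built from the hypothetical terms "Bluebird" and "Relecloud" would flag both words in "Project bluebird targets Relecloud next quarter", regardless of letter case.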
- Right, and interestingly, labels can also trigger protection policies.
- And we should really unpack this a bit more, because I think a lot of people watching probably make the mistake of conflating classification and labeling as being one and the same thing.
- It’s a common mistake, but there is an important distinction. In fact, there’s an easy way to think about this. Think of data classification as recognizing what your data is. It’s about understanding the sensitive information that’s present in your data. And data labeling is the simple-to-understand wording along with your intent for how the data should be handled. For example, a confidential/do not forward label needs no complex explanation on how you should handle the data if you’re the user. And on the backend, Purview quietly protects the data based on how you’ve defined protections associated with that label, like access restrictions or watermarking. And the bonus is that this guidance and protection travels with the data. And you can set labels up in Microsoft Purview Information Protection. This lets you create sensitivity labels like these to define how different types of data should be classified. Once you’ve done that, you can configure policy protections that are triggered by those labels, such as encryption, limiting the sharing radius or visual markings, and more. And when used in tandem with DLP, you can even prevent Copilot from processing labeled content. Next, with your labels created, you can publish them so they appear in apps like Word, Excel, PowerPoint, and Outlook, and are honored across services like Fabric, Dataverse, and of course, as I mentioned, Copilot. All of what I’ve shown you is included with most versions of Microsoft 365. And with Microsoft 365 E5, you can even set up auto labeling, so Purview can apply labels automatically when it detects sensitive content.
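One way to picture the split between a label and its protections is a small lookup table: the label name is what users see, and the settings attached to it are what gets enforced. The sketch below is a conceptual model only; the `SensitivityLabel` fields, the two example labels, and `protections_for` are hypothetical and do not reflect Purview's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitivityLabel:
    name: str                      # the simple wording users see
    encrypt: bool = False          # apply encryption to labeled items
    watermark: str = ""            # visual marking text, empty for none
    external_sharing: bool = True  # whether the sharing radius includes outsiders


# Hypothetical label set for illustration.
LABELS = {
    "General": SensitivityLabel("General"),
    "Confidential": SensitivityLabel(
        "Confidential",
        encrypt=True,
        watermark="CONFIDENTIAL",
        external_sharing=False,
    ),
}


def protections_for(label_name: str) -> list[str]:
    """List the protections that a label would trigger in this model."""
    label = LABELS[label_name]
    actions = []
    if label.encrypt:
        actions.append("encrypt")
    if label.watermark:
        actions.append(f"watermark:{label.watermark}")
    if not label.external_sharing:
        actions.append("block_external_sharing")
    return actions
```

Because the protections hang off the label and not off the file's contents, they can travel with the data wherever the label is honored.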
- So labels are respected across all those destinations.
- That’s right, and once a label is applied, it’s recognized across supported workloads, and Purview solutions like DLP, Insider Risk Management, and more, know how to handle that data properly. So instead of stitching together separate tools, each with its own definition of sensitive data, you define sensitivity only once. And that same signal drives consistent protection wherever the data travels to. In fact, let me show you how this works in practice. So here in DLP, I’m going to create a policy based on what Purview has already automatically discovered across SharePoint and OneDrive. From the Insights card, you can see the top sensitive information types like medical, IP and trade secrets, financial data, and medical identifiers. So I’ll get started, then choose to create all of the recommended policies. Now, if I go back to my DLP policies view and look at the ones I’ve just created, you’ll see that there are four new policies. If I click in to edit one, you’ll notice that Purview has already preselected the right conditions with trainable classifiers and actions predefined for the policy. And from there, I can even add to this policy. In this case, I’ll add my confidential labels to the policy. These are the same ones I’ve shown before. So in short, classification identifies the sensitive content, and the conditions being met will then trigger the corresponding policies to enforce protections. This reduces configuration effort and ensures consistency across your environment. And in Insider Risk Management, labels work as risk signals too. So here in the policy template, I’m adding a condition that focuses on activity involving items labeled confidential. And that way, if users, including non-human agents, exfiltrate or misuse high-value labeled data, printing it, copying it to external storage, or sharing externally, IRM will automatically elevate their risk score based on the activities against the labeled data.
So labels also help enforce adaptive protections based on the risk profile of who, whether that’s a human user or a non-human AI agent, and their activities with the data. What we call Adaptive Protection.
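The way labeled-data activity can raise a risk score might be sketched as follows. The activity weights and score thresholds are invented for this example, and the model treats human users and AI agents identically by design; none of this reflects how Insider Risk Management actually computes scores.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights per risky activity and labels that count as signals.
ACTIVITY_WEIGHTS = {"print": 10, "share_external": 30, "copy_to_usb": 40}
WATCHED_LABELS = {"Confidential", "Highly Confidential"}


@dataclass
class Activity:
    actor: str            # a human user or a non-human agent identity
    action: str           # e.g. "print", "copy_to_usb", "share_external"
    label: Optional[str]  # sensitivity label on the item, if any


def risk_score(actor: str, activities: list[Activity]) -> int:
    """Sum the weights of the actor's activities against watched labels."""
    return sum(
        ACTIVITY_WEIGHTS.get(a.action, 0)
        for a in activities
        if a.actor == actor and a.label in WATCHED_LABELS
    )


def risk_level(score: int) -> str:
    """Map a score to a level using made-up thresholds."""
    if score >= 60:
        return "elevated"
    if score >= 30:
        return "moderate"
    return "minor"
```

A policy could then key its strictness off the returned level, which is the adaptive part of the idea.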
- Okay, so now we’ve got all of our policies in place. Why don’t we see how those protections show up in the flow of work, including AI interactions? So first I’m going to upload the same file that Matt showed before, but this time, it has a confidential label applied. So when I try to share it externally, you can see that I’m blocked instantly because that label is detected right away. DLP blocks the action based on the label, and this, again, is before that file could be scanned for sensitive information. Now I’m going to switch desktops. On the left here is a window with a synced folder in File Explorer. And you can see that there are proprietary file types and CAD files like we saw before, and each is labeled but cannot be analyzed for sensitive information types or classifiers. So with the labels applied to these encrypted .pfile files, as they are, if I do try to drag and drop a file into my removable USB drive location in the window on the right, you’ll see I get a data loss prevention notification. Now because in this case, I’m under the file count threshold that we set before in policy, I can allow or override this, but I would’ve been blocked outright if I had transferred multiple files. Now again, the labels in these uncommon file types are what triggered the data loss prevention policy. And Insider Risk Management is also watching for risky handling of labeled content. For example, I can currently access this highly confidential acquisition site and see all the documents contained within it, for the moment. That said, though, because I just attempted to copy confidential information to my external USB drive, that’s going to catch up with me and automatically change my risk profile. So now after some time has passed, if I try to access that same site, I’m blocked outright and denied access. The protection automatically adapted to my heightened risk profile and blocked the site, without the administrator even needing to take any action.
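The USB copy behavior described above, where a transfer under the file count threshold can be overridden and a larger one is blocked, can be sketched as a small decision function. The threshold value, label names, and return strings are assumptions for illustration, not actual policy settings.

```python
from typing import Optional, Sequence

# Hypothetical labels that the endpoint policy watches for.
PROTECTED_LABELS = {"Confidential", "Highly Confidential"}


def usb_copy_decision(
    file_labels: Sequence[Optional[str]], block_threshold: int = 2
) -> str:
    """Decide what happens when files are copied to removable storage.

    Only the labels are inspected, so file types that cannot be scanned
    for text are still covered.
    """
    protected = sum(1 for label in file_labels if label in PROTECTED_LABELS)
    if protected == 0:
        return "allow"
    if protected < block_threshold:
        return "warn_allow_override"  # user sees a notification and may override
    return "block"
```

With the assumed threshold of two, copying one labeled file yields a warning with an override option, and copying two or more is blocked.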
And by the way, the same assessment against risk profile would happen if it was an AI agent and it tried to do the same thing. And beyond agents, why don’t we look at label protection, and how that works in general with AI. So here I’m in Copilot and I have a document uploaded to SharePoint. So I’ll prompt Copilot to summarize the file named Relecloud Acquisition, and you’ll see that Copilot will first check the user’s permissions and the presence of a label before it does anything. Now, because this document is labeled as highly confidential and we have a DLP policy in place to block Copilot from processing sensitive files, it tells me that it can’t summarize that content because of its sensitivity label.
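The order of checks described for Copilot, permissions first and then the label, can be sketched like this. The `Doc` type, the blocked label set, and the reason strings are hypothetical; the sketch only mirrors the sequence described in the demo.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Hypothetical labels that a DLP policy excludes from AI processing.
COPILOT_BLOCKED_LABELS = {"Highly Confidential"}


@dataclass
class Doc:
    name: str
    label: Optional[str]
    readers: Set[str] = field(default_factory=set)  # identities with read access


def copilot_may_process(user: str, doc: Doc) -> Tuple[bool, str]:
    """Return (allowed, reason) for a request to summarize doc on behalf of user."""
    if user not in doc.readers:
        return False, "no_permission"  # check 1: the user's own access
    if doc.label in COPILOT_BLOCKED_LABELS:
        return False, "blocked_by_label"  # check 2: label-based DLP policy
    return True, "ok"
```

In this model, a user with read access to a file labeled Highly Confidential still gets a refusal, matching the Relecloud Acquisition example above.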
- So from creation to risky behavior and even Copilot interactions, the same sensitivity label ensures consistent protection. But the work is never really done. New data keeps coming and risk changes over time. And because you’ve already classified your data, Purview’s Data Security Posture Management, or DSPM, can address this by continually assessing your data risk. It’s deeply integrated across Microsoft and beyond, giving you one centralized place to discover unprotected sensitive data across your entire digital estate, including select non-Microsoft services. Built-in intelligence continually assesses data risk to help you prioritize and mitigate high-risk exposures, with recommendations that let you strengthen your policies directly from DSPM itself. AI observability features also give you granular insight into what agents are doing and any risk they may introduce. And custom reports make it easy to embed posture management into daily operations by highlighting where to improve.
- And this is all built to help you then move from reactive investigation to more proactive and measurable risk reduction.
- Exactly, and actually, this is just scratching the surface of what Purview can do. You can also use AI itself to manage human and AI data risk using deep-reasoning Purview agents. For example, they can triage alerts and automatically message users in Teams with the sensitive data found and the actions they need to take.
- Okay, so as you saw, there are lots of ways that this layered approach goes beyond traditional DLP protection. So where can everyone who’s watching right now learn more?
- Well, first, check out aka.ms/PurviewInformationProtection. Again, if you use Microsoft 365 in your organization, you’ll have Microsoft Purview today, and you can get the more advanced Purview capabilities with Microsoft 365 E5. So it’s worth exploring further. So start using unified classification and labels today.
- Thanks, Matt, and thank you for joining us. Be sure to subscribe to Microsoft Mechanics if you haven’t already, and we’ll see you next time.