Identify what data was exposed in a breach: not just where it moved, but what it contains, how sensitive it is, and the risk it creates, using Microsoft Purview Data Security Investigations.
Search across massive volumes of files using natural language, pinpoint the highest risk content, and connect it to user activity to see the full scope of an incident.
Investigate and act in one workflow. Analyze content deeply across files, emails, and AI interactions, uncover hidden or unclassified sensitive data, and contain exposure fast. Proactively identify risks, respond to incidents with clarity, and mitigate impact before it spreads.
Christophe Fiessinger, Microsoft Purview Principal Squad Leader, joins Jeremy Chapman to walk through real-world investigation workflows — from scoping and analysis to mitigation and automation — so you can move faster and make more informed security decisions.
Pinpoint high-risk files.
Locate files hidden among hundreds of confidential documents using contextual search. See how Microsoft Purview Data Security Investigations works.
Search thousands of files in seconds.
Use natural language queries to uncover relevant sensitive data. Get started with Microsoft Purview Data Security Investigations.
Contain data leaks immediately.
Purge exposed files while retaining investigation evidence. Take action with Microsoft Purview Data Security Investigations.
QUICK LINKS:
00:00 — Keep data safe with DSI
01:26 — Connect dots between data risk & impact
02:47 — Built-in AI
03:47 — Work across the full lifecycle of an incident
04:56 — Create an investigation
06:36 — Deep search and analysis
09:03 — How DSI helps data leaks
10:40 — Contain risk with built-in mitigation
11:32 — Automate using agents
13:23 — Estimator tool
14:57 — Wrap up
Link References
As a Microsoft Purview admin, just go to https://purview.microsoft.com/dsi
Unfamiliar with Microsoft Mechanics?
Microsoft Mechanics is Microsoft’s official video series for IT. You can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
- Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
- Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
- Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast
Keep getting this insider knowledge, join us on social:
- Follow us on Twitter: https://twitter.com/MSFTMechanics
- Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
- Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
- Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics
Video Transcript:
- If you’ve ever had to respond to a major data breach, insider-driven data theft, or even a suspicious leak involving high-value information, you know the hardest part isn’t just detecting the activity, it’s understanding what data was actually taken, how valuable it is, and what risks that creates for your organization. Today we’re going to show you how the now generally available Microsoft Purview Data Security Investigations, or DSI, dramatically accelerates that process, using AI to read, analyze, and connect the dots fast at massive scale. I’m joined by Christophe Fiessinger from the Microsoft Purview team to demonstrate more. Welcome.
- Thanks, Jeremy. Happy to be here.
- Thanks so much for joining us today. So most IT teams that I speak to, they’re often using things like SIEMs or incident management tools that connect activity across compromised accounts, devices, and files when they’re responding to things like security events. But these tools rarely reveal what’s affected in terms of the files and what’s contained in them. They might show labels, file names, or basic metadata like the location or the owner.
- Exactly. Beyond labels and metadata, it’s all about context. Metadata gives you the file name, classification might tell you it’s a financial document, and the label might say it’s confidential, but traditional tools can’t really tell you what’s in the content and how much risk it exposes. They just tag the content; they don’t explain it.
- So how does DSI then change things?
- So DSI, on the other hand, doesn’t just say it’s a confidential financial document. In fact, you might have hundreds of those. Instead, it actually reads and understands each file and the data risks it poses. So of the hundred or so finance documents classified confidential, it can find the one file that carries an existential threat to your company, like the one that contains your entire customer list with the unique credentials that each customer uses to log in to your online service. In DSI, that level of insight comes from hybrid vector search and generative AI working together. Hybrid vector search can pick up on semantically similar items, synonyms, or the subtle ways people hide sensitive information, while also matching precise text strings like code names or account numbers. In short, it finds the right files by combining context with keyword precision. Then generative AI takes over and actually analyzes those files. It performs deep content analysis to uncover sensitive data, security risks, and relationships hidden inside the impacted documents.
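To make the hybrid search idea above concrete, here is a minimal sketch of blending semantic similarity with exact keyword matching. This is not Purview’s actual implementation: the toy bag-of-words “embedding”, the `alpha` blend weight, and the sample documents are all illustrative assumptions (a real system would use a learned dense embedding model).

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # A production system would use a learned dense embedding instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, doc: str, keywords: list[str], alpha: float = 0.6) -> float:
    # Blend semantic similarity (context) with exact keyword hits (precision),
    # which is the core idea behind hybrid vector search.
    semantic = cosine(embed(query), embed(doc))
    hits = sum(1 for k in keywords if k.lower() in doc.lower())
    keyword = hits / len(keywords) if keywords else 0.0
    return alpha * semantic + (1 - alpha) * keyword

docs = [
    "Quarterly budget review and travel expenses",
    "Project Obsidian customer list with login credentials for the online service",
]
ranked = sorted(
    docs,
    key=lambda d: hybrid_score("customer credentials for our service", d, ["Project Obsidian"]),
    reverse=True,
)
print(ranked[0])  # the Project Obsidian document ranks first
```

The point of the blend is that a pure keyword match would miss paraphrased or obfuscated content, while a pure semantic match could miss an exact code name or account number; combining both catches each case.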
- So it’s removing a ton of manual effort by connecting the dots around the data risk and also its impact.
- That’s right. DSI helps you rapidly understand and mitigate the downstream impact. You can start a large-scale data investigation and use natural language search to find and narrow in on impacted data. From there, you can leverage our powerful built-in AI to deeply analyze content: files, email, Teams messages, and even prompts and responses from AI apps and agents built in Microsoft Foundry and Copilot Studio, as well as non-Microsoft agents and apps, at scale. DSI is able to establish the context around information and even detect obscure sensitive information that might not have been flagged. It can reason over dozens of major world languages with production-grade quality. And it can directly mitigate identified risks. For example, if specific high-value content has been distributed to multiple users, you can purge every instance of those files. With DSI, you can also work on data investigations more efficiently across the full lifecycle of an incident with the rest of your team. As part of Microsoft Purview, you can trigger investigations directly from Data Security Posture Management to dig deeper into data that’s at risk and see how valuable it is, and from Insider Risk Management, where you might want to understand larger sets of data being used by risky users or agents. Equally, DSI also provides a useful bridge to your security operations team, who can start DSI investigations directly from Microsoft Defender XDR. And because DSI is now integrated with the Microsoft Sentinel graph, data security analysts can connect at-risk information to the activities around it (who accessed it, where it was shared, whether behaviors like compromised sessions or impossible travel were involved) and visually correlate risky content, users, and their activities. It automatically combines unified audit logs, Entra audit logs, and threat intelligence, which would otherwise need to be manually correlated.
- That’s a really powerful solution. Can you show us an example of an investigation?
- Let me show you Data Security Investigations and where to quickly find all your current and future investigations. From the main Data Security Investigations overview, you’ll find everything you need to get started: identifying content, deeply analyzing what’s contained in that content, and mitigating risk, as well as access to all of your previous investigations so you can quickly pick up where you left off and create new investigations from here. You can start an investigation in a few ways: sometimes proactively, using DSI to assess potential data security risks, or other times reactively, like when you already know data has leaked and you need to investigate the breach. In this case, I’m going to start this investigation from Data Security Posture Management to get ahead of data risk in our environment. One of the most common types of data leaks is exfiltration of confidential information, like if an employee moves on to a competitor with trade secrets or a seller wants to bring their client list to their new job. Here I can see a recommended objective to prevent exfiltration to risky destinations. Once I click to view objectives, I can see the amount of data exfiltrated, top sources, as well as file types, and I can see an action to create a new investigation using DSI. Here I just need to give it a name, then provide some context about what I’m trying to do in this investigation, like, “I’m looking into confidential data that may have been exfiltrated from my organization. I’m specifically looking for confidential and proprietary information about Project Obsidian, the new release we’re working on.” Now I’ll confirm and create the investigation. From here, I can put in the rest of the parameters for deeper search and analysis. In the investigation, I can see a summary about the investigation, and from here I can refine the search scope and make changes to the date range and people if I want, which will keep things more efficient.
And if I need to, I can always add more data sources to the scope. I’ll keep the data source as is and hit add to scope. This grabs the content from the data source into our investigation. Now I can further analyze the data, and I can use a natural language query. And as mentioned, DSI will analyze dozens of languages as part of the process. There are a few intelligent search suggestions, but I’m going to do my own search for “information disclosed to customers about project obsidian.” And in just a few seconds, I’ll get information assessing exactly what I’m looking for based on my search criteria. It finds over a thousand items with a lot of different languages represented, as you can see. On the left, the AI also suggests content categories based on the executed vector search so that it’s easy to organize and make sense of the amount of risk per category. So I’ll filter all those files down using the Obsidian category, and there they are. From here I can select which ones I want to deeply analyze. I’ll choose all of them in this case and hit examine. And here, to choose the focus area for the investigation, I can look for credentials, analyze risk, and get mitigation recommendations. I’m going to choose risk in my case so that I can act quickly to contain the risk, and hit examine one more time to kick off the process. As it works, I can view its details. This is where AI runs deep content analysis against all the content in these files by looking at the file content itself. This goes beyond common sensitive information types and trainable classifier matches. And depending on the number and size of the files that you have in scope for this, it could take a few moments to run. And you’ll see that it found relevant results, each with an assessment of whether it’s privileged content, an overall security risk score, and a risk explanation. I can drill into any of these to preview the content inline, like this Microsoft 365 Copilot chat message.
Moving back, I can also see other risk scores and explanations for credentials on the right-hand columns.
- So DSI in this case uncovered a lot of what we call dark data. These are files that were never classified, which is great for getting ahead of risk, but leaks do eventually happen. And when they do, we need a way to see exactly what got out and how to contain it.
- That happens pretty often, unfortunately. Let me show you a case where credentials were leaked externally as part of a security breach and how DSI helped. And to show you the integration for SecOps teams with Microsoft Defender XDR, I’ll start from an active incident, for data exfiltration in this case. In the incident view, you get the high-level signals, the attack timeline, which users and devices were hit, and the file names involved. But we still don’t know what was actually inside those files, and what earlier activities might have set up the attack or created additional risk across other files. So from the action menu, I’ll create a DSI investigation right from this open incident to find out more about the content in those files. Here I just need to give it a name, then paste in a description and some additional context, like I did before, for the AI. Then I’ll create the investigation, and it links me directly to an investigation in Microsoft Purview. Like before, I can see a summary and refine the search scope if I want. This time I’m going to fast forward a few steps for scoping the data source and examining the content, and just go right to the examination results. Here you can see the subject or title of each item; extracted credentials, including usernames, passwords, and more; credential types, including API tokens and MFA; a surrounding snippet, or the text around the credential details for context; and the thought process, with a summary of the AI reasoning. Next, I also want to show the built-in mitigation. We can actually purge the sensitive files that were forwarded around by email to contain the damage without touching the original copy, so we’ll keep the evidence. From the results, I’ll select the items I want, then I’ll choose add to mitigation, which will in turn create a list of files and messages containing those credentials.
From the list I’ll select purge queue, then view the messages and run the purge where I can choose from a recoverable soft purge or permanent deletion with a hard purge. I’ll keep the default and confirm the purge. Then all the information matching that query will be deleted in minutes. And since these files are part of the investigation, they stay retained for review but are hidden from end users. And safeguards like in-place holds for eDiscovery still work normally so protected files aren’t removed.
- Okay, so far we’ve defined all the investigations up front. Is there maybe a way to automate the process using agents?
- Absolutely. We’re adding new capabilities to help tackle a major hidden risk: credentials buried in everyday files. While Microsoft Purview DLP protects credentials in real time as files are created or shared, the Data Security Posture Agent, powered by Security Copilot, helps security teams identify and prioritize credential-related risks across scoped data locations. Here you can see that I’ve already enabled the agent, and there are a few tasks in progress. These can be started manually or run on a schedule. I’ll start a new assignment for this agent and create a credential scanning task. We’ll be adding more task types to this over time. I can give it a name or keep what’s there, then add some additional context, in this case, to look for credentials and passwords. Then I can view its progress as it completes scanning data locations and access patterns, analyzing risky documents, and generating the report. The agent works autonomously, scanning thousands of locations and potentially millions of files. I’m going to move over to a scan I ran earlier to save some time. Once the agent completes its scan, you’ll see a prioritized list of exposed credentials, such as passwords, API keys, encryption keys, tokens, and more, each with a risk score and the agent’s reasoning. From there, I can group the results into categories, then filter for the highest-risk credentials. For each credential found, I can explore the details of the credential itself plus its surrounding context.
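As a rough illustration of what a credential scan does conceptually (find a secret, capture the surrounding snippet, assign a risk score), here is a small sketch. The regex patterns and the length-based risk heuristic are simplified assumptions for demonstration only, nothing like the agent’s actual classifier- and AI-backed detection.

```python
import re

# Illustrative patterns only; a real scanner uses far richer detection.
PATTERNS = {
    "password": re.compile(r"password\s*[:=]\s*(\S+)", re.IGNORECASE),
    "api_key": re.compile(r"\bapi[_-]?key\s*[:=]\s*([A-Za-z0-9\-_]{16,})", re.IGNORECASE),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan_text(name: str, text: str) -> list[dict]:
    """Return findings with type, surrounding context, and a naive risk score."""
    findings = []
    for cred_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            start = max(0, m.start() - 20)
            findings.append({
                "file": name,
                "type": cred_type,
                # Keep a snippet of surrounding text for context, as the demo shows.
                "snippet": text[start:m.end() + 20],
                # Assumption: a toy heuristic where longer matches score higher.
                "risk": min(100, 40 + len(m.group(0))),
            })
    return findings

results = scan_text(
    "notes.txt",
    "deploy with api_key = Zx9QpL2mN8vR4tY6wB1c and password: hunter2",
)
for f in sorted(results, key=lambda f: f["risk"], reverse=True):
    print(f["type"], f["risk"])
```

Grouping by `type` and sorting by `risk` mirrors the prioritized, categorized list the agent produces, just at a toy scale.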
- It’s a huge advantage, really, to run these types of credential scans at scale to catch those risks. But why don’t we switch gears to the human-led investigations. DSI uses pay-as-you-go billing, which, you know, if people are watching this, they’re probably wondering, how do I keep these investigations in check without breaking the bank?
- So costs, as you say, are usage-based and billed through Azure. They’re going to vary depending on the size and complexity of your investigation, so we’ve introduced a new estimator tool to help. Before I go there, as a baseline to see the compute units I’ve been showing until now, I’ll start in the pay-as-you-go dashboard in DSI and then filter by our last investigation. This one only used about 250 megabytes and 109 DSI compute units, which is quite conservative. So let’s go back to the DSI overview tab and scroll down to our new estimate cost tool. This lets you input key values like investigation size in gigabytes and the number of vector searches, and it will estimate cost based on what you enter. It shows you the cost breakdown by type for size and AI usage. And the last related control I want to show you is in Azure Cost Management, where, like any other Azure service, you can see forecast and accumulated costs. I’ll filter this by my DSI shared view. In this chart, you’ll see the investigation compute and gigabytes by day, as well as a forecast. So, voila, you’ve got what you need to investigate deeply with AI and keep costs in check while staying ahead of incidents. And we’re only getting started. More integrations, smarter AI, new mitigation actions, and more agentic workflows are on the way.
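The kind of arithmetic the estimator performs can be sketched like this. Every rate below is a placeholder assumption, not Microsoft’s actual pricing; use the in-product estimator or the Azure pricing page for real numbers.

```python
def estimate_cost(size_gb: float, vector_searches: int, compute_units: int,
                  rate_per_gb: float = 0.10, rate_per_search: float = 0.05,
                  rate_per_unit: float = 0.02) -> dict:
    """Estimate investigation cost from key inputs.

    All default rates are hypothetical placeholders, NOT real pricing.
    """
    breakdown = {
        "data_size": size_gb * rate_per_gb,          # charge for data in scope
        "vector_search": vector_searches * rate_per_search,  # charge per search
        "ai_compute": compute_units * rate_per_unit,  # charge for AI analysis
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown

# Example roughly matching the demo: ~0.25 GB and 109 compute units.
print(estimate_cost(size_gb=0.25, vector_searches=3, compute_units=109))
```

The useful point is that the breakdown separates size-driven cost from AI-usage cost, which is exactly the split the in-product estimator shows, so you can see which lever (scoping the data down, or limiting deep analysis) controls your spend.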
- Thanks so much for joining us today, Christophe. And if you want to learn more about DSI and try it out for yourself, as a Microsoft Purview admin just go to purview.microsoft.com/dsi. And keep watching Microsoft Mechanics for the latest updates. We’ll see you again soon.