Forum Discussion
On-Prem Scanner and EDM SIT matching
Hello....
We have 2 EDM-based SITs that detect account numbers in our institution. These SITs and their corresponding labels work correctly in Exchange DLP rules and in auto-labelling in OneDrive and SharePoint, and they correctly detect and label files on my workstations when users open the matching files.
My issue is with the on-premises scanner (fully up to date as of 5/8/25). It is currently pointed at a test directory that contains only 2 files. No matter what I do, the scanner will not detect these files as matching the EDM-based SITs. The scanner works fine for my other labels, and I have made sure all the labels are published to the Purview scanner service logon account (which has been granted an E5 license).
So, in a nutshell...the on-prem scanner works great for all labels and SITs that are NOT EDM based...yet both the documentation and Copilot tell me that EDM SIT based labelling SHOULD work with the on-prem scanner.
What am I missing here?
All insight is appreciated
4 Replies
- Ankit365
Iron Contributor
Thomas_Powers The MPIP scanner scans files on on-premises data stores like file shares and SharePoint Server libraries. It uses pattern-based detection methods, including regular expressions and keyword dictionaries, to identify sensitive information. However, EDM SITs rely on matching exact values from a hashed data set, a process that the on-premises scanner cannot handle.
https://learn.microsoft.com/en-us/purview/deploy-scanner-supported-sits
Workaround (which we use): To achieve similar detection capabilities on-premises, consider the following approach:
Create a Custom SIT Using Regular Expressions or Keyword Dictionaries:
Develop a custom SIT that matches patterns resembling your sensitive data with regular expressions or keyword lists.
For example, if you're trying to detect account numbers, create a regex pattern that matches the format of those numbers.
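As a rough illustration of that idea, here is a minimal Python sketch for trying out a candidate pattern against sample text before entering it into a custom SIT. Everything specific in it is an assumption for illustration: the 10 to 12 digit account format, the supporting keywords, and the 300-character proximity window are hypothetical and should be replaced with your institution's real values. The scanner itself does not run Python; this only helps check the regex offline.

```python
import re

# Hypothetical format: account numbers are 10 to 12 digits (adjust to your own).
# The lookarounds stop the pattern from matching inside a longer digit run.
ACCOUNT_RE = re.compile(r"(?<!\d)\d{10,12}(?!\d)")

# Hypothetical supporting keywords that should appear near a real account number.
KEYWORD_RE = re.compile(r"\b(?:account|acct)\b", re.IGNORECASE)

# Assumed number of characters of context checked on each side of a match.
PROXIMITY = 300


def find_account_numbers(text: str) -> list[str]:
    """Return digit runs that match the pattern and have a keyword nearby."""
    hits = []
    for m in ACCOUNT_RE.finditer(text):
        start = max(0, m.start() - PROXIMITY)
        window = text[start : m.end() + PROXIMITY]
        if KEYWORD_RE.search(window):
            hits.append(m.group())
    return hits


if __name__ == "__main__":
    print(find_account_numbers("Acct no: 1234567890 for the wire transfer"))
    print(find_account_numbers("Order 1234567890 has shipped"))
```

Requiring a keyword near the number is one way to cut false positives from a bare digit pattern, since unlike EDM a regex cannot confirm that the number is actually one of your accounts.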
Deploy the Custom SIT with the MPIP Scanner:
Configure the MPIP scanner to use this custom SIT to scan on-premises files.
Ensure the scanner is properly set up, and the custom SIT is included in the labeling policy assigned to the scanner's service account.
Maintain EDM SITs for Cloud-Based Scenarios: Continue using EDM SITs for cloud environments where they are supported, such as Exchange Online, OneDrive, and SharePoint Online.
- Thomas_Powers
Copper Contributor
Just double checked...our EDM SIT uses the US Bank Account number as the primary match...so it should work for the on prem...yes?
- BrianStephen
Microsoft
I believe the Microsoft Purview Information Protection (MPIP) scanner does not support Exact Data Match (EDM).
Sensitive Information Types supported by Microsoft Purview Information Protection scanner
https://learn.microsoft.com/en-us/purview/deploy-scanner-supported-sits
I hope that helps!
- Thomas_Powers
Copper Contributor
OK...but it also tags on custom SITs that we make with regular expressions or keyword dictionaries...and that works great.
Our EDM SIT is using the US Bank Account number as the primary match...with the EDM as a list of our account numbers. Is this supposed to then be detected by the scanner since it's part of the listed SITs?