Forum Discussion

Thomas_Powers
Copper Contributor
May 09, 2025

On-Prem Scanner and EDM SIT matching

Hello,

We have two EDM-based SITs that detect account numbers in our institution. These SITs and their corresponding labels work correctly in Exchange DLP rules and in auto-labeling for OneDrive and SharePoint, and they correctly detect and label files on my workstations when users open the matching files.

My issue is with the on-premises scanner (which is fully up to date as of 5/8/25). It is currently pointed at a test directory that contains only 2 files. No matter what I do, the scanner will not detect these files as matching the data in the EDM-based SIT labels. The scanner works fine for my other labels, and I have made sure that the labels are all published to the Purview scanner service logon account (which has been granted an E5 license).

So, in a nutshell: the on-prem scanner works great for all labels and SITs that are NOT EDM-based, yet both the documentation and Copilot tell me that EDM SIT-based labeling SHOULD work with the on-prem scanner.

What am I missing here?

All insight is appreciated

4 Replies

  • Ankit365
    Iron Contributor

    Thomas_Powers​ The MPIP scanner scans files on on-premises data stores like file shares and SharePoint Server libraries. It uses pattern-based detection methods, including regular expressions and keyword dictionaries, to identify sensitive information. However, EDM SITs rely on matching exact values from a hashed data set, a process that the on-premises scanner cannot handle.

    https://learn.microsoft.com/en-us/purview/deploy-scanner-supported-sits

    Workaround (which we use): To achieve similar detection capabilities on-premises, consider the following approach:

    1. Create a custom SIT using regular expressions or keyword dictionaries.
       Develop a custom SIT that matches patterns resembling your sensitive data with regular expressions or keyword lists.
       For example, if you're trying to detect account numbers, create a regex pattern that matches the format of those numbers.

    2. Deploy the custom SIT with the MPIP scanner.
       Configure the MPIP scanner to use this custom SIT to scan on-premises files.
       Ensure the scanner is properly set up and that the custom SIT is included in the labeling policy assigned to the scanner's service account.

    3. Maintain EDM SITs for cloud-based scenarios.
       Continue using EDM SITs for cloud environments where they are supported, such as Exchange Online, OneDrive, and SharePoint Online.
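
    Before building the custom SIT, it can help to try the pattern out locally against sample text. The following is only a rough sketch in Python, not the scanner's own matching logic: the 8 to 17 digit format, the keyword list, the 300-character window, and the function name find_account_numbers are all assumptions made for illustration and would need to be replaced with your institution's real account number format.

```python
import re

# Assumed format for illustration only: a standalone run of 8 to 17 digits.
ACCOUNT_RE = re.compile(r"(?<!\d)\d{8,17}(?!\d)")

# Assumed supporting keywords that must appear near a candidate number.
KEYWORD_RE = re.compile(r"\b(?:account|acct)\b", re.IGNORECASE)

# Assumed proximity window (in characters) on either side of the candidate.
WINDOW = 300


def find_account_numbers(text):
    """Return digit runs that match the assumed format and sit near a keyword."""
    hits = []
    for match in ACCOUNT_RE.finditer(text):
        start = max(0, match.start() - WINDOW)
        end = match.end() + WINDOW
        # Keep the candidate only if a supporting keyword is within the window.
        if KEYWORD_RE.search(text[start:end]):
            hits.append(match.group())
    return hits


if __name__ == "__main__":
    print(find_account_numbers("Acct no: 123456789012 for review"))
    print(find_account_numbers("Order 123456789012 shipped"))
```

    Requiring a nearby keyword is one way to cut down on false positives from arbitrary long numbers, since a pure format match cannot confirm that a value is a real account number the way an EDM lookup can.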

     

  • Thomas_Powers
    Copper Contributor

    Just double-checked: our EDM SIT uses the US Bank Account Number SIT as the primary match, so it should work with the on-prem scanner, yes?

     

    • Thomas_Powers
      Copper Contributor

      OK, but the scanner also labels files based on custom SITs that we build with regular expressions or keyword dictionaries, and that works great.

      Our EDM SIT uses the US Bank Account Number SIT as the primary match, with the EDM data being a list of our account numbers. Is this supposed to be detected by the scanner, since the primary match is one of the listed SITs?