On-Prem Scanner and EDM SIT matching
Thomas_Powers
The Microsoft Purview Information Protection (MPIP) scanner scans files in on-premises data stores such as file shares and SharePoint Server libraries. It detects sensitive information with pattern-based methods, including regular expressions and keyword dictionaries. EDM SITs work differently: they match exact values against a hashed reference data set, and the on-premises scanner does not support that kind of lookup.
https://learn.microsoft.com/en-us/purview/deploy-scanner-supported-sits
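To make the difference concrete, here is a minimal Python sketch. It is not how Purview is implemented: the salt, the use of SHA-256, the 10-digit format, and the sample account values are all hypothetical. It contrasts a format-only regex, which is the kind of rule the scanner can run, with an EDM-style lookup that first finds format candidates and then keeps only those whose salted hash appears in a reference set. The second approach needs the hashed table at scan time, and that lookup is the part the on-prem scanner lacks.

```python
import hashlib
import re

# Hypothetical salt; real EDM tooling manages its own salting and hashing.
SALT = b"example-salt"


def hash_value(value: str) -> str:
    """Return a salted SHA-256 hex digest of a whitespace-trimmed value."""
    return hashlib.sha256(SALT + value.strip().encode("utf-8")).hexdigest()


# The "hashed data set": only hashes of known (hypothetical) account numbers are kept.
KNOWN_ACCOUNT_HASHES = {hash_value(v) for v in ("1234567890", "9876543210")}

# Format-only rule: any standalone run of exactly 10 digits.
ACCOUNT_FORMAT = re.compile(r"\b\d{10}\b")


def regex_matches(text: str) -> list[str]:
    """Pattern-based detection: anything that merely looks like an account number."""
    return ACCOUNT_FORMAT.findall(text)


def edm_style_matches(text: str) -> list[str]:
    """EDM-style detection: format candidates whose hash is in the reference set."""
    return [c for c in ACCOUNT_FORMAT.findall(text) if hash_value(c) in KNOWN_ACCOUNT_HASHES]


if __name__ == "__main__":
    sample = "Accounts 1234567890 and 5555555555 were referenced."
    print(regex_matches(sample))      # ['1234567890', '5555555555']
    print(edm_style_matches(sample))  # ['1234567890']
```

The regex flags every 10-digit number, while the EDM-style check flags only values that exist in the reference data. That precision is what a regex-based workaround gives up.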
Workaround (the one we use): to get similar detection on-premises, consider the following approach:
1. Create a custom SIT using regular expressions or keyword dictionaries:
   - Build a custom SIT whose regular expressions or keyword lists match the patterns of your sensitive data.
   - For example, to detect account numbers, write a regex that matches their format, and consider adding supporting keywords (such as "account number") near the match to reduce false positives.
2. Deploy the custom SIT with the MPIP scanner:
   - Configure the scanner to evaluate this custom SIT when it scans on-premises files, for example by using the SIT as an auto-labeling condition on a sensitivity label.
   - Make sure the scanner is properly set up, and that the label using the custom SIT is published in a label policy assigned to the scanner's service account.
3. Maintain EDM SITs for cloud-based scenarios: keep using EDM SITs where they are supported, such as Exchange Online, OneDrive, and SharePoint Online.
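The account-number example above can be prototyped before you build the SIT in Purview. The Python sketch below is only a local approximation and makes several assumptions. The 3-3-4 digit format and the keyword list are hypothetical. The 300-character window mirrors the default character proximity offered for custom SITs, and the "high"/"low" labels are a simplification of SIT confidence levels. Purview uses its own regex engine, so re-test the final pattern in Purview itself.

```python
import re
from dataclasses import dataclass

# Hypothetical account-number format: 10 digits, optionally grouped 3-3-4 with hyphens.
PRIMARY_PATTERN = re.compile(r"\b\d{3}-?\d{3}-?\d{4}\b")

# Hypothetical keyword dictionary used as supporting evidence.
KEYWORDS = ("account number", "account no", "acct")

# Characters on each side of a match in which a keyword must appear.
PROXIMITY = 300


@dataclass
class Detection:
    value: str
    confidence: str  # "high" if a keyword is within PROXIMITY, otherwise "low"


def detect_accounts(text: str) -> list[Detection]:
    """Flag format matches, raising confidence when a keyword is nearby."""
    lowered = text.lower()
    results = []
    for m in PRIMARY_PATTERN.finditer(text):
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        # Simple substring check for any supporting keyword in the window.
        has_keyword = any(k in window for k in KEYWORDS)
        results.append(Detection(m.group(), "high" if has_keyword else "low"))
    return results


if __name__ == "__main__":
    # A phone number far from any keyword, then an account number right after one.
    doc = "Call us at 555-123-4567." + " " * 400 + "Account number: 1234567890"
    print([(d.value, d.confidence) for d in detect_accounts(doc)])
    # [('555-123-4567', 'low'), ('1234567890', 'high')]
```

The phone number fits the digit format but has no keyword nearby, so it stays low confidence. Requiring a supporting keyword in the SIT's higher-confidence pattern is one way to keep such look-alikes from triggering labels.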