Forum Discussion

ANAND_SUNKA
Brass Contributor
Oct 10, 2025

Custom SITs fine-tuned in MIP

Hello Everyone,

I am currently working on a Microsoft Purview greenfield deployment project for a customer, covering on-premises data and M365 data.

I created a few custom SIT classifiers with regex patterns in the MIP portal almost 3 months ago. They classify the data as expected, but with some false positives.

All of them have since been fine-tuned to prevent false positives, and newly created M365 data is scanned and classified as expected.

However, the data that was previously misclassified (the false positives) is not being reclassified. How can I force a rescan/reclassification of that M365 data?

I just want to reclassify the data with the fine-tuned custom SITs so that it is correctly classified before labelling.


One more question, related to the on-prem scanners.
I started the on-prem scanners to scan all the SharePoint sites and file shares for sensitive information.
They initially ran a full scan and later switched to incremental scans.
These scans started before the custom SITs and labels were created.

Now I want to run a full scan that only classifies the data and recommends labels based on the sensitive data found, instead of enforcing and applying the labels.

Can someone shed some light on which options need to be selected to only recommend the label instead of applying it?

The current configuration is shown below:

[Screenshot: current scanner configuration]

Any help is really appreciated.

Regards

Anand Sunka

 

1 Reply

  • Ankit365
    Iron Contributor

    This is a great question and a very common scenario when fine-tuning custom sensitive information types in Microsoft Purview, especially during a greenfield deployment that mixes Microsoft 365 and on-premises data sources. As of October 2025, the behavior you are seeing is expected and there are clear steps to address it.

    When you update or fine-tune custom SITs, Microsoft Purview does not automatically reclassify previously scanned or labeled content. Classification is event-driven: it happens when a file is created or modified, or during a new content scan. Already classified items are reclassified only if their content changes or if you trigger a fresh scan that includes them. For Microsoft 365 data, the best approach is to perform a content re-evaluation through the Compliance Portal. You can do this by running a Data Classification Content Search, or by temporarily toggling the publishing status of the relevant auto-labeling policy; republishing a modified policy forces Microsoft 365 to re-evaluate all items in the defined scope. Another approach is to manually trigger a re-indexing job through the Search & Compliance APIs, or to modify and save the label policy again.
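
    Before forcing a re-evaluation, it may be worth confirming offline that the tuned pattern really rejects the known false positives. The following is only a rough sketch: the "EMP-123456" ID format, both patterns, and the sample strings are hypothetical, and Python's `re` module is used as an approximation, so a pattern that behaves well here should still be verified with the SIT test feature in the portal.

```python
import re

# Hypothetical SIT: an employee ID such as "EMP-123456".
# LOOSE stands in for the original pattern, TUNED for the fine-tuned one.
LOOSE = re.compile(r"EMP-\d{6}")
TUNED = re.compile(r"\bEMP-\d{6}\b")  # word boundaries reject embedded matches

# (text, should_match) pairs: illustrative samples, not real data.
SAMPLES = [
    ("Badge EMP-123456 issued to new hire", True),
    ("Ticket TEMP-1234567 closed", False),
    ("Ref: EMP-654321.", True),
    ("Order XEMP-000000123 shipped", False),
]

def false_positives(pattern, samples):
    """Texts the pattern matches although they should not match."""
    return [text for text, expected in samples
            if pattern.search(text) and not expected]

def false_negatives(pattern, samples):
    """Texts the pattern misses although they should match."""
    return [text for text, expected in samples
            if expected and not pattern.search(text)]

if __name__ == "__main__":
    for name, pattern in (("loose", LOOSE), ("tuned", TUNED)):
        print(name,
              "false positives:", false_positives(pattern, SAMPLES),
              "false negatives:", false_negatives(pattern, SAMPLES))
```

    Keeping the false positives found in production as negative samples gives a small regression set that can be rerun each time a pattern is adjusted.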

    For your on-premises scanners, if your goal is to rescan and generate label recommendations only instead of enforcing labels, you should keep “Treat recommended labeling as automatic” set to Off and “Enforce sensitivity labeling policy” set to Off. Ensure “Label files based on content” is enabled, allowing the scanner to evaluate your custom SITs and recommend labels based on the defined patterns. This configuration will cause the scanner to discover and recommend sensitivity labels without automatically applying them. You can then review the results in the Purview portal or in your reports.
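
    As a quick checklist, the recommend-only configuration described above can be written down and compared against what a scan job currently shows. This is just a sketch: the setting names are the ones quoted in this reply, while the dictionary layout, the "On"/"Off" values, and the `current_job` example are assumptions for illustration and not a real Purview API or export format.

```python
# Recommend-only settings as described in the reply above.
# The dict representation is illustrative, not a Purview export format.
RECOMMEND_ONLY = {
    "Enforce sensitivity labeling policy": "Off",
    "Treat recommended labeling as automatic": "Off",
    "Label files based on content": "On",
}

def deviations(current):
    """Return {setting: (expected, actual)} for every setting that differs."""
    return {
        name: (expected, current.get(name))
        for name, expected in RECOMMEND_ONLY.items()
        if current.get(name) != expected
    }

# Hypothetical values copied by hand from a scan job's settings page.
current_job = {
    "Enforce sensitivity labeling policy": "On",
    "Treat recommended labeling as automatic": "Off",
    "Label files based on content": "On",
}

if __name__ == "__main__":
    for name, (expected, actual) in deviations(current_job).items():
        print(f"{name}: expected {expected}, found {actual}")
```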

    Since you ran a full scan before creating the new SITs, you should now trigger another full scan so that the scanner re-indexes and evaluates all files with the new classifiers. Set the schedule to Manual or create a new scan job, and ensure the cluster targets your intended repositories. After saving, restart the scan agent. The new scan will pick up your fine-tuned SITs and surface recommendations accordingly.

    In short, Microsoft 365 content requires a re-publishing or re-indexing action to trigger reclassification. Additionally, your on-premises scan must be set to perform a new full scan with enforcement turned off and content labeling enabled for recommendations.

    Please hit like if you like the solution.
