Forum Discussion

StaceyWAG
Copper Contributor
Nov 13, 2025

Auto-Label Simulation does not simulate your rules exactly


When you’re building an auto-labeling rule and run a simulation, don’t expect it to fully follow your rule. Let me explain.
It doesn’t evaluate everything. For example, if your rule says a document must match at least four regex patterns to count as a positive find, the simulation might treat a single match as a positive.
Yeah, that’s frustrating.
Here’s what works better:

Build your Sensitive Information Type (SIT) and test it against individual documents first (a rough local version of this check is sketched after this list).
Then create a policy that targets a small subset of data.
Run the simulation, then turn on the policy.
Check the results in Activity Explorer, which shows real production activity.
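
For that first step, a quick local check of your regex patterns against sample text can catch obvious problems before you ever create the policy. This is only a rough stand-in for testing the SIT in the Purview portal; the patterns and sample documents below are hypothetical placeholders you would swap for your own.

```python
import re

# Hypothetical patterns standing in for a custom SIT's primary elements --
# replace them with your own, and replace the sample strings with text
# pulled from real test documents.
SIT_PATTERNS = {
    "employee_id": re.compile(r"\bEMP-\d{6}\b"),
    "project_code": re.compile(r"\bPRJ-[A-Z]{3}-\d{4}\b"),
}
MIN_INSTANCES = 4  # mirror the instance count configured on the auto-labeling rule

SAMPLE_DOCUMENTS = {
    "hr_export.txt": "EMP-111111, EMP-222222, EMP-333333 and EMP-444444 on PRJ-ABC-2024",
    "meeting_notes.txt": "Only one id, EMP-555555, is mentioned here.",
    "newsletter.txt": "No sensitive identifiers at all.",
}

for name, text in SAMPLE_DOCUMENTS.items():
    counts = {label: len(pat.findall(text)) for label, pat in SIT_PATTERNS.items()}
    total = sum(counts.values())
    verdict = "meets the threshold" if total >= MIN_INSTANCES else "does NOT meet the threshold"
    print(f"{name}: {counts} -> {total} match(es), {verdict}")
```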

Why can’t the simulation just run the full rule? Good question—we all wish it did.

1 Reply

  • Ankit365
    Iron Contributor

    You are exactly right, and that limitation is still true as of December 2025. The auto-labeling simulation in Microsoft Purview is designed as a lightweight pre-validation tool, not a full evaluation engine. During a simulation, Purview checks whether the policy can discover potentially matching content, but it does not process every condition or confidence level in your rule.

    In technical terms, the simulation engine skips some of the deeper evaluation steps used by the production labeling service. It mainly checks for the presence of a Sensitive Information Type or classifier and reports any potential matches. That means it might treat one regex hit or keyword as a positive result, even if your rule requires multiple occurrences or specific supporting evidence such as proximity or confidence thresholds. The same applies to combined conditions like multiple SITs, file property filters, or “instance count” rules.
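
    To illustrate the combined-conditions point, here is a small, purely conceptual Python model. It is not Purview's implementation and it ignores confidence levels and supporting evidence: enforcement-style evaluation requires every condition to meet its instance count, while a presence-style scan reports a match on any single hit.

```python
import re
from dataclasses import dataclass

# Purely illustrative model of "combined conditions" -- not Purview's internal
# implementation. Each condition stands in for one SIT with its own instance
# count; the real engine also weighs confidence and supporting evidence,
# which this sketch leaves out.
@dataclass
class Condition:
    name: str
    pattern: re.Pattern
    min_count: int

RULE = [
    Condition("card-like number", re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"), 2),
    Condition("employee id", re.compile(r"\bEMP-\d{6}\b"), 4),
]

document = "Card 1111-2222-3333-4444 was issued to EMP-123456."

def full_evaluation(text: str) -> bool:
    """Enforcement-style check: every condition must meet its instance count."""
    return all(len(c.pattern.findall(text)) >= c.min_count for c in RULE)

def presence_scan(text: str) -> bool:
    """Simulation-style check: a single hit on any condition gets reported."""
    return any(c.pattern.search(text) for c in RULE)

print("full evaluation:", full_evaluation(document))  # False -- instance counts not met
print("presence scan:  ", presence_scan(document))    # True  -- would surface in simulation
```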

    Microsoft cites performance and safety as the reasons for this behavior. Simulations are meant to give you an approximate view of which content might be affected without actually applying encryption or labels. Running the full rule logic, including multi-match validation and scoring, would make simulations as resource-intensive as production runs, potentially delaying scans across entire SharePoint or OneDrive environments.

    That’s why the best practice remains what you described. Test your Sensitive Information Types individually in the Content Explorer or SIT testing tool. Then publish your policy to a small pilot scope and observe real-time labeling activity in Activity Explorer or Data Classification reports. These production results use the full detection and scoring engine, so they reflect how your rules truly behave in enforcement mode.

    So yes, the simulation is only a preview, not a precise replica of your auto-labeling rules, and at present there is no configuration to make it run the complete rule logic. Please hit like if you like the solution.
