Forum Discussion
Auto Labeling Policy Delay for Old Files (Existing Files)
Hi Everyone,
We are observing a difference in auto-labeling policy behavior in Purview for SharePoint.
An auto-labeling policy has been enabled and scoped to SharePoint with a metadata-based rule (document creation date or document modification date). The scoped SharePoint site contains only 7 unlabeled files, all uploaded before the policy was turned on. The policy is working: any new file I add after enabling the policy is labeled within about 5 minutes, but the existing files remain unlabeled. It seems new files are evaluated via the near-real-time path, while existing files rely on an asynchronous mode. Can anyone help explain why existing files take longer to be processed even when there are only a few of them, or share whether you have faced similar behavior? This is a test scenario: we plan to enable the same policy across more than 50 sites containing millions of unlabeled files, and we want to confirm that, even though it takes time, all existing unlabeled files will eventually be labeled. This is crucial for us, so please help us understand this behavior.
Regards,
BanuMurali
3 Replies
- DerekMorgan2 (Brass Contributor)
Hi BanuMurali — this behavior is fairly common and usually comes down to how Purview evaluates new vs existing content.
New or modified files are evaluated quickly because they trigger the active/continuous classification path (upload, create, modify). Existing files are evaluated via a background/asynchronous process, which can be delayed and isn’t deterministic in terms of timing.
One thing to double-check: you mentioned using created date / modified date as conditions. That pattern aligns more closely with auto-apply retention label policies (metadata/KQL-based) than with sensitivity label auto-labeling, which typically relies on Sensitive Info Types, classifiers, or EDM. If the rule is metadata-only, existing files may not be re-evaluated promptly unless something explicitly triggers a scan.
Microsoft has also started addressing this gap with on-demand classification for SharePoint and OneDrive, which allows admins to scan data at rest instead of waiting on user activity.
If you’re willing, it would help a lot to see your auto-label policy configuration (rule type + scope). Even a redacted screenshot or short summary would help confirm whether this is expected async behavior or a rule/scope mismatch.
A few quick clarifiers:
- Are you auto-applying a sensitivity label or a retention label?
- Did you run the policy in simulation, and do the older files show as matches?
- Are the files standard Office docs in a regular document library?
References
- Protecting sensitive information in the era of AI with Microsoft Purview Information Protection | Microsoft Community Hub
- Simulation mode for retention labels | Microsoft Purview
- https://learn.microsoft.com/en-us/purview/apply-sensitivity-label-automatically?view=o365-worldwide&tabs=apply-label
- BanuMurali (Brass Contributor)
Hi DerekMorgan2,
Thanks for the explanation.
Clarification as requested:
Are you auto-applying a sensitivity label or a retention label? - Applying a sensitivity label via an auto-labeling policy.
Did you run the policy in simulation, and do the older files show as matches? - Yeah, I ran it in simulation mode, and it's showing the old files (all 7 of them).
Are the files standard Office docs in a regular document library? - Yes, standard file formats: .docx, .xlsx, .pptx, .pdf
The auto‑labeling policy I configured uses the rule:
Document creation date on or after 1/1/1960
OR
Document modification date on or after 1/1/1960
The rule has no SITs or trainable classifiers, and manual label replacement is disabled so that only unlabeled files are classified.
My objective is to apply blanket classification to unlabeled files in scoped SharePoint sites.
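For readers following along, the rule above can be modeled as a simple predicate. This is only a sketch in Python of the logic as described in this post, not how Purview evaluates it internally; the function name `matches_blanket_rule`, the `current_label` parameter, and the label name "General" are hypothetical.

```python
from datetime import date
from typing import Optional

# Cutoff taken from the rule in this post: "on or after 1/1/1960".
CUTOFF = date(1960, 1, 1)


def matches_blanket_rule(
    created: date,
    modified: date,
    current_label: Optional[str],
) -> bool:
    """Return True if a file would be targeted by the rule as described.

    Hypothetical model only: it mirrors the wording of the rule and does
    not call any Purview API.
    """
    # Per the post, only unlabeled files are classified.
    if current_label is not None:
        return False
    # The two date conditions are ORed, so either one is enough.
    return created >= CUTOFF or modified >= CUTOFF


# An unlabeled file created in 2019 matches; an already labeled one does not.
print(matches_blanket_rule(date(2019, 3, 1), date(2020, 6, 1), None))
print(matches_blanket_rule(date(2019, 3, 1), date(2020, 6, 1), "General"))
```

Because the cutoff is so early, practically every unlabeled file should satisfy one of the two date conditions, which is the intent behind using this rule for blanket classification.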
In my test site containing 13 files (7 unlabeled: 3 .docx, 2 .pdf, 1 .xlsx, and 1 .pptx), the initial run labeled only the 2 .pdf files after about 5 hours. The remaining Office files were not labeled until I toggled the policy off and back on, and it then took nearly 35 hours to process the remaining 5 files.
When I repeated the test in a new site with the same configuration (rule: document creation date on or after 1/1/1960 OR document modification date on or after 1/1/1960), none of the 7 unlabeled files had been labeled after more than 24 hours (the policy was turned on Apr 26). Newly added files were labeled in near real time (within 5 minutes), and the simulation correctly identified the unlabeled files before activation.
This delay in labeling existing files is concerning, especially as we plan to expand the policy across 50+ sites to classify millions of unlabeled SharePoint files, and we would appreciate guidance on why this latency occurs and how to address it.
Kindly let me know if you require any clarifications.
Regards,
BanuMurali
- DerekMorgan2 (Brass Contributor)
Hi BanuMurali — thanks for the detailed follow‑up.
Based on your testing, the policy itself is working as expected (simulation matches, new files label quickly). The issue is execution on existing SharePoint content, which appears to be asynchronous and not time‑deterministic.
Your results — PDFs labeling earlier, Office files only labeling after a policy toggle, and a second site showing no retroactive labeling after 24+ hours — point to service‑side background processing, not a configuration problem. Simulation confirms eligibility, not when existing files will actually be labeled.
For a rollout across many sites and millions of files, I’d plan on eventual consistency for existing content and avoid assuming there’s a reliable way to force timing (policy toggles included).
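One way to work with eventual consistency is to track progress rather than timing. The sketch below, in Python, compares two inventory snapshots and reports which files are still unlabeled. It assumes you can export a file-to-label mapping per site by some means; the snapshot format, the file names, the label name, and the `labeling_progress` function are all hypothetical and not part of any Purview API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot format: file path -> label name, or None if unlabeled.
Snapshot = Dict[str, Optional[str]]


def labeling_progress(
    before: Snapshot, after: Snapshot
) -> Tuple[List[str], List[str]]:
    """Return (newly_labeled, still_unlabeled) for files unlabeled in `before`."""
    newly_labeled: List[str] = []
    still_unlabeled: List[str] = []
    for path, label in before.items():
        if label is not None:
            continue  # already labeled at the first snapshot
        if after.get(path) is not None:
            newly_labeled.append(path)
        else:
            # Also covers files that are missing from the second snapshot.
            still_unlabeled.append(path)
    return sorted(newly_labeled), sorted(still_unlabeled)


# Illustrative data only; these are not the actual files from the test site.
before = {"a.docx": None, "b.pdf": None, "c.xlsx": None, "d.pptx": "General"}
after = {"a.docx": None, "b.pdf": "General", "c.xlsx": None, "d.pptx": "General"}

done, pending = labeling_progress(before, after)
print(f"newly labeled: {done}")
print(f"still unlabeled: {pending}")
```

Running a comparison like this on a schedule would give a per-site view of how much existing content has been processed, without assuming any particular completion time.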
Given the impact and the non‑deterministic behavior you’re seeing, it would be reasonable to open a Microsoft support case to confirm whether there are backend constraints or throttles affecting retroactive labeling in your tenant.
Your concern is valid — and your testing highlights why expectations need to be set carefully before scaling this out. Once I work things out with my demo tenant, I will try and reproduce what you have 🤞🏾