Jul 24 2023 08:04 AM
Hello all,
My M365 tenant holds about 1.7 PB of content across close to 90,000 site collections. I need a report of all content older than 3 years, which means crawling every document library in every site. The report should include document type, filename, size, created, created by, modified and modified by details.
I tried ShareGate custom reports for this, but given the number of sites and the volume of data, the program stops responding a few minutes after the task starts. I also tried SharePoint PowerShell, with no luck.
Is there any other way that this report can be generated? Any help is greatly appreciated!
Jul 24 2023 09:58 AM
Jul 25 2023 01:09 AM
Sounds like a nice challenge.
I would first try Content search in the Compliance Center, as Chris Webb suggested. It will be interesting to see whether it can handle such a large data volume. Your tenant may hold several hundred million documents and, depending on how it is used, you may end up with tens of millions of "stale" documents.
Hopefully there are no "blind spots" in the search index where part of the content has not been indexed.
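As a rough illustration of what the search condition could look like: Content search accepts KQL, and `LastModifiedTime` is one of the searchable document properties. The sketch below only builds the query string; the helper names (`cutoff_date`, `stale_content_kql`) are made up for this example, and whether the search copes with this data volume is still an open question.

```python
from datetime import date


def cutoff_date(today: date, years: int) -> date:
    """Return the calendar date `years` before `today`.

    Feb 29 falls back to Feb 28 when the target year is not a leap year.
    """
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def stale_content_kql(today: date, years: int = 3) -> str:
    """Build a KQL condition matching items last modified before the cutoff."""
    return f"LastModifiedTime<{cutoff_date(today, years).isoformat()}"


print(stale_content_kql(date(2023, 7, 24)))  # LastModifiedTime<2020-07-24
```

The resulting string would go into the query box of the search (or its PowerShell equivalent), scoped to SharePoint and OneDrive locations.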
If the above OOTB method does not work you will need to look at alternative approaches.
We use a node.js application to update metadata for SharePoint documents. It loops over all sites and libraries, selects documents using a CAML query, downloads them, extracts properties (keywords, the created date stored within the document, the sent date for emails, ...) and then updates the SharePoint column(s). The key challenges you face are:
- use credentials that have access to all sites
- data volume: you will need to be able to scale out (multiple threads, multiple systems)
- handle list view threshold
- handle throttling
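For the throttling point, the usual pattern is to retry on HTTP 429/503 and honor the `Retry-After` header when the service sends one. This is a minimal sketch in Python, not our actual node.js code. `send` is a hypothetical callable that performs one request and returns `(status_code, headers, body)`, and the header lookup assumes a plain dict keyed exactly as `Retry-After`.

```python
import random
import time

# Status codes SharePoint Online uses to signal throttling.
THROTTLE_STATUSES = {429, 503}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it is no longer throttled or attempts run out.

    send is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in THROTTLE_STATUSES:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point in waiting after the final attempt
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            # The service told us how many seconds to wait.
            delay = float(retry_after)
        else:
            # Fall back to exponential backoff with a little jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        sleep(delay)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

When you scale out over multiple threads or systems, each worker would wrap its requests this way so that a throttled worker slows down instead of hammering the tenant.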
Your case is a bit simpler than our case: you can already stop after executing the CAML query.
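To give an idea of the query side, here is a sketch of a paged CAML view that returns the fields you listed. As far as I know, a Where clause on a non-indexed column can trip the list view threshold in libraries above 5,000 items, so this version pages by ID and filters on the client instead. The function names and the dict shape of `item` are assumptions for this example; the field names are the standard internal names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Internal names: file name, type, size, created, created by, modified, modified by.
REPORT_FIELDS = [
    "FileLeafRef", "File_x0020_Type", "File_x0020_Size",
    "Created", "Author", "Modified", "Editor", "FSObjType",
]


def paged_files_caml(page_size: int = 2000) -> str:
    """Build a CAML view that walks a library recursively, one page at a time."""
    fields = "".join(f"<FieldRef Name='{name}'/>" for name in REPORT_FIELDS)
    return (
        "<View Scope='RecursiveAll'>"
        "<Query><OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy></Query>"
        f"<ViewFields>{fields}</ViewFields>"
        f"<RowLimit Paged='TRUE'>{page_size}</RowLimit>"
        "</View>"
    )


def is_stale(item: dict, cutoff: datetime) -> bool:
    """Client-side filter: keep files (FSObjType 0) last modified before cutoff.

    `item` is assumed to be a dict with a timezone-aware `Modified` datetime.
    """
    return item["FSObjType"] == 0 and item["Modified"] < cutoff


ET.fromstring(paged_files_caml())  # raises ParseError if the XML is malformed
```

Each page of results would be passed through `is_stale` and the matching rows appended to the report, so nothing has to be downloaded.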
In short, use the OOTB features from the Compliance Center first. If that does not work, estimate the effort needed to perform the task successfully and whether it can be justified. The most likely drivers are compliance or storage costs.
Good luck!
Jul 25 2023 03:53 AM
Jul 25 2023 04:36 AM
Dec 21 2023 12:31 PM
Were you able to run a successful report? I am looking for a similar report to demonstrate the impact of setting up a retention policy that will delete files older than 5 years, and it is challenging to get data for the whole tenant. Reading through the suggestions for the Compliance Center (now Purview), the audit only allows a 180-day report range, so that's already a big limitation on time. I also don't see anything content-specific for filtering on last modified date. Would love to hear whether you were able to get the data.
Dec 22 2023 02:43 PM
Jan 29 2024 07:44 AM
We have a 10-year retention policy and would like to warn users with an alert when the retention period is about to be met, and/or pull a list of what is about to be deleted so we can warn them.
Some of the files have already started to be deleted by the system account because their retention period has been met.