Forum Discussion
SharePoint: Report to fetch content older than X amount of time
Sounds like a nice challenge.
I would first try the Content search in the Compliance Center, as Chris Webb suggested. It will be interesting to see whether it can handle such a large data volume. Your tenant may hold several hundred million documents and, depending on how it is used, you may end up with tens of millions of "stale" documents.
Hopefully there are no "blind spots" in the search index where part of the content has not been indexed.
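To make the content search idea a bit more concrete, here is a minimal sketch of how the search condition for "older than X" could be built. It assumes you filter on the `LastModifiedTime` search property with a plain KQL date comparison; the function name `stale_content_kql` and the choice to express the age in days are just mine for illustration, so adjust them to your own definition of "stale".

```python
from datetime import date, timedelta


def stale_content_kql(today: date, max_age_days: int) -> str:
    """Build a KQL condition matching items last modified before the cutoff.

    Assumption: 'LastModifiedTime' is the property you want to filter on.
    Swap in 'Created' if "stale" should mean "created long ago" instead.
    """
    cutoff = today - timedelta(days=max_age_days)
    # KQL accepts ISO 8601 dates (YYYY-MM-DD) in date comparisons.
    return f"LastModifiedTime<{cutoff.isoformat()}"


if __name__ == "__main__":
    # Example: everything not modified in roughly the last five years.
    print(stale_content_kql(date.today(), 5 * 365))
```

You would paste the resulting string into the keyword box of the content search; the Python part only saves you from calculating the cutoff date by hand.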
If the OOTB method above does not work, you will need to look at alternative approaches.
We use a node.js application to update metadata for SharePoint documents. It loops over all sites and libraries, selects documents using a CAML query, downloads them, extracts the properties (keywords, created date within the document, sent date of an email, ...) and then updates the SharePoint column(s). The key challenges you face are:
- use credentials that have access to all sites
- data volume: you will need to be able to scale out (multiple threads, multiple systems)
- handle the list view threshold
- handle throttling
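For the throttling point, a rough sketch of what I mean (this is not our node.js code, just an illustration in Python). SharePoint Online signals throttling with HTTP 429 or 503 and usually sends a `Retry-After` header with the number of seconds to wait. The helper name `call_with_backoff`, the `send` callable and the `(status, headers, body)` tuple are all hypothetical; plug in whatever HTTP client you actually use.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry while the server answers 429 or 503.

    Waits for the number of seconds in 'Retry-After' when that header is
    present, otherwise falls back to exponential backoff.
    Assumption: header keys are spelled exactly 'Retry-After'.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 503):
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # no point in sleeping after the last attempt
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt
        sleep(delay)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Passing `sleep` in as a parameter is only there so the helper can be tested without really waiting. When you scale out over multiple threads or systems, every worker should back off like this, otherwise they keep each other throttled.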
Your case is a bit simpler than our case: you can already stop after executing the CAML query.
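To give an idea of what such a CAML query could look like for "not modified since X", here is a sketch. It is not the query from our application. It assumes you work around the list view threshold by walking the library in windows of item IDs (ID is indexed) and filtering on `Modified` inside each window; whether that is enough for your largest libraries is something you would have to test. The function names, the window size of 4000 and the row limit are my own choices.

```python
from datetime import datetime, timezone
from typing import Iterator, Tuple


def id_windows(max_id: int, size: int = 4000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_id, last_id) ranges covering 1..max_id."""
    for start in range(1, max_id + 1, size):
        yield start, min(start + size - 1, max_id)


def stale_items_caml(cutoff: datetime, first_id: int, last_id: int,
                     row_limit: int = 2000) -> str:
    """Build a CAML view: items in the ID window last modified before cutoff.

    'cutoff' must be timezone-aware; it is converted to UTC for the query.
    """
    cutoff_utc = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        '<View Scope="RecursiveAll"><Query><Where><And>'
        # The ID range comes first so the indexed column narrows the set.
        "<And>"
        f'<Geq><FieldRef Name="ID"/><Value Type="Counter">{first_id}</Value></Geq>'
        f'<Leq><FieldRef Name="ID"/><Value Type="Counter">{last_id}</Value></Leq>'
        "</And>"
        '<Lt><FieldRef Name="Modified"/>'
        '<Value Type="DateTime" IncludeTimeValue="TRUE" StorageTZ="TRUE">'
        f"{cutoff_utc}</Value></Lt>"
        "</And></Where></Query>"
        f"<RowLimit>{row_limit}</RowLimit></View>"
    )


if __name__ == "__main__":
    cutoff = datetime(2019, 1, 1, tzinfo=timezone.utc)
    # 10000 stands in for the highest item ID in the library.
    for first_id, last_id in id_windows(10000):
        print(stale_items_caml(cutoff, first_id, last_id))
```

Each generated view would be run against one library, and the matching items (path, modified date, size) written straight to your report, with no download step needed.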
In short, use the OOTB features from the Compliance Center. If that does not work, estimate the effort needed to perform the task successfully and decide whether it can be justified. The most likely drivers are compliance or storage costs.
Good luck!