Forum Discussion

ellan1537
Iron Contributor
Jul 24, 2023

SharePoint: Report to fetch content older than X amount of time

Hello all,


My M365 tenant holds about 1.7 PB of content across close to 90,000 site collections. My requirement is to produce a report of all content older than 3 years (it should crawl through all document libraries in all sites). The report should include document type, filename, size, created, created by, modified, and modified by.


I tried ShareGate custom reports for this, but given the number of sites and the data volume, the program becomes unresponsive a few minutes after the task kicks off. I also tried SharePoint PowerShell, but with no luck.


Is there any other way that this report can be generated? Any help is greatly appreciated!

  • You should be able to use the Compliance Center and do a content search with conditions set for dates; then you can export the "Results" Excel file with all the information. You will need Global admin or an elevated role for this, obviously, but that's the only easy option I know of.
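    A minimal sketch of building that date condition as a KQL filter (the helper name and the node.js form are mine; `LastModifiedTime` is a standard search managed property):

```javascript
// Hypothetical helper: build the KQL condition for content whose
// last modification is older than the given number of years.
function staleContentQuery(years, today = new Date()) {
  const cutoff = new Date(today);
  cutoff.setFullYear(cutoff.getFullYear() - years);
  const iso = cutoff.toISOString().slice(0, 10); // YYYY-MM-DD
  return `LastModifiedTime<${iso}`;
}
```

    Pasting the resulting condition (e.g. `LastModifiedTime<2020-07-24` for a 3-year cutoff as of July 2023) into the content search keeps the query identical across runs.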
  • Vivek Jagga
    Copper Contributor
    Hi ellan1537,

    My thought here is to use the Search API to get this info, but not to run it against everything at once. Structure the script so that it processes at most one site in memory at a time.
    • Paul de Jong
      Iron Contributor
      Using the Search API gives perhaps more control than using Content Search from the Compliance Center, but it requires more knowledge of the Search API.
      The Search API has pros and cons. I personally think it is great. It can be used from many different clients (PowerShell, CSOM, node.js, ...), but it assumes the content has been indexed with no blind spots, and the user must have at least read access.
      Also make sure to iterate over the search results, because they are only returned in sets of 500 items, and make sure TrimDuplicates is set to false.
      Search will only return the latest version of a document; earlier versions are not exposed.
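      The paging and TrimDuplicates points above can be sketched as follows. `executeSearch` stands in for the actual HTTP call to the Search REST endpoint (e.g. a POST to `/_api/search/postquery`) and is an assumption about your client; the 500-row page size and the TrimDuplicates flag come from the notes above:

```javascript
// Hypothetical paging loop for the SharePoint Search REST API.
// `executeSearch(requestBody)` is assumed to perform the HTTP call and
// return { rows, totalRows } extracted from the search response.
async function fetchAllResults(executeSearch, queryText) {
  const pageSize = 500; // search returns at most 500 rows per request
  const all = [];
  let startRow = 0;
  while (true) {
    const { rows, totalRows } = await executeSearch({
      Querytext: queryText,
      RowLimit: pageSize,
      StartRow: startRow,
      TrimDuplicates: false, // do not let search collapse near-duplicates
      SelectProperties: ['Title', 'FileType', 'Size', 'Created', 'Author',
                         'LastModifiedTime', 'ModifiedBy', 'Path'],
    });
    all.push(...rows);
    startRow += rows.length;
    if (rows.length === 0 || startRow >= totalRows) break;
  }
  return all;
}
```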
  • Paul de Jong
    Iron Contributor

    ellan1537 

    Sounds like a nice challenge.

    I would first try the Content search in the Compliance Center, as Chris Webb suggested. It will be interesting to see if it can handle such a large data volume. Your tenant may hold several hundred million documents, and depending on how it is used you may end up with tens of millions of "stale" documents.
    Hopefully there are no "blind spots" in the search index where part of the content has not been indexed.

    If the above OOTB method does not work, you will need to look at alternative approaches.

    We use a node.js application to update metadata for SharePoint documents. It loops over all sites and libraries, selects documents using a CAML query, downloads the documents, extracts properties (keyword, created date within the document, sent date of an email, ...), and then updates the SharePoint column(s). The key challenges you will face are:
    - use credentials that have access to all sites
    - data volume: you will need to be able to scale out (multiple threads, multiple systems)
    - handle the list view threshold
    - handle throttling
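    For the throttling point: SharePoint Online signals throttling with HTTP 429 (or 503) and a Retry-After header giving the back-off in seconds. A minimal retry wrapper (the `sendRequest` callback and the header access are assumptions about your HTTP client):

```javascript
// Hypothetical retry wrapper honoring Retry-After on throttled requests.
// `sendRequest()` is assumed to return { status, headers }.
async function withRetry(sendRequest, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    const response = await sendRequest();
    const throttled = response.status === 429 || response.status === 503;
    if (!throttled || attempt >= maxRetries) return response;
    // Back off for the server-suggested interval (default 10s if absent).
    const seconds = Number(response.headers['retry-after'] || 10);
    await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
  }
}
```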

    Your case is a bit simpler than ours: you can stop right after executing the CAML query.
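    A sketch of the CAML query such a per-library loop could issue: select items modified before the cutoff, with a paged RowLimit so each request stays under the 5,000-item list view threshold (the element names are standard CAML; the helper function is mine):

```javascript
// Hypothetical CAML builder for "items modified before cutoffIso".
// Note: in very large libraries, filtering on a non-indexed column can
// itself trip the list view threshold; the Modified column may need an index.
function staleItemsCaml(cutoffIso, pageSize = 2000) {
  return `<View Scope="RecursiveAll">
  <Query>
    <Where>
      <Lt>
        <FieldRef Name="Modified" />
        <Value Type="DateTime" IncludeTimeValue="FALSE">${cutoffIso}</Value>
      </Lt>
    </Where>
    <OrderBy Override="TRUE"><FieldRef Name="ID" /></OrderBy>
  </Query>
  <RowLimit Paged="TRUE">${pageSize}</RowLimit>
</View>`;
}
```

    The query already returns the filename, size, and created/modified fields the report needs, so no download step is required.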


    In short, use the OOTB features of the Compliance Center. If that does not work, estimate the effort to perform the task successfully and whether it can be justified. The most likely drivers are compliance or storage costs.

    Good luck!

  • acoliva
    Copper Contributor

    ellan1537 

    Were you able to run a successful report? I am looking for a similar report to demonstrate the impact of setting up a retention policy that deletes files older than 5 years, and it is challenging to get data for the whole tenant. Reading through the suggestions for the Compliance Center (now Purview), the audit only allows a 180-day report range, so that's already a big limitation, and I don't see anything specific to content for a last-modified date. I would love to hear if you were able to get the data successfully.

    • ellan1537
      Iron Contributor
      No direct way of fetching the report. I'm still investigating a solution.
      • Chopper520
        Copper Contributor

        We have a 10-year retention policy. We would like to warn users with an alert when retention is met, and/or pull a list of what's about to be deleted soon and warn them.


        Some of the files have already started being deleted by the system account as their retention has been met.