How to find files over a certain size limit in a SharePoint library with hundreds of folders

Regular Contributor

It's not practical to add the file size column to the library view, as I would have to open every possible folder and our hierarchy is crazy.  We are not at our storage limit; however, I know that many users are uploading large attachments and not overwriting them with newer versions, so we are using far more storage than we need.

 

I know Storage Metrics shows the total size; however, we would still have to spend hours opening every single folder > subfolder > subfolder > subfolder and then going back, etc.


I checked the Audit Log Search, but there isn't anything relevant there.  We do not use the OneDrive sync client for this library, as there are too many files for it to support.

 

There must be a way?

14 Replies

Hi Mercedes,

In Site Settings > Storage Metrics you can see the size of each library and click through the folders to see where the large ones are.

Hi @Mercedes Green 

You could also create a new view within the library: start from All Items and save it as "File Size".

Then edit the properties of this new view as follows:

  • Columns
    • Tick to display File Size
  • Sort
    • First sort by the column: change it to File Size
    • Choose : Show items in descending order
  • Folders
    • Show all items without folders

filesize-view.png

 

This view will then show your biggest files at the top of the list.

To show only files over a certain size, you would have to add a calculated column, I think.
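If you'd rather script the view creation, here is a rough PnP PowerShell sketch of the same idea. The view title, field list and 4900 row limit are my assumptions, and "File_x0020_Size" is the internal name of the File Size column:

```powershell
#Sketch only - assumes the PnP.PowerShell module and an existing
#Connect-PnPOnline session to the site.
#Creates a "File Size" view sorted largest-first, paged at 4900 items.
Add-PnPView -List "Documents" -Title "File Size" `
    -Fields "DocIcon","LinkFilename","File_x0020_Size","Modified" `
    -Query "<OrderBy><FieldRef Name='File_x0020_Size' Ascending='FALSE'/></OrderBy>" `
    -RowLimit 4900 -Paged
```

The "show all items without folders" setting may still need to be ticked on the view's settings page afterwards.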

 

@Mercedes Green 
Have you considered using PowerShell scripts to harvest the information of interest? e.g. get a complete list of all documents and their properties in a document library.
Another option is to look for a reporting tool. There are apps that provide this capability (e.g. here) plus more. For example, find encrypted documents, checked-out documents, documents with versions consuming storage space, ...
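As a rough illustration of the PowerShell route (the site URL and library name here are placeholders; assumes the PnP.PowerShell module is installed):

```powershell
#Sketch: list every file in one library with its size in MB
Connect-PnPOnline -Url "https://yourtenant.sharepoint.com/sites/YourSite" -Interactive
Get-PnPListItem -List "Documents" -PageSize 500 |
    Where-Object { $_.FileSystemObjectType -eq "File" } |
    Select-Object @{n='Name';   e={ $_.FieldValues.FileLeafRef }},
                  @{n='SizeMB'; e={ [math]::Round($_.FieldValues.SMTotalFileStreamSize/1MB, 2) }}
```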
Paul

Thank you all for the suggestions.  I shall try them out.  My experience with PowerShell is copying and pasting existing scripts, so I'm not sure I could figure out how to do that.

@Dorje McKinnon I just get the "something went wrong" error after saving the new view. I think we have too many files; any time I try to use that File Size column it craps out. Untitled.png

This will happen if the view returns more than 5000 items. Try adding a filter to the view to reduce the number of results - https://support.microsoft.com/en-us/office/use-filtering-to-modify-a-sharepoint-view-3d8efc52-0808-4...

Hi @Mercedes Green ,  

@Steve Knutson is right; the thing to do will be to add a filter or item limit, e.g. 4900. Then, because of the ordering, you may get the largest files at the top.

I say "may" because I'm not sure whether SharePoint views go through all the files first, find the largest ones according to the order-by rules you've set, and then show you the first 4900 of them,
OR

whether the view grabs the first 4900 files according to its own rules, and then orders those by their size.

 

The other route you could take is as @Paul_HK_de_Jong said, use a PowerShell script.
I understand your comment about needing one that is pre built.

on the following page
https://www.sharepointdiary.com/2018/11/sharepoint-online-find-all-large-files-using-powershell.html

There are two scripts.

The lower one, "PnP PowerShell: Find Large Files in SharePoint Online Site", is the simplest I could find.

It will get you a list of files and their sizes for every library in the site you point it at. So more than you want but you'll be able to filter the spreadsheet export easily enough.

 

If you use the following script it will work on just one library

https://www.sharepointdiary.com/2018/08/sharepoint-online-powershell-to-get-all-files-in-document-li...

but you need to add one row to it to get the file size

Again, the last script on the page, called "Get All Documents Inventory from a SharePoint Online Document Library and Export to a CSV", will give you almost what you want.

You just need to insert a line between rows 35 and 36, so that the following:

            ModifiedByEmail    = $Item["Editor"].Email
        })

 

Changes into 

 

            ModifiedByEmail    = $Item["Editor"].Email
            Size = [math]::Round(($File.FieldValues.SMTotalFileStreamSize/1MB),2)
        })

 

You will need to change the parameters in rows 2, 3 and 4 so that they refer to your site, library, and export file location. NOTE: the library row should be the name of the library, not the URL, I think. What I mean is "Documents" (the name of the library) rather than the URL segment for that library, "Shared Documents".
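For example, those parameter rows might end up looking like this (every value here is a placeholder for your own):

```powershell
#Parameters - replace with your own values
$SiteURL     = "https://yourtenant.sharepoint.com/sites/Estimating"
$ListName    = "Documents"     #library display name, not the "Shared Documents" URL segment
$CSVFilePath = "C:\Temp\DocInventory.csv"
```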

 

 

I added an item limit of only 4900 and still receive the "something went wrong" error. What I'm puzzled about is that SharePoint supports hundreds of thousands of files in a library, and also supports (with good performance) a 300,000-file limit for sync to OneDrive. Why can't we filter, search, or create views on libraries of this size without it failing, if it's supposed to store far more than 5000?

I'm trying the PowerShell route; however, since we use MFA, I first had to use Connect-SPOService -Url https://mytenant-admin.sharepoint.com, which prompted me to log in and authenticate, but then this script asked me to log in again and my admin account won't work:

#Config Variables
$SiteURL = "https://mytenant.sharepoint.com/Estimating"
$CSVFilePath = "C:\Temp\LargeFiles.csv"

#Connect to PnP Online
Connect-PnPOnline -Url $SiteURL -Credentials (Get-Credential)

#Get all document libraries
$FileData = @()
$DocumentLibraries = Get-PnPList | Where-Object {$_.BaseType -eq "DocumentLibrary" -and $_.Hidden -eq $False}

#Iterate through document libraries
ForEach ($List in $DocumentLibraries)
{
    Write-Host "Processing Library:"$List.Title -f Yellow

    #Get all files of the library with size > 100MB
    $Files = Get-PnPListItem -List $List -PageSize 500 | Where {($_.FieldValues.FileLeafRef -like "*.*") -and ($_.FieldValues.SMTotalFileStreamSize/1MB -gt 100)}

    #Collect data from each file
    ForEach ($File in $Files)
    {
        $FileData += [PSCustomObject][ordered]@{
            Library  = $List.Title
            FileName = $File.FieldValues.FileLeafRef
            URL      = $File.FieldValues.FileRef
            Size     = [math]::Round(($File.FieldValues.SMTotalFileStreamSize/1MB),2)
        }
    }
}
#Export files data to CSV, sorted largest first
$FileData | Sort-Object Size -Descending | Export-Csv -Path $CSVFilePath -NoTypeInformation

Hi @Mercedes Green ,

 

sorry to hear you've run into an issue.

My work place has MFA setup as well.

So what I do is change the following line to use -useweblogin

 

Connect-PnPOnline -Url $SiteURL -useweblogin

 

This should pop up a web browser window allowing you to log on with your admin account AND then enter your MFA code or approve the prompt in the MFA app on your phone.

Everything else should then run.

 

The -useweblogin option only works for scripts you run interactively. You have to use other methods if you want to set the script up to run on its own at a scheduled time, but from what you said I think you'll be running this script in person.

 

Dorje 

 

 

I can now run the script; however, it's only showing me large folders over 100 MB and not the individual files.  It's also only going 2 folders deep (we have some folders that go 5-6 subfolders deep).

Hi @Mercedes Green 

 

Great that it is working

The following line is the one that will need to change.

 

 

#Get All Files of the library with size > 100MB
$Files = Get-PnPListItem -List $List -PageSize 500 | Where {($_.FieldValues.FileLeafRef -like "*.*") -and ($_.FieldValues.SMTotalFileStreamSize/1MB -gt 100)}

 

The # means comment out this line, i.e. it tells PowerShell to ignore that row.

 

So I suggest you change the script so it looks like this, removing the part after the -and statement:

 

 

#Get All Files of the library with size > 100MB
#$Files = Get-PnPListItem -List $List -PageSize 500 | Where {($_.FieldValues.FileLeafRef -like "*.*") -and ($_.FieldValues.SMTotalFileStreamSize/1MB -gt 100)}

#Get all files, of any size
$Files = Get-PnPListItem -List $List -PageSize 500 | Where {$_.FieldValues.FileLeafRef -like "*.*"}

 

I can't see anything in the script that would limit it to just 2 folders deep.

It may be that there are NO files larger than 100MB in folders more than 2 levels deep.

 

Microsoft have some training about PowerShell

https://docs.microsoft.com/en-us/powershell/sharepoint/sharepoint-online/connect-sharepoint-online?v...

as do a number of prominent SharePoint online authorities

https://www.google.co.nz/search?q=how+to+learn+powershell+for+sharepoint 

 

Everyone who does our kind of work can get value out of knowing something about PowerShell, so I highly recommend learning a little about it. I prefer dealing with people and the social anthropology side of my job, but there are times when PowerShell makes things possible that wouldn't be otherwise.

 

Hi @Mercedes Green 

The 5000 item limit is really a legacy issue, because SharePoint used to be a web front end to a SQL database on on-premises servers.

This legacy 5000-item issue still exists, I believe, because code from the good old days was used for SharePoint Online.

You're right, libraries can hold 20 million files (last time I checked the limits).

What the 5000 item limit means in practice is that when you have VERY big libraries you have to use PowerShell and PAGINATE, or PAGE, the results; hence the code mentioned in this conversation has the

Get-PnPListItem -List $List -PageSize 500

What that tells PowerShell is: get the first 500 items, then follow the generated link to the next 500 items, and so on until there are no more items to get. If there are 10,000 items in the library, the script will page (10,000 / 500 = 20) 20 times and get all the items.

If the script didn't have -PageSize 500, it would use the default page size, which I think is 100.

 

I know this seems weird but Microsoft does this to reduce the cost (processor time and electricity) of us running scripts.

As users we just have to work with this foible to get what we need.

 

Minor addition: the 5000 item limit is not a legacy thing. It is caused by the lock escalation mechanism in SQL Server. It is one of the top requests on UserVoice (https://sharepoint.uservoice.com/forums/329214-sites-and-collaboration/suggestions/8463457-remove-th...). Basically it means we have to deal with it, for example by creating indexes, requesting data in sets, ...