Forum Discussion
How to find files over certain limit in SharePoint Library with hundreds of folders
Dorje-McKinnon I just get the "something went wrong" after saving the new view. I think we have too many files. Anytime I try and use that File Size column it craps out.
- luvsqlJul 27, 2021Iron Contributor
I added the item limit to only 4900 and still receive the something went wrong. What I'm puzzled about is, SharePoint support hundreds of thousands of files in a library and also supports with good performance 300,000 files limit for sync to OneDrive. Why can't we filter or search or create views on libraries of this size without it failing if it supposed to store a lot more than 5000?
- Dorje-McKinnonJul 29, 2021Iron Contributor
Hi luvsql
The 5000 limit is really a legacy issue because SharePoint used to be a web front end to a SQL database using on premise servers.
This legacy 5000 item issue still exists, I believe because code from the good old days, was used for SharePoint online.
You're right libraries can hold 20million files (last time I checked the limits).
What the 5000 item limit means in practice is that when you have VERY big libraries you have to use PowerShell and PAGINATE or PAGE the results, hence the code mentioned in this conversation has the
Get-PnPListItem -List $List -PageSize 500
What that tells PowerShell is get the first 500 items, then follow the generated link to the next 500 items until there are no more items to get. If there are 10,000 items in the library the script will run (10,000 / 500 = 20) 20 times and get all the items.
If the script didn't have -PageSize 500 then it would get the most it can by default, which I think is 100.
I know this seems weird but Microsoft does this to reduce the cost (processor time and electricity) of us running scripts.
As users we just have to work with this foible to get what we need.
- Paul_HK_de_JongJul 30, 2021Iron ContributorMinor addition. The 5000 item limit is not a legacy thing. It is caused by the lock escalation mechanism in SQL server. It is one of the top requests on user voice (https://sharepoint.uservoice.com/forums/329214-sites-and-collaboration/suggestions/8463457-remove-the-list-view-threshold-5000-by-default). It basically it means we have to deal with it. For example, by creating indexes, requesting data in sets, ...
- luvsqlJul 28, 2021Iron Contributor
I'm trying the powershell, however since we use MFA, I had to first use Connect-SPOService -Url https://mytenant-admin.sharepoint.com which then prompted me to login and authenticate, but then using this script asked me to login again and my admin account won't work:
#Config Variables
$SiteURL = "https://mytenant.sharepoint.com/Estimating"
$CSVFilePath = "C:\Temp\LargeFiles.csv"
#Connect to PnP Online
Connect-PnPOnline -Url $SiteURL -Credentials (Get-Credential)
#Get all document libraries
$FileData = @()
$DocumentLibraries = Get-PnPList | Where-Object {$_.BaseType -eq "DocumentLibrary" -and $_.Hidden -eq $False}
#Iterate through document libraries
ForEach ($List in $DocumentLibraries)
{
Write-host "Processing Library:"$List.Title -f Yellow
#Get All Files of the library with size > 100MB
$Files = Get-PnPListItem -List $List -PageSize 500 | Where {($_.FieldValues.FileLeafRef -like "*.*") -and ($_.FieldValues.SMTotalFileStreamSize/1MB -gt 100)}
#Collect data from each files
ForEach ($File in $Files)
{
$FileData += [PSCustomObject][ordered]@{
Library = $List.Title
FileName = $File.FieldValues.FileLeafRef
URL = $File.FieldValues.FileRef
Size = [math]::Round(($File.FieldValues.SMTotalFileStreamSize/1MB),2)
}
}
}
#Export Files data to CSV File
$FileData | Sort-object Size -Descending
$FileData | Export-Csv -Path $CSVFilePath -NoTypeInformation- Dorje-McKinnonJul 28, 2021Iron Contributor
HI luvsql ,
sorry to hear you've run into an issue.
My work place has MFA setup as well.
So what I do is change the following line to use -useweblogin
Connect-PnPOnline -Url $SiteURL -useweblogin
This should popup a web browser window allowing you to logon using your admin account AND then put in your MFA code or click the MFA app on your phone.
Everything else should then run.
The -useweblogin option only works for scripts you run. You have to use other methods if you want to set the script up to run on it's own at a scheduled time. But from what you said I think you'll be running this script in person.
Dorje
- Dorje-McKinnonJul 25, 2021Iron Contributor
Hi luvsql ,
SteveKnutson is right the thing to do will be to add a filter e.g. 4900 then because of the ordering you may get the largest files at the top.
I say "may" because I'm not sure if the SharePoint views go through all the files first, find the largest ones according to the order by rules you've set, then show you the first 4900 of them.
ORif the view grabs the first 4900 files according to it's own rules, then orders those by their size.
The other route you could take is as Paul_HK_de_Jong said, use a PowerShell script.
I understand your comment about needing one that is pre built.
on the following page
https://www.sharepointdiary.com/2018/11/sharepoint-online-find-all-large-files-using-powershell.htmlThere are two scripts.
The lower one "PnP PowerShell: Find Large Files in SharePoint Online Site", is the simplest it can find.
It will get you a list of files and their sizes for every library in the site you point it at. So more than you want but you'll be able to filter the spreadsheet export easily enough.
If you use the following script it will work on just one library
but you need to add one row to it to get the file size
AGain the last script on the page called "Get All Documents Inventory from a SharePoint Online Document Library and Export to a CSV" will give you almost what you want.
You just need to insert the following between row 35 and 36 so that the following :
ModifiedByEmail = $Item["Editor"].Email})Changes into
ModifiedByEmail = $Item["Editor"].EmailSize = [math]::Round(($File.FieldValues.SMTotalFileStreamSize/1MB),2)})You will need to change the parameters in rows 2 , 3, 4 so that they refer to your site library and export file location. NOTE the library row will be the name of the library not the URL I think. What I mean is "Documents" (the name of the library) rather than the URL for that library "Shared Documents".