04-30-2019 10:31 AM
We currently work with a 5TB network drive with 2,400,000 files, adding 15,000 files per month. The drive is organized with one folder per year, then one subfolder per month, etc.
I was looking into migrating the network drive to one SharePoint library with Files On-Demand, but I found this page that says "for optimum performance we recommend syncing no more than 300,000 files across all document libraries".
So... if I can't use one library and I can't split it in multiple libraries, how do I organize it?
Is SharePoint not the right tool?
04-30-2019 10:43 AM
@stenci The key word in the statement is SYNCING - don't use the OneDrive sync client to pull down libraries where the combined total of items is over 300,000, as that is a known performance issue. Document libraries in SharePoint Online are built to handle large amounts of content - but there are best practices and setup considerations to address before migrating a large number of files to a single library.
04-30-2019 11:31 AM
@Timothy Balk Smart caching / on-demand file sync is a requirement, so is SharePoint not the right tool for us?
Is it possible to configure the library with on-demand sync for the last year and hand-picked sync for the older folders?
For example, we could keep the last year, about 180,000 files, available with on-demand caching, while the older content, without on-demand caching, would only be available if the user manually syncs the folder.
04-30-2019 12:42 PM
04-30-2019 12:51 PM
@stenci It all depends on your use case. Please provide more details on why this large a sync is needed.
Honestly, I wouldn't consider syncing 2 million plus files a valid use case, because other technologies O365 provides should be considered first. When engineered properly, they will provide a more robust solution that scales better than brute-forcing SharePoint as a dumping ground.
04-30-2019 12:52 PM - edited 04-30-2019 12:53 PM
What you'll need is to structure your libraries/folders so that you don't have to sync or access folders/files that would go over the limit. Take into consideration how many folders/files you produce in a year, month, etc., separating them into libraries if necessary.
Also, another storage option may be better suited for files/projects you think are no longer going to be accessed.
04-30-2019 01:38 PM
Thanks for your comment. Here are a few more details:
Every month we have about 5 new projects, each with 1,000 to 50,000 files. Most of them are CAD drawings, plus a small percentage of PDF, JPG and Excel files (the Excel files contain VBA macros, so they cannot be used with Office 365). I would say 5,000 to 30,000 files per month are added.
Each project lasts 3 to 12 months. We have about 20 projects alive at any given time, which require syncing 200,000-300,000 files. The syncing is required because we have many CAD, CAM, and PLM tools that look for the files on the file system.
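A quick back-of-the-envelope check of those numbers against the 300,000-file sync guidance (a sketch using only the figures quoted above; the project counts and per-project file ranges are the poster's estimates, not measurements):

```python
# Rough check of the live-project sync load against the recommended
# 300,000-file sync limit, using the figures quoted in this thread.

SYNC_LIMIT = 300_000                  # Microsoft's recommended max synced files
LIVE_PROJECTS = 20                    # projects alive at any given time
FILES_PER_PROJECT = (1_000, 50_000)   # estimated range of files per project

low = LIVE_PROJECTS * FILES_PER_PROJECT[0]
high = LIVE_PROJECTS * FILES_PER_PROJECT[1]

print(f"live sync load: {low:,} to {high:,} files")
print(f"worst case exceeds limit: {high > SYNC_LIMIT}")
```

So even with only live projects synced, the worst case can blow past the guidance, which is why archiving completed projects matters.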
After a project is completed we could archive it. After that we will access it only if the client looks for a spare part or an addition. No automatic syncing is required here, something like a manual un-archiving would work just fine.
A solution with two areas would work:
- one for the live projects with the on demand file syncing; this would still play well with our tools
- one for archived projects that is not automatically synced; we would need a way to bring the project back to life if requested
This would be an acceptable compromise to work around the 300,000 file limitation.
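The two-area split above can be sketched as a simple policy: a project stays in the synced "live" area while it has recent activity, and moves to the unsynced archive otherwise. This is an illustrative sketch only (the project names, dates, and one-year window are hypothetical, and nothing here touches SharePoint itself):

```python
from datetime import date, timedelta

# Hypothetical policy: a project is "live" (kept in the Files On-Demand
# synced area) if it had activity within the last year; otherwise it
# belongs in the manually-synced archive area.
LIVE_WINDOW = timedelta(days=365)

def split_projects(projects, today):
    """projects: list of (name, last_activity_date) tuples."""
    live, archived = [], []
    for name, last_activity in projects:
        if today - last_activity <= LIVE_WINDOW:
            live.append(name)
        else:
            archived.append(name)
    return live, archived

projects = [
    ("2019-03 Gearbox", date(2019, 4, 20)),   # recent activity -> live
    ("2017-11 Conveyor", date(2018, 1, 5)),   # stale -> archive
]
live, archived = split_projects(projects, today=date(2019, 4, 30))
print(live)      # ['2019-03 Gearbox']
print(archived)  # ['2017-11 Conveyor']
```

Un-archiving a project would then just mean moving it back into the live area (or manually syncing its folder).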
Redesigning an infrastructure with all the software tools that rely on files being available on a network drive is unthinkable.
04-30-2019 01:40 PM
04-30-2019 02:10 PM
04-30-2019 03:41 PM
I just found this, which talks about new features being added to the synchronization system that will allow picking which folders are visible to the syncing engine.
This could be the solution to my problem... when it becomes available?
05-01-2019 01:23 AM
05-01-2019 07:00 AM
Due to the large amount of data, we may run into the problems below:
1) Maintenance of a very large database
2) Long backup times
3) Search/indexing completing with long delays
4) Search results may be inaccurate for content that is not yet indexed
It's always better to categorize your content and maintain a suitable folder hierarchy for your repositories. By dividing the documents into multiple categories, you can store them across multiple sites/libraries. In any technology, instead of maintaining one large database, it's better to divide it into multiple chunks. SharePoint search has proven to give better results for regular users this way.
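One way to plan such a split is to pack the existing monthly folders into libraries so each library stays under the recommended sync limit. A minimal sketch, assuming you already know the file count per monthly folder (the folder names and counts below are made up; this is planning arithmetic, not a SharePoint API call):

```python
# Greedily pack monthly folders into libraries so that each library
# stays under the recommended 300,000-file sync limit.
SYNC_LIMIT = 300_000

def pack_into_libraries(folder_counts, limit=SYNC_LIMIT):
    """folder_counts: list of (folder_name, file_count) pairs, in order."""
    libraries, current, current_total = [], [], 0
    for name, count in folder_counts:
        if count > limit:
            raise ValueError(f"{name} alone exceeds the limit")
        if current_total + count > limit:
            libraries.append(current)          # close the full library
            current, current_total = [], 0
        current.append(name)
        current_total += count
    if current:
        libraries.append(current)
    return libraries

months = [("2019-01", 120_000), ("2019-02", 150_000), ("2019-03", 80_000)]
print(pack_into_libraries(months))
# → [['2019-01', '2019-02'], ['2019-03']]
```

Keeping folders in chronological order also means each "archive" library covers a contiguous date range, which fits the existing year/month folder structure.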
05-01-2019 07:00 AM - Solution
@stenci To get around the 300,000-file limit you have to throw hardware at it: balance the load across multiple machines, each running its own sync client.
SharePoint may not be the best place for this. As I said, it sounds like this process is just using SharePoint as a dumping ground.
I would say that if you're required to use SharePoint, this incurs a lot of technical debt, because of the need to use the OneDrive client to sync contents as if they were on a local drive. Re-engineer the process to work with wherever you are storing the files; there are plenty of ways to automate the file upload process. I would also be mindful of how many files are "versioned" by appending something to the file name, because those could be de-duplicated by using versioning in SharePoint.
Whatever the outcome may be, document what is going on, because IMO this isn't a process or setup that I would want to inherit and have to figure out.
05-01-2019 08:02 AM
05-01-2019 11:48 AM
I agree with a lot of responses here. SharePoint isn't a direct replacement for a file share. If you were going to use SharePoint as your document management system for sharing and collaborating directly on documents, I could see it being a potential option if it had a solid information architecture.
However, it does seem like you're just trying to get rid of a file share and use SharePoint in its place. I would recommend against this, and it's not the most cost-effective storage either.
05-01-2019 12:08 PM
> it would make sense to split the content into multiple libraries, rather than a single library
I am trying to consider different options:
- More smaller libraries or fewer larger libraries?
- Is there a maximum number of libraries?
05-01-2019 02:35 PM