OneDrive Client, Files on Demand and Syncing large libraries


I thought I'd post some observations about the OneDrive sync client that aren't documented anywhere but that we needed to figure out when planning a massive move from on-premises file servers to SharePoint:

 

Limits:

 

Microsoft documents that you shouldn't sync more than 300,000 files across all libraries that the client is connected to, but there was no documentation about Files on Demand limits, and we have observed the following:

 

The OneDrive client will fail when the dat file that stores object metadata (in %localappdata%\Microsoft\OneDrive\settings\Business1) reaches exactly 2GB in size. Now, while Microsoft says you shouldn't sync more than 300,000 files, you can use Files on Demand to connect to libraries that contain more than this. The catch is that in this case the total number of files and folders matters; let's call them collectively "objects". (Interestingly, when you first connect to a library and the client says "Process changes" and gives you a count, "changes" is the total number of objects in the library that it's bringing down using Files on Demand and storing in the dat file.)
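Since the count that matters is files plus folders, a rough object count for an existing file share can be taken before migrating. The following is a minimal Python sketch, not something from the original post: the function name is made up, and the demo builds a throwaway directory tree only so the example runs anywhere.

```python
import os
import tempfile

def count_objects(root):
    """Return (files, folders) found under root, not counting root itself."""
    files = 0
    folders = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        files += len(filenames)
    return files, folders

# Demo on a throwaway tree: folders a and a/b, plus one file at each level.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for name in ("x.txt", os.path.join("a", "y.txt"), os.path.join("a", "b", "z.txt")):
        open(os.path.join(tmp, name), "w").close()
    demo_files, demo_folders = count_objects(tmp)
    print(demo_files, demo_folders, demo_files + demo_folders)  # 3 2 5
```

Pointing count_objects at the root of a share gives the figure to compare against the limits discussed here. On a very large share the walk itself can take a long time.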

 

My suspicion is that since the OneDrive client is still 32-bit, it's still subject to certain 32-bit process restrictions, but I don't really know. What matters in this case is that up until build 19.033.0218.0009 (19.033.0218.0006 insiders build), the client would fill up the dat file and reach the 2GB limit after about 700,000-800,000 objects. After build 19.033.0218.0009, it appears that the client has been optimized and no longer needs to store quite as much metadata about each object, "increasing" the upper limit of Files on Demand. (It seems that in general, each object now takes up just over 1KB of data in the dat file, putting the limit somewhere just under 2 million objects.) Keep in mind, this is not per library; this is across all libraries, including OneDrive for Business (personal storage), SharePoint Document Libraries, etc.
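The arithmetic behind those estimates can be sketched as follows. Two things here are assumptions rather than facts from the post: 1,100 bytes stands in for "just over 1KB", and "2GB" is taken to mean 2 x 1024^3 bytes.

```python
DAT_LIMIT_BYTES = 2 * 1024**3  # assumed meaning of "2GB"

def max_objects(bytes_per_object, limit=DAT_LIMIT_BYTES):
    """Objects that fit before the dat file reaches the limit."""
    return limit // bytes_per_object

def implied_bytes_per_object(objects_at_limit, limit=DAT_LIMIT_BYTES):
    """Average metadata size implied by hitting the limit at a given count."""
    return limit / objects_at_limit

# Newer builds: "just over 1KB" per object (1,100 bytes is a guess).
print(max_objects(1100))  # 1952257, i.e. just under 2 million

# Older builds hit the limit at roughly 700,000-800,000 objects.
print(round(implied_bytes_per_object(800_000)))  # 2684
print(round(implied_bytes_per_object(700_000)))  # 3068
```

On these assumptions the older builds were storing somewhere around 2.7 to 3 KB per object, which is consistent with the newer limit being a bit more than double the old one.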

 

Performance:

 

The client's performance has improved significantly and quickly as each new build is refined, but there are some things to be aware of before you start connecting clients to large libraries:

 

It. takes. forever. 

 

The more objects in a library, the longer it's going to take for the client to build its local cache of Files on Demand copies of all the items in the library. It seems that in general, the client can process about 50 objects per second. If you were connecting to a library or multiple libraries that had 1.4 million objects, it would take around 8 hours before the client is "caught up".
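As a quick planning aid, the estimate above reduces to one division. This Python sketch assumes the observed rate of about 50 objects per second holds steady, which will vary with hardware and build; the function name is made up.

```python
def hours_to_catch_up(total_objects, objects_per_second=50):
    """Rough first-sync duration at a constant processing rate."""
    return total_objects / objects_per_second / 3600

for count in (300_000, 1_400_000, 2_000_000):
    print(f"{count:>9,} objects: about {hours_to_catch_up(count):.1f} hours")
```

For 1.4 million objects this works out to about 7.8 hours, in line with the "around 8 hours" figure.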

 

During the time that the content is being built out locally, Windows processes will also consume a large quantity of system resources. Specifically, explorer.exe and the Search Indexer will consume a lot of CPU and disk as they process the data that the client is building out.

 

The more resources you have, the better this experience will be. On a moderately powered brand new Latitude with an i5, 8GB of Memory and an SSD OS Drive, the machine's CPU was pretty heavily taxed (over 80% CPU) for over 8 hours connecting to libraries with around 1.5 million objects. On a much more powerful PC with an i7 and 16GB of memory, the strain was closer to 30% CPU, which wouldn't cripple an end user while they wait for the client and Windows to finish processing data. But, most organizations don't deploy $2000 computers to everyone, so be mindful when planning your Team-Site automount policies.

 

Restarts can be painful. When the OS boots back up, OneDrive has to figure out what changed in the libraries in the cloud and compare that to its local cache. I've seen this process take anywhere from 15 minutes to over an hour after restarts, depending on how many objects are in the cache.

 

Also, if you're connected to a large number of objects in the local cache, you can expect OneDrive to routinely use about a third of CPU on an i5 processor trying to keep itself up to date. This doesn't appear to interfere with the overall performance of the client, but it's an expensive process.

 

Hopefully over time this will continue to improve, especially as more organizations like mine move massive amounts of data up into SharePoint and retire on-premises file servers. If I had to make a design suggestion or two:

 

- If SharePoint could pre-build a generic metadata file that a client could download on first connection, it would significantly reduce the time it takes to set up a client initially.

- Roll the Activity Log into an API that would allow the client to poll for changes since the last restart (this could also significantly improve the performance of migration products, as they wouldn't have to scan every object in a library when performing delta syncs, and would reduce the load on Microsoft's API endpoints when organizations perform mass migrations)

- Windows, to the best of my knowledge, doesn't have a mechanism to track changes on disk, i.e. "what recursively changed in this directory tree in the last x timeframe". If it were possible to do this, Windows and SharePoint could eliminate most of the overhead that the OneDrive client has to shoulder on its own to keep itself up to date.

 

Speaking to OneDrive engineers at Ignite last year, support for larger libraries is high on their radar, and it's apparent in this latest production release that they are keeping their word on prioritizing iterative improvements for large libraries. If you haven't yet started mass data migrations into SharePoint, I can't stress enough the importance of deeply analyzing your data and understanding what people need access to and structuring your libraries and permissions accordingly. We used PowerBI to analyze our file server content and it was an invaluable tool in our planning.

 

Happy to chat with anyone struggling with similar issues and share what we did to resolve them. Happy SharePointing!

 

P.S., shoutout to the OneDrive Product Team, you guys are doing great, love what you've done with the OneDrive client, but for IT Pros struggling with competing product limits and business requirements, documenting behind the scenes technical data and sharing more of the roadmap would be incredibly valuable in helping our companies adopt or plan to adopt OneDrive and SharePoint.

 

 

69 Replies

@Dustin Adam I am wondering if you could share any updates on your experience with using OneDrive for Business sync for large libraries. We have a use case where the company is attempting to replace certain network drives used by many users. For example, a drive with 5,000+ folders and 7,500+ files.

 

Would you be able to offer any thoughts or input on such a plan looking forward into 2020? Any input would be greatly appreciated!

We are also working with Microsoft on this issue, and here is the latest insight I can share.

Microsoft Support engineer answer:
“I understand that the OneDrive sync has performance issues if we try to sync large libraries due to which we recommend syncing no more than 300,000 files across all document libraries. Also, Performance issues can occur if you have 300,000 items or more across all libraries that you are syncing, even if you are not syncing all items within those libraries”

The second sentence makes OneDrive sync useless if we are working with a large amount of data. You have to think more than twice about how you want to organise your libraries, e.g. "active vs. archive content", where archived content cannot be synced.

We personally have an issue syncing 1 folder containing 1 document in a library containing 250k documents. The sync never ends or takes forever, which results in updated documents being re-uploaded/downloaded only after 1 hour or more.

@jab365cloud @RJ Miller;

 

A couple things to look for and consider:

 

One of the things we've discovered that isn't really documented anywhere is that the more content you shove into a single Document Library, the worse that library performs. Adding additional indexes to the library manually can help, but in general, the fewer items you put into a library the better. This becomes apparent even when browsing the library via the UI: a library with fewer total objects browses faster than one with hundreds of thousands. The sync client will ultimately be affected by that increased overhead as well: when it makes API calls to detect or replicate changes, it's going to take longer to complete. We learned this the hard way ourselves and are actively working on breaking up our libraries. If you haven't migrated data yet, find a way to break up your content into as many libraries as possible to reduce the volume in each one. Also bear in mind that, as is the nature of all file storage, it never gets smaller; nobody ever deletes anything. If you start with an overly large library, your experience will never get better from that point.

 

I know that the eventual goal is to get the OD Client to gracefully handle syncing up to a million objects, but that hasn't been publicly communicated and there is no timeline for when that might be realized.

 

 

Thanks for sharing!!!

@Dustin Adam is there a way to enforce online/cloud only when using OneDrive vs Files On Demand? I know this is a completely different architecture, but when dealing with all these issues and user complaints, and comparing it to a Google Drive implementation for enterprise, Google seems to have gone with a 'make it look/work like a mapped network drive' approach. They don't need to constantly sync and check what's changed as far as I can tell. Staff who do want to use OneDrive instead of the browser really just want the Explorer view if they are in that transactional type of role. If there are staff that want an offline option, they can just right-click and keep files offline ad hoc (basically as it is now).

@_Chris_G 

 

Hey Chris;

 

I'm not sure if this is exactly what you are looking for, but through MDM or the ADMX/ADML Group Policy templates you can force the OneDrive client to use Files On Demand by default:

 

https://docs.microsoft.com/en-us/onedrive/use-group-policy#FilesOnDemandEnabled

 

If I misunderstood your question let me know.

@Dustin Adam thanks for the reply. I actually meant the opposite: to prevent any local download/offline files using the OneDrive client and keep it as 'cloud only' access. This would be an attempt to prevent sync performance issues on the client as well as the general conflicts/issues that can occur. I understand the trade-off would be needing reliable internet access. I basically want to replicate the mapped network drive and file server architecture of the past, but with the OneDrive client and SharePoint Online in its place. I feel this would prevent all the issues in this thread (at least until the sync client is reliable and fast when picking up changes). I suspect that is not an option and 'Files on Demand' is our only choice? I want 'Files Cloud Only' in OneDrive.

@Dustin Adam 

 

So we took the plunge of moving a file server to OneDrive for Business Plan 2, on request from a client. The migration spanned approx. 10 parent folders (Shared Drives) and roughly 600 000 - 800 000 files in total - 2.4TB. There were one or two folders in excess of 100 000 files which we split out as we learned about the 100 000 limit. All users have the Files on Demand feature enabled and we shared folders from the OneDrive for Business Plan 2 (OBP2) account to respective users (approx. 20 users).

 

Unfortunately it has been a disaster. The most common issue is that end users cannot sync even 1 shared folder to their PCs with Files on Demand enabled. It often just hangs in the "processing changes" state without any files appearing for days on end.

 

We raised it with Microsoft support (Premier support) - but here's the strange thing. While their communication has been absolutely dismal - what I've gathered between the radio silence and infrequent responses is that they have run some "diagnostics tool" on some of the affected accounts. Within less than an hour suddenly those affected accounts start syncing the shared folders immediately, things start appearing in the app at light speed. It works brilliantly. But then after say 48 hours the user's OneDrive account/sync is "broken" again and just hangs forever.

 

I've often struggled to gather any precise responses from MS Support team on the issue and what they did when, but the client is now cancelling with MS and wants us to find another solution. Perhaps the scope was too large for OneDrive for Business, or we did it wrong or missed the fine print, but we've also learned a hard lesson that support for the product is also poor and not business ready.  I have subsequently cancelled all OneDrive migrations lined up in future for fear of this happening to others.

Ah, yeah, that's not how the client works. Whether you're using Files on Demand or Full Sync, the OneDrive client is integrated with the NTFS file system and actually uses a number of unique NTFS property flags to set the state of any given file. So in reality, something is always going to be written to disk in some fashion when you sync a library. What you're referring to is only possible in SharePoint and OneDrive if you use WebDAV, and in fact we do at my org to get people into libraries that the sync client can't handle yet. It's dirty and comes with its own set of unique challenges, but it does act largely the way you describe.

@JonnaP 

 

Are there a lot of broken inheritance permissions in the content that your client is syncing? Or does everyone have access to everything? There is a performance switch that can be enabled in the sync client if it's handling read-only folders or folders with broken inheritance.

Aside from one or two senior execs having access to all of the shared folders, most users only have been granted access to certain folders based on their role in the company. Both senior and standard users have the shared folder sync issue.
I see SharePoint Online (which OneDrive is based on, IIRC) recommends 300k objects max for best performance, so it's probably a case that we've exceeded that limit and things are going awry.

@JonnaP  Hi. Interested in your case. What is your largest document library in terms of number of files, and what is your largest site in terms of number of files? (The answers may be the same if you only have one doc library in each site.)

I would suggest, at least for the individuals who have access to a sub-set of the content, enabling the "Permit Disable Permission Inheritance" Group Policy flag for the OneDrive client:

https://docs.microsoft.com/en-us/onedrive/use-group-policy#PermitDisablePermissionInheritance

You should also consider enabling the Storage Sense group policy option in Windows 10 (on computers running at least 1903) that can automatically make synced files Online Only if they haven't been accessed in a specified time period, as this helps keep the pressure off the sync client as well.

https://support.office.com/en-us/article/use-onedrive-and-storage-sense-in-windows-10-to-manage-disk...

You can enable that manually, but you can also set Storage Sense options through GPO or InTune, depending on how your client's PCs are managed.

Lastly, I find it concerning that all the content is stored in a OneDrive account as opposed to a SharePoint Document Library. While, yes, an ODfB account is in essence a SP Library, and content can be shared from it, it wasn't designed to be used in the same manner. OneDrive libraries are not given the same processing overhead as a SP Library, which is intended to serve a large number of clients. All OneDrive requests place a burden on the backend SQL infrastructure to serve responses back to clients, so storing the organization's content in OneDrive instead of SharePoint is bound to introduce performance issues, as OneDrive is provisioned primarily to serve a single client.

Hi @Dustin Adam ,

 

Kudos on some valuable insight here. I was wondering if there were any new updates regarding an increase to the OneDrive sync limits... I am dying and hoping that something will get released soon.

We spent most of last year planning our SharePoint collaboration environment, and we have restructured our large department libraries into active (sync'able) and archive (non-sync'able) libraries. We slimmed down the "active" libraries to about 60K sync'able files for our large departments (from millions of files) and only recommend syncing 1 library per department.
However, we still notice a lot of sync delays, conflicts, and performance overhead.

Please let us know if there is a light at the end of the tunnel.

I hear you, it's a struggle. Our own roll-out of OneDrive has been exceptionally limited because of the upper limits.

I know that it's on their radar, and there are a lot of customers that are struggling. Through the grapevine I hear that the internal target they are going to try and work towards is getting the sync client to handle a million objects gracefully, but they aren't going to attempt to get it to handle more than that.

Obviously, they won't provide a timeline for this officially; work on performance improvements isn't at all the same as building a new feature, where a deadline is easier to hold yourself to. Could be this year? I don't have a sense of how much of a priority this is for them, and I would suspect that achieving it is going to require engineering changes beyond simply the sync client itself.

All of that is to say, yeah, it'll get significantly better and likely open up most use cases for migrating file server content to SharePoint... but don't hold your breath, it'll happen when it happens.

@Dustin Adam let's hope it comes even in incremental steps and not with a major release going from 100k limit to 1M files limit.

 

What I've heard from a friend is that there is a partner beta that tries to offload the sync client load by identifying stale files and ignoring them.

 

We are also struggling with sync issues, and we ended up splitting large libraries into many smaller ones 7 months after the initial migration...

Let's hope so; the product team isn't great at discussing performance roadmap objectives.

There's a lot of good knowledge in this thread - keep them coming.

So basically, if you have a huge library with 1 million files you will probably run into problems. If you decided to split them into 20 of them with 50k each you will be better off. But if a user decides they need to sync (even though just on-demand) all 20 of them, the user will still run into problems because the total volume of files sync'ed on this particular user's computer is 1 million?
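The original post observes that the limits apply across all connected libraries rather than per library, so the question comes down to the sum. A small Python sketch of that check follows, using the 300,000 figure Microsoft Support quoted earlier in the thread. The library names and function names are hypothetical.

```python
SUPPORT_GUIDANCE = 300_000  # figure quoted by Microsoft Support in this thread

def total_synced_items(libraries):
    """Sum item counts across every library a user has connected."""
    return sum(libraries.values())

def exceeds_guidance(libraries, limit=SUPPORT_GUIDANCE):
    """True when the combined count reaches the quoted figure."""
    return total_synced_items(libraries) >= limit

# 20 hypothetical libraries of 50,000 items each, as in the question above.
all_twenty = {f"Library {n}": 50_000 for n in range(1, 21)}
print(total_synced_items(all_twenty), exceeds_guidance(all_twenty))  # 1000000 True

# A user who connects only five of them stays under the figure.
five = dict(list(all_twenty.items())[:5])
print(total_synced_items(five), exceeds_guidance(five))  # 250000 False
```

In other words, splitting helps only as far as each user connects just the libraries they need.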

@eddablin Hi there. If I understand the terminology correctly... so all folders/files only exist on a single OneDrive for Business Plan 2 account. That means there's only 1 library. This plan 2 account has 9 parent folders. Unfortunately OneDrive doesn't seem to detail or report the number of files (or I just can't find where in the UI), but it should be around 600 000 - 800 000 files in total.

Each user who has O365 Business Premium license (About 20 users total) has been given shared links to certain folders in the above library, but in most cases the shared link is actually a subfolder and not the entire parent folder, see below...

So in at least 2 parent folders in the library there exist subfolders that initially contained in excess of 100 000 files. Because this caused issues with sharing links to users, and because we found out about the 100 000 limitation, we split out alphabetized variants of the subfolders and shared those instead.

Parent Folder A >
Subfolder A-F (Shared)
Subfolder G-K (Shared)
etc...
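An alphabetized split like this can be sketched in Python as below. Only the "A-F" and "G-K" ranges come from the layout above; "L-R" and "S-Z" are made-up continuations of the "etc...", and the function names and sample folder names are hypothetical.

```python
from collections import defaultdict

# "A-F" and "G-K" come from the post; "L-R" and "S-Z" are assumed.
RANGES = [("A", "F"), ("G", "K"), ("L", "R"), ("S", "Z")]

def bucket_for(name, ranges=RANGES):
    """Pick the alphabetical subfolder for a name by its first letter."""
    first = name[:1].upper()
    for low, high in ranges:
        if low <= first <= high:
            return f"{low}-{high}"
    return "Other"  # digits, symbols, empty or non-Latin names

def group_by_bucket(names):
    """Group names into their alphabetical subfolders."""
    groups = defaultdict(list)
    for name in names:
        groups[bucket_for(name)].append(name)
    return dict(groups)

example = group_by_bucket(["Acme", "garcia", "Hill", "Zed", "42 Ltd"])
print(example)
```

Counting the items that land in each bucket before moving anything shows whether every subfolder actually ends up under the 100 000 mark, since names are rarely spread evenly across the alphabet.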

I'm now wondering whether, despite us overcoming the shared permission issue with the alphabetized split of the subfolders, the fact that they exist in a parent folder containing in excess of 100 000 objects could still cause the issue we are observing. (Files on Demand sync at the client end just hanging and changes taking forever to process.)

 

I'm now considering moving the subfolders out of the parent folder and re-sharing to users so they can sync to their PCs and see what happens...

 

UPDATE: I signed into a user's OneDrive account, accessed his "Shared With Me" section, and clicked sync on a folder shared from the OneDrive Plan 2 library/account that contains 20 000 items only. Unfortunately it's not syncing to my PC. It hangs on processing changes and nothing comes through; it occasionally says processing 501 or 502 changes, then goes back to nothing again. There goes that theory :( So moving things out of a parent folder to remove the 100 000 concern probably won't help.

 

My next plan is to move one of the folders to another OneDrive account and try sharing/syncing that to the user's account.

 
