Major OneDrive Business client continuous sync loop bug

I have used Procmon and other methods and reproduced this repeatedly, so I can now confirm that there is in fact a MAJOR bug (IMO) in the new-gen OneDrive client that handles both Personal and Business accounts (I use both). The current latest version is 2016 (Build 17.3.6943.0625).

In my case, I literally BURNED OUT a newer SSD as it sat in a stuck "processing changes" sync loop, day after day. This happened even though I kept following the canned suggestions from MS to blatantly do full resets and resync HUNDREDS OF GIGS of data, along with some other wildly severe suggestions.

Here is the major bug. There are other ways to reproduce it, but this is the fastest sure-fire test, and I have even scripted it as a simple one-liner command. NOTE: THIS COMMAND DEFAULTS TO YOUR FIRST ONEDRIVE BUSINESS ACCOUNT FOLDER. IF YOU HAVE MORE THAN ONE BUSINESS ACCOUNT DEFINED, YOU MAY HAVE TO ADJUST THE COMMAND OR MANUALLY CHOOSE THE LOCATION:

1.  Ensure the OneDrive client is running and, if the goal is to reproduce the issue, that it is in sync and idle
2.  Open a PowerShell session
3.  Type/paste the following as a single line:  

     Start-Transcript -Path "$(Get-ItemPropertyValue -path HKCU:\Software\Microsoft\OneDrive\Accounts\Business1 -Name 'UserFolder')\LockedFile.txt"

4.  INSTANTLY, watch the OneDrive client switch to "Processing Changes". As a bonus, open Procmon to verify that the client is now CHURNING in a 100% INFINITE LOOP through ALL FILES in the OneDrive structure AND all OneDrive registry keys

5.  At this point, you have TWO choices:

     a) Continue to watch the HUNDREDS OF GIGABYTES PER DAY of NAND writes as the SSD burns out after a few months

     b) Decide you are convinced of this issue, and then issue the following command in the same PowerShell session:

     Stop-Transcript

6. (Assuming you opted for option b above) Wait literally a few seconds and watch the magic as OneDrive regains its mind. At the same time, watch in Procmon as it CEASES all activity, goes back into an "in sync" state, and returns to fully IDLE.

7. Remember to clean up and manually delete the "LockedFile.txt" in your OneDrive folder (I refuse to post delete commands). 🙂

Of note, this was NOT easy to track down with Procmon. As you have seen, ALL your OneDrive files keep looping, and nothing points you to the actual cause. It took HOURS of trial and error, and I only found it because I had just moved my PSHOME folder into OneDrive on one of my machines. Ironically, I did that specifically because I wanted the TRANSCRIPTS of my daily PS sessions synced via OneDrive, which is clearly IMPOSSIBLE until MS addresses this issue. So, if other apps are putting a hard lock on files somewhere in your OneDrive folders, you may have to do some trial and error of your own.
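Instead of pure trial and error, one way to narrow the search is to probe each file under the OneDrive folder and see which ones refuse a write-capable open. Below is a minimal Python sketch. It assumes the lock-holding app (such as an active PowerShell transcript) keeps the file open without write sharing. In that case a second open for read/write fails with a sharing violation, which Python raises as PermissionError on Windows. The helper name find_locked_files is hypothetical. Read-only files are skipped because they would also refuse a write open. Caveat: on clients with Files On-Demand, opening a cloud-only placeholder may trigger a download.

```python
import os
import stat
from pathlib import Path


def find_locked_files(root):
    """Return files under root that refuse a read/write open (hypothetical helper).

    Assumption: on Windows, a file another process holds open without write
    sharing (e.g. an active PowerShell transcript) makes open(..., "r+b") fail
    with a sharing violation, which Python raises as PermissionError.
    Mode "r+b" neither creates nor truncates the file, and nothing is written.
    """
    locked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mode = path.stat().st_mode
            except OSError:
                continue  # vanished or inaccessible during the walk
            if not mode & stat.S_IWRITE:
                continue  # read-only files would also fail the open below
            try:
                with open(path, "r+b"):
                    pass  # opened fine, so no lock is blocking writers
            except PermissionError:
                locked.append(str(path))
            except OSError:
                continue  # other errors (e.g. path too long) are not lock evidence
    return locked
```

For example, `find_locked_files(r"C:\Users\myUsername\OneDrive - myCompany")` run while a transcript is active should list that transcript file, assuming the lock behaves as described above.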

MS needs to fix this. It is utterly absurd for a sync client at THIS maturity level to be thrown into a broken fit by ONE SINGLE hard-locked file that will, of course, throw a SHARING VIOLATION (and it is bad exception-handling code, IMO).

So, word to the wise. Long paths > 255 characters, zero-byte files, and about a dozen other absolutely ABSURD things should be caught as exceptions by basic error-checking code (these also cause sync issues). Instead, we as BUSINESS users have to manually manage, and protect against, these exceptions ourselves. Along with all of those, please also be sure to move ANYTHING whose path could EVER end up containing even a SINGLE fully locked file completely OUT of the OneDrive structure!
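The long-path and zero-byte checks described above are cheap to run yourself before OneDrive trips over them. Below is a minimal Python sketch that assumes the 255-character full-path figure from the post. The client's actual limits have varied by version, so the threshold is a parameter. The helper name audit_tree is hypothetical.

```python
import os


def audit_tree(root, max_path_len=255):
    """Report full paths longer than max_path_len characters, and zero-byte files.

    max_path_len defaults to the 255-character figure from the post; the real
    limit depends on the client version, so treat it as an adjustable assumption.
    """
    root = os.path.abspath(root)
    long_paths, zero_byte = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Folder names count toward path length too, not just file names.
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) > max_path_len:
                long_paths.append(full)
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.getsize(full) == 0:
                    zero_byte.append(full)
            except OSError:
                pass  # removed or unreadable mid-walk
    return long_paths, zero_byte
```

A later reply in this thread reports that zero-byte files did not cause problems on an 18.x client, so the zero-byte list is best treated as informational.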

Hope this helps someone else save hours of troubleshooting and that MS folks can get this issue in front of DEVs to fix soon.

7 Replies

Yes, I've been seeing the exact same thing over the past few weeks. It appears to have started with the most recent version of OneDrive that was installed (17.3.6943.0625), which arrived around the same time my laptop was forced to update to the Creators Update. Randomly (it seems), when I log in to my computer, OneDrive decides it needs to resync some or all of the 12GB of SharePoint folders that I have synced. I have 3 sites synced, and it doesn't always resync all of them, but usually at least 7GB of something. It is driving me crazy, and several other people in our office are experiencing it as well.

How did you identify the files that were the culprit? Someone else recommended the Support and Recovery Assistant and thought it was related to OneNote, but when I tried that tool it found no issues for me.

Is there any way to go back to an old version? Is there any hope on the horizon for a fix? As you said, this is not good for my SSD, and it's also not good for my home/office bandwidth usage.

Couple of points (mostly in reply to OP):

First point: this has actually been a "problem" for a considerable amount of time. I certainly recall encountering it throughout 2016 (personal OneDrive with the iTunes library configuration stored there; iTunes holds a write lock the whole time it's running). Any write lock will cause OneDrive to loop through looking for changed files, since it's reacting to writes. Consequently, I'm not convinced searching for an older client will help.

The most straightforward way to find the files is to look at the Explorer file badges. Folders and files with locks will constantly show the blue sync badge over their icon.

Second point: the bit about destroying SSDs through NAND writes is completely wrong. If you actually look at the Procmon results, you'll see that the desired access for every file is Read/List + Read Attributes. It's also not hundreds of gigabytes of data, since it's only scraping each file's properties to work out what has changed. You can confirm this by opening Resource Monitor, going to the Disk tab and (if it even appears) selecting OneDrive.exe.
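Rather than arguing over the numbers, the WriteFile events in a Procmon capture can be totaled directly. Below is a minimal Python sketch, based on several assumptions: the capture was saved as CSV with Procmon's default "Process Name", "Operation" and "Detail" columns, the file is UTF-8, and the Detail field looks like "Offset: 0, Length: 4,096, ..." (English locale). The helper name bytes_written is hypothetical. This counts bytes handed to the file system, not physical NAND writes, which also depend on caching and the drive's write amplification.

```python
import csv
import re

# Matches the "Length: 4,096" part of a Procmon Detail string (English locale assumed).
LENGTH_RE = re.compile(r"Length:\s*([\d,]+)")


def bytes_written(csv_path, process="OneDrive.exe"):
    """Sum WriteFile lengths for one process in a Procmon CSV export (hypothetical helper)."""
    total = 0
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            if row.get("Process Name") != process or row.get("Operation") != "WriteFile":
                continue
            match = LENGTH_RE.search(row.get("Detail", ""))
            if match:
                # Strip thousands separators (and any trailing comma the regex caught).
                total += int(match.group(1).replace(",", ""))
    return total
```

Dividing the total by the capture duration and scaling to a day gives a figure that can be compared directly with the "hundreds of GB per day" claim.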

Neither CPU nor disk performance ever spirals out of control, so I don't see this as being much of an operational issue.

Interesting. I never saw this happen until a few weeks ago, but that could certainly be due to some file locking that I'm not aware of.

Just a few minutes ago, when I arrived at work, docked my laptop, and resumed from hibernation, every site suddenly started syncing again, and now it's cranking through 8GB of data. This never happened before, and now it seems to happen almost every day.

It's good to know that my SSD won't burn out, but the bandwidth issue is annoying, and so is the fact that I can't really access my files while this is happening. It is true that the CPU seems to be under control, but my laptop fans do kick in, and network speed is affected, especially when this happens to several people at our company all at once.

So I'm very interested to know why this is happening and whether a fix is coming. If it is truly related to a locked file, how can I identify which file that is? Does this mean I have locked files on EVERY site, or does one locked file cause all sites to be re-synced? Should I just stop syncing all sites and re-sync them to possibly "fix" the locked file situation?

I also wondered if this is due to the fact that two of the three sites I'm syncing have exceeded the List View Threshold. That used to just kill things entirely. Now OneDrive seems to work, and so do the sites, but I'm wondering whether that is affecting the sync at all.

Incidentally, a few weeks ago I did end up having to resync everything because my OneDrive kept crashing with a KernelBase.dll error. I was finally able to get it to run long enough to try to resync, and it told me I was already syncing those sites (but it wasn't), so most if not all of the registry settings were still there. Once I was able to stop syncing all the sites, I was able to restart the sync. I'm not sure it's related, but these issues all started after that. (What seemed to cause the crash was one of our users dropping a 2.6GB zip file into a folder. I noticed the increased size and unchecked the folder it was in, and it seemed fine for a bit. Then the icon just disappeared, and I could never get it to restart until I finally checked Event Viewer and found the crash details.)

The other thing that I saw, and some of my co-workers, was a scenario where OneDrive got stuck syncing, and it showed remaining files of 30-40GB (when our site is nowhere near that big). So something was messed up, and I'm not sure why. I haven't seen that since re-syncing, but now I'm experiencing this full sync issue, almost on a daily basis.

Any help on how to find the cause and fix it would be appreciated.

@Luke Gatchell:
Trial and error: searching, ordering by creation/modified date, and working backwards. It sucks, because it started again and now I have to do it all over today. Hopefully it's just some simple zero-byte file, long path length, etc.
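The "order by modified date and work backwards" approach can be scripted. Below is a minimal Python sketch that lists the most recently modified files under a folder, newest first, so likely culprits (such as a transcript created today) surface at the top. The helper name recent_files and its default limit of 20 are hypothetical choices.

```python
import os


def recent_files(root, limit=20):
    """Return up to `limit` (mtime, path) pairs under root, newest first (hypothetical helper)."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                entries.append((os.path.getmtime(full), full))
            except OSError:
                continue  # removed or unreadable mid-walk
    entries.sort(reverse=True)  # newest modification time first
    return entries[:limit]
```

Sorting by creation time instead would mean swapping in `os.path.getctime`, which on Windows returns the creation time.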


@Chris Moore:

"Most straight-forward way to find the files is looking at the explorer file badges; folders and files with locks will constantly have the blue sync badge over their icon."

This is ludicrous. So much so that I have to ask: before you so dangerously stated that you "don't see this as being much of an operational issue", had you actually DONE ANYTHING to test this? Can you provide the research and logs, or are you just here to troll?

I clock this at generating HUNDREDS OF GIGS PER DAY of NAND writes due to this behavior. Do you know how to operate calc.exe to do that math over even a year? Show me a single recent SSD that can handle that many writes.......they can't, and again, unlike you, I actually DO have dead SSDs to prove this.
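The "do the math over a year" point can be made concrete. Below is a small Python sketch that converts a sustained daily write volume into years until a drive reaches its rated endurance (TBW). The inputs are illustrative placeholders, not measurements from this thread. 1 TB is taken as 1000 GB, as drive vendors rate it.

```python
def years_to_tbw(gb_per_day, tbw_rating_tb):
    """Years until tbw_rating_tb terabytes have been written at gb_per_day (1 TB = 1000 GB)."""
    if gb_per_day <= 0:
        raise ValueError("gb_per_day must be positive")
    tb_per_year = gb_per_day * 365 / 1000
    return tbw_rating_tb / tb_per_year


# Illustrative placeholders only: 200 GB/day against a drive rated for 150 TBW.
print(round(years_to_tbw(200, 150), 2))  # → 2.05
```

At those placeholder numbers the rating is reached in about two years. Whether this is a real risk therefore depends entirely on the actual daily write volume, which is exactly the number the replies in this thread disagree on.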

I am also well aware that this has been an issue for "some time". I'm approaching 30 yrs in this business and have literally used OneDrive since it was in limited internal MS testing as SkyDrive (for personal) and Live Mesh (before they bought it). MS then blended it with SharePoint to make this latest client. It's a mishmash, hodge-podge of code from three major sources that makes up this latest "next gen" sync client.

Now, with all that said, since I've also been doing development for three decades, let me ask: what is the excuse? This client is not in alpha or beta; it has been in production desktop environments for YEARS. Yet it still can't handle the most BASIC things like zero-byte files and simple friggin' PATH LENGTH tests, ONE LINE OF CODE that I was writing in assembly 30 yrs ago?!?!? What's the excuse for why, even today in 2017, I am literally AGAIN watching this "next gen" client cycle infinitely on my SSD? It is undoubtedly because of a single zero-byte file or long path I need to hunt down, yet I STILL CANNOT GET a single simple verbose LOG FILE out of this client indicating WHICH FILE OR FOLDER is even the problem.

.....And if you knew ANYTHING about this client, you would know about the shell icon overlays. First you have to make it past the ~11 hard-coded OS limit (also horrible coding, in 2017) just to get the OneDrive sync overlays to SHOW at all. Even then, they are HORRIBLY inaccurate at times, and in this bug case they can show HUNDREDS of files as not synced. Know why? Because of the ONE issue preventing it from continuing. Yet unlike the cases where it DOES pop up with illegal filename, sharing issues, etc., in this bug case it NEVER INDICATES A SINGLE ISSUE (and there is no log). You can prove all of this in 2 seconds with Procmon (MS's own tool now, since they bought Russinovich's Sysinternals). Watch it loop through, in my case, the HUNDREDS OF GIGS in my OneDrive Business structure (I think I even already provided a real-life video showing it)......and I have yet to find any line or event that makes it easy to target the issue, so I wind up wasting HOURS each time. Last time it was a simple PowerShell lock on transcripts. Anyone can prove this bug in 2 minutes just by starting a PS transcript anywhere in the OneDrive structure, and boom, sync loop bug. Then leave it on for a year and call me to let me know how that SSD is doing. 🙂

Sorry, but this trolling is absurd when I've even shot VIDEOS of this issue. When idiots show up like this, it's just like passing a FATAL car accident on the freeway, rolling my window down, and telling the family of the deceased "this accident is really not a big operational problem in your life", LOL.

I have 155 files not syncing right now because of this issue, so do you want to tell me again it's "not really an operational issue"? LOL. It really, really is exactly that: a BIG operational issue that MS should be embarrassed about at this client maturity level. They should get on it and eradicate basic sync problems like this before it bites their paying corporate customers (like me), who may get tired of waiting and just move to a competitor for basic cloud syncing.

I mean, I've been developing for 3 decades, so seriously: if MS can't afford to put friggin' developers on major basic issues like this, then release the source code. If I were them, I would be EMBARRASSED by this and would want to squash it immediately. I bet that between me and the rest of the developers out here using their product daily with skin in the game, we could have this issue eradicated, tested, and rolled out quickly.

We have similar issues as well with one of our OneDrive for Business users.

I can also reproduce the problem with Start-Transcript, even on current OneDrive client version 17.3.7074.1023

Has Microsoft fixed the issue or is this the pattern you refer to?

I tried reproducing it:

Start-Transcript -Path "C:\Users\myUsername\OneDrive - myCompany\test\LockedFile.txt"

Now the status of LockedFile.txt is Syncing (overlay icon is blue circular arrows). (Astonishingly, the status of both the folder test and the top folder OneDrive - myCompany is UpToDate (green overlay icon). So without drilling into the structure, you never know whether UpToDate (green) is really up to date. For me this is a bug.) 

Process Explorer now indicates that OneDrive.exe consumes between 3% and 7% of the CPU on my old 2-core machine (i5-2520M@2.5GHz, 8 GB RAM) running Windows 7 SP1. When I also start Process Monitor, the CPU consumption of OneDrive.exe shown by Process Explorer goes up to between 8% and 10%. (This reminds me of Heisenberg's uncertainty principle in quantum mechanics: comprehensive measuring affects what is measured.) Indeed, OneDrive.exe is one of the top event contributors in Process Monitor. The vast majority of these operations are Registry operations, most of them reads. OneDrive.exe performs far fewer file system operations: between 10 and 30 WriteFile operations per second, and even fewer CreateFile and CloseFile operations. The WriteFile operations are basically logging to C:\Users\myUsername\AppData\Local\Microsoft\OneDrive\logs\Business1\SyncEngine-2018-3-1.332.20180.57.aodl. CreateFile and CloseFile operations are performed on LockedFile.txt and the two folders above it, but on no other files in the OneDrive for Business structure.
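The registry-versus-file-system breakdown above can be reproduced from a Procmon CSV export. Below is a minimal Python sketch, based on several assumptions: the default "Process Name" and "Operation" columns, a UTF-8 file, and a known capture length in seconds. It also assumes that registry operations are exactly those whose names start with "Reg" (RegQueryValue, RegOpenKey, ...). Everything else (file system, network, process events) lands in "other". The helper name summarize_ops is hypothetical.

```python
import csv
from collections import Counter


def summarize_ops(csv_path, seconds, process="OneDrive.exe"):
    """Return per-second rates of registry vs. other operations, plus the top 5 operations."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    kinds = Counter()  # "registry" vs. "other"
    ops = Counter()    # individual operation names
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            if row.get("Process Name") != process:
                continue
            op = row.get("Operation", "")
            ops[op] += 1
            kinds["registry" if op.startswith("Reg") else "other"] += 1
    rates = {kind: count / seconds for kind, count in kinds.items()}
    return rates, ops.most_common(5)
```

Running it on a capture taken with and without an active transcript in the OneDrive folder would show whether the lock changes the mix or only the volume.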

Collin, is that your error pattern? Or what exactly do you see when you "watch the HUNDREDS OF GIGABYTES PER DAY of NAND writes"?

I feel that either Microsoft fixed the issue in the meantime (I am on version 18.25.204.7), or this is not really an issue.

Also, a zero-byte file does not cause any issue.

Are there other issues around with the Next-gen Sync Client?

@p.schaefer You are correct about the registry. However, no, this is not resolved. As for the NAND writes, the few above who argued about them clearly did not launch Procmon and watch it continually read and ultimately cause unneeded SSD NAND writes in a loop, FOREVER. That goes on until you find the culprit through trial and error, because OneDrive (still) isn't intelligent enough to indicate which file(s) are currently making it lose its mind and barf on my CPU. 🙂

Sorry, but as a paying customer of both OneDrive and OneDrive for Business, I should be able to expect a sync app that at the very least tells me when it's having trouble with one or more pieces of my data. Currently (06-30-2018), even the much-improved client still does not. I write code (and have for > 30 yrs), so it is totally without excuse IMO that the try/catch blocks they have in place can't break the client out of a loop when it hits a file it cannot sync. Even after a number of retries, it should either log the file and/or report it to the user.
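The behavior asked for here is a standard pattern: retry a bounded number of times, then report instead of looping endlessly. Below is a minimal Python sketch of it. sync_one stands in for whatever uploads a single item and is a hypothetical callable. This is not how the OneDrive client is actually structured.

```python
import logging
import time


def sync_all(paths, sync_one, max_retries=3, delay=0.0):
    """Try each path up to max_retries times; return the paths that never succeeded.

    A failing path is logged and skipped instead of blocking the rest of the queue.
    """
    failed = []
    for path in paths:
        for attempt in range(1, max_retries + 1):
            try:
                sync_one(path)
                break  # success, move on to the next path
            except OSError as exc:  # e.g. PermissionError from a sharing violation
                logging.warning("attempt %d/%d failed for %s: %s",
                                attempt, max_retries, path, exc)
                if attempt < max_retries:
                    time.sleep(delay)
        else:
            # Loop finished without break: every attempt failed.
            failed.append(path)
    return failed
```

The key design choice is that a permanently failing item ends up in the returned list, which a client could surface to the user, rather than restarting the whole scan.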