SharePoint 429 Error - Throttling

We have a custom application that calls SharePoint Online with multiple ExecuteQuery calls. It ran fine for 6 months, but since last week we have been receiving a lot of 429 errors.

We made sure that there is a delay of up to 3 seconds between ExecuteQuery calls, and we also decorated the HTTP calls as Microsoft suggests in (https://docs.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-o...). We also rotate between multiple user accounts, so that the same user is not always the one hitting SharePoint.

Has anyone encountered this? Does anyone know the actual throttling thresholds in SharePoint, or what triggers throttling?

Instead of a 3-second delay between queries, you can execute them one after the other. If you get a 429 error, inspect the response for the 'Retry-After' header. It contains the number of seconds you need to wait before making your next request.
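A minimal sketch of reading that header defensively, in C# to match the code posted later in this thread. It assumes the value is either a whole number of seconds or an HTTP date (the two forms the HTTP specification allows for Retry-After), and it falls back to a caller-chosen default in case the header is missing or unparseable. `RetryAfterParser.GetDelay` is a hypothetical helper for illustration, not a CSOM or PnP API.

```csharp
using System;
using System.Globalization;

public static class RetryAfterParser
{
    // Converts a Retry-After header value into a wait time. Returns `fallback`
    // when the header is missing or unparseable, instead of throwing the way
    // Int32.Parse(...) does on a null or non-numeric value.
    public static TimeSpan GetDelay(string headerValue, TimeSpan fallback, DateTimeOffset now)
    {
        if (string.IsNullOrWhiteSpace(headerValue))
            return fallback;

        string value = headerValue.Trim();

        // Form 1: a delay in whole seconds, e.g. "120".
        if (int.TryParse(value, NumberStyles.None, CultureInfo.InvariantCulture, out int seconds))
            return TimeSpan.FromSeconds(seconds);

        // Form 2: an HTTP date in RFC 1123 format, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        if (DateTimeOffset.TryParseExact(value, "r", CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal, out DateTimeOffset retryAt))
        {
            TimeSpan delay = retryAt - now;
            // A date already in the past means "retry now".
            return delay > TimeSpan.Zero ? delay : TimeSpan.Zero;
        }

        return fallback;
    }
}
```

In the retry code posted later in the thread, a call such as `RetryAfterParser.GetDelay(response.Headers["Retry-After"], TimeSpan.FromSeconds(10), DateTimeOffset.UtcNow)` could stand in for the `Int32.Parse` line; the 10-second fallback is an arbitrary choice.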

Thanks Robert,

Very interesting and we'll try it out.

Our main concern, though, is that in 3 days we haven't been able to get an answer on whether the site has been throttled and why, so that we can resolve this and make sure it is not triggered again.

We have already tried this, but unfortunately, when there is a 429 error and ExecuteQuery retries, the returned information has missing fields. For example, the file name of the item is returned empty.

This was leading us to other issues, which is why we added a 3-second delay between ExecuteQuery calls.

"ExecuteQuery retries, the returned information will contain missing fields": this should not happen! Can you share your retry code?

We faced this issue last week and noticed that one possible cause was that our request decoration (the one meant to avoid throttling) was not correct. Another thing you can look at is making async queries to SPO; you can find more about this pattern in the PnP Community Call held last week.

Hi Robert, yes, sure. Please find it below:

    for (var i = 0; i < 1000; i++)
    {
        var lib = ctx.Web.Lists.GetByTitle(libraryName);
        var items = lib.GetItems(new CamlQuery());
        ctx.Load(items);

        try
        {
            ctx.ExecuteQuery();
        }
        catch (System.Net.WebException wex)
        {
            var response = wex.Response as System.Net.HttpWebResponse;

            if (response != null && (response.StatusCode == (System.Net.HttpStatusCode)429 || response.StatusCode == (System.Net.HttpStatusCode)503))
            {
                var retryAfter = Int32.Parse(response.Headers["Retry-After"].ToString());
                Thread.Sleep(TimeSpan.FromSeconds(retryAfter));
                ctx.ExecuteQuery();

                try
                {
                    var x = items.Count; // throws CollectionNotInitializedException
                }
                catch (Exception ex2)
                {
                }
            }
        }

        Console.WriteLine($"{i}:{items.Count}");
    }

Some comments:

  • I wrapped only the ExecuteQuery in a try/catch block, not the ctx.Load(), because this sample suggested that should be enough: https://github.com/SharePoint/PnP/tree/dev/Samples/Core.Throttling
  • My idea was to write an ExecuteQueryWithRetry method, similar to the one published by the PnP group, but using your Retry-After recommendation. This way, all I'd need to change in my code is replacing all calls to ExecuteQuery with ExecuteQueryWithRetry.
  • The problem is that we get an error when accessing the items collection after a retry attempt.
  • I guess a “solution” could be to put the “Load” inside the try/catch block as well, but while this is easy to do in the code above, in our real-life scenario it gets much trickier, because a lot of lines of code would have to change.

IMO, putting the 'Load' inside the try/catch will not work. I think the context might be corrupted; you could reinitialize the context when the exception is thrown.

I guess you have a lot of calls to ExecuteQuery. I suggest trying to reduce the number of calls by loading everything in a single context and firing it as a single call.
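Robert's suggestion to reinitialize the context can be sketched as follows, under stated assumptions: the unit that gets retried is the whole "create context, queue Load, call ExecuteQuery" sequence rather than ExecuteQuery alone, so every attempt sends a request whose loads are queued again. `ThrottleRetry.Run` and `ThrottledException` are hypothetical names used only for this self-contained illustration. In CSOM code, the `attempt` delegate would build a new ClientContext, call ctx.Load and ctx.ExecuteQuery, and return the loaded data, while `tryGetDelay` would return a delay only for a WebException whose status is 429 or 503.

```csharp
using System;

// Hypothetical exception used to simulate a throttled response in this sketch.
public sealed class ThrottledException : Exception
{
    public TimeSpan RetryAfter { get; }
    public ThrottledException(TimeSpan retryAfter) : base("Throttled (429)") => RetryAfter = retryAfter;
}

public static class ThrottleRetry
{
    // Runs the whole operation again on each attempt, so anything the
    // operation queues (e.g. Load calls) is queued again before it executes.
    public static T Run<T>(
        Func<T> attempt,
        Func<Exception, TimeSpan?> tryGetDelay, // null => not a throttling error
        Action<TimeSpan> sleep,
        int maxAttempts)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int i = 1; ; i++)
        {
            try
            {
                return attempt();
            }
            catch (Exception ex) when (i < maxAttempts)
            {
                TimeSpan? delay = tryGetDelay(ex);
                if (delay == null) throw; // not throttling: surface the original error
                sleep(delay.Value);       // wait as instructed, then rebuild and resend
            }
        }
    }
}
```

Because the delegate owns both the Load calls and the ExecuteQuery, this still means restructuring call sites, which is the trade-off raised in the last bullet of the earlier post; on the final attempt the throttling exception is allowed to propagate.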

We just got hit with this today as well. In my case, I simply connected via PowerShell to enable a CDN and got a 429 back from a single request. I've seen a few posts on this; it feels like something changed recently.

We had the same. After migrating more than 6 TB without any problems using ShareGate, we were throttled while migrating a small delta. Microsoft told us that we were throttled only during business hours. When migrating on weekends and after business hours, we did not get the 429 errors anymore! Maybe you should try this as well. Good luck!

At least you got that reply :)

We have been in contact with Microsoft Support for quite a while now and have not received any specific reply like yours :\

However, it does not make sense to be throttled during business hours under normal use. We have been running this scenario for months (after multiple migrations to SharePoint) and never had this issue, so I find it very strange that we suddenly began to experience throttling.

One thing Microsoft told us was that other tenants could be affecting the overall SharePoint experience, and hence affecting other tenants. But again, this does not make sense from a customer perspective.

I've been battling this for a few weeks now as well and have a support ticket open with Microsoft. Decorating the requests with a User-Agent string, as per this guidance, has helped:

https://docs.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-o...

It seems you can make up a "Company name | Product Name | Version" value, and although the article says to "register" the app, I've not been able to find anyone at Microsoft to whom I could give these values. It seems the mere presence of the values in the right format makes a difference, but it won't prevent 429 responses from occurring. Our testing has concluded that the 429 throttling responses (without decorating the requests) happen during business hours at the data centre (peak times) and are heavily influenced by other traffic in the data centre at the time, not purely by how many calls your code makes and how often. For example, I can get a 429 making a single call (with no other calls for hours before it), yet I can make more than 5000 calls to SharePoint in a couple of minutes without a single throttling error.
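For raw REST calls made with HttpClient, a minimal sketch of attaching such a string. It follows the three-part pipe-separated layout described above with made-up placeholder values (Contoso, GovernanceCheck, 1.0); check the linked guidance for the exact format Microsoft currently expects. `UserAgentDecoration` is a hypothetical helper. For CSOM, the same string would instead be assigned to the outgoing web request in a handler for ClientContext's ExecutingWebRequest event.

```csharp
using System;
using System.Net.Http;

public static class UserAgentDecoration
{
    // Builds "Company|Product|Version" (layout assumed from the post above).
    public static string Build(string company, string product, string version)
    {
        foreach (var part in new[] { company, product, version })
        {
            if (string.IsNullOrWhiteSpace(part) || part.Contains("|"))
                throw new ArgumentException("Each part must be non-empty and must not contain '|'.");
        }
        return $"{company}|{product}|{version}";
    }

    // Creates a GET request carrying the decoration. TryAddWithoutValidation
    // stores the value verbatim instead of running HttpClient's User-Agent parser.
    public static HttpRequestMessage CreateRequest(Uri uri, string userAgent)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, uri);
        request.Headers.TryAddWithoutValidation("User-Agent", userAgent);
        return request;
    }
}
```

As the post notes, the decoration may reduce 429 responses but is not reported to eliminate them, so it complements rather than replaces Retry-After handling.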

As an aside, I've identified an issue when using the Graph API to access SharePoint items: it always returns a 429 throttling response when the query is actually hitting a SharePoint threshold (large list) limit. If you are seeing 429s, check that this isn't the cause, as that one is easier to identify and avoid!

https://camerondwyer.wordpress.com/2018/01/31/microsoft-graph-api-throttling-sharepoint-lists-librar...

ShareGate has released a new version that should get a higher priority from Microsoft, which in turn should decrease the number of 429 errors. We have not tried this version yet because we had already finished our migration.

The 429 throttling errors are there to protect the tenant, so they might occur in different situations in different tenants. Ours is located in the UK, and we were suddenly throttled in February 2018. Before that, we did not have any 429 issues, and we had already migrated 6 TB. When we moved to off-hours, we were able to run ShareGate again without many 429 errors. Microsoft has an SLA on availability, NOT on performance; that is why they will do anything to protect their Office 365 tenants.

For me, too, the 429 errors started to appear in early February 2018. Other threads, and people I've talked to, report the same timeline. It seems that either the way throttling works changed or the thresholds were tightened around this time; I'm still trying to get official information from Microsoft.

We reached exactly the same conclusion.

The symptoms all started in February 2018, and we're also trying to obtain more information from Microsoft, but so far we keep being redirected to "how to avoid 429 errors" rather than being told what led to this situation. We had worked with no issues for the previous 6 months, including the migration.

As you mentioned earlier, we also came across instances where a single call triggers a 429 error, while at other times, under heavy load, it does not occur.

Further to this, the 2 main issues are:

- the disruption such unannounced changes create in production environments

- the lack of information on what changed in Feb 2018

Thanks, everyone, for sharing your findings. We would really appreciate it if you could report these issues using our official sp-dev-docs issue list at https://github.com/SharePoint/sp-dev-docs/issues. This issue list is automatically synced with our engineering task list, so that we can ensure your issue does not get hidden or ignored.

We would ask the following from each of you, so that we can start solving these one by one:

- Which tenant has the issue?

- When did the issue happen, so that we can check the right log entries?

- What was your code performing when this happened?

- Did you use a user-agent string as instructed in https://docs.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-o...?

There should NOT have been any worldwide change to this capability with such wide impact, but apparently issues are being seen in multiple tenants, so we need actual facts and technical details to analyze what happened. We can't really investigate the underlying issue without tenant and timing details.

Thanks in advance for your input. Even if your issue seems to match a previously reported one, we would suggest submitting a new issue, so that we truly understand how widely these throttling issues are being encountered.

Also, in the future, please use the https://github.com/SharePoint/sp-dev-docs/issues issue list for this kind of issue. This list exists so that you can report issues directly to engineering. Thx.

Thanks @Vesa Juvonen for your reply. We have opened a new issue in the list provided and are answering all your questions there.

These issues occurred mainly in February. Our goal here is to identify what caused these errors and why. We are planning to move more services to SharePoint, and such issues will hold us back from fully migrating our services, since they affect the usability of the custom-built applications our business runs on SharePoint.

Dear @Vesa Juvonen

As for support, we tried to open this through the normal O365 channels and through our distributor. The case number was 30126-7340755.

Unfortunately, we were never directed to the site you mentioned, and if you have access to this case, you will see that most of our questions about throttling were not answered. The standard reply was:

"As discussed earlier we as a frontline support will not be able to provide you any further support on this scenario. Also according to the scope of the service request this is out of support boundary and we are proceeding with the archival of this service request 30126-7340755"

Each support engineer directed us to how to avoid throttling, instead of answering our questions about how and why the throttling started out of nowhere.

I believe Justin has followed your advice and replied accordingly.

Many thanks for following this up.

Hi,

We're currently experiencing the same situation in our tenants and our customers' tenants. Over the past 2-3 weeks, the 429 errors have suddenly increased without any configuration changes or updates.

We are also using incremental backoff retries and a User-Agent string.
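For readers unfamiliar with the term, a sketch of an incremental (exponential) backoff schedule that uses the server's Retry-After value when one is present and otherwise doubles a base delay per attempt, up to a cap. The base delay, the cap, and the `Backoff.NextDelay` name are illustrative assumptions, not values or APIs documented for SharePoint.

```csharp
using System;

public static class Backoff
{
    // Delay before retry number `attempt` (1-based).
    // Honors Retry-After when the server supplied it; otherwise waits
    // base, 2*base, 4*base, ... until the cap is reached.
    public static TimeSpan NextDelay(int attempt, TimeSpan? retryAfter, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        if (retryAfter.HasValue) return retryAfter.Value;

        // Limit the exponent so the shift cannot overflow for large attempt numbers.
        int exponent = Math.Min(attempt - 1, 30);
        double ms = baseDelay.TotalMilliseconds * (1L << exponent);
        return ms >= maxDelay.TotalMilliseconds ? maxDelay : TimeSpan.FromMilliseconds(ms);
    }
}
```

With a 2-second base and a 60-second cap, this yields 2, 4, 8, 16, 32 seconds and then stays at 60; preferring Retry-After keeps the client aligned with what the server asked for.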

We created an ASPX page that can be uploaded to any Azure web app and simply performs a CSOM query against SharePoint Online every 5 seconds. In these tenants, this operation returns 429 a significant number of times. In case it's useful to someone, it's attached to this comment.

As we haven't had any official reply from MS yet: did you manage to get an explanation or a satisfactory resolution to your cases? Did the errors just go away, or were any actions involved?

We feel we're quite in the dark here...

Thank you,

Enric