Cosmos DB Java SDK Retry Policy

Hi Azure Cosmos Db Team,

 

We haven't explicitly set a retry policy for throttling, so the client uses the default throttling retry policy.

The diagnostics show the following:

 

throttlingRetryOptions=RetryOptions{maxRetryAttemptsOnThrottledRequests=9, maxRetryWaitTime=PT30S}

 

However, when we encountered actual throttling ("statusCode":429,"subStatusCode":3200), the diagnostics show values increasing in multiples of 4 ("retryAfterInMs":4.0 with x-ms-retry-after-ms=4, then "retryAfterInMs":8.0 with x-ms-retry-after-ms=8), and the request ends with the error "Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later."

 

Can you please explain the difference between maxRetryWaitTime, as shown in throttlingRetryOptions, and retryAfterInMs, as seen in the diagnostics above when throttling occurs? Based on the throttlingRetryOptions setting, I was expecting a throttled request to be retried only after 30 seconds. The current behavior has a compounding effect with concurrent requests, which affects overall throughput. We need to customize the number of retries and the retry interval for throttling to suit our requirements. Which parameters should we use for that?

 

With Regards,

Nitin Rahim

 

6 Replies

@mannu2050 can you help here? 

 

Hi Azure Cosmos Db Team,

Just following up on this.

With Regards,
Nitin Rahim

Hi Azure Cosmos Db Team,

Good morning.

Just following up on this again.

Thanks.

With Regards,
Nitin Rahim

setMaxRetryWaitTimeInSeconds(int maxRetryWaitTimeInSeconds) is the method that sets the maximum retry wait time in seconds. After setting it, you can call getMaxRetryWaitTimeInSeconds() to check that the object holds the correct value.
This method encapsulates the x-ms-retry-after-ms header, whose core job is to set the gap between retries. It seems to me there is some problem inside the client code.
As an example, you can set the retry options like this:
// Retry a throttled request at most once before the 429 is returned to the caller.
ThrottlingRetryOptions retryOptions = new ThrottlingRetryOptions()
    .setMaxRetryAttemptsOnThrottledRequests(1);

// Pass the retry options to the builder when creating the async client.
client1 = new CosmosClientBuilder()
    .endpoint(AccountSettings.HOST)
    .key(AccountSettings.MASTER_KEY)
    .consistencyLevel(ConsistencyLevel.SESSION)
    .contentResponseOnWriteEnabled(true)
    .directMode()
    .throttlingRetryOptions(retryOptions)
    .buildAsyncClient();
I have copied this snippet from this sample https://github.com/Azure-Samples/azure-cosmos-java-sql-api-samples/blob/3a7314e6062c8edf9335f727aee1...

@nitinrahim `maxRetryWaitTime` in `ThrottlingRetryOptions` indicates how long the SDK should wait before it stops retrying internally and returns the error to the client. You can read about it in the Java docs for `setMaxRetryWaitTime()` - ThrottlingRetryOptions (Azure SDK for Java Reference Documentation) (azuresdkdocs.blob.core.windows....

 

On the other hand, what you see in the diagnostics is the retry-after header, which the backend service returns to the client-side SDK during the internal retries. It tells the client how long to wait before retrying after being throttled (usually in milliseconds, and usually increasing in multiples of 2). The application or end user of the SDK cannot control this internal header.
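
To make the interplay between the two limits and the retry-after header concrete, here is a minimal sketch in plain Java. It is a simplified model of the behavior described above, not the SDK's actual implementation: the class `ThrottleRetrySketch`, its `simulate` method, and the doubling sequence of hints are all hypothetical (the thread only shows hints of 4 ms and 8 ms). The model assumes the client waits for each server-provided hint and gives up when either the attempt cap or the cumulative wait cap would be exceeded.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Simplified model only: not the SDK's real retry policy.
final class ThrottleRetrySketch {

    /** Result of one simulated request: retries used, time waited, and why the loop ended. */
    static final class Outcome {
        final int retriesUsed;
        final Duration totalWait;
        final String stoppedBy; // "succeeded", "maxRetryAttempts" or "maxRetryWaitTime"

        Outcome(int retriesUsed, Duration totalWait, String stoppedBy) {
            this.retriesUsed = retriesUsed;
            this.totalWait = totalWait;
            this.stoppedBy = stoppedBy;
        }
    }

    /**
     * Each element of retryAfterHints models one 429 response carrying that
     * retry-after value. Once the list is exhausted, the next attempt is
     * assumed to succeed.
     */
    static Outcome simulate(List<Duration> retryAfterHints,
                            int maxRetryAttempts,
                            Duration maxRetryWaitTime) {
        Duration totalWait = Duration.ZERO;
        int retries = 0;
        for (Duration hint : retryAfterHints) {
            // Attempt cap reached: the 429 is surfaced to the caller.
            if (retries >= maxRetryAttempts) {
                return new Outcome(retries, totalWait, "maxRetryAttempts");
            }
            // Waiting for this hint would exceed the cumulative wait cap.
            if (totalWait.plus(hint).compareTo(maxRetryWaitTime) > 0) {
                return new Outcome(retries, totalWait, "maxRetryWaitTime");
            }
            totalWait = totalWait.plus(hint); // wait as long as the server asked
            retries++;
        }
        return new Outcome(retries, totalWait, "succeeded");
    }

    public static void main(String[] args) {
        // Ten consecutive 429s with hints doubling from 4 ms (assumed sequence).
        List<Duration> hints = new ArrayList<>();
        long ms = 4;
        for (int i = 0; i < 10; i++) {
            hints.add(Duration.ofMillis(ms));
            ms *= 2;
        }
        // Defaults from the diagnostics in the question: 9 attempts, PT30S.
        Outcome o = simulate(hints, 9, Duration.ofSeconds(30));
        System.out.println(o.retriesUsed + " retries, "
                + o.totalWait.toMillis() + " ms waited, stopped by " + o.stoppedBy);
    }
}
```

Under these assumptions, with the default values from the question (9 attempts, 30 seconds), the attempt cap is reached after 9 retries and 2044 ms of cumulative waiting, long before the 30-second limit matters. In this model the 30 seconds acts as an upper bound on total waiting, not as the interval between retries.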

 

For concurrent requests, if you would like the SDK to keep retrying, you can increase maxRetryAttemptsOnThrottledRequests or increase maxRetryWaitTime. However, if throttling happens with multiple concurrent requests, one way to resolve it is to increase the throughput of the database / container. Another option is the ThroughputControl mechanism built into the Java SDK; you can refer to the sample here - azure-cosmos-java-sql-api-samples/src/main/java/com/azure/cosmos/examples/throughputcontrol/async/Th...
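
As a configuration sketch for the two settings mentioned above, the following assumes the azure-cosmos 4.x library is on the classpath. The values 5 and 10 seconds are illustrative only, not recommendations, and the environment variable names `COSMOS_ENDPOINT` and `COSMOS_KEY` are hypothetical placeholders for your own account settings.

```java
import com.azure.cosmos.ConsistencyLevel;
import com.azure.cosmos.CosmosAsyncClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.ThrottlingRetryOptions;
import java.time.Duration;

public class RetryOptionsSketch {
    public static void main(String[] args) {
        // Illustrative values: at most 5 internal retries on 429, and at most
        // 10 seconds of cumulative waiting before the error reaches the caller.
        ThrottlingRetryOptions retryOptions = new ThrottlingRetryOptions()
            .setMaxRetryAttemptsOnThrottledRequests(5)
            .setMaxRetryWaitTime(Duration.ofSeconds(10));

        // Read the values back to confirm what the client will use.
        System.out.println("attempts=" + retryOptions.getMaxRetryAttemptsOnThrottledRequests()
            + ", maxWait=" + retryOptions.getMaxRetryWaitTime());

        // Hypothetical environment variables holding the account endpoint and key.
        CosmosAsyncClient client = new CosmosClientBuilder()
            .endpoint(System.getenv("COSMOS_ENDPOINT"))
            .key(System.getenv("COSMOS_KEY"))
            .consistencyLevel(ConsistencyLevel.SESSION)
            .throttlingRetryOptions(retryOptions)
            .buildAsyncClient();

        // ... issue requests here ...

        client.close();
    }
}
```

Neither setting changes the interval between individual retries, since that comes from the retry-after header returned by the service. They only bound how many times and for how long in total the SDK retries.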

Thanks Manish and Kushagra Thapar. Testing the same.

With Regards,
Nitin Rahim