Azure PaaS Blog

Azure Redis Timeouts - Client Side Issues

Microsoft

Jan 04, 2021

Overview

Timeouts on the Redis client side can have many causes, on the client, on the network, or on the server, and the error message differs depending on the client library used.

Timeouts in Azure Cache for Redis occur on the client side when the client application does not receive a response from the Redis server in time.
Whenever the client application does not receive a response before one of its timeout values expires, a timeout occurs and is logged in the client application logs as a Redis timeout error message.

Because Redis acts as an in-memory cache for client applications, Redis calls are expected to be fast, with low response times. For that reason, the timeout values used in client applications are usually low.
This means that any stress or overload on the client side tends to affect Redis calls before it affects calls to other services: latency on Redis requests rises, and some Redis timeouts may occur.

Ordered from most to least common, the main client-side causes are:

1- Traffic Burst

2- Large Key Value Sizes

3- High Client CPU Load

4- Client Network Bandwidth Exceeded

5- Client Timeout Configurations

6- Idle Connections

7- High Client Memory Usage

For Network or Server side Redis timeout causes, please see these Tech Community articles:

Azure Redis Timeouts - Server Side Issues
Azure Redis Timeouts - Network Issues

1- Traffic Burst

Bursts of traffic combined with poor thread pool settings on the client side can delay the processing of data that the Redis server has already sent but the client has not yet consumed. This can cause Redis timeouts on the client side.
Different client libraries deal with this in different ways.

Traffic burst and ThreadPool settings on StackExchange.Redis:

StackExchange.Redis relies on the .NET ThreadPool, which has two types of threads:

• Worker threads: used to run work queued with Task.Run(…) or ThreadPool.QueueUserWorkItem(…).
• IOCP (I/O Completion Port) threads: used when asynchronous IO happens (e.g. reading from the network).

Each type has a minimum number of threads; by default, the minimum is set to the number of cores on the client machine.
The ThreadPool provides new threads on demand until it reaches that minimum.
Once the number of busy IOCP or Worker threads (Busy) reaches the minimum (Min), the creation of new threads is throttled.
While throttled, the ThreadPool injects new threads at a rate of one thread per 500 milliseconds.
When the client application gets a burst of traffic that needs more threads than the minimum, work has to wait for those threads to be created, and some timeouts may occur on the Redis client.
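The Busy, Min and Max figures described above can also be read directly from the .NET ThreadPool. The following is a minimal sketch, not part of StackExchange.Redis; the class and method names (ThreadPoolSnapshot, Capture, IsThrottlingLikely) are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSnapshot
{
    // Returns Busy/Min/Max for worker and IOCP threads, mirroring the
    // fields shown in StackExchange.Redis timeout messages.
    public static (int BusyWorker, int MinWorker, int MaxWorker,
                   int BusyIocp, int MinIocp, int MaxIocp) Capture()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIocp);

        // "Busy" is not exposed directly; derive it as Max - Available.
        return (maxWorker - freeWorker, minWorker, maxWorker,
                maxIocp - freeIocp, minIocp, maxIocp);
    }

    // True once the busy count has reached the minimum, which is the
    // point at which the pool starts throttling thread creation.
    public static bool IsThrottlingLikely(int busy, int min) => busy >= min;
}
```

Calling ThreadPoolSnapshot.Capture() periodically, or when a timeout is caught, and logging the result gives a view of how close the application runs to its configured minimums.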

StackExchange.Redis timeout exceptions contain useful information about the minimum and busy thread counts, for example:

Timeout performing EVAL, inst: 187, mgr: Inactive, err: never, queue: 74, qu: 1, qs: 73, qc: 0, wr: 0, wq: 0, in: 1536, ar: 0, IOCP: (Busy=1,Free=999,Min=2,Max=1000), WORKER: (Busy=40,Free=32687,Min=2,Max=32767), clientName: RD123456789ABC

In the above example, 40 Worker threads are busy and the minimum is set to 2.
So 38 threads had to be injected at a rate of one per 500 ms: 38 * 500 ms = 19 s. The 40th thread waited at least 19 s to be created.
The exception was thrown because the Redis operation waited too long for a free thread.
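The calculation above can be automated when many timeout messages need to be reviewed. This is a rough sketch that assumes the message format shown in the example (newer StackExchange.Redis versions may format these fields differently); TimeoutMessage, ParsePool and EstimateInjectionDelay are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutMessage
{
    // Matches e.g. "WORKER: (Busy=40,Free=32687,Min=2,Max=32767)".
    private static readonly Regex PoolPattern = new Regex(
        @"(?<name>IOCP|WORKER): \(Busy=(?<busy>\d+),Free=\d+,Min=(?<min>\d+),Max=\d+\)");

    // Extracts Busy and Min for the given pool ("IOCP" or "WORKER"),
    // or returns null when the message does not contain that pool.
    public static (int Busy, int Min)? ParsePool(string message, string poolName)
    {
        foreach (Match m in PoolPattern.Matches(message))
        {
            if (m.Groups["name"].Value == poolName)
                return (int.Parse(m.Groups["busy"].Value), int.Parse(m.Groups["min"].Value));
        }
        return null;
    }

    // Threads above the minimum are injected one per 500 ms, so the last
    // one waited at least (busy - min) * 500 ms.
    public static TimeSpan EstimateInjectionDelay(int busy, int min, int msPerThread = 500)
    {
        int extra = Math.Max(0, busy - min);
        return TimeSpan.FromMilliseconds((double)extra * msPerThread);
    }
}
```

For the message above, ParsePool(message, "WORKER") yields Busy=40 and Min=2, and EstimateInjectionDelay(40, 2) gives the same 19 seconds.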

Each application may need to adjust the “Min” value based on its usage and load.

How to monitor traffic bursts from the client side:

- At the network level, traffic bursts can be monitored with any network traffic analyzer to identify unexpected spikes in network usage;
- StackExchange.Redis client library logs can be used to look for “Busy” values higher than “Min” on the IOCP or Worker thread pools, as described above;
- The ThreadPoolLogger sample can be used to monitor “Busy” during traffic bursts and adjust the “Min” values accordingly.

https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs

How to adjust "Min" ThreadPool settings for StackExchange.Redis:

There are two ways to configure the thread pool settings used by StackExchange.Redis:

1. Using ThreadPool.SetMinThreads() in the client application code (C#):

The minimum thread pool values can be set programmatically by calling ThreadPool.SetMinThreads() when the client application starts:

private readonly int minWorkerThreads = 200;
private readonly int minIOThreads = 200;

void Application_Start(object sender, EventArgs e)
{
    // Code that runs on application startup
    // (…)
    // Raise the thread pool minimums (requires "using System.Threading;")
    ThreadPool.SetMinThreads(minWorkerThreads, minIOThreads);
}

Note: The value passed to SetMinThreads is NOT a per-core setting; it is the total for the application. For example, if you have a 4-core machine and want minWorkerThreads and minIoThreads to be 50 per CPU at run time, you should use ThreadPool.SetMinThreads(200, 200).
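One detail worth handling in code is that SetMinThreads returns false when the request is rejected (for example, when a value exceeds the pool maximum). The sketch below is one possible wrapper, not an official recommendation; ThreadPoolConfig and EnsureMinThreads are hypothetical names. It only ever raises the minimums, so it does not undo a higher value configured elsewhere.

```csharp
using System;
using System.Threading;

public static class ThreadPoolConfig
{
    // Raises the thread pool minimums to at least the requested totals.
    // The values are totals for the process, not per-core values.
    // Returns false if the runtime rejected the change.
    public static bool EnsureMinThreads(int minWorker, int minIocp)
    {
        ThreadPool.GetMinThreads(out int currentWorker, out int currentIocp);

        // Never lower a minimum that is already higher than requested.
        int targetWorker = Math.Max(currentWorker, minWorker);
        int targetIocp = Math.Max(currentIocp, minIocp);

        return ThreadPool.SetMinThreads(targetWorker, targetIocp);
    }
}
```

A call such as ThreadPoolConfig.EnsureMinThreads(200, 200) at application startup would correspond to the example above; logging a false return value makes a rejected setting visible instead of silently ignored.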

2. Using <processModel> on Microsoft Internet Information Services (IIS) web server:

It is also possible to specify the minimum threads setting by using the minIoThreads or minWorkerThreads configuration setting under the <processModel> configuration element in Machine.config, usually located at %SystemRoot%\Microsoft.NET\Framework\[versionNumber]\CONFIG\.

<processModel>  on Machine.config

Setting the minimum number of threads this way is generally not recommended, because it is a system-wide setting.

Note: The value specified in this configuration element is a per-core setting. For example, if you have a 4-core machine and want your minIoThreads setting to be 200 at runtime, you would use <processModel minIoThreads="50"/>.
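Because the two approaches use different units (a total for SetMinThreads, a per-core value for <processModel>), the conversion is easy to get wrong. A small sketch of the arithmetic follows; ProcessModelMath and its methods are hypothetical helpers, not part of any library.

```csharp
using System;

public static class ProcessModelMath
{
    // Per-core value to put in <processModel> so that the effective
    // total is at least desiredTotal (rounded up).
    public static int PerCoreValue(int desiredTotal, int coreCount)
    {
        if (coreCount <= 0) throw new ArgumentOutOfRangeException(nameof(coreCount));
        return (desiredTotal + coreCount - 1) / coreCount;
    }

    // Effective total at run time for a given per-core setting.
    public static int EffectiveTotal(int perCore, int coreCount) => perCore * coreCount;
}
```

With the numbers from the note, PerCoreValue(200, 4) is 50 and EffectiveTotal(50, 4) is 200. On the machine itself, Environment.ProcessorCount can supply the core count.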

2- Large Key Value Sizes

Redis is optimized for small values (typically around 1 KB); using larger key value sizes may cause Redis timeouts.

A large value may be fine when only a few requests are made at the same time, but a high number of requests with large values in a short period can cause timeouts on Redis.

Because Redis is single-threaded by design, processing a large value blocks all the requests that come after it; if the first large value takes longer than expected, the following requests will time out.

For example, requests 'A' and 'B' are sent quickly to the server, and the server starts sending responses 'A' and 'B' quickly. Because of data transfer times, response 'B' must wait behind response 'A' and times out, even though the server responded quickly.
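To get a feel for why response 'B' can time out, it helps to estimate how long response 'A' occupies the connection. The sketch below is a back-of-the-envelope calculation only; the bandwidth and sizes used with it are hypothetical, and it ignores latency, protocol overhead and TLS.

```csharp
using System;

public static class TransferTime
{
    // Time needed to move `bytes` over a link of the given bandwidth.
    public static TimeSpan Estimate(long bytes, double megabitsPerSecond)
    {
        if (megabitsPerSecond <= 0) throw new ArgumentOutOfRangeException(nameof(megabitsPerSecond));
        double bits = bytes * 8.0;
        double seconds = bits / (megabitsPerSecond * 1_000_000.0);
        return TimeSpan.FromSeconds(seconds);
    }

    // Response B shares the connection with A, so it completes only
    // after A has been fully transferred.
    public static TimeSpan CompletionOfSecond(long firstBytes, long secondBytes, double megabitsPerSecond)
        => Estimate(firstBytes, megabitsPerSecond) + Estimate(secondBytes, megabitsPerSecond);
}
```

As an illustration, a 10,000,000-byte response on an assumed 100 Mbit/s link takes about 0.8 seconds to transfer, so a 1 KB response queued behind it waits roughly that long, regardless of how fast the server answered.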

This is described in the documentation: Large request or response Size

Microsoft recommends optimizing the client application to use a large number of small values rather than a few large values.

In summary, large values should be avoided and split into smaller values, at the cost of making more requests.

How to check for Large Key Value Sizes from the client side:

redis-cli.exe is a popular command-line tool for interacting with Azure Cache for Redis as a client, and it can be used to investigate which large key values are in use.

This tool is part of the Redis.io distribution and can also be used with Azure Cache for Redis.

The only caveat is that the redis-cli.exe command-line tool for Windows doesn't support TLS, and Azure Cache for Redis enables only the TLS port (6380) by default. (The latest versions of the redis-cli command-line tool, available only for Linux, already support TLS connections.)

To use this tool in Windows environments, either connect through an SSL/TLS tunnel or temporarily enable the non-TLS port (6379). The latter is not recommended for security reasons, as the access keys are sent over TCP in clear text.
To set up an SSL tunnel, the Stunnel tool can be used; this Tech Community documentation may help: Enable access for redis-cli.exe

Download and extract the redis-cli.exe command-line tool from the Windows zip file of the latest Redis.io distribution, available here: Redis-x64-3.0.504.zip

Then run redis-cli with this syntax (-p 6379 for non-SSL connections, or -p 6380 for SSL connections):

redis-cli -h <RedisName>.redis.cache.windows.net -p 6379 -a <RedisAccessPassword> --bigkeys

In this special mode, redis-cli works as a key space analyzer. It scans the dataset for big keys and also reports on the data types that the dataset consists of.

This mode is enabled with the --bigkeys option and produces quite verbose output:

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest string found so far 'key-419' with 3 bytes
[05.14%] Biggest list   found so far 'mylist' with 100004 items
[35.77%] Biggest string found so far 'counter:__rand_int__' with 6 bytes
[73.91%] Biggest hash   found so far 'myobject' with 3 fields

-------- summary -------

Sampled 506 keys in the keyspace!
Total key length in bytes is 3452 (avg len 6.82)

Biggest string found 'counter:__rand_int__' has 6 bytes
Biggest   list found 'mylist' has 100004 items
Biggest   hash found 'myobject' has 3 fields

504 strings with 1403 bytes (99.60% of keys, avg size 2.78)
1 lists with 100004 items (00.20% of keys, avg size 100004.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
1 hashs with 3 fields (00.20% of keys, avg size 3.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

More information about redis-cli and the --bigkeys option is available here: redis-cli, the Redis command line interface

How to avoid using Large Key Value Sizes:

To avoid large key value sizes, Microsoft recommends optimizing the client application for a large number of small values rather than a few large values.

Large values should be avoided and split into smaller values, at the cost of making more requests.
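One way to split a large value is to store it as several fixed-size chunks under derived keys. The sketch below shows only the splitting step and is an assumption about how an application might do it; the "key:index" naming scheme, the ValueChunker class and the chunk size are all hypothetical, and the application still has to store each chunk (and the chunk count) and reassemble them on read.

```csharp
using System;
using System.Collections.Generic;

public static class ValueChunker
{
    // Splits `value` into chunks of at most `chunkSize` bytes and pairs
    // each chunk with a derived key such as "mykey:0", "mykey:1", ...
    public static IEnumerable<KeyValuePair<string, byte[]>> Split(string key, byte[] value, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        for (int offset = 0, index = 0; offset < value.Length; offset += chunkSize, index++)
        {
            int length = Math.Min(chunkSize, value.Length - offset);
            var chunk = new byte[length];
            Buffer.BlockCopy(value, offset, chunk, 0, length);
            yield return new KeyValuePair<string, byte[]>($"{key}:{index}", chunk);
        }
    }
}
```

Whether chunking is preferable to redesigning the data model (for example, caching smaller objects in the first place) depends on the application; this is only one option.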

3- High Client CPU Load

High CPU usage on the client side means the system cannot keep up with the work it has been asked to perform.

The response from Redis can arrive very quickly, but because the CPU isn't keeping up with the workload, the response sits in the client socket's kernel buffer waiting to be processed, and the client may time out.

Microsoft recommends keeping client CPU usage below 80%.

How to verify High Client CPU Load:

- Monitor the client's system-wide CPU usage using the metrics available in the Azure portal, in the blade of the client environment (App Service, VM, AKS, etc.);

- Monitor the performance counters on the client machine.
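When portal metrics or performance counters are not at hand, the application can sample its own CPU usage as a rough complement. Note the limitation: this sketch measures only the current process, not the system-wide usage recommended above, so other processes on the same machine are invisible to it. CpuSampler and its methods are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class CpuSampler
{
    // CPU time consumed during `elapsed`, as a percentage of the total
    // capacity of `coreCount` cores.
    public static double ToPercent(TimeSpan cpuUsed, TimeSpan elapsed, int coreCount)
        => cpuUsed.TotalMilliseconds / (elapsed.TotalMilliseconds * coreCount) * 100.0;

    // Samples the current process over `interval` and returns its
    // average CPU usage in percent during that window.
    public static async Task<double> SampleProcessCpuAsync(TimeSpan interval)
    {
        var process = Process.GetCurrentProcess();
        TimeSpan startCpu = process.TotalProcessorTime;
        var stopwatch = Stopwatch.StartNew();

        await Task.Delay(interval);

        process.Refresh(); // re-read the process counters
        TimeSpan endCpu = process.TotalProcessorTime;
        return ToPercent(endCpu - startCpu, stopwatch.Elapsed, Environment.ProcessorCount);
    }
}
```

Logging the sampled value next to each Redis timeout makes it easier to correlate timeouts with CPU spikes in the application itself.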

How to limit Client CPU usage:

- Investigate what is causing high CPU usage or CPU spikes on the client side and modify the code logic.
Keep in mind that the client CPU is usually shared: it is used not only by the Redis client application but also by other applications in the same App Service plan (in the case of Web Apps), or by other applications and operating system tasks on VMs, VMSS, AKS, etc.

- Upgrade your client to a larger VM size with more CPU capacity.

4- Client Network Bandwidth Exceeded

Exceeded client network bandwidth and some other network causes of Redis timeouts are explained in this Tech Community article: Azure Redis Timeouts - Network Issues

5- Client Timeout Configurations

Each Redis client library has its own configuration options and the ability to configure different timeout values. The options and their default values can differ from one client library to another.
As an example, StackExchange.Redis uses the following configuration options, each of which can lead to a timeout.

Configuration string	Default value	Description
SyncTimeout	5 seconds	Time allowed for a synchronous operation to complete
ResponseTimeout	SyncTimeout	Deprecated; time after which the connection is treated as dead. Kept for backwards compatibility
ConnectTimeout	5 seconds	Time allowed for a connection attempt
ConnectRetry	3	Number of times to repeat connection attempts during the initial Connect
keepAlive	60 seconds	Interval at which a message is sent to help keep sockets alive (see Idle Connections below)

How to verify Redis Client Configurations:

To verify the Redis client configuration, look for the connection string or the configuration object in the client application code, and check whether the values used suit your scenario and workload.
Microsoft recommends using a single connection and reusing it.
Keep in mind that an application that does not follow this recommendation may have different connection strings, with different configuration values, in different parts of the code.
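A common way to follow the single-connection recommendation with StackExchange.Redis is to create the ConnectionMultiplexer lazily and share it across the application. The sketch below assumes the StackExchange.Redis NuGet package is referenced; the RedisConnection class name is hypothetical, and the connection string is a placeholder that must be replaced with real values.

```csharp
using System;
using StackExchange.Redis;

public static class RedisConnection
{
    // Placeholder connection string: replace <RedisName> and the password.
    private const string ConfigString =
        "<RedisName>.redis.cache.windows.net,abortConnect=false,ssl=true,password=XXX";

    // Lazy<T> is thread-safe by default, so Connect runs at most once,
    // on first use, even when several threads ask at the same time.
    private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
        new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(ConfigString));

    // The single shared connection for the whole application.
    public static ConnectionMultiplexer Connection => LazyConnection.Value;

    // True once the connection has actually been created.
    public static bool IsCreated => LazyConnection.IsValueCreated;
}
```

Application code would then call RedisConnection.Connection.GetDatabase() wherever it needs the cache, which also keeps all timeout settings in one place.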

How to Configure Redis Client Timeout values:

StackExchange.Redis client library:
The connection to Azure Cache for Redis can be made through a connection string. Because there are a lot of different ways to configure Redis, StackExchange.Redis offers a rich configuration model, which is invoked when calling Connect.

For the ConnectTimeout value on StackExchange.Redis (the time allowed for a connection attempt), Microsoft recommends the default value of 5000 ms (see here), which gives the system time to connect even under higher load conditions. Depending on your use case, you may want to adjust this value.

The configuration can be either:

a ConfigurationOptions instance:

string configString = "<RedisName>.redis.cache.windows.net, abortConnect=false, ssl=true, password=XXX";
var options = ConfigurationOptions.Parse(configString);
options.AllowAdmin = true;
options.AbortOnConnectFail = false; // same setting as abortConnect in the string
options.ConnectTimeout = 5000;      // milliseconds

a string representing the configuration:

string configString = "<RedisName>.redis.cache.windows.net, abortConnect=false, ssl=true, password=XXX, connectTimeout=5000";

Switching between the two configuration forms is trivial:

ConfigurationOptions options = ConfigurationOptions.Parse(configString);

or:

string configString = options.ToString();

Then use either the configString or the ConfigurationOptions instance to create the connection with the new configuration values:

ConnectionMultiplexer redis = ConnectionMultiplexer.Connect(configString);

or:

ConnectionMultiplexer redis = ConnectionMultiplexer.Connect(options);

6- Idle Connections

Azure Cache for Redis configures a 10-minute idle timeout for connections on the server, independently of the client library used. This is defined on the server side and cannot be changed.
If a connection is idle for 10 minutes, the server resets it, and clients may see socket reset errors, timeouts, or other connection error messages, depending on the client library used.

Some clients, like StackExchange.Redis, make regular PING calls to Redis to keep the connection alive (see the keepAlive configuration above).
Other clients don't, so application developers should implement a periodic ping themselves to prevent idle connections from being closed.
This is discussed in the Azure Redis Best Practices documentation.

How to verify Idle Connections:

If connections stay idle, without any Redis request, for 10 minutes and you then see socket reset errors, timeouts, or other connection error messages, the most probable cause is the lack of a keepalive policy.
You may also check whether the client library in use has a keepalive configuration option, as described for StackExchange.Redis.

How to mitigate Idle Connections:

Set the keepalive configuration option to 60 seconds (the default on StackExchange.Redis), or the corresponding configuration value for other client libraries.
For some client libraries, you may need to implement your own keepalive process that periodically pings the Redis endpoint to keep the connection active.
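For libraries without a built-in keepalive, a periodic ping can be as simple as a background loop. The following is a generic sketch that does not depend on any Redis library: the ping delegate is supplied by the caller (for example, a call to the library's PING command), and the KeepAlive class, its parameters and the chosen interval are assumptions for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class KeepAlive
{
    // Calls `ping` once per `interval` until `token` is cancelled.
    // A failed ping is logged and the loop continues, so one transient
    // error does not stop the keepalive.
    public static async Task RunAsync(Func<Task> ping, TimeSpan interval, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                await Task.Delay(interval, token);
                await ping();
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                break; // normal shutdown
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"Keepalive ping failed: {ex.Message}");
            }
        }
    }
}
```

The interval only needs to be comfortably shorter than the 10-minute server idle timeout; 60 seconds, matching the StackExchange.Redis default mentioned above, is one reasonable choice.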

7- High Client Memory Usage

Memory pressure on the client machine can lead to many kinds of performance problems that delay the processing of responses from the cache.
When this occurs, the system may page data to disk, and page faulting causes the system to slow down significantly.

This can have a higher impact on Redis calls, because they are expected to be fast, with low response times. For that reason, memory pressure on the client side tends to cause Redis timeouts before it affects other services running in the client environment.

How to verify High Client Memory Usage:

High client memory usage can be verified by monitoring memory usage on the client machine at the timestamps of the Redis timeouts, to make sure that it doesn't exceed the available memory.

It can also be useful to monitor the client's Page Faults/Sec performance counter, as page faults can be caused by memory pressure.
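As a complement to machine-level counters, the application can log a memory snapshot of its own whenever a Redis timeout is caught. This is a rough sketch that assumes .NET Core 3.0 or later (GC.GetGCMemoryInfo is not available on .NET Framework); MemorySnapshot is a hypothetical name, and the machine load figure reflects the state at the last garbage collection, so it may read as zero early in the process lifetime.

```csharp
using System;
using System.Diagnostics;

public static class MemorySnapshot
{
    // Used memory as a percentage of the total; 0 when the total is unknown.
    public static double LoadPercent(long usedBytes, long totalBytes)
        => totalBytes <= 0 ? 0.0 : usedBytes * 100.0 / totalBytes;

    // One-line summary suitable for logging next to a timeout.
    public static string Describe()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        using Process process = Process.GetCurrentProcess();

        const long MB = 1024 * 1024;
        double load = LoadPercent(info.MemoryLoadBytes, info.TotalAvailableMemoryBytes);

        return $"workingSet={process.WorkingSet64 / MB} MB, " +
               $"managedHeap={GC.GetTotalMemory(false) / MB} MB, " +
               $"machineLoad={load:F1}%";
    }
}
```

A consistently high load figure at the timestamps of the timeouts points to memory pressure rather than to the cache itself.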

How to limit Client Memory Usage:

Client memory usage can be reduced by running fewer applications/processes at the same time, or by investigating the processes that use the most memory. If a process is using more memory than expected, it should be investigated to find the reason for that memory usage.

If all the memory usage is expected, scale the client environment to make more memory available.

Conclusion:

Redis timeouts on connections to Azure Cache for Redis can have many different causes on the client, server, or network side; more details can be found in the documentation below.

For Network or Server side Redis timeout causes, please see these Tech Community articles:

Azure Redis Timeouts - Server Side Issues
Azure Redis Timeouts - Network Issues

Related documentation:

Timeout issues

1- Traffic Burst

StackExchange.Redis timeout exceptions (GitHub article)

ThreadPool Growth: Some Important Details (Jon Cole article)

Client-side issues

BusyIO or BusyWorker threads in the timeout exception (GitHub discussion)

ThreadPoolLogger code sample (from Jon Cole)

Traffic-burst

Recommendation ThreadPool settings