Web servers, network security appliances, and system network stacks have a number of time-based thresholds that are intended to insulate these systems from buggy or malicious clients. For example, a denial-of-service (DoS) attack could be mounted by failing to complete the handshake that is implicit in the creation of a TCP connection, but the TCP stack in Windows mitigates this threat by requiring that the handshake complete within a finite time. Similarly, a DoS attack could be mounted against IIS by opening a large number of TCP connections but never actually issuing an HTTP request. IIS mitigates this threat by requiring that a client submit a fully-formed HTTP request within a certain time, and it drops the connection if the client does not.
Note, however, that there are timeout settings that can be safely increased without compromising the security of the network. For Direct Push, the timeout of concern is the idle connection timeout; that is, given a fully established TCP connection, for how long should that connection be permitted to live in the absence of traffic? Recall that the essence of Direct Push is a long-lived HTTP request: the device issues a request, and the server holds that request without responding until either a device-specified timeout (the "heartbeat interval") expires or new email arrives. When either of those events occurs, the HTTP request is completed, but the connection is idle in the time between the request and response.
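To make the role of the idle connection timeout concrete, here is a minimal sketch of the check a pending Direct Push request effectively faces. The function name and the 300-second firewall value are hypothetical, chosen only for illustration, and the model assumes that no traffic at all crosses the connection while the server holds the request.

```python
def request_survives_firewall(heartbeat_s: int, firewall_idle_timeout_s: int) -> bool:
    """Return True if an idle, pending request outlives the firewall's idle timeout.

    Assumption: the connection carries no traffic between the request and the
    response, so a firewall drops it once its idle timeout elapses.
    """
    return heartbeat_s < firewall_idle_timeout_s


# A 15-minute heartbeat against a hypothetical 5-minute firewall idle timeout
print(request_survives_firewall(15 * 60, 5 * 60))   # False
# The same heartbeat against a 30-minute firewall idle timeout
print(request_survives_firewall(15 * 60, 30 * 60))  # True
```

Under this simplified model, a heartbeat interval is only usable if every firewall on the path tolerates at least that much idle time.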
Now, the design of Direct Push makes no assumption as to the length of its sessions - email is delivered rapidly whether the heartbeat interval is one minute or thirty minutes. However, using a heartbeat interval of 15-30 minutes has positive implications for battery life and bandwidth consumption: if the Direct Push sessions are permitted to live longer, there will be fewer HTTP roundtrips, less data sent and received, and less power consumed by the device.
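As a rough illustration of the roundtrip savings, the following sketch counts how many heartbeat roundtrips a device would make in a day if no mail arrived at all. The function name is hypothetical, and real traffic will differ because every new email also completes a request.

```python
def idle_roundtrips_per_day(heartbeat_minutes: int) -> int:
    """Heartbeat roundtrips in 24 hours, assuming no email arrives.

    Assumption: each request is held for the full heartbeat interval and a
    new request is issued immediately after the previous one completes.
    """
    minutes_per_day = 24 * 60
    return minutes_per_day // heartbeat_minutes


for interval in (1, 15, 30):
    print(f"{interval:>2} min heartbeat: {idle_roundtrips_per_day(interval)} roundtrips/day")
```

With these assumptions, a one-minute heartbeat costs 1440 roundtrips per day, while a thirty-minute heartbeat costs 48.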
We will characterize the different types of DoS threats regarding incoming connections and show that increasing the idle connection timeout neither increases nor decreases the exposure to attack:
- An attacker attempts to create a large number of "half open" TCP connections by only partially completing the TCP handshake process. Increasing idle connection timeouts is unrelated to this type of attack - the time within which a TCP handshake must complete is a separate threshold governed by the Windows TCP/IP stack.
- An attacker establishes a large number of TCP connections but never issues an HTTP request over any of them. These connections will be closed as soon as the "Connection Timeout" value in the IIS management console is exceeded (this defaults to 120 seconds). We realize that this setting is misleadingly named, but here again, increasing the idle connection timeout of the firewall appliance does not further expose an enterprise because the attack is mitigated by a separate threshold that is reached before the firewall's idle connection timeout comes into play.
- An attacker establishes a large number of TCP connections, issues HTTP requests over all of them, but never consumes the responses. This threat is mitigated by the same timeout as the previous scenario. That is, the "Connection Timeout" setting in IIS defines the time within which a client must issue its first request after a TCP connection is established, as well as any subsequent requests in an HTTP keep-alive scenario.
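The argument in the last two scenarios can be sketched as a simple "whichever threshold fires first" model. This is an illustration only: the function name and the firewall values are hypothetical, and the 120-second figure is the IIS default cited above.

```python
IIS_CONNECTION_TIMEOUT_S = 120  # IIS "Connection Timeout" default cited in the text


def seconds_until_idle_drop(firewall_idle_timeout_s: int,
                            iis_connection_timeout_s: int = IIS_CONNECTION_TIMEOUT_S) -> int:
    """Time until a connection with no client request is torn down.

    Assumption: the connection is closed by whichever threshold is reached
    first, the IIS setting or the firewall's idle connection timeout.
    """
    return min(firewall_idle_timeout_s, iis_connection_timeout_s)


# Raising a hypothetical firewall timeout from 5 to 30 minutes changes nothing
for firewall_timeout in (5 * 60, 30 * 60):
    print(firewall_timeout, seconds_until_idle_drop(firewall_timeout))
```

In both cases the connection is dropped after 120 seconds, which is the sense in which a longer firewall timeout does not widen the exposure in this model.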
Microsoft has worked with mobile operators to increase the idle connection timeouts on their outgoing firewalls, but the enterprises that are deploying Direct Push will also need to increase those timeouts on their incoming firewalls. In Microsoft’s own deployment, the timeouts on the firewall are set to thirty minutes.
You Had Me at EHLO.