Exchange on-premises mail flow issue (Remote(ConnectionReset))


We have a hybrid organization with central Exchange 2016 relay servers that relay inbound messages to the satellite locations. Inbound mail from O365/the internet arrives just fine on the relay servers and is then queued for delivery to the on-premises Exchange servers. All on-premises Exchange servers reside in the same AD forest, and mail-delivery TLS is based on the Exchange servers' trusted certificates.

 

The satellite location's network connection uses wireless broadband, which can be satellite, 4G, or WiFi depending on which provider is best available. Messages queue up and in some cases get stuck; the error in the SMTP receive log is "Remote(ConnectionReset)". The last error on the sending relay server is:

LastError Status
--------- ------
450 4.4.318 Connection was closed abruptly (SuspiciousRemoteServerError) Retry

 

The network capture in Wireshark shows a lot of packet retransmissions, followed eventually by RST (reset) packets.

The outbound traffic from the satellite location towards the central relay seems to be just fine, which makes me wonder where to look for the real cause.

Because the WAN connection via satellite/4G/long-range WiFi is sensitive to data loss and inefficiency, I would expect both directions to be affected, but I mainly see issues inbound towards the satellite Exchange server.
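
One rough way to put numbers on that asymmetry is to tally retransmissions and resets per direction from the capture. The sketch below is only an illustration: it assumes the capture has already been exported to rows of (source IP, destination IP, retransmission flag, reset flag), for example from the Wireshark fields `tcp.analysis.retransmission` and `tcp.flags.reset`. The function names and the IP addresses are hypothetical.

```python
from collections import Counter


def tally_by_direction(rows, relay_ip):
    """Count packets, retransmissions and RSTs per traffic direction.

    rows: iterable of (src_ip, dst_ip, is_retransmission, is_reset) tuples,
          assumed to be exported from the capture beforehand.
    relay_ip: address of the central relay server (hypothetical value below).
    """
    stats = {"relay_to_satellite": Counter(), "satellite_to_relay": Counter()}
    for src, dst, is_retransmission, is_reset in rows:
        if src == relay_ip:
            key = "relay_to_satellite"
        elif dst == relay_ip:
            key = "satellite_to_relay"
        else:
            continue  # traffic that does not involve the relay is ignored
        stats[key]["packets"] += 1
        if is_retransmission:
            stats[key]["retransmissions"] += 1
        if is_reset:
            stats[key]["resets"] += 1
    return stats


def retransmission_rate(counter):
    """Fraction of packets in one direction that were retransmissions."""
    packets = counter["packets"]
    return counter["retransmissions"] / packets if packets else 0.0


# Made-up sample rows; 192.0.2.10 stands in for the relay server.
sample = [
    ("192.0.2.10", "198.51.100.20", False, False),
    ("192.0.2.10", "198.51.100.20", True, False),
    ("198.51.100.20", "192.0.2.10", False, False),
]
for direction, counts in tally_by_direction(sample, "192.0.2.10").items():
    print(direction, dict(counts), retransmission_rate(counts))
```

A clearly higher retransmission rate in the relay-to-satellite direction would point at the network path (loss, shaping, or capping on the downlink) rather than at the receiving Exchange server.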

 

Any suggestions related to TCP stack optimization (Set-NetTcpSetting profile tuning?), or which Exchange log to check to determine whether the receiving server or the network is the issue?

 

The intrusion detection (Check Point) has been bypassed for the Exchange-related traffic.

 

Thanks for any suggestions.

Rgds,

Eric

2 Replies

@env296 Out of curiosity, are you using port 25 on the WAN IP of the 4G devices? Or do you establish VPN tunnels first and then tunnel the SMTP traffic? I would think a VPN tunnel would help you bypass any port filters that might be imposed by the 4G cellular carriers.

Hi, sorry for the huge delay. Apparently I did not get a notification.
The vessels always set up a site-to-site VPN and all datacenter traffic goes through that tunnel.
The issue is mostly under control. The root cause was related to QoS capping of the inbound-to-vessel traffic; setting the correct QoS made the flow more stable. Capping/throttling by the provider of the WiFi/wimood/sat/4G connection can still happen, which is out of our control, and a big storm or other disruptions also cause trouble.

Our solution was to remove transfer limits for mail-related traffic on TCP 2525 for inter-server traffic. This traffic still uses TLS, which doesn't allow us to compress/optimize the data using Riverbed. The next step is to set up dedicated send/receive connectors without TLS for server-to-server communication across the VPN tunnel and optimize that traffic using Riverbed SteelHeads. Most importantly, reduce the maximum message size.
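
To get a feel for why message size matters so much on a capped link, here is a back-of-the-envelope sketch. All parameters are assumptions for illustration: `encoding_overhead` (set to 4/3 as if attachments were Base64-encoded on the wire, which may not hold between Exchange servers), `efficiency` (the share of the nominal link rate left after protocol overhead and retransmissions), and the example link rate. None of these are measured values from our environment.

```python
def transfer_seconds(message_bytes, link_kbps, encoding_overhead=4 / 3, efficiency=0.8):
    """Estimate how long one message occupies the link, in seconds.

    message_bytes: size of the message before transfer encoding.
    link_kbps: nominal (capped) link rate in kilobits per second.
    encoding_overhead: assumed growth factor on the wire (4/3 for Base64).
    efficiency: assumed usable fraction of the nominal link rate.
    """
    wire_bits = message_bytes * encoding_overhead * 8
    usable_bits_per_second = link_kbps * 1000 * efficiency
    return wire_bits / usable_bits_per_second


def max_message_bytes(budget_seconds, link_kbps, encoding_overhead=4 / 3, efficiency=0.8):
    """Largest message that fits within a time budget under the same assumptions."""
    usable_bits = budget_seconds * link_kbps * 1000 * efficiency
    return int(usable_bits / (8 * encoding_overhead))


# Example: a 25 MB message over a link capped at 512 kbit/s (made-up numbers).
seconds = transfer_seconds(25 * 1024 * 1024, 512)
print(f"about {seconds / 60:.1f} minutes on the wire")
```

Under these assumptions a single large message holds the connection open for many minutes, so any drop or reset during that window forces the whole transfer to start again. A smaller maximum message size shortens that window.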

Hope this helps someone.

Rgds
Eric