Poor Outlook performance and Nagle’s algorithm

I stumbled across a performance issue for Outlook, which was really not easy to troubleshoot:

Some users who had been migrated to Exchange 2013 reported very poor performance in Outlook. Switching between e-mails or folders was just horrible. Sometimes they couldn’t even connect to Exchange and got an error like this:

 

OL_Performance_01

But not all users experienced this issue. I should also mention that we first saw this on Terminal Servers, where cached mode was not available.

You also need to know that this environment is geographically dispersed. There are several locations distributed across the globe. But the Exchange infrastructure is more centralized.

The weird part was that users in faraway locations did not seem to suffer from the issue. Even a user who had the issue had no problem when using a Terminal Server in a different, faraway location.

Troubleshooting

One of the first troubleshooting steps is to start Outlook without any add-ins. Note: The list of command-line switches can be found here.

This improved the performance, but it was still not acceptable. Next I tried Fiddler and created some traces, but this didn’t reveal any issue. Moving the mailbox back to Exchange 2010 solved the issue.

So what is the difference?

The main difference is the way Outlook connects to Exchange. While the mailbox is on Exchange 2010, Outlook uses RPC directly over TCP on the defined static ports (a best practice, and needed for load balancing; more info can be found here). In Exchange 2013 this is no longer possible. The default protocol is RPC over HTTP (Outlook Anywhere). Exchange 2013 SP1 introduced a new protocol: MAPI over HTTP.

In short: a move from TCP-based to HTTP-based access.

Sanity check

As a sanity check, we forced the Outlook client of a user whose mailbox was still on Exchange 2010 to use Outlook Anywhere. Sure enough, the user experienced the same issue.

This confirmed that the issue occurred when using HTTP-based protocols.

While I was working on this, PFE Marc J. asked me about a specific setting on the load balancer: whether Nagle’s algorithm was enabled.

I checked our load balancer, and indeed Nagle’s algorithm was enabled on the TCP profile. After the algorithm was disabled, the issue was resolved.
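For readers who have never touched this switch: at the socket level, disabling Nagle’s algorithm corresponds to the TCP_NODELAY option. The following is only a minimal sketch to illustrate what "disabled" means for a single TCP socket. It is not the F5 configuration and not something I changed on the clients; the helper name make_socket_without_nagle is made up for this example.

```python
import socket


def make_socket_without_nagle() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm turned off (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 tells the TCP stack to send small segments immediately
    # instead of holding them back until earlier data has been acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_socket_without_nagle()
# Read the option back; a non-zero value means Nagle is disabled on this socket.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("Nagle disabled:", nodelay != 0)
sock.close()
```

A load balancer terminates and opens TCP connections itself, so its TCP profile decides this behaviour for its own side of each connection.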

This is because the HTTP-based protocols send much smaller packets than RPC over TCP. Nagle’s algorithm holds small packets back, so the smaller the packets are, the more delay the user will experience.

The users in the remote locations were connected over the WAN, while the affected users were in the same DC, just one hop away. The algorithm helps on slow links, but it can cause issues on a LAN, especially for applications that expect a real-time response.
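To make the "holding back" more concrete, here is a simplified sketch of the send decision Nagle’s algorithm makes, as I understand it from the usual description: a full-sized segment is always sent, while a small one is only sent when no previously sent data is still waiting for an acknowledgement. The function name and parameters are my own, and the model ignores everything else a real TCP stack considers (window sizes, timers and so on).

```python
def nagle_allows_send(queued_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Return True if a sender using Nagle's algorithm may transmit now.

    queued_bytes:  data the application has written but TCP has not sent yet
    mss:           maximum segment size of the connection
    unacked_bytes: data already sent but not yet acknowledged by the peer
    """
    if queued_bytes <= 0:
        return False  # nothing to send
    if queued_bytes >= mss:
        return True  # a full-sized segment is always sent right away
    # A small segment only goes out when nothing is awaiting acknowledgement;
    # otherwise it is buffered until the outstanding data has been ACKed.
    return unacked_bytes == 0


# A small write while earlier data is still unacknowledged is held back.
print(nagle_allows_send(queued_bytes=200, mss=1460, unacked_bytes=300))   # False
# The same small write on an idle connection is sent immediately.
print(nagle_allows_send(queued_bytes=200, mss=1460, unacked_bytes=0))     # True
# Full-sized segments are not delayed.
print(nagle_allows_send(queued_bytes=1460, mss=1460, unacked_bytes=300))  # True
```

The first case is the one that hurts a chatty protocol with many small requests: each of them may have to wait for an acknowledgement before it leaves the sender.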

Nagle’s algorithm

What is this algorithm all about? In general it is meant to reduce network congestion. But because Outlook sends small packets in its HTTP-based requests, it almost kills your performance. Here are some links with more information:

Resolution

In my case the load balancer was an F5. I checked the TCP profile used for client connectivity, which was a WAN-optimized profile, and unchecked the following box:

 

OL_Performance_02

I also reviewed and unchecked the following ones:

OL_Performance_03 OL_Performance_04

Conclusion

I’ve found some postings about tweaking the TCP stack based upon this KB article. I’m a little bit skeptical about this and wouldn’t recommend it as this could cause other issues.

I would rather recommend fixing such issues on the server or network device side and leaving the client defaults as they are. You never know what the next update will bring. Maybe those settings will be reset.

The interesting part of this issue is that everything was working for all users far away. Only the ones connecting from within the same DC suffered from the issue, as they had really low latency.

Lessons learned! I hope this will help someone.

7 thoughts on “Poor Outlook performance and Nagle’s algorithm”

  1. Hi Ingo, can you share your thoughts on why the users from far remote locations were not facing the error, as their Outlook connections would run through the LBs as well? Maybe the smaller bandwidth avoids network congestion and Nagle’s algorithm is not triggered? Thanks, Robert

  3. Disabling the Nagle’s algorithm is nothing new. F5 already had this recommendation in their F5 guide for Exchange 2010 back in Nov 2009 when E2010 became GA. While the guide was full of mistakes, they tackled this setting with the proper value.

    Agree on the comment to enable Slow Start as this gives some time for a server to warm up after a restart. e.g. 300 seconds is a good value.
