I stumbled across an Outlook performance issue that was really not easy to troubleshoot:
Some users who had been migrated to Exchange 2013 reported very poor performance in Outlook. Switching between e-mails or folders was just horrible. Sometimes they could not even connect to Exchange and got an error like this:
But not all users experienced this issue. I should also mention that we first saw this on Terminal Servers, where cached mode was not available.
You also need to know that this environment is geographically dispersed. There are several locations distributed across the globe. But the Exchange infrastructure is more centralized.
The weird part was that users in faraway locations did not seem to suffer from the issue. Even a user who had the issue locally had no problem when using a Terminal Server in a different, faraway location.
One of the first troubleshooting steps is to start Outlook without any add-ins. Note: The list of command-line switches can be found here.
This improved the performance, but it was still not acceptable. Next I tried Fiddler and captured some traces, but they didn't reveal any issue. Moving the mailbox back to Exchange 2010 solved the issue.
So what is the difference?
The main difference is the way Outlook connects to Exchange. While the mailbox is on Exchange 2010, Outlook uses RPC directly over TCP on the defined static ports (a best practice, and needed for load balancing; more info can be found here). In Exchange 2013 this is no longer possible. The default protocol is RPC/HTTP (Outlook Anywhere). With Exchange 2013 SP1 a new protocol was introduced: MAPI over HTTP.
In short: access moved from direct RPC over TCP to HTTP-based protocols.
As a sanity check we forced the Outlook client of a user whose mailbox was still on Exchange 2010 to use Outlook Anywhere. Sure enough, the user experienced the same issue.
This confirmed that there was an issue while using HTTP based protocols.
While I was working on this, PFE Marc J. asked me about a specific setting on the load balancer: whether Nagle's algorithm was enabled.
I checked our load balancer and indeed Nagle’s algorithm was enabled on the TCP profile. After the algorithm was disabled the issue was resolved.
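On the load balancer this is a setting on the TCP profile. To illustrate what "disabling Nagle" means at the socket level, here is a minimal Python sketch that uses the standard TCP_NODELAY socket option. It is only an illustration and not what was changed in this environment; the function name open_socket_without_nagle is made up for the example.

```python
import socket


def open_socket_without_nagle() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm turned off (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 tells the TCP stack to send small segments immediately
    # instead of holding them back until earlier data has been acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = open_socket_without_nagle()
# Read the option back; a non-zero value means Nagle is disabled on this socket.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

No connection is opened here; the sketch only sets the option and reads it back.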
The reason is that the HTTP-based protocols send much smaller packets than direct RPC over TCP does. Nagle's algorithm holds small packets back while earlier data is still unacknowledged, so the smaller the packets are, the more delays the user will experience.
The users in the remote locations were connected through the WAN, while the affected users were in the same DC, just one hop away. While the algorithm helps on slow links, it can cause issues on a LAN, especially for applications that expect a real-time response.
So what is this algorithm all about? In general it is meant to reduce network congestion. But because Outlook uses small packets in its HTTP-based requests, it almost kills your performance. Here are some links with more information:
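The sender-side rule behind this can be sketched as a toy model. This is a simplified illustration under stated assumptions, not a real TCP stack and not how the load balancer implements it: the class NagleSender, the MSS value of 1460 bytes, and the "one ACK acknowledges everything" shortcut are all assumptions made for the example.

```python
from dataclasses import dataclass, field

MSS = 1460  # assumed maximum segment size in bytes (typical for Ethernet)


@dataclass
class NagleSender:
    """Toy model of the sender-side rule (hypothetical, heavily simplified)."""
    nagle_enabled: bool = True
    unacked: int = 0  # bytes sent but not yet acknowledged
    buffered: bytearray = field(default_factory=bytearray)
    sent_segments: list = field(default_factory=list)

    def write(self, data: bytes) -> None:
        """The application hands data to the stack."""
        self.buffered += data
        self._try_send()

    def ack_received(self) -> None:
        """Simplification: a single ACK acknowledges all outstanding data."""
        self.unacked = 0
        self._try_send()

    def _try_send(self) -> None:
        while self.buffered:
            full_segment = len(self.buffered) >= MSS
            if self.nagle_enabled and not full_segment and self.unacked > 0:
                return  # small data waits until the outstanding data is ACKed
            segment = bytes(self.buffered[:MSS])
            del self.buffered[:MSS]
            self.unacked += len(segment)
            self.sent_segments.append(segment)


# Three small writes with Nagle enabled: only the first leaves immediately.
with_nagle = NagleSender(nagle_enabled=True)
for _ in range(3):
    with_nagle.write(b"x" * 100)
sent_before_ack = len(with_nagle.sent_segments)
with_nagle.ack_received()  # the two held-back writes now leave as one segment

# The same writes with Nagle disabled: each one is sent right away.
without_nagle = NagleSender(nagle_enabled=False)
for _ in range(3):
    without_nagle.write(b"x" * 100)
```

In this model the second and third small write sit in the buffer until the ACK for the first one arrives. If the receiver delays that ACK, an application that sends many small requests and waits for each response pays for the delay every time.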
In my case the load balancer was an F5. I checked the TCP profile used for client connectivity, which was a WAN-optimized profile, and unchecked the following box:
I also reviewed and unchecked the following ones:
I found some postings about tweaking the client TCP stack based on this KB article. I'm a little skeptical about this and wouldn't recommend it, as it could cause other issues.
I would rather recommend fixing such issues on the server or network device side and leaving the client defaults as they are. You never know what the next update will bring. Maybe those settings will be reset.
The curious part of this issue is that everything worked for all the users far away. Only those connecting from within the same DC suffered from it, because they had really low latency.
Lessons learned! I hope this will help someone.