0x51 LDAP_SERVER_DOWN

Recently I ran into a problem where Exchange returned the following error:

“An Active Directory error 0x51 occurred when trying to check the suitability of Server…”


The weird thing is that this did not happen for every command. It seemed random, but it caused several issues:

  • prompts for credentials (the ugliest side effect!)
  • errors in scripts
  • cmdlets that didn’t return all values
  • …..


When I first saw this error I had a déjà vu. Last year I had a very long-running case with Microsoft about very similar errors, but back then it was Exchange 2010 on Windows Server 2008 R2 that was affected. After several gigabytes of network and LDAP traces, it turned out to be an ICMP issue at the OS level:

The LDAP check uses ICMP to evaluate whether the server is up or down, and a bug in the ICMP stack resulted in a 0x51 LDAP error even though the server was up and healthy. Read this KB for more information.

But now I started seeing the same error with Exchange 2013 CU10 on Windows Server 2012 R2.

How bad is it?

First I needed to know whether this happened on only a few servers or on all of them, so I had to crawl the event logs across all Exchange servers for EventID 2070. To speed things up I wrote the following function:

function Collect-Events {
    param(
        # Target computer; accepts pipeline input by value or by the 'fqdn' property.
        [Parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Position=0)]
        [Alias('fqdn')]
        [string]$ComputerName = $env:COMPUTERNAME,
        [Parameter(Mandatory=$false, Position=1)]
        [string]$EventID = '2070',
        [Parameter(Mandatory=$false, Position=2)]
        [string]$EventLog = 'Application',
        [Parameter(Mandatory=$false, Position=3)]
        [DateTime]$StartTime = (Get-Date).AddHours(-12),
        [Parameter(Mandatory=$false, Position=4)]
        [ValidateSet('Critical','Error','Warning','Information','Verbose')]
        [string]$Severity
    )
    process {
        Write-Host "Processing $ComputerName....."
        # Build the filter once; add Level only when a severity was requested.
        $filter = @{ LogName = $EventLog; Id = $EventID; StartTime = $StartTime }
        if ($Severity) {
            # Map the severity name to the numeric event level.
            $levels = @{ Critical = 1; Error = 2; Warning = 3; Information = 4; Verbose = 5 }
            $filter.Level = $levels[$Severity]
        }
        # Errors (for example "no events found") are suppressed on purpose.
        Get-WinEvent -ComputerName $ComputerName -FilterHashtable $filter -ErrorAction SilentlyContinue
    }
}

This function already has the correct event log (Application) and EventID (2070) predefined, so you can easily search across your Exchange servers. The following example searches for EventID 2070 within the last 2 hours:

$2070 = Get-ExchangeServer | Collect-Events -StartTime (Get-Date).addhours(-2)

It turned out to be a general issue and not one limited to a few servers. Feel free to use this function to search for different events.
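To judge how widespread the events are, it can help to count them per server. The following is only a minimal sketch: Get-EventCountByServer is a hypothetical helper name, and it assumes each collected object has a MachineName property, as the records returned by Get-WinEvent do.

```powershell
# Hypothetical helper: count collected events per server, busiest first.
# Assumes every input object exposes a MachineName property.
function Get-EventCountByServer {
    param(
        [object[]]$Events = @()
    )
    $Events |
        Group-Object -Property MachineName |
        Sort-Object -Property Count -Descending |
        Select-Object -Property @{ Name = 'Server'; Expression = { $_.Name } }, Count
}

# Usage with the variable from the example above:
# Get-EventCountByServer -Events $2070
```

If only one or two servers show up in the result, the problem is probably local to them; a long list points to a general issue.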

Root cause

After turning on logging for MSExchange ADAccess I could see that the servers were heavily using out-of-site DCs and GCs. Shortly afterwards I found the following KB, which explained a lot:

https://support.microsoft.com/kb/3088777

Before you start to panic: this is only an issue in larger environments with multiple AD sites! Smaller ones shouldn’t be affected. Just to give you an idea: in my case we have over 280 AD sites across the globe, and not all of them have the necessary network bandwidth and latency. In general that is okay, because in our scenario Exchange shouldn’t contact most of them.

How to fix?

To fix this issue and change the behavior, you just have to follow the KB article: edit the file Microsoft.Exchange.Directory.TopologyService.exe.config and restart the MSExchangeADTopology service.
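Before touching the config file it is worth keeping a copy. The sketch below is not part of the KB article: Backup-ConfigFile is a hypothetical helper, and the location of the file under the Bin folder of $env:ExchangeInstallPath is an assumption you should verify on your own server. The settings to change are described in the KB and are deliberately not reproduced here.

```powershell
# Hypothetical helper: copy a config file to a timestamped backup next to it
# and return the path of the backup.
function Backup-ConfigFile {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        throw "Config file not found: $Path"
    }
    $stamp  = Get-Date -Format 'yyyyMMdd-HHmmss'
    $backup = "$Path.$stamp.bak"
    Copy-Item -LiteralPath $Path -Destination $backup
    return $backup
}

# Usage on an Exchange server (the path is an assumption, check it first):
# $config = Join-Path $env:ExchangeInstallPath 'Bin\Microsoft.Exchange.Directory.TopologyService.exe.config'
# Backup-ConfigFile -Path $config
# ...edit the file as described in the KB article...
# Restart-Service -Name MSExchangeADTopology -Force
```

The -Force switch is needed because other services depend on MSExchangeADTopology; without it Restart-Service refuses to stop the service.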

I have to admit that just restarting this service sounds easy, but in the end you have to reboot the server: not all dependent services can always be restarted gracefully along with the MSExchangeADTopology service.
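To see which dependent services did not come back after the restart, a small check like the following may help. It is a sketch only: Select-StoppedService is a hypothetical helper name, and a service that is stopped or disabled by design will show up in the result as well.

```powershell
# Hypothetical helper: return the services that are not in the Running state.
# Works on any objects that have Name and Status properties.
function Select-StoppedService {
    param(
        [object[]]$Services = @()
    )
    $Services |
        Where-Object { "$($_.Status)" -ne 'Running' } |
        Select-Object -Property Name, Status
}

# Usage on an Exchange server (Windows only):
# $dependents = Get-Service -Name MSExchangeADTopology -DependentServices
# Select-StoppedService -Services $dependents
```

If this list is not empty after a restart and the services cannot be started manually, a reboot is probably the quicker way out.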

Conclusion

This change was made in CU6. From Microsoft’s perspective I understand why the change was made. Cloud-wise it makes sense, but for larger on-premises installations it can really cause issues.

I hope this helps someone!
