Exchange performance: HP NUMA BIOS settings

I was recently made aware of an HP-specific BIOS setting that has a huge impact on performance. Before you start to panic, check whether these conditions apply to you:

  • you’re running ProLiant Gen9 servers
  • these servers are equipped with Intel Xeon E5-2600 v3 or newer processors
  • you have the default setting for NUMA Group Size Optimization

If you don’t match these conditions, you can stop reading and relax. If you do, you might want to continue reading.

Update 28.08.2017

As mentioned, this does not only affect Exchange. Credit goes to Nicholas, who pointed out the following KB for Lync/SfB:

Bug Check 0x133 DPC_WATCHDOG_VIOLATION error on Lync/Skype for Business Edge server

Update 01.08.2017

Another PFE made me aware that the script HealthChecker.ps1 checks this setting by comparing the values EnvProcessorCount and NumberOfLogicalProcessors.

NUMA_15.png

Good example

NUMA_16.png

Bad example

Symptoms

The symptoms are not really obvious. When you check the running processes, you will notice that most of them use only half or fewer of the logical processors.

This means that only part of your CPU resources is used at all, which affects the overall performance of applications.

Resolution

HP already published an advisory about this issue in 2015:

http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7271227&docId=emr_na-c04650594&docLocale=en_US

NUMA_14.png

The default setting is Clustered, which causes Windows to create one processor group for each physical processor.

In my case these are ProLiant Gen9 servers with 2 physical CPUs, 12 cores each. This means that in my scenario we have 2 processor groups. That is not an issue in itself, but by default an application is limited to a single group and therefore sees and uses ONLY one group. An application needs additional code in order to support multiple groups.

What is the difference?

You can see the difference by changing the graph for the CPU utilization in the Task Manager.

The following shows the graph when NUMA Group Size Optimization is set to Clustered:

NUMA_01.png

Red is processor group 0 and green is processor group 1. The first processor group is clearly busier.

NUMA_02.png

Same as above with graph set to NUMA nodes

And here when set to Flat:

NUMA_03.png

Same view as above with NUMA Group Size Optimization set to Flat. You can see the more even utilization of the cores.

NUMA_04.png

Same as above with graph set to NUMA nodes

How and what to check

There are several ways to check whether you’re affected or not. Here are some examples:

[System.Environment]::ProcessorCount
NUMA_05.png

System with NUMA Group Size Optimization set to Clustered

NUMA_06.png

System with NUMA Group Size Optimization set to Flat
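The comparison that HealthChecker.ps1 is described as making can be sketched as a small helper. This is only an illustration under assumptions: the function name `Test-ProcessorGroupLimit` is hypothetical and not part of HealthChecker.ps1, and the sample values 12 and 24 are what the 2 x 12 core servers in this post would be expected to report.

```powershell
# Hypothetical helper, not taken from HealthChecker.ps1.
function Test-ProcessorGroupLimit {
    param(
        [int]$EnvProcessorCount,          # logical processors visible to a single process
        [int]$NumberOfLogicalProcessors   # logical processors reported for all CPUs together
    )
    # If a process sees fewer logical processors than the hardware reports,
    # the processors are most likely split across several processor groups.
    return ($EnvProcessorCount -lt $NumberOfLogicalProcessors)
}

# Assumed values for a server with 2 CPUs and 12 cores each.
$clustered = Test-ProcessorGroupLimit -EnvProcessorCount 12 -NumberOfLogicalProcessors 24
$flat      = Test-ProcessorGroupLimit -EnvProcessorCount 24 -NumberOfLogicalProcessors 24

"Clustered: limited = $clustered"
"Flat:      limited = $flat"
```

On a live system you could pass `[System.Environment]::ProcessorCount` as the first value and the sum of `NumberOfLogicalProcessors` over all `Win32_Processor` instances as the second.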

  • Task Manager

You can try to set the affinity of a process in Task Manager.

NUMA_07.png

A system with NUMA Group Size Optimization set to Clustered gives you an error when you try to change the affinity of an IIS worker process

NUMA_08.png

A system with NUMA Group Size Optimization set to Flat lets you choose

  • PowerShell Get-Process

When you group by ProcessorAffinity and find processes with a value of 0, these processes are bound to processor group 0:

Get-Process | Where-Object {$_.ProcessorAffinity -ne $null} | group ProcessorAffinity
NUMA_09.png

16 processes running with ProcessorAffinity 0, which means only on processor group 0

NUMA_10.png

None of the processes is bound to a processor group, as both CPUs are in one group

You can identify the maximum ProcessorAffinity by the following command (credit goes to BarryCWT and his script):

[int]$ProcCount = 0
Get-CimInstance -ClassName Win32_Processor | foreach { $ProcCount += $_.NumberOfLogicalProcessors}
$MaxProcAffinity = ([math]::pow(2,$ProcCount) - 1)
$MaxProcAffinity

In our example I have 2 CPUs with 12 cores each, so 24 logical processors. This gives a value of 2^24 - 1 = 16777215 for all processors.

NUMA_11.png

MaxAffinity calculation
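To make the arithmetic behind this value easier to follow, here is a minimal sketch that builds the mask with a bit shift and prints it in binary. The function name `Get-FullAffinityMask` is hypothetical, and the range limit of 62 is an assumption of this sketch that keeps the result inside a signed 64-bit integer.

```powershell
# Hypothetical helper: one bit per logical processor, all bits set.
function Get-FullAffinityMask {
    param(
        [ValidateRange(1, 62)]
        [int]$LogicalProcessorCount
    )
    # Shift instead of [math]::Pow so the result stays an integer.
    return (([long]1 -shl $LogicalProcessorCount) - 1)
}

$mask = Get-FullAffinityMask -LogicalProcessorCount 24
$bits = [Convert]::ToString($mask, 2)

$mask           # 16777215
$bits.Length    # 24, one bit for each logical processor
```

Every bit in the binary form stands for one logical processor, which is why a process that may run on all 24 logical processors shows this value as its ProcessorAffinity.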

To identify the processes that would benefit from changing the setting, you can query for all processes with a ProcessorAffinity of 0:

Get-Process | Where-Object {$_.ProcessorAffinity -eq 0}

NUMA_12.png

As you can see, these are all IIS worker processes and, especially interesting, noderunner. Why interesting? When you look at the CommandLine of this process, you will see that it is the one responsible for the Exchange index, and it is very CPU hungry anyway.

Get-CimInstance -ClassName Win32_Process -Filter "ProcessId='100396'" | fl Path,CommandLine

NUMA_13.png

Conclusion

There is an easy fix available, which solves the issue described above. I’m only wondering why this fact isn’t as widely known as it should be. On the one hand, it’s really not a good idea for HP to mislead the OS by default. On the other hand, I’ve found some evidence that Microsoft was aware of this issue, at least for the CLR:

https://github.com/aspnet/KestrelHttpServer/issues/650

https://stackoverflow.com/questions/12445175/how-many-processor-cores-does-the-net-task-scheduler-support

Either way, you as the end user are affected, and this doesn’t only affect Exchange. We have seen the same behaviour on other systems not running Exchange, and changing the setting gave us a performance boost there as well.

Looking back across the last 2 years, I’m convinced that a few support calls and CritSits could have been avoided.

I hope this information helps you to avoid unnecessary support calls and to have a smooth upgrade to the latest Exchange version, or just to get more performance for any other application.

11 thoughts on “Exchange performance: HP NUMA BIOS settings”

    • That’s true. For Exchange folks it’s common sense to disable HT and to set the Power Management profile to High Performance. Regarding the CPU, I wouldn’t go for more than 24 cores in total with HT disabled. Rather scale out if the resources can’t handle the load.



  2. You should mention that there is a reason that the CPUs are divided into NUMA nodes. The reason is that each CPU has its own local memory, and it is very costly to retrieve memory from another CPU’s memory banks. That is why things like SQL and Exchange are NUMA aware. So even though it seems like a good idea, it might not be, since your data processing will be much slower.


  3. Turns out that Exchange was a bad example, as Exchange 2013 and 2016 are NOT NUMA aware. I have not been able to find information regarding Exchange 2019. But that does not remove the memory latency problem when traversing NUMA groups to access “remote” memory.

    SQL however is NUMA aware as stated.


    • Hi Brian,
      I didn’t go into these details, as I have linked to the Docs from Microsoft for the explanation. You’re also correct that Exchange is not NUMA aware, which I mentioned as well. It would need some code changes, which SQL already has.
      I agree that this is not perfect, but it solves a huge performance impact, and that’s the reason for this post. I was also approached by HP, and AFAIK the default setting was changed in the newer versions.
      Ciao,
      Ingo

