Exchange performance: HP NUMA BIOS settings

I was just made aware of an HP-specific setting that has a huge impact on performance. But before you start to panic, have a look at the conditions:

  • you’re running ProLiant Gen9 servers
  • these servers are equipped with Intel Xeon E5-2600 v3 or later processors
  • you have the default setting for NUMA Group Size Optimization

If you don’t match these conditions, you can stop reading and relax. If you do, you might want to continue reading.

Update 01.08.2017

Another PFE made me aware that the script HealthChecker.ps1 checks this setting by comparing the values EnvProcessorCount and NumberOfLogicalProcessors.

NUMA_15.png

Good example

NUMA_16.png

Bad example
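The comparison from the update can be sketched roughly as follows. This is not the actual HealthChecker.ps1 code; the function name Test-ProcessorGroupSplit is made up for illustration, and the sample values assume the 2 x 12 core server described further down with Hyper-Threading disabled.

```powershell
# Hypothetical helper, not part of HealthChecker.ps1.
# Returns $true when a process sees fewer logical processors than the
# hardware reports, which suggests the CPUs are split into several groups.
function Test-ProcessorGroupSplit {
    param(
        [Parameter(Mandatory)] [int]$EnvProcessorCount,
        [Parameter(Mandatory)] [int]$NumberOfLogicalProcessors
    )
    return $EnvProcessorCount -lt $NumberOfLogicalProcessors
}

# Assumed sample values for a 2 x 12 core server:
Test-ProcessorGroupSplit -EnvProcessorCount 12 -NumberOfLogicalProcessors 24   # Clustered
Test-ProcessorGroupSplit -EnvProcessorCount 24 -NumberOfLogicalProcessors 24   # Flat

# On a real server the two inputs could be collected like this:
#   $env     = [System.Environment]::ProcessorCount
#   $logical = (Get-CimInstance -ClassName Win32_Processor |
#               Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
#   Test-ProcessorGroupSplit -EnvProcessorCount $env -NumberOfLogicalProcessors $logical
```

A result of True would correspond to the bad example, False to the good one.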

Symptoms

The symptoms are not really obvious. When you check the running processes, you will notice that most of them utilize only half or fewer of the logical processors.

This means that only part of your CPU resources is used at all, which affects the overall performance of applications.

Resolution

HP already published an advisory about this issue in 2015:

http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7271227&docId=emr_na-c04650594&docLocale=en_US

NUMA_14.png

The default setting is Clustered, which causes Windows to create one processor group for each physical processor.

In my case these are ProLiant Gen9 servers with 2 physical CPUs, 12 cores each. This means that in my scenario we have 2 processor groups. That in itself is not an issue, but by default an application is limited to a single group and therefore sees and uses ONLY one group. An application needs additional code in order to support multiple groups.
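To make the arithmetic concrete, here is a small model of that scenario. It does not query the hardware; the function name Get-ProcessorGroupSketch and its parameters are hypothetical, and the default of 12 logical processors per socket assumes Hyper-Threading is disabled.

```powershell
# Hypothetical model of the 2-socket example; it does not read any BIOS or OS state.
function Get-ProcessorGroupSketch {
    param(
        [int]$Sockets = 2,
        [int]$LogicalProcessorsPerSocket = 12,
        [ValidateSet('Clustered', 'Flat')] [string]$Mode = 'Clustered'
    )
    $total = $Sockets * $LogicalProcessorsPerSocket
    # Clustered: one group per physical processor. Flat: a single group
    # (assumes the total fits into the 64 logical processors one group can hold).
    $groups = if ($Mode -eq 'Clustered') { $Sockets } else { 1 }
    [pscustomobject]@{
        Mode                 = $Mode
        ProcessorGroups      = $groups
        TotalLogical         = $total
        VisibleToOneGroupApp = $total / $groups
    }
}

Get-ProcessorGroupSketch -Mode Clustered
Get-ProcessorGroupSketch -Mode Flat
```

Under these assumptions an application that is limited to one group sees 12 of the 24 logical processors with Clustered and all 24 with Flat.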

What is the difference?

You can see the difference by changing the graph for the CPU utilization in the Task Manager.

The following shows the graphs when NUMA Group Size Optimization is set to Clustered:

NUMA_01.png

Red is processor group 0 and green is processor group 1. Obviously the first processor group is busier.

NUMA_02.png

Same as above with graph set to NUMA nodes

And here when set to Flat:

NUMA_03.png

Same view as above with NUMA Group Size Optimization set to Flat. You can see the more even utilization of the cores.

NUMA_04.png

Same as above with graph set to NUMA nodes

How and what to check

There are several ways to check whether you’re affected or not. Here are some examples:

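  • [System.Environment]::ProcessorCount

On the .NET Framework this property only counts the logical processors in the processor group of the current process, so it returns a lower number than expected when the CPUs are split into several groups:

[System.Environment]::ProcessorCount

NUMA_05.png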

System with NUMA Group Size Optimization set to Clustered

NUMA_06.png

System with NUMA Group Size Optimization set to Flat

  • Task Manager

You can try to set the affinity of a process with Set affinity in Task Manager.

NUMA_07.png

A system with NUMA Group Size Optimization set to Clustered gives you an error when you try to change the affinity of an IIS worker process

NUMA_08.png

A system with NUMA Group Size Optimization set to Flat lets you choose

  • PowerShell Get-Process

When you group the processes by ProcessorAffinity, processes with a value of 0 are bound to processor group 0:

Get-Process | Where-Object {$null -ne $_.ProcessorAffinity} | Group-Object ProcessorAffinity
NUMA_09.png

16 processes running with ProcessorAffinity 0, which means only on processor group 0

NUMA_10.png

None of the processes are bound to a processor group, as both CPUs are in one group

You can identify the maximum ProcessorAffinity by the following command (credit goes to BarryCWT and his script):

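# Sum the logical processors over all physical CPUs
[int]$ProcCount = 0
Get-CimInstance -ClassName Win32_Processor | ForEach-Object { $ProcCount += $_.NumberOfLogicalProcessors }
# A mask with one bit set per logical processor: 2^n - 1
$MaxProcAffinity = [math]::Pow(2, $ProcCount) - 1
$MaxProcAffinity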

In my example there are 2 CPUs with 12 cores each, so 24 logical processors. This results in a value of 2^24 - 1 = 16777215 for all processors.

NUMA_11.png

MaxAffinity calculation
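The affinity value is a bit mask with one bit per logical processor, so it can also be built and decoded with integer shifts. The following is only a sketch; Get-AffinityMask and ConvertFrom-AffinityMask are hypothetical helper names, and the range is limited to 62 processors to stay safely inside a signed 64-bit integer.

```powershell
# Hypothetical helpers for illustration only.

# Build the mask for the first n logical processors: 2^n - 1.
function Get-AffinityMask {
    param([ValidateRange(1, 62)] [int]$LogicalProcessorCount)
    # Shifting keeps the calculation in integer arithmetic
    # instead of the floating point result of [math]::Pow.
    return ([long]1 -shl $LogicalProcessorCount) - 1
}

# List the zero-based logical processor numbers whose bit is set in a mask.
function ConvertFrom-AffinityMask {
    param([long]$Mask)
    0..63 | Where-Object { ($Mask -band ([long]1 -shl $_)) -ne 0 }
}

Get-AffinityMask -LogicalProcessorCount 24                       # 16777215
(ConvertFrom-AffinityMask -Mask 16777215).Count                  # 24
(ConvertFrom-AffinityMask -Mask 5) -join ','                     # 0,2
```

Decoding a mask this way shows which logical processors a process is allowed to run on within its group.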

To identify the processes that would benefit from changing the setting, you can query for all processes with a ProcessorAffinity of 0:

Get-Process | Where-Object {$_.ProcessorAffinity -eq 0}

NUMA_12.png

As you can see, these are all IIS worker processes and, especially interesting, noderunner. Why interesting? When you look at the CommandLine of this process, you will see that it is the one responsible for the Exchange search index, which is very CPU hungry anyway:

Get-CimInstance -ClassName Win32_Process -Filter "ProcessId=100396" | Format-List Path,CommandLine

NUMA_13.png

Conclusion

There is an easy fix available that solves the issue described above. I’m only wondering why this fact isn’t as well known as it should be. On the one hand, it’s really not a good idea on HP’s part to fool the OS by default; on the other hand, I’ve found some evidence that Microsoft was aware of this issue, at least for the CLR:

https://github.com/aspnet/KestrelHttpServer/issues/650

https://stackoverflow.com/questions/12445175/how-many-processor-cores-does-the-net-task-scheduler-support

Either way, you as the end user are affected, and this doesn’t affect only Exchange. We have seen the same behaviour on other systems not running Exchange, and changing the setting gave us a performance boost there as well.

Looking back across the last 2 years, I’m convinced that a few support calls and CritSits could have been avoided.

I hope this information helps you to avoid unnecessary support calls and gives you a smooth upgrade to the latest Exchange version, or just more performance for any other application.

2 thoughts on “Exchange performance: HP NUMA BIOS settings”

    • That’s true. For Exchange folks it’s common sense to disable HT as well as to set the Power Management profile to High Performance. As for the CPU, I wouldn’t go for more than 24 cores in total with HT disabled. Rather scale out if the resources can’t handle the load.
