I was just made aware of an HP-specific setting that has a huge impact on performance. But before you start to panic, have a look at the conditions:

you’re running HP ProLiant Gen9 servers
these servers are equipped with Intel Xeon E5-2600 v3 or later processors
you have the default setting for NUMA Group Size Optimization

If you don’t match these conditions, you can stop reading and relax. If you do, you might want to continue reading.

Update 28.08.2017

As mentioned before, this affects not only Exchange. Credit goes to Nicholas, who highlighted the following KB for Lync/SfB:

Bug Check 0x133 DPC_WATCHDOG_VIOLATION error on Lync/Skype for Business Edge server

Update 01.08.2017

Another PFE made me aware that the script HealthChecker.ps1 checks this setting by comparing the values EnvProcessorCount and NumberOfLogicalProcessors.

Good example

Bad example

Symptoms

The symptoms are not really obvious. When you check the running processes, you will notice that most of them use only half or fewer of the logical processors.

This means only part of your CPU resources is used at all, which affects the overall performance of applications.

Resolution

HP already published an advisory about this issue in 2015:

http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7271227&docId=emr_na-c04650594&docLocale=en_US

The default setting is Clustered, which basically means that Windows creates one processor group for each physical processor.

In my case these are ProLiant Gen9 servers with 2 physical CPUs, 12 cores each. This means that in my scenario we have 2 processor groups. This is not an issue in itself, but by default an application is limited to a single group and therefore will see and use ONLY one group. Code needs to be added to an application in order to support multiple groups.
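The arithmetic behind this can be sketched in a few lines of PowerShell. This is only a simplified model, not how Windows actually builds its groups: the function name Get-ProcessorGroupLayout and its parameters are made up for illustration, and the Flat branch simply assumes that processors are packed into groups of up to 64 logical processors.

```powershell
# Hypothetical helper (not part of Windows or any HP tool): models how many
# logical processors end up in each processor group for a given BIOS mode.
function Get-ProcessorGroupLayout {
    param(
        [int]$Sockets,
        [int]$LogicalProcessorsPerSocket,
        [ValidateSet('Clustered', 'Flat')]
        [string]$Mode
    )

    $maxPerGroup = 64   # a processor group holds at most 64 logical processors

    if ($Mode -eq 'Clustered') {
        # Clustered: one processor group per physical processor
        return ,@(1..$Sockets | ForEach-Object { $LogicalProcessorsPerSocket })
    }

    # Flat (simplified assumption): fill groups up to the 64-processor limit
    $remaining = $Sockets * $LogicalProcessorsPerSocket
    $groups = @()
    while ($remaining -gt 0) {
        $size = [math]::Min($remaining, $maxPerGroup)
        $groups += $size
        $remaining -= $size
    }
    return ,$groups
}

# The scenario from this post: 2 CPUs with 12 cores each
$clustered = Get-ProcessorGroupLayout -Sockets 2 -LogicalProcessorsPerSocket 12 -Mode Clustered
$flat      = Get-ProcessorGroupLayout -Sockets 2 -LogicalProcessorsPerSocket 12 -Mode Flat

# An application limited to a single group only sees the first entry
"Clustered: $($clustered.Count) groups, single-group application sees $($clustered[0]) processors"
"Flat:      $($flat.Count) group(s), single-group application sees $($flat[0]) processors"
```

In this model a single-group application sees 12 processors with Clustered and all 24 with Flat, which matches the behaviour described above.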

What is the difference?

You can see the difference by changing the graph for the CPU utilization in the Task Manager.

The following shows when NUMA group Optimization is set to Clustered:

Red is processor group 0 and green is processor group 1. Obviously, the first processor group is busier.

Same as above with graph set to NUMA nodes

And here when set to Flat:

Same view as above with NUMA group Optimization set to FLAT. You can see the more even utilization of cores.

Same as above with graph set to NUMA nodes

How and what to check

There are several ways to check whether you’re affected or not. Here are some examples:

Query System.Environment.ProcessorCount

[System.Environment]::ProcessorCount

System with NUMA group Optimization set to Clustered

System with NUMA group Optimization set to FLAT
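If you want to compare both values in one go, something like the following might do. It is a minimal sketch of the idea, not the code used by HealthChecker.ps1; the function name Test-ProcessorGroupLimited is made up for illustration, and the live check only runs on Windows because it relies on the Win32_Processor class.

```powershell
# Hypothetical helper: returns $true when a process sees fewer logical
# processors than are installed in the machine.
function Test-ProcessorGroupLimited {
    param(
        [int]$EnvProcessorCount,
        [int]$NumberOfLogicalProcessors
    )
    return ($EnvProcessorCount -lt $NumberOfLogicalProcessors)
}

# Live check; guarded because Win32_Processor only exists on Windows
if ($env:OS -eq 'Windows_NT') {
    # Sum the logical processors over all physical CPUs
    $installed = (Get-CimInstance -ClassName Win32_Processor |
        Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum

    # What the current PowerShell process can actually see
    $visible = [System.Environment]::ProcessorCount

    if (Test-ProcessorGroupLimited -EnvProcessorCount $visible -NumberOfLogicalProcessors $installed) {
        Write-Warning "Only $visible of $installed logical processors are visible to this process."
    }
    else {
        "All $installed logical processors are visible to this process."
    }
}
```

On the server from this post, a warning with 12 of 24 would point to Clustered, while equal values would point to Flat.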

Task Manager

You can try to Set affinity with the Task Manager

System with NUMA group Optimization set to Clustered gives you an error when you try to change affinity on an IIS worker process

System with NUMA group Optimization set to FLAT lets you choose

PowerShell Get-Process

When you group by ProcessorAffinity and see processes with a value of 0, this means these processes are bound to processor group 0:

Get-Process | Where-Object {$_.ProcessorAffinity -ne $null} | group ProcessorAffinity

16 processes running with ProcessorAffinity 0, which means only on processor group 0

none of the processes are bound to a processor group as both CPUs are in one group

You can identify the maximum ProcessorAffinity by the following command (credit goes to BarryCWT and his script):

[int]$ProcCount = 0
Get-CimInstance -ClassName Win32_Processor | foreach { $ProcCount += $_.NumberOfLogicalProcessors}
$MaxProcAffinity = ([math]::pow(2,$ProcCount) - 1)
$MaxProcAffinity

In my example there are 2 CPUs with 12 cores each, so 24 logical processors. This means a value of 16777215 (2^24 - 1) for All Processors.

MaxAffinity calculation
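The value is simply a bitmask with one bit per logical processor. The following sketch shows the same calculation with a bit shift and the reverse direction, from mask to processor numbers. Both function names are made up for illustration, and the range is limited to 62 bits just to stay clear of the sign bit of a 64-bit integer.

```powershell
# Hypothetical helper: mask with the lowest n bits set, i.e. 2^n - 1
function Get-MaxAffinityMask {
    param(
        [ValidateRange(1, 62)]
        [int]$LogicalProcessorCount
    )
    return ([long]1 -shl $LogicalProcessorCount) - 1
}

# Hypothetical helper: lists the zero-based processor numbers set in a mask
function ConvertFrom-AffinityMask {
    param([long]$Mask)
    0..62 | Where-Object { ($Mask -band ([long]1 -shl $_)) -ne 0 }
}

# 2 CPUs with 12 cores each = 24 logical processors
Get-MaxAffinityMask -LogicalProcessorCount 24      # 16777215

# A mask of 4095 (2^12 - 1) covers the first 12 processors only
(ConvertFrom-AffinityMask -Mask 4095) -join ','    # 0,1,2,3,4,5,6,7,8,9,10,11
```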

To identify the processes that would benefit from this configuration change, you can query for all processes with a ProcessorAffinity of 0:

Get-Process | Where-Object {$_.ProcessorAffinity -eq 0}

As you can see, these are all the IIS worker processes and, especially interesting, noderunner. Why interesting? When you look at the CommandLine of this process, you will see it’s the one that is responsible for the Exchange index and is very CPU hungry anyway:

Get-CimInstance -ClassName Win32_Process -Filter "ProcessId='100396'" | fl Path,CommandLine
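Instead of looking up each process ID by hand, the two queries can be combined. This is just a sketch along the lines of the commands above; the filter name Select-ZeroAffinity is made up for illustration, and the live part only runs on Windows.

```powershell
# Hypothetical helper: passes through only objects with a ProcessorAffinity of 0.
# Processes whose affinity cannot be read return nothing and are skipped.
filter Select-ZeroAffinity {
    if ($_.ProcessorAffinity -eq 0) { $_ }
}

# Live query; guarded because Win32_Process only exists on Windows
if ($env:OS -eq 'Windows_NT') {
    Get-Process | Select-ZeroAffinity | ForEach-Object {
        # Look up the command line for each matching process ID
        Get-CimInstance -ClassName Win32_Process -Filter "ProcessId=$($_.Id)"
    } | Select-Object ProcessId, Name, CommandLine
}
```

Run it from an elevated PowerShell session, otherwise the CommandLine of processes owned by other accounts may come back empty.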

Conclusion

There is an easy fix available that solves the issue described above. I only wonder why this fact isn’t as well known as it should be. On the one hand, it’s really not a good idea for HP to trip up the OS by default; on the other hand, I’ve found some evidence that Microsoft was aware of this issue, at least for the CLR:

https://github.com/aspnet/KestrelHttpServer/issues/650

https://stackoverflow.com/questions/12445175/how-many-processor-cores-does-the-net-task-scheduler-support

Either way, you as the end-user are affected, and this doesn’t affect only Exchange. We have seen the same behaviour on other systems not running Exchange, and changing the setting gave us a performance boost there as well.

Looking back over the last 2 years, I’m convinced that a few support calls and CritSits could have been avoided.

I hope this information helps you to avoid unnecessary support calls and to achieve a smooth upgrade to the latest Exchange version, or just more performance for any other application.

11 thoughts on “Exchange performance:HP NUMA BIOS settings”

smartwindows on July 28, 2017 at 1:24 am said:

Take care with using this setting. Processor Group can go up to 64 LP. If not needed disable Hyper-Threading. https://msdn.microsoft.com/en-us/library/windows/desktop/dd405503(v=vs.85).aspx

- Ingo Gegenwarth on July 28, 2017 at 7:42 am said:
  
  That’s true. For Exchange folks it’s common sense to disable HT as well as to set the Power Management profile to High Performance. Regarding the CPU, I wouldn’t go for more than 24 cores in total with HT disabled. Rather scale out if the resources can’t handle the load.
  
Brian Knutsson on June 26, 2019 at 9:14 am said:

You should mention that there is a reason that the CPUs are divided into NUMA nodes. The reason is that each CPU has its own local memory, and it is very costly to retrieve memory from another CPU’s memory banks. That is why things like SQL and Exchange are NUMA aware. So even though it seems like a good idea, it might not be, since your data processing will be much slower.

Brian Knutsson on June 26, 2019 at 9:39 am said:

Turns out that Exchange was a bad example, as Exchange 2013 and 2016 are NOT NUMA aware. I have not been able to find information regarding Exchange 2019. But it does not remove the memory latency problem when traversing NUMA groups to access “remote” memory.

SQL however is NUMA aware as stated.

- Ingo Gegenwarth on June 26, 2019 at 10:03 am said:
  
  Hi Brian,
  I didn’t go into these details as I have linked to the Docs from Microsoft for explanation. You’re also correct that Exchange is not NUMA aware, which I mentioned as well. It would need some code change, which SQL already has.
  I agree that this is not perfect, but it solves a huge performance impact, and that’s the reason for this post. I was also approached by HP, and AFAIK the default setting was changed in the newer versions.
  Ciao,
  Ingo
  
Taylor on August 14, 2019 at 1:13 pm said:

Does anyone know how we can change the NUMA group setting from Clustered to Flat using PowerShell?

- Ingo Gegenwarth on August 15, 2019 at 12:46 pm said:
  
  Hi Taylor,
  there is the CLI module from HP, which can be used for this. They have some examples on GitHub: https://github.com/HewlettPackard/PowerShell-ProLiant-SDK
  Ciao,
  Ingo
  
  - Taylor on August 19, 2019 at 6:38 pm said:
    
    Can you please be more specific where can I find the Powershell commands that change the NUMA group to flat? I haven’t found it in GitLab.
    
wtg on September 10, 2021 at 12:33 pm said:

I set it to flat, but the program doesn’t use all cores; it still uses cores only from Numa 0 or Numa 1

- Ingo Gegenwarth on September 11, 2021 at 8:04 am said:
  
  Which „program“?
  