I was just made aware of an HP-specific setting that has a huge impact on performance. But before you start to panic, have a look at the conditions:
- you’re running Proliant Gen9 servers
- these servers are equipped with Intel Xeon E5-2600 v3 or newer processors
- you’re using the default setting (Clustered) for NUMA Group Size Optimization
If you don’t match these conditions, you can stop reading and relax. If you do, you might want to continue reading.
Update 28.08.2017
As mentioned below, this affects not only Exchange. Credit goes to Nicholas, who highlighted the following KB for Lync/SfB:
Bug Check 0x133 DPC_WATCHDOG_VIOLATION error on Lync/Skype for Business Edge server
Update 01.08.2017
Another PFE made me aware that the HealthChecker.ps1 script checks this setting by comparing the values EnvProcessorCount and NumberOfLogicalProcessors.
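A minimal sketch of such a comparison follows. It is not the HealthChecker.ps1 code: the function name Test-ProcessorGroupSplit is hypothetical, and the check assumes that [System.Environment]::ProcessorCount in Windows PowerShell (.NET Framework) reports only the logical processors of the current processor group, while Win32_Processor reports NumberOfLogicalProcessors per physical processor. Behaviour on newer .NET versions (PowerShell 7) may differ.

```powershell
# Hypothetical helper (not part of HealthChecker.ps1): returns $true when the
# current process sees fewer logical processors than the server actually has,
# i.e. it is limited to a single processor group.
function Test-ProcessorGroupSplit {
    param(
        [Parameter(Mandatory)][int]$EnvProcessorCount,
        [Parameter(Mandatory)][int]$NumberOfLogicalProcessors
    )
    return $EnvProcessorCount -lt $NumberOfLogicalProcessors
}

# Only query the live values on Windows (Windows PowerShell or PowerShell 7 on Windows).
if ($PSVersionTable.PSEdition -eq 'Desktop' -or $IsWindows) {
    $envCount = [System.Environment]::ProcessorCount
    # Win32_Processor returns one instance per physical processor, so sum them up.
    $lpCount = [int](Get-CimInstance -ClassName Win32_Processor |
        Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
    if (Test-ProcessorGroupSplit -EnvProcessorCount $envCount -NumberOfLogicalProcessors $lpCount) {
        Write-Warning "This process sees only $envCount of $lpCount logical processors."
    } else {
        "All $lpCount logical processors are visible to this process."
    }
}
```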
Symptoms
The symptoms are not really obvious. When you check the running processes, you will notice that most of them use only half or less of the logical processors.
This means that only part of your CPU resources is used at all, which affects the overall performance of applications.
Resolution
HP already published an advisory about this issue in 2015:
The default setting is Clustered, which basically causes Windows to create one processor group per physical processor.
In my case, these are Proliant Gen9 servers with 2 physical CPUs and 12 cores each, which means we end up with 2 processor groups. This alone is not an issue, but by default an application is limited to a single group and therefore sees and uses ONLY that one group. Supporting multiple groups requires additional code in the application. The resolution is to change NUMA Group Size Optimization to Flat, which in this configuration results in a single processor group.
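To make the arithmetic concrete, here is a simplified, hypothetical model (Get-ProcessorGroupEstimate is not an HP or Microsoft cmdlet). It assumes, as observed on this server, that Clustered yields one processor group per physical processor, and that with Flat Windows fills each group with up to 64 logical processors, the per-group limit. Real firmware and OS behaviour may differ on other configurations.

```powershell
# Hypothetical, simplified model; not HP's or Microsoft's actual algorithm.
#   Clustered: one processor group per physical processor (as observed in this post)
#   Flat:      Windows fills each group with up to 64 logical processors
function Get-ProcessorGroupEstimate {
    param(
        [Parameter(Mandatory)][int]$Sockets,
        [Parameter(Mandatory)][int]$LogicalProcessorsPerSocket,
        [ValidateSet('Clustered', 'Flat')][string]$Mode = 'Clustered'
    )
    $total = $Sockets * $LogicalProcessorsPerSocket
    if ($Mode -eq 'Clustered') {
        $groups = $Sockets
        $perGroup = $LogicalProcessorsPerSocket
    } else {
        $groups = [int][math]::Ceiling($total / 64)
        $perGroup = [math]::Min($total, 64)
    }
    [pscustomobject]@{
        Mode                   = $Mode
        ProcessorGroups        = $groups
        TotalLogicalProcessors = $total
        # A process that is not group-aware only uses the group it starts in.
        UsableBySingleGroupApp = $perGroup
    }
}

# The server from this post: 2 CPUs with 12 cores each, 24 logical processors in total
Get-ProcessorGroupEstimate -Sockets 2 -LogicalProcessorsPerSocket 12 -Mode Clustered   # 2 groups, 12 usable
Get-ProcessorGroupEstimate -Sockets 2 -LogicalProcessorsPerSocket 12 -Mode Flat        # 1 group, 24 usable
```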
What is the difference?
You can see the difference by changing the CPU utilization graph in Task Manager to show the logical processors.
The following shows the graph when NUMA Group Size Optimization is set to Clustered:

Red is processor group 0 and green is processor group 1. The first processor group is clearly busier.
And here is the same graph when set to Flat:

Same view as above with NUMA Group Size Optimization set to Flat. The utilization is spread much more evenly across the cores.
How and what to check
There are several ways to check whether you’re affected or not. Here are some examples:
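- .NET ProcessorCount
Compare the value returned by the following command with the total number of logical processors of the server. If it is lower, the process only sees a single processor group:
[System.Environment]::ProcessorCount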
- Task Manager
Try to set the affinity of a process in Task Manager:

On a system with NUMA Group Size Optimization set to Clustered, you get an error when you try to change the affinity of an IIS worker process.
- PowerShell Get-Process
When you group the processes by ProcessorAffinity, processes with a value of 0 are bound to processor group 0:
Get-Process | Where-Object {$_.ProcessorAffinity -ne $null} | group ProcessorAffinity
You can determine the maximum ProcessorAffinity value with the following commands (credit goes to BarryCWT and his script):
[int]$ProcCount = 0
Get-CimInstance -ClassName Win32_Processor | ForEach-Object { $ProcCount += $_.NumberOfLogicalProcessors }
$MaxProcAffinity = [math]::Pow(2, $ProcCount) - 1
$MaxProcAffinity
In my example, there are 2 CPUs with 12 cores each, i.e. 24 logical processors, which gives a value of 2^24 - 1 = 16777215 for all processors.
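The mask simply has one bit set per logical processor. A minimal sketch of the same calculation, using a hypothetical helper named Get-MaxAffinityMask: it builds the mask with exact [decimal] arithmetic instead of [math]::Pow, which returns a double and can no longer represent the mask exactly once it needs more than 53 bits (a full 64-processor group).

```powershell
# Hypothetical helper: returns the "all processors" affinity mask for a given
# number of logical processors, i.e. the lowest N bits set (2^N - 1).
function Get-MaxAffinityMask {
    param(
        [Parameter(Mandatory)]
        [ValidateRange(1, 64)]   # a processor group holds at most 64 logical processors
        [int]$LogicalProcessorCount
    )
    # [decimal] keeps the value exact; a double loses precision beyond 2^53.
    $mask = [decimal]0
    for ($i = 0; $i -lt $LogicalProcessorCount; $i++) {
        # Shift the mask one bit to the left and set the new lowest bit.
        $mask = $mask * 2 + 1
    }
    return [uint64]$mask
}

Get-MaxAffinityMask -LogicalProcessorCount 24   # 16777215: both CPUs, 24 logical processors
Get-MaxAffinityMask -LogicalProcessorCount 12   # 4095: a single 12-LP processor group
```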
To identify the processes that would benefit from this configuration, you can query for all processes with ProcessorAffinity set to 0:
Get-Process | Where-Object {$_.ProcessorAffinity -eq 0}
As you can see, these are all IIS worker processes and, especially interesting, noderunner. Why interesting? When you look at the CommandLine of this process, you will see that it is the one responsible for the Exchange index, which is very CPU-hungry anyway:
Get-CimInstance -ClassName Win32_Process -Filter "ProcessId='100396'" | Format-List Path,CommandLine
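Instead of looking up one process ID by hand, the two steps can be combined. This is a sketch under assumptions: Get-ZeroAffinityProcess is a hypothetical function name, it should run in an elevated session on the server (otherwise ProcessorAffinity and CommandLine are not readable for every process), and it uses the same ProcessorAffinity -eq 0 filter as above.

```powershell
# Hypothetical helper: lists processes whose ProcessorAffinity reads 0,
# together with the command line reported by Win32_Process.
function Get-ZeroAffinityProcess {
    # Processes whose affinity cannot be read (e.g. access denied) are skipped.
    $candidates = Get-Process | Where-Object {
        try { $_.ProcessorAffinity -eq 0 } catch { $false }
    }
    foreach ($proc in $candidates) {
        $cim = Get-CimInstance -ClassName Win32_Process -Filter "ProcessId=$($proc.Id)"
        [pscustomobject]@{
            Name        = $proc.Name
            Id          = $proc.Id
            CommandLine = $cim.CommandLine
        }
    }
}

# Win32_Process is only available on Windows.
if ($PSVersionTable.PSEdition -eq 'Desktop' -or $IsWindows) {
    Get-ZeroAffinityProcess | Format-List Name, Id, CommandLine
}
```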
Conclusion
There is an easy fix for the issue described above: set NUMA Group Size Optimization to Flat. I only wonder why this isn’t more widely known. On the one hand, it’s really not a good idea for HP to hamper the OS with its default setting; on the other hand, I’ve found some evidence that Microsoft was aware of this issue, at least for the CLR:
https://github.com/aspnet/KestrelHttpServer/issues/650
Either way, you as the end user are affected, and not only with Exchange. We have seen the same behaviour on other systems not running Exchange, and changing the setting gave us a performance boost there as well.
Looking back over the last 2 years, I’m convinced that a few support calls and CritSits could have been avoided.
I hope this information helps you avoid unnecessary support calls and achieve a smooth upgrade to the latest Exchange version, or simply get more performance for any other application.
Take care when using this setting. A processor group can contain up to 64 logical processors (LPs). If it isn’t needed, disable Hyper-Threading. https://msdn.microsoft.com/en-us/library/windows/desktop/dd405503(v=vs.85).aspx
That’s true. For Exchange folks, it’s common sense to disable HT and to set the power management profile to High Performance. Regarding the CPU, I wouldn’t go for more than 24 cores in total with HT disabled. Rather scale out if the resources can’t handle the load.
Pingback: Impact of the HP Proliant Gen9 NUMA BIOS Setting on Exchange Performance - Paylaşabildiklerim
You should mention that there is a reason why the CPUs are divided into NUMA nodes: each CPU has its own local memory, and retrieving memory from another CPU’s memory banks is very costly. That is why products like SQL and Exchange are NUMA-aware. So even though it seems like a good idea, it might not be, since your data processing will be much slower.
It turns out that Exchange was a bad example, as Exchange 2013 and 2016 are NOT NUMA-aware. I have not been able to find information regarding Exchange 2019. But that does not remove the memory latency problem when crossing NUMA groups to access “remote” memory.
SQL, however, is NUMA-aware, as stated.
Hi Brian,
I didn’t go into these details, as I have linked to the Microsoft Docs for an explanation. You’re also correct that Exchange is not NUMA-aware, which I mentioned as well. It would need code changes, which SQL already has.
I agree that this is not perfect, but it solves a huge performance problem, and that’s the reason for this post. I was also approached by HP, and AFAIK the default setting has been changed in newer versions.
Ciao,
Ingo
Does anyone know how we can change the NUMA group setting from Clustered to Flat using PowerShell?
Hi Taylor,
there is a PowerShell module from HP that can be used for this. They have some examples on GitHub: https://github.com/HewlettPackard/PowerShell-ProLiant-SDK
Ciao,
Ingo
Can you please be more specific about where I can find the PowerShell commands that change the NUMA group setting to Flat? I haven’t found them on GitHub.
I set it to Flat, but the program doesn’t use all cores; it still uses only cores from NUMA 0 or NUMA 1.
Which “program”?