Windows Server and Processor Cores... (Part 1)
Microsoft

First published on TechNet on Jul 30, 2012

 

Recently someone asked for my thoughts on how Windows Server handles processor cores. With newer processors offering more than 2 or 4 cores each, it seemed like a good time to revisit this topic. If you have a system with multiple processor sockets and a few new processors with 2, 4, 6, 10, or more cores each, what should you expect Windows to do?

 

With multi-core processors becoming more prevalent not only in servers but also in desktops, and maybe even in your next cell phone or TV remote, it makes sense to review how Windows Server makes use of processors, since servers are where you’re more likely to see higher densities of cores on a single physical processor.

 

Windows Server licensing is by processor socket (physical processor). For example, say the edition of Windows Server you have indicates that it supports 4 physical processors. If you have dual-core processors in four available processor sockets, that would provide 8 logical processors for the OS. If the processors also support Hyper-Threading (HT) and the system has this option enabled, then the total logical processor count may be 16. 16 processors as compared to 8…seems like a no-brainer to have twice as many, right?

 

Don’t confuse processor cores with the extra logical processors available with HT enabled. Some configurations with HT might present one extra logical processor per core while others may present more. Cores and HT logical processors are two different things, and the actual performance may differ from what you expect. Additional processor cores are practically just like additional physical processors, without requiring extra sockets for them on the system board.
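The logical processor arithmetic from the licensing example can be sketched in a few lines of Python. The function name and parameters are hypothetical and exist only for illustration; the sketch simply multiplies sockets, cores per socket, and hardware threads per core.

```python
def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int = 1) -> int:
    """Count the logical processors the OS would see (illustrative helper, not a Windows API)."""
    return sockets * cores_per_socket * threads_per_core


# Four sockets of dual-core processors, HT disabled.
print(logical_processors(4, 2))      # 8

# Same hardware with HT enabled, assuming one extra logical processor per core.
print(logical_processors(4, 2, 2))   # 16
```

The count doubles with HT enabled, but as discussed here, a doubled count does not imply doubled performance.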

 

The way I like to think of HT is to think of one of those old pizza shops where the pizza maker is tossing the dough in the air in preparation to make a pizza. You know…the old-fashioned way. Instead of being able to toss just one pizza at a time, imagine the same person tossing and spinning two of them simultaneously. Performance of this person compared to two separate people performing the same task may not be equivalent and may be somewhere in-between. On HT-capable processors that provide one extra logical processor per physical core, HT allows each core to run one additional concurrent thread, with both threads sharing on-chip resources like cache.

 

For purposes of illustration, assume you have a single-core processor that supports HT and provides a single HT logical processor. In that configuration, both logical processors share resources on the chip. An HT processor typically will not provide the same performance as two single-threaded processors but may provide better performance than a single processor. With the expected performance of an HT logical processor being somewhere in-between, the performance gains achieved in an HT configuration will vary by application. While I truly believe that HT on today’s hardware is better implemented and performs better than in years past, I don’t factor HT into sizing a system. HT can be a good performance benefit to have on hand if you need it, but I’ve not seen the performance to be that much greater. I’ve consistently thought of HT as yielding more compute power rather than helping with I/O.

 

You can search the net and find a variety of opinions on this topic. You may form your own opinion. There are also some applications that suggest or require disabling HT because of the impact on the application. The advice I’ve consistently given has been to size systems according to physical processors and cores, then use Performance Monitor to determine if HT provides additional gain. And, of course, if an application says don’t use it…the vendor may have a reason.

 

How many cores then will Windows Server allow?

 

The number of possible logical processors prior to Windows Server 2008 R2 was based on the number of bits. For instance, a 32-bit OS could use 32 logical processors; a 64-bit OS could use 64 logical processors. This is confirmed by Mark Russinovich’s presentation on R2’s kernel changes (available on the Microsoft Download Center). Windows Server 2008 R2 extends this limit by allowing up to 4 groups of up to 64 processors each. Doing the math, that translates to a maximum of 256 logical processors for Windows Server 2008 R2. That alone would be enough for me to jump to R2 if I were an administrator using very expensive hardware with lots of processor cores…especially for virtualization.

The Windows Server 2008 R2 kernel establishes processor groups (K-Groups) at boot time; they are not customizable by an administrator after startup. However, according to KB2506384, there is a way to manually adjust K-Group assignments to your liking for the next boot of the OS. K-Groups may contain one or more NUMA nodes. Windows attempts to place all processors from a given NUMA node in the same group where possible. Systems with fewer than 64 logical processors will have only a single group. From a scheduling standpoint, threads are assigned to only one group at a time. Also, an interrupt may target only the processors of a single group.
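The group math above can be sketched as follows. The constants come from the limits described in this post (64 processors per group, 4 groups on Windows Server 2008 R2); the function names are hypothetical, and the group count is only a lower bound, since NUMA layout or manual K-Group settings could lead to a different arrangement.

```python
import math

GROUP_SIZE = 64      # maximum logical processors per processor group
MAX_GROUPS_R2 = 4    # group limit described for Windows Server 2008 R2


def max_logical_processors(max_groups: int, group_size: int = GROUP_SIZE) -> int:
    """Upper limit on logical processors for a given number of groups."""
    return max_groups * group_size


def min_groups_needed(logical_processors: int, group_size: int = GROUP_SIZE) -> int:
    """Smallest number of groups that can hold the given logical processor count."""
    return math.ceil(logical_processors / group_size)


print(max_logical_processors(1))              # 64  (a single group, as before R2)
print(max_logical_processors(MAX_GROUPS_R2))  # 256 (Windows Server 2008 R2)
print(min_groups_needed(48))                  # 1
print(min_groups_needed(72))                  # 2
```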

 

What happens when a physical processor has multiple cores or a given core has multiple logical processors?

 

The answer to this question depends on whether you're using Windows Server 2008 R2 RTM or a build with updates that alter the default behavior. Using the RTM version of Windows Server 2008 R2, the kernel attempts to place all cores of a given physical processor in the same group whenever possible. If using processors where the number of cores per chip isn’t a power of 2 (and so doesn’t divide evenly into 64), then some cores on a physical processor may be split between groups. For example, if using 12 processors with 6 cores each, the total number of processor cores would be 72. This would result in one group of 64 processors and a second group of 8. The eleventh physical processor would have 4 cores in the first group, with the remaining two cores in the second group along with all six cores of processor 12. For some applications, uneven groups can be problematic. Additionally, minor hardware differences between seemingly identical systems could result in one with a {64,8} grouping and another with {8,64}.
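To make the 12-socket example concrete, here is a minimal sketch that fills groups in socket order and starts a new group once the current one holds 64 cores. This is a simplified model chosen because it reproduces the {64,8} split described above; it is an assumption for illustration and not the kernel's actual assignment algorithm. All names are hypothetical.

```python
from collections import Counter

GROUP_SIZE = 64  # maximum logical processors per group


def fill_groups_sequentially(sockets, cores_per_socket, group_size=GROUP_SIZE):
    """Assign cores to groups in socket order (simplified model, not the real kernel logic).

    Returns a list of groups; each group is a list of (socket, core) tuples,
    with sockets numbered from 1.
    """
    groups = [[]]
    for socket in range(1, sockets + 1):
        for core in range(cores_per_socket):
            if len(groups[-1]) == group_size:
                groups.append([])  # current group is full, open the next one
            groups[-1].append((socket, core))
    return groups


groups = fill_groups_sequentially(sockets=12, cores_per_socket=6)
print([len(group) for group in groups])  # [64, 8]

# Cores of the eleventh physical processor that land in each group.
print([Counter(socket for socket, _ in group)[11] for group in groups])  # [4, 2]
```

With 8 cores per socket instead of 6, the same model never splits a socket, because 8 divides evenly into 64.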

 

If using Windows Server 2008 R2 with KB2510206 (or a future service pack containing this update), the kernel will attempt to balance processors amongst groups. With the preceding example of 72 processor cores, the resulting groups would each contain 36. The update provides predictability and balance without requiring manual K-Group specification as per KB2506384.
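The balanced outcome can be sketched as an even split across the minimum number of groups. The function name is hypothetical, and how the update treats totals that do not divide evenly is not covered here, so spreading the remainder one processor at a time is purely an assumption of this sketch.

```python
import math


def balanced_group_sizes(logical_processors, group_size=64):
    """Split processors evenly across the fewest groups that can hold them.

    Illustrative model only; remainder handling is an assumption.
    """
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    group_count = math.ceil(logical_processors / group_size)
    base, extra = divmod(logical_processors, group_count)
    # The first `extra` groups take one additional processor each.
    return [base + 1 if index < extra else base for index in range(group_count)]


print(balanced_group_sizes(72))  # [36, 36]
```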

 

If using Windows Server 2008 with more than 64 cores, you would not be able to utilize extra cores above that limit even though they may exist. Windows Server 2008 R2 can utilize processor groups and allow use of these additional cores up to the maximum of 256. This isn’t the only reason to consider moving to Windows Server 2008 R2…there certainly are many more.

 

Additional References

 

2510206 Performance issues when more than 64 logical processors are used in Windows Server 2008 R2
http://support.microsoft.com/kb/2510206/EN-US

 

2546706 A Windows Server 2008 R2-based computer that has some NUMA-based processors and more than 256 logical processors runs in SMP mode as a 64-processor system and may experience decreased performance
http://support.microsoft.com/kb/2546706/EN-US

 

2517752 "0x0000000A" Stop error occurs during the shutdown process on a computer that is running Windows Server 2008 and that has more than 64 processors installed
http://support.microsoft.com/kb/2517752/EN-US

 

Sysinternals CoreInfo tool can show logical processor to physical processor mapping
http://technet.microsoft.com/en-us/sysinternals/cc835722