
In SQL Server 2019 CTP 3.1, a new index option called OPTIMIZE_FOR_SEQUENTIAL_KEY was added to address throughput issues that can occur when a workload has a high number of concurrent inserts. These workloads commonly face an issue known as last page insert contention.

In SQL Server, indexes are stored as B-trees which are ordered by the index key. In a clustered index, the leaf level contains the data pages themselves, ordered by the index key. This means that when you are inserting rows into a table that has a clustered index where the key is a value that is always increasing, such as an identity column or a datetime column, all the new rows will be inserted on the last page of the B-tree. As concurrency increases, contention for this page in memory will increase, which limits scalability of such workloads. This problem can be identified by many threads waiting with a wait type of PAGELATCH_xx on the same page resource in a user database. In an OLTP system that is frequently inserting data, PAGELATCH waits will often be observed, particularly on a busy system.
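As a rough way to spot this pattern, a query like the following (a sketch against the standard waiting-tasks DMV; not from the original post) lists sessions currently waiting on page latches. If many sessions show the same resource_description, they are contending for the same page:

```sql
-- Sketch: sessions currently waiting on page latches.
-- resource_description is database_id:file_id:page_id; many rows with
-- the same value indicate contention on a single page.
SELECT wt.session_id,
       wt.wait_type,            -- e.g. PAGELATCH_EX, PAGELATCH_SH
       wt.wait_duration_ms,
       wt.resource_description  -- the page being waited on
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE 'PAGELATCH%'
ORDER BY wt.resource_description;
```

For a table with a sequential clustered key, the contended page will typically be the last page of the leaf level.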

Many different methods to address this issue have been suggested in the past; most of them involve changing either the application or the structure of the index. The following article outlines some of the most common solutions:

How to resolve last-page insert PAGELATCH_EX contention in SQL Server
https://support.microsoft.com/kb/4460004
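One of the mitigations described in that KB article is to spread inserts across the B-tree by leading the clustered key with a computed hash of the sequential value. A minimal sketch (the table, column names, and the modulus of 8 are all hypothetical; in practice the modulus is often chosen close to the number of CPU cores):

```sql
-- Sketch of the "non-sequential leading key" mitigation from the KB.
CREATE TABLE dbo.Events
(
    EventID bigint IDENTITY NOT NULL,
    Payload nvarchar(200) NOT NULL,
    -- Persisted computed column so it can participate in the key.
    Bucket  AS CONVERT(tinyint, EventID % 8) PERSISTED NOT NULL,
    CONSTRAINT PK_Events PRIMARY KEY CLUSTERED (Bucket, EventID)
);
```

Inserts now land on eight different "last pages" instead of one, at the cost of losing the physical ordering by EventID (range scans on EventID must touch all buckets).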

When thinking about addressing this problem automatically in SQL Server, these options as well as others were considered, but most of them require some sort of performance trade-off and would involve major changes to the database engine. What we decided to do instead was to look closer at the contention itself and see if there was some way to improve scalability without making major changes to the way the data is physically stored.

While PAGELATCH contention does introduce additional latency in an application, it is something called a latch convoy that is truly detrimental to application scalability. This blog article gives a good description of convoys and why they happen. Traffic jams are a common analogy used to describe the problem. If you have a road that is at maximum capacity, as long as all the traffic continues to move at the same speed, throughput will be consistent. If you introduce a bottleneck on the road that forces all the cars to go through a single lane, this will cause traffic to back up behind the bottleneck. Traffic will continue to make progress at a slower rate, but if the rate of cars entering the highway remains constant, the traffic jam will get larger and larger unless the bottleneck is removed. In this situation, if something occurs that causes the drivers entering the single lane to hit the brakes, such as a slow driver or a hazard on the road, the rate of throughput drops severely and traffic really piles up. At this point, throughput won't recover until the rate of cars entering the road slows down dramatically, much lower than what the road would normally be able to handle.

With last page insert contention, as the number of insert threads increases, the queue for the page latch increases which in turn increases latency. Throughput will also decrease, but if something slows down one of the threads that is holding the latch, this can trigger a convoy and throughput suddenly falls off a cliff. This typically happens when a page fills up and a new page must be added to the index (also known as a page split). The insert that triggers the new page will naturally have to hold the latch for longer than normal while the new page operation completes. This causes the queue to build up behind the latch. Adding a new page also requires an exclusive latch on the parent page, which can cause latch requests to queue at that level as well. At this point, throughput falls off a cliff.

OPTIMIZE_FOR_SEQUENTIAL_KEY aims to do two things – control the rate at which new threads are allowed to request the latch, and favor threads that are likely to keep the throughput high. These techniques will not prevent the contention or reduce latency, but they will help keep throughput consistent as concurrency increases.
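For reference, the option is set per index, either at creation time or afterwards. The table and index names below are hypothetical:

```sql
-- Enable on an existing index:
ALTER INDEX PK_Orders ON dbo.Orders
SET (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);

-- Alternatively, specify it when creating an index:
CREATE CLUSTERED INDEX CX_OrderHistory
ON dbo.OrderHistory (OrderDate)
WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
```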

 

When using this option, you may not see a decrease in PAGELATCH waits; in fact, you may even see an increase in waits with a new wait type called BTREE_INSERT_FLOW_CONTROL. Despite these waits, you should be able to achieve much better throughput, and scale to a much higher number of concurrent threads, without hitting the proverbial cliff. If you're not experiencing the convoy phenomenon in your workload, you may not see a huge benefit from this option, and you may even see a slight degradation in performance due to the new flow control waits. You should only use this option if you have a heavily contended workload – one where the number of threads inserting into the index is much higher than the number of schedulers – on a clustered index with a sequential key. (Non-clustered indexes can experience this problem as well, but because they have a smaller row size they don't have as high a tendency to form convoys, so they are less likely to benefit from this option.) We are also working on providing some new reporting in Azure Data Studio and SQL Server Management Studio that will help detect this scenario and recommend the option where appropriate.
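To observe the new wait type alongside the page-latch waits, a sketch against the cumulative wait-stats DMV (counters accumulate since the last stats clear or server restart):

```sql
-- Sketch: compare page-latch waits with the new flow-control wait.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'BTREE_INSERT_FLOW_CONTROL'
   OR wait_type LIKE 'PAGELATCH%'
ORDER BY wait_time_ms DESC;
```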

 

Hopefully that’s enough of an explanation for the average user, but I know many of you out there are thinking “Yeah but what are you DOING under the covers? How does this really work?” To dig a little deeper, as you may have guessed from the name of the new wait type, enabling OPTIMIZE_FOR_SEQUENTIAL_KEY turns on what we call flow control for the index in question. When OPTIMIZE_FOR_SEQUENTIAL_KEY is set to ON for an index, we introduce an upstream flow control mechanism for all the threads that request to enter the critical section of SQL Server code that is protected by a page latch. Unlike a semaphore, which simply controls traffic based on a fixed number of threads, this new mechanism controls the flow of traffic based on a set of conditions such as which CPU the thread is on, the state of the thread, and the rate at which the thread is completing work.

 

With OPTIMIZE_FOR_SEQUENTIAL_KEY, we will limit the number of threads allowed to request the latch to one per scheduler (i.e. CPU core), which should reduce time spent in the runnable queue once the latch is acquired. In addition to this, we will also favor threads that are more likely to make progress once they acquire the latch. This “unfair” mechanism gives preference to threads that are likely to complete their work in a single quantum over threads that will undergo several context switches while holding the latch. While this unfairness may result in some threads experiencing additional latency, overall throughput will be increased because less time will be spent waiting for other resources while holding the latch and blocking other threads.

 

Keep in mind that this flow control is not going to improve individual insert latency; in fact, quite the opposite. Think of it like the traffic lights on a highway on-ramp – the cars coming onto the highway are going to wait longer before they are allowed to enter. It’s going to make them a bit slower than normal, but the result is that traffic keeps moving at a constant (albeit slower) rate instead of piling up. Our initial testing shows up to 30% improvement in throughput on some workloads, but of course your mileage may vary. The other solutions to last page insert contention outlined in the KB article referenced above are also still valid, and in many cases will yield an even better throughput improvement than OPTIMIZE_FOR_SEQUENTIAL_KEY. The best way to address the problem is to test the various options and choose the one that works best for your application and environment.


If you'd like to see an example of OPTIMIZE_FOR_SEQUENTIAL_KEY in action, head over to the SQL Server Samples repository on GitHub and download the sample demo.

6 Comments
Occasional Visitor
Seems like an interesting way to solve the issue. I’m just really curious why you even went to the effort to implement it? Firstly, there are many known solutions, which often work much better (as you already mentioned in this article). Secondly, given the existence of Hekaton, I would have expected there to be a massively more elegant solution. Not that many people what this issue, in my experience. As such there doesn’t seem to be a real urge to throw together a solution. Would it not be worth investing the time in something like a Hekaton merge table? I mean imagine a lock-less, latch-less insert system (oh yeah, it exists). Then every (randomly pulled out of thin air) 512 inserts you merge those into the main table. Once the merge completes you remove that merge table. In the meantime you create a new merge table. I mean the solution is really quite simple, exists in other database engines (look at WiredTiger, which MongoDB bought - I really wish a better company had bought it). Merge tables aren’t new or exciting technology. They do however work and have done for a long time. Oh, would it also be possible to stop half the data in non-leaf levels moving to new pages with page splits in an identity index please? I don’t see why all non-leaf level pages have to be 50% full until the next index rebuild. Especially because I shouldn’t need to ever rebuild an identity index. Given that would be most clustered indexes that I have, that would make quite a difference to maintenance windows. Cool that you’re doing things to make SQL Server better. I would just prefer it to be a bit longer term and more solution oriented rather than sticking plaster oriented.
Occasional Visitor
Oh cool my comment lost all my formatting. I also seem unable to edit it... So a little correction. Where I wrote “Not many people what this issue”. I meant to write “Not many people have this issue”. Now I’ll see if more carriage returns allow me to format my text.
Occasional Visitor

@Rmwin - good suggestions. For a sequential index, yes, the lowest-level pages should not be 50% full; that's usually inefficient, except possibly for some odd configurations. However, I believe the B-tree will still have to be re-balanced occasionally even with an optimized insert that minimizes half-full pages; that might be done very efficiently by just rearranging a few sub-trees.

Microsoft

@Rmwin Thanks for the feedback! 


Let me address your comment about page splits first. When a new page is added to the end of any level of an index (leaf or intermediate), we do not move any rows around; we simply add a new empty page. The only time pages are left 50% full is when a new page is added in the middle of a level. You can see this in action by creating an empty table with a clustered index on a sequential key, inserting just enough rows to fill one page, then adding one more row and looking at the resulting pages. You'll see three pages: the first full page, the new empty page, and a new root page that points to the two other pages, which now form the leaf level of the index. If you do this again with a clustered index on a non-sequential key such as a uniqueidentifier, after the page split happens you'll see that both leaf pages contain 50% of the rows. I can post a demo script for this later if it would be helpful.
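A rough sketch of what such a demo might look like (the table name and row sizing are hypothetical; the char(2000) filler means roughly four rows fit on one 8 KB page):

```sql
-- Hypothetical demo: a wide fixed-length row so about 4 rows fit per 8 KB page
CREATE TABLE dbo.SplitDemo
(
    Id      int NOT NULL,
    Filler  char(2000) NOT NULL DEFAULT 'x',
    CONSTRAINT PK_SplitDemo PRIMARY KEY CLUSTERED (Id)
);

-- Fill one leaf page, then add one more row to force a new page
INSERT INTO dbo.SplitDemo (Id) VALUES (1), (2), (3), (4);
INSERT INTO dbo.SplitDemo (Id) VALUES (5);

-- Inspect fullness per level: with a sequential key, the first leaf page
-- should remain nearly full and the new page hold only the new row,
-- rather than the 50/50 split seen with a random key
SELECT index_level, page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.SplitDemo'),
                                    NULL, NULL, 'DETAILED');
```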


As I mentioned, we did discuss (at length, I assure you :)) many other options for solving this problem. While a Hekaton merge table seems like a straightforward solution, there are actually a lot of complexities to manage, such as different transactional semantics, different indexing, etc., that make this challenging to do automatically. This is something you can actually implement yourself, and we do have some customers that use a similar solution (it's often called the shock absorber).
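A very rough sketch of that shock-absorber pattern (all table names are hypothetical; a real implementation must also handle restarts, ordering, and failure during the drain, and SCHEMA_ONLY durability means staged rows are lost if the server restarts):

```sql
-- Hypothetical latch-free staging table (requires a memory-optimized filegroup)
CREATE TABLE dbo.EventStaging
(
    EventId  bigint NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    Payload  nvarchar(400) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

-- Periodic drain: move staged rows into the disk-based table in one batch.
-- The SNAPSHOT hint satisfies the isolation requirement for accessing a
-- memory-optimized table inside a cross-container transaction.
BEGIN TRANSACTION;
    INSERT INTO dbo.Events (EventId, Payload)
    SELECT EventId, Payload
    FROM dbo.EventStaging WITH (SNAPSHOT);

    DELETE FROM dbo.EventStaging WITH (SNAPSHOT);
COMMIT TRANSACTION;
```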


While you may not have encountered this problem frequently, we do see many customers facing this issue, particularly on servers with a high core count (e.g. 128 cores or more). The hope is that this new flow control mechanism may also be used to improve throughput in other areas of the engine that have a tendency to form convoys under contention.

Microsoft

@jahummer B-trees are self-balancing due to the way they are built from the bottom up (i.e. leaf first, then push up a new level when needed), so they never need to be "re-balanced" per se. Even a sequential index may need to be reorganized, however, if there are a lot of in-place updates that cause rows to be moved from one page to another.

Regular Visitor
Let's hope those customers with many cores are running software from vendors that are advanced enough to implement something like this. I've had discussions with vendors to try to convince them that using cascading updates is a better way to cascade foreign key changes than "instead of" triggers. And I've seen vendors not use foreign keys at all, instead opting to provide a routine for "checking links". I just had a vendor explain to me that snapshot isolation is not necessary because they always set the isolation mode to read uncommitted for all of their reports. There are so many great SQL Server features that go unused in the typical application.