This blog provides an example of turning a customer escalation to Tiger team into a successful engagement and using the learnings to deliver improvements in the SQL Server product to scale on modern hardware for specific workload.
As hardware trends change over the years, customer environments move to newer hardware with larger number of cores per socket and TBs of memory, performance issues related to NUMA memory access and database lock contention become predominant. In addition to this, the democratization of storage has resulted in superfast IO technologies such as FusionIO, SSD etc that perform 5-10x faster than a SAN at a fraction of the cost. What this means is IO is no longer a bottleneck. With scale-up hardware, customer expectations increase with the assumption that SQL Server applications will perform better and faster with more transaction throughput.
A major class of contention issues in SQL Server are attributed to excessive waits on thread safe memory objects (CMEMTHREAD) which many workloads experience on modern hardware. The next section describes the problem as well as the solution that addresses this in SQL Server.
Heap allocators, called memory objects in SQL Server, allow to allocate memory from the heap. CMemThread is a thread-safe memory object allowing for concurrent memory allocations from multiple threads. For correct bookkeeping, CMemThread objects rely on synchronization constructs, a mutex in that case, to ensure only a single thread is updating critical pieces of information at a time.
A major disadvantage, however, is that the use of mutexes can lead to contention if many threads are allocating from the same memory object in a highly concurrent fashion. To that end, SQL Server introduced the concept of a partitioned memory object where each partition is represented by a single CMemThread object. The partitioning of a memory object as described above is statically defined in code and cannot be changed after creation. As memory allocation patterns vary widely based on aspects like hardware and memory usage, it is impossible to come up with the perfect partitioning pattern upfront. In the vast majority of cases no partitioning (single partition) will suffice but some scenarios may lead to contention (see the “old experience” on the left side of the illustration below) that can be prevented only with a highly partitioned memory object. It is not desirable to partition each memory object as more partitions result in inefficiencies and increase memory fragmentation.
Dynamic promotion of memory objects improvement introduces a new mechanism (see the “new experience” on the right side of the illustration below) that is able to
1. Determine bottlenecks caused by memory object contention at runtime, i.e., detect contention, and
2. Dynamically increase number of partitions at runtime. The partitioning scheme first creates as many CMEMTHREAD objects as the number of nodes and if the contention is still not reduced, SQL Server will dynamically increase the number of CMEMTHREAD objects to match number of logical cores the instance is using.
In tests done in a lab environment, the performance monitor chart below shows up to 3x improvement in throughput from 3000 batch requests/sec to 8000 batch requests/sec (green line) and 60x reduction in waits from an average of 57000 waits/sec to 0 waits/sec (red line).
type | creation_options | partition_type | contention_factor | waiting_tasks_count | exclusive_access_count |
MEMOBJ_XP | 4194306 | 1 | 0 | 0 | 0 |
wait_type | waiting_tasks_count | wait_time_ms | max_wait_time_ms | signal_wait_time_ms |
CMEMTHREAD | 0 | 0 | 0 | 0 |
type | creation_options | partition_type | contention_factor | waiting_tasks_count | exclusive_access_count |
MEMOBJ_XP | 4194306 | 1 | 1.92846 | 10452972 | 4880465 |
wait_type | waiting_tasks_count | wait_time_ms | max_wait_time_ms | signal_wait_time_ms |
CMEMTHREAD | 293273 | 5383 | 2 | 2124 |
type | creation_options | partition_type | contention_factor | waiting_tasks_count | exclusive_access_count |
MEMOBJ_XP | 4194306 | 3 | 0.27525 | 8663203 | 30767614 |
wait_type | waiting_tasks_count | wait_time_ms | max_wait_time_ms | signal_wait_time_ms |
CMEMTHREAD | 7167 | 232 | 0 | 38 |
A new extended event sqlos.pmo_promotion is available to help with detecting if dynamic memory object partitioning is occurring. This event will fire when
1. an un-partitioned CMEMTHREAD object (i.e. only one copy exists at the instance level when it starts) is partitioned by node after contention is detected on the object
2. a node partitioned CMEMTHREAD object (i.e. as many copies as number of nodes the instance is using) is partitioned by cores after contention is detected on any node.
Sample Extended Event session:
name | timestamp | contention_factor | contention_factor_this_partition | partition_type | pmo_type | cpu_id | numa_node_id |
pmo_promotion | 6/25/16 8:52 AM | 2.187220097 | 2.187220097 | PartitionedByNode | MEMOBJ_XP | 3 | 0 |
pmo_promotion | 6/25/16 8:52 AM | 0.981935024 | 1.963870049 | PartitionedByCpu | MEMOBJ_XP | 36 | 1 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.