The customer faced challenges managing their end customers' calling activity data, which underpinned their billing and payment services. As data volumes grew, processing times exceeded 24 hours, so reports reflected data that was more than a day old rather than near-real-time activity. After implementing the solution described below, the customer saw significant improvements in their billing and payment operations.
To address the issue, the team conducted a thorough analysis and implemented several measures. Initially, they tried to mitigate the problem by increasing the CPU and memory allocation on each machine. Unfortunately, this proved ineffective, as the issue resurfaced shortly afterward.
Subsequently, an open-source monitoring tool was installed on the SQL Server VM to gather comprehensive metrics. After four more hours of examining the logs, a critical discovery was made: the statistics had a consistently low sample rate, falling below 5%. To resolve this, the team recommended raising the sample rate parameter in the Update Statistics jobs, enabling PERSIST_SAMPLE_PERCENT so that future updates keep the maintenance job's sample rate, and increasing the MAXDOP setting for those jobs from 1 to 8.
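The recommended change can be sketched in a single UPDATE STATISTICS statement. The table, statistic name, and 50 percent sample rate below are illustrative, not the customer's actual values; PERSIST_SAMPLE_PERCENT and the MAXDOP option require a recent SQL Server build (2016 SP1 CU4 / SP2 or later for PERSIST_SAMPLE_PERCENT, and 2017 CU3 / 2016 SP2 or later for MAXDOP):

```sql
-- Illustrative sketch of the recommended maintenance-job change.
-- Table and statistic names are hypothetical stand-ins.
UPDATE STATISTICS [dbo].[CallActivity] [IX_CallDate]
WITH SAMPLE 50 PERCENT,        -- raise the sample rate above the <5% observed
     PERSIST_SAMPLE_PERCENT = ON,  -- keep this rate for subsequent updates
     MAXDOP = 8;               -- allow up to 8 threads for the stats scan
```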
When SQL Server creates or updates statistics and a sampling rate is not manually specified, the default sampling rate may not accurately represent the data distribution, leading to a degradation of query plan efficiency.
Let’s check the stats on this table. I’m searching for statistics on the Sales.SalesOrderHeaderBulk table that include the OrderDate column (part of the join predicate):
SELECT ss.stats_id, ss.name, ss.filter_definition, shr.last_updated, shr.rows,
    shr.rows_sampled, shr.steps, shr.unfiltered_rows, shr.modification_counter,
    shr.persisted_sample_percent,
    (shr.rows_sampled * 100.0) / shr.rows AS sample_percent
FROM sys.stats ss
INNER JOIN sys.stats_columns sc
    ON ss.stats_id = sc.stats_id AND ss.object_id = sc.object_id
INNER JOIN sys.all_columns ac
    ON ac.column_id = sc.column_id AND ac.object_id = sc.object_id
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) shr
WHERE ss.[object_id] = OBJECT_ID('[Sales].[SalesOrderHeaderBulk]')
AND ac.name = 'OrderDate';
We see a statistic for the index used above, with a sampling rate of 6 percent. Let’s say I really need to improve estimations, and a higher sampling rate could do just that.
So I’ll update this statistic. While I could search for a better (and higher) sampling rate, for this demo I’ll just update with a full scan:
UPDATE STATISTICS [Sales].[SalesOrderHeaderBulk] [IX_OrderDate] WITH FULLSCAN
Because I want this statistic to always be updated with a 100 percent sampling rate, I can now use the PERSIST_SAMPLE_PERCENT keyword:
UPDATE STATISTICS [Sales].[SalesOrderHeaderBulk] [IX_OrderDate]
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON
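To confirm the sampling rate is now persisted, we can re-check sys.dm_db_stats_properties for this statistic; this is a sketch using the same demo table and index names as above:

```sql
-- Verify the persisted sample rate on the statistic we just updated.
SELECT sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.persisted_sample_percent
FROM sys.stats ss
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp
WHERE ss.object_id = OBJECT_ID('[Sales].[SalesOrderHeaderBulk]')
  AND ss.name = 'IX_OrderDate';
-- After the FULLSCAN update with PERSIST_SAMPLE_PERCENT = ON,
-- persisted_sample_percent should report 100.
```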
The table name has been changed from the customer’s database for demonstration purposes.
Upon implementing these recommendations, the customer successfully resolved the processing delays, resulting in a remarkable improvement in the SQL Server's processing capacity.