In the past week, we have dedicated our efforts to a service request where our customer asked questions:
In the following discussion, I would like to share with you the details of this explanation.
The first thing is to know, how buffer pool works: Azure SQL Server employs a sophisticated approach to manage its buffer pool, a critical component in the overall performance of the database system. The lifecycle of data within the buffer pool, especially in response to query execution, involves several key stages:
When a query is submitted to Azure SQL Server, it first assesses whether the required data pages are available in the buffer pool. The buffer pool is a cache in the server’s memory, holding copies of data pages read from the database files on storage.
For this scenario, let's try to perform an example, with the following table:
CREATE TABLE [MSxyzTest].[_x_y_z_MS_HighCPU](
[Id] [bigint] NOT NULL,
[TextToSearch] [varchar](200) NULL,
PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
DECLARE @i bigint = 1;
WHILE @i <= 4000000
BEGIN
INSERT INTO MSxyzTest._x_y_z_MS_HighCPU (Id, TextToSearch)
VALUES (@i, NEWID()); -- NEWID() genera un valor aleatorio para el ejemplo
SET @i = @i + 1;
END;
Once we have the data we are going to run a CHECKPOINT to save the data to the storage and run DBCC DROPCLEANBUFFER to clean the buffer pool.
CHECKPOINT
:
Purpose: The CHECKPOINT
command in SQL Server is used to manually trigger the process of writing all dirty pages (pages in the buffer pool that have been modified but not yet written to disk) and log records to disk. By doing so, it ensures that the data is stored persistently.
Use Cases: This command is particularly useful in testing environments for simulating a scenario where SQL Server needs to recover from an unexpected shutdown. Executing a CHECKPOINT
ensures that all the recent changes are safely written to disk.
DBCC DROPCLEANBUFFERS
:
Purpose: DBCC DROPCLEANBUFFERS
is a command used to clear the buffer pool, specifically by removing the clean buffers. It's primarily used to test the performance of queries without the benefit of the cache, mimicking a scenario where queries have to retrieve data from the disk.
Use Cases: This is especially useful in a development or testing environment when you want to measure the performance of queries and ensure that your tests are not artificially benefitted by data already being cached in the buffer pool.
Non-Production Environments: Both CHECKPOINT
and DBCC DROPCLEANBUFFERS
should ideally be executed only in non-production environments. They are valuable tools for testing and understanding how queries and data management processes behave under certain conditions.
Impact on Performance: It's critical to note that these commands can have significant implications on performance:
CHECKPOINT
can cause a surge in disk I/O as it writes a large amount of data to disk.DBCC DROPCLEANBUFFERS
can lead to a temporary performance degradation. When the buffer pool is cleared, subsequent queries will not benefit from cached data, resulting in increased disk I/O and slower query performance as data must be read from disk again.Caution in Production: In a production environment, these commands can disrupt the normal optimization and efficiency of the database. Clearing the buffer pool or forcing checkpoints can lead to unnecessary strain on the system, impacting the performance of the database for all users.
The next action we are going to run the following query: select COUNT(*) from [MSxyzTest].[_x_y_z_MS_HighCPU] and we could see two things:
If we run the following query we could see the distribution in the buffer pool about our table and space usage.
DECLARE @TableObjectId INT;
SET @TableObjectId = OBJECT_ID('[MSxyzTest].[_x_y_z_MS_HighCPU]');
SELECT
bd.page_type,
o.name AS ObjectName,
i.name AS IndexName,
bd.file_id,
bd.page_level,
COUNT(*) AS NumberOfPages,
SUM(bd.row_count) AS TotalRowCount,
AVG([bd].[references]) AS AvgReferences,
MIN([bd].[references]) AS MinReferences,
MAX([bd].[references]) AS MaxReferences,
AVG(bd.row_count) AS AvgRowCount,
MIN(bd.row_count) AS MinRowCount,
MAX(bd.row_count) AS MaxRowCount,
AVG(bd.read_microsec) AS AvgReadMicrosec,
MIN(bd.read_microsec) AS MinReadMicrosec,
MAX(bd.read_microsec) AS MaxReadMicrosec,
AVG(CAST(bd.is_modified AS INT)) AS AvgIsModified,
MIN(CAST(bd.is_modified AS INT)) AS MinIsModified,
MAX(CAST(bd.is_modified AS INT)) AS MaxIsModified,
SUM(CAST(bd.is_modified AS INT)) AS SumIsModified
FROM
sys.dm_os_buffer_descriptors AS bd
INNER JOIN
sys.allocation_units AS au
ON bd.allocation_unit_id = au.allocation_unit_id
INNER JOIN
sys.partitions AS p
ON au.container_id = p.partition_id
INNER JOIN
sys.objects AS o
ON p.object_id = o.object_id
LEFT JOIN
sys.indexes AS i
ON p.index_id = i.index_id AND p.object_id = i.object_id
WHERE
o.object_id = @TableObjectId
GROUP BY
bd.page_type,
o.name,
i.name,
bd.file_id,
bd.page_level;
So, as expected if we run again the query: select COUNT(*) from [MSxyzTest].[_x_y_z_MS_HighCPU] and we run SELECT * FROM sys.dm_db_resource_stats the results will be memory in use and data io low.
So, the first answer will be that data io will be when we need to bring data from storage to the memory. So, for this reason, it is very important to use buffer pool and have the most frecuent data in the memory to improve the performance and SQL Server will not release the memory until the following aspects.
For example, let's update 10000 rows, Update top(1000) [MSxyzTest].[_x_y_z_MS_HighCPU] set TextToSearch ='Example _x_2' running the same query about buffer descriptions we could see an indicators that some pages in memory have been modified. If we run only DBCC DROPCLEANBUFFER only the pages that have not been modified will be release, we need to run a checkpoint to free these pages from the buffer pool.
DECLARE @TableObjectId INT;
SET @TableObjectId = OBJECT_ID('[MSxyzTest].[_x_y_z_MS_HighCPU]');
SELECT
bd.page_type,
o.name AS ObjectName,
i.name AS IndexName,
bd.file_id,
bd.page_level,
COUNT(*) AS NumberOfPages,
SUM(bd.row_count) AS TotalRowCount,
AVG([bd].[references]) AS AvgReferences,
MIN([bd].[references]) AS MinReferences,
MAX([bd].[references]) AS MaxReferences,
AVG(bd.row_count) AS AvgRowCount,
MIN(bd.row_count) AS MinRowCount,
MAX(bd.row_count) AS MaxRowCount,
AVG(bd.read_microsec) AS AvgReadMicrosec,
MIN(bd.read_microsec) AS MinReadMicrosec,
MAX(bd.read_microsec) AS MaxReadMicrosec,
AVG(CAST(bd.is_modified AS INT)) AS AvgIsModified,
MIN(CAST(bd.is_modified AS INT)) AS MinIsModified,
MAX(CAST(bd.is_modified AS INT)) AS MaxIsModified,
SUM(CAST(bd.is_modified AS INT)) AS SumIsModified
FROM
sys.dm_os_buffer_descriptors AS bd
INNER JOIN
sys.allocation_units AS au
ON bd.allocation_unit_id = au.allocation_unit_id
INNER JOIN
sys.partitions AS p
ON au.container_id = p.partition_id
INNER JOIN
sys.objects AS o
ON p.object_id = o.object_id
LEFT JOIN
sys.indexes AS i
ON p.index_id = i.index_id AND p.object_id = i.object_id
WHERE
o.object_id = @TableObjectId
GROUP BY
bd.page_type,
o.name,
i.name,
bd.file_id,
bd.page_level;
Finally, we have other situations that Azure SQL Server needs to release memory from the buffer pool and here the counter called Life Page Expectancy is our friend to analyse if we need more memory or not for our current workload needs.
The buffer pool in Azure SQL Server plays a pivotal role in optimizing query performance by reducing the reliance on slower stroage operations. Its management involves a delicate balance between holding frequently accessed data in memory for rapid access and ensuring data persistence and integrity through regular checkpoints. This efficient handling of the buffer pool is central to the performance and reliability of Azure SQL Server databases.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.