Impact of schema changes in the primary replica on read-only queries in the secondary replica

Microsoft

May 25, 2023

Introduction

In High Availability architectures, every individual database, elastic pool database, and managed instance within the Premium and Business Critical service tiers is provisioned automatically with a primary read-write replica, accompanied by one or more secondary read-only replicas.

To optimize performance and effectively manage application access profiles, a widely adopted practice is to delegate specific read-only workloads, such as reporting and analytics, to the read-only replica. By offloading these workloads, the performance of the primary read-write replica is enhanced, and the system gains the ability to scale out application access profiles.

This practice offers several benefits. Firstly, it improves the overall performance of the High Availability architecture by reducing the load on the primary read-write replica. Secondly, it allows for the efficient utilization of system resources by distributing workloads and leveraging read-only replicas. Furthermore, offloading read-only workloads enables organizations to effectively handle increasing demands and ensure smooth and uninterrupted operations.

By strategically leveraging the primary and secondary replicas in this manner, organizations can achieve optimal performance, scalability, and resource utilization within their High Availability architectures.

Read-write and Read-only Replicas

If the SQL connection string is configured with ApplicationIntent=ReadOnly, the application will be automatically redirected to a read-only replica of that SQL database or managed instance. For information on how to use the ApplicationIntent property, see Specifying Application Intent.

The following diagram illustrates a typical business use case where both OLTP and analytics workloads are configured.

For detailed information on read-only replica, see concept of read-only replica.

The Impact

Commonly, it is believed that a read-only replica is isolated, and queries running on the primary read-write replica won't affect the read-only replica. However, there are scenarios like schema changes, where queries executed on the primary replica can impact the performance of read-only queries on the read replica.

When you create, update or drop an index on the primary replica, it can cause blocking and locking on the secondary replicas. This is because such index tasks require a schema modification lock (Sch-M lock), which is a very high-level lock that is used to protect the schema of the table. When the primary replica is creating or updating the index, it will hold this Sch-M lock on the table, which can prevent other transactions from making schema changes or accessing the table until the lock is released.

The problem occurs when queries are executed on the secondary replicas. These queries need to acquire schema stability locks (Sch-S locks) on the table, which are required to ensure the table's schema does not change during query execution. However, because the primary replica holds the Sch-M lock, these queries cannot acquire the necessary Sch-S locks. As a result, they will be blocked and can result in the observed LCK_M_* waits.

Once the primary replica has finished creating the index and released the Sch-M lock, the Sch-S locks can be acquired, and the blocked queries can resume. However, the length of the blocking and the duration of the LCK_M_* waits can vary depending on the size of the index being created, the number of queries running on the secondary replicas, and the complexity of the queries themselves.

Solution

There are a few potential solutions that can be implemented to avoid blocking issue,

Perform schema modification operations during periods of low usage.
Utilize the Online Index rebuild option.
Use the Partition switching mechanism.

It is important to note that Online Index rebuild is a resource-intensive operation, and it may take longer to rebuild the index when using this option. Additionally, read-only queries can be run on the read replica during the Online Index rebuild process, but there may be some performance impact due to the lack of an index for new data.

For more information and best practices, see Index maintenance in Azure SQL database and Azure SQL Managed Instance

The Partition switching mechanism is another option. This involves creating a standby staging table with the same schema and index as the original table.

The data is loaded into the staging table using bulk load with the index disabled, and then enabled at the end after the rebuild. Finally, the staging table is switched to the main table using partition functionality within a transaction.

A simplified partition-switching syntax involving the whole table using the WAIT_AT_LOW_PRIORITY option,

ALTER TABLE dbo.FactSales_Staging SWITCH TO dbo.FactSales
      WITH (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 2 MINUTES, ABORT_AFTER_WAIT = BLOCKERS));

WAIT_AT_LOW_PRIORITY option allows us to control the switching behavior better, see more on WAIT_AT_LOW_PRIORITY

Feedback and Suggestions

If you have feedback or suggestions for improving this data migration asset, please send an email to Azure Databases SQL Customer Success Engineering Team. Thank you for your support!

Updated May 25, 2023

Version 1.0

database platform cse

database platform cse – sql

nataraje

Microsoft

Joined March 17, 2020

View Profile

Modernization Best Practices and Reusable Assets Blog

Follow this blog board to get notified when there's new activity