Recommendations for Index Maintenance with AlwaysOn Availability Groups
Published Jan 15 2019 04:27 PM 34.2K Views
Microsoft

First published on MSDN on Mar 03, 2015

 

 

SYMPTOMS

 

Consider the following scenario

 

  • The database is part of AlwaysOn Availability Groups
  • You run long and log-intensive transactions like Index maintenance/rebuilds

 

You observe one or more of the following symptoms:

 

  • Poor performing DML operations in availability databases on the primary replica if synchronous secondary replicas are present.
  • Huge log send queue on the primary replica.
  • Considerable log growth in those availability databases in which the index rebuild occurs
  • Redo backlog in secondary replicas.

 

CAUSE

 

Large maintenance operations like ALTER INDEX or CREATE INDEX can generate huge amounts of logged changes by the nature of the operation. These transactions can utilize parallelism to use multiple threads generating logged changes to accomplish the large transaction. This is in addition to the log generated by the regular day to day operations of your application.

 

For synchronous commit environments, these large transactions can create contention with other production changes, reducing the overall performance of the primary replica. Synchronous-commit mode emphasizes high availability over performance, at the cost of increased transaction latency. Under synchronous commit mode, SQL Server waits to send the transaction confirmation to the client until the secondary replica has hardened the log to disk.  All this logged activity is being captured by a per-database single threaded log capture thread. In addition, encryption and compression routines are also single threaded. When these single-threaded routines are processing very large amounts of logged activity, your application transactions may suffer performance degradation.

 

Log-intensive transactions like rebuilding indexes can cause the log file to grow significantly as the log cannot be truncated until redo has completed the changes in all secondary replicas.

 

 

MITIGATION

 

Transactions like ALTER INDEX or CREATE INDEX are log intensive by nature. You can't eliminate the log records generated but you can do intelligent maintenance to reduce the impact of the index rebuild on production activities.  Here are some steps to minimize the impact:

 

Mitigation Steps From Index Maintenance Perspective

 

  • Do you need to rebuild indexes in the first place? Most environments do not require index rebuilding in most cases. The reason is that indexes are read from disk the first time but are later accessed in memory. Therefore, any non-contiguous order of index pages on disk becomes irrelevant in a large percentage of the time.
  • Consider primarily updating statistics: statistics update with full scan is the primary benefit from index maintenance. Index fragmentation is rarely, if ever, the root cause for slow performance issues. Again, once index pages are read into memory from disk, on-disk fragmentation has little significance. 

 

In the infrequent cases where you do need to reorganize or re-build index, consider these:

 

  • Run index maintenance during off peak period.
  • Frequency of the maintenance should be considered with the goal of minimizing impact of log generation. You can do maintenance in phases, where each maintenance phase covers a subset of indexes at a particular time, instead of considering all indexes in a single go.
  • Rebuild indexes based on true need / impact to your production environment. The following scripts offer such dynamic index maintenance (defrag at a certain percentage of fragmentation or rebuild at a higher fragmentation). You can use one of these to accomplish this. But again, consider carefully if you need to anything more than keeping statistics up to date. These scripts are offered for convenience and are not supported by Microsoft so use after testing and validation on your side:

 

  • Use MAXDOP setting in the ALTER INDEX command to reduce the concurrent index rebuild activity. Operations such as creating, rebuilding, or dropping indexes can be resource intensive and can cause insufficient resources for other applications and database operations for the duration of the index operation. When this problem occurs, you can manually configure the maximum number of processors that are used to run the index statement by limiting the number of processors to use for the index operation. Lower MADOP setting can also reduce fragmentation with online Index rebuild operation. Below is an example of ALTER INDEX  REBUILD with MAXDOP =1:
    USE [AdventureWorks2014]
    GO
    ALTER INDEX [PK_Employee_BusinessEntityID]
    ON [HumanResources].[Employee]
    REBUILD WITH (MAXDOP=1, ONLINE= ON);
    GO​

 

 

 

  • Consider using table partitioning. This way you can rebuild portions of the index piece by piece by using ALTER INDEX…REBUILD PARTITION. SQL Server 2014 supports the ONLINE rebuild of partitioned indexes. Below is a sample script to rebuild only on partition 1:
    USE [AdventureWorks2014]
    GO
    ALTER INDEX [IX_TransactionHistory_ProductID]
    ON [Production].[TransactionHistory]
    REBUILD PARTITION = 1
    WITH (ONLINE = ON);
    GO​
  • For SQL server 2014 and later versions, additional ALTER INDEX parameters like WAIT_AT_LOW_PRIORITY, MAX_DURATION, and ABORT_AFTER_WAIT can also be used. Following example rebuilds index with the ONLINE option including the low priority lock options:

 

 

 

 

 

USE [AdventureWorks2014]
GO
ALTER INDEX ALL ON Production.Product
REBUILD WITH
(ONLINE = ON ( WAIT_AT_LOW_PRIORITY ( MAX_DURATION = 4 MINUTES, ABORT_AFTER_WAIT = BLOCKERS )));
GO

 

 

 

 

 

 

 

Mitigation Steps From Availability Group Perspective

 

For synchronous commit environments , before issuing long and log-intensive transactions like ALTER INDEX or CREATE INDEX, you may additionally consider to make all synchronous replicas asynchronous to help reduce the transactional latency. Once the index rebuild transactions are completed, the commit mode should be switched back to synchronous.

 

For both Synchronous AND Asynchronous environments, in general any step that can help with redo performance will positively impact long and log-intensive transactions on AG environments. Here are some key points to keep in mind:

 

  • A busy secondary such as a secondary with resource bottleneck or a large reporting workload on the secondary replica, can slow down the performance of the secondary replica because of resource contention, and the redo thread can fall behind. The redo thread on the secondary replica can also be blocked from making data definition language (DDL) changes by a long-running read-only query. The diagnosis and mitigations steps for these issues are discussed in the article Troubleshoot: Availability Group Exceeded RTO .
  • We recommend that you always switch to the High Performance power plan for all replica machines for all operating systems. In some circumstances this can have better performance for the single threaded redo process.

 

REFERENCES

 

2 Comments
Copper Contributor

The supposed "Best Practices" are not best practices and we're never meant to be taken as "Best Practices".  The recommendation for the original author and creator of REORGANIZE and REBUILD is to, and I quote... "take those numbers with a pinch of salt and don’t treat them as absolute."

https://www.sqlskills.com/blogs/paul/where-do-the-books-online-index-fragmentation-thresholds-come-f...

 

I'll also tell you that REORGANIZE does NOT work the way most people think it does.  It does NOT "follow the Fill Factor" in any way ever close to a REBUILD.  Instead, all it does is it tries to shrink the index by combining "some" pages UP TO the Fill Factor.  It does NOT create any Free Space above the Fill Factor.  The bad part about that is that 99.99% of the people in the world use 5 or 10% to 30% LOGICAL fragmentation to identify when REORGANIZE should be done and they do a REBUILD only if the LOGICAL fragmentation goes over 30%.  Since REORGANIZE DOES remove LOGICAL fragmentation, it very quickly gets "stuck" on indexes and they frequently never be rebuilt ever again.  That means that no Free Space is created above the Fill Factor.  That means that virtually every index you lowered the Fill Factor to prevent fragmentation is NOT working correctly and the fragmentation such an action was supposed to prevent is now perpetual and actually gets worse after each REORGANIZE because all the critical space above the Fill Factor is permanently full because REORGANIZE removes free space below the Fill Factor and compresses pages.

 

The phenomena prevents perfect wide or random distribution indexes from working correctly.  For example, a correctly maintained Random GUID index (and I have the proof in multiple repeatable tests) can withstand 100,000 per day inserts for 58 days (that's a total of 5.8 MILLION rows, folks!) with less than 1% logical fragmentation and absolutely no index maintenance during that entire period.  The way I did that was to STOP using REORGANIZE.  In a parallel test, REORGANIZE cause fairly massive and perpetual page splits all day every day during that same 58 day period along with fragmentation going over 5% and needing another REORGANIZE every 4 days.

 

During that same testing, it was proven that it's actually better to do NO index maintenance than it is to ever use REORGANIZE.  The only 2 places that you should use reorganize is to compress LOBs (it even sucks at that and needs a rebuild right afterwards) and to remove physical fragmentation on an Insert Hot Spot cause by an Insert/Update pattern on ever increasing indexes.  And, other fragmenting indexes are also affected in a similar manner... not just Random GUIDs.

 

For anyone interested (and, no.... this is NOT meant to be any form of spam), I'm giving the hour long version of my 2.5 hour presentation on the subject on the 28th of July at "EightKB".  You can register to attend at the following URL...

https://eightkb.online/

 

--Jeff Moden

 

Copper Contributor

As you can see from my first comment above, this article has been drastically changed for the better since that comment.  It's a real shame that we can't view the edit stream(s).  Not only would it interesting as to how this article reached its current state but it would be interesting for some to see what the previous states were.  Last and certain not the least, it would be nice to see who made what changes over time..

Co-Authors
Version history
Last update:
‎Mar 13 2023 03:26 PM
Updated by: