Lesson Learned #414: Demystifying Execution Plans and Query Optimization in Azure SQL Database
Published Aug 08 2023 08:06 AM 3,777 Views

In the world of database management, achieving optimal query performance is a crucial goal.As data volumes grow and applications become more complex, understanding how queries are executed and optimized becomes paramount. This is where execution plans come into play. In this article, I would like to give an example, why a query might have a new execution plan. Of course, we have many factors that influence them but I would like to explain a real scenario working with a customer and why the execution plan changed. 

 

What are Execution Plans?

 

An execution plan is a strategic roadmap that the database engine constructs to execute a SQL query efficiently. Think of it as a blueprint that guides the database engine in choosing the best path to retrieve and manipulate data. The execution plan outlines the sequence of operations, such as scanning tables, filtering rows, and joining data, necessary to fulfill the query.

 

Factors Influencing Execution Plans

 

Several factors influence the generation of execution plans, directly impacting query performance:

1. Statistics

Statistics play a crucial role in query optimization. They provide insights into the distribution of data within tables, helping the query optimizer make informed decisions about the most efficient way to access and manipulate data.

2. Indexes

Indexes significantly affect execution plans. They provide shortcuts for retrieving data by pre-sorting and organizing table data. The query optimizer considers index availability and selects plans that utilize them effectively.

3. Data Distribution

The distribution of data across tables affects execution plans. Uneven data distribution can lead to suboptimal plan choices, resulting in slow query performance.

4. Query Complexity

The complexity of a query influences the optimizer's plan selection. Complex queries may have multiple execution plan possibilities, and the optimizer must evaluate trade-offs to determine the most efficient approach.

5. Database Design

A well-designed database schema can promote efficient execution plans. Properly normalized tables and relationships contribute to better query performance.

 

Query Optimization Process

 

The query optimization process in Azure SQL Database involves a series of steps aimed at finding the optimal execution plan:

  1. Parsing: The query is parsed to identify its structure and semantics.

  2. Algebraic Simplification: The query is transformed using algebraic rules to simplify its structure without altering the result.

  3. Logical Optimization: Different logical plans are generated, representing various ways to execute the query.

  4. Physical Optimization: The logical plans are transformed into physical execution plans that outline the specific steps required for data retrieval.

  5. Cost-Based Optimization: The optimizer estimates the cost of each physical plan based on factors like I/O, CPU usage, and memory usage. The plan with the lowest estimated cost is chosen.

 

Capturing and Analyzing Execution Plans

To analyze execution plans, you can use these techniques:

1. SET STATISTICS XML ON

Using the SET STATISTICS XML ON command before executing a query captures the execution plan XML. This XML provides a detailed breakdown of the plan's components, operators, and estimated costs.

2. Query Store

Azure SQL Database offers the Query Store feature, which automatically captures execution plans for queries. It provides historical data on plan changes, query performance, and plan choice reasons.

 

Analyzing Execution Plans

 

Analyzing execution plans involves understanding their components:

  1. Operators: Operators represent specific actions performed on data, such as scans, joins, and filters.

  2. Iterators: Iterators implement operators and execute specific actions on data, following the plan's instructions.

  3. Estimates: Estimates provide information about expected row counts, helping to evaluate plan efficiency.

 

Optimizing Query Performance

To optimize query performance using execution plans:

1. Index Optimization

Create and maintain appropriate indexes based on query patterns to enhance data retrieval efficiency.

2. Statistics Maintenance

Regularly update statistics to ensure the query optimizer has accurate information for plan selection.

3. Query Rewriting

Rewrite complex queries to simplify their structure and reduce plan complexity.

4. Query Hints

Use query hints to guide the optimizer's decision-making process and enforce specific execution plans when necessary.

 

Playing with an example:

 

In this scenario, we have the following table definition:

 

 

 

-------------------------------------------------
-- Different Execution plans due to:
----- auto-created statistics by SQL.
----- Force Plan 
----- An Index 
-------------------------------------------------
DROP TABLE IF EXISTS [dbo].[Notes]

CREATE TABLE [dbo].[Notes](
	[ID] [int] NULL,
	[NAME] [varchar](200) NULL,
	[id2] [int] IDENTITY(1,1) NOT NULL,
 CONSTRAINT [PK_Notes] PRIMARY KEY CLUSTERED ([id2] ASC))

-------------------------------------------------
-- Create the store procedure
--------------------------------------------------
CREATE OR ALTER PROCEDURE dbo.GiveNotes
   @N AS INT = null
AS 
SELECT count(Name),name FROM Notes where ID<@n group by Name

--------------------------------------------------
-- Insert data
--------------------------------------------------
INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1) 
INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1) FROM Notes

 

 

Also, we have the following Python code:

 

 

import os
#os.environ['TDSDUMP'] = 'stdout'
import pyodbc
import time
import random

def ConnectToTheDB(timeout):
    try:
        print('Connecting to the DB')
        start_time = time.time()    
        conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};server=tcp:servername.database.windows.net,1433;UID=XXX;PWD=XXX;database=dbname;APP=Test-Python")  
        conn.timeout=timeout
        print("Connected to the Database %s seconds ---" % ((time.time() - start_time)) )
        return conn
    except BaseException as e:
        print("An error occurred connecting to the DB - " + format(e))
        return 

def RunDifferentExecutionPlan():
    try:
        conn = ConnectToTheDB(0)
        cursor = conn.cursor()
        nLoop=1
        while  nLoop<100000:
            nLoop=nLoop+1
            numero = random.randint(1, 600000)        
            print('Run Different Execution Plan starting up....' + str(numero))
            start_time = time.time()    
            cursor.execute("EXEC dbo.GiveNotes " + str(numero))
            print("- (Value1) Execution Time: %s seconds ---" % ((time.time() - start_time)) )
        conn.close()
    except BaseException as e:
        print("An error executing the command - " + format(e))
    finally:
        print('Run Different Execution Plan finished....')

RunDifferentExecutionPlan()

 

 

 

Starting the Python application, we could see the query is taking a few ms and results are:

 

  • Run Different Execution Plan starting up....120327 - (Value1) Execution Time: 0.04686784744262695 seconds ---
  • Run Different Execution Plan starting up....428186 - (Value1) Execution Time: 0.0344843864440918 seconds ---
  • Run Different Execution Plan starting up....550799 - (Value1) Execution Time: 0.03460264205932617 seconds ---
  • Run Different Execution Plan starting up....138980 - (Value1) Execution Time: 0.034508466720581055 seconds ---
  • Run Different Execution Plan starting up....544059 - (Value1) Execution Time: 0.0342402458190918 seconds ---

Checking the Query Data Store we could see execution plan

 

Jose_Manuel_Jurado_0-1691505610368.png

 

So, in this situation, checking the statistics we could see that Azure SQL Database created two statistics. The distribution of the data is very low:

 

  • One for ID column
  • Another for Name column 

 

So, let's try to add rows here running the following query: INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1) FROM Notes

 

Right now, our query took the following time:

 

  • Run Different Execution Plan starting up....264670 - (Value1) Execution Time: 2.550680637359619 seconds ---
  • Run Different Execution Plan starting up....245952 - (Value1) Execution Time: 2.746143102645874 seconds ---
  • Run Different Execution Plan starting up....131653 - (Value1) Execution Time: 2.5940659046173096 seconds ---
  • Run Different Execution Plan starting up....174082 - (Value1) Execution Time: 2.577497959136963 seconds ---

and right now, we could see another execution execution plan:

 

Jose_Manuel_Jurado_1-1691505976708.png

 

Why? we have a new statistics that cause a new statistics for the modifications that we have. 

 

Jose_Manuel_Jurado_2-1691506050688.png

 

Other example that we have is depending on the number of rows in the table, even forcing the execution query, we could have a new execution plan that we need to create a missing index. 

 

Jose_Manuel_Jurado_0-1691506613426.png

 

That's mean that adding a new index, we could pass from 

 

  • Run Different Execution Plan starting up....549809 - (Value1) Execution Time: 54.04367709159851 seconds ---
  • Run Different Execution Plan starting up....442605 - (Value1) Execution Time: 56.18192672729492 seconds ---
  • Run Different Execution Plan starting up....251577 - (Value1) Execution Time: 34.42381572723389 seconds ---

 

to 

 

  • Run Different Execution Plan starting up....442605 - (Value1) Execution Time: 2.18192672729492 seconds ---
  • Run Different Execution Plan starting up....251577 - (Value1) Execution Time: 2.42381572723389 seconds ---

Enjoy!

1 Comment
Version history
Last update:
‎Aug 08 2023 08:09 AM
Updated by: