In the world of database management, achieving optimal query performance is a crucial goal.As data volumes grow and applications become more complex, understanding how queries are executed and optimized becomes paramount. This is where execution plans come into play. In this article, I would like to give an example, why a query might have a new execution plan. Of course, we have many factors that influence them but I would like to explain a real scenario working with a customer and why the execution plan changed.
An execution plan is a strategic roadmap that the database engine constructs to execute a SQL query efficiently. Think of it as a blueprint that guides the database engine in choosing the best path to retrieve and manipulate data. The execution plan outlines the sequence of operations, such as scanning tables, filtering rows, and joining data, necessary to fulfill the query.
Several factors influence the generation of execution plans, directly impacting query performance:
Statistics play a crucial role in query optimization. They provide insights into the distribution of data within tables, helping the query optimizer make informed decisions about the most efficient way to access and manipulate data.
Indexes significantly affect execution plans. They provide shortcuts for retrieving data by pre-sorting and organizing table data. The query optimizer considers index availability and selects plans that utilize them effectively.
The distribution of data across tables affects execution plans. Uneven data distribution can lead to suboptimal plan choices, resulting in slow query performance.
The complexity of a query influences the optimizer's plan selection. Complex queries may have multiple execution plan possibilities, and the optimizer must evaluate trade-offs to determine the most efficient approach.
A well-designed database schema can promote efficient execution plans. Properly normalized tables and relationships contribute to better query performance.
The query optimization process in Azure SQL Database involves a series of steps aimed at finding the optimal execution plan:
Parsing: The query is parsed to identify its structure and semantics.
Algebraic Simplification: The query is transformed using algebraic rules to simplify its structure without altering the result.
Logical Optimization: Different logical plans are generated, representing various ways to execute the query.
Physical Optimization: The logical plans are transformed into physical execution plans that outline the specific steps required for data retrieval.
Cost-Based Optimization: The optimizer estimates the cost of each physical plan based on factors like I/O, CPU usage, and memory usage. The plan with the lowest estimated cost is chosen.
To analyze execution plans, you can use these techniques:
Using the SET STATISTICS XML ON
command before executing a query captures the execution plan XML. This XML provides a detailed breakdown of the plan's components, operators, and estimated costs.
Azure SQL Database offers the Query Store feature, which automatically captures execution plans for queries. It provides historical data on plan changes, query performance, and plan choice reasons.
Analyzing execution plans involves understanding their components:
Operators: Operators represent specific actions performed on data, such as scans, joins, and filters.
Iterators: Iterators implement operators and execute specific actions on data, following the plan's instructions.
Estimates: Estimates provide information about expected row counts, helping to evaluate plan efficiency.
To optimize query performance using execution plans:
Create and maintain appropriate indexes based on query patterns to enhance data retrieval efficiency.
Regularly update statistics to ensure the query optimizer has accurate information for plan selection.
Rewrite complex queries to simplify their structure and reduce plan complexity.
Use query hints to guide the optimizer's decision-making process and enforce specific execution plans when necessary.
In this scenario, we have the following table definition:
-------------------------------------------------
-- Different Execution plans due to:
----- auto-created statistics by SQL.
----- Force Plan
----- An Index
-------------------------------------------------
DROP TABLE IF EXISTS [dbo].[Notes]
CREATE TABLE [dbo].[Notes](
[ID] [int] NULL,
[NAME] [varchar](200) NULL,
[id2] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_Notes] PRIMARY KEY CLUSTERED ([id2] ASC))
-------------------------------------------------
-- Create the store procedure
--------------------------------------------------
CREATE OR ALTER PROCEDURE dbo.GiveNotes
@N AS INT = null
AS
SELECT count(Name),name FROM Notes where ID<@n group by Name
--------------------------------------------------
-- Insert data
--------------------------------------------------
INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1)
INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1) FROM Notes
Also, we have the following Python code:
import os
#os.environ['TDSDUMP'] = 'stdout'
import pyodbc
import time
import random
def ConnectToTheDB(timeout):
try:
print('Connecting to the DB')
start_time = time.time()
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};server=tcp:servername.database.windows.net,1433;UID=XXX;PWD=XXX;database=dbname;APP=Test-Python")
conn.timeout=timeout
print("Connected to the Database %s seconds ---" % ((time.time() - start_time)) )
return conn
except BaseException as e:
print("An error occurred connecting to the DB - " + format(e))
return
def RunDifferentExecutionPlan():
try:
conn = ConnectToTheDB(0)
cursor = conn.cursor()
nLoop=1
while nLoop<100000:
nLoop=nLoop+1
numero = random.randint(1, 600000)
print('Run Different Execution Plan starting up....' + str(numero))
start_time = time.time()
cursor.execute("EXEC dbo.GiveNotes " + str(numero))
print("- (Value1) Execution Time: %s seconds ---" % ((time.time() - start_time)) )
conn.close()
except BaseException as e:
print("An error executing the command - " + format(e))
finally:
print('Run Different Execution Plan finished....')
RunDifferentExecutionPlan()
Starting the Python application, we could see the query is taking a few ms and results are:
Checking the Query Data Store we could see execution plan
So, in this situation, checking the statistics we could see that Azure SQL Database created two statistics. The distribution of the data is very low:
So, let's try to add rows here running the following query: INSERT INTO Notes (ID,Name) SELECT RAND()*(100000 - 1) + 1, 'Info:'+convert(varchar(200),RAND()*(100000 - 1) + 1) FROM Notes
Right now, our query took the following time:
and right now, we could see another execution execution plan:
Why? we have a new statistics that cause a new statistics for the modifications that we have.
Other example that we have is depending on the number of rows in the table, even forcing the execution query, we could have a new execution plan that we need to create a missing index.
That's mean that adding a new index, we could pass from
to
Enjoy!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.