Author(s): Sarath Sasidharan is a Program Manager in Azure Synapse Customer Success Engineering (CSE) team.
Synapse Dedicated pools have been battle tested at enterprise customers across the globe. We deal with data in the magnitude of PetaBytes. Synapse can provide you with the scale of the cloud and the high performance required for your enterprise-grade requirements. The key to maximizing your performance is to follow best practices, check out best practices for dedicated SQL pools in Azure Synapse Analytics
Failure to do so causes performance issues. In such scenarios, is it important to understand where the bottlenecks are. This blog focuses on the different steps a query goes through; from the time the query is fired from the client until it returns back. Delay caused in any of the steps would impact the overall run-time of the query and hence indicate degraded performance.
This architecture represents a query submission done via a user/client application. The following steps are highlighted in bold numbers :
In order to understand this, let's dive into each of these steps in detail.
This is the first major step in your query lifecycle. Different steps are executed here:
Parsing and binding are usually quick and done in no time. The third step could at times be a bottleneck. If you have a very complicated query, with a large number of tables involved the search space calculation can cause delays. The algorithms are smart enough to handle many complicated scenarios and have thresholds on when to stop calculations. However, in some scenarios, bad queries can impact this.
The best fix for this step is to see if the query quality can be improved. If this has already been done, you may consider adding the FORCE ORDER hint to your query. Force order will enforce the order of joins to be executed as written as opposed to the optimizer reordering joins to attempt optimization. For memo compilation, this hint also reduces the search space of possible plans so there is less work to do in this step. This needs to be used with caution since in most scenarios it's best to leave it to the Query optimizer to decide the best plan.
Memo compilation results from the previous step are used as a starting point for this stage. The results are used to :
This step is where the actual execution of the query starts. The DSQL plan generated in the previous step contains SQL/DMS statements which are sent to the compute nodes. Based on the plan, there are instructions on where these statements need to run and if it is a data movement step or a SQL step.
This step contributes to a majority of query performance issues. They are mostly caused due to the wrong usage of the platform and poor query quality.
This concludes the query lifecycle in Synapse. These steps and remediations have been applied to our customers and we have seen massive improvements in performance and efficient usage of the platform.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.