Today, I worked on a very interesting performance service request when our customer is using Linked Server and/or Elastic Query joining with other local and remote queries. Following, I would like to share with you my lessons learned about the tests done.
We have the following scenario:
- TableRemote1 that contains 22M rows running in a SQL Server remote server
- TableRemote2 that contains 10M rows running in a SQL Server remote server.
- TableLocal that contains 10M rows running in a SQL Server local server.
Test 1: Running the following query:
SELECT TOP 100000 * FROM TableRemote1
INNER JOIN TableLocal ON TableRemote1.Col1 = TableLocal.Col1
The execution plan was:
- Run a query against SQL Server Remote Server to obtain all the rows of TableRemote1
- After it, join the data using Col1 with TableLocal Table.
Test 2: Running the following query:
SELECT TOP 100000 * FROM TableRemote1
INNER JOIN TableRemote2 ON TableRemote1.Col1 = TableRemote2.Col1
INNER JOIN TableLocal ON TableRemote1.Col1 = TableLocal.Col1
The execution plan was:
- SQL SERVER sends the following query to the SQL SERVER remote server: SELECT ... FROM TableRemote1,TableRemote2 WHERE TableRemote1.Col1 = TableRemote2.Col1. which is great in terms of performance.
- After it, join the data using Col1 with TableLocal Table.
Test 3: Running the following query:
SELECT TOP 100000 * FROM TableRemote1
LEFT JOIN TableRemote2 ON TableRemote1.Col1 = TableRemote2.Col1
INNER JOIN TableLocal ON TableRemote1.Col1 = TableLocal.Col1
The execution plan was:
- SQL SERVER sends two queries:
- First, obtain all the rows of the following query to SELECT ... FROM TableRemote1
- Once the first query finished, runs, SELECT .. FROM TableRemote2
- After it, join the data using Col1 with TableLocal Table.
- Here, I saw an difference between Elastic Query and Linked Server. In Linked Server seems that we have some improvements that we don't have in Elastic Query, for example:
- Running some DBCC SHOWSTATISTICS to review the cardinality.
- Seems that the queries against TableRemote1 and TableRemote2 are running in parallel meanwhile Elastic Query is running one by one.
So, my lessons learned after reviewing these scenarios:
- Use sp_execute_remote for Elastic Database Query and OpenQuery for Linked Server to have a single resulset from SQL Server Remote Server, will be the best and faster solution, because the TSQL will force query execution on the remote server.
- If could be possible, try to allocate all the tables that you need to run the left join in an unique server (local/remote).
Enjoy!
Updated Mar 15, 2022
Version 2.0Jose_Manuel_Jurado
Microsoft
Joined November 29, 2018
Azure Database Support Blog
Follow this blog board to get notified when there's new activity