Forum Discussion

GooTen's avatar
GooTen
Copper Contributor
Jan 18, 2024

Assistance Requested for SQL Execution Efficiency Issue

I am experiencing a significant difference in execution times between two SQL queries. The first query (sql1) takes about 30 seconds to execute, while the second query (sql2) completes in just 0.1 seconds.

sql1:

 

 

SELECT
    top 20 row_number() over (ORDER BY User.code Asc) as rowno,
    User.id AS ID,
    User.code AS Code,
    User.name_CHS AS Name,
    User.usergroup AS UserGroup,
    User.sysorgid AS SysOrgId,
    User.tenantid AS TenantId,
    User.seclevel AS SecLevel,
    User.usertype AS UserType,
    User.note AS Note,
    User.LastLoginTime AS LastLoginTime,
    User.OrgIdPath AS OrgIdPath
FROM
    User
WHERE
    User.id IN (
        SELECT DISTINCT makerid FROM f2022 WHERE ledger = 'b4cb0d26-dd2f-4ae4-bb7a-861ba9dc2fbb'
        UNION
        SELECT DISTINCT makerid FROM f2023 WHERE ledger = 'b4cb0d26-dd2f-4ae4-bb7a-861ba9dc2fbb'
    )
ORDER BY
    User.code Asc;

 

 

 

sql2:

 

 

SELECT
    top 20 row_number() over (ORDER BY User.code Asc) as rowno,
    User.id AS ID,
    User.code AS Code,
    User.name_CHS AS Name,
    User.usergroup AS UserGroup,
    User.sysorgid AS SysOrgId,
    User.tenantid AS TenantId,
    User.seclevel AS SecLevel,
    User.usertype AS UserType,
    User.note AS Note,
    User.LastLoginTime AS LastLoginTime,
    User.OrgIdPath AS OrgIdPath
FROM
    User
WHERE
    User.id IN (
        SELECT DISTINCT makerid FROM f2022 WHERE ledger = 'b4cb0d26-dd2f-4ae4-bb7a-861ba9dc2fbb'
    )
ORDER BY
    User.code Asc;

 

 

 

 

The main difference between these two queries is the inclusion of a UNION operator in the subquery of sql1, whereas sql2 does not have this. Additionally, the data volume in both f2022 and f2023 is roughly equivalent.

I am seeking assistance to understand why there is such a large discrepancy in execution times and how I might optimize the first query. Thank you in advance for your help.

  • LainRobertson's avatar
    LainRobertson
    Silver Contributor

    GooTen 

     

    Hi, Chen.

     

    Assuming these queries are against two tables contained in the same database:

     

    • If you run the second query but change the table from [fy2022] to [fy2023], does it still run fast or slow the first time around?
    • If it's slow the first time around, check that you have an index (NONCLUSTERED would almost certainly be the preferable type) on the [makerid] column of the [fy2023] table.

     

    It's pure guesswork on my part but it sounds like your [fy2022] table has an index on the [makerid] column while your [fy2023] table does not.

     

    Cheers,

    Lain

    • GooTen's avatar
      GooTen
      Copper Contributor

      LainRobertson 

      Thank you for your response.

      1. Even after switching the tables in the first query, the execution remains fast.
      2. In both f2022 and f2023, there are no indexes on makerid.

      I can upload screenshots of the execution plans for these queries, but they are in Chinese. I'm not sure if you'll be able to understand them.

      sql1:

       

      sql2:

       

      • LainRobertson's avatar
        LainRobertson
        Silver Contributor

        GooTen 

         

        Hi, Chen.

         

        These seem to be the estimated execution plans. Are you able to instead include the actual execution plans?

         

        Here's the button from SSMS for including actual execution plans.

         

         

        The actual execution plan shows how much real time (in milliseconds) was spent at each stage:

         

         

         

        If you can also add the following two lines to the end of each query after the "SORT BY" statement - only while you're capturing the actual execution plans (remove it again afterwards), that would be appreciated:

         

        OPTION
        	(RECOMPILE);

         

        Also, I have to correct myself as I wasn't paying proper attention to what I had written about the indexes.

         

        In my previous reply, I've incorrectly asked about an index on [makerid] when I'd meant to refer to the [ledger] column.

         

        Similarly, is the [id] column in the [User] table indexed?

         

        Your execution plans do involve a lot of table scanning, however, this may not be an issue. It just means it might be worth looking into, and the actual execution plan will help quantify that.

         

        Cheers,

        Lain

  • olafhelper's avatar
    olafhelper
    Bronze Contributor

    The main difference between these two queries is the inclusion of a UNION operator in the subquery of sql1,


    A UNION operator performs always a sort operator to remove duplicates.

    In you subquery + EXISTS duplicates don't matter, so change it to UNION ALL to avoid a sort.

    See UNION (Transact-SQL) - SQL Server | Microsoft Learn

    • UNION ALL - Includes duplicates.
    • UNION - Excludes duplicates.
    • GooTen's avatar
      GooTen
      Copper Contributor
      I have tried using UNION ALL, as well as changing the query to use EXISTS or JOIN, but the results were the same. I have added screenshots of the execution plans above. Could you please take a look and help me understand?
      • LainRobertson's avatar
        LainRobertson
        Silver Contributor

        GooTen 

         

        Hi, Chen.

         

        The actual execution plans confirm that you should have a NONCLUSTERED index on the [ledger] column - in both tables. Half the long-running time is spent running the table scans on the two GLAccount tables.

         

        Unfortunately, while we can see that you have some indexes on each table, we can't see what columns are included in those index definitions. However, it seems safe to say that none of them apply to the [ledger] column given it features as a suggestion - the green text near the top - in the second execution plan.

         

        So, there's two things I'd be looking to do here:

         

        1. Create a new NONCLUSTERED index on the 2022 and 2023 tables;
        2. Re-work the query to use JOINs (not required but my personal preference for this scenario).

         

        Index creation

         

        CREATE NONCLUSTERED INDEX
        	IX_admin_F2022_ledger
        ON	
        	F2022 (ledger)
        INCLUDE
        	(makerid);
        GO
        
        CREATE NONCLUSTERED INDEX
        	IX_admin_F2023_ledger
        ON	
        	F2023 (ledger)
        INCLUDE
        	(makerid);
        GO

         

         

        JOIN-based query

         

        SELECT
        	DISTINCT
            [User].[id] AS [ID]
            , [User].[code] AS [Code]
            , [User].[name_CHS] AS [Name]
            , [User].[usergroup] AS [UserGroup]
            , [User].[sysorgid] AS [SysOrgId]
            , [User].[tenantid] AS [TenantId]
            , [User].[seclevel] AS [SecLevel]
            , [User].[usertype] AS [UserType]
            , [User].[note] AS [Note]
            , [User].[LastLoginTime] AS [LastLoginTime]
            , [User].[OrgIdPath] AS [OrgIdPath]
        FROM
        	[User]
        	LEFT OUTER JOIN [F2022] ON
        		[User].[id] = [F2022].makerid
        	LEFT OUTER JOIN [F2023] ON
        		[User].[id] = [F2023].makerid
        WHERE
        	(
        		[F2022].[ledger] = 'b4cb0d26-dd2f-4ae4-bb7a-861ba9dc2fbb'
        		AND [F2022].[makerid] IS NOT NULL
        	)
        	OR
        	(
        		[F2023].[ledger] = 'b4cb0d26-dd2f-4ae4-bb7a-861ba9dc2fbb'
        		AND [F2023].[makerid] IS NOT NULL
        	)
        ORDER BY
        	[User].[id]
        OPTION
        	(RECOMPILE);

         

         

        Note: I didn't include the TOP 20 as I'm curious how it performs across the whole data set, however, you can easily add the "TOP 20" statement immediately below the "DISTINCT" statement if you like.

         

        Once you've created those indexes, you can see how your original queries perform, and optionally how the suggested example I've supplied above performs (I've left the OPTION (RECOMPILE) for testing purposes).

         

        Cheers,

        Lain

Resources