Forum Discussion

dhanashreeee's avatar
dhanashreeee
Copper Contributor
Jan 13, 2024
Solved

A very basic SQL join question

Hi everyone,

 

Apologizing in advance for this very basic SQL join question. This gets asked in the interviews a lot and I get confused everytime.Perhaps my basics aren't as clear. So I was hoping anyone would help me clear this. I have tried performing this partically but it confuses me even more. There are two tables. Table 1 with one column having values 1,1,2,2 and Table 2 with one column having values 1,1,1,2,2,2. What would the output of right join and left join be for this?

  • dhanashreeee 

     

    Hi, Dhanashree.

     

    Perhaps it's intended as something of a trick question where the answer is "both the right and left join queries produce the same output".

     

    In fact, because [table1] and [table2] contain exactly the same distinct values (being '1' and '2'), you will get the same output for all the "normal" join types (INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER). Only CROSS JOIN will give you a different result, but they're not asking about CROSS JOIN so I won't go into that.

     

    The queries will produce twelve rows:

     

    • There are two '1' values in [table1] and three matching '1' values in [table2] (i.e. 2 rows x 3 rows = 6 rows);
    • There are two '2' values in [table1] and three matching '2' values in [table2] (i.e. 2 rows x 3 rows = 6 rows)
    • 6 rows + 6 rows = 12 rows.

     

    So, the key takeaway here is that if they given you the following:

     

    • Both tables contain exactly the same values (doesn't matter how many times each is repeated) and no extra values that only exist in one of the tables;
    • The queries are using LEFT, RIGHT, INNER or OUTER joins.

     

    And they then ask you "what would the output be", the answer is "the output is the same for both queries".

     

    Additional information

    None of what's below is necessary for answering your question, but might help if you want to understand joins a bit better. There's also some links to relevant Microsoft documentation:

     

     

    First, here's some T-SQL for creating the tables and queries you've alluded to in your question.

     

    CREATE TABLE [table1]
    (
    	[id] [int]
    )
    GO
    
    CREATE TABLE [table2]
    (
    	[id] [int]
    )
    GO
    
    INSERT INTO
    	[table1]
    VALUES
    	(1)
    	, (1)
    	, (2)
    	, (2)
    GO
    
    INSERT INTO
    	[table2]
    VALUES
    	(1)
    	, (1)
    	, (1)
    	, (2)
    	, (2)
    	, (2)
    GO
    
    -- Query 1 showing the LEFT OUTER JOIN.
    SELECT
    	[table1].[id] AS [table1.id]
    	, [table2].[id] AS [table2.id]
    FROM
    	[table1] LEFT OUTER JOIN
    	[table2] ON [table1].[id] = [table2].[id]
    GO
    
    -- Query 2 showing the RIGHT OUTER JOIN.
    SELECT
    	[table1].[id] AS [table1.id]
    	, [table2].[id] AS [table2.id]
    FROM
    	[table1] RIGHT OUTER JOIN
    	[table2] ON [table1].[id] = [table2].[id]
    GO

     

    Output (from both queries)

    As you can see, the output is the same for both queries (Query 1 is top; query 2 is bottom).

     

     

    This because every distinct value from [table1] has a match in [table2] and vice versa.

     

    Using a simple mathematics analogy, it's like asking how is the answer for "2 x 3" different to "3 x 2", for which the answer is "there is no difference as they are the same".

     

    So, what's the difference between LEFT OUTER and RIGHT OUTER?

     

    First, let's now add a new value to [table1] and then a different new value to [table2].

     

    -- Add a new value to [table1].
    INSERT INTO
    	[table1]
    VALUES
    	(3);
    
    -- Add a new value to [table2].
    INSERT INTO
    	[table2]
    VALUES
    	(4);

     

    And then re-run queries 1 and 2 from above.

     

    New output

     

    Notice how row 13 contains the new value from the relevant table, but a value of NULL from the partner table?

     

    If we look at [table1], query 1 (the LEFT OUTER JOIN scenario) and the matching new output, the reason we see row 13 is because LEFT OUTER is saying "I want to see ALL values from this table, even if there's no matches from the other table".

     

    So, SQL Server dutifully shows is the new value of '3' we added to [table1] and a NULL from [table2], since there is no value of '3' to be matched in [table2]. i.e. we see all values from [table1] but only those that match from [table2].

     

    And it's the same in reverse for the [table2], query 2 and it's associated output where we're using the RIGHT OUTER to instruct SQL Server to include all values from [table2] but only those that match from [table1] (hence '4' is shown from [table2] with a NULL from [table1].

     

    If we run an INNER JOIN now - after having added the new values above, then those new values are excluded since INNER JOIN tells SQL Server to only include rows where the value exists in both tables.

     

    Example

    -- Query 3
    SELECT
    	[table1].[id] AS [table1.id]
    	, [table2].[id] AS [table2.id]
    FROM
    	[table1] INNER JOIN
    	[table2] ON [table1].[id] = [table2].[id]
    GO

     

    Output

     

    Conversely, if we use FULL OUTER, we're asking SQL Server to include all values from both tables even where there are no matches in the opposing table, which results in the inclusion of both of the new values we added above.

     

    Example

    -- Query 4
    SELECT
    	[table1].[id] AS [table1.id]
    	, [table2].[id] AS [table2.id]
    FROM
    	[table1] FULL OUTER JOIN
    	[table2] ON [table1].[id] = [table2].[id]
    GO

     

    Output

     

    Cheers,

    Lain

2 Replies

  • LainRobertson's avatar
    LainRobertson
    Silver Contributor

    dhanashreeee 

     

    Hi, Dhanashree.

     

    Perhaps it's intended as something of a trick question where the answer is "both the right and left join queries produce the same output".

     

    In fact, because [table1] and [table2] contain exactly the same distinct values (being '1' and '2'), you will get the same output for all the "normal" join types (INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER). Only CROSS JOIN will give you a different result, but they're not asking about CROSS JOIN so I won't go into that.

     

    The queries will produce twelve rows:

     

    • There are two '1' values in [table1] and three matching '1' values in [table2] (i.e. 2 rows x 3 rows = 6 rows);
    • There are two '2' values in [table1] and three matching '2' values in [table2] (i.e. 2 rows x 3 rows = 6 rows)
    • 6 rows + 6 rows = 12 rows.

     

    So, the key takeaway here is that if they given you the following:

     

    • Both tables contain exactly the same values (doesn't matter how many times each is repeated) and no extra values that only exist in one of the tables;
    • The queries are using LEFT, RIGHT, INNER or OUTER joins.

     

    And they then ask you "what would the output be", the answer is "the output is the same for both queries".

     

    Additional information

    None of what's below is necessary for answering your question, but might help if you want to understand joins a bit better. There's also some links to relevant Microsoft documentation:

     

     

    First, here's some T-SQL for creating the tables and queries you've alluded to in your question.

     

    CREATE TABLE [table1]
    (
    	[id] [int]
    )
    GO
    
    CREATE TABLE [table2]
    (
    	[id] [int]
    )
    GO
    
    INSERT INTO
    	[table1]
    VALUES
    	(1)
    	, (1)
    	, (2)
    	, (2)
    GO
    
    INSERT INTO
    	[table2]
    VALUES
    	(1)
    	, (1)
    	, (1)
    	, (2)
    	, (2)
    	, (2)
    GO
    
    -- Query 1 showing the LEFT OUTER JOIN.
    SELECT
    	[table1].[id] AS [table1.id]
    	, [table2].[id] AS [table2.id]
    FROM
    	[table1] LEFT OUTER JOIN
    	[table2] ON [table1].[id] = [table2].[id]
    GO
    
    -- Query 2 showing the RIGHT OUTER JOIN.
    SELECT
    	[table1].[id] AS [table1.id]
    	, [table2].[id] AS [table2.id]
    FROM
    	[table1] RIGHT OUTER JOIN
    	[table2] ON [table1].[id] = [table2].[id]
    GO

     

    Output (from both queries)

    As you can see, the output is the same for both queries (Query 1 is top; query 2 is bottom).

     

     

    This because every distinct value from [table1] has a match in [table2] and vice versa.

     

    Using a simple mathematics analogy, it's like asking how is the answer for "2 x 3" different to "3 x 2", for which the answer is "there is no difference as they are the same".

     

    So, what's the difference between LEFT OUTER and RIGHT OUTER?

     

    First, let's now add a new value to [table1] and then a different new value to [table2].

     

    -- Add a new value to [table1].
    INSERT INTO
    	[table1]
    VALUES
    	(3);
    
    -- Add a new value to [table2].
    INSERT INTO
    	[table2]
    VALUES
    	(4);

     

    And then re-run queries 1 and 2 from above.

     

    New output

     

    Notice how row 13 contains the new value from the relevant table, but a value of NULL from the partner table?

     

    If we look at [table1], query 1 (the LEFT OUTER JOIN scenario) and the matching new output, the reason we see row 13 is because LEFT OUTER is saying "I want to see ALL values from this table, even if there's no matches from the other table".

     

    So, SQL Server dutifully shows is the new value of '3' we added to [table1] and a NULL from [table2], since there is no value of '3' to be matched in [table2]. i.e. we see all values from [table1] but only those that match from [table2].

     

    And it's the same in reverse for the [table2], query 2 and it's associated output where we're using the RIGHT OUTER to instruct SQL Server to include all values from [table2] but only those that match from [table1] (hence '4' is shown from [table2] with a NULL from [table1].

     

    If we run an INNER JOIN now - after having added the new values above, then those new values are excluded since INNER JOIN tells SQL Server to only include rows where the value exists in both tables.

     

    Example

    -- Query 3
    SELECT
    	[table1].[id] AS [table1.id]
    	, [table2].[id] AS [table2.id]
    FROM
    	[table1] INNER JOIN
    	[table2] ON [table1].[id] = [table2].[id]
    GO

     

    Output

     

    Conversely, if we use FULL OUTER, we're asking SQL Server to include all values from both tables even where there are no matches in the opposing table, which results in the inclusion of both of the new values we added above.

     

    Example

    -- Query 4
    SELECT
    	[table1].[id] AS [table1.id]
    	, [table2].[id] AS [table2.id]
    FROM
    	[table1] FULL OUTER JOIN
    	[table2] ON [table1].[id] = [table2].[id]
    GO

     

    Output

     

    Cheers,

    Lain

    • dhanashreeee's avatar
      dhanashreeee
      Copper Contributor

      LainRobertson Thank you very much and giving such a detailed reply to my query. This is so much helpful and clarifies so many things for me. :smile: God bless you...

Resources