Forum Discussion

SAGAR1985's avatar
SAGAR1985
Copper Contributor
Jul 24, 2024

How to create view joining two table of similar records without duplicating.

Hi All, Hope all are doing great. I have a small query regarding the Join function to create view. I have two table in which table 1 is having 40k records and table 2 is having 40k records. But when...
  • SivertSolem's avatar
    SivertSolem
    Jul 24, 2024

    SAGAR1985 I think I might see where the issue lies.

    Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.

     

     

    select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';
    --42678 rows

     

     


    If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:

     

     

    Select 
    	Personal_ID, /* The personal_ID */
    	Calendar_day, /* The calendar day */
    	Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
    from [dbo].[ARC_SKILL]
    where Calendar_day = '2024-05-31'
    Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
    Order by NumberOfDuplicates Desc /* Largest number on top */
    /* You may have to use 'order by COUNT(Personal_ID) desc' instead */
    
    Select 
    	Personal_ID, /* The personal_ID */
    	Calendar_day, /* The calendar day */
    	Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
    from [dbo].[Skill_Analysis]
    where Calendar_day = '2024-05-31'
    Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
    Order by NumberOfDuplicates Desc /* Largest number on top */
    /* You may have to use 'order by COUNT(Personal_ID) desc' instead */

     

     


    In your counting table at the end, we see the columns

    Row LabelView CountActual
    RL1x^2x
    RL2y^2y


    This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].

    For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.

    ----
    In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
    I hope there's another column you can add in the join predicate in order to get the expected result.

Resources