Forum Discussion
How to create view joining two table of similar records without duplicating.
Hi All,
Hope all are doing great. I have a small query regarding the Join function to create view. I have two table in which table 1 is having 40k records and table 2 is having 40k records. But when i am joining the both to create view, it is showing 47k records. Please note that i am combining table using two common column in both, one is "Calendar_Date" and one is "Personal_ID". In some cases few Personal ID has duplicate records for each month. So while creating the view the, these duplicate records gets multiplied . Hence the total records exceeds 47k. How to make it same 40k records which i am having in each table.
Any help would be highly appreciated.
Regards,
Sagar
SAGAR1985 I think I might see where the issue lies.
Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31'; --42678 rows
If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:Select Personal_ID, /* The personal_ID */ Calendar_day, /* The calendar day */ Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */ from [dbo].[ARC_SKILL] where Calendar_day = '2024-05-31' Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */ Order by NumberOfDuplicates Desc /* Largest number on top */ /* You may have to use 'order by COUNT(Personal_ID) desc' instead */ Select Personal_ID, /* The personal_ID */ Calendar_day, /* The calendar day */ Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */ from [dbo].[Skill_Analysis] where Calendar_day = '2024-05-31' Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */ Order by NumberOfDuplicates Desc /* Largest number on top */ /* You may have to use 'order by COUNT(Personal_ID) desc' instead */
In your counting table at the end, we see the columnsRow Label View Count Actual RL1 x^2 x RL2 y^2 y
This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].
For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.
----
In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
I hope there's another column you can add in the join predicate in order to get the expected result.
- SivertSolemIron Contributor
Hi,
It would be helpful if you posted your schemas and/or your actual query, together with some sample data that illustrates the problem.
I expect there's something with your join predicate, but without seeing it I'm unable to give assistance other than referring to general training on JOINs.
Combine multiple tables with JOINs in T-SQL - Training | Microsoft Learn
How to Join on Multiple Columns | LearnSQL.com- SAGAR1985Copper Contributor
Thanks SivertSolem,
I can share the code and few sample data with it.
The below is my table 1 total no of rows indicated after --.
select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';--42678
One more table has no of rows given after --
select * from [dbo].[Skill_Analysis] where Calendar_day = '2024-05-31';--42678
In both there 42678 no of records/rows. But when i have created View with Left Join , the no of rows is showing 47390.
select * from [dbo].[VW_ARC_SKILL] where Calendar_day = '2024-05-31';--47390
Now coming to my View code. Please find it below.
------
ALTER VIEW [dbo].[VW_ARC_SKILL]
AS
SELECT
B.Calendar_day AS Calendar_day,
B.Employee_Name AS Employee_Name,
B.Personal_ID AS Personal_ID,
B.Sub_Band AS Sub_Band,
B.Band_Grouping AS Band_Grouping,
S.Job AS Job,
B.Job_Group AS Job_Group,
B.Never_Billed_Ageing AS Never_Billed_Ageing,
B.Employee_Group AS Employee_Group,
B.WFO_WT AS WFO_WT,
B.WFH_WT AS WFH_WT,
S.PhyACT_Days_Worked AS PhyACT_Days_Worked,
S.PhyACT_Total_Working_Days AS PhyACT_Total_Working_Days,
B.WFO_Category AS WFO_Category,
B.HR_L2 AS HR_L2
FROM
[DB_Nippon_ARC].[dbo].[ARC_Skill] AS B
LEFT JOIN
[DB_Nippon_ARC].[dbo].[Skill_Analysis] AS S ON B.Calendar_day = S.Calendar_day AND B.Personal_ID = S.Personal_ID;
GO
---------
And for verification purpose i have taken the total view data and found few Personal ID have been duplicated multiple times.
For example:
-Row Labels In View count Actual count 51512295 121 11 52220994 100 10 51789131 100 10 40181872 100 10 52221344 81 9 52160816 81 9 52160892 81 9 52220993 64 8 52212900 64 8 52212169 49 7 52234502 36 6 52212233 36 6 52212904 36 6 52221374 36 6 52141407 36 6 52037147 36 6 51985793 36 6 51938909 36 6 51627184 36 6 40201464 36 6 40101433 36 6 51511318 36 6 40130571 36 6 52227723 25 5 52227721 25 5 52237307 25 5
---------------------------------
In the above if you look at the top right is showing actual no of times the personal ID is repeated, but while joining into view, in view the repetition has been multiplied. for example 1st Personal ID
51512295 is actually present for 11 times but while joining in view it is coming as 11*11= 121 times.
How to corrcet this to view only for 42678 records !
Regards,
Sagar- SivertSolemIron Contributor
SAGAR1985 I think I might see where the issue lies.
Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31'; --42678 rows
If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:Select Personal_ID, /* The personal_ID */ Calendar_day, /* The calendar day */ Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */ from [dbo].[ARC_SKILL] where Calendar_day = '2024-05-31' Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */ Order by NumberOfDuplicates Desc /* Largest number on top */ /* You may have to use 'order by COUNT(Personal_ID) desc' instead */ Select Personal_ID, /* The personal_ID */ Calendar_day, /* The calendar day */ Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */ from [dbo].[Skill_Analysis] where Calendar_day = '2024-05-31' Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */ Order by NumberOfDuplicates Desc /* Largest number on top */ /* You may have to use 'order by COUNT(Personal_ID) desc' instead */
In your counting table at the end, we see the columnsRow Label View Count Actual RL1 x^2 x RL2 y^2 y
This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].
For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.
----
In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
I hope there's another column you can add in the join predicate in order to get the expected result.