Forum Discussion
SAGAR1985
Jul 24, 2024Copper Contributor
How to create view joining two table of similar records without duplicating.
Hi All, Hope all are doing great. I have a small query regarding the Join function to create view. I have two table in which table 1 is having 40k records and table 2 is having 40k records. But when...
- Jul 24, 2024
SAGAR1985 I think I might see where the issue lies.
Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31'; --42678 rows
If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:Select Personal_ID, /* The personal_ID */ Calendar_day, /* The calendar day */ Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */ from [dbo].[ARC_SKILL] where Calendar_day = '2024-05-31' Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */ Order by NumberOfDuplicates Desc /* Largest number on top */ /* You may have to use 'order by COUNT(Personal_ID) desc' instead */ Select Personal_ID, /* The personal_ID */ Calendar_day, /* The calendar day */ Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */ from [dbo].[Skill_Analysis] where Calendar_day = '2024-05-31' Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */ Order by NumberOfDuplicates Desc /* Largest number on top */ /* You may have to use 'order by COUNT(Personal_ID) desc' instead */
In your counting table at the end, we see the columnsRow Label View Count Actual RL1 x^2 x RL2 y^2 y
This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].
For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.
----
In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
I hope there's another column you can add in the join predicate in order to get the expected result.
SAGAR1985
Jul 24, 2024Copper Contributor
Thanks SivertSolem,
I can share the code and few sample data with it.
The below is my table 1 total no of rows indicated after --.
select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';--42678
One more table has no of rows given after --
select * from [dbo].[Skill_Analysis] where Calendar_day = '2024-05-31';--42678
In both there 42678 no of records/rows. But when i have created View with Left Join , the no of rows is showing 47390.
select * from [dbo].[VW_ARC_SKILL] where Calendar_day = '2024-05-31';--47390
Now coming to my View code. Please find it below.
------
ALTER VIEW [dbo].[VW_ARC_SKILL]
AS
SELECT
B.Calendar_day AS Calendar_day,
B.Employee_Name AS Employee_Name,
B.Personal_ID AS Personal_ID,
B.Sub_Band AS Sub_Band,
B.Band_Grouping AS Band_Grouping,
S.Job AS Job,
B.Job_Group AS Job_Group,
B.Never_Billed_Ageing AS Never_Billed_Ageing,
B.Employee_Group AS Employee_Group,
B.WFO_WT AS WFO_WT,
B.WFH_WT AS WFH_WT,
S.PhyACT_Days_Worked AS PhyACT_Days_Worked,
S.PhyACT_Total_Working_Days AS PhyACT_Total_Working_Days,
B.WFO_Category AS WFO_Category,
B.HR_L2 AS HR_L2
FROM
[DB_Nippon_ARC].[dbo].[ARC_Skill] AS B
LEFT JOIN
[DB_Nippon_ARC].[dbo].[Skill_Analysis] AS S ON B.Calendar_day = S.Calendar_day AND B.Personal_ID = S.Personal_ID;
GO
---------
And for verification purpose i have taken the total view data and found few Personal ID have been duplicated multiple times.
For example:
-
Row Labels | In View count | Actual count | ||
51512295 | 121 | 11 | ||
52220994 | 100 | 10 | ||
51789131 | 100 | 10 | ||
40181872 | 100 | 10 | ||
52221344 | 81 | 9 | ||
52160816 | 81 | 9 | ||
52160892 | 81 | 9 | ||
52220993 | 64 | 8 | ||
52212900 | 64 | 8 | ||
52212169 | 49 | 7 | ||
52234502 | 36 | 6 | ||
52212233 | 36 | 6 | ||
52212904 | 36 | 6 | ||
52221374 | 36 | 6 | ||
52141407 | 36 | 6 | ||
52037147 | 36 | 6 | ||
51985793 | 36 | 6 | ||
51938909 | 36 | 6 | ||
51627184 | 36 | 6 | ||
40201464 | 36 | 6 | ||
40101433 | 36 | 6 | ||
51511318 | 36 | 6 | ||
40130571 | 36 | 6 | ||
52227723 | 25 | 5 | ||
52227721 | 25 | 5 | ||
52237307 | 25 | 5 |
---------------------------------
In the above if you look at the top right is showing actual no of times the personal ID is repeated, but while joining into view, in view the repetition has been multiplied. for example 1st Personal ID
51512295 is actually present for 11 times but while joining in view it is coming as 11*11= 121 times.
How to corrcet this to view only for 42678 records !
Regards,
Sagar
SivertSolem
Jul 24, 2024Iron Contributor
SAGAR1985 I think I might see where the issue lies.
Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.
select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';
--42678 rows
If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:
Select
Personal_ID, /* The personal_ID */
Calendar_day, /* The calendar day */
Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[ARC_SKILL]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */
Select
Personal_ID, /* The personal_ID */
Calendar_day, /* The calendar day */
Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[Skill_Analysis]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */
In your counting table at the end, we see the columns
Row Label | View Count | Actual |
RL1 | x^2 | x |
RL2 | y^2 | y |
This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].
For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.
----
In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
I hope there's another column you can add in the join predicate in order to get the expected result.
- SAGAR1985Jul 24, 2024Copper ContributorHi SivertSolem,
Thanks a lot for your quick support and insight to resolve. It really helped and after adding Project_Code column into that view, the count is really checked and minor variation is coming. But still way better than the previous one. Thanks for all your effort. Really appreciated.
Regards,
Sagar