SOLVED

How to create view joining two table of similar records without duplicating.

Copper Contributor

Hi All,

Hope all are doing great. I have a small query regarding the Join function to create view. I have two table in which table 1 is having 40k records and table 2 is having 40k records. But when i am joining the both to create view, it is showing 47k records. Please note that i am combining table using two common column in both, one is "Calendar_Date" and one is "Personal_ID". In some cases few Personal ID has duplicate records for each month.  So while creating the view the, these duplicate records gets multiplied . Hence the total records exceeds 47k. How to make it same 40k records which i am having in each table.

 

Any help would be highly appreciated. 

 

Regards,

Sagar

4 Replies

@SAGAR1985 

Hi,
It would be helpful if you posted your schemas and/or your actual query, together with some sample data that illustrates the problem.


I expect there's something with your join predicate, but without seeing it I'm unable to give assistance other than referring to general training on JOINs.
Combine multiple tables with JOINs in T-SQL - Training | Microsoft Learn
How to Join on Multiple Columns | LearnSQL.com

Thanks @SivertSolem,
I can share the code and few sample data with it.

The below is my table 1 total no of rows indicated after --.
select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';--42678

One more table has no of rows given after --
select * from [dbo].[Skill_Analysis] where Calendar_day = '2024-05-31';--42678
In both there 42678 no of records/rows. But when i have created View with Left Join , the no of rows is showing 47390.

select * from [dbo].[VW_ARC_SKILL] where Calendar_day = '2024-05-31';--47390

Now coming to my View code. Please find it below.
------
ALTER VIEW [dbo].[VW_ARC_SKILL]
AS
SELECT
B.Calendar_day AS Calendar_day,
B.Employee_Name AS Employee_Name,
B.Personal_ID AS Personal_ID,
B.Sub_Band AS Sub_Band,
B.Band_Grouping AS Band_Grouping,
S.Job AS Job,
B.Job_Group AS Job_Group,
B.Never_Billed_Ageing AS Never_Billed_Ageing,
B.Employee_Group AS Employee_Group,
B.WFO_WT AS WFO_WT,
B.WFH_WT AS WFH_WT,
S.PhyACT_Days_Worked AS PhyACT_Days_Worked,
S.PhyACT_Total_Working_Days AS PhyACT_Total_Working_Days,
B.WFO_Category AS WFO_Category,
B.HR_L2 AS HR_L2


FROM
[DB_Nippon_ARC].[dbo].[ARC_Skill] AS B
LEFT JOIN
[DB_Nippon_ARC].[dbo].[Skill_Analysis] AS S ON B.Calendar_day = S.Calendar_day AND B.Personal_ID = S.Personal_ID;
GO
---------
And for verification purpose i have taken the total view data and found few Personal ID have been duplicated multiple times.
For example:
-

Row Labels In View count Actual count
51512295 121 11
52220994 100 10
51789131 100 10
40181872 100 10
52221344 81 9
52160816 81 9
52160892 81 9
52220993 64 8
52212900 64 8
52212169 49 7
52234502 36 6
52212233 36 6
52212904 36 6
52221374 36 6
52141407 36 6
52037147 36 6
51985793 36 6
51938909 36 6
51627184 36 6
40201464 36 6
40101433 36 6
51511318 36 6
40130571 36 6
52227723 25 5
52227721 25 5
52237307 25 5


---------------------------------
In the above if you look at the top right is showing actual no of times the personal ID is repeated, but while joining into view, in view the repetition has been multiplied. for example 1st Personal ID
51512295 is actually present for 11 times but while joining in view it is coming as 11*11= 121 times.

How to corrcet this to view only for 42678 records !

Regards,
Sagar

best response confirmed by SAGAR1985 (Copper Contributor)
Solution

@SAGAR1985 I think I might see where the issue lies.

Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.

 

 

select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';
--42678 rows

 

 


If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:

 

 

Select 
	Personal_ID, /* The personal_ID */
	Calendar_day, /* The calendar day */
	Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[ARC_SKILL]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */

Select 
	Personal_ID, /* The personal_ID */
	Calendar_day, /* The calendar day */
	Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[Skill_Analysis]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */

 

 


In your counting table at the end, we see the columns

Row LabelView CountActual
RL1x^2x
RL2y^2y


This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].

For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.

----
In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
I hope there's another column you can add in the join predicate in order to get the expected result.

Hi @SivertSolem,
Thanks a lot for your quick support and insight to resolve. It really helped and after adding Project_Code column into that view, the count is really checked and minor variation is coming. But still way better than the previous one. Thanks for all your effort. Really appreciated.

Regards,
Sagar
1 best response

Accepted Solutions
best response confirmed by SAGAR1985 (Copper Contributor)
Solution

@SAGAR1985 I think I might see where the issue lies.

Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.

 

 

select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';
--42678 rows

 

 


If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:

 

 

Select 
	Personal_ID, /* The personal_ID */
	Calendar_day, /* The calendar day */
	Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[ARC_SKILL]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */

Select 
	Personal_ID, /* The personal_ID */
	Calendar_day, /* The calendar day */
	Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[Skill_Analysis]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */

 

 


In your counting table at the end, we see the columns

Row LabelView CountActual
RL1x^2x
RL2y^2y


This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].

For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.

----
In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
I hope there's another column you can add in the join predicate in order to get the expected result.

View solution in original post