Jul 24 2024 02:46 AM
Hi All,
Hope all are doing great. I have a small query regarding the Join function to create view. I have two table in which table 1 is having 40k records and table 2 is having 40k records. But when i am joining the both to create view, it is showing 47k records. Please note that i am combining table using two common column in both, one is "Calendar_Date" and one is "Personal_ID". In some cases few Personal ID has duplicate records for each month. So while creating the view the, these duplicate records gets multiplied . Hence the total records exceeds 47k. How to make it same 40k records which i am having in each table.
Any help would be highly appreciated.
Regards,
Sagar
Jul 24 2024 03:07 AM
Hi,
It would be helpful if you posted your schemas and/or your actual query, together with some sample data that illustrates the problem.
I expect there's something with your join predicate, but without seeing it I'm unable to give assistance other than referring to general training on JOINs.
Combine multiple tables with JOINs in T-SQL - Training | Microsoft Learn
How to Join on Multiple Columns | LearnSQL.com
Jul 24 2024 04:46 AM - edited Jul 24 2024 04:49 AM
Thanks @SivertSolem,
I can share the code and few sample data with it.
The below is my table 1 total no of rows indicated after --.
select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';--42678
One more table has no of rows given after --
select * from [dbo].[Skill_Analysis] where Calendar_day = '2024-05-31';--42678
In both there 42678 no of records/rows. But when i have created View with Left Join , the no of rows is showing 47390.
select * from [dbo].[VW_ARC_SKILL] where Calendar_day = '2024-05-31';--47390
Now coming to my View code. Please find it below.
------
ALTER VIEW [dbo].[VW_ARC_SKILL]
AS
SELECT
B.Calendar_day AS Calendar_day,
B.Employee_Name AS Employee_Name,
B.Personal_ID AS Personal_ID,
B.Sub_Band AS Sub_Band,
B.Band_Grouping AS Band_Grouping,
S.Job AS Job,
B.Job_Group AS Job_Group,
B.Never_Billed_Ageing AS Never_Billed_Ageing,
B.Employee_Group AS Employee_Group,
B.WFO_WT AS WFO_WT,
B.WFH_WT AS WFH_WT,
S.PhyACT_Days_Worked AS PhyACT_Days_Worked,
S.PhyACT_Total_Working_Days AS PhyACT_Total_Working_Days,
B.WFO_Category AS WFO_Category,
B.HR_L2 AS HR_L2
FROM
[DB_Nippon_ARC].[dbo].[ARC_Skill] AS B
LEFT JOIN
[DB_Nippon_ARC].[dbo].[Skill_Analysis] AS S ON B.Calendar_day = S.Calendar_day AND B.Personal_ID = S.Personal_ID;
GO
---------
And for verification purpose i have taken the total view data and found few Personal ID have been duplicated multiple times.
For example:
-
Row Labels | In View count | Actual count | ||
51512295 | 121 | 11 | ||
52220994 | 100 | 10 | ||
51789131 | 100 | 10 | ||
40181872 | 100 | 10 | ||
52221344 | 81 | 9 | ||
52160816 | 81 | 9 | ||
52160892 | 81 | 9 | ||
52220993 | 64 | 8 | ||
52212900 | 64 | 8 | ||
52212169 | 49 | 7 | ||
52234502 | 36 | 6 | ||
52212233 | 36 | 6 | ||
52212904 | 36 | 6 | ||
52221374 | 36 | 6 | ||
52141407 | 36 | 6 | ||
52037147 | 36 | 6 | ||
51985793 | 36 | 6 | ||
51938909 | 36 | 6 | ||
51627184 | 36 | 6 | ||
40201464 | 36 | 6 | ||
40101433 | 36 | 6 | ||
51511318 | 36 | 6 | ||
40130571 | 36 | 6 | ||
52227723 | 25 | 5 | ||
52227721 | 25 | 5 | ||
52237307 | 25 | 5 |
---------------------------------
In the above if you look at the top right is showing actual no of times the personal ID is repeated, but while joining into view, in view the repetition has been multiplied. for example 1st Personal ID
51512295 is actually present for 11 times but while joining in view it is coming as 11*11= 121 times.
How to corrcet this to view only for 42678 records !
Regards,
Sagar
Jul 24 2024 05:47 AM - edited Jul 24 2024 05:53 AM
Solution@SAGAR1985 I think I might see where the issue lies.
Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.
select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';
--42678 rows
If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:
Select
Personal_ID, /* The personal_ID */
Calendar_day, /* The calendar day */
Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[ARC_SKILL]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */
Select
Personal_ID, /* The personal_ID */
Calendar_day, /* The calendar day */
Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[Skill_Analysis]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */
In your counting table at the end, we see the columns
Row Label | View Count | Actual |
RL1 | x^2 | x |
RL2 | y^2 | y |
This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].
For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.
----
In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
I hope there's another column you can add in the join predicate in order to get the expected result.
Jul 24 2024 06:16 AM
Jul 24 2024 05:47 AM - edited Jul 24 2024 05:53 AM
Solution@SAGAR1985 I think I might see where the issue lies.
Your query for the [dbo].[ARC_Skill] table is selecting distinct Personal_IDs, meaning you are counting unique IDs. If there's duplicates, you're not seeing them.
select distinct Personal_ID from [dbo].[ARC_Skill] where Calendar_day = '2024-05-31';
--42678 rows
If you do the following queries, I believe you'll find that you have the same amount of duplicates in both your tables:
Select
Personal_ID, /* The personal_ID */
Calendar_day, /* The calendar day */
Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[ARC_SKILL]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */
Select
Personal_ID, /* The personal_ID */
Calendar_day, /* The calendar day */
Count(Personal_ID) as NumberOfDuplicates /* Number of rows with this given Personal_ID and Calendar_day */
from [dbo].[Skill_Analysis]
where Calendar_day = '2024-05-31'
Group By Personal_ID, Calendar_day /* Grouping enables counting rows pr. Personal_ID */
Order by NumberOfDuplicates Desc /* Largest number on top */
/* You may have to use 'order by COUNT(Personal_ID) desc' instead */
In your counting table at the end, we see the columns
Row Label | View Count | Actual |
RL1 | x^2 | x |
RL2 | y^2 | y |
This means you've accidentally gotten a cartesian cross product where your tables contain the same amount of duplicates for the combination of [Personal_ID] and [Calendar_day].
For RL '51512295', that means all 11 rows with a given [Personal_ID] and [Calendar_day] in [ARC_Skill] has been matched separately with all 11 rows with the same [Personal_ID] and [Calendar_day] in [Skill_Analysis]. 11*11 = 121.
----
In conclusion, [Personal_ID] and [Calendar_day] are not sufficient to uniquely match between these tables.
I hope there's another column you can add in the join predicate in order to get the expected result.