Forum Discussion
jmclej3
Feb 14, 2024Copper Contributor
how to unify 2 tables that have the same origin?
I have a table with values that have a flow vision (recours) and others with a stock vision (PSAP). I split this table in two in order to transform my flow vision fields into stock vision, it works well but I get stuck when I try to reunify my two tables in one.
I have tried several approaches that you can see in the code below.
Either I generate a new CTE columnsStockFinal in order for it to have the same number of lines that columnsFluxToStock and that I can do a inner join between columnsFluxToStock and columnsStockFinal.
Or I keep columnsStock and I do a full outer join with columnsFluxToStock.
For each missing date, a line must appear with the value of PSAP that represents the last known stock at date for the tuple/key : Silo / GAR / Lob.
That leads to :
For 2023-03-31 / Completude / GAR1 / lob1, I should have the last value of PSAP directly anteriorly available, so for 2023-02-28, that is 2100.
For 2023-03-31 / EDI / GAR1 / lob1, I should have the last value of PSAP directly anteriorly available, so for 2023-02-28, that is 1100.
For 2023-04-30 / EDI / GAR1 / lob1, I should have the last value of PSAP directly anteriorly available, so for 2023-02-28 or for 2023-03-31, that is 1100.
Of course, the values must not be hard-coded.
I must then get the following result :
ViewDate Silo GAR Lob recours PSAP
2023-01-31 Completude GAR1 lob1 100 2000
2023-01-31 EDI GAR1 lob1 10 1000
2023-02-28 Completude GAR1 lob1 150 2100
2023-02-28 EDI GAR1 lob1 15 1100
2023-03-31 Completude GAR1 lob1 150 2100
2023-03-31 EDI GAR1 lob1 15 1100
2023-04-30 Completude GAR1 lob1 210 2200
2023-04-30 EDI GAR1 lob1 15 1100
2023-05-31 Completude GAR1 lob1 280 2300
2023-05-31 EDI GAR1 lob1 21 1200
This is the code I made so far with some other tries in comments :
drop table if exists ColumnsMixedFluxStock;
CREATE TABLE ColumnsMixedFluxStock (
ViewDate DATE,
silo varchar(10),
GAR varchar(10),
lob varchar(10),
recours int,
PSAP int
);
INSERT INTO ColumnsMixedFluxStock VALUES
('2023-01-31', 'EDI', 'GAR1', 'lob1', 10, 1000),
('2023-02-28', 'EDI', 'GAR1', 'lob1', 5, 1100),
('2023-05-31', 'EDI', 'GAR1', 'lob1', 6, 1200),
('2023-01-31', 'Completude', 'GAR1', 'lob1', 100, 2000),
('2023-02-28', 'Completude', 'GAR1', 'lob1', 50, 2100)
,('2023-04-30', 'Completude', 'GAR1', 'lob1', 60, 2200),
('2023-05-31', 'Completude', 'GAR1', 'lob1', 70, 2300)
;
with ColumnsFlux as (
select Silo, ViewDate, GAR, Lob, recours
from ColumnsMixedFluxStock
),
ColumnsStock as (
select Silo, ViewDate, GAR, Lob, PSAP
from ColumnsMixedFluxStock
),
min_max_dates AS (
SELECT MIN(ViewDate) as min_date, MAX(ViewDate) as max_date
FROM ColumnsMixedFluxStock
),
date_range AS (
SELECT explode(sequence(to_date(min_date), to_date(max_date), interval 1 month)) as ViewDate
FROM min_max_dates
),
columnsFluxToStock AS (
SELECT i1.ViewDate, i2.silo, i2.GAR, i2.Lob,
SUM(coalesce(i2.recours, 0)) as recours
FROM ColumnsFlux i2
cross join date_range i1 on
i1.ViewDate >= i2.ViewDate
GROUP BY i1.ViewDate, i2.silo, i2.GAR, i2.Lob
)
/*columnsStockFinal AS (
SELECT i1.ViewDate, i2.silo, i2.GAR, i2.Lob,
coalesce(last_value(i2.PSAP) OVER (PARTITION BY /*i2.ViewDate,*/ i2.silo, i2.GAR, i2.Lob ORDER BY i2.ViewDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) as PSAP
FROM date_range i1
LEFT JOIN ColumnsStock i2 on
i1.ViewDate >= i2.ViewDate
) select * from columnsStockFinal order by ViewDate, Silo, GAR, Lob*/
/*all_combinations AS (
SELECT dr.ViewDate, cs.silo, cs.GAR, cs.Lob
FROM date_range dr
CROSS JOIN (SELECT DISTINCT silo, GAR, Lob FROM ColumnsStock) cs
),
columnsStockFinal AS (
SELECT ac.ViewDate, ac.silo, ac.GAR, ac.Lob,
COALESCE(LAST_VALUE(cs.PSAP) OVER (PARTITION BY cs.silo, cs.GAR, cs.Lob ORDER BY ac.ViewDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) AS PSAP
FROM all_combinations ac
LEFT JOIN ColumnsStock cs ON ac.silo = cs.silo AND ac.GAR = cs.GAR AND ac.Lob = cs.Lob AND ac.ViewDate = cs.ViewDate
) select * from columnsStockFinal order by ViewDate, Silo, GAR, Lob*/
/*
columnsStockFinal AS (
SELECT i1.ViewDate, i2.silo, i2.GAR, i2.Lob,
coalesce(i2.PSAP, 0) as PSAP
FROM ColumnsStock i2
cross join date_range i1 on
i1.ViewDate = i2.ViewDate
GROUP BY i1.ViewDate, i2.silo, i2.GAR, i2.Lob
) --select * from columnsStockFinal order by ViewDate, Silo, GAR, Lob
*/
/*select cs.Silo, cs.ViewDate, cs.GAR, cs.Lob,
cfts.recours,
cs.PSAP
from columnsStockFinal cs --columnsStock cs
inner join
--full outer join
--inner join après avoir fait une autre CTE columnsStockFinal avec un cross join de ColumnsStock et date_range
columnsFluxToStock cfts on
cs.Silo = cfts.Silo
AND cs.ViewDate = cfts.ViewDate
AND cs.GAR = cfts.GAR
AND cs.Lob = cfts.Lob
--where recours != 0 or PSAP != 0
order by ViewDate, Silo, GAR, Lob*/
select coalesce(cs.Silo, cfts.Silo) as Silo, coalesce(cs.ViewDate, cfts.ViewDate) ViewDate, coalesce(cs.GAR, cfts.GAR) as GAR, coalesce(cs.Lob, cfts.Lob) as Lob,
cfts.recours,
cs.PSAP
from columnsStock cs --columnsStock cs
full outer join
columnsFluxToStock cfts on
cs.Silo = cfts.Silo
AND cs.ViewDate = cfts.ViewDate
AND cs.GAR = cfts.GAR
AND cs.Lob = cfts.Lob
--where recours != 0 or PSAP != 0
order by ViewDate, Silo, GAR, Lob
Thanks for your help.
I am on Databricks using spark sql but it is pretty regular sql used here so it should be fine
1 Reply
Sort By
- LainRobertsonSilver Contributor
Hi, Julien.
I'm unfamiliar with DataBricks and any variations there may be with SQL Server, however, here's something that may provide some guidance/hints on how to proceed.
I apologise in advance for not reading your existing attempts - I'm rushing through this one. I simply focused on your example output and explanation around carrying values forward. Correct me where I need correcting!
Firstly, here's the supporting constructs that I've added in addition to your own.
Constructs
DROP TABLE IF EXISTS [rollupDates]; CREATE TABLE [rollupDates] ( [date] [date] NOT NULL PRIMARY KEY ); GO TRUNCATE TABLE [rollupDates]; DECLARE @currentYear AS [int] = YEAR(GETDATE()) - 1; DECLARE @offset AS [int] = 1; SET NOCOUNT ON; WHILE (@offset < 13) BEGIN INSERT INTO [rollupDates] VALUES -- Get [next month] - [1 day] to account for leap years. (DATEADD(dd, -1, DATEADD(MM, @offset, DATEFROMPARTS(@currentYear, 1, 1)))); SET @offset = @offset + 1; END; SET NOCOUNT OFF;
Query
SELECT [rollupMetrics].[date] , [rollupMetrics].[silo] , [rollupMetrics].[gar] , [rollupMetrics].[lob] , CASE WHEN [c].[recours] IS NOT NULL THEN [c].[recours] ELSE ( SELECT TOP 1 [recours] FROM [ColumnsMixedFluxStock] WHERE [ViewDate] < [rollupMetrics].[date] AND [silo] = [rollupMetrics].[silo] AND [gar] = [rollupMetrics].[gar] AND [lob] = [rollupMetrics].[lob] ORDER BY [ViewDate] DESC ) END AS [recours] , CASE WHEN [c].[psap] IS NOT NULL THEN [c].[psap] ELSE ( SELECT TOP 1 [psap] FROM [ColumnsMixedFluxStock] WHERE [ViewDate] < [rollupMetrics].[date] AND [silo] = [rollupMetrics].[silo] AND [gar] = [rollupMetrics].[gar] AND [lob] = [rollupMetrics].[lob] ORDER BY [ViewDate] DESC ) END AS [psap] FROM ( SELECT * FROM [rollupDates] CROSS JOIN (SELECT DISTINCT [silo], [gar], [lob] FROM [ColumnsMixedFluxStock]) AS [data] ) AS [rollupMetrics] LEFT OUTER JOIN [ColumnsMixedFluxStock] AS [c] ON [rollupMetrics].[date] = [c].[ViewDate] AND [rollupMetrics].[silo] = [c].[silo] AND [rollupMetrics].[gar] = [c].[gar] AND [rollupMetrics].[lob] = [c].[lob] ORDER BY [rollupMetrics].[date] , [rollupMetrics].[silo] , [rollupMetrics].[gar] , [rollupMetrics].[lob];
Output
This is based on a calendar year (2023), meaning there's a bit more than your example output.
Explanation
You're looking to plug gaps where rows for dates are missing, however, you cannot create any kind of join on a date that doesn't exist in at least one of the tables, so the [rollupDates] table provides you with an enumeration of those dates, cross joined with the unique combination of silo, gar and lob.
I've assumed the reporting dates are the last day of each month.
This complete "reference" allows you to leverage a standard LEFT OUTER JOIN with your [ColumnsMixedFluxStock] table.
I'd say you'll understand what the query is doing without me having to explain, so I'll leave it there
Cheers,
Lain