Forum Discussion

jmclej3's avatar
jmclej3
Copper Contributor
Feb 14, 2024

how to unify 2 tables that have the same origin?

I have a table with values that have a flow vision (recours) and others with a stock vision (PSAP). I split this table in two in order to transform my flow vision fields into stock vision, it works well but I get stuck when I try to reunify my two tables in one.

I have tried several approaches that you can see in the code below.

Either I generate a new CTE columnsStockFinal in order for it to have the same number of lines that columnsFluxToStock and that I can do a inner join between columnsFluxToStock and columnsStockFinal.

Or I keep columnsStock and I do a full outer join with columnsFluxToStock.

For each missing date, a line must appear with the value of PSAP that represents the last known stock at date for the tuple/key : Silo / GAR / Lob.

That leads to :

For 2023-03-31 / Completude / GAR1 / lob1, I should have the last value of PSAP directly anteriorly available, so for 2023-02-28, that is 2100.

For 2023-03-31 / EDI / GAR1 / lob1, I should have the last value of PSAP directly anteriorly available, so for 2023-02-28, that is 1100.

For 2023-04-30 / EDI / GAR1 / lob1, I should have the last value of PSAP directly anteriorly available, so for 2023-02-28 or for 2023-03-31, that is 1100.

Of course, the values must not be hard-coded.

I must then get the following result :

 

ViewDate    Silo    GAR Lob recours PSAP
2023-01-31  Completude  GAR1    lob1    100 2000
2023-01-31  EDI GAR1    lob1    10  1000
2023-02-28  Completude  GAR1    lob1    150 2100
2023-02-28  EDI GAR1    lob1    15  1100
2023-03-31  Completude  GAR1    lob1    150 2100
2023-03-31  EDI GAR1    lob1    15  1100
2023-04-30  Completude  GAR1    lob1    210 2200
2023-04-30  EDI GAR1    lob1    15  1100
2023-05-31  Completude  GAR1    lob1    280 2300
2023-05-31  EDI GAR1    lob1    21  1200

 

 This is the code I made so far with some other tries in comments :

 

drop table if exists ColumnsMixedFluxStock;
CREATE TABLE ColumnsMixedFluxStock (
    ViewDate DATE,
    silo varchar(10),
    GAR varchar(10),
    lob varchar(10),
    recours int,
    PSAP int
);
 
INSERT INTO ColumnsMixedFluxStock VALUES
('2023-01-31', 'EDI', 'GAR1', 'lob1', 10, 1000),
('2023-02-28', 'EDI', 'GAR1', 'lob1', 5, 1100),
        ('2023-05-31', 'EDI', 'GAR1', 'lob1', 6, 1200),
 
('2023-01-31', 'Completude', 'GAR1', 'lob1', 100, 2000),
('2023-02-28', 'Completude', 'GAR1', 'lob1', 50, 2100)
 
        ,('2023-04-30', 'Completude', 'GAR1', 'lob1', 60, 2200),
         ('2023-05-31', 'Completude', 'GAR1', 'lob1', 70, 2300)
;
 
with ColumnsFlux as (
select Silo, ViewDate, GAR, Lob, recours
from ColumnsMixedFluxStock
),
ColumnsStock as (
select Silo, ViewDate, GAR, Lob, PSAP
from ColumnsMixedFluxStock
),
min_max_dates AS (
  SELECT MIN(ViewDate) as min_date, MAX(ViewDate) as max_date
  FROM ColumnsMixedFluxStock
),
date_range AS (
  SELECT explode(sequence(to_date(min_date), to_date(max_date), interval 1 month)) as ViewDate
  FROM min_max_dates
),
columnsFluxToStock AS (
SELECT i1.ViewDate, i2.silo, i2.GAR, i2.Lob,
       SUM(coalesce(i2.recours, 0)) as recours
FROM  ColumnsFlux i2
cross join date_range i1 on 
                   i1.ViewDate >= i2.ViewDate
GROUP BY i1.ViewDate, i2.silo, i2.GAR, i2.Lob
)
 
/*columnsStockFinal AS (
SELECT i1.ViewDate, i2.silo, i2.GAR, i2.Lob,
       coalesce(last_value(i2.PSAP) OVER (PARTITION BY /*i2.ViewDate,*/    i2.silo, i2.GAR, i2.Lob ORDER BY i2.ViewDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) as PSAP
FROM date_range i1
LEFT JOIN ColumnsStock i2 on 
                   i1.ViewDate >= i2.ViewDate
) select * from columnsStockFinal order by ViewDate, Silo, GAR, Lob*/
 
/*all_combinations AS (
  SELECT dr.ViewDate, cs.silo, cs.GAR, cs.Lob
  FROM date_range dr
  CROSS JOIN (SELECT DISTINCT silo, GAR, Lob FROM ColumnsStock) cs
),
columnsStockFinal AS (
  SELECT ac.ViewDate, ac.silo, ac.GAR, ac.Lob,
         COALESCE(LAST_VALUE(cs.PSAP) OVER (PARTITION BY cs.silo, cs.GAR, cs.Lob ORDER BY ac.ViewDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) AS PSAP
  FROM all_combinations ac
  LEFT JOIN ColumnsStock cs ON ac.silo = cs.silo AND ac.GAR = cs.GAR AND ac.Lob = cs.Lob AND ac.ViewDate = cs.ViewDate
) select * from columnsStockFinal order by ViewDate, Silo, GAR, Lob*/
 
/*
columnsStockFinal AS (
SELECT i1.ViewDate, i2.silo, i2.GAR, i2.Lob,
       coalesce(i2.PSAP, 0) as PSAP
FROM  ColumnsStock i2
cross join date_range i1 on 
                   i1.ViewDate = i2.ViewDate
GROUP BY i1.ViewDate, i2.silo, i2.GAR, i2.Lob
) --select * from columnsStockFinal order by ViewDate, Silo, GAR, Lob
*/
 
 
 
/*select cs.Silo, cs.ViewDate, cs.GAR, cs.Lob,
  cfts.recours,
  cs.PSAP
from columnsStockFinal cs  --columnsStock cs 
inner join
--full outer join
--inner join après avoir fait une autre CTE columnsStockFinal avec un cross join de ColumnsStock et date_range
columnsFluxToStock cfts on 
                        cs.Silo = cfts.Silo
                    AND cs.ViewDate = cfts.ViewDate
                            AND cs.GAR = cfts.GAR
                            AND cs.Lob = cfts.Lob
--where recours != 0 or PSAP != 0
order by ViewDate, Silo, GAR, Lob*/
 
select coalesce(cs.Silo, cfts.Silo) as Silo, coalesce(cs.ViewDate, cfts.ViewDate) ViewDate, coalesce(cs.GAR, cfts.GAR) as GAR, coalesce(cs.Lob, cfts.Lob) as Lob,
  cfts.recours,
  cs.PSAP
from columnsStock cs  --columnsStock cs
full outer join
columnsFluxToStock cfts on
                        cs.Silo = cfts.Silo
                    AND cs.ViewDate = cfts.ViewDate
                    AND cs.GAR = cfts.GAR
                    AND cs.Lob = cfts.Lob
--where recours != 0 or PSAP != 0
order by ViewDate, Silo, GAR, Lob

 

 Thanks for your help.

I am on Databricks using spark sql but it is pretty regular sql used here so it should be fine

1 Reply

  • LainRobertson's avatar
    LainRobertson
    Silver Contributor

    jmclej3 

     

    Hi, Julien.

     

    I'm unfamiliar with DataBricks and any variations there may be with SQL Server, however, here's something that may provide some guidance/hints on how to proceed.

     

    I apologise in advance for not reading your existing attempts - I'm rushing through this one. I simply focused on your example output and explanation around carrying values forward. Correct me where I need correcting!

     

    Firstly, here's the supporting constructs that I've added in addition to your own.

     

    Constructs

    DROP TABLE IF EXISTS [rollupDates];
    CREATE TABLE [rollupDates]
    (
    	[date] [date] NOT NULL PRIMARY KEY
    );
    GO
    
    TRUNCATE TABLE [rollupDates];
    
    DECLARE @currentYear AS [int] = YEAR(GETDATE()) - 1;
    DECLARE @offset AS [int] = 1;
    
    SET NOCOUNT ON;
    WHILE (@offset < 13)
    BEGIN
    	INSERT INTO [rollupDates] VALUES
    		-- Get [next month] - [1 day] to account for leap years.
    		(DATEADD(dd, -1, DATEADD(MM, @offset, DATEFROMPARTS(@currentYear, 1, 1))));
    
    	SET @offset = @offset + 1;
    END;
    SET NOCOUNT OFF;

     

    Query

    SELECT
    	[rollupMetrics].[date]
    	, [rollupMetrics].[silo]
    	, [rollupMetrics].[gar]
    	, [rollupMetrics].[lob]
    	, CASE
    		WHEN [c].[recours] IS NOT NULL THEN [c].[recours]
    		ELSE
    			(
    				SELECT
    					TOP 1
    					[recours]
    				FROM
    					[ColumnsMixedFluxStock]
    				WHERE
    					[ViewDate] < [rollupMetrics].[date]
    					AND [silo] = [rollupMetrics].[silo]
    					AND [gar] = [rollupMetrics].[gar]
    					AND [lob] = [rollupMetrics].[lob]
    				ORDER BY
    					[ViewDate] DESC
    			)
    	END AS [recours]
    	, CASE
    		WHEN [c].[psap] IS NOT NULL THEN [c].[psap]
    		ELSE
    			(
    				SELECT
    					TOP 1
    					[psap]
    				FROM
    					[ColumnsMixedFluxStock]
    				WHERE
    					[ViewDate] < [rollupMetrics].[date]
    					AND [silo] = [rollupMetrics].[silo]
    					AND [gar] = [rollupMetrics].[gar]
    					AND [lob] = [rollupMetrics].[lob]
    				ORDER BY
    					[ViewDate] DESC
    			)
    	END AS [psap]
    FROM
    (
    	SELECT
    		*
    	FROM
    		[rollupDates] CROSS JOIN
    		(SELECT DISTINCT [silo], [gar], [lob] FROM [ColumnsMixedFluxStock]) AS [data]
    ) AS [rollupMetrics] LEFT OUTER JOIN [ColumnsMixedFluxStock] AS [c] ON
    	[rollupMetrics].[date] = [c].[ViewDate]
    	AND [rollupMetrics].[silo] = [c].[silo]
    	AND [rollupMetrics].[gar] = [c].[gar]
    	AND [rollupMetrics].[lob] = [c].[lob]
    ORDER BY
    	[rollupMetrics].[date]
    	, [rollupMetrics].[silo]
    	, [rollupMetrics].[gar]
    	, [rollupMetrics].[lob];

     

    Output

    This is based on a calendar year (2023), meaning there's a bit more than your example output.

     

     

    Explanation

    You're looking to plug gaps where rows for dates are missing, however, you cannot create any kind of join on a date that doesn't exist in at least one of the tables, so the [rollupDates] table provides you with an enumeration of those dates, cross joined with the unique combination of silo, gar and lob.

     

    I've assumed the reporting dates are the last day of each month.

     

    This complete "reference" allows you to leverage a standard LEFT OUTER JOIN with your [ColumnsMixedFluxStock] table.

     

    I'd say you'll understand what the query is doing without me having to explain, so I'll leave it there

     

    Cheers,

    Lain

Resources