Forum Discussion
how to unify 2 tables that have the same origin?
Hi, Julien.
I'm unfamiliar with DataBricks and any variations there may be with SQL Server, however, here's something that may provide some guidance/hints on how to proceed.
I apologise in advance for not reading your existing attempts - I'm rushing through this one. I simply focused on your example output and explanation around carrying values forward. Correct me where I need correcting!
Firstly, here's the supporting constructs that I've added in addition to your own.
Constructs
DROP TABLE IF EXISTS [rollupDates];
CREATE TABLE [rollupDates]
(
[date] [date] NOT NULL PRIMARY KEY
);
GO
TRUNCATE TABLE [rollupDates];
DECLARE @currentYear AS [int] = YEAR(GETDATE()) - 1;
DECLARE @offset AS [int] = 1;
SET NOCOUNT ON;
WHILE (@offset < 13)
BEGIN
INSERT INTO [rollupDates] VALUES
-- Get [next month] - [1 day] to account for leap years.
(DATEADD(dd, -1, DATEADD(MM, @offset, DATEFROMPARTS(@currentYear, 1, 1))));
SET @offset = @offset + 1;
END;
SET NOCOUNT OFF;
Query
SELECT
[rollupMetrics].[date]
, [rollupMetrics].[silo]
, [rollupMetrics].[gar]
, [rollupMetrics].[lob]
, CASE
WHEN [c].[recours] IS NOT NULL THEN [c].[recours]
ELSE
(
SELECT
TOP 1
[recours]
FROM
[ColumnsMixedFluxStock]
WHERE
[ViewDate] < [rollupMetrics].[date]
AND [silo] = [rollupMetrics].[silo]
AND [gar] = [rollupMetrics].[gar]
AND [lob] = [rollupMetrics].[lob]
ORDER BY
[ViewDate] DESC
)
END AS [recours]
, CASE
WHEN [c].[psap] IS NOT NULL THEN [c].[psap]
ELSE
(
SELECT
TOP 1
[psap]
FROM
[ColumnsMixedFluxStock]
WHERE
[ViewDate] < [rollupMetrics].[date]
AND [silo] = [rollupMetrics].[silo]
AND [gar] = [rollupMetrics].[gar]
AND [lob] = [rollupMetrics].[lob]
ORDER BY
[ViewDate] DESC
)
END AS [psap]
FROM
(
SELECT
*
FROM
[rollupDates] CROSS JOIN
(SELECT DISTINCT [silo], [gar], [lob] FROM [ColumnsMixedFluxStock]) AS [data]
) AS [rollupMetrics] LEFT OUTER JOIN [ColumnsMixedFluxStock] AS [c] ON
[rollupMetrics].[date] = [c].[ViewDate]
AND [rollupMetrics].[silo] = [c].[silo]
AND [rollupMetrics].[gar] = [c].[gar]
AND [rollupMetrics].[lob] = [c].[lob]
ORDER BY
[rollupMetrics].[date]
, [rollupMetrics].[silo]
, [rollupMetrics].[gar]
, [rollupMetrics].[lob];
Output
This is based on a calendar year (2023), meaning there's a bit more than your example output.
Explanation
You're looking to plug gaps where rows for dates are missing, however, you cannot create any kind of join on a date that doesn't exist in at least one of the tables, so the [rollupDates] table provides you with an enumeration of those dates, cross joined with the unique combination of silo, gar and lob.
I've assumed the reporting dates are the last day of each month.
This complete "reference" allows you to leverage a standard LEFT OUTER JOIN with your [ColumnsMixedFluxStock] table.
I'd say you'll understand what the query is doing without me having to explain, so I'll leave it there
Cheers,
Lain