Forum Discussion
SQL Query Incremental data load
- Jul 22, 2024
If you're not maintaining the reference timestamp outside of the source table, and instead, you're going to use the target table to determine a reference date, then you would leverage a subquery to pull the maximum value contained in either createdDate or updatedDate in that target table.
This would look something like this:
SELECT [s].* FROM [sourceTable] AS [s] WHERE COALESCE([s].[updatedDate], [s].[createdDate]) > ( SELECT CASE WHEN MAX([t].[updatedDate]) >= MAX([t].[createdDate]) THEN MAX([t].[updatedDate]) ELSE MAX([t].[createdDate]) END FROM [targetTable] AS [t] );Nothing changes prior to the original COALESCE, you're just changing the WHERE statement after the relational operator.
Cheers,
Lain
'some DateTime value' can be MaxDateUpdated or MaxDateCreated from target table. Or both hence I was taking the UNION approach so that both the scenarios are covered. I do not have one particular field which captures the change, I have to figure it out based on the two dates.
The business says, use the PK and DateUpdated if not null, or use the PK and CreatedDate if DateUpdated is NULL.
PK I can use when inserting the incremental load into target table, but for getting the delta from source into staging table I have build some logic that I do not miss out any incremental data.
The job is to be run on daily basis to bring in the incremental load
If you're not maintaining the reference timestamp outside of the source table, and instead, you're going to use the target table to determine a reference date, then you would leverage a subquery to pull the maximum value contained in either createdDate or updatedDate in that target table.
This would look something like this:
SELECT
[s].*
FROM
[sourceTable] AS [s]
WHERE
COALESCE([s].[updatedDate], [s].[createdDate]) > (
SELECT
CASE
WHEN MAX([t].[updatedDate]) >= MAX([t].[createdDate]) THEN MAX([t].[updatedDate])
ELSE MAX([t].[createdDate])
END
FROM
[targetTable] AS [t]
);
Nothing changes prior to the original COALESCE, you're just changing the WHERE statement after the relational operator.
Cheers,
Lain
- Neils2401Jul 22, 2024Copper ContributorThis looks interesting, I have to incorporate this into SSIS as my source is Oracle and target (stg & target tables) are in SQL Server.
This is one of the reason I was first getting the max dates (updated and created) from SQL Server target tables storing it in variables and then doing UNION with separate queries by comparing this max values with source dates. In doing so I am I missing anything or that will also work, to keep it simple.
Thank again Lain for your time on this issue.