Forum Discussion
SQL Query Incremental data load
I have source and staging tables & target table, I want to bring in incremental data into staging from source.
I have 2 date columnCreatedDate and UpdatedDate to work with to bring in the incremental data in stage
Table structure (ID_Pk, CreatedDate, UpdatedDate)
CreatedDate and UpdatedDate are date with time stamp and ID is PK
e.g. Created Date/updated date format ‘2024-05-09 16:13.03.5722250’
I have to write SQL to get only incremental data from source table, by using UpdateDate if not null, in case if updated date is null then use CreateDate to pull the incremental data.
I got the vmaxCreatedDate and vmaxUpdatedDate from targate table put it into varaibles and wrote below query, question is this this sql correct for incremental data load. I am inform to pick incremental data using UpdateDate in case it is NULL then use CreatedDate to bring in only incremental data
Select * from SourceTbl where updateDate > vmaxUpdatedDate and updateDate is not null UNION Select * from SourceTbl where createdDate > vmaxCreatedDate
If you're not maintaining the reference timestamp outside of the source table, and instead, you're going to use the target table to determine a reference date, then you would leverage a subquery to pull the maximum value contained in either createdDate or updatedDate in that target table.
This would look something like this:
SELECT [s].* FROM [sourceTable] AS [s] WHERE COALESCE([s].[updatedDate], [s].[createdDate]) > ( SELECT CASE WHEN MAX([t].[updatedDate]) >= MAX([t].[createdDate]) THEN MAX([t].[updatedDate]) ELSE MAX([t].[createdDate]) END FROM [targetTable] AS [t] );
Nothing changes prior to the original COALESCE, you're just changing the WHERE statement after the relational operator.
Cheers,
Lain
- LainRobertsonSilver Contributor
Hi, Asif.
You don't need a UNION - it's much easier to do what you're asking than that.
SELECT [a].* FROM myTable AS [a] WHERE COALESCE([a].[updatedDate], [a].[createdDate]) > 'some DateTime value';
Cheers,
Lain
- Neils2401Copper Contributor
Thanks Lain,
The 'Some dateTime value' you are saying in the query will be vmaxUpdatedDate or vmaxCreatedDate ? Which max date I should pick that I do not loose the incremental data.And below are some more details
The dates are captured at the point of time, when new rec is inserted or updated. I do not have much control over source table & DB, I am only allowed to select.
The dateCreated is datetime2 and UpdatedDate datetime filed, when I query I see it has date value along with timestamp for both the fields.
I see that along with PK the DateCreated is not null and UpdatedDate can be null, UpdatedDate will only have values when the rec is updated later after being inserted. Hence I am requested to pull the incremental data based on UpdatedDate and if this col has NULL then take DateCreated
- LainRobertsonSilver Contributor
The 'some DateTime value' is the value you're keeping track of external to the table itself. You could be saving it entirely external in something like a cookie, an XML file, or even internal to a database - whether that's the source database or another one.
Ultimately, you will have to keep track of the last timestamp you queried against so that your next query uses that value as it's "from" point of reference.
You don't need any form of control over the source database itself whatsoever. Read-only access is fine for the scenario you've described.
The COALESCE statement takes care of arbitrating between updatedDate and creationDate. Have a read of it to learn more about how it works, but the short description in the context if your example is that it will first look at updatedDate and return that value if it is not null. If updatedDate is NULL, then it will return createdDate.
Cheers,
Lain