Forum Discussion

Neils2401's avatar
Neils2401
Copper Contributor
Jul 22, 2024

SQL Query Incremental data load

I have source and staging tables & target table, I want to bring in incremental data into staging from source.

I have 2 date columnCreatedDate and UpdatedDate to work with to bring in the incremental data in stage

Table structure (ID_Pk, CreatedDate, UpdatedDate)

CreatedDate and UpdatedDate are date with time stamp and ID is PK

e.g. Created Date/updated date format ‘2024-05-09 16:13.03.5722250’

I have to write SQL to get only incremental data from source table, by using UpdateDate if not null, in case if updated date is null then use CreateDate to pull the incremental data.

I got the vmaxCreatedDate and vmaxUpdatedDate from targate table put it into varaibles and wrote below query, question is this this sql correct for incremental data load. I am inform to pick incremental data using UpdateDate in case it is NULL then use CreatedDate to bring in only incremental data

Select * from SourceTbl where updateDate > vmaxUpdatedDate  and updateDate  is not null
UNION
Select  * from SourceTbl where createdDate > vmaxCreatedDate 
  • Neils2401 

     

    If you're not maintaining the reference timestamp outside of the source table, and instead, you're going to use the target table to determine a reference date, then you would leverage a subquery to pull the maximum value contained in either createdDate or updatedDate in that target table.

     

    This would look something like this:

     

    SELECT
    	[s].*
    FROM
    	[sourceTable] AS [s]
    WHERE
    	COALESCE([s].[updatedDate], [s].[createdDate]) > (
    		SELECT
    			CASE
    				WHEN MAX([t].[updatedDate]) >= MAX([t].[createdDate]) THEN MAX([t].[updatedDate])
    				ELSE MAX([t].[createdDate])
    			END
    		FROM
    			[targetTable] AS [t]
    	);

     

    Nothing changes prior to the original COALESCE, you're just changing the WHERE statement after the relational operator.

     

    Cheers,

    Lain

  • LainRobertson's avatar
    LainRobertson
    Silver Contributor

    Neils2401 

     

    Hi, Asif.

     

    You don't need a UNION - it's much easier to do what you're asking than that.

     

    SELECT
    	[a].*
    FROM
    	myTable AS [a]
    WHERE
    	COALESCE([a].[updatedDate], [a].[createdDate]) > 'some DateTime value';

     

    Cheers,

    Lain

    • Neils2401's avatar
      Neils2401
      Copper Contributor

      Thanks Lain,
      The 'Some dateTime value' you are saying in the query will be vmaxUpdatedDate or vmaxCreatedDate ? Which max date I should pick that I do not loose the incremental data.

       

      And below are some more details

       

      The dates are captured at the point of time, when new rec is inserted or updated. I do not have much control over source table & DB, I am only allowed to select.

      The dateCreated is datetime2 and UpdatedDate datetime filed, when I query I see it has date value along with timestamp for both the fields.

      I see that along with PK the DateCreated is not null and UpdatedDate can be null, UpdatedDate will only have values when the rec is updated later after being inserted. Hence I am requested to pull the incremental data based on UpdatedDate and if this col has NULL then take DateCreated

       

      • LainRobertson's avatar
        LainRobertson
        Silver Contributor

        Neils2401 

         

        The 'some DateTime value' is the value you're keeping track of external to the table itself. You could be saving it entirely external in something like a cookie, an XML file, or even internal to a database - whether that's the source database or another one.

         

        Ultimately, you will have to keep track of the last timestamp you queried against so that your next query uses that value as it's "from" point of reference.

         

        You don't need any form of control over the source database itself whatsoever. Read-only access is fine for the scenario you've described.

         

        The COALESCE statement takes care of arbitrating between updatedDate and creationDate. Have a read of it to learn more about how it works, but the short description in the context if your example is that it will first look at updatedDate and return that value if it is not null. If updatedDate is NULL, then it will return createdDate.

         

         

        Cheers,

        Lain

Resources