Incremental updates with overlap (delete last N days of records and load "NOW- M days" from source)

%3CLINGO-SUB%20id%3D%22lingo-sub-2195290%22%20slang%3D%22en-US%22%3EIncremental%20updates%20with%20overlap%20(delete%20last%20N%20days%20of%20records%20and%20load%20%22NOW-%20M%20days%22%20from%20source)%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-2195290%22%20slang%3D%22en-US%22%3E%3CP%3EHi%2C%3C%2FP%3E%3CP%3EI%20am%20looking%20into%20the%20possibilities%20of%20replacing%20our%20current%20legacy%20ETL%20setup%20with%20Azure%20dedicated%20pool%20and%20Data%20Factory.%3C%2FP%3E%3CP%3EThe%20source%20will%20be%20PostgreSQL%2C%20the%20destination%20should%20be%20Azure%20dedicated%20pool.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EOne%20thing%20I%20need%20help%20with%20is%20whether%20it%20is%20possible%20and%20how%20to%20perform%20the%20following%3A%3C%2FP%3E%3CP%3Ean%20%22upsert%22%20operation%2C%20or%20in%20more%20simple%20terms%2C%20identify%20the%20last%20day%20of%20data%20in%20the%20destination%20(denote%20here%20as%20LastDay)%20remove%20last%20N%20days%20of%20records%20(the%20last%20few%20days%20could%20be%20stale)%20from%20the%20ETL%20destination%20(Azure%20pool)%2C%20then%20query%20the%20source%20for%20the%20new%20data%20(i.e.%20newer%20than%20%22LastDay%20-%20N%20days%22)%20and%20update%20the%20records%20in%20the%20destination%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3E(I%20have%20found%20this%20scenario%3A%20%3CA%20href%3D%22https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fazure%2Fdata-factory%2Ftutorial-incremental-copy-multiple-tables-powershell%22%20target%3D%22_blank%22%20rel%3D%22noopener%20noreferrer%22%3Ehttps%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fazure%2Fdata-factory%2Ftutorial-incremental-copy-multiple-tables-powershell%3C%2FA%3E%26nbsp%3B%20regarding%20incremental%20update%20-%20however%2C%20I%20need%20that%20overlap%20(removal%20of%20last%20N%20days%20of%20possibly%20stale%20records)%3C%2FP%3E%3CP%3EThank%20you%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-LABS%20id%3D%22lingo-labs-2195290%22%20slang%3D%22en-US%22%3E%3CLINGO-LABEL%3EAzure%20Data%20Factory%3C%2FLINGO-LABEL%3E%3CLINGO-LABEL%3EAzure%20ETL%3C%2FLINGO-LABEL%3E%3CLINGO-LABEL%3ECopy%20Activity%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E
New Contributor

Hi,

I am looking into the possibilities of replacing our current legacy ETL setup with Azure dedicated pool and Data Factory.

The source will be PostgreSQL, the destination should be Azure dedicated pool.

 

One thing I need help with is whether it is possible and how to perform the following:

an "upsert" operation, or in more simple terms, identify the last day of data in the destination (denote here as LastDay) remove last N days of records (the last few days could be stale) from the ETL destination (Azure pool), then query the source for the new data (i.e. newer than "LastDay - N days") and update the records in the destination

 

(I have found this scenario: https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-multiple-tables-powers...  regarding incremental update - however, I need that overlap (removal of last N days of possibly stale records)

Thank you

1 Reply

@__Vladimir__ 

Hello,

to remove n days from destination before inserting records in field "pre-copy script" meter code similar to this:

declare @date date
declare @lastNdays int
set @lastNdays = -3
set @date = dateadd(d,@lastNdays, (select top 1 [LastDay] from [dbo].[table] order by [LastDay] desc))
delete from [dbo].[table] where [LastDay] >=@date

iryna_0-1615408260770.png