Forum Discussion

zwu6233's avatar
zwu6233
Copper Contributor
Dec 23, 2024

Need tips for redesign existing ETL process rely on SQL DB Snapshot for error recovery

We have a long running (10+ hrs) legacy daily ETL process relying on SQL Server DB snapshots for error recovering.    I like to get some suggestions on how to rewrite the ETL process wihtout depending on DB snapshot.    

 

Existing Logic in SSIS looks like this:

     ETL Step 1  

     ETL Step 2  

     CREATE DB SNAPSHOT of the necessary database(s)

     TRY

        ETL Step 3 

        ETL Step 4 

         …

        Last Step of ETL job

    CATCH

        Capture error details

        RESTORE DB SNAPSHOT(S) (taken above)

    FINALLY

        DROP DB SNAPSHOT(S)

 

Other information/limitiations:

  1. The SQL Server is running on an Azure VM (IaaS).   (single node SQL Server instance) 
  2. The underlying database(s) are very big ( > 10TB each)  and therefore taking DB snapshot (instant) is much quicker than taking DB backup  
  3. There many other SQL databases hosted on that same SQL instance,  therefore we can't do Azure VM-level restore.
  4. Not ready to migrate this SQL system to Azure PaaS SQL products  ( multiple reasons, some are on MS side: such as Azure SQL MI can't support more than 16TB of data,  some are on our side: legacy downstream Apps can't connect to Azure PaaS SQL DB)
  5.  Can't add signficant time to the duration of the ETL job.  (waiting additional 2~3 hours to make full DB backup of multiple 10TB+ databases during daily ETL process is not accepable) 
  6. Job failure happens once every 3 months (or less)  Longer recovery time (extra 2~3 hours) and even manual involvement from DBA team to recover from a job failure is acceptable.   

    

 

No RepliesBe the first to reply

Resources