Forum Discussion
zwu6233
Dec 23, 2024Copper Contributor
Need tips for redesign existing ETL process rely on SQL DB Snapshot for error recovery
We have a long running (10+ hrs) legacy daily ETL process relying on SQL Server DB snapshots for error recovering. I like to get some suggestions on how to rewrite the ETL process wihtout depending on DB snapshot.
Existing Logic in SSIS looks like this:
ETL Step 1
ETL Step 2
CREATE DB SNAPSHOT of the necessary database(s)
TRY
ETL Step 3
ETL Step 4
…
Last Step of ETL job
CATCH
Capture error details
RESTORE DB SNAPSHOT(S) (taken above)
FINALLY
DROP DB SNAPSHOT(S)
Other information/limitiations:
- The SQL Server is running on an Azure VM (IaaS). (single node SQL Server instance)
- The underlying database(s) are very big ( > 10TB each) and therefore taking DB snapshot (instant) is much quicker than taking DB backup
- There many other SQL databases hosted on that same SQL instance, therefore we can't do Azure VM-level restore.
- Not ready to migrate this SQL system to Azure PaaS SQL products ( multiple reasons, some are on MS side: such as Azure SQL MI can't support more than 16TB of data, some are on our side: legacy downstream Apps can't connect to Azure PaaS SQL DB)
- Can't add signficant time to the duration of the ETL job. (waiting additional 2~3 hours to make full DB backup of multiple 10TB+ databases during daily ETL process is not accepable)
- Job failure happens once every 3 months (or less) Longer recovery time (extra 2~3 hours) and even manual involvement from DBA team to recover from a job failure is acceptable.
No RepliesBe the first to reply