Metadata Driven Pipelines for Dynamic Full and Incremental Processing in Azure SQL

Microsoft

Sep 25, 2024

Khaled_Ghardallou Yes, in this scenario you can copy directly from source to SQL. The reasons for the pattern I presented are:

General usability to multiple scenarios that may not support source to SQL directly.
1. Such as Snowflake to SQL. Many others but that is a common one. You would need to stage this data in this scenario anyway, and best to control the process with proper naming/standardization.
Staging the data within ADLS is a general best practice instead of loading directly from source to SQL, in most scenarios.
1. Especially when loading from on-prem, the latency and performance is far better when loading the data into ADLS then loading into SQL vs loading directly from on-prem into Azure SQL.
2. Staging the data in ADLS allows for reusability across different compute engines without hitting the database. A common example is providing raw files to a Data Science team or other teams to work on the same raw data. This would reduce the number of copies in the organization, storage costs, and data movement.
3. You may not want to load all of your data to Azure SQL. A common scenario is to load all raw source data into ADLS, but you may not be modeling/reporting on everything yet. So, you can still leverage that data with external tables, but not bring it into your database until you really need it. This can cut down on cost significantly depending on how much is not ingested vs ingested into the database.

Blog Post