Introduction:
Delta files are a way to efficiently store data in data lake by storing only the changes made to the data.
Its built-on parquet files, it enables you to update, insert and delete your data easily in delta lake.
the updateability is the main advantage of delta format.
When we create delta files in data lake, you will notice that we will get _delta_log folder which indicates which files are valid and updated, this enables versioning which is a powerful feature of delta files also we will get parquet files as mentioned above, because delta files are built on parquet like so:
Main Idea:
The main idea of this blog is to create a simple ETL using delta files.
1. Convert raw data (csv format) into delta format using Azure Synapse Pipelines.
2. Query delta files using SQL serverless pool.
3. Build Dashboard using PowerBI.
Pre-requisites:
Step 1:
Unfortunately, we can't use the Copy activity to transform .csv file to delta file format it is only supported in Dataflows at the moment.
Convert .csv file into Delta file using Azure Synapse:
Sink Settings:
Now in delta lake, you should see delta files as mentioned above.
Step 2:
Query delta files using SQL serverless pool, in order to do it, you need to follow these steps:
Result:
PS: you can also load the data into DF and query it in Spark Pool.
Step 3:
Build Dashboard using Power BI Integration in Azure Synapse, for now, we have in Synapse linked ADLS which includes metadata (delta files), and we would like to build a dashboard.
Links:
Delta format in Azure Data Factory - Azure Data Factory | Microsoft Learn
Tutorial: Connect serverless SQL pool to Power BI Desktop & create report - Azure Synapse Analytics ...
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.