Building the Lakehouse - Implementing a Data Lake Strategy with Azure Synapse


Jan 02, 2024

ArshadAliTMMBA - very nice article.

Could you, or anyone else, please shed some light on the questions below? Thanks.

When we create Delta tables in the Enriched zone, do they point to the data already stored in the Raw zone, or does this create a separate (and probably redundant) physical copy of the data?
Suppose my Raw zone contains 950 TB of data in total: only 150 TB resides in the hot tier (for business purposes), and the remaining 800 TB resides in the archive tier (for compliance/audit purposes) of the Raw zone in the Data Lake. A new batch of data comes in (say, 1 GB in size). It is mostly new data, but it also contains some updates to existing data. How should I accommodate this batch? I can easily land the new data in the Raw zone (in the appropriate partitioned folder structure), but should I also update the existing records in the Raw zone from the new batch? Isn't the Raw zone meant to be read-only, append-only? I don't think I fully understand what happens under the hood (how and where the data is physically stored within these zones, etc.). Please help!
