ArshadAliTMMBA - very nice article.
Could you or anyone else please shed some light on the questions below? Thanks.
- When we create Delta tables in the Enriched zone, do they point to the data already stored in the Raw zone? Or does this create a new, separate (and probably redundant) physical copy of the data?
- Let's suppose my Raw zone now contains 950 TB of data in total: only 150 TB of it resides on the hot tier (for business purposes), and the remaining 800 TB resides in the archive tier (for compliance/audit purposes) of the Raw zone (in the Data Lake). A new batch of data comes in (let's say 1 GB in size). It is mostly new data, but it also contains some updates to existing data. How should I accommodate this batch? I can easily land the new data in the Raw zone (in the appropriate partitioned folder structure), but should I also modify the Raw zone to apply the updates from the new batch to the existing records? Isn't the Raw zone supposed to be read-only and append-only? I think I'm not fully understanding what happens under the hood here (for example, how and where the data is physically stored within these zones). Please help!