Mar 01 2021 02:39 AM
Hi
I am building a new data warehouse using Azure Cloud.
We have a limited number of MySQL relational databases as source and the important tables all has history. In other words I can sum things up over time by using simple SQL.
Now I got the advice that I should use a Data Lake with .CSV files as a data Archive layer in my Data Warehouse.
But why?
I can not see why I should spend time moving the data from MySQL to .CSV and finally into a Azure SQL Server database when I can the the data directly from MySQL to a staging layer in my Azure SQL Server database using Azure Data Factory.
In the longer we might have a neeed for storing massive data from our NoSQL database and sources without history. In this case the Data Lake is a good fast and cheap place to store these data.
What do you think? Do you agree that I can skip the Data Lake for now?
Regards,
Christian
Mar 12 2021 12:38 AM
SolutionJul 05 2021 07:53 AM - edited Jul 05 2021 07:54 AM
In case you are interested in reading another viewpoint, Melissa Coates & James Serra have written about it in the past.
TL;DR - the answer is almost always "it depends"
https://www.jamesserra.com/archive/2018/11/should-i-load-structured-data-into-my-data-lake/
Mar 12 2021 12:38 AM
Solution