Forum Discussion
Christian_Hoffmann
Mar 01, 2021
Copper Contributor
Do I really need a data archive in a lake?
Hi, I am building a new data warehouse on Azure. We have a limited number of MySQL relational databases as sources, and the important tables all have history. In other words I can sum th...
- Mar 12, 2021
Yes, you can skip it for now. There are reasons to use the "middle step", though. Once you start getting data from multiple sources, some of which need more cleanup and extra steps to combine, you might want that middle layer as a staging/conversion area. Data size is also a factor: it can be faster to dump everything into a data lake (or similar) and do the additional filtering, sanitizing, anonymisation and so on there, before pushing the result to the final database (or whatever the end target is). It is easy to add steps later on. My general advice is to start simple and not over-engineer without a good reason.
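For anyone wondering what that "middle step" might look like once it is added, here is a minimal sketch, not a recommended implementation. Everything in it is an assumption made for illustration: a local folder stands in for the data lake, an SQLite database stands in for the warehouse, and the function names (`land_raw`, `clean`, `load`) and the `email`/`amount` columns are hypothetical. On Azure the same three steps would map to whatever storage and loading tools you actually use.

```python
import csv
import hashlib
import sqlite3
from pathlib import Path


def land_raw(rows, lake_dir, name):
    """Dump source rows unchanged into the 'lake' (assumed: a local folder)."""
    lake_dir = Path(lake_dir)
    lake_dir.mkdir(parents=True, exist_ok=True)
    path = lake_dir / f"{name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def clean(path):
    """Filter and anonymise staged rows before they reach the warehouse."""
    with Path(path).open(newline="") as f:
        for row in csv.DictReader(f):
            if not row["amount"]:  # filtering: skip rows with no amount
                continue
            # Unsalted hash, for illustration only; not strong anonymisation.
            customer_key = hashlib.sha256(row["email"].encode()).hexdigest()[:12]
            yield (customer_key, float(row["amount"]))


def load(conn, records):
    """Push cleaned records to the end target (assumed: an SQLite table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_key TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()
```

Skipping the lake for now means calling something like `load` directly on the source rows. Adding the staging layer later means putting `land_raw` and `clean` in front of it, which is why it is cheap to defer.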
cpmohanraj
Jul 05, 2021
Copper Contributor
In case you are interested in another viewpoint, Melissa Coates and James Serra have written about this in the past.
TL;DR: the answer is almost always "it depends".
https://www.jamesserra.com/archive/2018/11/should-i-load-structured-data-into-my-data-lake/