Forum Discussion

Christian_Hoffmann
Copper Contributor
Mar 01, 2021
Solved

Do I really need a data archive in a lake?

Hi, I am building a new data warehouse using Azure Cloud. We have a limited number of MySQL relational databases as sources, and the important tables all have history. In other words I can sum th...
  • Antti_Kurenniemi
    Mar 12, 2021
    Yes, you can skip it for now. There are reasons to add the "middle step" later: once you start getting data from multiple sources, some of which need more mangling or extra steps to combine, you might want that middle layer as a sort of staging/conversion area. Data size is also a factor: it might be faster to just dump everything into a data lake (or something similar) and do the additional filtering, sanitizing, anonymisation and so on there, before pushing it to the final database (or whatever the end target is). It is easy to add steps later on. My general advice is to start simple and not over-engineer without a good reason.
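To illustrate the "easy to add steps later" point, here is a minimal sketch in plain Python, not tied to any Azure service. Everything in it is hypothetical and only for illustration: the `run_pipeline` helper, the `drop_incomplete` and `anonymise` steps, and the sample rows. The idea is that if the load is written as a list of steps, the direct source-to-warehouse load is just the empty list, and a staging layer can be slotted in later without touching the rest.

```python
import hashlib
from typing import Callable, Iterable, List

Row = dict
Step = Callable[[Iterable[Row]], Iterable[Row]]


def run_pipeline(rows: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order; an empty list means a direct load."""
    for step in steps:
        rows = step(rows)
    return list(rows)


def drop_incomplete(rows: Iterable[Row]) -> Iterable[Row]:
    """Hypothetical filtering step: skip rows without a customer_id."""
    return (r for r in rows if r.get("customer_id") is not None)


def anonymise(rows: Iterable[Row]) -> Iterable[Row]:
    """Hypothetical anonymisation step: replace the email with a short hash."""
    for r in rows:
        out = dict(r)  # copy so the source rows are left untouched
        out["email"] = hashlib.sha256(r["email"].encode()).hexdigest()[:12]
        yield out


# Made-up sample rows standing in for a MySQL source table.
source_rows = [
    {"customer_id": 1, "email": "a@example.com", "amount": 10},
    {"customer_id": None, "email": "b@example.com", "amount": 5},
    {"customer_id": 3, "email": "c@example.com", "amount": 7},
]

# Start simple: load straight from source to target.
direct = run_pipeline(source_rows, [])

# Later: add a staging layer by listing the extra steps.
staged = run_pipeline(source_rows, [drop_incomplete, anonymise])
```

In a real setup each step would more likely be a pipeline activity or a job reading from and writing to storage, but the shape is the same: the staging steps are optional and can be added once there is a concrete reason for them.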
