Forum Discussion

Thomas LeBlanc
Jan 24, 2020

What kind of data would you put in a Data Lake?

How would you describe the data structure in a Data Lake for a Data Scientist?

 

How would data be extracted for a Data Vault or Star Schema?

 

Thanks,

Thomas

 


  • cobrow (Copper Contributor)

    Thomas LeBlanc the Data Lake's flexibility is its key value proposition.  Customers no longer need to determine what can and can't be put into storage.  The discussions I have with customers are about how they can put WAV and MP3 files alongside high-resolution medical imaging and videos, in addition to the raw data generated from running the business.

     

    When introducing a customer to the Data Lake and how to navigate it, I usually describe it in terms of Windows File Explorer: a hierarchy of folders and files.

     

    The structure or model of the Data Lake is what tailors the folders to your Data Scientists' or customers' requirements.  Rodrigo Souza has a community post summarizing a great viewpoint on organizing a lake (https://techcommunity.microsoft.com/t5/data-architecture-blog/how-to-organize-your-data-lake/ba-p/1182562).  A sketch of one common layout follows.
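    For example, a zone-based layout might look like the tree below; the folder and dataset names here are illustrative placeholders of my own, not taken from the linked post:

        /raw/         <- data landed as-is from source systems (CSV, JSON, images, audio)
            salesdb/orders/2020/01/24/orders.csv
        /enriched/    <- cleansed, validated, conformed data
            sales/orders/
        /curated/     <- modeled, consumption-ready data (e.g., Star Schema extracts)
            sales/fact_orders/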

     

    Data extraction from the lake is equally flexible.  For databases and Star Schemas that might require some transformation of the data, I would lead with Azure Data Factory (or SSIS).  Some Data Scientists I work with prefer Python as their primary language and use it to extract, manipulate, and write data back to the Lake.  The third approach that currently has my interest for reporting from the Azure Data Lake is Power Platform dataflows, since my customer relies on Power BI (https://cloudblogs.microsoft.com/dynamics365/it/2019/09/18/using-power-platform-dataflows-to-extract-and-process-data-from-business-central-post-3/)
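    As a rough illustration of the Python route, here is a minimal sketch using the azure-storage-file-datalake SDK with pandas.  The account name, file system, paths, and column name are placeholders of my own, and in practice you would pull the credential from a secure store rather than hard-coding it:

        import io
        import pandas as pd
        from azure.storage.filedatalake import DataLakeServiceClient

        # Placeholder account and credential -- substitute your own.
        service = DataLakeServiceClient(
            account_url="https://mylakeaccount.dfs.core.windows.net",
            credential="<account-key-or-token>",
        )
        fs = service.get_file_system_client(file_system="lake")

        # Extract: download a raw CSV from the lake.
        raw = fs.get_file_client("raw/salesdb/orders/orders.csv")
        df = pd.read_csv(io.BytesIO(raw.download_file().readall()))

        # Manipulate: a trivial example transformation.
        df = df[df["order_total"] > 0]

        # Write back: land the cleaned file in a curated folder.
        out = fs.get_file_client("curated/sales/orders_clean.csv")
        out.upload_data(df.to_csv(index=False).encode("utf-8"), overwrite=True)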
