Rodrigo, seeing this late. Great blog post. I have always wondered about the best way to organize data lake files. Also, what are your thoughts on exporting RDBMS data to CSV in the data lake versus just landing it in a landing-zone RDBMS and joining the RDBMS data with the file-based data lake data in Spark when you need it? It seems a shame to de-schematize a table to CSV and keep it in sync with changes to the table just to have the data in the data lake. There is the cost of keeping the RDBMS running in the landing zone, but is that worth it to preserve the schema? It would be interesting to see what the consensus is.

I like the idea of keeping the schematized data in a landing-zone RDBMS (maintained with ETL, CDC, or transactional replication) and joining it to file-based sources via Spark or ADF when needed. What do you think is the best practice?
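For what it's worth, this is roughly the kind of join I have in mind. A minimal PySpark sketch, assuming a SQL Server landing zone and an ADLS Gen2 lake; the connection details, table names, paths, and the customer_id key are all made up for illustration:

```python
from pyspark.sql import SparkSession

# Build a Spark session; the JDBC driver jar would need to be on the classpath.
spark = SparkSession.builder.appName("rdbms-datalake-join").getOrCreate()

# Read the schematized table straight from the landing-zone RDBMS over JDBC,
# keeping its schema intact (hypothetical connection details).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://landing-zone-db:1433;databaseName=Sales")
    .option("dbtable", "dbo.Orders")
    .option("user", "etl_user")
    .option("password", "********")
    .load()
)

# Read the file-based data already sitting in the lake (Parquet here, but CSV works too).
clickstream = spark.read.parquet(
    "abfss://lake@account.dfs.core.windows.net/raw/clickstream/"
)

# Join on a shared key and write the result back to the lake,
# without ever exporting the table to CSV.
enriched = clickstream.join(orders, on="customer_id", how="left")
enriched.write.mode("overwrite").parquet(
    "abfss://lake@account.dfs.core.windows.net/curated/enriched_clickstream/"
)
```

The appeal to me is that the schema lives in one place (the landing-zone database) and Spark only sees it at read time, so there is no CSV extract to keep in sync.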