This post is part of a multi-part series titled "Patterns with Azure Databricks". Each highlighted pattern holds true to the key principles of building a Lakehouse architecture with Azure Databricks:

- An open, curated data lake for all data (Delta Lake)
- A foundational compute layer built on open standards for core ETL and stream processing (Azure Databricks)
- Easy integrations with other services, such as Azure Data Factory and IoT/Event Hubs, that specialize in ingesting data into the cloud
Companies need to ingest data in any format, of any size, and at any speed into the cloud in a consistent and repeatable way. Once that data is ingested into the cloud, it needs to be moved into the open, curated data lake, where it can be processed further to serve high-value use cases such as SQL analytics, BI, reporting, data science, and machine learning.
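As a rough illustration of the "land it, then move it into the curated lake" step, the sketch below reads raw files that an ingestion service has landed in cloud storage and appends them to a Delta table. It assumes a Databricks environment where Delta Lake is available; the storage account, container paths, and file format are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raw-to-delta").getOrCreate()

# Read raw files that an ingestion service (e.g. Azure Data Factory) has
# landed in cloud storage. The path and JSON format are placeholders.
raw_df = spark.read.json(
    "abfss://landing@mystorageaccount.dfs.core.windows.net/orders/")

# Append the data to a Delta table in the open, curated data lake so that
# SQL analytics, BI, and ML workloads can query it downstream.
(raw_df.write
    .format("delta")
    .mode("append")
    .save("abfss://lake@mystorageaccount.dfs.core.windows.net/bronze/orders"))
```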
The diagram above demonstrates a common pattern used by many companies to ingest and process data of all types, sizes, and speeds into a curated data lake. Let's look at the three major components of the pattern:

1) Ingestion: services such as Azure Data Factory and IoT/Event Hubs bring batch and streaming data into cloud storage in a consistent, repeatable way.
2) Processing: Azure Databricks performs the core ETL and stream processing on that data (a sketch of the streaming piece follows below).
3) Curated data lake: Delta Lake stores the processed, curated data in an open format for SQL analytics, BI, reporting, data science, and machine learning.
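To make the second component above more concrete, here is a minimal Structured Streaming sketch that incrementally picks up newly landed files with Databricks Auto Loader and continuously appends them to a Delta table. This is one possible implementation, not the only one; the paths, schema location, and checkpoint location are hypothetical placeholders, and Auto Loader (`cloudFiles`) is specific to Databricks.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-to-delta").getOrCreate()

# Incrementally discover and read new files as the ingestion layer lands them.
events = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation",
            "abfss://lake@mystorageaccount.dfs.core.windows.net/_schemas/events")
    .load("abfss://landing@mystorageaccount.dfs.core.windows.net/events/"))

# Continuously append the events to a Delta table in the curated lake.
(events.writeStream
    .format("delta")
    .option("checkpointLocation",
            "abfss://lake@mystorageaccount.dfs.core.windows.net/_checkpoints/events")
    .outputMode("append")
    .start("abfss://lake@mystorageaccount.dfs.core.windows.net/bronze/events"))
```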
The ingestion, ETL, and stream processing pattern discussed above has been used successfully by many companies across many different industries and verticals. It also holds true to the key principles discussed for building a Lakehouse architecture with Azure Databricks: 1) using an open, curated data lake for all data (Delta Lake), 2) using a foundational compute layer built on open standards for the core ETL and stream processing (Azure Databricks), and 3) using easy integrations with other services, like Azure Data Factory and IoT/Event Hubs, which specialize in ingesting data into the cloud.
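Finally, a hedged sketch of how the curated lake feeds the high-value use cases mentioned earlier: it reads a Delta table written by the steps above and publishes a small aggregate table for SQL analytics and BI. The `gold` database, table name, and column names (`order_timestamp`, `amount`) are hypothetical and would differ in a real deployment.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curate-orders").getOrCreate()

# Read the Delta table produced by the ingestion/ETL steps above.
orders = spark.read.format("delta").load(
    "abfss://lake@mystorageaccount.dfs.core.windows.net/bronze/orders")

# Aggregate into a business-level table that BI tools and SQL analysts query.
daily_revenue = (orders
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_revenue")))

# Publish the curated result as a managed Delta table (placeholder names).
spark.sql("CREATE DATABASE IF NOT EXISTS gold")
(daily_revenue.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("gold.daily_revenue"))
```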
If you are interested in learning more about Azure Databricks, attend an event and check back soon for additional blogs in the "Patterns with Azure Databricks" series.