Introduction
As businesses continue to generate massive amounts of data, the need for efficient data management solutions becomes increasingly important. This is where the data lakehouse comes in: a hybrid solution that combines the best features of a data lake and a data warehouse.
Part 1 - Building a Data Lakehouse using Azure Data Explorer
In Part 1 we explored how to build a data lakehouse using Azure Data Explorer (ADX), with data flowing from Azure SQL DB through Azure Data Factory using Change Data Capture (CDC), and click events streaming in from Azure Event Hubs.
This article is Part 2 in the series. Here we will deploy the solution using Bicep, Microsoft's infrastructure as code (IaC) language. With this guide, you'll be able to create a data lakehouse that can handle large volumes of data and provide valuable insights for your business.
Requirements
- An Azure account and a logged-in user with administrator permissions
Infrastructure Deployment
- Go to GitHub and download the files from here:
https://github.com/denisa-ms/azure-data-and-ai-examples/tree/master/adx-datalakehouse
- Go to the Azure portal and log in with a user that has administrator permissions
- Open the Cloud Shell in the Azure portal
- Upload the file "all.zip" from the GitHub repo using the upload file button in the Cloud Shell
- Unzip the file by running: unzip all.zip
- Run ./createAll.ps1
NOTE: This takes time, so be patient.
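As context for what the script does under the hood, the heart of the deployment is an ADX (Kusto) cluster and database declared in Bicep. The sketch below shows the general shape; the resource names, SKU, and database name are my own illustrative assumptions, not necessarily the values used in the repo.

```bicep
// Minimal sketch of an ADX cluster and database in Bicep.
// All names and the SKU below are illustrative assumptions.
param location string = resourceGroup().location

resource cluster 'Microsoft.Kusto/clusters@2022-02-01' = {
  name: 'adxdlhouse'                // hypothetical cluster name
  location: location
  sku: {
    name: 'Dev(No SLA)_Standard_E2a_v4'
    tier: 'Basic'
    capacity: 1
  }
  properties: {
    enableStreamingIngest: true     // needed later for streaming ingestion from Event Hubs
  }
}

resource database 'Microsoft.Kusto/clusters/databases@2022-02-01' = {
  parent: cluster
  name: 'lakehouse'                 // hypothetical database name
  location: location
  kind: 'ReadWrite'
}
```

A file like this is deployed with az deployment group create (or New-AzResourceGroupDeployment), which is presumably what createAll.ps1 wraps together with the rest of the resources.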
Explanation
The code creates the following entities:
Azure SQL Server
Contains an Azure SQL database with the AdventureWorks sample data.
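In Bicep, a SQL server seeded with the AdventureWorksLT sample can be declared roughly as follows (a sketch; the server name, database name, and credential parameters are assumptions):

```bicep
param location string = resourceGroup().location
param sqlAdminLogin string = 'sqladmin'   // hypothetical admin login
@secure()
param sqlAdminPassword string

resource sqlServer 'Microsoft.Sql/servers@2021-11-01' = {
  name: 'adxdlhouse-sql'                  // hypothetical server name
  location: location
  properties: {
    administratorLogin: sqlAdminLogin
    administratorLoginPassword: sqlAdminPassword
  }
}

resource sqlDatabase 'Microsoft.Sql/servers/databases@2021-11-01' = {
  parent: sqlServer
  name: 'adventureworks'                  // hypothetical database name
  location: location
  sku: {
    name: 'Basic'
    tier: 'Basic'
  }
  properties: {
    sampleName: 'AdventureWorksLT'        // seeds the DB with the sample schema and data
  }
}
```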
Azure Data Factory (adxdlhouse-adf)
Contains two data pipelines (sketched in Bicep after this list):
- SQLToADX_orders: copies the orders from the AdventureWorks sample DB in Azure SQL Server into the ADX table bronzeOrders
- SQLToADX_products: copies the products from the AdventureWorks sample DB in Azure SQL Server into the ADX table bronzeProducts
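In Bicep, the factory is a small resource and each pipeline is a child resource. Here is a sketch of the factory and the general shape of one copy pipeline; the activity and dataset names are hypothetical, and the real pipelines in the repo will differ in detail:

```bicep
param location string = resourceGroup().location

resource dataFactory 'Microsoft.DataFactory/factories@2018-06-01' = {
  name: 'adxdlhouse-adf'
  location: location
  identity: {
    type: 'SystemAssigned'    // managed identity for authenticating to SQL and ADX
  }
}

resource ordersPipeline 'Microsoft.DataFactory/factories/pipelines@2018-06-01' = {
  parent: dataFactory
  name: 'SQLToADX_orders'
  properties: {
    activities: [
      {
        name: 'CopyOrders'    // hypothetical activity name
        type: 'Copy'
        inputs: [
          {
            referenceName: 'OrdersSqlDataset'        // hypothetical source dataset
            type: 'DatasetReference'
          }
        ]
        outputs: [
          {
            referenceName: 'BronzeOrdersAdxDataset'  // hypothetical sink dataset
            type: 'DatasetReference'
          }
        ]
        typeProperties: {
          source: { type: 'AzureSqlSource' }
          sink: { type: 'AzureDataExplorerSink' }
        }
      }
    ]
  }
}
```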
Azure Event Hubs
Contains a hub called "clicks-stream" that streams click events into the ADX table bronzeClicks.
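The Event Hubs side is equally compact in Bicep. A sketch, with the namespace name and hub settings as illustrative assumptions:

```bicep
param location string = resourceGroup().location

resource eventHubNamespace 'Microsoft.EventHub/namespaces@2021-11-01' = {
  name: 'adxdlhouse-ns'           // hypothetical namespace name
  location: location
  sku: {
    name: 'Standard'
    tier: 'Standard'
  }
}

resource clicksStream 'Microsoft.EventHub/namespaces/eventhubs@2021-11-01' = {
  parent: eventHubNamespace
  name: 'clicks-stream'
  properties: {
    partitionCount: 2             // illustrative settings
    messageRetentionInDays: 1
  }
}
```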
How to Demo
To run this demo, you should:
- Create all the infrastructure by following the steps in the Infrastructure Deployment section above.
- Run the two pipelines in Azure Data Factory to copy products and orders to ADX.
- Ingest sample click events into the bronzeClicks table using the file HERE, via one-click ingestion in Azure Data Explorer: select the file, then click Start Ingestion.
We are done!
We now have products and orders from our operational DB (Azure SQL) and events coming from a stream in Event Hubs.
In this demo I chose to add synthetic events using one-click ingestion, but you can also create events and publish them to Event Hubs; they will then be ingested into the bronzeClicks table via streaming ingestion (see the sketch below).
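The glue between the hub and the table is an ADX data connection, which can also be declared in Bicep. This sketch builds on the cluster, database, and Event Hubs sketches above; the connection name, consumer group, and data format are assumptions:

```bicep
// Assumes the 'database' and 'clicksStream' resources from the earlier sketches.
resource clicksConnection 'Microsoft.Kusto/clusters/databases/dataConnections@2022-02-01' = {
  parent: database
  name: 'clicks-connection'       // hypothetical connection name
  location: location
  kind: 'EventHub'
  properties: {
    eventHubResourceId: clicksStream.id
    consumerGroup: '$Default'     // assumed consumer group
    tableName: 'bronzeClicks'
    dataFormat: 'JSON'            // assumed event payload format
  }
}
```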
I hope you enjoyed this.
Thanks,
Denise