The inspiration for this article was the realization that Eventhouse aligns perfectly with the requirements of the Medallion architecture, a data design pattern that organizes data into three layers.
In this article, we will explore how to build a real-time analytics platform based on the Medallion architecture using MS Fabric Eventhouse.
Real-time analytics platforms provide insights as soon as data is ingested, so that better decisions can be made quickly. They handle large amounts of high-velocity data with low response times.
With this guide, you will be able to create an end-to-end real-time analytics platform that can handle large volumes of data and provide valuable business insights in real time.
Suppose you own an e-commerce website selling bike accessories.
You have millions of visitors a month, and you want to analyze the website traffic and consumer patterns, and predict sales.
This article will walk you through the process of building an end-to-end real-time analytics solution for this website in MS Fabric, using the medallion architecture.
You will learn how to create Eventhouse tables for each medallion layer, connect to an operational database with external tables, transform data during ingestion with update policies, and deduplicate and aggregate data with materialized views.
A detailed step-by-step tutorial can be found here: MS Fabric - Real time analytics Tutorial - Introduction (moaw.dev)
All the code can be found here: denisa-ms/adx-analytics-fabric (github.com)
A medallion architecture (term coined by Databricks) is a data design pattern used to logically organize data.
The goal is to incrementally improve the structure and quality of data as it flows through each layer of the architecture.
Medallion architectures are sometimes also referred to as "multi-hop" architectures.
Raw and unstructured data is ingested through scalable pipelines and progressively refined into high-quality, enriched data.
The medallion architecture comprises three layers:
Bronze layer: raw data, ingested as-is from the source systems.
Silver layer: cleansed, validated, and enriched data.
Gold layer: aggregated data, ready for analytics and reporting.
Eventhouse is the storage solution for streaming data in MS Fabric. It provides a scalable service for handling and analyzing large volumes of structured, semi-structured, and unstructured data that is automatically indexed and partitioned based on ingestion time.
An Eventhouse allows you to manage multiple databases at once, sharing capacity and resources to optimize performance and cost.
Use an Eventhouse for event-based data scenarios, such as telemetry and log data, time series and IoT data, security and compliance logs, or financial records.
You can read more about this here.
To build the Medallion Architecture in our Eventhouse DB, data needs to be transformed and copied between the layers (Bronze->Silver->Gold).
The e-commerce store database entities are: Address, Customer, Product, ProductCategory, SalesOrderHeader, and SalesOrderDetail.
Contains reference tables that are NOT copied into our Eventhouse DB but are used for joins, by connecting directly to our operational database (Azure SQL).
//External tables - shortcuts
// connect to operational Database with external table Product
.create external table products (ProductID: int, ProductNumber: string, Name: string)
kind=sql
table=[SalesLT.Product]
(
h@'Server=tcp:adxdemo.database.windows.net,1433;Initial Catalog=aworks;User Id=sqlread;Password=ChangeYourAdminPassword1'
)
with
(
createifnotexists = true
)
// connect to operational Database with external table ProductCategory
.create external table productCategories (ProductCategoryID: int, Name: string)
kind=sql
table=[SalesLT.ProductCategory]
(
h@'Server=tcp:adxdemo.database.windows.net,1433;Initial Catalog=aworks;User Id=sqlread;Password=ChangeYourAdminPassword1'
)
with
(
createifnotexists = true
)
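Once created, the shortcuts can be queried with the external_table() function. A quick smoke test (illustrative only, not part of the tutorial's scripts):

```kusto
// Verify the shortcuts by pulling a few rows from each external table
external_table('products')
| take 10

external_table('productCategories')
| take 10
```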
Contains all the raw data tables, copied from the operational database using Data Pipelines.
.create table [Address] (AddressID:int,AddressLine1:string,AddressLine2:string,City: string, StateProvince:string, CountryRegion:string, PostalCode: string, rowguid: guid, ModifiedDate:datetime)
.create table [Customer](CustomerID:int, NameStyle: string, Title: string, FirstName: string, MiddleName: string, LastName: string,Suffix:string, CompanyName: string, SalesPerson: string, EmailAddress: string, Phone: string, ModifiedDate: datetime)
.create table [SalesOrderHeader](SalesOrderID: int, OrderDate: datetime, DueDate: datetime, ShipDate: datetime, ShipToAddressID: int, BillToAddressID: int, SubTotal: decimal, TaxAmt: decimal, Freight: decimal, TotalDue: decimal, ModifiedDate: datetime)
.create table [SalesOrderDetail](SalesOrderID: int, SalesOrderDetailID: int, OrderQty: int, ProductID: int, UnitPrice: decimal , UnitPriceDiscount: decimal,LineTotal: decimal, ModifiedDate: datetime)
//adds a hidden field showing ingestion time
.alter table Address policy ingestiontime true
.alter table Customer policy ingestiontime true
.alter table SalesOrderHeader policy ingestiontime true
.alter table SalesOrderDetail policy ingestiontime true
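With the ingestion-time policy enabled, each record carries a hidden timestamp that can be read with the ingestion_time() function. An illustrative check (not part of the tutorial's scripts):

```kusto
// Inspect the hidden ingestion timestamp on the bronze Address table
Address
| extend ingestedAt = ingestion_time()
| take 10
```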
Contains enriched tables.
In the silver layer tables, we materialize the hidden ingestion-time value into an explicit IngestionDate column.
.create table [SilverAddress] (AddressID:int,AddressLine1:string,AddressLine2:string,City: string, StateProvince:string, CountryRegion:string, PostalCode: string, rowguid: guid, ModifiedDate:datetime, IngestionDate: datetime)
.create table [SilverCustomer](CustomerID:int, NameStyle: string, Title: string, FirstName: string, MiddleName: string, LastName: string,Suffix:string, CompanyName: string, SalesPerson: string, EmailAddress: string, Phone: string, ModifiedDate: datetime, IngestionDate: datetime)
.create table [SilverSalesOrderHeader](SalesOrderID: int, OrderDate: datetime, DueDate: datetime, ShipDate: datetime, ShipToAddressID: int, BillToAddressID: int, SubTotal: decimal, TaxAmt: decimal, Freight: decimal, TotalDue: decimal, ModifiedDate: datetime, DaysShipped: long, IngestionDate: datetime)
.create table [SilverSalesOrderDetail](SalesOrderID: int, SalesOrderDetailID: int, OrderQty: int, ProductID: int, UnitPrice: decimal, UnitPriceDiscount: decimal,LineTotal: decimal, ModifiedDate: datetime, IngestionDate: datetime)
// use update policies to transform data during Ingestion
.create function ifnotexists
with (docstring = 'Add ingestion time to raw data')
ParseAddress ()
{
Address
| extend IngestionDate = ingestion_time()
}
.alter table
SilverAddress
policy update @'[{"Source": "Address", "Query": "ParseAddress", "IsEnabled" : true, "IsTransactional": true }]'
.create function ifnotexists
with (docstring = 'Add ingestion time to raw data')
ParseCustomer ()
{
Customer
| extend IngestionDate = ingestion_time()
}
.alter table
SilverCustomer
policy update @'[{"Source": "Customer", "Query": "ParseCustomer", "IsEnabled" : true, "IsTransactional": true }]'
.create function ifnotexists
with (docstring = 'Add ingestion time to raw data')
ParseSalesOrderHeader ()
{
SalesOrderHeader
| extend DaysShipped = datetime_diff('day', ShipDate, OrderDate)
| extend IngestionDate = ingestion_time()
}
.alter table
SilverSalesOrderHeader
policy update @'[{"Source": "SalesOrderHeader", "Query": "ParseSalesOrderHeader", "IsEnabled" : true, "IsTransactional": true }]'
.create function ifnotexists
with (docstring = 'Add ingestion time to raw data')
ParseSalesOrderDetail ()
{
SalesOrderDetail
| extend IngestionDate = ingestion_time()
}
.alter table
SilverSalesOrderDetail
policy update @'[{"Source": "SalesOrderDetail", "Query": "ParseSalesOrderDetail", "IsEnabled" : true, "IsTransactional": true }]'
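To confirm that a policy is attached to a silver table, you can inspect it with the .show command. For example (illustrative):

```kusto
// Display the update policy attached to SilverAddress
.show table SilverAddress policy update
```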
In the SalesOrderHeader table we also add a calculated field, DaysShipped, that captures the number of days it took to ship the goods in each order (see the ParseSalesOrderHeader function above).
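Once the update policy has populated SilverSalesOrderHeader, DaysShipped can be used like any other column. A hypothetical query, for illustration only:

```kusto
// Illustrative query: average shipping time per month
SilverSalesOrderHeader
| summarize AvgDaysShipped = avg(DaysShipped) by OrderMonth = startofmonth(OrderDate)
| order by OrderMonth asc
```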
Contains aggregated data.
We use materialized views for deduplication, so this layer exposes only the latest version of each record from the source.
//GOLD LAYER
// use materialized views to view the latest changes in the tables
.create materialized-view with (backfill=true) GoldAddress on table SilverAddress
{
SilverAddress
| summarize arg_max(IngestionDate, *) by AddressID
}
.create materialized-view with (backfill=true) GoldCustomer on table SilverCustomer
{
SilverCustomer
| summarize arg_max(IngestionDate, *) by CustomerID
}
.create materialized-view with (backfill=true) GoldSalesOrderHeader on table SilverSalesOrderHeader
{
SilverSalesOrderHeader
| summarize arg_max(IngestionDate, *) by SalesOrderID
}
.create materialized-view with (backfill=true) GoldSalesOrderDetail on table SilverSalesOrderDetail
{
SilverSalesOrderDetail
| summarize arg_max(IngestionDate, *) by SalesOrderDetailID
}
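The gold materialized views can be queried like regular tables, and always return the latest version of each record. An illustrative query (not part of the tutorial's scripts):

```kusto
// Illustrative query: daily revenue from the deduplicated gold views
GoldSalesOrderHeader
| join kind=inner GoldSalesOrderDetail on SalesOrderID
| summarize DailyRevenue = sum(LineTotal) by bin(OrderDate, 1d)
| order by OrderDate asc
```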
We also use materialized views to summarize clicks and impressions per day:
.create materialized-view with (backfill=true) GoldDailyClicks on table Event
{
Event
| where eventType == "CLICK"
| extend dateOnly = format_datetime(todatetime(eventDate), 'yyyy-MM-dd')
| summarize count() by dateOnly, eventType
}
.create materialized-view with (backfill=true) GoldDailyImpressions on table Event
{
Event
| where eventType == "IMPRESSION"
| extend dateOnly = format_datetime(todatetime(eventDate), 'yyyy-MM-dd')
| summarize count() by dateOnly, eventType
}
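These two views can be combined to compute a daily click-through rate. A hypothetical example (the summarize above produces a count_ column, which the join renames to count_1 on the right-hand side):

```kusto
// Illustrative query: daily click-through rate from the two gold views
GoldDailyClicks
| join kind=inner GoldDailyImpressions on dateOnly
| project dateOnly,
          clicks = count_,
          impressions = count_1,
          ctr = todouble(count_) / count_1
```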
Follow the step-by-step tutorial for instructions on how to build Real-Time Dashboards on top of these tables.
The tutorial to build all this architecture can be found here: MS Fabric - Real time analytics Tutorial - Introduction (moaw.dev)
All the code can be found here: denisa-ms/adx-analytics-fabric (github.com)
That’s it, we are done.
I hope you enjoyed reading this.
Denise