A Practical Guide to Implement End-to-End Data Warehousing PoC Using Microsoft Fabric - Part 1
Sometime back we had the opportunity to help a customer implement a PoC using Microsoft Fabric. In this series of blogs, we will demonstrate how we, from the SfMC team, helped our customer implement a Data Warehouse PoC using the ELT (Extract, Load, Transform) approach. For this PoC, we used sample data from SQL Server's well-known sample databases, giving us a straightforward and accessible way to illustrate the capabilities of Microsoft Fabric in a real-world data warehousing scenario.

The customer requirements were:
- Build an enterprise-grade Data Warehouse solution in Microsoft Fabric.
- Ingest data from 59 diverse sources across 130 different countries:
  - Source data from ADLS (JSON)
  - Source data from Azure and on-prem SQL Server databases
  - Other data sources

PoC scope:
- ELT approach: ingest data (OLTP DB & ADLS Gen2) directly into the Warehouse.
- Transformation using stored procedures to load the production schema.

Solution summary:
- Ingest OLTP data from the source SQL Server (full load).
- Metadata-driven copy activity to load data into the staging schema of the Data Warehouse.
- Stored procedures to process the staging data along with data from ADLS shortcuts, and to load the production dimension tables.
- The application's customer data arrives in ADLS storage as JSON files; stored procedures process this data and load the Customer dimension and fact data using an incremental load.
- Built 6 dimensions and 1 fact table.
- Build reports.

Sample overview of items that were implemented:

Sample report from the PoC:

Prerequisites:

To implement this PoC, download all the required files (the "FabricLevelUp_Files" package) to a local drive (e.g., C:\temp). We assume that you already have a Fabric tenant set up with a Fabric capacity (you will need an F64 capacity to test the Copilot feature; otherwise, an F2 capacity will be sufficient). If Fabric is not enabled, use the link below to enable Fabric for your organization:

Step by Step Guide to Enable Microsoft Fabric for Microsoft 365 Developer Account

You will also need:
- A logical SQL Server (how to deploy one, how to add a local IP address to the firewall exception).
- An ADLS Gen2 account (how to deploy one, how to create a container).

Once the storage account is created, navigate to the "Data Storage" section and create a container; name it levelup (or a name of your choice). Open the "levelup" container and create a folder called "JSON_FILES". Install Azure Storage Explorer, configure it to connect to your Azure subscription, and navigate to the storage account container. Upload the five JSON files from the downloaded "JSON_FILES" folder to ADLS Gen2, into the "levelup" container under the "JSON_FILES" folder. Then upload the following folders by choosing "Upload Folder" in Azure Storage Explorer:
- Sales_SalesOrderDetail
- Sales_SalesOrderHeader

These two folders contain Delta files and will be used to create shortcuts in the Lakehouse, which in turn will be used to build tables in the Warehouse in subsequent tasks. After the upload, you should have the folders below inside your levelup container.

Next, create a database on the Azure SQL server using the bacpac files downloaded earlier:
- Connect to the Azure SQL logical server using SSMS (if you don't have it, you can download it here).
- Right-click on Databases and select the option "Import Data-Tier Application".
- Follow the screen captures below to complete the database bacpac import.

Note: Depending on the DTU/vCore tier chosen, the import can take up to 30 minutes.
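Once the import completes, a quick sanity check helps confirm that the tables and data landed. The query below is a minimal, generic sketch that relies only on the system catalog views, so it works against any SQL Server database; the table names you see will depend on the sample bacpac you imported.

```sql
-- Sanity check after the bacpac import: approximate row counts per table.
-- Generic T-SQL using catalog views; table names depend on the imported sample.
SELECT s.name       AS schema_name,
       t.name       AS table_name,
       SUM(p.rows)  AS row_count
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
JOIN sys.partitions AS p
    ON p.object_id = t.object_id
   AND p.index_id IN (0, 1)        -- count heap (0) or clustered index (1) only
GROUP BY s.name, t.name
ORDER BY row_count DESC;
```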
Continue to the next blog...

A Practical Guide to Implement End-to-End Data Warehousing PoC Using Microsoft Fabric - Part 2
In this continuation (Part 2) of the previous blog, you will create the required Fabric items within a workspace and create the tables in the Lakehouse and Data Warehouse.

Task 1: Create a Microsoft Fabric Workspace, Data Warehouse, and Lakehouse

Before you can start building the solution, you first need to create a workspace to hold the rest of the items. Follow the steps below:
- Sign in to https://app.fabric.microsoft.com/
- Select Workspaces > New Workspace.
- Fill out the Create a workspace dialog as follows: Name: LevelUpFabric1 (or a name of your choice; the name must be unique). Expand the Advanced section and, under License Mode, choose Trial or Fabric Capacity. Click Apply. The workspace is created and opened.
- In the upper left corner, select New Item, search for Warehouse, and click it to create a Data Warehouse named "DW_Levelup". This creates the Data Warehouse and opens it.
- Click the workspace icon in the left navigation to return to the workspace page.
- In the upper left corner of the workspace, select New Item, search for Lakehouse, and click it to create a Lakehouse named "LH_Levelup". This creates the Lakehouse and opens it.

Task 2: Create table shortcuts in the Lakehouse from ADLS Gen2

We will now create shortcuts in the Lakehouse pointing to the two Delta folders (Sales_SalesOrderHeader and Sales_SalesOrderDetail) that you uploaded to the ADLS Gen2 storage account in the prerequisite steps.
- In the Lakehouse explorer, hover over the Tables folder (not the Files folder), click the ellipsis, and select "New Shortcut".
- In the New Shortcut dialog box, select ADLS Gen2 under External sources.
- On the next screen, provide the required details as shown in the screenshot below; you can find them in your storage account settings. Using the account key for authentication is discouraged, but it is used here for ease of setup. The URL should be similar to: https://xxxxx.dfs.core.windows.net/levelup
- Ensure the shortcut names "Sales_SalesOrderDetail" and "Sales_SalesOrderHeader" match the Delta folder names.

Important: Please use Sales_SalesOrderHeader and Sales_SalesOrderDetail as the shortcut names. These names are used in the stored procedures; if you choose different shortcut names, you will need to update the SPs that refer to them to avoid errors.

The shortcuts should show up within the Tables section of the Lakehouse as shown below; we will use these shortcuts later in the solution (see the query sketch after the script list in Task 3). You may have to refresh the Tables section for the shortcuts to appear, and again if they initially appear under "Unidentified". If they remain unidentified, one of the steps above was likely not followed correctly.

Task 3: Create the required tables and SPs in the Fabric Data Warehouse

Now go to the Fabric workspace "LevelUpFabric1", navigate to the Data Warehouse, open a new query window, and run the scripts below. Note: These scripts are available in the "TSQL_Script" folder inside the "FabricLevelUp_Files" folder downloaded during the prerequisite steps.

- 01_create_AllStgDimTbls.sql: Creates the staging (stg) schema and all the staging tables.
- 02_TablesToCopy.sql: Creates the metadata tables that drive the dynamic full-load activity.
- 03_del_AllRecordsOfStgDimTbls.sql: Creates a stored procedure to delete all the records of the staging dim tables.
- 04_crt_Prd_Sch_Tbl_Sp.sql: Creates the production (prd) schema, the fact table, and the SP for incremental updates to the fact table.
- 05_crt_Prd_Dim_Tbls.sql: Creates a stored procedure to create all the production dim tables.
- 06_upd_prd_dim_Customer.sql: Creates the SP that applies incremental stg.DimCustomer records to the prd.DimCustomer table.
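A quick note on how the stored procedures reach the ADLS data: because the Lakehouse and the Warehouse live in the same workspace, the shortcut tables created in Task 2 can be read from the Warehouse using three-part names (Lakehouse name, schema, table). A minimal check, assuming the item names used in this walkthrough (LH_Levelup and the two shortcut names from Task 2):

```sql
-- Read a Lakehouse shortcut table from the Warehouse via three-part naming.
-- Assumes the LH_Levelup Lakehouse and the shortcut names from Task 2.
SELECT TOP (10) *
FROM LH_Levelup.dbo.Sales_SalesOrderHeader;
```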
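For orientation, here is a sketch of the kind of metadata table 02_TablesToCopy.sql sets up to drive the dynamic full load. This is an illustration only; the column names below are assumptions, and the shipped script is the authoritative definition.

```sql
-- Hypothetical shape of the copy-metadata table; the real definition is in
-- 02_TablesToCopy.sql. One row per source table to be full-loaded into staging.
CREATE TABLE dbo.TablesToCopy
(
    SourceSchema VARCHAR(50)  NOT NULL,
    SourceTable  VARCHAR(128) NOT NULL,
    TargetSchema VARCHAR(50)  NOT NULL,
    TargetTable  VARCHAR(128) NOT NULL
);

-- Example row: copy Sales.Customer from the source into stg.DimCustomer.
INSERT INTO dbo.TablesToCopy
VALUES ('Sales', 'Customer', 'stg', 'DimCustomer');
```

The benefit of this pattern is that the copy pipeline can simply iterate over these rows, so onboarding another source table only requires adding a metadata row rather than editing the pipeline.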
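Similarly, the incremental dimension update in 06_upd_prd_dim_Customer.sql can be pictured as an update-then-insert upsert. The sketch below assumes a CustomerKey business key and a ModifiedDate watermark column, and the procedure name is hypothetical; the actual shipped script may use different columns and logic.

```sql
-- Minimal upsert-style sketch of an incremental dimension load.
-- Column names (CustomerKey, CustomerName, ModifiedDate) and the procedure
-- name are assumptions; see 06_upd_prd_dim_Customer.sql for the real logic.
CREATE OR ALTER PROCEDURE prd.usp_upd_DimCustomer
AS
BEGIN
    -- Refresh existing dimension rows that changed since the last load.
    UPDATE d
       SET d.CustomerName = s.CustomerName,
           d.ModifiedDate = s.ModifiedDate
    FROM prd.DimCustomer AS d
    INNER JOIN stg.DimCustomer AS s
        ON s.CustomerKey = d.CustomerKey
    WHERE s.ModifiedDate > d.ModifiedDate;

    -- Insert rows that are new since the last load.
    INSERT INTO prd.DimCustomer (CustomerKey, CustomerName, ModifiedDate)
    SELECT s.CustomerKey, s.CustomerName, s.ModifiedDate
    FROM stg.DimCustomer AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM prd.DimCustomer AS d
                      WHERE d.CustomerKey = s.CustomerKey);
END;
```

This two-statement pattern avoids relying on MERGE and keeps each step easy to verify after a load.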
So far, we have created the workspace and all the required items in the Lakehouse and Data Warehouse that will be used to load the Data Warehouse. Continue to the next blog...