For many people, hands-on experience is often the best way to learn and evaluate data tools. I've been working with a colleague from our Azure team, Kunal Jain, to put together an end-to-end Azure Synapse and Power BI solution using 120+ million rows of real CMS Medicare Part D Data that is available for use in the public domain. If you're not highly technical, and you've never used Azure or Power BI before, you can still deploy this solution with a few simple steps using an Azure ARM template. We also provide a video to walk you through a paint-by-numbers tutorial. The Azure ARM template will automatically:
- Create Azure Data Lake, Azure Data Factory, and Azure Synapse
- Pull the raw data from CMS into a Data Lake using Azure Data Factory
- Shape the data and create a dimensional model for Azure Synapse
- Deploy the solution to Azure Synapse, including performance tuning settings
The entire process takes about an hour and a half to run, with most of that time spent on waiting for the scripts from the ARM template to finish. Once deployed, another video also walks you through the steps of connecting it to a pre-built Power BI report template. The whole process should take about 1-2 hours with no code required, and at the end you can review and evaluate and end-to-end Azure and Power BI solution using real CMS Medicare Part D Data:
- Azure Data Factory for ELT packages
- Azure Data Lake for Unstructured Big Data Storage
- Azure Synapse Analytics for an MPP Cloud Data Warehouse
- Power BI for an Analysis Services Semantic Model and Data Visualizations
Here is a link to the GitHub site: https://github.com/kunal333/E2ESynapseDemo
While the source CMS data is real public Healthcare data, the intent of this project is to provide you with a simple end-to-end solution for the purposes of learning, demos, and tool evaluation. We intend to enhance and build upon this solution in the future, but it is not a supported solution intended to be used for production purposes.
Below is a tutorial video for deploying the solution. Note that this is designed to be low code with only a few things to cut and paste. All you need is an Azure account, and you can pause the Synapse instance or delete the entire Resource Group at any time. There are also simple instructions on the GitHub page:
Here is another tutorial video describing the process by which to connect the Power BI Template file containing the business logic:
The following diagram summarizes the steps of the whole process:
Azure Data Factory, along with Power BI, creates the following logical model that enables highly performant end user queries for complicated questions about the data. Notice that a CSV file is also added to the Power BI layer to demonstrate that custom criteria from a business user can be used to query the Synapse data:
Calculations have been added to the Power BI Semantic Layer to enable complex analytics such as Pareto Analysis:
Below is a screenshot of the pre-built Power BI report:
More information about the solution is available at the GitHub site: https://github.com/kunal333/E2ESynapseDemo
If you deploy this solution, we'd appreciate if you could take the time to provide some feedback. What was your experience with the ARM template? How do you plan to use this solution? What types of similar solutions can we provide in the future that would be valuable?: