Power BI DataFlows Integrates Directly with an Azure Data Infrastructure
For companies operating in highly regulated industries such as Healthcare, the promise of self-service Business Intelligence often takes a back seat to regulatory concerns about sensitive data such as Personally Identifiable Information (PII). Healthcare companies require capabilities to control the flow of sensitive data for both enterprise and self-service Business Intelligence. This article will review strategies for controlling access to sensitive data while still empowering users to gain value from Microsoft Business Intelligence and Analytics tools.
This article is the second in a series exploring how Power BI paired with Azure data tools creates a flexible, scalable, and achievable healthcare analytics architecture:
Terms often associated with sensitive data include PII, PHI (Protected Health Information), and PIFI (Personally Identifiable Financial Information). Data that could be used for unfair financial market trades, often referred to as “insider information,” is also a consideration when granting users access to data. I am not an expert on these laws and the specifics of the associated requirements, but the tools and techniques below will hopefully provide value as you consider a plan for managing sensitive data.
Protecting and de-identifying sensitive data goes beyond simply removing names, addresses, Social Security numbers, and so on. The challenge of minimizing sensitive data risk becomes more complex when using Business Intelligence and Analytics tools. Here are a few examples of how sensitive PII might accidentally be shared with the wrong person:
Examples of how Accidental PII Sharing can Happen
IMAGE A
When sensitive data is shared with approved users, there are tools and techniques which can help minimize the risk of accidental sharing. New Power BI Data Protection capabilities, Administrative Settings, Power BI DataFlows with Azure Data Lake, and Row Level Security can help simplify security and access to sensitive data.
Power BI Data Protection
Microsoft Information Protection and Cloud App Security tools are now built into Power BI. These capabilities are game-changers for the Business Intelligence field, and were recently announced at Ignite. New capabilities include security for data after it has been exported from Power BI, and device-based security. Example A.1 above is directly impacted by these new features. Click here for more details. I will cover Data Protection in the next article of this series.
Power BI Administrative Settings
There are several settings in the Power BI Administrative Portal that can impact access to PII. Click here for a comprehensive list. A few I’d recommend understanding in detail for an enterprise deployment requiring PII governance include:
DataFlows and Azure Data Lake
What is DataFlows? DataFlows is a self-service ETL/ELT tool in Power BI that is easy to use with a low-code/no-code interface. It also allows Power BI Administrators to control data available to self-service Power BI Model architects. Here is a comparison of DataFlows and enterprise ETL/ELT tools:
Differences between Power BI DataFlows and Enterprise ETL/ELT tools
IMAGE B
So how do Power BI DataFlows and Azure integrate for Sensitive Data Access Control? In the slide below, notice that the functional components of Power BI exist within the same secure Azure tenant as Azure Data Lake. There are also other tools available in Azure for enterprise grade ETL/ELT, Data Science / ML Projects, and for scaling up very large databases using Azure Synapse Analytics. It is important to note that at the time of writing this article, Power BI DataFlows and Azure Data Lake integration is still in Preview:
Power BI DataFlows Integrates Directly with an Azure Data Infrastructure
IMAGE C
Let’s review the key features marked by the numbered green arrows above:
C.1 - Low-Code / No-Code ETL/ELT - Power BI Pro users can create low-code/no-code ETL/ELT packages using DataFlows. Pro users can also use existing DataFlows as tables of data to build Power BI Models. By default, DataFlows are stored in a hidden SaaS Azure Data Lake that is part of Power BI. If you choose to add your own Azure Data Lake, you can access the content that DataFlows has loaded into it. Once in Azure Data Lake, data can be used in Databricks, ETL/ELT tools, Azure databases, and third-party applications outside of Azure. As a result, DataFlows does not trap your data in Power BI, and you can use those tables of data anywhere.
C.2 - Azure ML Integration - DataFlows also has native integration with Azure ML. If your Data Science team publishes Models to Azure ML, DataFlows users can use those ML models to score tables of data during scheduled refreshes. The integration is point-and-click with no need to write code. Azure Cognitive Services can also be integrated in a similar way.
C.3 - Open Platform for Third Party Connectivity - Third-party tools can pull data from Azure Data Lake and connect to Power BI Data Models in Premium, just as they can with Analysis Services.
C.4 - Open Platform for Third Party Reporting - With Power BI, end users can build reports with the data visualization tool of their choice on top of those Premium Power BI Data Models.
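To make point C.1 a little more concrete, here is a minimal Python sketch of how a tool outside Power BI might discover the tables a DataFlow has written to your own Azure Data Lake. It assumes the DataFlow output is stored as a Common Data Model folder whose model.json file lists each entity, its attributes, and the locations of its data files. The sample metadata, the entity name "Encounters", and the example.invalid URL are hypothetical and exist only for illustration; in practice you would first download model.json from your own storage account.

```python
import json

# Hypothetical, trimmed-down model.json content (an assumption for illustration).
SAMPLE_MODEL_JSON = """
{
  "name": "ExampleDataflow",
  "version": "1.0",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Encounters",
      "attributes": [
        {"name": "EncounterId", "dataType": "string"},
        {"name": "AdmitDate", "dataType": "dateTime"}
      ],
      "partitions": [
        {"name": "Part001",
         "location": "https://example.invalid/powerbi/Encounters.csv"}
      ]
    }
  ]
}
"""


def list_entities(model_json_text):
    """Map each entity name to its column names and data file locations."""
    model = json.loads(model_json_text)
    summary = {}
    for entity in model.get("entities", []):
        summary[entity["name"]] = {
            # Column names come from the entity's attribute list.
            "columns": [a["name"] for a in entity.get("attributes", [])],
            # Each partition points at one data file in the lake.
            "partitions": [p["location"] for p in entity.get("partitions", [])],
        }
    return summary


if __name__ == "__main__":
    for name, info in list_entities(SAMPLE_MODEL_JSON).items():
        print(name, info["columns"], info["partitions"])
```

Because the metadata is plain JSON and the data files sit in ordinary lake storage, the same listing could feed Databricks, an ETL/ELT tool, or a third-party application, which is the sense in which DataFlows does not trap data in Power BI.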
So how can DataFlows help control access to PII? Consider the scenario in the following diagram:
Manage PII with DataFlows and Azure Data Lake
IMAGE D
In the example above, User A can view the App containing PII but cannot build their own report or re-share the App. User B cannot view anything from DataFlow 1.
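The outcome of that scenario can be written down as a small deny-by-default permission table. The Python sketch below is not a Power BI API; the Grant class, the action names, and the is_allowed function are hypothetical names used only to model who can do what with content built on DataFlow 1.

```python
from dataclasses import dataclass

# Hypothetical action names for this sketch (not Power BI terminology).
ACTIONS = ("view_app", "build_report", "reshare_app")


@dataclass(frozen=True)
class Grant:
    """Permissions on content built from DataFlow 1; everything is off by default."""
    view_app: bool = False
    build_report: bool = False
    reshare_app: bool = False


# User A may only view the App. User B has no entry, so User B gets nothing.
GRANTS = {
    "User A": Grant(view_app=True),
}


def is_allowed(user, action):
    """Return True only if the user was explicitly granted the action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    grant = GRANTS.get(user, Grant())  # unknown users fall back to deny-all
    return getattr(grant, action)


if __name__ == "__main__":
    for user in ("User A", "User B"):
        print(user, {action: is_allowed(user, action) for action in ACTIONS})
```

The design choice worth noting is the default: a user who was never granted access, like User B, is denied every action without needing an explicit deny rule.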
A few considerations when using DataFlows with Azure Data Lake:
Power BI DataFlows with Azure Data Lake
IMAGE E
The next two articles in this series will also focus on capabilities that enable secure use of PII in an enterprise and self-service Business Intelligence environment: