Combine Azure Synapse and Power BI RLS to Analyze Aggregated Data while Controlling Granular PII

Greg_Beaumont

Microsoft

Apr 23, 2020

Let’s start this article off with a simple summary of the use case:

You’ve been tasked with creating an analytics solution based on large volumes of data containing PII (Personally Identifiable Information)
End users will need to see Summary Level Data for their Department, and also Drill to Detail Level Data containing PII.
Here’s the catch! Users can only view Detail Level Data containing PII for a subset of the Department cohort! For example a Physician can view reports and do interactive analysis of aggregated data for every patient in their entire Department, but they are only allowed to see details containing PII for their own patients.
You’d like to deliver an easy-to-use and highly performant solution that has a simple, cohesive, and elegant design.

Consider the diagram below. A User can see aggregated data for all Specialties, but only two States. When the User drills to detail, they can only see detail data containing PII for one State and one Specialty:

Control Access to Detail Level PII without Compromising Analytics for Aggregated Data

Traditional Row Level Security (RLS) can be used to prevent users from viewing data they are not supposed to see, but it also prevents them from viewing aggregated metrics of the rows obfuscated by RLS. Also, even when PII has been scrubbed from data, re-identification can still be an issue. For example, there may be only one person in a zip code with a rare medical condition. Even though their PII has been scrubbed from a reporting data set, showing data for that zip code and medical condition could lead to re-identification of the person. It is often a best practice to set the grain of de-identified data at a count of persons that reduces or eliminates this risk. Implementing this concept into Power BI Aggregation Tables or Azure Synapse Materialized Views can prevent queries from yielding granular data below a certain patient count. In the example below, that minimum is 8 patients:

Reduce the risk of Re-Identification by Aggregating Counts of Patients in the Table Grain

In the example above, the Aggregation Table (left) can be set to have different RLS than the Detail Table (right). If the Aggregation is cached in Power BI as an Import table, and the Detail Table is in Azure Synapse, then drilling to details which are different from the aggregates can be achieved such as in the example below:

Allow Analytics of Aggregated Data while still controlling Access to Detail-Level PII

When using Azure Synapse Analytics with Power BI, you can apply RLS to any of the three basic types of data tables: 1) Power BI Import tables for ease-of-use and highly efficient query performance from an optimized cache of data, 2) Azure Synapse Materialized Views for MPP performance using efficient queries, and 3) Azure Synapse Tables for limitless scale. The diagram below shows these three options that you can mix-and-match in a Composite Model and even layer them as Power BI Aggregations:

Power BI Composite Models offer Different Options for Persisting Data

Using a methodology of controlling access to Detail level data differently from Aggregated data can be used in many different, flexible and useful ways. Addressing concerns with protecting PII is a common use case. There are a few considerations that should be evaluated and rigorously tested if considering this type of architecture:

Implementing RLS has performance implications in both Power BI and Azure Synapse. Every time a query uses RLS, an additional layer of complexity is added. The RLS design, Power BI Model sizing, and Azure Synapse Analytics sizing all need to be evaluated.
RLS architecture should be designed to require minimal compute, and also rigorously tested before the Azure Synapse and Power BI Data Model designs are finalized.
An efficient and effective RLS design will minimize the chances of future re-work, reduce compute overhead for processing queries, and potentially reduce costs due to optimized query performance and storage requirements.

This article is the fourth in a series exploring how Power BI paired with Azure data tools creates a flexible, scale-able, and achievable healthcare analytics architecture:

#1 - Unleash Massive Healthcare Data Volumes to Analytics using Power BI Aggregations - Click Here!
#2 - Control PII and Sensitive Data Risk for Self-Service BI using Power BI DataFlows and Azure Data Lake - Click Here!
#3 - Microsoft Azure Information Protection Secures Power BI Data Exports for a Seamless DLP Strategy Click Here!
#4 - Combine Azure Synapse and Power BI RLS to Analyze Aggregated Data while Controlling Granular PII (This Article)

Updated Apr 23, 2020

Version 1.0

healthcare

Greg_Beaumont

Microsoft

Joined December 17, 2018

View Profile

Healthcare and Life Sciences Blog

Follow this blog board to get notified when there's new activity