Serverless Synapse SQL pools enable you to read Parquet/CSV files or Cosmos DB collections and return their content as a set of rows. In some scenarios, you would need to ensure that a reader cannot access some rows in the underlying data source. This way, you are limiting the result set that will be returned to the users based on some security rules. In this scenario, called Row-level security, you would like to return a subset of data depending on the reader’s identity or role.
Row-level security is supported in dedicated SQL pools, but it is not supported in serverless pools (you can propose this feature in Azure feedback site). In some cases, you can implement your own custom row-level security rules using standard T-SQL code.
In this post you will see how to implement RLS by specifying the security rules in the WHERE clause or by using inline table-value function (iTVF).
Scenario
Let’s imagine that we have a view or external table created on top of the COVID data set placed in the Azure Data Lake storage:
create or alter view dbo.covid as
select *
from openrowset(
bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
format = 'parquet') as rows
The readers will get all COVID cases that are written in the Parquet file. Let’s imagine that the requirement is to restrict the user access and allow them to see just a subset of data based on the following rules:
- Azure AD user jovan@adventureworks.com can see only the COVID cases reported in Serbia.
- The users in ‘AfricaAnalyst’ role can see only the COVID cases reported in Africa.
We can represent these security rules using the following predicate (this is T-SQL pseudo-syntax):
( USER_NAME = 'jovan@adventureworks.com' AND geo_id = 'RS' )
OR
( USER IS IN ROLE 'AfricaAnalyst' AND continent_exp = 'Africa' )
You can use the system functions like SUSER_SNAME() or IS_ROLEMEMBER() to identify the caller and easily check should you return some rows to the current user. We just need to express this condition in T-SQL and add it as a filtering condition in the view.
In this post you will see two methods for filtering the rows based on security rules:
- Filtering rows directly in the WHERE condition.
- Applying security predicated coded in iTVF.
As a prerequisite, you should setup a database role that should have limited access to the underlying data set.
CREATE ROLE AfricaAnalyst;
CREATE USER [jovan@contoso.com] FROM EXTERNAL PROVIDER
CREATE USER [petar@contoso.com] FROM EXTERNAL PROVIDER
CREATE USER [petar@contoso.com] FROM EXTERNAL PROVIDER
ALTER ROLE AfricaAnalyst ADD MEMBER [jovan@contoso.com];
ALTER ROLE AfricaAnalyst ADD MEMBER [petar@contoso.com];
ALTER ROLE AfricaAnalyst ADD MEMBER [nikola@contoso.com];
In this code we have added a role and added three users to this role.
Filtering rows based on security rule
The easiest way to implement row-level security is to directly embed security predicates in the WHERE condition of the view:
create or alter view dbo.covid as
select *
from openrowset(
bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
format = 'parquet') as rows
WHERE
( SUSER_SNAME() = 'jovan@adventureworks.com' AND geo_id = 'RS')
OR
( IS_ROLEMEMBER('AfricaAnalyst', SUSER_SNAME()) = 1 AND continent_exp = 'Africa')
The Azure AD principal jovan@adventureworks.com will see only the COVID cases reported in Serbia:
Azure AD principals that belong to ‘AfricaAnalysts’ role will see the cases reported in Africa when running the same query:
Without any change in the query, the readers will be able to access only the rows that they are allowed to see based on their identity.
Operationalizing row-level-security with iTVF
In some cases, it would be hard to maintain security rules if you put them directly in the view definition. The security rules might change, and you don’t want to track all possible views to update them. The WHERE condition in the view may contain other predicates, and some bug in AND/OR logic might change or disable the security rules.
The better idea would be to put the security predicates in a separate function and just apply the function on the views that should return limited data.
We can create a separate schema called secure, and put the security rules in the inline table value function (iTVF):
create schema secure
go
create or alter function secure.geo_predicate(@geo_id varchar(20),
@continent_exp varchar(20))
returns table
return (
select condition = 1
WHERE
(SUSER_SNAME() = 'jovanpop@adventureworks.com' AND _id = 'RS')
OR
(IS_ROLEMEMBER('AfricaAnalyst', SUSER_SNAME())=1 AND continent_exp='Africa')
)
This predicate is very similar to the predicates used in native row-level security in the T-SQL language. This predicate evaluates a security rule based on geo_id and continent_exp values. These values are provided as the function parameters. The function will internally use SUSER_SNAME() and IS_ROLEMEMBER() functions and evaluate security rules based on geo_id and continent_exp values. If the current user can see a row with the provided values, the function fill return value 1. Otherwise, it will not return any value.
Instead of modifying every view by adding new conditions in the WHERE clause, we can create the secure wrapper views where we will apply this predicate. In the following example, I’m creating a view secure.covid where I will apply the predicate and pass the columns that will be used in the predicate to evaluate should this row be returned or not:
create or alter view secure.covid
as
select *
from dbo.covid
cross apply security.geo_predicate(geo_id, continent_exp)
go
Note that in this case the view dbo.covid will not have the WHERE clause that contains the security rule. Security rules are filtering rows by applying security.geo_predicate() iTVF on the original view. For every row in the dbo.covid view. The query will provide the values of geo_id and continent_exp columns to the iTVF predicate. The predicate will remove the row from the output if these values do not satisfy criterion.
A user jovanpop@adventureworks.com will see the filtered results based on his context:
Placing the security rules in a separate iTVFs and creating the secure wrapper views will make your code more maintainable.
Conclusion
Although native row-level-security is not available in serverless SQL pool, you can easily implement the similar security rules using the standard T-SQL functionalities.
You just need to ensure that a reader cannot bypass the security rules.
- You need to ensure that the readers cannot directly query the files or collections in the underlying data source using the OPENROWSET function or the views/external tables. Make sure that you restrict access to the OPENROWSET/credentials and DENY SELECT on the base views and external tables that read original un-filtered data.
- You need to ensure that users cannot directly access the underlying data source and bypass serverless SQL pool. You would need to make sure that underlying storage is protected from random access using private endpoints.
- Make sure that the readers cannot use some other tool like other Synapse workspace, Apache Spark, or other application or service to directly access the data that you secured on your workspace.
This method uses the core T-SQL functionalities and will not introduce additional overhead compared to the native row-level security. If this workaround doesn’t work in your scenario, you can propose this feature in Azure feedback site.