Blog Post

Azure Synapse Analytics Blog
5 MIN READ

How to implement row-level security in serverless SQL pools

JovanPop's avatar
JovanPop
Icon for Microsoft rankMicrosoft
May 18, 2021

Serverless Synapse SQL pools enable you to read Parquet/CSV files or Cosmos DB collections and return their content as a set of rows. In some scenarios, you would need to ensure that a reader cannot access some rows in the underlying data source. This way, you are limiting the result set that will be returned to the users based on some security rules. In this scenario, called Row-level security, you would like to return a subset of data depending on the reader’s identity or role.

Row-level security is supported in dedicated SQL pools, but it is not supported in serverless pools (you can propose this feature in Azure feedback site). In some cases, you can implement your own custom row-level security rules using standard T-SQL code.

In this post you will see how to implement RLS by specifying the security rules in the WHERE clause or by using inline table-value function (iTVF).

Scenario

Let’s imagine that we have a view or external table created on top of the COVID data set placed in the Azure Data Lake storage:

 

create or alter view dbo.covid as
select *
from openrowset(
    bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
    format = 'parquet') as rows

 

 

The readers will get all COVID cases that are written in the Parquet file. Let’s imagine that the requirement is to restrict the user access and allow them to see just a subset of data based on the following rules:

  • Azure AD user jovan@adventureworks.com can see only the COVID cases reported in Serbia.
  • The users in ‘AfricaAnalyst’ role can see only the COVID cases reported in Africa.

We can represent these security rules using the following predicate (this is T-SQL pseudo-syntax):

 

( USER_NAME = 'jovan@adventureworks.com' AND geo_id = 'RS' )
OR
( USER IS IN ROLE 'AfricaAnalyst' AND continent_exp = 'Africa' )

 

 

You can use the system functions like SUSER_SNAME() or IS_ROLEMEMBER() to identify the caller and easily check should you return some rows to the current user. We just need to express this condition in T-SQL and add it as a filtering condition in the view.

In this post you will see two methods for filtering the rows based on security rules:

  • Filtering rows directly in the WHERE condition.
  • Applying security predicated coded in iTVF.

As a prerequisite, you should setup a database role that should have limited access to the underlying data set.

 

CREATE ROLE AfricaAnalyst;

CREATE USER [jovan@contoso.com] FROM EXTERNAL PROVIDER
CREATE USER [petar@contoso.com] FROM EXTERNAL PROVIDER
CREATE USER [petar@contoso.com] FROM EXTERNAL PROVIDER

ALTER ROLE AfricaAnalyst ADD MEMBER [jovan@contoso.com];
ALTER ROLE AfricaAnalyst ADD MEMBER [petar@contoso.com];
ALTER ROLE AfricaAnalyst ADD MEMBER [nikola@contoso.com];

 

In this code we have added a role and added three users to this role.

Filtering rows based on security rule

The easiest way to implement row-level security is to directly embed security predicates in the WHERE condition of the view:

 

create or alter view dbo.covid as
select *
from openrowset(
    bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
    format = 'parquet') as rows
WHERE
( SUSER_SNAME() = 'jovan@adventureworks.com' AND geo_id = 'RS')
OR
( IS_ROLEMEMBER('AfricaAnalyst', SUSER_SNAME()) = 1 AND continent_exp = 'Africa')

 

The Azure AD principal jovan@adventureworks.com will see only the COVID cases reported in Serbia:

 

Azure AD principals that belong to ‘AfricaAnalysts’ role will see the cases reported in Africa when running the same query:

 

Without any change in the query, the readers will be able to access only the rows that they are allowed to see based on their identity.

Operationalizing row-level-security with iTVF

In some cases, it would be hard to maintain security rules if you put them directly in the view definition. The security rules might change, and you don’t want to track all possible views to update them. The WHERE condition in the view may contain other predicates, and some bug in AND/OR logic might change or disable the security rules.

The better idea would be to put the security predicates in a separate function and just apply the function on the views that should return limited data.

We can create a separate schema called secure, and put the security rules in the inline table value function (iTVF):

 

create schema secure
go

create or alter function secure.geo_predicate(@geo_id varchar(20),
                                              @continent_exp varchar(20))
returns table
return (
    select condition = 1
    WHERE
    (SUSER_SNAME() = 'jovanpop@adventureworks.com' AND _id = 'RS')
    OR
    (IS_ROLEMEMBER('AfricaAnalyst', SUSER_SNAME())=1 AND continent_exp='Africa')
)

 

This predicate is very similar to the predicates used in native row-level security in the T-SQL language. This predicate evaluates a security rule based on geo_id and continent_exp values. These values are provided as the function parameters. The function will internally use SUSER_SNAME() and IS_ROLEMEMBER() functions and evaluate security rules based on geo_id and continent_exp values. If the current user can see a row with the provided values, the function fill return value 1. Otherwise, it will not return any value.

Instead of modifying every view by adding new conditions in the WHERE clause, we can create the secure wrapper views where we will apply this predicate. In the following example, I’m creating a view secure.covid where I will apply the predicate and pass the columns that will be used in the predicate to evaluate should this row be returned or not:

 

create or alter view secure.covid
as
select *
from dbo.covid
    cross apply security.geo_predicate(geo_id, continent_exp)
go

 

 

Note that in this case the view dbo.covid will not have the WHERE clause that contains the security rule. Security rules are filtering rows by applying security.geo_predicate() iTVF on the original view. For every row in the dbo.covid view. The query will provide the values of geo_id and continent_exp columns to the iTVF predicate. The predicate will remove the row from the output if these values do not satisfy criterion.

 A user jovanpop@adventureworks.com will see the filtered results based on his context:

 

Placing the security rules in a separate iTVFs and creating the secure wrapper views will make your code more maintainable.

Conclusion

Although native row-level-security is not available in serverless SQL pool, you can easily implement the similar security rules using the standard T-SQL functionalities.

You just need to ensure that a reader cannot bypass the security rules.

  • You need to ensure that the readers cannot directly query the files or collections in the underlying data source using the OPENROWSET function or the views/external tables. Make sure that you restrict access to the OPENROWSET/credentials and DENY SELECT on the base views and external tables that read original un-filtered data.
  • You need to ensure that users cannot directly access the underlying data source and bypass serverless SQL pool. You would need to make sure that underlying storage is protected from random access using private endpoints.
  • Make sure that the readers cannot use some other tool like other Synapse workspace, Apache Spark, or other application or service to directly access the data that you secured on your workspace.

This method uses the core T-SQL functionalities and will not introduce additional overhead compared to the native row-level security. If this workaround doesn’t work in your scenario, you can propose this feature in Azure feedback site.

Updated Sep 15, 2021
Version 2.0
  • KeyBee's avatar
    KeyBee
    Brass Contributor

    This is really brilliant and exactly what we need!

  • FedePi's avatar
    FedePi
    Copper Contributor

    Thank you for your guide
    I tried to follow these steps, but I get the following error when I try to connect to database (using PowerBI) :

    (I connected to PowerBI with an AD Azure user that I added as SQL user with the code you suggested)

    CREATE USER [...@....com] FROM EXTERNAL PROVIDER
     

    I then added the following permission:

    GRANT ADMINISTER DATABASE BULK OPERATIONS TO [..@...com]
    Then I got this error:

     

    Then, I tried to grant read access to this user for the specific parquet file that the view points to, using Synapse interface for data lake, and now it works correctly.

    Now, I am worried that in this way the user could bypass the security rule  as you suggest in the conclusion: "You need to ensure that the readers cannot directly query the files or collections in the underlying data source using the OPENROWSET function or the views/external tables. ", however it seems the only option that works. Do you have other options? Or, do you think that the way I did it is correct and the user could not access directly the data with only these permissions?

     

    Thank you

     

     

  • We need to use IS_MEMBER('AAD Group') to check whether user is part of any group or not inside RLS function. But seems it is not supported for AAD Groups. Is there any work around for this problem?

  • eswarc's avatar
    eswarc
    Copper Contributor

    In the example share, the blob container has public access so the RLS works. In my case, the blob container has restricted access.

    Is there a way/workaround to grant access to synapse serverless views (with RLS logic) without giving permissions to ADLS container ?

  • ThomasKwakman's avatar
    ThomasKwakman
    Copper Contributor

    Hi, great article, thank you for that!

     

    I am testing out this pattern and it happens that:

    1/ User must have permissions to use OPENROWSET

    2/ Credentials must be included in the DATA_SOURCE

    This means that a malicious (and clever) user could bypass the whole RLS mechanism by creating a SELECT statement using OPENROWSET and the underlying DATA_SOURCE.

     

    I have not found a way to use OPENROWSET in the secured view without giving OPENROWSET permission to user.

    Has any one a solution to this? What's the current status of things?