Azure Synapse Analytics is analytics service that enables you to implement solutions that enable you users to access data in Azure storage and define permission models that define what user can access some data. Azure Active directory is recommended model for accessing data and defining permission rules on your data. In addition to Azure AD permission model, you can define additional security policies that protect your data even in some cases where Azure AD permission model cannot be used. In this article you will see how to setup fine-grained security policy for SQL users that can access some parts of storage using workspace identity or SAS key. This is must-have setup for scenarios where SQL principals access data or serverless Synapse SQL pool access storage using Managed Identity or SAS token.
Synapse SQL permission model
Synapse SQL runtime in Azure Synapse Analytics workspace enables you to define access rights and permissions to read data in two security layers:
SQL permission layer where you can use standard SQL permission model with users, roles, and permissions defines in SQL runtime.
ACL rules in Azure storage layer where you can define access rules by assigning storage roles to some AAD users.
Permissions of the user who would like to read data from storage files are checked in both layers:
In order to pass SQL security check, user would need to have some SELECT, GRANT REFERENCES CREDENTIAL, ADMINISTER BULK OPERATIONS SQL permission to initiate call to ADLS storage layers.
Then request will be checked on storage layer by verifying validity of SAS token that is used to access files, or ACL permissions of Azure AD identity that is accessing storage.
You have flexibility to choose would you configure security check only in SQL, storage or on both places depending on your use case.
When you must use SQL permission model?
If you are using Azure Active directory passthrough authentication, you can define granular access rules in Azure storage layer and specify which users could access some files and folders by assigning Azure roles such as Storage Blob Reader or Storage Blob Owner. In this case you can rely only on ADLS storage security checks.
However, there are some cases when you will not use AAD passthrough:
When the data placed in ADLS storage and accessed using Shared Access Signature
When the data placed in ADLS storage and accessed using workspace Managed Identity (common cases for this scenario is when your storage is protected using firewall)
When the data placed in Cosmos DB analytical is accessed using Cosmos DB read-only keys.
When you want to prevent AAD users to have any direct access to storage so they can query files only via SQL interface. In this scenario you will enable Synapse SQL to access storage via Managed Identity or SAS token.
Applications and tool are using SQL principals to access storage using username/password instead of AAD logins. SQL principals may use either SAS token or Managed Identity of workspace to access storage.
If you are using some of these authentication methods, your Synapse SQL runtime has access to any file/folder/container placed in storage layer. If you have different user roles who are accessing data, you need to ensure that some users have access only to some subset of folders. Since you don’t have fine-grained ACL permissions on storage, you need to do the following steps to define permissions in SQL runtime:
Create separate users or roles for the group of users who can access some subset of data on storage.
Create external tables that represent proxies to your data sets on storage. Every external table should reference one set of files on storage.
Grant users REFERENCES permission to credentials that should be used to access storage.
DENY ADMINISTER BULK OPERATION permission to prevent users to directly access any file in storage via OPENROWSET and referenced credential.
GRANT SELECT permission only on external tables that some user groups can access.
Let’s see how to apply this security model in the scenario where two user roles can access only some subfolders in storage.
We have ADLS storage with three data sets – Product, RetailSales, and StoreDemographics placed in different folders on the same ADLS storage account. Synapse SQL access storage using Managed Identity that has full access to all folders in storage.
We have two roles in this scenario:
Sales Managers who can read data about products and retail sales, and
Store Managers who can access data about products and store demographics.
We need to ensure that these roles can access only subsets of data, although Synapse SQL has full access. Therefore, we need to define access rights on SQL layer that will protect access to the resources.
Create users that will access Synapse SQL
In this step we will create two logins that will enable sales managers and store managers to access Synapse SQL:
CREATE LOGIN StoreManager WITH PASSWORD = '100reM4n4G3r!@#$';
CREATE USER StoreManager FROM LOGIN StoreManager;
CREATE LOGIN SalesManager WITH PASSWORD = 'Sa<M4n4G3r!@#$';
CREATE USER SalesManager FROM LOGIN SalesManager;
Now we have two username/password pairs that can access Synapse SQL, but they still cannot access storage.
Create credentials that will be used to access storage
We need some database scoped credential that Synapse SQL runtime will use to the ADLS access storage. Let’s imagine that we are enabling Synapse SQL to access private storage protected with firewall using Managed Identity of the workspace:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Y...0'
CREATE DATABASE SCOPED CREDENTIAL WorkspaceIdentity WITH IDENTITY = 'Managed Identity'
GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::WorkspaceIdentity TO StoreManager;
GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::WorkspaceIdentity TO SalesManager;
Once we create DATABASE SCOPED CREDENTIAL, we need to enable users to reference that credential so they can access storage.
Prevent users to explore any data
Users who have REFERENCES permission on some storage might use OPENROWSET function to access any file on that storage. Therefore, we need to ensure that they cannot use this function by explicitly denying ADMINISTER BULK OPERATIONS in master database and ADMINISTER DATABASE BULK OPERATIONS permissions in data warehouse:
--> USE master
DENY ADMINISTER BULK OPERATIONS TO StoreManager;
DENY ADMINISTER BULK OPERATIONS TO SalesManager;
--> USE RetailStore
DENY ADMINISTER DATABASE BULK OPERATIONS TO StoreManager;
DENY ADMINISTER DATABASE BULK OPERATIONS TO SalesManager;
Create external tables that reference folders on storage
Since we have three datasets placed in three folders, we need to create three external tables that will access storage using some credential:
CREATE EXTERNAL DATA SOURCE [Data] WITH
( LOCATION = N'https://....dfs.core.windows.net/data', CREDENTIAL = WorkspaceIdentity )
CREATE SCHEMA store
CREATE EXTERNAL TABLE store.Product (...)
WITH (DATA_SOURCE = Data, LOCATION = N'Product/',FILE_FORMAT = ParquetSnappy)
CREATE EXTERNAL TABLE store.[RetailSales] (...)
WITH (DATA_SOURCE = Data, LOCATION = N'RetailSales/',FILE_FORMAT = ParquetSnappy)
CREATE EXTERNAL TABLE [store].[StoreDemographics] (...)
WITH (DATA_SOURCE = Data, LOCATION = N'StoreDemographics/',...)
Any user that can select data from these tables can read the content of underlying files in ADLA storage.
Enable users to access their data sets
Finally, we need to implement required security settings and allow store managers and sales managers to access only their data sets via proxy external tables:
GRANT SELECT ON OBJECT::store.Product TO StoreManager;
GRANT SELECT ON OBJECT::store.StoreDemographics TO StoreManager;
GRANT SELECT ON OBJECT::store.Product TO SalesManager;
GRANT SELECT ON OBJECT::store.RetailSales TO SalesManager;
Now if we try to select data as Store Managers, we will get the results:
However, if these users try to access store.RetailSales they will get error:
The similar results will get sales manager when trying to access the tables.
Serverless Synapse SQL runtime enables to define fine-grained permissions and control what resources your users can access. Even if you provide full storage access to Synapse SQL runtime, you are still not loosing ability to define fine-grained permission to your users using SQL runtime permission model.