Azure Synapse Analytics Workspace enables you to read the files from Azure Data Lake storage using OPENROWSET(BULK <file url>). In this article you will see how to grant minimal permission to the users who need to analyze files with OPENROWSET(BULK) function.
When you try Azure Synapse workspace, you usually start with full permissions (for example as database owner). Once you need to let other users access the files on Data Lake, you probably think that is the minimal permission that you need to assign to user to let them just do ad-hoc analysis without any other permissions.
In this post you will learn how to configure minimal permission to needed to analyze the files on Azure storage.
SQL endpoint in Synapse workspace uses standard T-SQL syntax to create principals who can access your data. The following statement creates a new login with username testprincipal and password VeryStrongAndSecurePassword1234!!!:
create login testprincipal with password = 'VeryStrongAndSecurePassword1234!!!';
Using this T-SQL statement you can create new principals that are assigned to the public role.
Let’s see what we can do with this principal. If we login using this username/password we can try to run the following queries:
select name from sys.databases -- success use SampleDB --Msg 916, Level 14, State 2, Line 1 --The server principal "testprincipal" is not able to access the database "SampleDB" under the current security context. select top 10 * from openrowset(bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet', format='parquet') as a -- Success
By default, a new user can see all databases, execute OPENROWSET to query files on Azure Data Lake storage, but cannot access other databases or create objects. This can be verified using the following function:
New login is in the public role and has the permissions to view any database and to run ad-hoc query using OPENROWSET, but no other permissions.
This version of OPENROWSET enables principals to public access storage or in case of Azure AD principal to access files on the storage where storage admin granted Storage Blob Data Reader RBAC role to Azure AD user who access. If you need more fine grained impersonation mechanism, you should create data sources and protect them with credentials in databases. The following section will describe how to enable principal access objects in databases.
If you need to allow the principals to access some non-public DataLake storage, or storage that cannot be accessed using Azure AD passthrough, you can create server-level credentials that matches storage URL used in OPENROWSET:
CREATE CREDENTIAL [https://pandemicdatalake.blob.core.windows.net] WITH IDENTITY='SHARED ACCESS SIGNATURE', SECRET = 'sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2019-04-18T20:42:12Z&st=2019-04-18T12:42:12Z&spr=https&sig=lQHczNvrk1KoYLCpFdSsMANd0ef9BrIPBNJ3VYEIq78%3D';
In this case, SQL principal need to have GRANT REFERENCES permission for this credential in order to use it to access storage using Shared access signature defined in the credential:
grant references on credential::[https://pandemicdatalake.blob.core.windows.net] to testprincipal;
Find more information in Synapse analytics documentation.
Let’s enable user to access database. In the context of some database such as SampleDB execute the following statement:
create user testprincipal for login testprincipal;
This statement created a database user that will access current database using the login defined in the previous script. This principal can now connect to SampleDB database (NOTE: use this setup script to initialize the objects in this database) , can still use OPENROWSET with absolute URL to read files from storage, and it can also use OPENROWSET that references some public data source:
SELECT TOP 10 * FROM OPENROWSET( BULK 'puYear=*/puMonth=*/*.parquet', -- YellowTaxi is a data source that references a public location (without credential) DATA_SOURCE = 'YellowTaxi', FORMAT='PARQUET' ) AS nyc
In this case, data source doesn’t have credential and references public location:
CREATE EXTERNAL DATA SOURCE YellowTaxi WITH ( LOCATION = 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/')
However, this principal cannot read some files using a pre-defined data source that is protected with credential:
CREATE EXTERNAL DATA SOURCE SqlOnDemandDemo WITH ( LOCATION = 'https://sqlondemandstorage.blob.core.windows.net', CREDENTIAL = sqlondemand );
If some principals try to read the files from this storage, they will get the following error:
The issue here is that data source uses credential to access storage, and the current user cannot reference this credential.
In order to access the files via credential-protected data source, the principal needs to have references permission on the underlying database scoped credential (in this case sqlondemand credential) that is used in data source:
grant references on database scoped credential::sqlondemand to testprincipal;
Note that the user who uses OPENROWSET with data source still need to have ADMINISTER BULK OPERATIONS permission. If you deny this permission, the principal cannot use OPENROWSET anymore:
-- execute from master database context: deny ADMINISTER BULK OPERATIONS to testprincipal;
In the following section you will see how to use this role to enable principals to do ad-hoc analysis.
In Synapse SQL endpoint you have special permissions that enable principals to read files from storage using OPENROWSET:
ADMINISTER BULK OPERATIONS permission is automatically assigned to public role, so any new principal will be able to run OPENROWSET from any database on the endpoint.
In the previous example with revoking ADMINISTER BULK OPERATIONS you might notice that I used DENY and not REVOKE. You need to explicitly DENY permissions and you cannot just REVOKE this permission like in this example:
revoke ADMINISTER BULK OPERATIONS to testprincipal;
If you try to revoke this permission, testprincipal will still be able to execute OPENROWSET (you can confirm this using fn_my_permissions(NULL, 'SERVER') function)
This might confuse you, but the reason is that ADMINISTER BULK OPERATIONS is not initially granted to testprincipal. testprincipal belongs to the public role, and in synapse SQL endpoint, public role has ADMINISTER BULK OPERATIONS permission by default. If you want to revoke this permission, you would need to revoke it from the public role using this statement:
revoke ADMINISTER BULK OPERATIONS to public;
If you will have just the users who will analyze files in ad-hoc manner, you can leave this permission to public role. Otherwise it might be good to revoke it and create separate data analyst role that will have this permission.
If you want to have more granular control, you can create database users in databases and explicitly define ADMINISTER BULK OPERATIONS only in some databases.
grant ADMINISTER DATABASE BULK OPERATIONS to testprincipal;
This fine grained permission will enable you to ensure that principal will ad-hoc analysis of external storage only through the databases that you defined. This might be very important if you start creating external tables on storage and you don't want to have a user without permission to read data from external table to bypass this restriction using global ADMINISTER BULK OPERATIONS permission.
Find more information about the security model in Synapse SQL in documentation.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.