In this article, we discuss how the security model works for lake databases in Azure Synapse Analytics.
Pre-requisites: To access a Synapse workspace and create a lake database, you need the following roles and permissions on the Azure Synapse workspace and on the storage account where the ingested data will land:
Synapse Administrator role on the Synapse workspace
Storage Blob Data Contributor role on the storage account linked to your workspace
In lake databases, the data resides in data lake storage, either in the account linked to our Synapse workspace or in a different one, and a serverless SQL endpoint is available to access the data within our lake database using T-SQL commands. So when we talk about security for our database, it is primarily enforced at the data lake level. Although you can create views and other SQL objects for your lake database, permissions on them are also controlled and validated first at the storage level.
Lake databases are also Spark databases. When we say a lake database is a Spark database, it means that the compute engine behind the creation of our lake database is Spark: it uses a Spark pool to create the lake database. On similar grounds, you can modify or delete a lake database using Spark notebooks in a Spark pool or using the database designer. These Spark databases, rightly called lake databases, are eventually replicated to the serverless SQL pool. This replication happens asynchronously, so there can be a short delay before the changes are replicated. However, when the lake database is visible in the serverless SQL pool, it is for read-only purposes, meaning you cannot alter or delete any lake database using the serverless SQL pool.
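Because lake databases are created and managed by Spark, the simplest way to create one outside the database designer is a Spark SQL statement in a Synapse notebook. This is a minimal sketch; the database name mytestdb is illustrative:

```sql
-- Spark SQL, run in a Synapse Spark notebook (e.g. in a %%sql cell).
-- Creates a lake database; it is then replicated asynchronously
-- to the serverless SQL pool as a read-only database.
CREATE DATABASE IF NOT EXISTS mytestdb;
```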
As stated above, you cannot modify or delete a lake database from the serverless SQL pool. However, you can perform the following changes to lake databases from the serverless SQL pool using T-SQL commands.
Adding, altering, and dropping views, procedures, and inline table-valued functions in a lake database.
Adding and removing database-scoped Azure AD users.
Adding or removing Azure AD database users in the db_datareader role. Azure AD database users in the db_datareader role have permission to read all tables in the lake database but cannot read data from other databases.
NOTE: To create the custom SQL objects mentioned in the first item above, first create a schema in which to place them. The dbo schema is reserved for the lake database, and custom SQL objects cannot be placed inside it.
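As an illustration, the following sketch creates a schema and a view on the serverless SQL pool; the schema name reports, the view name recent_rows, and the table name myparquettable are assumptions made for this example:

```sql
-- T-SQL, run on the serverless SQL pool in the context of the lake database.
-- Custom SQL objects cannot live in dbo, so create a dedicated schema first.
CREATE SCHEMA reports;
GO
-- A view over a (hypothetical) lake database table.
CREATE VIEW reports.recent_rows AS
SELECT TOP 100 * FROM dbo.myparquettable;
```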
Lake database security works on two layers:
The storage layer, where the data resides, which you control using RBAC and ACLs on the storage account.
The serverless SQL endpoint, where access is controlled via logins for Azure AD users or security groups, at the workspace or database level.
Lake security model
Only Azure AD users can use tables in the lake databases, and they can access the data in the lake using their own identities.
You can grant access to the underlying data used for external tables to a security principal, such as a user, an Azure AD application with an assigned service principal, or a security group. For data access, grant both of the following permissions:
Grant read (R) permission on files (such as the table's underlying data files).
Grant execute (X) permission on the folder where the files are stored and on every parent folder up to the root. You can read more about these permissions in the Access Control Lists (ACLs) documentation.
For example, in https://<storage-name>.dfs.core.windows.net/<fs>/synapse/workspaces/<synapse_ws>/warehouse/mytestdb.db/myparquettable/, security principals need:
execute (X) permission on all folders from <fs> down to myparquettable.
read (R) permission on myparquettable and the files inside that folder, in order to read a table in a database (synchronized or original).
If a security principal requires the ability to create objects or drop objects in a database, additional write (W) permissions are required on the folders and files in the warehouse folder.
SQL security model
The Azure Synapse workspace provides a T-SQL endpoint that enables you to query the lake database using the serverless SQL pool. In addition to data access, the SQL interface enables you to control who can access the tables. You need to enable a user to access the shared lake databases using the serverless SQL pool. There are two types of users who can access lake databases:
Administrators: The administrator role has full control over all databases within the Synapse workspace.
Readers: There are two types of reader access that can be granted:
Workspace readers: A workspace reader has server-level permission to read data from any database within the serverless SQL pool.
Database readers: A database reader has database-level permission to read data from a lake database.
Two types of access levels
To give a user administrator access, assign the Synapse SQL Administrator workspace role, or the sysadmin server-level role inside the serverless SQL pool. This role has full control over all databases. The Synapse Administrator and Synapse SQL Administrator roles also have all permissions on all objects in a serverless SQL pool by default.
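For example, a login can be given the sysadmin server-level role in the serverless SQL pool as sketched below; the login name is illustrative, and the commands run in the master database:

```sql
-- T-SQL, run in the master database of the serverless SQL pool.
CREATE LOGIN [firstname.lastname@example.org] FROM EXTERNAL PROVIDER;
ALTER SERVER ROLE sysadmin ADD MEMBER [firstname.lastname@example.org];
```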
Once databases and tables are synchronized from Spark to the serverless SQL pool, the external tables in the serverless SQL pool can be used to access the same data. However, objects in the serverless SQL pool are read-only, to keep them consistent with the Spark pool objects. Because of this limitation, by default only users with the Synapse SQL Administrator or Synapse Administrator role can access these objects in the serverless SQL pool. If a non-admin user tries to execute a query on a synchronized database or table, they'll receive an error like: External table '<table>' is not accessible because content of directory cannot be listed. despite having access to the data on the underlying storage account(s).
Creating a user or granting other permissions will fail if attempted in a synchronized database. To read synchronized databases, one must either have privileged server-level permissions (like sysadmin) or be granted one of the reader roles described next.
A user who needs to read data and create reports usually doesn't have full administrator access (sysadmin). Such a user is typically a data analyst who just needs to read and analyse data using the existing tables, and doesn't need to create new objects.
A user with minimal permission should be able to:
Connect to a database that is replicated from Spark.
Select data via external tables and access the underlying ADLS data.
Two levels of reader roles
Database reader role on lake database
Create database users from Azure AD in your lake database and add them to the db_datareader role, which will enable them to read data in the lake database.
In this example, we add an Azure AD user who can read data via the shared tables. The user is added to the lake database via the serverless SQL pool and then assigned to the db_datareader role so they can read data.
CREATE USER [firstname.lastname@example.org] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [firstname.lastname@example.org];
Workspace data reader role
Grant the server-level permissions CONNECT ANY DATABASE and SELECT ALL USER SECURABLES on the serverless SQL pool to a login; this enables the login to access and read any database. This can be a good choice for granting reader (non-admin) access to a user.
After executing the script below, non-admin users will have server-level permission to connect to any database, and to view data from all schema-level objects, such as tables or views. Data access security can still be managed at the storage layer.
These commands must be executed in the master database, as they grant server-level permissions.
CREATE LOGIN [firstname.lastname@example.org] FROM EXTERNAL PROVIDER;
GRANT CONNECT ANY DATABASE TO [firstname.lastname@example.org];
GRANT SELECT ALL USER SECURABLES TO [firstname.lastname@example.org];
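After these grants, the login can connect to any synchronized database and read its tables. As a quick check (using the illustrative database and table names from the ACL example above):

```sql
-- Connect to the serverless SQL pool as the login above, then:
USE mytestdb;
SELECT TOP 10 * FROM dbo.myparquettable;
```

Note that reads will still fail unless the same principal has the storage-level R and X permissions described earlier.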