Exploring data using Synapse Serverless secured by Azure Data Lake directory-based SAS tokens
Published Jun 01 2021

 

Synapse serverless SQL pool is a serverless query engine that allows you to run SQL queries on files or folders placed in Azure Storage without duplicating or physically ingesting the data.

There are broadly three ways to access ADLS (Azure Data Lake Storage) from Synapse serverless:

  • Azure Active Directory
  • Managed Identity
  • SAS Token

For the first two options, the user needs to be assigned one of the Azure RBAC roles: Storage Blob Data Owner, Contributor, or Reader. However, there are scenarios where you cannot (or do not want to) grant access to the entire ADLS account or container, and instead need to grant access only at the level of specific directories and folders.

 

Scenario

You have a data lake that contains employee and social feed data: an employee folder used by HR team members, and a twitter folder with live social feeds typically used by the marketing team. With an account- or container-level SAS token, or with RBAC alone, you cannot control access at the folder level.

How do you allow users to perform data exploration using Synapse serverless while keeping fine-grained control over the underlying storage?

 

demo1.png

Fig 1: A storage account with a container named demo that contains two folders, employee and twitter.

 

Solution

 

To solve this challenge, you can use a directory-scoped SAS token together with a database scoped credential in Synapse serverless.

 Directory scoped SAS provides constrained access to a single directory when using ADLS Gen2.  This can be used to provide access to a directory and the files it contains. Previously a SAS could be used to provide access to a filesystem or a file but not a directory.  This added flexibility allows more granular and easier access privilege assignment.

Directory scoped shared access signatures (SAS) generally available | Azure updates | Microsoft Azur...

 

 

| Access method | Storage Account | Container | Folder | File |
| --- | --- | --- | --- | --- |
| AAD | YES (RBAC on Account) | YES (RBAC on Container) | YES (via POSIX ACLs) | YES (via POSIX ACLs) |
| Managed Identity (same as AAD) | YES (RBAC on Account) | YES (RBAC on Container) | YES (via POSIX ACLs) | YES (via POSIX ACLs) |
| SAS Token | YES (Scope - Account) | YES (Scope - Container) | YES (Scope - Directory and Files) | YES (Scope - Files) |

 


How to create a directory-based SAS token

 

You can do this via an SDK or the Azure portal. To create a SAS token via the portal:

a. Navigate to the folder that you would like to provide access to, right-click the folder, and select the option to generate a SAS token.

demo2.png

Fig 2 : Directory scope selection for employee folder  

 

b. Select the Read, List, and Execute permissions so that all the files in the folder can be listed and read. Provide the expiration date, click Generate SAS token and URL, and copy the Blob SAS token.

demo3.png

   Fig 3 :  Generate SAS token.

 

Step b is similar to creating a storage account SAS token; whereas a SAS previously applied to the whole storage account, you can now reduce the surface area to a directory and its files.

 

 

Use serverless with a directory SAS token

Once storage access has been configured with the SAS token, the next step is to access the data using the Synapse serverless engine.

 

In Azure Synapse Studio, go to Develop and create a new SQL script.

 

a. Create a master key, if one does not already exist.

-- create master key that will protect the credentials:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<enter very strong password here>';
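If you are not sure whether the database already has a master key, a minimal guarded variant can be used (a sketch; ##MS_DatabaseMasterKey## is the standard system name of the database master key):

-- Create the master key only if this database does not already have one
IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<enter very strong password here>';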

 

b. Create a database scoped credential using the SAS token. Since you want to access the HR data, use the Blob SAS token generated for the employee directory.

 

CREATE DATABASE SCOPED CREDENTIAL mysastokenemployee

 WITH IDENTITY = 'SHARED ACCESS SIGNATURE',

 SECRET = '<blob sas token>'
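Note that the secret is the SAS token string itself; if it was copied with a leading '?', remove it. As an optional sanity check (a sketch using the standard catalog view, not part of the original walkthrough), you can list the credentials defined in the database:

-- Confirm the credential exists in the current database
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials;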

 

c. Create an external data source pointing to the demo container path and use the credential mysastokenemployee.

 

CREATE EXTERNAL DATA SOURCE myemployee

WITH (    LOCATION   = 'https://<storageaccountname>.dfs.core.windows.net/<filesystemname>',

          CREDENTIAL = mysastokenemployee

)
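Similarly, a quick optional check (again a sketch relying on the standard catalog view) confirms the data source was registered with the expected location:

-- Confirm the external data source and its location
SELECT name, location
FROM sys.external_data_sources;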

 

d. Once the data source is created, let's read the data using OPENROWSET with BULK in serverless.

 

 

SELECT * FROM OPENROWSET(

   BULK  '/employee/*.csv',

   DATA_SOURCE = 'myemployee',

   FORMAT ='CSV',

   parser_version = '2.0',

   HEADER_ROW =  TRUE

   ) AS Data

 

 

demo4.png

 

Fig 4 : Openrowset bulk output
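If HR users will run this exploration query repeatedly, one option is to wrap it in a view so they can query a friendly name instead of writing the OPENROWSET call each time. A minimal sketch, assuming you are working in a user database (not master) and using a hypothetical view name:

-- Hypothetical view over the employee folder, backed by the directory-scoped SAS credential
CREATE VIEW dbo.vw_employee
AS
SELECT *
FROM OPENROWSET(
   BULK  '/employee/*.csv',
   DATA_SOURCE = 'myemployee',
   FORMAT ='CSV',
   parser_version = '2.0',
   HEADER_ROW =  TRUE
   ) AS Data;

HR team members can then simply run SELECT * FROM dbo.vw_employee.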

 

e. Now, to confirm that the scope of the SAS token is restricted to the employee folder, let's use the same data source and database scoped credential to access a file in the twitter folder.

 

 

SELECT * FROM OPENROWSET(

   BULK  '/twitter/StarterKitTerms.csv',

   DATA_SOURCE = 'myemployee',

   FORMAT ='CSV',

   parser_version = '2.0',

   HEADER_ROW =  TRUE

   ) AS Data

 

demo5.png

Fig 5: Bulk openrowset access failure

 

f. You will encounter an error because the scope of the SAS token was restricted to the employee folder.

Now, to give marketing users access to the twitter folder, create a database scoped credential using a SAS token generated for the twitter folder. Repeat the steps in “How to create a directory-based SAS token” for the twitter folder.

 

CREATE DATABASE SCOPED CREDENTIAL mysastokentwitter

 WITH IDENTITY = 'SHARED ACCESS SIGNATURE',

  SECRET = '<blob sas token>'

 

g. Create an external data source using the database scoped credential created for the twitter directory.

CREATE EXTERNAL DATA SOURCE mytwitter

WITH (    LOCATION   = 'https://<storageaccountname>.dfs.core.windows.net/<filesystemname>/',

          CREDENTIAL = mysastokentwitter

)

 

h. Once the data source is created, you can query the twitter data using the newly created data source.

 

SELECT * FROM OPENROWSET(

   BULK  '/twitter/StarterKitTerms.csv',

   DATA_SOURCE = 'mytwitter',

   FORMAT ='CSV',

   parser_version = '2.0',

   HEADER_ROW =  TRUE

   ) AS Data

 

demo6.png

Fig 6 : Twitter data accessed using the directory sas token

 

Summary

 

  1. In a central data lake environment, or any file store, a directory SAS token is a great way to reduce the access surface area without granting access at the storage root or account level.
  2. Separation of duties and roles is easy to achieve because data access is controlled both at the storage level and within Synapse serverless.
  3. Create SAS tokens with only Read, List, and Execute permissions to minimize the impact of accidental deletion; share SAS tokens in a secure manner.
  4. Expire SAS tokens, regenerate new tokens, and update the database scoped credentials frequently (see the rotation sketch after this list).
  5. Serverless is a great way to explore data without spinning up any additional SQL resources; you are charged based on the data processed by each query.
  6. Managing too many SAS tokens can become a challenge, so a hybrid approach works best for scaling the serverless capability: break the large data lake into smaller pools or a mesh, grant RBAC access control broadly, and blend in SAS tokens for regulated users.
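For point 4, one option is to update an existing credential in place with a newly generated directory-scoped SAS token rather than dropping and recreating it. A minimal sketch (the token value is a placeholder; if ALTER is not available in your environment, drop and recreate the credential instead):

-- Rotate the secret after regenerating the directory-scoped SAS token in the portal or via SDK
ALTER DATABASE SCOPED CREDENTIAL mysastokenemployee
 WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
 SECRET = '<new blob sas token>';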