When a Synapse notebook accesses an Azure storage account, it uses an Azure Active Directory (AAD) identity for authentication.
How the notebook is run determines which AAD identity is used:
- If a user runs the notebook interactively, the user's AAD identity is used. This is often called "AAD passthrough" because it passes the user's AAD identity through to Azure Storage.
- If the notebook is run from a pipeline, the workspace managed service identity (MSI) is used.
This blog will show you how to force the notebook to always use the workspace MSI.
It is aimed at beginners with some knowledge of workspace configuration using linked services.
STEP 1: Ensure the workspace MSI has permission to access the data in the storage account
The easiest way to do this is to assign the workspace MSI the Storage Blob Data Contributor role on the storage account.
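If you prefer scripting over the portal, the role assignment can be sketched with the Azure CLI. This is a sketch only: `<workspace-name>`, `<resource-group>`, `<storage-account>`, and `<subscription-id>` are placeholders, and it assumes you have rights to create role assignments on the storage account.

```shell
# Look up the workspace MSI's principal ID (the MSI has the same name as the workspace)
principalId=$(az synapse workspace show \
  --name <workspace-name> \
  --resource-group <resource-group> \
  --query identity.principalId -o tsv)

# Grant the workspace MSI the Storage Blob Data Contributor role on the storage account
az role assignment create \
  --assignee-object-id "$principalId" \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```

Passing `--assignee-principal-type ServicePrincipal` avoids a lookup race when the MSI was created recently.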
STEP 2: Configure the storage account firewall (if needed)
If you have enabled the firewall on the storage account, follow the instructions in Configure Azure Storage firewalls and virtual networks | Microsoft Docs.
When you grant access to trusted Azure services in the storage account's networking settings, you grant the following types of access:
- Trusted access for select operations to resources that are registered in your subscription.
- Trusted access to resources based on system-assigned managed identity.
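As an illustration, the trusted-services exception can be enabled with the Azure CLI (the names below are placeholders, and this is only one way to configure the firewall):

```shell
# Keep the firewall on (deny by default) but let trusted Azure services through
az storage account update \
  --name <storage-account> \
  --resource-group <resource-group> \
  --default-action Deny \
  --bypass AzureServices
```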
Additional information on this topic can be found in the document Connect to a secure storage account from your Azure Synapse workspace.
STEP 3: Configure the Linked Service
Open Synapse Studio and configure the Linked Service to use the workspace MSI:
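For reference, a linked service that relies on the workspace's system-assigned managed identity typically reduces to JSON along these lines. The name and URL below are placeholders, and the exact JSON generated by Synapse Studio may include additional properties such as an integration runtime reference:

```json
{
  "name": "LinkedServerName",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://StorageAccount.dfs.core.windows.net"
    }
  }
}
```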
STEP 4: Test the configuration
Click Test connection to verify that you have configured everything correctly.
STEP 5: Update the notebook code to use the Linked Service configuration
val linked_service_name = "LinkedServerName" // replace with your linked service name

// Tell Spark to authenticate to storage through the linked service,
// so the workspace MSI is used even in an interactive session
val sc = spark.sparkContext
spark.conf.set("spark.storage.synapse.linkedServiceName", linked_service_name)
spark.conf.set("fs.azure.account.oauth.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

// replace the container and storage account names
val remote_path = "abfss://Container@StorageAccount.dfs.core.windows.net/"
print("Remote blob path: " + remote_path)
Learn more about how Synapse workspaces perform authentication and use managed identities in the Azure Synapse documentation.
That is it!
Liliam, UK Engineer