Background
When a Synapse notebook accesses an Azure storage account, it uses an AAD identity for authentication.
How the notebook is run controls which AAD identity is used:
- If a user runs the notebook interactively, the user's AAD identity is used. We often call this "AAD passthrough" because it passes the user's AAD identity through to Azure Storage.
- If the notebook is run through a pipeline, the workspace MSI is used.
This blog will show you how to force the notebook to always use the workspace MSI.
Audience
This post is for beginners who have some knowledge of configuring a workspace with linked services.
STEP 1: Ensure the workspace MSI has permission to access the data in the storage account
The easiest way to do this is to assign the Storage Blob Data Contributor role on the storage account to the workspace MSI.
STEP 2: Configuring the storage account firewall (if needed)
If you have enabled the firewall on the storage account, you need to follow these instructions: Configure Azure Storage firewalls and virtual networks | Microsoft Docs
Here is an example with firewall enabled on the storage account:
When you grant access to trusted Azure services in the storage account's networking settings, you grant the following types of access:
- Trusted access for select operations to resources that are registered in your subscription.
- Trusted access to resources based on system-assigned managed identity.
Additional information on this topic can be found in this document: Connect to a secure storage account from your Azure Synapse workspace – Azure Synapse Analytics | Microsoft Docs
STEP 3: Configuring the Linked Service
Open Synapse Studio and configure the Linked Service to use the workspace MSI:
STEP 4: Test the configuration and see if it is successful
Click Test connection to verify that you have configured everything correctly.
STEP 5: Update the notebook code to use the Linked Service configuration
%%spark
// replace with your linked service name
val linked_service_name = "LinkedServerName"
// Allow Spark to access the storage account remotely through the linked service
val sc = spark.sparkContext
spark.conf.set("spark.storage.synapse.linkedServiceName", linked_service_name)
spark.conf.set("fs.azure.account.oauth.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")
// replace the container and storage account names
val df = "abfss://Container@StorageAccount.dfs.core.windows.net/"
print("Remote blob path: " + df)
mssparkutils.fs.ls(df)
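If you reuse this setup across several notebooks, the two Spark settings and the abfss:// path from the code above could be wrapped in a small helper. The following is only a sketch: the object name LinkedServiceHelper and its methods linkedServiceConf and abfssPath are hypothetical names made up for illustration and are not part of Synapse or mssparkutils. The name checks assume the usual Azure naming rules (storage account: 3 to 24 lowercase letters and digits; container: 3 to 63 lowercase letters, digits, and single hyphens).

```scala
// Hypothetical helper; the names here are illustrative, not a Synapse API.
object LinkedServiceHelper {
  // Token provider class name, taken from the notebook code above.
  val TokenProvider: String =
    "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider"

  // Returns the two Spark settings used above as key/value pairs.
  def linkedServiceConf(linkedServiceName: String): Map[String, String] = {
    require(linkedServiceName.trim.nonEmpty, "linked service name must not be empty")
    Map(
      "spark.storage.synapse.linkedServiceName" -> linkedServiceName,
      "fs.azure.account.oauth.provider.type" -> TokenProvider
    )
  }

  // Builds an abfss:// URI after checking the names against the
  // assumed Azure naming rules described in the text.
  def abfssPath(container: String, account: String, relativePath: String = ""): String = {
    require(account.matches("[a-z0-9]{3,24}"),
      s"invalid storage account name: $account")
    require(
      container.length >= 3 && container.length <= 63 &&
        container.matches("[a-z0-9]+(-[a-z0-9]+)*"),
      s"invalid container name: $container")
    s"abfss://${container}@${account}.dfs.core.windows.net/" +
      relativePath.stripPrefix("/")
  }
}

// Possible use inside a Synapse notebook cell (relies on the `spark` session
// and `mssparkutils` that the Synapse runtime provides):
//   LinkedServiceHelper.linkedServiceConf("LinkedServerName")
//     .foreach { case (k, v) => spark.conf.set(k, v) }
//   mssparkutils.fs.ls(LinkedServiceHelper.abfssPath("mycontainer", "mystorageaccount"))
```

Because of the name checks, leaving the placeholders Container and StorageAccount unreplaced fails immediately with an IllegalArgumentException rather than later with a storage error.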
Additional Resources
Learn more about how Synapse workspaces perform authentication and use managed identities by reading these documents:
- Secure credentials with linked services using the TokenLibrary
- Managed identities for Azure resource authentication
- ADLS Gen2 storage with linked services
- Introduction to Microsoft Spark utilities – Azure Synapse Analytics | Microsoft Docs
That is it!
Liliam UK Engineer
Updated May 08, 2021