Scenario:
Azure Databricks offers the features of the Databricks platform, which is built on open-source Apache Spark: a web-based workspace for managing Spark clusters, notebooks, and data pipelines, along with Spark-based analytics and machine learning tools. It is integrated with Azure cloud services, providing native access to Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, and other Azure services. This blog shows an example of mounting Azure Blob Storage or Azure Data Lake Storage in the Databricks File System (DBFS), using one of two authentication methods for the mount: Access Key or SAS token.
Objective:
To become familiar with mounting storage in Databricks using the ABFS/WASB drivers and different authentication methods.
Pre-requisites:
For this example, you would need:
Steps to mount a storage container on the Databricks File System (DBFS):
[STEP 1]: Create storage container and blobs
Below is the storage structure used in this example. I created a container “aaa” with a virtual folder “bbb” that holds 5 PNG files. The storage account “charlesdatabricksadlsno” is a Blob Storage account without hierarchical namespace.
[STEP 2]: Mount with dbutils.fs.mount()
We can use the code snippet below to mount container "aaa" in Azure Databricks.
storageAccountName = "charlesdatabricksadlsno"
storageAccountAccessKey = "<access-key>"  # placeholder: paste key1 or key2 here
sasToken = "<sas-token>"  # placeholder: paste the SAS token here
blobContainerName = "aaa"
mountPoint = "/mnt/data"  # no trailing slash, matching the /mnt/data path used in the later steps

# Mount only if nothing is mounted at this mount point yet.
if not any(mount.mountPoint == mountPoint for mount in dbutils.fs.mounts()):
    try:
        dbutils.fs.mount(
            source = "wasbs://{}@{}.blob.core.windows.net".format(blobContainerName, storageAccountName),
            mount_point = mountPoint,
            # Access Key authentication (alternative to the SAS token line below):
            #extra_configs = {'fs.azure.account.key.' + storageAccountName + '.blob.core.windows.net': storageAccountAccessKey}
            extra_configs = {'fs.azure.sas.' + blobContainerName + '.' + storageAccountName + '.blob.core.windows.net': sasToken}
        )
        print("mount succeeded!")
    except Exception as e:
        print("mount exception", e)
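The snippet switches between the two authentication methods by commenting one extra_configs line in or out. As a minimal sketch, the same choice can be wrapped in a small function. Note that build_mount_configs is a hypothetical helper introduced here for illustration, not part of dbutils; it only reproduces the two configuration keys already used in the snippet.

```python
def build_mount_configs(account, container, access_key=None, sas_token=None):
    """Return an extra_configs dict for a WASB mount.

    Hypothetical helper: pass exactly one of access_key or sas_token.
    """
    if (access_key is None) == (sas_token is None):
        raise ValueError("provide exactly one of access_key or sas_token")
    host = "{}.blob.core.windows.net".format(account)
    if access_key is not None:
        # Account-level key: the config key names only the storage account.
        return {"fs.azure.account.key." + host: access_key}
    # SAS token: the config key names the container and the storage account.
    return {"fs.azure.sas.{}.{}".format(container, host): sas_token}
```

In a notebook, the result would be passed as extra_configs to dbutils.fs.mount(), for example build_mount_configs(storageAccountName, blobContainerName, sas_token=sasToken).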
Some key points to note:
To get the Access Key, open the storage account in the Azure portal, go to Access Keys, and copy either key1 or key2.
You can generate a SAS token in two ways:
[STEP 3]: Verify mount point (/mnt/data) with dbutils.fs.mounts()
dbutils.fs.mounts()
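To check one specific mount rather than read through the full list, the result can be filtered. The following is a rough sketch, assuming each returned entry exposes mountPoint and source attributes. MountInfo below is only a local stand-in so the example runs outside a notebook, and find_mount is a hypothetical helper.

```python
from collections import namedtuple

# Local stand-in for the entries returned by dbutils.fs.mounts() (assumption:
# the real entries expose at least mountPoint and source).
MountInfo = namedtuple("MountInfo", ["mountPoint", "source"])


def find_mount(mounts, mount_point):
    """Return the entry mounted at mount_point, or None if there is none."""
    target = mount_point.rstrip("/") or "/"
    for m in mounts:
        # Ignore a trailing slash on either side of the comparison.
        if (m.mountPoint.rstrip("/") or "/") == target:
            return m
    return None
```

In a notebook this would be called as find_mount(dbutils.fs.mounts(), "/mnt/data"), and the source attribute of the result shows which container backs the mount.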
[STEP 4]: List the contents with dbutils.fs.ls()
dbutils.fs.ls("/mnt/data/bbb")
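The listing can also be filtered in Python, for instance to confirm that the PNG files are visible through the mount. This is a small sketch, assuming each listed entry exposes path, name, and size attributes. FileInfo below is a local stand-in for those entries, and png_files is a hypothetical helper.

```python
from collections import namedtuple

# Local stand-in for the entries returned by dbutils.fs.ls() (assumption:
# the real entries expose at least path, name, and size).
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])


def png_files(entries):
    """Return only the entries whose name ends in .png (case-insensitive)."""
    return [e for e in entries if e.name.lower().endswith(".png")]
```

In a notebook this would be called as png_files(dbutils.fs.ls("/mnt/data/bbb")).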
[STEP 5]: Unmount with dbutils.fs.unmount()
dbutils.fs.unmount('/mnt/data')
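If the path might not be mounted, the call can be guarded with a check against the current mounts first. Here is a minimal sketch; unmount_if_mounted is a hypothetical helper that takes the notebook's dbutils object as a parameter and uses only dbutils.fs.mounts() and dbutils.fs.unmount().

```python
def unmount_if_mounted(dbutils, mount_point):
    """Unmount mount_point only if it is currently mounted.

    Returns True if an unmount was performed, False otherwise.
    """
    target = mount_point.rstrip("/") or "/"
    mounted = any(
        (m.mountPoint.rstrip("/") or "/") == target
        for m in dbutils.fs.mounts()
    )
    if mounted:
        dbutils.fs.unmount(target)
    return mounted
```

In a notebook this would be called as unmount_if_mounted(dbutils, "/mnt/data").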
Others:
abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/
wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/
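The two URI formats differ only in the scheme and the endpoint suffix, so the source string can be built from the same pieces in either case. As a minimal sketch, with build_source_uri as a hypothetical helper and the scheme passed in explicitly:

```python
# Endpoint suffix for each scheme, taken from the two URI formats above.
ENDPOINTS = {
    "abfss": "dfs.core.windows.net",
    "wasbs": "blob.core.windows.net",
}


def build_source_uri(scheme, container, account):
    """Build a source URI in one of the two formats listed above."""
    if scheme not in ENDPOINTS:
        raise ValueError("scheme must be one of: " + ", ".join(sorted(ENDPOINTS)))
    return "{}://{}@{}.{}/".format(scheme, container, account, ENDPOINTS[scheme])
```

For the storage used in this example, build_source_uri("wasbs", "aaa", "charlesdatabricksadlsno") gives the WASB form of the source URI.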
storageAccountName = "charlesdatabricksadlsno"
storageAccountAccessKey = "<access-key>"  # placeholder: paste key1 or key2 here
sasToken = "<sas-token>"  # placeholder: paste the SAS token here
blobContainerName = "aaa"
mountPoint = "/mnt/data"  # no trailing slash, matching the /mnt/data path used in the steps above

# If the mount point is already in use, unmount it first, then mount again.
if any(mount.mountPoint == mountPoint for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount(mountPoint)
try:
    dbutils.fs.mount(
        source = "wasbs://{}@{}.blob.core.windows.net".format(blobContainerName, storageAccountName),
        mount_point = mountPoint,
        # Access Key authentication (alternative to the SAS token line below):
        #extra_configs = {'fs.azure.account.key.' + storageAccountName + '.blob.core.windows.net': storageAccountAccessKey}
        extra_configs = {'fs.azure.sas.' + blobContainerName + '.' + storageAccountName + '.blob.core.windows.net': sasToken}
    )
    print("mount succeeded!")
except Exception as e:
    print("mount exception", e)