Azure Databricks activities now support Managed Identity authentication
Published Nov 23 2020 03:27 AM
Microsoft

Azure Databricks supports Azure Active Directory (AAD) tokens (generally available) for authenticating to the REST API 2.0. AAD token support lets us provide a more secure authentication mechanism that uses Azure Data Factory's system-assigned managed identity when integrating with Azure Databricks.

Benefits of using managed identity authentication:

  • Managed identities eliminate the need for data engineers to manage credentials. The Azure resource gets its own identity in Azure AD and uses it to obtain Azure AD tokens. In our case, Data Factory obtains the tokens using its managed identity and calls the Databricks REST APIs with them.
  • They let you grant fine-grained access to specific Data Factory instances through Azure AD.
  • They remove the need for Databricks personal access tokens (PATs). A PAT acts as a password and must be handled with care, which puts additional responsibility for securing it on data engineers.
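To make the token flow concrete, here is a minimal Python sketch of how an Azure resource with a managed identity (for example, a VM) obtains and uses an Azure AD token for Databricks. Data Factory does the equivalent internally, and its actual implementation is not shown here.

The sketch rests on a few assumptions:
- It uses the Azure Instance Metadata Service (IMDS) token endpoint.
- It uses the well-known Azure Databricks application ID `2ff814a6-3304-4ab8-85cb-cd0e6f879c1d` as the token resource.
- The two `X-Databricks-Azure-*` headers are the ones Databricks documents for identities that hold Contributor (or Owner) on the workspace resource but are not yet workspace users. We assume this is why a Contributor grant is enough.

The helper names are illustrative only.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

# Well-known application ID of the Azure Databricks first-party app in Azure AD;
# used as the token resource (audience) for the Databricks REST API.
AZURE_DATABRICKS_RESOURCE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
# Azure Resource Manager audience, needed only for the optional management token.
ARM_RESOURCE = "https://management.core.windows.net/"
# Azure Instance Metadata Service (IMDS) endpoint that issues managed identity tokens.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def imds_token_url(resource: str) -> str:
    """Build the IMDS URL that requests an access token for `resource`."""
    query = urllib.parse.urlencode({"api-version": "2018-02-01", "resource": resource})
    return f"{IMDS_TOKEN_ENDPOINT}?{query}"


def get_managed_identity_token(resource: str, timeout: float = 5.0) -> str:
    """Fetch an Azure AD access token for this host's system-assigned managed identity.

    Only works on an Azure resource that has a managed identity (e.g. a VM).
    """
    request = urllib.request.Request(imds_token_url(resource), headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]


def databricks_auth_headers(aad_token: str,
                            management_token: Optional[str] = None,
                            workspace_resource_id: Optional[str] = None) -> Dict[str, str]:
    """HTTP headers for calling the Databricks REST API with an Azure AD token.

    The two X-Databricks-Azure-* headers are added only when both values are given;
    they are meant for identities that hold Contributor/Owner on the workspace
    resource but are not (yet) workspace users.
    """
    headers = {"Authorization": f"Bearer {aad_token}"}
    if management_token and workspace_resource_id:
        headers["X-Databricks-Azure-SP-Management-Token"] = management_token
        headers["X-Databricks-Azure-Workspace-Resource-Id"] = workspace_resource_id
    return headers
```

On a host with a managed identity, you would first call `get_managed_identity_token(AZURE_DATABRICKS_RESOURCE)` (and, optionally, the same function with `ARM_RESOURCE`). Pass the result to `databricks_auth_headers`, then call a workspace endpoint such as `https://adb-<id>.<n>.azuredatabricks.net/api/2.0/clusters/list` with those headers.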

Previously, you could retrieve a Databricks personal access token from Azure Key Vault using a managed identity. Now you can use the managed identity directly in the Databricks linked service, which removes personal access tokens entirely.

High-level steps to get started:

  1. Grant the Data Factory instance the 'Contributor' role on the Azure Databricks workspace through the workspace's Access control (IAM) settings.
  2. Create a new 'Azure Databricks' linked service in the Data Factory UI, select the Databricks workspace from step 1, and select 'Managed service identity' as the authentication type.
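Step 1 can also be scripted. The Python sketch below only builds the Azure Resource Manager request for the role assignment; it does not send it.

It makes the following assumptions:
- The built-in Contributor role definition ID is `b24988ac-6180-42a0-ab88-20f7382dd24c`.
- It uses the `2022-04-01` role-assignments API version.
- You already know the object ID of the factory's system-assigned managed identity, typically shown in the factory's properties.

The helper names are hypothetical.

```python
import uuid
from typing import Dict, Tuple

# ID of the built-in Azure "Contributor" role definition (identical in every tenant).
CONTRIBUTOR_ROLE_ID = "b24988ac-6180-42a0-ab88-20f7382dd24c"


def workspace_resource_id(subscription_id: str, resource_group: str, workspace_name: str) -> str:
    """Azure resource ID of a Databricks workspace; used as the role assignment scope."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Databricks/workspaces/{workspace_name}")


def contributor_assignment_request(subscription_id: str, scope: str,
                                   principal_id: str) -> Tuple[str, Dict]:
    """Build the ARM URL and JSON body that grant Contributor on `scope` to `principal_id`.

    `principal_id` is the object ID of the Data Factory's system-assigned managed identity.
    The request itself must be sent as an HTTP PUT with an ARM bearer token.
    """
    assignment_name = str(uuid.uuid4())  # role assignment names must be GUIDs
    url = (f"https://management.azure.com{scope}/providers/Microsoft.Authorization"
           f"/roleAssignments/{assignment_name}?api-version=2022-04-01")
    body = {
        "properties": {
            "roleDefinitionId": (f"/subscriptions/{subscription_id}/providers"
                                 f"/Microsoft.Authorization/roleDefinitions/{CONTRIBUTOR_ROLE_ID}"),
            "principalId": principal_id,
            # Managed identities are represented as service principals in Azure AD.
            "principalType": "ServicePrincipal",
        }
    }
    return url, body
```

Scoping the assignment to the workspace resource, rather than the resource group or subscription, keeps the grant as narrow as step 1 requires.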

 

Note: If the dropdowns under 'workspace id' are not populated even after you have successfully granted the permissions in step 1, try toggling between the cluster types.

Sample linked service payload:

```json
{
    "name": "AzureDatabricks_ls",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "annotations": [],
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-***.*.azuredatabricks.net",
            "authentication": "MSI",
            "workspaceResourceId": "/subscriptions/******-3ab0-48f2-b171-0f50ec******/resourceGroups/work-rg/providers/Microsoft.Databricks/workspaces/databricks-****",
            "existingClusterId": "****-030259-dent495"
        }
    }
}
```

Note: There are no secrets or personal access tokens in the linked service definitions!
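If you generate linked services from code instead of the UI, a small builder keeps the payload in the same shape as the sample above. This is a hypothetical Python helper, not part of any Azure SDK. It emits `"authentication": "MSI"` with no `accessToken` property and treats `existingClusterId` as optional.

```python
import json
from typing import Optional


def databricks_msi_linked_service(name: str, domain: str, workspace_resource_id: str,
                                  existing_cluster_id: Optional[str] = None) -> dict:
    """Build an 'AzureDatabricks' linked service that authenticates with the factory's MSI."""
    if not domain.startswith("https://"):
        raise ValueError("domain should be the workspace URL, e.g. https://adb-<id>.<n>.azuredatabricks.net")
    type_properties = {
        "domain": domain,
        "authentication": "MSI",                       # no token or secret is stored
        "workspaceResourceId": workspace_resource_id,  # copy exactly; the value is case-sensitive
    }
    if existing_cluster_id:
        type_properties["existingClusterId"] = existing_cluster_id
    return {
        "name": name,
        "type": "Microsoft.DataFactory/factories/linkedservices",
        "properties": {
            "annotations": [],
            "type": "AzureDatabricks",
            "typeProperties": type_properties,
        },
    }


def linked_service_json(linked_service: dict) -> str:
    """Serialize the definition in the same indented layout as the sample payload."""
    return json.dumps(linked_service, indent=4)
```

The resulting JSON can then be deployed with whatever tooling you already use for Data Factory definitions.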

5 Comments
Microsoft

Hi, is the "Contributor" role necessary on the Azure Databricks instance? Is there any role with lower privileges that can be used to grant access to the Data Factory MSI?

Copper Contributor

@v-reprav 

You can also add the ADF Managed Identity directly to the Databricks workspace using the Service Principal endpoint of the SCIM API. This avoids granting the Managed Identity the Contributor role. 
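The SCIM approach described here can be sketched as follows. This hypothetical Python helper only builds the request for the Databricks SCIM `ServicePrincipals` endpoint. It assumes you send the request as an HTTP POST authenticated as a workspace admin. It also assumes `application_id` is the factory's Managed Identity Application ID, not its object ID. Depending on the workspace, you may also need to grant the new service principal permissions on the cluster it will use.

```python
from typing import Dict, Tuple

# SCIM schema URN for Databricks service principals.
SCIM_SERVICE_PRINCIPAL_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"


def scim_add_service_principal_request(workspace_url: str, application_id: str,
                                       display_name: str) -> Tuple[str, Dict]:
    """Build the URL and JSON body that add a service principal to a Databricks workspace.

    `application_id` is the Managed Identity Application ID of the Data Factory.
    The request must be sent as an HTTP POST by a workspace admin.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/preview/scim/v2/ServicePrincipals"
    body = {
        "schemas": [SCIM_SERVICE_PRINCIPAL_SCHEMA],
        "applicationId": application_id,
        "displayName": display_name,
    }
    return url, body
```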

Copper Contributor

FYI: if you have access and it still doesn't work, note that the workspaceResourceId property value is case-sensitive!
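Because the value is compared case-sensitively, a quick check like the following can catch an ID that only matches when case is ignored. This is a hypothetical Python helper. `actual` is assumed to be the resource ID exactly as Azure reports it, for example copied from the workspace's Properties page.

```python
def check_resource_id_casing(configured: str, actual: str) -> None:
    """Raise ValueError unless `configured` matches `actual` exactly, including case."""
    if configured == actual:
        return
    if configured.lower() == actual.lower():
        # Same ID apart from letter case: the situation described in the comment.
        raise ValueError(f"workspaceResourceId differs only in case: {configured!r} != {actual!r}")
    raise ValueError(f"workspaceResourceId does not match: {configured!r} != {actual!r}")
```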

 

Microsoft

Could you please elaborate on this? How would you do it?

Quoting the earlier comment: "You can also add the ADF Managed Identity directly to the Databricks workspace using the Service Principal endpoint of the SCIM API. This avoids granting the Managed Identity the Contributor role."

Copper Contributor

This does not work when the linked service is created through ARM automation. The linked service is created, but Databricks denies the data factory access.

It appears that when you create the linked service through the Data Factory UI and click "Test connection", a Databricks service principal associated with the data factory's managed identity is created behind the scenes. If the linked service is not created through the UI, no such service principal is created.

Facepalms.

Last update: Nov 23 2020 05:58 AM