Azure Data Factory Blog

Azure Databricks activities now support Managed Identity authentication

Abhishek Narain
Nov 23, 2020

Azure Databricks supports Azure Active Directory (AAD) tokens (GA) for authenticating to REST API 2.0. AAD token support lets us provide a more secure authentication mechanism that uses Azure Data Factory's system-assigned Managed Identity when integrating with Azure Databricks.

 

Benefits of using Managed Identity authentication:

  • Managed identities remove the need for data engineers to manage credentials: the Azure resource gets an identity in Azure AD and uses it to obtain Azure Active Directory (Azure AD) tokens. In our case, Data Factory obtains the tokens using its Managed Identity and accesses the Databricks REST APIs.
  • It lets you provide fine-grained access control to particular Data Factory instances using Azure AD.
  • It helps avoid Databricks Personal Access Tokens, which act as passwords and need to be treated with care, putting additional responsibility on data engineers to secure them.

Earlier, you could access the Databricks Personal Access Token through Key Vault using Managed Identity. Now, you can use Managed Identity directly in the Databricks Linked Service, which removes the need for Personal Access Tokens entirely.
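
As a rough illustration of the token flow described in the benefits above, the Python sketch below builds a Databricks REST API 2.0 request that carries an Azure AD token as a bearer token. Data Factory does this internally on your behalf; the function name, the placeholder workspace URL, and the placeholder token are assumptions made for illustration only, and the request is built but never sent.

```python
import urllib.request

# Well-known Azure AD application ID of the Azure Databricks resource.
# Azure AD tokens for Databricks are requested for this resource.
DATABRICKS_AAD_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
# Scope form of the same resource, as used by the Azure AD v2.0 endpoint.
DATABRICKS_AAD_SCOPE = DATABRICKS_AAD_RESOURCE_ID + "/.default"


def build_clusters_list_request(workspace_url, aad_token):
    """Build (but do not send) a REST API 2.0 call authenticated with an Azure AD token.

    Hypothetical helper: it shows only that the Azure AD token takes the place
    of a Personal Access Token in the Authorization header.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": "Bearer " + aad_token},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder values, not a real workspace or token.
    request = build_clusters_list_request(
        "https://adb-example.azuredatabricks.net", "<aad-token>"
    )
    print(request.get_method(), request.full_url)
```

The point of the sketch is that nothing in the request is a long-lived secret: the bearer token is short-lived and issued by Azure AD to the factory's identity.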

 

High-level steps on getting started:

  1. Grant the Data Factory instance 'Contributor' permissions in Azure Databricks Access Control.
     
  2. Create a new 'Azure Databricks' linked service in the Data Factory UI, select the Databricks workspace (from step 1), and select 'Managed service identity' under authentication type.
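
Step 1 can also be scripted. The Python sketch below prepares the Azure Resource Manager role assignment request that would grant 'Contributor' on the workspace to the factory's Managed Identity. It assumes that 'Access Control' in step 1 means Azure role-based access control (IAM) on the workspace resource. The helper name, the API version shown, and all IDs in the usage example are illustrative assumptions. The code only builds the URL and body; sending it would need an ARM token with permission to assign roles.

```python
import uuid

# GUID of the built-in 'Contributor' role definition in Azure RBAC.
CONTRIBUTOR_ROLE_ID = "b24988ac-6180-42a0-ab88-20f7382dd24c"


def build_contributor_assignment(workspace_resource_id, principal_id, assignment_name=None):
    """Return the PUT URL and JSON body for a role assignment scoped to the workspace.

    workspace_resource_id: full ARM ID of the Databricks workspace.
    principal_id: object ID of the Data Factory Managed Identity.
    assignment_name: GUID naming the assignment (generated when omitted).
    """
    # An ARM ID looks like /subscriptions/<sub>/resourceGroups/..., so index 2 is <sub>.
    subscription_id = workspace_resource_id.split("/")[2]
    assignment_name = assignment_name or str(uuid.uuid4())
    url = (
        "https://management.azure.com" + workspace_resource_id
        + "/providers/Microsoft.Authorization/roleAssignments/" + assignment_name
        + "?api-version=2022-04-01"  # assumed API version for this sketch
    )
    body = {
        "properties": {
            "roleDefinitionId": (
                "/subscriptions/" + subscription_id
                + "/providers/Microsoft.Authorization/roleDefinitions/" + CONTRIBUTOR_ROLE_ID
            ),
            "principalId": principal_id,
            "principalType": "ServicePrincipal",
        }
    }
    return {"url": url, "body": body}


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    result = build_contributor_assignment(
        "/subscriptions/sub-id/resourceGroups/work-rg/providers/Microsoft.Databricks/workspaces/ws",
        "adf-managed-identity-object-id",
    )
    print(result["url"])
```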

 

Note: If the dropdowns under 'workspace id' are not populated even after you have successfully granted the permissions (step 1), toggle between the cluster types.

 

Sample Linked Service payload:

 

{
    "name": "AzureDatabricks_ls",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "annotations": [],
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-***.*.azuredatabricks.net",
            "authentication": "MSI",
            "workspaceResourceId": "/subscriptions/******-3ab0-48f2-b171-0f50ec******/resourceGroups/work-rg/providers/Microsoft.Databricks/workspaces/databricks-****",
            "existingClusterId": "****-030259-dent495"
        }
    }
}

 

Note: There are no secrets or personal access tokens in the linked service definitions!
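
If you generate linked service definitions from a script, a small check can confirm that a payload uses Managed Identity and carries no token. The Python sketch below mirrors the sample payload above. The function names are hypothetical, and the property names treated as secrets (accessToken, encryptedCredential) are an assumption about how token-based definitions are typically written.

```python
import json

# Property names assumed to hold credentials in a token-based definition.
SECRET_PROPERTY_NAMES = {"accessToken", "encryptedCredential"}


def build_msi_linked_service(name, domain, workspace_resource_id, existing_cluster_id):
    """Build a linked service payload shaped like the sample above."""
    return {
        "name": name,
        "type": "Microsoft.DataFactory/factories/linkedservices",
        "properties": {
            "annotations": [],
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": domain,
                "authentication": "MSI",
                "workspaceResourceId": workspace_resource_id,
                "existingClusterId": existing_cluster_id,
            },
        },
    }


def find_secret_properties(linked_service):
    """Return the sorted names of any credential-like properties in typeProperties."""
    type_properties = linked_service["properties"]["typeProperties"]
    return sorted(key for key in type_properties if key in SECRET_PROPERTY_NAMES)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    payload = build_msi_linked_service(
        "AzureDatabricks_ls",
        "https://adb-example.azuredatabricks.net",
        "/subscriptions/sub-id/resourceGroups/work-rg/providers/Microsoft.Databricks/workspaces/ws",
        "example-cluster-id",
    )
    print(json.dumps(payload, indent=4))
    print("secret properties found:", find_secret_properties(payload))
```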

Updated Nov 23, 2020

5 Comments

  • dtheodor
    Copper Contributor

    This does not work when creating the linked service through ARM automation. The linked service does get created but permission from the data factory to databricks is denied.

     

    It appears that when creating the linked service through the Data Factory UI and hitting "Test connection", a Databricks service principal is created behind the scenes, associated with the Data Factory managed identity. If not created through the UI, no such service principal creation takes place.

     

    Facepalms.

  • Could you please elaborate further on this? How?

    You can also add the ADF Managed Identity directly to the Databricks workspace using the Service Principal endpoint of the SCIM API. This avoids granting the Managed Identity the Contributor role. 

  • WiJaN
    Copper Contributor

    FYI, if you have access and it still doesn't work: the workspaceResourceId property value is case-sensitive!

     

  • AnnaDatabricks
    Copper Contributor

    v-reprav 

    You can also add the ADF Managed Identity directly to the Databricks workspace using the Service Principal endpoint of the SCIM API (https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/scim/scim-sp). This avoids granting the Managed Identity the Contributor role.

  • v-reprav
    Copper Contributor

    Hi, is the "Contributor" role necessary on the Azure Databricks instance? Is there any other role with lower privileges that can be used to provision access for the Data Factory MSI?