Azure Data Factory Blog

Support for user-assigned managed identity in Azure Data Factory

Abhishek Narain
Oct 13, 2021

Credential safety is crucial for any enterprise. With that in mind, the Azure Data Factory (ADF) team is committed to making the data engineering process secure yet simple for data engineers.

 

We are excited to announce support for user-assigned managed identity (Preview) in all connectors and linked services that support Azure Active Directory (Azure AD) based authentication.

 

 

A quick recap on Managed Identities, Service Principal, User vs Service accounts: 

Typically, for running operationalized workflows and data pipelines, you should use service accounts rather than user accounts for authentication. This makes production workloads easier to manage and ensures they do not depend on a single data engineer's credentials. Since user account credentials can change over time and cause data pipeline failures in production, the recommendation is to use Service Principals or Managed Identities. Service Principals are analogous to service accounts.

 

Challenges with using Service account/ Service Principal:

  • Leaked/ stolen credentials
  • Expired credentials
  • Require auto-rotation for compliance
  • Lifecycle management of service accounts and their credentials is not easy: accounts must be manually deleted after use and pose a security risk if they are not cleaned up.

  

Solution: Managed identities for Azure resources

You can build password-less data pipelines while using Azure AD authentication. It also means that data engineers do not need data store credentials/ superuser credentials; hence privileged credential abuse can be easily mitigated.

 

Managed identities for Azure resources provide Azure Data Factory with an automatically managed identity in Azure Active Directory. You can use this identity to authenticate to any service that supports Azure AD authentication (Azure Storage, Synapse Analytics, etc.) without having credentials referenced in your data pipelines (linked service definitions).

 

There are two types of managed identities:

  1. System-assigned - ADF has supported system-assigned managed identity since its inception. When you create an ADF instance, an identity is created in Azure AD that is tied to the lifecycle of that ADF instance. For more details, refer to the doc.
  2. User-assigned - We are adding support for user-assigned managed identity. You can create a user-assigned managed identity and assign it to one or more instances of an ADF. In the case of user-assigned managed identities, the identity is managed separately from the resources used.

 

When to use system-assigned vs user-assigned managed identity?

Let's understand the scope of the different managed identities:

              System-assigned                                      User-assigned
Lifecycle     Tied to the particular ADF instance                  Independent of ADF instance
Reuse         Per ADF instance, so cannot be shared across         Can be shared with multiple ADF instances
              resources
Management    Service created                                      Customer created

 

  • You have to grant each system-assigned managed identity permissions in the respective data stores. This can be overwhelming if you have many (say 100+) ADF instances. Also, if access needs to be revoked after a security breach or incident, it has to be revoked for every identity. A user-assigned managed identity helps here: because the identity is decoupled from the ADF instance, you grant or revoke permissions for one shared identity rather than for each instance's identity.

 

  • If you do not want to create and manage a new Azure AD identity (a user-assigned managed identity) yourself, use system-assigned.
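To make the management overhead concrete, here is a minimal sketch that counts the permission grants needed under each model. The figure of 5 data stores is an illustrative assumption, and `grants_needed` is a hypothetical helper, not part of any Azure SDK.

```python
def grants_needed(factories: int, data_stores: int, shared_identity: bool) -> int:
    """Count the permission grants required so every factory can reach every data store."""
    # With system-assigned identities, each factory has its own identity.
    # With one shared user-assigned identity, only that identity needs grants.
    identities = 1 if shared_identity else factories
    return identities * data_stores


# Assumed scenario: 100 ADF instances, 5 data stores.
system_assigned = grants_needed(100, 5, shared_identity=False)
user_assigned = grants_needed(100, 5, shared_identity=True)
print(system_assigned, user_assigned)  # prints "500 5"
```

The same ratio applies when revoking access after an incident: one identity to revoke per data store instead of one per factory.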

 

What if my datastore does not support AAD-based authentication/ Managed identities?

Not to worry! For data stores that do not support AAD-based authentication/ Managed identities, you can store those credentials in Azure Key Vault. ADF can reference those credentials during the pipeline run as and when needed using the respective system-assigned managed identity or user-assigned managed identity.
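As a rough illustration of this pattern, the sketch below builds a linked service definition whose password is a Key Vault secret reference instead of an inline value. The names `SqlServerWithKeyVaultSecret`, `AzureKeyVault1`, and `sql-password` are hypothetical, and the property names follow the linked service JSON shape as I understand it, so verify them against the current ADF documentation before use.

```python
import json


def key_vault_secret(akv_linked_service: str, secret_name: str) -> dict:
    """Build a reference to a secret held in Azure Key Vault (assumed JSON shape)."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": akv_linked_service,  # the Key Vault linked service
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }


# Hypothetical data store that only supports password-based authentication.
linked_service = {
    "name": "SqlServerWithKeyVaultSecret",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "Server=<server>;Database=<database>;User ID=<user>;",
            # No password in the definition; ADF resolves the secret at run time.
            "password": key_vault_secret("AzureKeyVault1", "sql-password"),
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

The managed identity still needs permission to read secrets from the vault; only the data store password moves out of the pipeline definition.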

 

Get Started with user-assigned managed identity in ADF:

  1. Associate an existing user-assigned managed identity with the ADF instance.
    • It can be done through Azure Portal --> ADF instance --> Managed identities --> Add user-assigned managed identity.

       


      You can also associate the identity while creating the credential in step 2.

  2. Create a new credential with type 'user-assigned': ADF UI --> Manage hub --> Credentials --> New.

     

  3. Create a linked service, choose user-assigned managed identity as the authentication type, and select the credential item.
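The three steps above can also be pictured as three JSON definitions. The sketch below assembles them as Python dictionaries. The helper functions and the names `credential1` and `AzureBlobStorage1` are hypothetical, the resource ID uses placeholders, and the property names reflect my understanding of the factory, credential, and linked service schemas rather than anything stated in this post, so treat them as assumptions to check against the official reference.

```python
import json

# Placeholder resource ID of an existing user-assigned managed identity.
UAMI_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>"
)


def factory_identity(uami_id: str, keep_system_assigned: bool = True) -> dict:
    """Step 1: identity block that associates the identity with the factory."""
    kind = "SystemAssigned,UserAssigned" if keep_system_assigned else "UserAssigned"
    return {"type": kind, "userAssignedIdentities": {uami_id: {}}}


def credential(name: str, uami_id: str) -> dict:
    """Step 2: credential item that points at the user-assigned identity."""
    return {
        "name": name,
        "properties": {
            "type": "ManagedIdentity",
            "typeProperties": {"resourceId": uami_id},
        },
    }


def blob_linked_service(name: str, endpoint: str, credential_name: str) -> dict:
    """Step 3: linked service that authenticates through the credential item."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                "serviceEndpoint": endpoint,
                "credential": {
                    "referenceName": credential_name,
                    "type": "CredentialReference",
                },
            },
        },
    }


identity = factory_identity(UAMI_ID)
cred = credential("credential1", UAMI_ID)
service = blob_linked_service(
    "AzureBlobStorage1", "https://<account>.blob.core.windows.net/", "credential1"
)
print(json.dumps([identity, cred, service], indent=2))
```

Note that the linked service never contains a secret: it only names the credential item, which in turn names the identity.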

     


 

 

 

 

Updated Oct 13, 2021
  • James Cheng
    Copper Contributor

    Hi Abhishek

    I created a user-assigned managed identity through the Azure portal and granted it the proper permissions on ADLS Gen2.

     

    With the firewall disabled, Azure Data Factory can access this storage account using the user-assigned managed identity without issue.

    With the firewall enabled, Azure Data Factory cannot access this ADLS Gen2 storage account using the user-assigned managed identity,
    even though I have “Allow Azure services on the trusted services list to access this storage account” checked.

     

    My main question is: does the user-assigned managed identity work the same way as the system-assigned managed identity when the ADLS Gen2 firewall is enabled?

     

    James

  • Abhishek Narain: Can we expect the "add dynamic content" feature in the credential creation process? I don't see this option enabled currently. It would be useful for overriding environment-specific credential values during DevOps deployments.

  • James Cheng
    Copper Contributor

    eddieescopinc The user-assigned managed identity cannot access firewall-enabled storage in the same way as the system-assigned managed identity. I got confirmation of this from Microsoft support.

     

    James

  • BillWeisberg
    Copper Contributor

    How do you create a data factory Credential using Bicep instead of the UI?

  • gautamksr
    Copper Contributor

    In my case, the user-assigned managed identity is not able to access a blob storage account with a private endpoint when using the default IR, but it works with the system-assigned managed identity.

  • harlowy
    Copper Contributor

    Is it possible to associate the user-assigned managed identity with a data factory programmatically using the REST API?

     

    Currently, I'm struggling to find any documentation on how to add the managed identity to the Azure Data Factory, as you would in the User Assigned tab under the Managed Identity settings.