Credential safety is crucial for any enterprise. With that in mind, the Azure Data Factory (ADF) team is committed to making the data engineering process secure yet simple for data engineers.
We are excited to announce support for user-assigned managed identity (Preview) in all connectors and linked services that support Azure Active Directory (Azure AD) based authentication.
A quick recap on Managed Identities, Service Principal, User vs Service accounts:
Typically, for operationalized workflows and data pipelines, you should authenticate with service accounts rather than user accounts. This makes production workloads easier to manage and ensures they do not depend on a single data engineer's credentials. Because user account credentials can change over time and cause data pipeline failures in production, the recommendation is to use service principals or managed identities. Service principals are analogous to service accounts.
Challenges with using Service account/ Service Principal:
- Leaked/ stolen credentials
- Expired credentials
- Require auto-rotation for compliance
- Lifecycle management of service accounts and their credentials is not easy: accounts must be deleted manually after use, and they pose a security risk if they are not cleaned up.
Solution: Managed identities for Azure resources
With managed identities, you can build password-less data pipelines while using Azure AD authentication. Data engineers also do not need data store credentials or superuser credentials, which helps mitigate privileged credential abuse.
Managed identities for Azure resources provide Azure Data Factory with an automatically managed identity in Azure Active Directory. You can use this identity to authenticate to any service that supports Azure AD authentication (Azure Storage, Synapse Analytics, etc.) without referencing credentials in your data pipelines (linked service definitions).
There are two types of managed identities:
- System-assigned - ADF has supported system-assigned managed identity since its inception. When you create an ADF instance, an identity is created in Azure AD that is tied to the lifecycle of that ADF instance. For more details, refer to the doc.
- User-assigned - We are adding support for user-assigned managed identity. You can create a user-assigned managed identity and assign it to one or more ADF instances. A user-assigned managed identity is managed separately from the resources that use it.
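To make the sharing model concrete, here is a minimal Python sketch of the `identity` block that an Azure Resource Manager resource definition typically carries when a user-assigned identity is attached. The subscription ID, resource group, identity name, and factory names are hypothetical placeholders, and the exact payload for your factory should be checked against the ADF documentation.

```python
import json

# Hypothetical resource ID of a user-assigned managed identity (placeholder values).
UAMI_RESOURCE_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg"
    "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/example-adf-identity"
)


def identity_block(user_assigned_ids):
    """Build an ARM-style identity block listing user-assigned identities."""
    return {
        "type": "UserAssigned",
        # The map is keyed by identity resource ID; each value is an empty object.
        "userAssignedIdentities": {rid: {} for rid in user_assigned_ids},
    }


# The same identity attached to several (hypothetical) ADF instances.
factories = {
    name: {"name": name, "identity": identity_block([UAMI_RESOURCE_ID])}
    for name in ("adf-dev", "adf-test", "adf-prod")
}

print(json.dumps(factories["adf-prod"], indent=2))
```

Because every factory in this sketch points at the same identity resource ID, deleting a factory leaves the identity (and its permissions) in place, which is the lifecycle difference from a system-assigned identity.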
When to use system-assigned vs user-assigned managed identity?
Let's understand the scope of the different managed identities:

| | System-assigned | User-assigned |
|---|---|---|
| Lifecycle | Tied to the particular ADF instance | Independent of ADF instance |
| Reuse | Since it's per ADF instance, it cannot be shared across resources | It can be shared with multiple ADF instances |
| Management | Service created | Customer created |

- You have to grant each system-assigned managed identity its own permissions in the respective data stores. This can be overwhelming if you have many (say 100+) ADF instances. Also, if access needs to be revoked after a security breach or incident, it must be revoked for every identity. A user-assigned managed identity helps here: because the identity is decoupled from the ADF instance, you grant and revoke permissions once for the shared identity instead of once per instance.
- If you do not want to create and manage a new user-assigned managed identity in Azure AD yourself, use system-assigned.
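The management difference can be illustrated with a little arithmetic. This is a simplified sketch that assumes every ADF instance needs the same access to every data store and that one shared user-assigned identity covers all instances; the function name and the figures (100 instances, 5 data stores) are illustrative only.

```python
def grants_needed(num_factories: int, num_data_stores: int, shared_identity: bool) -> int:
    """Count permission grants: one per (identity, data store) pair."""
    # A shared user-assigned identity counts once; system-assigned means one per factory.
    identities = 1 if shared_identity else num_factories
    return identities * num_data_stores


system_assigned = grants_needed(100, 5, shared_identity=False)
user_assigned = grants_needed(100, 5, shared_identity=True)

print(system_assigned, user_assigned)  # prints: 500 5
```

The same counts apply to revocation during an incident, under the same assumptions.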
What if my data store does not support Azure AD-based authentication/ Managed identities?
Not to worry! For data stores that do not support Azure AD-based authentication or managed identities, you can store the credentials in Azure Key Vault. ADF retrieves them at pipeline run time, authenticating to Key Vault with the system-assigned or user-assigned managed identity.
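As a rough illustration, the Python sketch below builds a linked service definition whose password is a reference to a Key Vault secret instead of a literal value. The property names (`AzureKeyVaultSecret`, `store`, `secretName`) are assumed to follow the pattern in ADF's Key Vault documentation; the linked service names, connection string, and secret name are hypothetical.

```python
import json


def key_vault_secret_ref(akv_linked_service: str, secret_name: str) -> dict:
    """Reference a secret held in Key Vault rather than embedding its value."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": akv_linked_service,  # name of the Key Vault linked service
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }


# Hypothetical SQL Server linked service; no password appears in the definition.
linked_service = {
    "name": "ExampleSqlServerLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "Data Source=example-server;Initial Catalog=example-db;User ID=example-user",
            "password": key_vault_secret_ref("ExampleKeyVaultLinkedService", "example-sql-password"),
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

The definition only names where the secret lives; the managed identity still needs permission to read secrets from that vault.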
Get Started with user-assigned managed identity in ADF:
1. Associate an existing user-assigned managed identity with the ADF instance.
   - This can be done through Azure Portal --> ADF instance --> Managed identities --> Add user-assigned managed identity.
   - You can also associate the identity as part of step 2.
2. Create a new credential with type 'user-assigned': ADF UI --> Manage hub --> Credentials --> New.
3. Create a linked service, choose user-assigned managed identity under authentication type, and select the credential item.
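For readers who prefer to see the definitions behind the UI steps, here is a minimal Python sketch of the credential and an Azure Blob Storage linked service that references it. The JSON shapes (`ManagedIdentity`, `resourceId`, `CredentialReference`) are assumptions based on the ADF documentation for credentials and the Azure Storage connector; all names, the endpoint, and the resource ID are hypothetical placeholders.

```python
import json

# Hypothetical resource ID of the user-assigned managed identity.
IDENTITY_RESOURCE_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg"
    "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/example-adf-identity"
)


def user_assigned_credential(name: str, identity_resource_id: str) -> dict:
    """Credential item that points at a user-assigned managed identity."""
    return {
        "name": name,
        "properties": {
            "type": "ManagedIdentity",
            "typeProperties": {"resourceId": identity_resource_id},
        },
    }


def blob_linked_service(name: str, service_endpoint: str, credential_name: str) -> dict:
    """Blob Storage linked service that authenticates through the credential."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                "serviceEndpoint": service_endpoint,
                "accountKind": "StorageV2",
                # No key or secret here: only a reference to the credential item.
                "credential": {
                    "referenceName": credential_name,
                    "type": "CredentialReference",
                },
            },
        },
    }


credential = user_assigned_credential("example-credential", IDENTITY_RESOURCE_ID)
linked_service = blob_linked_service(
    "ExampleBlobLinkedService",
    "https://examplestorage.blob.core.windows.net/",
    credential["name"],
)

print(json.dumps(credential, indent=2))
print(json.dumps(linked_service, indent=2))
```

The linked service refers to the credential by name only, so the same credential item can be selected by any number of linked services in the factory.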
Reference:
- Managed identities in data factory
- Credentials and user-assigned managed identity in data factory
- User-assigned managed identity in Azure Storage linked service (example).