Microservices bring a lot of opportunities to the table compared to old monoliths. Unfortunately, some things that were (and still are) easier with those big blocks of code become more complex when you break them into smaller pieces. We're not going to tackle all of those today; rather, we'll look closer at a very specific piece of the identity puzzle.
When you're using Azure Kubernetes Service (AKS) there are several parts that involve identities. You need permissions to manage the cluster. You need to log in to the apps running in the cluster. You need cross-app or cross-pod exchange of data. And you need the cluster, or things running inside it, to interact with the greater Azure infrastructure.
What if you deploy an app that needs to query the MS Graph for various attributes? Or what if you need to create a database in an Azure SQL instance? Yes, you can create client IDs and secrets and write the necessary code, but sooner or later those secrets are bound to be inadvertently checked into source control or suffer some other mishap. So, you want this shiny new passwordless concept that you've heard about. For a while Microsoft has provided a component called "pod identity" for use with AKS. Unfortunately, lower-level problems with its implementation kept it stuck in preview. It has since been deprecated, and the new approach is called "workload identity":
https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview
The overview covers what it is and the high level details of how it works; the short version is that we "connect" the service accounts within Kubernetes with Azure AD identities. This bridge is based on Kubernetes being an OpenID Connect issuer and Azure AD trusting the tokens through federation.
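To make the federation part a bit more concrete, here is a minimal Python sketch of what the trust boils down to. The token's issuer, subject and audience have to match what is registered on the federated credential. This is a conceptual model under simplifying assumptions, not Azure AD's implementation. The `FederatedCredential` class and `credential_matches` function are hypothetical, and signature and expiry validation against the issuer's published keys are deliberately left out. `api://AzureADTokenExchange` is the default audience Azure AD uses for federated credentials.

```python
from dataclasses import dataclass

# Default audience for Azure AD federated credentials.
TOKEN_EXCHANGE_AUDIENCE = "api://AzureADTokenExchange"


@dataclass(frozen=True)
class FederatedCredential:
    """Hypothetical, simplified model of a federated credential on a managed identity."""
    issuer: str   # the cluster's OIDC issuer URL
    subject: str  # system:serviceaccount:<namespace>:<service account name>
    audiences: tuple = (TOKEN_EXCHANGE_AUDIENCE,)


def credential_matches(claims: dict, cred: FederatedCredential) -> bool:
    """Check whether already-verified token claims match a federated credential.

    Conceptual only: the real service also validates the token signature against
    the issuer's published keys and checks expiry, which is omitted here.
    """
    aud = claims.get("aud")
    token_audiences = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == cred.issuer  # exact string comparison
        and claims.get("sub") == cred.subject
        and any(a in cred.audiences for a in token_audiences)
    )


if __name__ == "__main__":
    cred = FederatedCredential(
        issuer="https://oidc.example.invalid/issuer/",  # hypothetical issuer URL
        subject="system:serviceaccount:workload:workload-identity-sa",
    )
    claims = {
        "iss": "https://oidc.example.invalid/issuer/",
        "sub": "system:serviceaccount:workload:workload-identity-sa",
        "aud": [TOKEN_EXCHANGE_AUDIENCE],
    }
    print(credential_matches(claims, cred))  # True
```

A mismatch on any of the three is enough for the token exchange to fail. A typo in the namespace part of the subject is a typical example.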
The official docs have you covered if you're willing to jump a little back and forth, so the intent of this post is to create an end-to-end lab.
Now, if you've been following along from the sidelines for a while you might be thinking "didn't they launch something early last year?". Well, sort of. There's "workload identity federation":
https://learn.microsoft.com/en-us/azure/active-directory/develop/workload-identity-federation
This is the overarching federation ability that enables things like GitHub Actions to deploy ARM/Bicep/Terraform code without using passwords and secrets, and workload identity for AKS builds on top of this.
There was also a version for AKS that involved using a Service Principal on top of this:
This newest iteration gets rid of the Service Principal and greatly simplifies the enablement process on the cluster. (The whole application object vs service principal explanation/discussion is out of scope for this post.)
Alrighty, let's get cracking with this and step through things incrementally.
Btw: don't spend time copying and pasting from this post; pull down the necessary files from here instead: https://github.com/ahelland/Bicep-Landing-Zones/tree/main/aks-workload-identity
I made a Polyglot Notebook, so as long as you're using Visual Studio Code and have the extension (https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode) installed, you can just press play on the individual code snippets to run them. (The prerequisite is that you have the Azure CLI, PowerShell, etc. installed.)
The scripts also create a virtual network and a container registry. These aren't related to workload identity as such, but they're necessary for a complete setup. For completeness there's also a sample app that plugs into the MS Graph with a workload identity.
Create an AKS cluster and a user managed identity
You will of course need a cluster to test this out. Workload identities are not enabled by default, so you will need to add the necessary parameters for that. (If you're using the Azure CLI, that would be "--enable-oidc-issuer --enable-workload-identity".)
When using Bicep you need a few lines of definitions:
…
//Used for workload identity
oidcIssuerProfile: {
  enabled: true
}
securityProfile: {
  workloadIdentity: {
    enabled: true
  }
}
…
The workload identities are created as user-assigned managed identities:
resource aks_infra_identity 'Microsoft.ManagedIdentity/userAssignedIdentities@2018-11-30' = {
  name: 'aks-infra-identity'
  location: location
}
This is the part where you need to do your initial planning around usage. The managed identity created here is the object that is granted permissions into other areas of Azure, and you probably don't want one super-privileged identity that everything can use. For this lab I have configured two: one called "aks-infra-identity" to be used by the cluster components, and one called "aks-app-identity" to be used by the workload running on the cluster.
So, execute the following lines to create a cluster, registry and vnet:
# Create an Azure AD AKS Admin Group
#$adminGroupId=(az ad group create --display-name aks-admins --mail-nickname aks-admins --query id -o tsv)
# Or get the id of an existing Azure AD AKS Admin Group
$adminGroupId=(az ad group show -g aks-admins --query id -o tsv)
# Deploy AKS, ACR & vnet
az deployment sub create --location norwayeast --name 1 --template-file .\main.bicep --parameters .\azuredeploy.parameters.json adminGroupId=$adminGroupId env=$ENVIRONMENT
# Get credentials
az aks get-credentials --resource-group $RG_AKS --name $CLUSTER_NAME --overwrite-existing
# Integrate ACR and AKS
$acrName=(az acr list -g "rg-$ENVIRONMENT-aks-acr" -o tsv --query "[0].name")
az aks update -n $CLUSTER_NAME -g $RG_AKS --attach-acr $acrName
Installing cluster components
A naked cluster doesn't provide much value by itself, and doesn't really demo the concepts here either, so we need to perform some extra steps to make it all work.
The first thing is to establish a federated credential, which basically links service accounts in Kubernetes to the managed identity (for infra):
# Get OIDCUrl (issuer of tokens for the federated credential)
$oidcUrl=(az aks show --resource-group $RG_AKS --name $CLUSTER_NAME --query "oidcIssuerProfile.issuerUrl" -o tsv)
#Prep service-account.yaml
$appId=(az identity show --resource-group $RG_AKS --name aks-infra-identity --query clientId -o tsv)
$serviceAccount = (Get-Content (".\service-account.yaml")) | % {$_.replace('${USER_ASSIGNED_CLIENT_ID}',$appId)} | Out-String
# Install Service Account
$serviceAccount | kubectl create -f -
# Establish Federated Credential
az identity federated-credential create --name aksFederatedIdentity --identity-name aks-infra-identity --resource-group $RG_AKS `
--issuer $oidcUrl --subject system:serviceaccount:azure-workload-identity-system:workload-identity-sa
Pay attention to the "subject" and its format. It references a service account and its corresponding namespace. Service accounts are namespaced, so each namespace (or, more precisely, each service account) requires its own federated credential like this. If you skip this step, the logs of the pod attempting to use the identity will show errors referring to the subject name, because Azure AD can't find a federated credential matching it.
We need ingress to get traffic into the cluster. (Well, technically a load balancer also works for the purpose, but for this config we'll use ingress resources.) We'll use nginx for this purpose:
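Since the subject is just a formatted string, a small helper can build and sanity-check it before it's passed to `--subject`. This is a minimal Python sketch, and the function names are hypothetical; they are not part of the Azure CLI or any SDK. It relies on two Kubernetes conventions. A service account token's subject is `system:serviceaccount:<namespace>:<name>`. Namespace and service account names never contain a colon.

```python
PREFIX = "system:serviceaccount:"


def service_account_subject(namespace: str, name: str) -> str:
    """Build the subject Kubernetes uses in a service account token."""
    if not namespace or not name or ":" in namespace or ":" in name:
        raise ValueError("namespace and name must be non-empty and contain no ':'")
    return f"{PREFIX}{namespace}:{name}"


def parse_subject(subject: str) -> tuple[str, str]:
    """Split a service account subject back into (namespace, name)."""
    if not subject.startswith(PREFIX):
        raise ValueError(f"not a service account subject: {subject!r}")
    namespace, sep, name = subject[len(PREFIX):].partition(":")
    if not sep or not namespace or not name:
        raise ValueError(f"malformed subject: {subject!r}")
    return namespace, name


if __name__ == "__main__":
    # The three (namespace, service account) pairs used in this lab.
    for ns, sa in [("azure-workload-identity-system", "workload-identity-sa"),
                   ("cert-manager", "cert-manager"),
                   ("workload", "workload-identity-sa")]:
        print(service_account_subject(ns, sa))
```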
# Add the ingress-nginx repository
./helm.exe repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
./helm.exe repo update
# Use Helm to deploy an NGINX ingress controller
./helm.exe install nginx-ingress ingress-nginx/ingress-nginx --create-namespace --namespace nginx `
--set controller.replicaCount=2 --set controller.nodeSelector."kubernetes\.io/os"=linux `
--set defaultBackend.nodeSelector."kubernetes\.io/os"=linux --set controller.admissionWebhooks.patch.nodeSelector."kubernetes\.io/os"=linux
# Add default ingress class (to make available across namespaces)
kubectl apply -f .\IngressClass.yaml
# Build & Deploy default/dummy backend to registry
cd .\nginx-default-backend
az acr build --registry $acrName --image nginx-default-backend:latest .
cd -
# Deploy nginx-default-backend
$backend = (Get-Content (".\nginx-default-backend.yaml")) | % {$_.replace('${ACR_NAME}',$acrName)} | Out-String
$backend | kubectl create -f -
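The scripts fill in YAML files by chaining `.replace()` calls on `${...}` placeholders. One weakness of that approach is that a misspelled or undefined variable doesn't cause an error. Instead, the placeholder is simply left in place or replaced with an empty string. Here's a minimal Python sketch of the same substitution that fails loudly instead. `render_template` is a hypothetical helper. The placeholder syntax is assumed to be `${NAME}` made of letters, digits, underscores or hyphens, as in the files used here.

```python
import re

# Matches ${NAME} placeholders such as ${ACR_NAME} or ${az-conf-json}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_-]+)\}")


def render_template(text: str, values: dict) -> str:
    """Substitute ${NAME} placeholders, refusing missing or empty values."""
    used = {m.group(1) for m in PLACEHOLDER.finditer(text)}
    bad = sorted(name for name in used if not values.get(name))
    if bad:
        raise KeyError(f"missing or empty value for: {', '.join(bad)}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), text)


if __name__ == "__main__":
    manifest = "image: ${ACR_NAME}.azurecr.io/nginx-default-backend:latest"
    print(render_template(manifest, {"ACR_NAME": "myregistry"}))  # hypothetical registry name
```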
You don't have to automate creation of DNS records, but it sure makes things simpler when deploying new frontends. You can host DNS where you like and create the records through other means, but for this lab we include configuration of ExternalDNS using Azure DNS for this purpose. (The zone needs to be pre-created.)
# The kubelet needs permissions to the DNS zone
$PRINCIPAL_ID=$(az aks show -g $RG_AKS --name $CLUSTER_NAME --query "identityProfile.kubeletidentity.objectId" --output tsv)
$DNS_ID=$(az network dns zone show --name $DNS_ZONE -g $RG_DNS --query "id" --output tsv)
az role assignment create --role "DNS Zone Contributor" --assignee $PRINCIPAL_ID --scope $DNS_ID
# ExternalDNS.yaml requires a JSON config telling it to use the managed identity of the node pool
$tenantId=(az account show --query tenantId -o tsv)
$subscriptionId=(az account show --query id -o tsv)
$json = @{tenantId=$tenantId; subscriptionId=$subscriptionId; resourceGroup=$RG_DNS; useManagedIdentityExtension=$true} | ConvertTo-Json
$byteArray = [System.Text.Encoding]::UTF8.GetBytes($json)
$base64 = [System.Convert]::ToBase64String($byteArray)
# Install and apply config
$extdns = (Get-Content (".\ExternalDNS.yaml")) | % {$_.replace('${az-conf-json}',$base64)} `
| % {$_.replace('${DOMAIN}',$DNS_ZONE)} | % {$_.replace('${DNS_RG}',$RG_DNS)} | Out-String
$extdns | kubectl create -f -
ExternalDNS needs permissions on the DNS zone to create records. However, it does not support workload identity yet, so instead it leverages the kubelet managed identity of the node pool, and we assign the permissions to that identity accordingly.
Browsing to DNS names instead of IP addresses is user-friendly, but modern browsers get cranky when you're not using TLS, so we'd better install something for that purpose as well:
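For reference, the `$json`/`$base64` lines above just serialize a small configuration object and base64-encode it. Here's the same thing as a minimal Python sketch. It assumes the `${az-conf-json}` placeholder in ExternalDNS.yaml sits in the `data` section of a Kubernetes Secret, which expects base64-encoded values. The function name is hypothetical, and the resource group should be the one holding the DNS zone.

```python
import base64
import json


def external_dns_azure_config(tenant_id: str, subscription_id: str, dns_resource_group: str) -> str:
    """Return a base64-encoded azure.json for ExternalDNS using a managed identity.

    Assumption: the result is placed in a Secret's `data` field, which holds base64 values.
    """
    config = {
        "tenantId": tenant_id,
        "subscriptionId": subscription_id,
        "resourceGroup": dns_resource_group,  # resource group of the DNS zone
        "useManagedIdentityExtension": True,  # use the node pool's (kubelet) identity
    }
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    encoded = external_dns_azure_config("<tenant-id>", "<subscription-id>", "rg-dns")
    print(json.loads(base64.b64decode(encoded)))
```

PowerShell's `ConvertTo-Json` produces indented JSON while `json.dumps` doesn't. The whitespace differs, but the decoded content is the same.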
# The managed identity needs DNS permissions
$USER_ASSIGNED_CLIENT_ID=(az identity show --resource-group $RG_AKS --name aks-infra-identity --query clientId -o tsv)
az role assignment create --role "DNS Zone Contributor" --assignee $USER_ASSIGNED_CLIENT_ID --scope $DNS_ID
$oidcUrl=(az aks show --resource-group $RG_AKS --name $CLUSTER_NAME --query "oidcIssuerProfile.issuerUrl" -o tsv)
az identity federated-credential create --name aksFederatedIdentityCertManager --identity-name aks-infra-identity --resource-group $RG_AKS --issuer $oidcUrl `
--subject system:serviceaccount:cert-manager:cert-manager
# Installing CertManager
# Add the Jetstack Helm repository
./helm.exe repo add jetstack https://charts.jetstack.io
# Update your local Helm chart repository cache
./helm.exe repo update
# Install the cert-manager Helm chart
./helm.exe install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true -f .\CertManagerWI.yaml
# Apply configuration
$certman = (Get-Content (".\CertManager.yaml")) | % {$_.replace('${USER_ASSIGNED_CLIENT_ID}',$USER_ASSIGNED_CLIENT_ID)} `
| % {$_.replace('${DOMAIN}',$DNS_ZONE)} | % {$_.replace('${DNS_RG}',$RG_DNS)} `
| % {$_.replace('${SUB_ID}',$SUB_ID)} | % {$_.replace('${ACME_EMAIL}',$ACME_EMAIL)} | Out-String
$certman | kubectl apply -f -
CertManager also requires access to the DNS zone to create and delete the TXT records used for ACME DNS-01 challenges. It does not piggyback on ExternalDNS; it works independently. CertManager already supports workload identities, so we assign the permissions to the aks-infra-identity managed identity.
This highlights what I already stated with regard to having multiple managed identities: do you want the apps to have direct access to the DNS zone, or only indirect access?
In addition to assigning the permissions, this requires us to create a new federated credential. CertManager is installed in its own namespace, so it needs its own service account, and that service account must be linked to the managed identity. (Currently you can add up to 20 federated credentials to each managed identity.)
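With two identities and several namespaces, it can help to write the federated credentials down as data and validate them before running the `az identity federated-credential create` commands. Below is a minimal Python sketch of such a plan. `plan_federated_credentials` is a hypothetical helper. The limit of 20 per identity reflects the current documented limit and may change. The helper also flags reused credential names within one identity. A federated credential is a named child resource of the identity, so reusing a name on the same identity points at the existing credential instead of adding a second one. The cert-manager credential name in the example is a hypothetical distinct name.

```python
from collections import defaultdict

# Documented per-identity limit at the time of writing; may change.
MAX_FEDERATED_CREDENTIALS = 20


def plan_federated_credentials(entries):
    """Group (identity, credential_name, subject) tuples per identity and validate them."""
    plan = defaultdict(dict)
    for identity, name, subject in entries:
        creds = plan[identity]
        if name in creds:
            # Same name on the same identity means the same child resource.
            raise ValueError(f"{identity}: credential name {name!r} is used more than once")
        creds[name] = subject
        if len(creds) > MAX_FEDERATED_CREDENTIALS:
            raise ValueError(f"{identity}: more than {MAX_FEDERATED_CREDENTIALS} federated credentials")
    return dict(plan)


if __name__ == "__main__":
    lab = [
        ("aks-infra-identity", "aksFederatedIdentity",
         "system:serviceaccount:azure-workload-identity-system:workload-identity-sa"),
        ("aks-infra-identity", "aksFederatedIdentityCertManager",  # hypothetical name
         "system:serviceaccount:cert-manager:cert-manager"),
        ("aks-app-identity", "aksFederatedIdentity",
         "system:serviceaccount:workload:workload-identity-sa"),
    ]
    for identity, creds in plan_federated_credentials(lab).items():
        print(identity, sorted(creds))
```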
It has of course been possible to achieve these things before, either with the node pool's managed identity (like for ExternalDNS) or by injecting the clientId and clientSecret of a service principal for CertManager. As you saw, more is required than just flipping a switch, so it's not exactly automagic, and even if details improve on the road to GA, some assembly will probably be required. The bigger win currently is not involving secrets in any way. Which is a good thing of course.
Deploying a sample workload
The previous steps take care of the core infra. We like that, but that by itself is not the whole picture. You have workloads running in your cluster as well and that's the next part we need in our lab.
A generic web app will usually deal with identity at two ends:
- User signing in interactively on a front-end.
- The back-end accessing databases, generating events, and so forth.
The workload identity is intended for the back-end, so that's what we will demo. Yes, there are on-behalf-of flows where the token from the user is passed along and added to API calls on the backend. Those are out of scope for now. We have a couple of lines of script to build a sample app and deploy it to our cluster:
$USER_ASSIGNED_CLIENT_ID=(az identity show --resource-group $RG_AKS --name aks-app-identity --query clientId -o tsv)
# Build & Deploy frontend
cd .\workload-identity-app-dotnet7\workload-identity-frontend-dotnet7
az acr build --registry $acrName --image wi-front:latest .
cd -
# Build & Deploy backend
cd .\workload-identity-app-dotnet7\workload-identity-backend-dotnet7
az acr build --registry $acrName --image wi-back:latest .
cd -
# Create federated credential for the workload service account
$oidcUrl=(az aks show --resource-group $RG_AKS --name $CLUSTER_NAME --query "oidcIssuerProfile.issuerUrl" -o tsv)
az identity federated-credential create --name aksFederatedIdentity --identity-name aks-app-identity --resource-group $RG_AKS --issuer $oidcUrl --subject system:serviceaccount:workload:workload-identity-sa
# Permissions for aks-app-identity
$principalId=$(az identity show --resource-group $RG_AKS --name aks-app-identity --query principalId -o tsv)
# Application Id for the MS Graph is always 00000003-0000-0000-c000-000000000000
# Note: $filter must be urlencoded as %24filter
$graphObject=(az rest --method GET --url "https://graph.microsoft.com/v1.0/servicePrincipals?%24filter=appId eq '00000003-0000-0000-c000-000000000000'" | ConvertFrom-Json).value.id
#Hard-wired value for User.Read.All
$APP_ROLE_ID="df021288-bdef-4463-88db-98f22de89214"
# az rest is picky about the JSON payload so remove whitespace and escape the quotation marks
$json = (@{principalId=$principalId; resourceId=$graphObject; appRoleId=$APP_ROLE_ID } | ConvertTo-Json -Compress).Replace('"', '\"')
echo $json
az rest --method POST --url "https://graph.microsoft.com/v1.0/servicePrincipals/${principalId}/appRoleAssignedTo" --headers "Content-Type=application/json" --body $json
# Deploy workload-identity-app
# Prep file with correct values first
$wiApp = (Get-Content (".\workload-identity-app-dotnet7\workload-identity-app.yaml")) | % {$_.replace('${USER_ASSIGNED_CLIENT_ID}',$USER_ASSIGNED_CLIENT_ID)}`
| % {$_.replace('${ACR_NAME}',$acrName)} | % {$_.replace('${DOMAIN}',$DNS_ZONE)} | Out-String
$wiApp | kubectl apply -f -
The C# code is built into container images and pushed to Azure Container Registry (az acr build does both in one go). You can modify and play with it, but it should also work out of the box.
We create a separate service account for the namespace of the app and a new federated credential, but we attach it to the aks-app-identity to make sure it is not able to interfere with things like creating DNS records. (Notice how we can create the federation before the service account has been created.)
If you register applications in the Azure Portal for use in web apps, you will have noticed that it's fairly easy to assign permissions to the MS Graph by browsing the API list. Without going into detail on how Azure AD objects are organized, this works because behind the scenes there's both an application object and a service principal, where the purpose of the latter is permissions. (For a single-tenant app this seems redundant, but multi-tenant SaaS apps need separate objects for the application itself and for the permissions it has been granted in different tenants.) A user-assigned managed identity only comes with a service principal; there's no app registration with an API permissions page where you can pick Graph permissions. Which means there are some extra steps when we want to assign permissions to the identity used for apps.
In our sample app we query the Graph for properties of the app; more specifically the tenant name. And this requires the User.Read.All application permission to the Graph. For a walkthrough of the details check this blog:
https://gotoguy.blog/2022/03/15/add-graph-application-permissions-to-managed-identity-using-graph-explorer/
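The two Graph calls in the script boil down to two things. The first is a filtered lookup of the MS Graph service principal. The second is a small JSON body for the app role assignment. Here is a minimal Python sketch that builds both without sending anything. The helper names are hypothetical, while the Graph app ID and the User.Read.All app role ID are the values from the script. Percent-encoding `$` as `%24` is valid in any URL. In the PowerShell version it also keeps PowerShell from expanding `$filter` as a variable inside the double-quoted string.

```python
import json
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"
MS_GRAPH_APP_ID = "00000003-0000-0000-c000-000000000000"  # same in every tenant
USER_READ_ALL_ROLE_ID = "df021288-bdef-4463-88db-98f22de89214"  # User.Read.All (application)


def graph_sp_lookup_url(app_id: str = MS_GRAPH_APP_ID) -> str:
    """URL that finds the service principal for an app ID in the current tenant."""
    # "$filter" is written as "%24filter"; the expression itself is percent-encoded too.
    return f"{GRAPH_BASE}/servicePrincipals?%24filter=" + quote(f"appId eq '{app_id}'")


def app_role_assignment_body(principal_id: str, resource_id: str,
                             app_role_id: str = USER_READ_ALL_ROLE_ID) -> str:
    """Compact JSON body for the app role assignment.

    principal_id: the managed identity's principalId (its service principal).
    resource_id: object ID of the MS Graph service principal in this tenant.
    """
    body = {"principalId": principal_id, "resourceId": resource_id, "appRoleId": app_role_id}
    return json.dumps(body, separators=(",", ":"))


if __name__ == "__main__":
    print(graph_sp_lookup_url())
    print(app_role_assignment_body("<identity-principal-id>", "<graph-sp-object-id>"))
```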
But it's not like applications understand all by themselves that you're using this new fancy Kubernetes trickery, is it? No, they require a little nudge along the way. The details are language-specific, but the general concept is that workload identity injects a JWT (a projected service account token) into the file system of the container. The app reads it and presents it as a client assertion when requesting tokens from Azure AD.
For C# you take over the token acquisition process by implementing a TokenCredential with a couple of overrides:
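When troubleshooting, it can be handy to peek inside the injected token. It shows the issuer, subject and audience that Azure AD will compare against the federated credential. Here's a minimal Python sketch that reads the file pointed to by `AZURE_FEDERATED_TOKEN_FILE` and decodes the claims. That is one of the environment variables the webhook injects, as listed in the comments of the C# code below. The function names are hypothetical. The decoding deliberately skips signature verification, so only use it for inspection.

```python
import base64
import json
import os


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (inspection only)."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def read_projected_token_claims(env=os.environ) -> dict:
    """Read the token file the workload identity webhook points the pod at."""
    with open(env["AZURE_FEDERATED_TOKEN_FILE"], encoding="ascii") as f:
        return decode_jwt_claims(f.read())


if __name__ == "__main__":
    # Only meaningful inside a pod that uses workload identity.
    if "AZURE_FEDERATED_TOKEN_FILE" in os.environ:
        claims = read_projected_token_claims()
        for key in ("iss", "sub", "aud", "exp"):
            print(key, claims.get(key))
```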
// <directives>
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Azure.Core;
using Microsoft.Identity.Client;
// <directives>
public class MyClientAssertionCredential : TokenCredential
{
    private readonly IConfidentialClientApplication _confidentialClientApp;

    public MyClientAssertionCredential()
    {
        // <authentication>
        // Azure AD Workload Identity webhook will inject the following env vars
        // AZURE_CLIENT_ID with the clientID set in the service account annotation
        // AZURE_TENANT_ID with the tenantID set in the service account annotation. If not defined, then
        // the tenantID provided via azure-wi-webhook-config for the webhook will be used.
        // AZURE_FEDERATED_TOKEN_FILE is the service account token path
        var clientID = Environment.GetEnvironmentVariable("AZURE_CLIENT_ID");
        var tokenPath = Environment.GetEnvironmentVariable("AZURE_FEDERATED_TOKEN_FILE");
        var tenantID = Environment.GetEnvironmentVariable("AZURE_TENANT_ID");

        // Pass a delegate so the token file is re-read for every token request;
        // Kubernetes rotates the projected token, so a value read once at startup eventually expires.
        _confidentialClientApp = ConfidentialClientApplicationBuilder.Create(clientID)
            .WithClientAssertion(() => ReadJWTFromFS(tokenPath))
            .WithTenantId(tenantID)
            .Build();
    }

    public override AccessToken GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken)
    {
        return GetTokenAsync(requestContext, cancellationToken).GetAwaiter().GetResult();
    }

    public override async ValueTask<AccessToken> GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
    {
        AuthenticationResult result;
        try
        {
            result = await _confidentialClientApp
                .AcquireTokenForClient(requestContext.Scopes)
                .ExecuteAsync(cancellationToken);
        }
        catch (MsalUiRequiredException)
        {
            // The application doesn't have sufficient permissions.
            // - Did you declare enough app permissions during app creation?
            // - Did the tenant admin grant permissions to the application?
            // Rethrow instead of returning a token built from a null result.
            throw;
        }
        catch (MsalServiceException ex) when (ex.Message.Contains("AADSTS70011"))
        {
            // Invalid scope. The scope has to be in the form "https://resourceurl/.default"
            // Mitigation: Change the scope to be as expected.
            throw;
        }
        return new AccessToken(result.AccessToken, result.ExpiresOn);
    }

    public string ReadJWTFromFS(string tokenPath)
    {
        // The projected service account token is a plain-text JWT
        return File.ReadAllText(tokenPath);
    }
}
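Under the hood, `AcquireTokenForClient` combined with a client assertion is a standard OAuth 2.0 client credentials request. The projected Kubernetes token takes the place of a client secret. Here's a minimal Python sketch that only builds that request (URL and form body) so you can see what gets sent. The function name is hypothetical. It assumes the public cloud endpoint `login.microsoftonline.com`, and the Graph `.default` scope is just an example.

```python
from urllib.parse import urlencode

JWT_BEARER_ASSERTION = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def client_assertion_token_request(tenant_id: str, client_id: str, assertion: str,
                                   scope: str = "https://graph.microsoft.com/.default"):
    """Return (url, form_body) for a client credentials request using a federated token.

    Assumes the public-cloud authority; sovereign clouds use a different host.
    The body is meant to be POSTed as application/x-www-form-urlencoded.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,           # AZURE_CLIENT_ID in the pod
        "scope": scope,
        "client_assertion_type": JWT_BEARER_ASSERTION,
        "client_assertion": assertion,    # contents of AZURE_FEDERATED_TOKEN_FILE
    })
    return url, form
```

The `access_token` in the response is what ends up as `result.AccessToken` in the C# code above.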
To the rest of the app these tokens are no different from tokens acquired by other means, so once you've gotten to this point it should be smooth sailing. Because there are never any other identity-related challenges 😉
For this demo I skipped "proper" DevOps in the form of a GitHub Action or Azure DevOps pipeline, but it is of course possible to adapt this into a fully automated process.