A Comprehensive Guide to Azure Managed HSM for Regulated Industries
Published Apr 02 2024 05:03 AM
Microsoft

Azure Managed HSM is backed by FIPS 140-2 Level 3 validated HSMs, and with it customers gain exclusive control over all operations within their key store. The cloud provider has no access to customer keys, which mitigates the risk of insider threats. By retaining control of the keys, users also reduce their dependency on the cloud provider and strengthen their autonomy and security posture. This shift in control addresses a common concern about cloud providers: relying solely on the provider's security practices places significant trust in its infrastructure.



Azure Key Management Solution Portfolio

                      | Azure Key Vault Standard | Azure Key Vault Premium | Azure Managed HSM
Tenancy               | Multitenant              | Multitenant             | Single tenant
Compliance            | FIPS 140-2 Level 1       | FIPS 140-2 Level 2      | FIPS 140-2 Level 3
High availability     | Automatic                | Automatic               | Automatic
Use cases             | Encryption at rest       | Encryption at rest      | Encryption at rest
Key controls          | Customer                 | Customer                | Customer
Root of trust control | Microsoft                | Microsoft               | Customer

The security domain of a managed HSM serves three purposes:

  1. Ownership: The security domain cryptographically ties each managed HSM to root of trust keys under your sole control. Microsoft cannot access your cryptographic key material.
  2. Cryptographic boundary: It sets the boundary for key material within a managed HSM instance.
  3. Disaster recovery: It allows full recovery of a managed HSM instance in disaster scenarios (e.g., catastrophic failure, soft deletion, or archiving projects).

Activating HSM:

  • Create three RSA key pairs; their public certificates act as wrapping keys for the security domain.
  • Supply the three certificates and a quorum to the command below, which activates the HSM and downloads the security domain as a JSON file encrypted to those keys.

az keyvault security-domain download --hsm-name ContosoMHSM --sd-wrapping-keys ./certs/cert_0.cer ./certs/cert_1.cer ./certs/cert_2.cer --sd-quorum 2 --security-domain-file ContosoMHSM-SD.json
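The download command above expects three certificate files to exist already. A minimal sketch of how they might be generated with OpenSSL follows. The 2048-bit key size, the 365-day validity, and the subject names are assumptions, and the `run` helper with its `DRY_RUN` switch is purely illustrative; only the `./certs/cert_N.cer` naming comes from the command above.

```shell
#!/bin/bash
# Sketch: create RSA key pairs whose certificates are passed to --sd-wrapping-keys.
# Key size, validity and subject names are assumptions, not requirements from this article.

# Print commands by default; set DRY_RUN=0 to actually execute them.
# Note: the printed form does not preserve quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# generate_wrapping_keys <count> <directory>
generate_wrapping_keys() {
  count=$1
  dir=$2
  run mkdir -p "$dir"
  i=0
  while [ "$i" -lt "$count" ]; do
    # -nodes leaves the private key unencrypted; store it somewhere safe.
    run openssl req -newkey rsa:2048 -nodes \
      -keyout "$dir/cert_$i.key" -x509 -days 365 \
      -subj "/CN=ContosoMHSM-SD-$i" -out "$dir/cert_$i.cer"
    i=$((i + 1))
  done
}

generate_wrapping_keys 3 ./certs
```

The private `.key` files are what you need later to meet the quorum during recovery, so it is worth keeping them apart from each other and away from the security domain file.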

 

 

 

 


Access to a managed HSM is controlled through two interfaces:

  • Management plane: On the management plane, you manage the HSM itself. Operations in this plane include creating and deleting managed HSMs and retrieving managed HSM properties.

  • Data plane: On the data plane, you work with the data stored in a managed HSM, that is, the keys generated on the HSM or imported into it from another key manager.

Two levels of permission are required to work with the HSM.

Azure RBAC: Governs management plane operations on the HSM resource, such as create/delete and networking configuration.

Local RBAC: Governs data plane operations. Role assignments are scoped either to all keys or to a single key, and they grant key operations such as create, retrieve, and delete. Full backup/restore and security domain management are also performed through the data plane, so they are authorized with local RBAC roles rather than Azure RBAC.
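The two local RBAC scopes can be illustrated with `az keyvault role assignment create`. This is a sketch only: the HSM name, the assignee, and the key name `myrsakey` are placeholders, and the `run` and `key_scope` helpers are hypothetical conveniences rather than part of the Azure CLI.

```shell
#!/bin/bash
# Sketch: grant a data-plane role with Managed HSM local RBAC.

# Print commands by default; set DRY_RUN=0 to actually execute them.
# Note: the printed form does not preserve quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# key_scope [key-name]: "/keys" covers all keys, "/keys/<name>" a single key.
key_scope() {
  if [ -n "${1:-}" ]; then
    echo "/keys/$1"
  else
    echo "/keys"
  fi
}

# assign_local_role <hsm> <role> <assignee> <scope>
assign_local_role() {
  run az keyvault role assignment create \
    --hsm-name "$1" --role "$2" \
    --assignee "$3" --scope "$4"
}

# All keys in the HSM
assign_local_role ContosoMHSM "Managed HSM Crypto User" user@contoso.com "$(key_scope)"
# A single key only
assign_local_role ContosoMHSM "Managed HSM Crypto User" user@contoso.com "$(key_scope myrsakey)"
```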

In the Networking section of the HSM, the options are 'Allow all networks' or 'Private endpoints' with 'Allow trusted services'.

As a general rule of thumb, always prefer a private connection over a public one.

  • All networks: The HSM is exposed over a public endpoint and is reachable from the internet by default.

  • Private endpoint: Exposes the HSM only on your specific VNet; only resources in that VNet have access to the HSM data plane.
    Creating a private endpoint for the HSM follows the same steps as for any other Azure resource: specify the VNet and subnet for the PE so that the application/service has line of sight to the HSM PE.

  • Note that a private endpoint also restricts access to the HSM data plane from the portal and CLI. You need an Azure VM in the same VNet, or in a peered VNet, to access and manage HSM keys through the portal or CLI.

    [Screenshot: error observed when accessing the HSM via the portal once PE is enabled on the HSM.]
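The private endpoint setup described above can also be scripted. The following is a sketch under several assumptions: the subscription, resource group, VNet, and subnet values are placeholders, the endpoint and connection names are invented, and the `run` and `hsm_resource_id` helpers are hypothetical. Check the flags against your installed Azure CLI version before relying on them.

```shell
#!/bin/bash
# Sketch: lock down the HSM firewall and expose the HSM through a private endpoint.

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# hsm_resource_id <subscription-id> <resource-group> <hsm-name>
hsm_resource_id() {
  echo "/subscriptions/$1/resourceGroups/$2/providers/Microsoft.KeyVault/managedHSMs/$3"
}

SUBSCRIPTION_ID="<subscription-id>"
RG="<resource-group>"
HSM="ContosoMHSM"

# Deny traffic that does not arrive via a private endpoint or a trusted service.
run az keyvault update-hsm --hsm-name "$HSM" --resource-group "$RG" \
  --default-action deny --bypass AzureServices

# Create the private endpoint in a subnet the application can reach.
run az network private-endpoint create \
  --resource-group "$RG" --name "$HSM-pe" \
  --vnet-name "<vnet-name>" --subnet "<subnet-name>" \
  --private-connection-resource-id "$(hsm_resource_id "$SUBSCRIPTION_ID" "$RG" "$HSM")" \
  --group-id managedhsm --connection-name "$HSM-plink"
```

Name resolution for the private endpoint (a private DNS zone linked to the VNet) is not covered in this sketch.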

     

Operational Excellence: Encryption with managed HSM


  • Depending on requirements, use HSM keys to encrypt data at rest on Azure services such as Blob Storage, PostgreSQL, and MySQL.

  • As an example, consider encrypting an existing Blob Storage account with a customer-managed key (CMK) held in the HSM. The HSM is already configured with a private endpoint and the required RBAC role, "Managed HSM Crypto User."

  • Once PE is enabled on the HSM, the data plane can no longer be reached from the portal or CLI, so all key operations, including local RBAC management, are blocked from those interfaces.

    [Screenshot: HSM data plane not accessible from the portal after PE is enabled.]

  • When encrypting the storage account with the CMK option, selecting the HSM produces a connection error for the HSM data plane on the storage account blade.

    [Screenshot: HSM data plane connection error on the storage account blade.]

    Note: "Allow Microsoft trusted services" is also enabled alongside PE on the HSM.

  • I tested a couple of other services (MySQL and PostgreSQL) and saw a similar error, which suggests it is common to all Azure data services that support CMK encryption with the HSM.

  • The primary reason for this error is that enabling private networking locks down network access to the HSM. To maximize security, the HSM is then reachable only over the private network, that is, from VNet-connected devices.

  • Even with "Allow Microsoft trusted services" enabled, setting up encryption on Blob Storage failed because the request to the HSM APIs originates from the end user's browser, not from the Storage service IP range.

  • When you open the HSM in the Azure portal, the user's browser talks to the managed HSM API directly. The same applies when configuring encryption for other services through the portal: the browser acts as the client that calls the HSM API.

    [Screenshot: error when setting encryption for a new Blob Storage account.]

  • An admin who needs to perform operations on a private managed HSM therefore needs an Azure VM with line-of-sight connectivity to the HSM private endpoint.

    [Screenshot: encryption configured on Blob Storage via the Azure portal from an Azure VM.]
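The same storage account configuration can be applied with the Azure CLI from a VM that has line of sight to the HSM private endpoint. This is a sketch that assumes the storage account already has a managed identity with the required HSM role; the resource group, account, and key names are placeholders, and `run`, `hsm_uri`, and `configure_storage_cmk` are hypothetical helpers.

```shell
#!/bin/bash
# Sketch: point a storage account at a customer-managed key in a managed HSM.
# Run from a VM in the HSM's VNet (or a peered VNet).

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# hsm_uri <hsm-name>: data plane endpoint of a managed HSM.
hsm_uri() {
  echo "https://$1.managedhsm.azure.net"
}

# configure_storage_cmk <resource-group> <storage-account> <hsm-name> <key-name>
configure_storage_cmk() {
  run az storage account update \
    --resource-group "$1" --name "$2" \
    --encryption-key-source Microsoft.Keyvault \
    --encryption-key-vault "$(hsm_uri "$3")" \
    --encryption-key-name "$4"
}

configure_storage_cmk "<resource-group>" "<storage-account>" ContosoMHSM "<key-name>"
```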


Resiliency: Disaster Recovery with HSM


  • Managed HSM offers a multi-region feature that replicates data from the primary instance to a secondary instance.

  • Once a replica is enabled in a secondary region, the two instances function as Active-Passive behind a backend Traffic Manager endpoint.

  • However, the replica instance isn't visible to users in the subscription; it functions in the backend as an extension of the primary instance.

  • Failover of the HSM is managed by Azure if the service for the primary instance suffers an outage.

  • Because the secondary instance is not visible in the portal/CLI, this does not suit data services such as PostgreSQL and MySQL, which require a key store/HSM to be available locally in the DR region.
    Recommendation: deploy another HSM instance rather than a replica.
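Both options discussed above, adding a replica or deploying a separate HSM in the DR region, can be sketched with the Azure CLI. The HSM names, regions, administrator object ID, and the 28-day retention value are all placeholders or assumptions, and the helper functions are hypothetical.

```shell
#!/bin/bash
# Sketch: two ways to get HSM capacity into a DR region.

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Option A: extend the primary HSM with a replica (not visible as a separate resource).
# add_replica <hsm-name> <region>
add_replica() {
  run az keyvault region add --hsm-name "$1" --region "$2"
  run az keyvault region list --hsm-name "$1"
}

# Option B (recommended above): a separate HSM instance in the DR region.
# create_dr_hsm <hsm-name> <resource-group> <region> <admin-object-id>
create_dr_hsm() {
  run az keyvault create --hsm-name "$1" --resource-group "$2" \
    --location "$3" --administrators "$4" --retention-days 28
}

add_replica ContosoMHSM "<secondary-region>"
create_dr_hsm ContosoMHSM-DR "<resource-group>" "<secondary-region>" "<admin-object-id>"
```

With option B the new instance is left unactivated when it is meant to be the target of a security domain recovery and restore.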



Disaster Recovery: Backup Restore on Managed HSM


Pre-Requisites

  • Create a user-assigned managed identity and link it to the source and destination Managed HSMs.
  • Create a Blob Storage account with two containers, which will be used as the targets for the HSM backup data.
  • Assign the following RBAC permissions on the Blob Storage account and the HSM:
    "Storage Blob Data Contributor" for the managed identity on the storage account, and "Managed HSM Backup/Restore user" on the HSM.


The easiest approach is to set up another HSM instance in the DR region and perform a complete backup and restore.

Note: You need the security domain of the primary HSM when restoring the backup.
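The storage-side prerequisites listed above can be sketched as follows. The resource group, identity name, subscription ID, and principal ID are placeholders; the container names and storage account name are illustrative; `run` and `storage_account_scope` are hypothetical helpers.

```shell
#!/bin/bash
# Sketch: managed identity, backup containers and storage role assignment.

# Print commands by default; set DRY_RUN=0 to actually execute them.
# Note: the printed form does not preserve quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# storage_account_scope <subscription-id> <resource-group> <storage-account>
storage_account_scope() {
  echo "/subscriptions/$1/resourceGroups/$2/providers/Microsoft.Storage/storageAccounts/$3"
}

SUBSCRIPTION_ID="<subscription-id>"
RG="<resource-group>"
SA="hsmbackupsaname"

# User-assigned managed identity that the HSMs will use to write backups.
run az identity create --resource-group "$RG" --name "<managed-identity-name>"

# One container per HSM: source backup and target backup.
for c in container1 container2; do
  run az storage container create --account-name "$SA" --name "$c" --auth-mode login
done

# Let the identity write blobs in the storage account.
run az role assignment create --role "Storage Blob Data Contributor" \
  --assignee-object-id "<mi-principal-id>" \
  --assignee-principal-type ServicePrincipal \
  --scope "$(storage_account_scope "$SUBSCRIPTION_ID" "$RG" "$SA")"
```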

  1. Create a managed identity and assign it the 'Storage Blob Data Contributor' role on the storage account (SA).

  2. Associate the managed identity with your primary HSM so that the HSM can write backups to the SA.

     az keyvault update-hsm --hsm-name primary-hsm --mi-user-assigned "/subscriptions/subid/resourcegroups/rgname/providers/Microsoft.ManagedIdentity/userAssignedIdentities/manageidentityname"
  3. Once the MI is associated with the HSM, create a container on the SA for your HSM backups and trigger a backup using the Az CLI (backup/restore options are not available in the Azure portal).

    az keyvault backup start --use-managed-identity true --hsm-name primary-hsm --storage-account-name hsmbackupsaname --blob-container-name container1  --subscription Subs-guid
  4. Look for the success message in the response from the backup command. Then create another vanilla HSM instance in the DR region, but don't activate it.

    [Screenshot: response from the backup command.]

  5. Normally, after creating an HSM instance, we initialize it and download the new HSM's security domain as described at the start. Since we're executing a DR procedure, we instead put this HSM into security domain recovery mode, which downloads its security domain exchange key.

    az keyvault security-domain init-recovery --hsm-name secondary-hsm --sd-exchange-key hsmrecoveryfilename
  6. Collect the security domain file of the primary HSM together with the private keys needed to meet its quorum (here, 2 of the 3 keys).

  7. Before triggering the restore, make sure the secondary HSM has access to the Blob Storage account where the backup is stored.

  8. Initiate the restore of the backup on the secondary HSM. In my test the first attempt returned an error indicating that a backup of the target HSM must have been taken within the preceding 30 minutes before any backup can be restored onto it.

  9. Once that backup of the target HSM had completed, we re-initiated the restore on the secondary HSM, and it completed without error.
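Steps 5 and 6 end with the primary HSM's security domain being uploaded to the new HSM, for which no command is shown above. A sketch of that upload follows. The target HSM name and file names are placeholders (the exchange key file is whatever `init-recovery` wrote), and the quorum check in `upload_security_domain` is an illustrative safeguard, not something the Azure CLI requires you to write.

```shell
#!/bin/bash
# Sketch: upload the primary HSM's security domain to an HSM in recovery mode.

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# upload_security_domain <target-hsm> <exchange-key> <sd-file> <quorum> <private-key>...
upload_security_domain() {
  hsm=$1
  exchange_key=$2
  sd_file=$3
  quorum=$4
  shift 4
  # Refuse to continue when fewer private keys than the quorum were supplied.
  if [ "$#" -lt "$quorum" ]; then
    echo "need at least $quorum private keys, got $#" >&2
    return 1
  fi
  run az keyvault security-domain upload --hsm-name "$hsm" \
    --sd-exchange-key "$exchange_key" --sd-file "$sd_file" \
    --sd-wrapping-keys "$@"
}

# Quorum of 2, so two of the three private keys are enough.
upload_security_domain "<target-hsm-name>" hsmrecoveryfilename ContosoMHSM-SD.json 2 \
  ./certs/cert_0.key ./certs/cert_1.key
```

Note that the upload takes the private `.key` files, whereas the original download took the matching `.cer` certificates.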


     

 

 

 

 

#!/bin/bash

# Variables (fill in before running)
sourceHSMName=""
targetHSMName=""
storageAccountName=""
containerName=""          # container that receives the source HSM backup
targetcontainerName=""    # separate container for the target HSM backup
resourceGroupName=""
managedIdentityResourceId=""

# Authenticate the Azure CLI with a managed identity, if required
az login --identity

# Step 1: Link the managed identity to the source HSM (uncomment if not already linked)
#az keyvault update-hsm --hsm-name "$sourceHSMName" --mi-user-assigned "$managedIdentityResourceId"

# Step 2: Start the backup of the source HSM
backupOperationId=$(az keyvault backup start --use-managed-identity true --hsm-name "$sourceHSMName" --storage-account-name "$storageAccountName" --blob-container-name "$containerName" --query "id" -o tsv)
echo "Source HSM backup operation: $backupOperationId"

# Fixed wait time. Adjust this based on your expected backup duration.
echo "Waiting for the backup to complete. This will wait for 5 minutes."
sleep 300

# List blobs in the container and collect the top-level backup folder names.
# Assumes the Az CLI is authenticated and has access to the storage account.
connectionString=$(az storage account show-connection-string --name "$storageAccountName" --resource-group "$resourceGroupName" --query connectionString -o tsv)
blobFolders=$(az storage blob list --container-name "$containerName" --connection-string "$connectionString" --query "[].name" -o tsv | awk -F'/' '{print $1}' | sort -u)

# The folder names are sorted, so the last entry is taken as the most recent backup
latestBackupFolder=$(echo "$blobFolders" | tail -n 1)

# A backup of the target HSM must be taken less than 30 minutes before restoring onto it.

# Step 3: Back up the target HSM, using a different container in the same storage account
az keyvault backup start --use-managed-identity true --hsm-name "$targetHSMName" --storage-account-name "$storageAccountName" --blob-container-name "$targetcontainerName" --query "id" -o tsv

echo "Backup of the target HSM has been started; the restore will begin next."

if [[ -n $latestBackupFolder ]]; then
    echo "Latest backup folder: $latestBackupFolder"

    # Step 4: Restore onto the target HSM using the dynamically determined folder name
    az keyvault restore start --use-managed-identity true --hsm-name "$targetHSMName" --storage-account-name "$storageAccountName" --blob-container-name "$containerName" --backup-folder "$latestBackupFolder"
    echo "Backup restoration has started."
else
    echo "No backup folders found."
fi

 

 

 

 

I have tried to cover the important areas of Managed HSM implementation that came up in discussions with FSI customers. Feel free to share your thoughts or questions in the comments; for more details on the HSM solution, refer to the official documentation.

 

Last update: Apr 03 2024 11:00 PM