TDE High availability with customer-managed key for Azure SQL Databases
Published Oct 16 2022 03:53 AM 4,117 Views
Microsoft

When using CMK to protect data at rest, customers are responsible for and in a full control of a key lifecycle management (key creation, upload, rotation, deletion), the key used for encryption of the Database Encryption Key (DEK), called TDE protector, is a customer-managed asymmetric key stored in a customer-owned and customer-managed Azure Key Vault (AKV).

 

If the server loses access to the stored Database Encryption Key (DEK) in AKV, in up to 10 minutes a database will start denying all connections with the corresponding error message and change its state to Inaccessible. The only action allowed on a database in the Inaccessible state is deleting it. For more information see Inaccessible TDE protector.

 

Losing access to Key vault for TDE, and when TDE protector become Inaccessible:

There is a portal experience to re-validate key permission and to trigger a workflow to make the database available again. We don’t have an SLA published, however it depends on how large the database is to bring it back online. Its approximately from few min to hours. Within the 30 minutes timeline, we keep the database around by disabling external connection to the database for customer to notice that database lost key access and fix the issue immediately. After 30 minutes we want to move the database to stable state to avoid any issues in the system and to keep the database inaccessible since key is gone/revoked.

 

When a database become inaccessible for more than 30 minutes, the database service tier can play an important role here, for Standard/General Purpose service tiers, the recovery will be much faster than Premium/Business Critical service tiers that’s because the Standard/GP is using a remote storage and will use the attach/detach process retrieve data; while Premium/Business Critical service tiers will trigger a restore process on the backend and the restoration operation has an RTO up to 12 hours (Most database restores finish in less than 12 hours) depending on many factors that can affect the recovery time such as the size of the database, the compute size of the database, number of transaction logs involved, network bandwidth. More information here: Recovery time.

 

To ensure high availability it's highly recommended to configure the server to use two different key vaults in two different regions with the same key material.

 

AKV1.jpg

PowerShell example:

 

# add the key from Key Vault to the server

Add-AzSqlServerKeyVaultKey -ResourceGroupName <SQLDatabaseResourceGroupName> -ServerName <LogicalServerName> -KeyId <KeyVaultKeyId>

 

# Confirm server's keys added

Get-AzSqlServerKeyVaultKey -ResourceGroupName <SQLDatabaseResourceGroupName> -ServerName <LogicalServerName>

 

Note: The key in the secondary key vault in the other region shouldn't be marked as TDE protector, and it's not even allowed.

 

AKV2.jpg

 

  • If there's an outage affecting the primary key vault, and only then, the system will automatically switch to the other linked key with the same thumbprint in the secondary key vault, if it exists. Note though that switch won't happen if TDE protector is inaccessible because of revoked access rights, or because key or key vault is deleted, as it may indicate that customer intentionally wanted to restrict server from accessing the key.

 

Thank you!

1 Comment
Co-Authors
Version history
Last update:
‎Nov 25 2023 11:13 PM
Updated by: