Authored by: Michael Piskorski, Laura Grob, Wilson Souza, Armen Kaleshian, David Pless, Anna Hoffman
Setting the Stage
We recently worked with a customer who had migrated their Windows and SQL Servers to Azure and wanted to use Azure Backup for a consistent enterprise backup experience. The SQL Servers hosted multiple databases of varying sizes, some multi-terabyte. A single Azure Backup vault was deployed, with one policy distributed to all the SQL Servers. During the migration, the customer observed issues with the quality of the backups and poor virtual machine performance while the backups were running. We worked through the issues by reviewing the best practices, modifying the Azure Backup configuration, and changing the virtual machine SKU. In this specific case, the customer needed to move from Standard_E8bds_v5 to Standard_E16bds_v5 to support the additional IOPS and throughput required for the backups. They used Premium SSD (v1) disks, and the resulting configuration met the IOPS and throughput requirements.
In this post, we share some of the techniques we used to identify and resolve the performance issues that were observed.
Azure Backup Vault configuration
There are three primary areas to consider when defining your Azure Backup vault strategy for SQL database workloads: (1) private network access, (2) DNS resolution, and (3) the limitations of the Backup vault. In our experience, most customers, especially in regulated industries, are required to disable public network access to the Azure Backup vault, which requires additional configuration.
There are several important considerations to be aware of when applying this restriction to the Azure Backup vault.
- When creating your private endpoint make sure you select the correct resource type.
- Azure Recovery Services vaults support private endpoints for Azure Backup and Azure Site Recovery. Ensure you select Recovery Services vault as the Resource and AzureBackup as the Target subresource. The ‘Create and use private endpoints (v2 experience) for Azure Backup’ article describes the process of configuring private endpoints for a Recovery Services vault.
- Ensure private endpoints for target resources are integrated into an Azure Private DNS zone.
- Databases on a virtual machine configured for protection by Azure Backup require name resolution of the backup vault's private endpoint. In addition, a well-architected DNS strategy is necessary to ensure reliability of the service. This area is often challenging for newer Azure customers. The Azure Private Endpoint DNS integration guide provides a good overview of private endpoints and how they are integrated into Azure Private DNS zones.
- Consider Azure Backup limits
- Review the current limitations outlined in the Backup Vault Support documentation
- Using multiple Azure Recovery Services Vaults in Azure Backup can enhance data security and disaster recovery readiness, but it also increases management complexity. While Azure’s immutable vaults protect against ransomware and other threats, ensuring a clear and efficient recovery process is important.
- Balancing the number of vaults is key to aligning with recovery objectives and compliance needs without adding unnecessary administrative overhead.
- Review the Azure Backup Reliability document to ensure you select the proper configuration for your Recovery Services Vault to meet your architecture requirements.
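As a sketch, creating the private endpoint described above with the Az PowerShell module might look like the following. All resource names here are placeholders for illustration; note the AzureBackup target subresource called out earlier:

```powershell
# Create a private endpoint for a Recovery Services vault, targeting the
# AzureBackup subresource. Resource and network names below are placeholders.
$vault  = Get-AzRecoveryServicesVault -ResourceGroupName "rg-backup" -Name "rsv-sql-backup"
$vnet   = Get-AzVirtualNetwork -ResourceGroupName "rg-network" -Name "vnet-prod"
$subnet = $vnet.Subnets | Where-Object { $_.Name -eq "snet-endpoints" }

# The GroupId selects the AzureBackup target subresource on the vault
$conn = New-AzPrivateLinkServiceConnection -Name "pls-backup" `
    -PrivateLinkServiceId $vault.ID -GroupId "AzureBackup"

New-AzPrivateEndpoint -ResourceGroupName "rg-backup" -Name "pe-rsv-sql-backup" `
    -Location $vault.Location -Subnet $subnet -PrivateLinkServiceConnection $conn
```

After the endpoint is created, the corresponding records must still be registered in the appropriate Azure Private DNS zones, as the DNS integration guidance noted earlier describes.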
In designing a comprehensive and consistent enterprise backup strategy, consider the types of workloads being protected, the amount of data, the sensitivity of that data, and the total number of protected items a single vault can support. The best practice is to group similar workloads: for example, virtual machines configured against one vault and databases against a separate vault. This eases management, provides isolation in line with least privilege, and gives each workload owner the autonomy to manage their own backups. The Backup cloud and on-premises workloads to cloud guide provides a comprehensive set of best practices and designs for a variety of multi-vault architectures.
Azure Virtual Machine sizing considerations
Under the Azure Well-Architected Framework, selecting the appropriate virtual machine size for a workload requires consideration of its business continuity requirements. Some customers do not take their backup requirements into account when collecting the performance metrics used to size the VM SKU and disk configuration. As a result, the IOPS, disk throughput, and network throughput data collected during the sizing exercise may not reflect backup activity.
For example, Azure Backup for SQL Server can, by default, run up to 20 database backups simultaneously, each consuming up to 200 MB/sec of disk throughput. If you have larger databases on an instance, those backup processes can take a long time to complete. Consider the following when determining how many backups can run concurrently:
- Application workloads and other business processes that may run at the same time backups are running
- Disk IOPS and throughput required by both the SQL Server application workload and the backups
- VM IOPS and throughput limits
- Network consumption rate
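As a back-of-the-envelope check on the items above, you can compare the throughput concurrent backups will demand against the VM's uncached limit. The per-backup figure below is Azure Backup's default rate discussed in this post; the VM limit and workload headroom are hypothetical values you should replace with your own:

```powershell
# Rough throughput budget for concurrent SQL backups on one VM.
# Assumption: numbers below are illustrative; substitute your SKU's limits.
$perBackupMBps  = 200   # default Azure Backup throughput per database backup
$vmUncachedMBps = 650   # example VM uncached throughput limit
$workloadMBps   = 150   # hypothetical headroom reserved for the SQL workload

$availableMBps = $vmUncachedMBps - $workloadMBps
$maxConcurrent = [math]::Floor($availableMBps / $perBackupMBps)
"Concurrent backups that fit in the remaining budget: $maxConcurrent"   # 2 here
```

If the result is lower than the number of databases the policy will back up at once, either scale up the VM or lower the concurrency as described below.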
Ensure you consider the window available for your backups and maintenance tasks, as this will help determine whether you need to scale up the VM or storage, or reduce the number of concurrent backups. If you need to override the default backup configuration, you can configure the number of concurrent backups by creating an ExtensionSettingsOverrides.json file on the server in the C:\Program Files\Azure Workload Backup\bin folder. To set the DefaultBackupTasksThreshold parameter to 5, for example, use the following code:
{"DefaultBackupTasksThreshold": 5}
Once you save your changes and close the file, restart the AzureWLBackupCoordinatorSvc service for the new setting to take effect.
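A minimal sketch of applying this override with PowerShell, run elevated on the VM (the file path and service name are the ones given above; the threshold value is the example from this post):

```powershell
# Write the override file read by the Azure Workload Backup extension,
# then restart the coordinator service so the new threshold takes effect.
$overridePath = "C:\Program Files\Azure Workload Backup\bin\ExtensionSettingsOverrides.json"
@{ DefaultBackupTasksThreshold = 5 } | ConvertTo-Json | Set-Content -Path $overridePath

Restart-Service -Name AzureWLBackupCoordinatorSvc
```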
Ensure there is sufficient IOPS and throughput capacity at both the virtual machine and disk layers. You can use the virtual machine and disk metrics in the Azure portal to check whether the maximum throughput is being reached. If your virtual machine supports less than 200 MB/sec of throughput, Azure Backup will not transfer data at the optimum speed. There is a spreadsheet that can be leveraged to assist with sizing (link).
There are a few items to consider when deploying your backup policies. A policy supports all three types of SQL backups (full, differential, and transaction log) and the different recovery models. One key point: the backup compression setting in the policy overrides the SQL Server instance-level setting, so if you want to use backup compression, ensure you check that box in the policy. If you choose to use differential backups, you can trigger one per day in the backup policy. Transaction log backups can be triggered as often as every 15 minutes.
Auto-protection runs a discovery process, typically every 8 hours, to determine whether new databases have been added to the instance. When it finds newly created databases, it triggers the backup within 32 hours. You can run discovery manually to ensure new databases are backed up sooner. Note that if you select auto-protection you cannot exclude databases; it covers all databases on the instance.
The option to configure writing backups to local storage and the recovery vault simultaneously can be invoked for the backup type of your choice by creating a PluginConfigSettings.json file in the C:\Program Files\Azure Workload Backup\bin\plugins location. An example of the JSON code is shown below:
{
"EnableLocalDiskBackupForBackupTypes": ["Log"],
"LocalDiskBackupFolderPath": "E:\\LocalBackup"
}
This example enables simultaneous writes of the transaction log backups to the E drive as well as to the Recovery Services vault.
Azure Backup for SQL Server has several feature considerations and limitations which have changed over time. The most current information can be found in the 'Support matrix for SQL Server Backup in Azure VMs' and under the 'Feature considerations and limitations' section.
Troubleshooting tools and tips
In this section we discuss some tips you can use if backups fail or run longer than expected. The two primary areas of focus are metric data for virtual machine and disk throughput, and how to correlate databases to their scheduled backup tasks.
Instrumentation
One area to examine when experiencing slow backup performance is the constraints on the virtual machine and its attached disks. To determine whether the virtual machine itself is the bottleneck, review the metric “VM Uncached Bandwidth Consumed Percentage”. This metric indicates whether your virtual machine is I/O capped: it is calculated by dividing the total actual uncached throughput on the virtual machine by the maximum provisioned virtual machine throughput. If this metric reaches 100 percent during a backup job, backup performance will suffer because the virtual machine is consuming all of its uncached bandwidth. The screenshot below identifies the metric referenced in this article.
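If you prefer to pull the metric programmatically rather than through the portal, a sketch using the Az.Monitor module follows. The resource ID is a placeholder for your own VM, and the query requires an authenticated Azure session:

```powershell
# Query peak uncached bandwidth consumption for a VM over the last day.
# The resource ID below is a placeholder; substitute your VM's ID.
$vmId = "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm-name>"

Get-AzMetric -ResourceId $vmId `
    -MetricName "VM Uncached Bandwidth Consumed Percentage" `
    -StartTime (Get-Date).AddDays(-1) -EndTime (Get-Date) `
    -TimeGrain 00:05:00 -AggregationType Maximum
```

Sustained values at or near 100 during the backup window indicate the VM's uncached throughput limit, not the disks, is the constraint.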
To check for constraints at the disk level, look at the associated disk IOPS and throughput limits using the Logical Disk MB/s and Logical Disk IOPS metrics within the virtual machine’s Insights blade. The screenshots below show the metrics referenced.
For example, if you are using the Standard_E8bds_v5 SKU, the maximum uncached throughput is 650 MBps. Since each running database backup may consume up to 200 MBps of uncached throughput, that SKU can sustain roughly three concurrent backups (650 / 200 ≈ 3) before it becomes the bottleneck, so it is important to ensure your virtual machine and disk configuration can support the number of concurrent backups running.
Metrics and logging will help you troubleshoot performance issues, and below we discuss a script you can use to help split database backups across multiple schedules. The key point is to understand the SKU size of the virtual machine and disks in relation to the uncached throughput limits, and how many database backups your server must run in relation to your backup requirements and goals.
The recommended uncached throughput at the virtual machine or disk level is above 850 MBps, as referenced in the Azure Backup support matrix for SQL Server Backup in Azure VMs.
Correlate Database Backup to Task Script:
As discussed, the disk throughput capacity, determined by the virtual machine SKU and disk configuration, is important in designing a performant and reliable backup strategy in Azure. If you observe throughput hitting the maximum and causing throttling, you can use the script below to determine which database backups are scheduled to run and when. Based on this information, you can choose to scale up the virtual machine SKU, modify the disk configuration (depending on where the throttling is occurring), or modify the number of concurrent backups as detailed earlier in this post.
Sample Code:
# Script to associate database name and type of backup to GUID
# Script to associate each database name and backup type with its scheduled task GUID
# Collect the Azure Backup workload scheduled tasks, excluding housekeeping tasks
$tasks = Get-ScheduledTask -TaskPath '*\IaaSWorkloadBackup*' |
    Where-Object {
        $_.TaskPath -notlike "*HKTask*" -and
        $_.TaskPath -notlike "*Telemetry*" -and
        $_.TaskPath -notlike "*WorkloadInquiry*" -and
        $_.State -ne "Disabled"
    }
# Each file in the datasource catalog describes one protected database
$dirlist = Get-ChildItem "C:\Program Files\Azure Workload Backup\Catalog\WorkloadExtDatasourceCatalog\WorkloadExtDatasourceTable\*"
$tasklist = @()
foreach ($datasource in $dirlist)
{
    $json = Get-Content -Path $datasource.FullName | ConvertFrom-Json
    # Find the scheduled tasks whose path contains this datasource ID
    $schedules = @($tasks | Where-Object { $_.TaskPath -like ('*' + $json.datasourceId + '*') })
    foreach ($task in $schedules)
    {
        # Classify the backup type from the task trigger: a repeating trigger is a
        # log backup, a multi-day weekly trigger is a differential, otherwise full
        $type = "Full"
        switch ($task.Triggers)
        {
            { $_.Repetition.Interval } { $type = "Log"; break }
            { $_.DaysOfWeek -gt 1 }    { $type = "Differential"; break }
        }
        $tasklist += [PSCustomObject]@{
            poname       = $json.poName
            datasourceid = $json.datasourceId
            TaskName     = $task.URI.Split("\")[-1]
            Container    = $json.containerUniqueName
            Type         = $type
        }
    }
}
$tasklist | Sort-Object poname | Format-Table -AutoSize
Once executed on the virtual machine, this script outputs all the databases correlated to their backup task IDs, as shown in the example output below.
The output shows the database name assigned to each datasourceid and the type of backup. You can then open Task Scheduler on the virtual machine and relate each database to its scheduled task. In the screenshot below, the “model” database is associated with datasourceid “5422562649858860379”, which correlates to the schedule the database is on for log and database backups.
This allows you to see the Azure Backup schedule for your database backups and helps you plan the number of concurrent backups you can run based on the uncached throughput capacity of your virtual machine and disk configuration.
Summary
In this post, we provided details on the main considerations when designing a solution that leverages Azure Backup to protect both your Azure virtual machines and SQL Servers.
Below are the key summary points from this article:
- Backup Vault Configuration: It is crucial to consider private network access, DNS resolution, and the limitations of the Backup vault. Private endpoints must be correctly configured and integrated with an Azure Private DNS zone.
- VM Sizing Considerations: Proper VM sizing should account for backup requirements, as Azure Backup for SQL Server can run multiple database backups simultaneously, affecting disk throughput and performance.
- Backup Policy Deployment: Policies support full, differential, and transaction log backups. Note that backup compression settings in the policy override SQL Server instance-level settings.
- Troubleshooting Tools: Use metric data for VM and disk throughput, plus scripts that correlate databases to scheduled backup tasks, to ensure performance is not hindered by reaching uncached bandwidth limits.
Links to reference materials and scripts to troubleshoot common issues have been provided to facilitate the use of Azure Backup to protect your SQL server workloads. Any scripts, metrics, or limitations may change over time as our products continue to evolve.
Helpful references:
FAQ - Backing up SQL Server databases on Azure VMs - Azure Backup | Microsoft Learn
Back up SQL Server databases to Azure - Azure Backup | Microsoft Learn
Azure Backup support matrix for SQL Server Backup in Azure VMs - Azure Backup | Microsoft Learn
Azure Backup support matrix - Azure Backup | Microsoft Learn
Restore SQL Server databases on an Azure VM - Azure Backup | Microsoft Learn
Automation in Azure Backup support matrix - Azure Backup | Microsoft Learn