High availability is a critical requirement for modern cloud applications. Azure SQL Managed Instance (SQL MI) offers Zone Redundancy (ZR) for additional protection against a certain class of failures such as datacenter and Availability Zone (AZ) level outages. While these outages are very rare, they can have a significant impact on your business.
However, ZR is not always the best option depending on your specific business needs and constraints. This article explores when to choose ZR, compares it to Failover Groups (FOG), discusses key considerations for ZR, and outlines strategies to address its challenges.
Important concepts (Terminology)
Availability
As a service provider, it is our core responsibility to ensure the availability of our service. Azure SQL MI offers availability as a built-in feature, backed by a robust Service Level Agreement (SLA) of 99.99%. Automated backups provide protection from data corruption or accidental deletion.
In the PaaS database market, the industry standard definition for High Availability within a region has evolved to enabling Zone Redundancy for the database.
AZs are separate groups of datacenters within a region. Each AZ has independent power, cooling, and networking infrastructure, so that if one zone experiences an outage, then regional services, capacity, and high availability are supported by the remaining zones.
ZR (also known as Multi-AZ) is an HA feature in SQL MI that provides resilience against failures in a specific availability zone within an Azure region.
To achieve redundancy across regions, customers enable DR capabilities to quickly recover the instance from a catastrophic regional failure. Options for disaster recovery with Azure SQL Managed Instance are Failover groups and Geo-restore.
A failover group allows all user databases within a managed instance to fail over as a unit to another Azure region in case the primary managed instance becomes unavailable due to a primary region outage. Failover groups are designed to simplify deployment and management of geo-replicated databases at scale.
Recovery Time Objective (RTO)
The time required for an application to fully recover after an availability incident is known as the RTO.
Recovery Point Objective (RPO)
RPO is defined as the maximum amount of data – as measured by time – that can be lost after a recovery from an availability incident before data loss will exceed what is acceptable to an organization.
Locally Redundant (Default) Configuration
If a managed instance is not configured with Zone Redundancy (ZR), it will be deployed with a locally redundant configuration. This configuration provides built-in availability with a 99.99% uptime Service Level Agreement (SLA).
Locally redundant availability keeps your compute nodes and data within a single datacenter in the region, providing protection against localized failures such as minor network disruptions or power outages. However, in the event of a large-scale disaster affecting the whole region, all replicas of a storage account or data on the compute nodes may be lost or rendered unrecoverable.
When to Choose a Zone Redundant Configuration
A ZR configuration enhances resilience by distributing replicas of your SQL MI across multiple Availability Zones in the same region. This setup provides protection against datacenter-level failures, ensuring minimal downtime and no data loss. ZR is particularly beneficial when:
- Applications require high availability with low latency, as all replicas are maintained within the same Azure region.
- Protection is needed against failures impacting individual datacenters without extending to larger geographic disruptions.
- Industry or regulatory compliance mandates the use of a ZR configuration, e.g. for applications with a stringent SLA that requires a 99.995% uptime guarantee.
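To put the two SLA figures mentioned in this article (99.99% and 99.995%) into perspective, the short sketch below converts an uptime percentage into the downtime it permits per year. The function name and the choice of a 365-day year are illustrative assumptions; the actual SLA terms define how downtime is measured.

```python
# Minimal sketch: translate an uptime SLA percentage into permitted downtime.
# Assumption: a 365-day year; real SLA documents define their own measurement period.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(sla_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) that an SLA of `sla_percent` tolerates."""
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for label, sla in [("99.99% SLA", 99.99), ("99.995% SLA", 99.995)]:
        print(f"{label}: about {allowed_downtime_minutes(sla):.2f} minutes per year")
```

Under these assumptions, moving from 99.99% to 99.995% halves the permitted downtime, from roughly 53 minutes to roughly 26 minutes per year.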
Zone Redundancy vs. Failover Groups: Which One to Choose?
Both ZR and FOG provide high availability but serve different purposes. The key differences are presented in the table below.
| Configuration | Locally Redundant | Zone Redundant | FOG |
|---|---|---|---|
| Scope | Protection against user and application errors, accidental deletion, and prolonged outages. | Additional protection against failures in a specific availability zone within an Azure region. | Additional protection against the failure of the entire Azure region. |
| Latency | Low latency | Mid latency since replicas are in the same region | Higher latency as the secondary replica is in another region |
| Additional cost | None (just backup storage) | +60% compute, +100% storage, +0% license | +100% compute, +100% storage, +100% license (0% if passive) |
| Replication | | Sync | Async |
| RPO | Non-zero (minutes) | 0 (no data loss) | Non-zero (seconds) |
| RTO | Hours | Near-instant | Longer (varies by setup) |
| Failover | Manual | Automatic failover with minimal downtime | Manual or semi-automatic failover with possible data loss |
| Use case | Default | Business continuity within a region | Disaster recovery across regions |
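The "Additional cost" figures in the comparison can be turned into a rough estimate. The sketch below applies those percentage uplifts to a baseline monthly bill. The baseline amounts, the configuration keys, and the function name are hypothetical, and backup storage is ignored; treat the result as an illustration rather than a pricing quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplift:
    """Fractional cost increase over a locally redundant instance."""
    compute: float
    storage: float
    licensing: float


# Uplifts taken from the "Additional cost" figures in this article.
UPLIFTS = {
    "locally_redundant": Uplift(0.0, 0.0, 0.0),
    "zone_redundant": Uplift(0.60, 1.00, 0.0),
    "fog_active_secondary": Uplift(1.00, 1.00, 1.00),
    "fog_passive_secondary": Uplift(1.00, 1.00, 0.0),
}


def estimated_monthly_cost(config: str, compute: float, storage: float,
                           licensing: float) -> float:
    """Apply the uplift for `config` to hypothetical baseline monthly costs."""
    u = UPLIFTS[config]
    return (compute * (1 + u.compute)
            + storage * (1 + u.storage)
            + licensing * (1 + u.licensing))


if __name__ == "__main__":
    # Hypothetical baseline: 1000 compute, 200 storage, 500 license (any currency).
    for name in UPLIFTS:
        cost = estimated_monthly_cost(name, 1000, 200, 500)
        print(f"{name}: {cost:.2f}")
```

With this hypothetical baseline, the estimate shows that a zone-redundant instance costs less than a failover group, because the failover group doubles compute and, unless the secondary is passive, the license as well.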
ZR and FOG can go together!
ZR and FOG are not mutually exclusive and can be combined based on business needs. When they are combined, customers are not required to make both ends of the Geo-DR link zone redundant; it’s their choice. For example, they can use ZR in a zonal region as the primary and also replicate using Geo-DR to a non-zonal region.
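The guidance so far can be summarized as a small decision helper. This is a simplified sketch of the reasoning in this article, not an official sizing tool; the function name and its two boolean inputs are assumptions made for illustration.

```python
def recommend_features(needs_zone_outage_protection: bool,
                       needs_region_outage_protection: bool) -> list[str]:
    """Map protection requirements to the features discussed in this article.

    Zone-level protection maps to Zone Redundancy (ZR); region-level
    protection maps to a Failover Group (FOG). The two can be combined.
    """
    features = []
    if needs_zone_outage_protection:
        features.append("Zone Redundancy")
    if needs_region_outage_protection:
        features.append("Failover Group")
    # With neither requirement, the default locally redundant setup applies.
    return features or ["Locally Redundant (default)"]


if __name__ == "__main__":
    print(recommend_features(True, False))
    print(recommend_features(True, True))
    print(recommend_features(False, False))
```

A real decision would also weigh cost, latency sensitivity, and regional capacity.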
Other Considerations (and How to Mitigate or Work Around Them)
While ZR provides strong high availability, it comes with some trade-offs:
- Higher Pricing: ZR configurations require multiple replicas across different Availability Zones, leading to increased costs. Mitigation: Purchasing Reserved Instances can significantly reduce the cost of long-term ZR deployments. Alternatively, consider the non-ZR instance as Azure SQL MI offers excellent protection out of the box.
- Performance Penalty: ZR configurations introduce latency overhead due to data synchronization across zones. Because zone-redundant instances have replicas in different datacenters with some distance between them, the added network latency might increase the transaction commit time and thus impact the performance of some OLTP workloads. Mitigation: None.
- Single AZ regions: Some regions consist of only one Availability Zone and cannot support ZR configurations. Workaround: Create an instance in another region that supports ZR.
- Capacity Constraints in Azure Regions: ZR requires additional VMs and storage, leading to higher capacity usage within a region, and every region has limited resources. Modifications to your zone-redundant instance may be temporarily unavailable due to insufficient capacity of the hardware generation in your region. Workaround: To proceed with modifying your zone-redundant instance, either select a different hardware generation or disable zone redundancy for the instance.
Call to Action
It’s easy to enable ZR for your new and existing instances: all it takes is a couple of clicks. The operation to change the ZR configuration is fully online with a failover at the end (see Dynamic Scaling). You can always return to the single-zone configuration by disabling the zone-redundancy setting. This process is an online operation similar to a regular service tier objective upgrade. At the end of the process, the instance is migrated from a zone-redundant ring to a single-zone ring or vice versa.
To get started with zone redundancy for your SQL managed instance, review Configure zone redundancy.
Summary
In this article, we provided guidance for businesses to make informed decisions about using Zone Redundancy (ZR) in their Azure SQL Managed Instance deployments. We outlined the benefits of ZR, such as protection against datacenter and Availability Zone failures, while comparing it with Failover Groups to help businesses choose the best option for their needs. We also discussed the downsides of ZR, including increased costs and potential performance trade-offs, and suggested strategies to address the challenges.
Learn more
- Enable zone redundancy for Azure SQL Managed Instance.
- Learn how to initiate a manual failover on SQL Managed Instance.
- For more options for high availability and disaster recovery, see Business Continuity.