Issue Statement
Azure Data Factory (ADF) has become increasingly popular across industries, but data redundancy has become a major concern as adoption grows. Some Cloud Solution Architects (CSAs) are unclear on how ADF redundancy works, both conceptually and architecturally, which leads to prolonged discussions and arguments. This blog aims to clear up some of that confusion and accelerate the adoption of ADF. I might write a separate blog to cover other solution areas such as storage, databases, API management, and so on.
ADF Data Redundancy
ADF does not support zone redundancy; it only has built-in regional redundancy, which is completely managed by Microsoft. We cannot enable or disable this feature.
Microsoft's official documentation states: “Azure Data Factory data is stored and replicated in the paired region to protect against metadata loss. During regional datacenter failures, Azure may initiate a regional failover of your Azure Data Factory instance. In most cases, no action is required on your part. When the Microsoft-managed failover has completed, you'll be able to access your Azure Data Factory.” This should clear up a lot of confusion that most CSAs are facing.
This is why there is no option to configure zone redundancy when creating a new data factory resource, and why zone redundancy cannot be enabled or disabled on an existing data factory.
How to Implement Zone Redundancy
Using Source Control in Azure Data Factory
To track and audit changes made to your metadata, consider setting up source control for your Azure Data Factory. This also gives you access to the metadata JSON files for your pipelines, datasets, linked services, and triggers. Azure Data Factory can work with Git repositories hosted in Azure DevOps or GitHub. With the metadata in source control, you can manually implement zone-level or region-level redundancy whenever an outage occurs, by redeploying those JSON definitions to another data factory.
The details are covered in this article.
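To make the manual recovery idea more concrete, here is a minimal sketch of how the JSON files in a Git checkout could be turned into an ordered list of redeployment requests for a standby data factory. It rests on several assumptions that you should verify against your own setup: the repository uses the folder names linkedService, dataset, pipeline, and trigger; each file contains "name" and "properties" keys; and the target is the Azure Resource Manager REST endpoint for Microsoft.DataFactory with API version 2018-06-01. The function name build_redeploy_plan and the factory name "adf-standby" are hypothetical. The sketch only builds the requests; sending them (with an Azure AD bearer token) is left out on purpose.

```python
import json
from pathlib import Path

# Assumed mapping: folder name in the Git repo -> REST collection name.
# Listed in dependency order: linked services first, triggers last.
RESOURCE_ORDER = [
    ("linkedService", "linkedservices"),
    ("dataset", "datasets"),
    ("pipeline", "pipelines"),
    ("trigger", "triggers"),
]

# Assumption: confirm the current version in the ADF REST API reference.
API_VERSION = "2018-06-01"


def build_redeploy_plan(repo_root, subscription_id, resource_group, factory_name):
    """Return a list of (method, url, body) tuples for a standby factory.

    Nothing is sent over the network; the caller decides how to
    authenticate and issue the requests.
    """
    base = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory_name}"
    )
    plan = []
    for folder, collection in RESOURCE_ORDER:
        folder_path = Path(repo_root) / folder
        if not folder_path.is_dir():
            continue  # this resource type is not used in the repo
        for json_file in sorted(folder_path.glob("*.json")):
            resource = json.loads(json_file.read_text(encoding="utf-8"))
            # Fall back to the file name if the JSON has no "name" key.
            name = resource.get("name", json_file.stem)
            url = f"{base}/{collection}/{name}?api-version={API_VERSION}"
            body = {"properties": resource.get("properties", {})}
            plan.append(("PUT", url, body))
    return plan


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for method, url, _body in build_redeploy_plan(
        ".", "<subscription-id>", "<resource-group>", "adf-standby"
    ):
        print(method, url)
```

The ordering matters because datasets reference linked services and pipelines reference datasets. This sketch does not order resources within a type (for example, a pipeline that calls another pipeline), and it ignores integration runtimes and secrets, so treat it as a starting point rather than a complete recovery procedure.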
Monitoring New Releases
I will closely monitor new releases of ADF to see whether an option to configure zone-level or broader data redundancy becomes available.