This post walks through a production‑proven pattern for running stateful services across Azure regions using Terraform. We’ll cover a primary–replica Redis architecture, regional isolation with Key Vault and networking, and a clean Terraform parameterization strategy that scales from development to production without duplication.
Why Multi‑Region State Is Hard
Running applications globally is easy when everything is stateless—if something fails, you redeploy.
But stateful services tell a different story.
Caches, message brokers, and data stores can’t be treated as disposable. They hold business‑critical data, and downtime or inconsistency quickly becomes customer‑visible.
In real‑world systems, common requirements include:
- Low‑latency reads from multiple regions
- Automatic recovery when a region becomes unavailable
- Predictable data consistency
- Repeatable infrastructure from dev through production
Manually configuring this per region doesn’t scale. Drift sets in. Failover is unclear. Backups get forgotten.
That’s where Terraform + Azure Managed Redis geo‑replication shines.
GitHub repo: https://github.com/vsakash5/Managed-redis.git
High‑Level Architecture
We use a primary–replica Redis Enterprise model:
- Primary Redis
  - Single write endpoint
  - Highly available inside its region
  - Source of truth
- Replica Redis
  - Read‑only
  - Asynchronously synced from the primary
  - Can be promoted during disaster recovery
Each region is fully isolated:
- Separate subnets
- Separate Key Vaults
- Private Endpoints only (no public exposure)
This prevents shared failure domains and allows each region to operate independently if needed.
The Terraform Design Principle
Instead of maintaining separate Terraform stacks per region, the key idea is:
One reusable module, one tfvars file per environment, multiple regions inside it.
The module is written once.
Regional differences are supplied via parameter suffixes like:
- _replica
- _secondary
- _tertiary
This keeps logic centralized and environments consistent.
Core Parameter Layers
1. Environment Identity (Shared)
```hcl
environment    = "dev" # dev | staging | prod
context_prefix = "app"
```
These values are reused everywhere—names, tags, and identifiers.
2. Primary Region
```hcl
location            = "eastus2"
resource_group_name = "rg-app-dev-primary"
```
3. Replica Region
```hcl
location_replica            = "uksouth"
resource_group_name_replica = "rg-app-dev-replica"
```
The symmetry is intentional.
Terraform can now apply the same module twice without branching logic.
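As a sketch of what that looks like in practice (the module path, variable names, and `role` input here are illustrative, not taken from the repo), the same module is instantiated once per region with only its inputs changing:

```hcl
# Hypothetical root-module wiring: one reusable module, two regional instances.
module "redis_primary" {
  source = "./modules/managed-redis" # illustrative path

  environment         = var.environment
  context_prefix      = var.context_prefix
  location            = var.location
  resource_group_name = var.resource_group_name
  role                = "primary"
}

module "redis_replica" {
  source = "./modules/managed-redis"

  environment         = var.environment
  context_prefix      = var.context_prefix
  location            = var.location_replica
  resource_group_name = var.resource_group_name_replica
  role                = "replica"
}
```

Because both blocks call the same module, any fix or feature lands in both regions on the next `terraform apply`.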
Regional Isolation: Networking and Secrets
Why isolation matters
Geo‑replication copies data, not dependencies.
If both Redis instances depend on:
- the same subnet
- the same Key Vault
then a failure in one region can cascade into the other.
Networking (One Subnet per Region)
Benefits:
- Independent NSGs
- Independent routing
- Independent capacity planning
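A minimal sketch of the per‑region subnets, assuming regional virtual networks named `primary` and `replica` already exist in the configuration (names and address ranges are illustrative):

```hcl
# One dedicated subnet per region; each can carry its own NSG and route table.
resource "azurerm_subnet" "redis_primary" {
  name                 = "snet-redis-${var.environment}-primary"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.primary.name
  address_prefixes     = ["10.0.1.0/24"]
}

resource "azurerm_subnet" "redis_replica" {
  name                 = "snet-redis-${var.environment}-replica"
  resource_group_name  = var.resource_group_name_replica
  virtual_network_name = azurerm_virtual_network.replica.name
  address_prefixes     = ["10.1.1.0/24"]
}
```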
Key Vault (One per Region)
Why this matters:
- Redis credentials are not replicated
- Each region stores its own secrets
- A Key Vault outage doesn’t take both regions down
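A sketch of the paired vaults, one per region (names are illustrative; Key Vault names must be globally unique and at most 24 characters):

```hcl
# Separate Key Vaults so a regional Key Vault outage stays regional.
resource "azurerm_key_vault" "primary" {
  name                = "kv-${var.context_prefix}-${var.environment}-pri"
  location            = var.location
  resource_group_name = var.resource_group_name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"
}

resource "azurerm_key_vault" "replica" {
  name                = "kv-${var.context_prefix}-${var.environment}-rep"
  location            = var.location_replica
  resource_group_name = var.resource_group_name_replica
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"
}
```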
Redis Configuration
Primary Redis (Writes Enabled)
The geo‑replication group name must match on every linked instance.
That shared name is the logical binding Azure uses to link them.
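With the `azurerm` provider, that binding is expressed through the Redis Enterprise database resource. A sketch (cluster resources and the `default` database names are assumed to exist elsewhere in the configuration):

```hcl
# The linked_database_group_nickname must be identical on every linked database.
resource "azurerm_redis_enterprise_database" "primary" {
  cluster_id = azurerm_redis_enterprise_cluster.primary.id

  linked_database_group_nickname = "${var.environment}-grp"
  linked_database_id = [
    "${azurerm_redis_enterprise_cluster.primary.id}/databases/default",
    "${azurerm_redis_enterprise_cluster.replica.id}/databases/default",
  ]
}
```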
Private Endpoint‑Only Access
No Redis instance is exposed publicly.
Each region uses:
- A private endpoint
- A workload subnet
- Internal DNS resolution
This means:
- No public IPs
- No inbound attack surface
- Traffic stays on the Azure backbone
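A sketch of the private endpoint for the primary region (resource names are illustrative; the `subresource_names` value is the one documented for Redis Enterprise, but verify it against your provider version):

```hcl
# Private endpoint keeps Redis reachable only from inside the VNet.
resource "azurerm_private_endpoint" "redis_primary" {
  name                = "pe-redis-${var.environment}-primary"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = azurerm_subnet.redis_primary.id

  private_service_connection {
    name                           = "psc-redis-primary"
    private_connection_resource_id = azurerm_redis_enterprise_cluster.primary.id
    subresource_names              = ["redisEnterprise"]
    is_manual_connection           = false
  }
}
```

The replica region gets an identical endpoint pointing at its own cluster and subnet.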
Linking Primary and Replica
Terraform explicitly defines the relationship:
```hcl
managed_redis_geo_replication_config = {
  primary_to_replica = {
    primary_redis_key = "primary"
    replica_keys      = ["replica"]
  }
}
```
Terraform ensures:
- Primary is created first
- Replica is deployed second
- Geo‑replication is established last
Environment Scaling: Dev → Staging → Prod
The infrastructure pattern never changes.
Only values do.
| Environment | Geo‑Replication Group Name |
|---|---|
| Dev | dev-grp |
| Staging | stg-grp |
| Prod | prod-grp |
This is how you avoid “snowflake” environments.
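Concretely, promoting the pattern to a new environment is just another tfvars file. A sketch with illustrative values (the variable names mirror the earlier fragments):

```hcl
# prod.tfvars — same keys as dev.tfvars, different values.
environment                    = "prod"
context_prefix                 = "app"
location                       = "eastus2"
location_replica               = "uksouth"
geo_replication_group_nickname = "prod-grp"
```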
Disaster Recovery Strategy
If the primary region fails:
- Applications fail over to the replica read endpoint
- The Terraform configuration is updated to:
  - Remove geo‑replication
  - Promote the replica config to primary
- Traffic is fully restored
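As a sketch, the failover edit can be as small as a tfvars change (the exact keys depend on how your module is written; these mirror the illustrative fragments above):

```hcl
# Illustrative tfvars edit during failover.

# 1. Drop the geo-replication link so the surviving instance stands alone.
managed_redis_geo_replication_config = {}

# 2. Promote the former replica region's values to primary.
location            = "uksouth"
resource_group_name = "rg-app-dev-replica"
```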
Once the original region recovers, roles can be re‑established cleanly.
No click‑ops.
No guesswork.
Key Lessons Learned
1. Naming is Infrastructure
Predictable names enable automation, discovery, and auditing.
2. Key Vault Isolation Beats Availability
A shared Key Vault is a shared outage.
3. Parameterization Beats Copy‑Paste
Fix once → benefit everywhere.
4. Geo‑Replication Is a Contract
Matching replication group names is non‑negotiable.
5. The tfvars File Is the Source of Truth
If it’s not in Terraform, it’s not real.
Final Thoughts
Running stateful services in multiple regions doesn’t require magic—
it requires discipline:
- Isolate aggressively
- Parameterize consistently
- Automate everything
- Test failure often
With this approach, adding a new region becomes configuration—not redesign.
That’s how infrastructure scales.