Introduction
Over the past few months, I’ve been helping several digital-native customers navigate capacity constraints while scaling AI and compute-intensive workloads on Azure. Many teams run into the same frustrating message:
“SkuNotAvailable – The requested size is currently not available in the location.”
This post summarizes the strategy I’ve been using to help customers design around these challenges, combining Quota Groups, Capacity Reservations (ODCR), VMSS Instance Mix, and Compute Fleet. These tools don’t create capacity where none exists, but together, and paired with proactive alerts, they form a practical playbook for scaling reliably through regional constraints.
Quota vs. Capacity: What’s the difference?
| Concept | What It Is | Who Controls It | Can You Fix It Yourself? |
| --- | --- | --- | --- |
| Quota | A logical limit on how many vCPUs or specific VM series you can deploy. | Microsoft (adjustable on request). | ✅ Yes, request an increase. |
| Capacity | The physical availability of hardware in the datacenter. | Azure datacenter (supply and utilization). | ❌ No, if no servers exist, no deployment will succeed. |
Example: You have 300 vCPUs of quota for the D-series in East US 2. You try to deploy 30 D8as_v5 VMs (30 × 8 = 240 vCPUs, comfortably within quota) and get a failure. You open a support request and find:
- Your quota is fine
- But the region has no physical capacity for D8as_v5
Even if Microsoft raised your quota to 1,000 vCPUs, the deployment would still fail because quota ≠ capacity.
- Quota issue: You’ll see errors like OperationNotAllowed or QuotaExceeded.
- Capacity issue: The message will be SkuNotAvailable or AllocationFailed.
If you see a quota error, open the Usage + quotas blade and request an increase. If it’s a capacity error, switching zones, SKUs, or regions, or using VMSS Instance Mix or Compute Fleet is your best next step.
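The triage rule above can be sketched in a few lines of Python. This is a minimal illustration, not an Azure SDK call: the function names are hypothetical, and the mapping only covers the four error codes named in this post. Real deployments can surface other codes, and a code like OperationNotAllowed can appear for reasons other than quota, so treat the result as a first guess and read the full message.

```python
from enum import Enum


class FailureKind(Enum):
    QUOTA = "quota"
    CAPACITY = "capacity"
    UNKNOWN = "unknown"


# Only the error codes named in this post; this list is not exhaustive.
QUOTA_CODES = {"OperationNotAllowed", "QuotaExceeded"}
CAPACITY_CODES = {"SkuNotAvailable", "AllocationFailed"}


def classify_failure(error_code: str) -> FailureKind:
    """Map a deployment error code to the kind of limit that was hit."""
    if error_code in QUOTA_CODES:
        return FailureKind.QUOTA
    if error_code in CAPACITY_CODES:
        return FailureKind.CAPACITY
    return FailureKind.UNKNOWN


def next_step(error_code: str) -> str:
    """Suggest the follow-up action described in this section."""
    kind = classify_failure(error_code)
    if kind is FailureKind.QUOTA:
        return "Request a quota increase in Usage + quotas."
    if kind is FailureKind.CAPACITY:
        return "Try another zone, SKU, or region, or use Instance Mix / Compute Fleet."
    return "Inspect the full error message before acting."
```

Wiring something like this into a deployment pipeline means the on-call engineer sees "quota" or "capacity" up front instead of having to decode the raw error.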
“Quota is a number on paper. Capacity is what’s physically sitting in the racks.”
Strategy 1: Quota management and Quota Groups
Azure applies vCPU quotas by region and VM family (e.g., Dsv5, Esv5). Quota Groups provide a consolidated way to monitor and manage these logical limits across families.
Quota limits are easy to overlook until automation or scale pipelines fail. AI-heavy startups often discover too late that they’ve maxed out their quota family.
Best practices:
- Monitor with Quota Group alerts: Use Quota Alerts (preview) to automatically notify you when usage reaches thresholds (e.g., 80%). Alerts integrate with Azure Monitor and Action Groups.
- Request increases proactively: Portal path: Subscriptions → Usage + quotas → Request increase. Most CPU SKUs are approved quickly; GPUs can take longer.
- Plan by family, not by SKU: If you only check “D8as_v5 usage,” you may miss that the entire D-series family is at its quota limit.
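To make "plan by family" concrete, here is a small Python sketch that checks usage against the limit per VM family and flags anything at or above an alert threshold. The `FamilyUsage` type, the family labels, and the numbers are all illustrative assumptions; in practice the usage and limit values would come from whatever quota data you export from Azure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FamilyUsage:
    family: str        # illustrative label, e.g. "Dasv5"
    used_vcpus: int
    limit_vcpus: int


def families_over_threshold(usages: List[FamilyUsage], threshold_pct: int = 80) -> List[str]:
    """Return families whose vCPU usage is at or above threshold_pct of the limit."""
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return [
        u.family
        for u in usages
        if u.used_vcpus * 100 >= threshold_pct * u.limit_vcpus
    ]


def fits_in_quota(usage: FamilyUsage, vm_count: int, vcpus_per_vm: int) -> bool:
    """Check whether a planned deployment stays within the family's vCPU limit."""
    return usage.used_vcpus + vm_count * vcpus_per_vm <= usage.limit_vcpus


# Made-up numbers for illustration only.
usages = [FamilyUsage("Dasv5", 240, 300), FamilyUsage("Easv5", 64, 300)]
flagged = families_over_threshold(usages)
```

Note that `fits_in_quota` only answers the quota question. A deployment that passes this check can still fail on capacity.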
Strategy 2: Capacity Reservations (ODCR)
A Capacity Reservation (formally On-Demand Capacity Reservation, ODCR) lets you pre-book physical infrastructure in a specific region, zone, and VM size. You’re reserving capacity, not committing to a term or discount. Azure holds those servers for your subscription, ensuring your workloads can always start.
Capacity Reservation vs. Reserved Instance (RI)
| Aspect | Capacity Reservation (ODCR) | Reserved Instance (RI) |
| --- | --- | --- |
| Purpose | Guarantees capacity (hardware availability). | Locks in price (discounted rate). |
| Scope | Specific region, zone, and VM size. | Region and VM family. |
| Billing | Pay-as-you-go, no term commitment. | 1 or 3-year fixed term. |
| Capacity Guarantee | ✅ Yes, hardware is held for you. | ❌ No, no guarantee. |
| Price Benefit | ❌ None, PAYG rate. | ✅ Up to ~70% discount. |
| Flexibility | Modify or cancel anytime. | Bound to term. |
In short:
- ODCR = “Hold my spot in the datacenter.”
- RI = “Give me a discount because I’ll keep using it.”
You can use both: ODCR for capacity, RI for savings.
Example: A startup consistently runs 20× D16as_v5 VMs nightly for training. They reserve that capacity (ODCR) and apply RIs for discounts, which gives them predictable availability and cost.
Limitations:
- You can’t reserve SKUs already out of stock.
- ODCR doesn’t autoscale; it only holds your baseline.
- Best for core workloads, not ephemeral jobs.
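Because a reservation holds a fixed baseline and doesn't grow with demand, it helps to track how much of your running fleet it actually covers. The sketch below is a hypothetical helper, not an Azure API: given a reserved quantity and the number of VMs currently running in that size and zone, it reports how many are backed by the reservation, how many depend on on-demand capacity, and how much reserved capacity sits idle (idle reserved capacity is still billed at the pay-as-you-go rate).

```python
from typing import Dict


def reservation_coverage(reserved: int, running: int) -> Dict[str, int]:
    """Split running VMs into reservation-backed and on-demand counts."""
    if reserved < 0 or running < 0:
        raise ValueError("reserved and running must be non-negative")
    return {
        "covered": min(reserved, running),            # backed by held hardware
        "uncovered": max(running - reserved, 0),      # competes for on-demand capacity
        "idle_reserved": max(reserved - running, 0),  # held and billed, but unused
    }


# Illustrative: a 20-VM reservation while 26 VMs are running.
coverage = reservation_coverage(reserved=20, running=26)
```

A persistently non-zero "uncovered" count suggests the baseline has grown past the reservation. A persistently non-zero "idle_reserved" count suggests you are paying to hold more than you use.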
Strategy 3: VMSS Instance Mix
Virtual Machine Scale Set (VMSS) Instance Mix brings the same capacity-aware logic from Compute Fleet into managed workloads like AKS node pools, web apps, and autoscaling services. It allows you to define multiple acceptable VM SKUs within a single Scale Set. Azure will automatically select whichever size has available capacity when scaling out.
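Example: Here’s a simplified fragment of an ARM template (JSON) for a scale set that uses Instance Mix. The acceptable sizes are listed under skuProfile, and the scale set’s sku.name is set to "Mix":

    "sku": { "name": "Mix" },
    "properties": {
      "orchestrationMode": "Flexible",
      "skuProfile": {
        "vmSizes": [
          { "name": "Standard_D8as_v5" },
          { "name": "Standard_E8as_v5" },
          { "name": "Standard_F8s_v2" }
        ],
        "allocationStrategy": "CapacityOptimized"
      }
    }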
For AKS or other autoscaled workloads, VMSS Instance Mix drastically reduces the risk of scale-out failures caused by SKU saturation.
You no longer have to guess which SKU will succeed; Azure makes that decision dynamically based on live capacity signals.
Limitations:
- Works across zones, not regions.
- Doesn’t mix Spot + Standard in the same pool.
- Doesn’t reserve hardware capacity; it only improves allocation success rates.
Strategy 4: Azure Compute Fleet
Azure Compute Fleet can deploy up to 10,000 VMs across multiple SKUs, zones, and (in preview) regions. You define acceptable SKUs, and Azure picks the ones that have capacity.
Fleet automatically:
- Tries alternate SKUs (D8as_v5 → E8as_v5).
- Expands to other zones or regions.
- Combines Standard and Spot instances.
In short, it automates the “try this, then that” logic, improving your odds of successful deployment.
Example: A rendering studio needs 2,000 VMs nightly. Fleet dynamically uses D8s_v5, D16s_v5, or E8s_v5 across East US 2 and West US 2, depending on live availability.
Limitations:
Fleet doesn’t create capacity; it just searches smarter. If every zone and region is full, it still fails. It is ideal for AI training, batch jobs, rendering, or HPC, not for stateful services.
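The "try this, then that" idea can be illustrated with a toy planner in Python. To be clear, this is not how Compute Fleet is implemented and it calls no Azure API: the function name, the preference list, and the availability numbers are all made up for illustration. It walks an ordered list of (region, SKU) placements, takes what each can supply, and reports any shortfall.

```python
from typing import Dict, List, Tuple

Placement = Tuple[str, str]  # (region, vm_size)


def plan_allocation(
    target: int,
    preferences: List[Placement],
    available: Dict[Placement, int],
) -> Tuple[Dict[Placement, int], int]:
    """Greedily fill `target` VMs from placements in preference order.

    Returns the plan and the number of VMs that could not be placed.
    """
    plan: Dict[Placement, int] = {}
    remaining = target
    for placement in preferences:
        if remaining == 0:
            break
        take = min(remaining, available.get(placement, 0))
        if take > 0:
            plan[placement] = take
            remaining -= take
    return plan, remaining


# Hypothetical availability snapshot; real capacity signals are not exposed like this.
prefs = [
    ("eastus2", "Standard_D8s_v5"),
    ("eastus2", "Standard_E8s_v5"),
    ("westus2", "Standard_D8s_v5"),
]
snapshot = {prefs[0]: 1200, prefs[1]: 0, prefs[2]: 1000}
plan, shortfall = plan_allocation(2000, prefs, snapshot)
```

With this snapshot, the request is split across two placements and the shortfall is zero. If every entry in the snapshot were zero, the plan would be empty and the shortfall would equal the full target, which is the "if every zone and region is full, it still fails" case.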
When to use what
| Scenario | Best tool | What it solves |
| --- | --- | --- |
| Logical limits before deployment | Quota Groups + Alerts | Prevent hitting soft limits. |
| Guaranteed baseline | Capacity Reservation (ODCR) | Reserve real hardware. |
| Managed autoscaling | VMSS Instance Mix | Scale out despite partial shortages. |
| Large-scale/bursty workloads | Azure Compute Fleet | Try alternate SKUs and regions. |
| GPU/high-demand SKUs | ODCR + Fleet | Reserve base, burst flexibly. |
Real Talk: There’s no magic when a datacenter is full
Let’s be transparent: If a region has no physical servers available, no tool can make capacity appear.
- Quota Groups remove logical blockers.
- Capacity Reservations secure what you need.
- Compute Fleet and VMSS Instance Mix increase the odds of success.
Together, they maximize probability, but none can override a physically full region.
The Azure capacity strategy flow
Final thoughts
For fast-scaling digital-native companies, the right question isn’t “How do I guarantee capacity?” It’s “How do I design for capacity uncertainty?” Start by putting the basics on autopilot:
- Configure Quota Group alerts to prevent silent blockers.
- Use Capacity Reservations (ODCR) to secure your baseline compute.
- Add elasticity through VMSS Instance Mix and, when flexibility allows, Compute Fleet.
- Monitor everything with Azure Monitor alerts — from quotas and reservations to scale-out failures and Fleet allocation health.
💡 Pro tip: Combine Quota Group Alerts, Reservation coverage monitoring, and VMSS/Fleet deployment telemetry in Azure Monitor to detect issues early. The faster you know what kind of failure you’re hitting, the faster you can act.
Accept that capacity is finite, but also that visibility is your greatest advantage. Azure gives you multiple levers; success comes from knowing when and how to use each one together.
Over the past few months, I’ve supported multiple customers, from AI platforms to SaaS startups, who faced real capacity challenges in regions like East US 2 and West US 2. This post came directly from those experiences, with one goal: to help others move from reactive firefighting to proactive, layered capacity planning. If your workloads are scaling fast, I hope this guide helps you build not just a plan, but a mindset, for running reliably when the cloud gets crowded.