Startups at Microsoft


Azure capacity planning: Using quotas, reservations, VMSS Instance Mix, and Compute Fleet

Microsoft

Oct 31, 2025

Introduction

Over the past few months, I’ve been helping several digital-native customers navigate capacity constraints while scaling AI and compute-intensive workloads on Azure. Many teams run into the same frustrating message:

“SkuNotAvailable – The requested size is currently not available in the location.”

This post summarizes the strategy I’ve been using to help customers design around these challenges by combining Quota Groups, Capacity Reservations (ODCR), VMSS Instance Mix, and Compute Fleet. These tools don’t create capacity where none exists. Paired with proactive alerts, however, they form a practical playbook for scaling reliably through regional constraints.

Quota vs. Capacity: What’s the difference?

Concept	What It Is	Who Controls It	Can You Fix It Yourself?
Quota	A logical limit on how many vCPUs or specific VM series you can deploy.	Microsoft (adjustable on request).	✅ Yes, request an increase.
Capacity	The physical availability of hardware in the datacenter.	Azure datacenter (supply and utilization).	❌ No, if no servers exist, no deployment will succeed.

Example: You have 300 vCPUs of quota for the DASv5 family (which includes D8as_v5) in East US 2. You try to deploy 30 D8as_v5 VMs (240 vCPUs), and the deployment fails. You open a support request and find:

Your quota is fine
But the region has no physical capacity for D8as_v5

Even if Microsoft raised your quota to 1,000 vCPUs, the deployment would still fail because quota ≠ capacity.
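This example comes down to two independent checks, and a quota request can change only the first one. The sketch below is a minimal Python illustration. The function names are hypothetical, and `free_capacity_vcpus` is an illustrative stand-in for physical hardware availability, not a value read from any Azure API. It uses 30 D8as_v5 VMs (8 vCPUs each, 240 vCPUs in total) so the request fits inside a 300-vCPU quota.

```python
def quota_allows(vm_count: int, vcpus_per_vm: int, used_vcpus: int, quota_vcpus: int) -> bool:
    """True if the request fits in the remaining vCPU quota (the logical limit)."""
    return used_vcpus + vm_count * vcpus_per_vm <= quota_vcpus


def deployment_succeeds(vm_count: int, vcpus_per_vm: int, used_vcpus: int,
                        quota_vcpus: int, free_capacity_vcpus: int) -> bool:
    """A deployment needs BOTH quota headroom and physical capacity.

    free_capacity_vcpus is a hypothetical stand-in for hardware availability.
    """
    requested = vm_count * vcpus_per_vm
    return (quota_allows(vm_count, vcpus_per_vm, used_vcpus, quota_vcpus)
            and requested <= free_capacity_vcpus)


# 30 x D8as_v5 = 240 vCPUs against a 300-vCPU quota: the quota check passes...
print(quota_allows(30, 8, 0, 300))               # True
# ...but with no free hardware, the deployment still fails,
print(deployment_succeeds(30, 8, 0, 300, 0))     # False
# and raising the quota to 1,000 vCPUs doesn't change that.
print(deployment_succeeds(30, 8, 0, 1000, 0))    # False
```

Keeping the checks separate mirrors the distinction in this section: a support request can move `quota_vcpus`, but nothing you configure moves `free_capacity_vcpus`.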

Quota issue: You’ll see errors like OperationNotAllowed or QuotaExceeded.
Capacity issue: You’ll see errors such as SkuNotAvailable or AllocationFailed.

If you see a quota error, open the Usage + quotas blade and request an increase. If it’s a capacity error, your best next steps are to switch zones, SKUs, or regions, or to use VMSS Instance Mix or Compute Fleet.
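If your deployment pipeline handles failures automatically, you can encode the triage rule above directly. The following is a minimal sketch. The enum, sets, and function names are hypothetical, and the mapping is a simplification. OperationNotAllowed in particular is a generic code that can have causes other than quota, so check the full error message before acting on it.

```python
from enum import Enum


class FailureKind(Enum):
    QUOTA = "quota"
    CAPACITY = "capacity"
    UNKNOWN = "unknown"


# Error codes named above; extend these sets with other codes you observe.
QUOTA_CODES = {"OperationNotAllowed", "QuotaExceeded"}
CAPACITY_CODES = {"SkuNotAvailable", "AllocationFailed"}

NEXT_STEP = {
    FailureKind.QUOTA: "Request an increase under Usage + quotas.",
    FailureKind.CAPACITY: "Try another zone, SKU, or region, or use VMSS Instance Mix / Compute Fleet.",
    FailureKind.UNKNOWN: "Read the full error message before retrying.",
}


def classify(error_code: str) -> FailureKind:
    """Map a deployment error code to the kind of problem it usually signals."""
    if error_code in QUOTA_CODES:
        return FailureKind.QUOTA
    if error_code in CAPACITY_CODES:
        return FailureKind.CAPACITY
    return FailureKind.UNKNOWN


def next_step(error_code: str) -> str:
    """Suggested remediation for an error code."""
    return NEXT_STEP[classify(error_code)]


print(next_step("SkuNotAvailable"))
```

Routing only the quota branch to an automated increase request, and everything else to a human or a fallback deployment, avoids retrying a request that no quota change can fix.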

“Quota is a number on paper. Capacity is what’s physically sitting in the racks.”

Strategy 1: Quota management and Quota Groups

Azure applies vCPU quotas per region, both as a total regional vCPU limit and per VM family (e.g., DSv5, ESv5). Quota Groups provide a consolidated way to monitor and manage these logical limits across a group of subscriptions instead of one subscription at a time.


Quota limits are easy to overlook until automation or scale pipelines fail. AI-heavy startups often discover too late that they’ve exhausted the quota for a VM family.

Best practices:

Monitor with Quota Group alerts: Use Quota Alerts (preview) to get notified automatically when usage reaches a threshold (e.g., 80%). Alerts integrate with Azure Monitor and Action Groups.
Request increases proactively: Portal path: Subscriptions → Usage + quotas → Request increase. Most CPU SKU increases are approved quickly; GPU requests can take longer.
Plan by family, not by SKU: If you only check “D8as_v5 usage,” you may miss that the whole DASv5 family, or your total regional vCPU quota, is already at its limit.
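The "plan by family" advice amounts to evaluating every quota that a deployment touches, not just the SKU you care about. The sketch below shows only the threshold logic, written locally. It is not the Quota Alerts API. The `QuotaUsage` type and `over_threshold` function are hypothetical, and the usage numbers are made up. In practice you would take the figures from Usage + quotas.

```python
from dataclasses import dataclass


@dataclass
class QuotaUsage:
    name: str          # a VM family quota or the total regional quota
    used_vcpus: int
    limit_vcpus: int

    @property
    def utilization(self) -> float:
        # Treat a zero limit as fully used: nothing can be deployed against it.
        return self.used_vcpus / self.limit_vcpus if self.limit_vcpus else 1.0


def over_threshold(usages: list[QuotaUsage], threshold: float = 0.8) -> list[str]:
    """Names of quotas at or above the alert threshold (80% by default)."""
    return [u.name for u in usages if u.utilization >= threshold]


# Hypothetical numbers: D8as_v5 VMs are only part of the DASv5 family's usage.
usages = [
    QuotaUsage("Standard DASv5 Family vCPUs", used_vcpus=256, limit_vcpus=300),
    QuotaUsage("Total Regional vCPUs", used_vcpus=400, limit_vcpus=1000),
]
print(over_threshold(usages))  # ['Standard DASv5 Family vCPUs']
```

Checking the regional total alongside each family catches the case where every family has headroom but the region as a whole does not.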

Strategy 2: Capacity Reservations (ODCR)

A Capacity Reservation (officially On-Demand Capacity Reservation, or ODCR) lets you pre-book physical infrastructure for a specific VM size in a region, optionally pinned to an availability zone. You’re reserving capacity, not committing to a term or buying a discount. Azure holds that hardware for your subscription, so VMs you deploy against the reservation can start even when the region is constrained.


Capacity Reservation vs. Reserved Instance (RI)

Aspect	Capacity Reservation (ODCR)	Reserved Instance (RI)
Purpose	Guarantees capacity (hardware availability).	Locks in price (discounted rate).
Scope	A specific VM size in a region, optionally pinned to a zone.	Region and VM family.
Billing	Pay-as-you-go rate for the full reserved quantity, whether or not VMs are running; no term commitment.	1- or 3-year fixed term.
Capacity Guarantee	✅ Yes, hardware is held for you.	❌ No, no guarantee.
Price Benefit	❌ None, PAYG rate.	✅ Up to ~70% discount.
Flexibility	Modify or cancel anytime.	Bound to term.

In short:

ODCR = “Hold my spot in the datacenter.”

RI = “Give me a discount because I’ll keep using it.”

You can use both: ODCR for capacity, RI for savings.

Example: A startup consistently runs 20× D16as_v5 VMs nightly for training. They reserve that capacity (ODCR) and apply RIs for discounts, which makes both capacity and cost predictable.
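Sizing the reservation in this example is simple arithmetic, and it's worth doing before you create the reservation. The sketch below assumes that reserved capacity counts against the VM family's vCPU quota in the same way running VMs do. The helper names are hypothetical, and the quota figures are illustrative.

```python
# vCPUs per VM size used in this example (D16as_v5 has 16 vCPUs).
VCPUS_PER_SIZE = {"Standard_D16as_v5": 16}


def reserved_vcpus(size: str, count: int) -> int:
    """Total vCPUs held by a reservation of `count` VMs of `size`."""
    return VCPUS_PER_SIZE[size] * count


def quota_headroom(size: str, count: int, family_quota_vcpus: int,
                   other_usage_vcpus: int = 0) -> int:
    """vCPUs left in the family quota after the reservation; negative means a shortfall.

    Assumes reserved capacity consumes the family's vCPU quota like running VMs.
    """
    return family_quota_vcpus - other_usage_vcpus - reserved_vcpus(size, count)


print(reserved_vcpus("Standard_D16as_v5", 20))       # 320
print(quota_headroom("Standard_D16as_v5", 20, 300))  # -20: raise the quota first
print(quota_headroom("Standard_D16as_v5", 20, 400))  # 80
```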

Limitations:

You can’t reserve SKUs already out of stock.
ODCR doesn’t autoscale; it holds your baseline.
Best for core workloads, not ephemeral jobs.

Strategy 3: VMSS Instance Mix

Virtual Machine Scale Set (VMSS) Instance Mix is a feature of VMSS Flex (Flexible orchestration mode) that enables capacity-aware scaling across multiple VM sizes. When you list more than one acceptable VM size, Azure chooses among them during scale-out based on available capacity and the allocation strategy you set (for example, lowest price or capacity-optimized). A companion VMSS Flex feature, Spot Priority Mix, handles blending Standard and Spot instances.

Learn more:

VMSS Instance Mix – Overview

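Example: Here’s a simplified excerpt of a scale set resource in an ARM template that uses Instance Mix (other required properties omitted). The VM sizes go in skuProfile, and the sku name is set to "Mix":

{
  "sku": {
    "name": "Mix",
    "capacity": 10
  },
  "properties": {
    "orchestrationMode": "Flexible",
    "skuProfile": {
      "vmSizes": [
        { "name": "Standard_D8as_v5" },
        { "name": "Standard_E8as_v5" },
        { "name": "Standard_F8s_v2" }
      ],
      "allocationStrategy": "CapacityOptimized"
    }
  }
}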

VMSS Instance Mix helps you ride out temporary SKU shortages by allocating from another listed size when your preferred one is unavailable. Spot Priority Mix lets you blend Spot and Standard instances to reduce cost and improve resilience. Together, they suit large-scale app tiers, batch processing, and AI inference.
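To see why listing several sizes helps, here is a simplified model of prioritized fallback: take as many VMs as possible from the first size, then move down the list. It is not Azure's actual placement algorithm, which also weighs price or capacity depending on the allocation strategy. The `available` snapshot is hypothetical, because you don't get a per-size count like this from Azure.

```python
def fill_by_priority(requested: int, sizes: list[str], available: dict[str, int]) -> dict[str, int]:
    """Place `requested` VMs, taking as many as possible from each size in priority order.

    `available` is a hypothetical snapshot of allocatable VMs per size.
    The returned counts can sum to less than `requested` if every size runs short.
    """
    placed: dict[str, int] = {}
    remaining = requested
    for size in sizes:
        if remaining == 0:
            break
        take = min(remaining, available.get(size, 0))
        if take:
            placed[size] = take
            remaining -= take
    return placed


sizes = ["Standard_D8as_v5", "Standard_E8as_v5", "Standard_F8s_v2"]
snapshot = {"Standard_D8as_v5": 4, "Standard_E8as_v5": 0, "Standard_F8s_v2": 10}
print(fill_by_priority(10, sizes, snapshot))
# {'Standard_D8as_v5': 4, 'Standard_F8s_v2': 6}
```

With only D8as_v5 listed, the same request would have placed 4 of 10 VMs. The extra sizes are what let the scale-out complete.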

Limitations:

Works across zones, not regions.
Doesn’t blend Spot and Standard instances by itself; that’s the role of Spot Priority Mix.
Doesn’t reserve hardware capacity; it only improves allocation success rates.

Strategy 4: Azure Compute Fleet

Azure Compute Fleet can deploy up to 10,000 VMs across multiple SKUs, zones, and (in preview) regions. You define acceptable SKUs, and Azure picks the ones that have capacity.

Learn more:

Azure Compute Fleet – Overview

Fleet automatically:

Tries alternate SKUs (D8as_v5 → E8as_v5).
Expands to other zones or regions.
Combines Standard and Spot instances.

In short, it automates the “try this, then that” logic, improving your odds of successful deployment.

Example: A rendering studio needs 2,000 VMs nightly. Fleet dynamically uses D8s_v5, D16s_v5, or E8s_v5 across East US 2 and West US 2, depending on live availability.

Limitations:

Fleet doesn’t create capacity; it just searches more widely. If every zone and region is full, the deployment still fails. It’s best suited to AI training, batch jobs, rendering, or HPC, not to stateful services.
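The "try this, then that" idea, and its limit, can be sketched as a greedy search over every (region, SKU) pair that reports any shortfall. This is a simplified model and not Compute Fleet's real allocation logic. `plan_fleet` and the capacity snapshot are hypothetical. The model also counts VMs rather than vCPUs, so a D16s_v5 counts the same as a D8s_v5.

```python
from itertools import product

Placement = dict[tuple[str, str], int]


def plan_fleet(target: int, regions: list[str], skus: list[str],
               available: dict[tuple[str, str], int]) -> tuple[Placement, int]:
    """Greedily place `target` VMs over (region, sku) pairs; return (placement, shortfall).

    `available` is a hypothetical capacity snapshot keyed by (region, sku).
    """
    placement: Placement = {}
    remaining = target
    for region, sku in product(regions, skus):
        if remaining == 0:
            break
        take = min(remaining, available.get((region, sku), 0))
        if take:
            placement[(region, sku)] = take
            remaining -= take
    return placement, remaining


regions = ["eastus2", "westus2"]
skus = ["Standard_D8s_v5", "Standard_D16s_v5", "Standard_E8s_v5"]

# Partial capacity in two regions: the search still reaches the target.
partial = {("eastus2", "Standard_D8s_v5"): 1200, ("westus2", "Standard_E8s_v5"): 900}
print(plan_fleet(2000, regions, skus, partial))
# ({('eastus2', 'Standard_D8s_v5'): 1200, ('westus2', 'Standard_E8s_v5'): 800}, 0)

# Every region full: a wider search can't place anything.
print(plan_fleet(2000, regions, skus, {}))
# ({}, 2000)
```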

When to use what

Scenario	Best tool	What it solves
Logical limits before deployment	Quota Groups + Alerts	Prevent hitting soft limits.
Guaranteed baseline	Capacity Reservation (ODCR)	Reserve real hardware.
Managed autoscaling	VMSS Instance Mix	Scale out despite partial shortages.
Large-scale/bursty workloads	Azure Compute Fleet	Try alternate SKUs and regions.
GPU/high-demand SKUs	ODCR + Fleet	Reserve base, burst flexibly.
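The "Reserve base, burst flexibly" row needs one number decided up front: how much to reserve. One simple heuristic, offered here as an assumption rather than Azure guidance, is to reserve the demand you need at every sample and treat everything above it as burst for Fleet to source. The function name and demand samples are hypothetical.

```python
def split_baseline_and_burst(demand: list[int]) -> tuple[int, int]:
    """Return (baseline to reserve with ODCR, peak burst to source via Fleet).

    Heuristic (an assumption): the baseline is the minimum observed demand,
    and the burst is the gap between that baseline and the peak.
    """
    if not demand:
        return 0, 0
    baseline = min(demand)
    return baseline, max(demand) - baseline


hourly_gpu_vms = [8, 8, 10, 16, 24, 12, 8]  # hypothetical demand samples
print(split_baseline_and_burst(hourly_gpu_vms))  # (8, 16)
```

Using the minimum keeps the reservation fully used at all times. Teams that value availability over cost might reserve up to a higher percentile instead and accept some idle reserved capacity.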

Real Talk: There’s no magic when a datacenter is full. Let’s be transparent: If a region has no physical servers available, no tool can make capacity appear.

Quota Groups remove logical blockers.
Capacity Reservations secure what you need.
Compute Fleet and VMSS Instance Mix increase the odds of success.

Together, they maximize probability, but none can override a physically full region.

The Azure capacity strategy flow

Final thoughts

For fast-scaling digital-native companies, the right question isn’t “How do I guarantee capacity?” but “How do I design for capacity uncertainty?” Start by putting the basics on autopilot:

Configure Quota Group alerts to prevent silent blockers.
Use Capacity Reservations (ODCR) to secure your baseline compute.
Add elasticity through VMSS Instance Mix and, when flexibility allows, Compute Fleet.
Monitor everything with Azure Monitor alerts, from quotas and reservations to scale-out failures and Fleet allocation health.

💡 Pro tip: Combine Quota Group Alerts, Reservation coverage monitoring, and VMSS/Fleet deployment telemetry in Azure Monitor to detect issues early. The faster you know what kind of failure you’re hitting, the faster you can act.

Accept that capacity is finite, but also that visibility is your greatest advantage. Azure gives you multiple levers; success comes from knowing when and how to use each one together.

Over the past few months, I’ve supported multiple customers, from AI platforms to SaaS startups, who faced real capacity challenges in regions like East US 2 and West US 2. This post came directly from those experiences, with one goal: to help others move from reactive firefighting to proactive, layered capacity planning. If your workloads are scaling fast, I hope this guide helps you build not just a plan, but a mindset, for running reliably when the cloud gets crowded.
