Event details
⏱️ This live AMA is on January 22nd, 2026 at 9:00 AM PT. This same session is also scheduled at 5:00 PM PT on January 22nd.
SESSION DETAILS
This session breaks down the complexity of Azure pr...
Aaron_Bode
Updated Jan 23, 2026
Vandit
Jan 22, 2026, Copper Contributor
How should teams decide the right PTU sizing to balance performance and cost, and what common mistakes lead to unexpectedly high PTU charges?
- kyleikeda, Jan 22, 2026
Microsoft
Thanks for the question. Here are some best practices to consider when purchasing PTUs:
- Workload Characteristics: Different workloads consume different amounts of processing capacity. Generated (output) tokens require more capacity than prompt (input) tokens. Analyze historical token usage data or estimate your call shapes (input tokens, output tokens, and requests per minute) to approximate the PTUs needed.
- Traffic Patterns: A wide distribution of call shapes that includes some large calls may yield lower throughput per PTU than a narrower distribution with a similar average call size.
- Capacity Planning Tools: Use the Foundry calculator to size specific workload shapes and estimate the required PTUs from input and output token counts.
- Benchmarking: The most accurate way to determine capacity is to benchmark a deployment with a representative workload for your use case.
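As a rough illustration of the call-shape estimate described above, here is a minimal Python sketch. It is not how the Foundry calculator works internally. The names `CallShape` and `estimate_ptus`, and the parameters `output_weight`, `tpm_per_ptu`, and `increment`, are hypothetical, and the numbers in the usage example are placeholders rather than real model ratios. Take the actual per-model values from the calculator or from your own benchmark.

```python
import math
from dataclasses import dataclass


@dataclass
class CallShape:
    """One kind of request in the workload (hypothetical helper type)."""
    input_tokens: int           # prompt tokens per request
    output_tokens: int          # generated tokens per request
    requests_per_minute: float  # peak request rate for this shape


def weighted_tokens_per_minute(shape: CallShape, output_weight: float) -> float:
    # Generated tokens cost more capacity than prompt tokens, so each
    # output token is counted as `output_weight` input-token equivalents.
    per_call = shape.input_tokens + output_weight * shape.output_tokens
    return per_call * shape.requests_per_minute


def estimate_ptus(shapes, output_weight: float, tpm_per_ptu: float,
                  increment: int = 1) -> int:
    """Rough PTU estimate, rounded up to the purchase increment.

    `output_weight` and `tpm_per_ptu` are ASSUMED inputs; they are
    model-specific and must come from the calculator or a benchmark.
    """
    if tpm_per_ptu <= 0 or increment <= 0:
        raise ValueError("tpm_per_ptu and increment must be positive")
    total = sum(weighted_tokens_per_minute(s, output_weight) for s in shapes)
    raw_ptus = total / tpm_per_ptu
    return math.ceil(raw_ptus / increment) * increment


# Placeholder numbers, for illustration only.
workload = [CallShape(input_tokens=1000, output_tokens=200, requests_per_minute=60)]
print(estimate_ptus(workload, output_weight=4.0, tpm_per_ptu=3000, increment=5))
```

Sizing from peak requests per minute rather than the daily average is a deliberate choice in this sketch, since a provisioned deployment has to absorb the busiest minute. An estimate like this is only a starting point, and a benchmark with representative traffic remains the more accurate check.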
Resources to learn more:
- The Foundry PTU quota calculator: https://ai.azure.com/resource/calculator
- Understanding costs associated with PTUs: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding?…