Microsoft Foundry Blog

Announcing Priority Processing in Microsoft Foundry for Performance-Sensitive AI Workloads

Mar 23, 2026

Authors: Sethu Raman, Chris Hoder

As generative AI solutions move from experimentation to production, consistent performance under variable demand is becoming a critical requirement. Today, we are announcing the general availability of Priority Processing in Microsoft Foundry, a new capability designed to help organizations run latency-sensitive AI workloads with greater performance consistency and pay-per-call spending flexibility.

Predictable Performance for Latency-Sensitive AI in Production

As AI applications move into production, enterprises face growing pressure to deliver predictable, low‑latency performance for real-time copilots and agentic workflows—without upfront monthly or annual financial commitments.

Priority Processing is designed to address these deployment challenges by prioritizing latency‑sensitive inference requests with pay-per-call flexible spending, enabling SLA‑backed performance for interactive AI workloads without requiring provisioned throughput commitments.

  • Enables consistent high-speed performance on a pay-as-you-go basis
  • Dynamically allocates compute for time-critical workloads
  • Supports real-time AI applications without monthly or yearly throughput commitments
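Because Priority Processing is pay-per-call, the natural integration point is the request itself rather than the deployment's infrastructure. As a minimal sketch, assuming priority is requested through a service-tier style field on a chat-completions payload (the field name, its value, and the deployment name below are assumptions, not confirmed by this post):

```python
# Sketch: tagging a latency-sensitive request for Priority Processing.
# The "service_tier" field and its "priority" value are assumptions
# modeled on similar pay-per-call tiers; consult the Foundry docs for
# the actual parameter name.

def build_chat_request(messages, deployment, priority=False):
    """Build a chat-completions payload, optionally marking it priority."""
    payload = {
        "model": deployment,  # placeholder deployment name
        "messages": messages,
    }
    if priority:
        payload["service_tier"] = "priority"  # hypothetical flag
    return payload

# Interactive traffic gets the priority tier; background jobs omit it.
interactive = build_chat_request(
    [{"role": "user", "content": "Summarize this fraud alert."}],
    deployment="my-chat-deployment",
    priority=True,
)
background = build_chat_request(
    [{"role": "user", "content": "Summarize last night's transactions."}],
    deployment="my-chat-deployment",
)
```

The point of the sketch is the shape of the integration: the same deployment serves both request classes, and only a per-call flag separates interactive from background traffic.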

Built for Real-Time AI Experiences 

Enterprise AI deployments frequently combine synchronous and asynchronous workloads such as live chat assistants, internal productivity copilots, scheduled document processing pipelines, and offline summarization jobs.

For example, a financial services organization running real-time fraud detection alongside nightly transaction summarization experienced detection latency spikes during batch windows. With Priority Processing enabled, fraud detection requests maintained consistent response times regardless of background workload volume.

Priority Processing integrates directly into Microsoft Foundry deployments and can be applied across a range of production use cases, including:

  • Real-time customer engagement copilots
  • Interactive developer tools
  • Financial services decisioning workflows
  • Operational dashboards
  • AI-powered agent orchestration scenarios

Organizations can differentiate between background workloads and interactive production applications without modifying their infrastructure or resource management strategies.

 

Customer Spotlight

Organizations including Adobe and Harvey are using Priority Processing in Microsoft Foundry to support latency-sensitive AI experiences while maintaining throughput for background workloads.

 

Early adopters report improved responsiveness for interactive workloads while continuing to process asynchronous jobs without requiring dedicated infrastructure or manual traffic management during peak demand periods.

Pricing

Priority Processing uses the same token-based pricing model as Standard and is available in both Global and Data Zone deployments. For Global deployments, Priority Processing is priced at a premium over the Standard tier (for example, 2× for the latest models such as GPT 5.4), reflecting prioritized access for latency-sensitive workloads. Data Zone deployments carry an additional 10% uplift over Global pricing to support regional data processing requirements.
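Concretely, the multipliers compose as follows. This is a rough sketch in which the Standard rate is a made-up placeholder; only the 2× Global premium and the 10% Data Zone uplift come from the pricing rules above:

```python
# Illustrative pricing math. STANDARD_RATE is a placeholder, not a real
# Foundry price; only the multipliers (2x Global premium, +10% Data Zone
# uplift over Global) are taken from the announcement.

STANDARD_RATE = 1.00  # hypothetical $ per 1M tokens on Standard (Global)

priority_global = STANDARD_RATE * 2.0        # 2x premium for latest models
priority_data_zone = priority_global * 1.10  # +10% uplift over Global

def priority_cost(tokens_millions: float, data_zone: bool = False) -> float:
    """Estimated spend for a given token volume under Priority Processing."""
    rate = priority_data_zone if data_zone else priority_global
    return tokens_millions * rate
```

Note that the Data Zone uplift applies on top of the Priority premium, not on top of the Standard rate, so a Data Zone Priority call costs 2.2× the Standard Global rate under these rules.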

Choose the Right Deployment for Your Workload Needs

Microsoft Foundry provides flexibility for organizations to select deployment options based on three workload considerations:

  • Data processing requirements
  • Latency performance needs
  • Overall throughput requirements

To choose the right deployment for your workload, we recommend the following steps:

Select the Right Data Processing Boundary

  • Global: Broadest model access and highest throughput at the lowest price.
  • Data Zone: Data processed within US/EU boundaries with higher price and lower default throughput.
  • Regional: Strict data residency for regulated environments with reduced model availability.

Align Deployment to Latency Sensitivity

  • Latency-sensitive production workloads should use Priority Processing.
  • Balanced production workloads can run on Standard deployments.
  • Mission-critical, high-scale production workloads should consider Provisioned Throughput.
  • Bulk processing workloads without latency requirements can run using Batch deployments.

Table: Guidance on Selecting the Right Deployment

  Workload Type and Needs        Recommended Deployment Type
  Latency-sensitive production   Priority Processing for Standard
  Balanced production            Standard
  Mission-critical, high-scale   Provisioned Throughput (PTUM)
  Bulk processing                Batch
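The guidance above reads as a simple lookup from workload type to deployment type; a sketch, with workload labels that are illustrative rather than Foundry API values:

```python
# Maps the workload types from the guidance table to the recommended
# deployment type. Keys are illustrative labels, not API enum values.

DEPLOYMENT_GUIDE = {
    "latency_sensitive_production": "Priority Processing for Standard",
    "balanced_production": "Standard",
    "mission_critical_high_scale": "Provisioned Throughput (PTUM)",
    "bulk_processing": "Batch",
}

def recommend_deployment(workload: str) -> str:
    """Return the recommended deployment type for a workload label."""
    return DEPLOYMENT_GUIDE[workload]
```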

Scale from Development to Production

Customers can start with pay-per-call deployments and scale to commitment-based offers as workloads grow, unlocking:

  • Lower cost at scale
  • Service level agreements (SLAs)
  • Enterprise production features

With Foundry full‑stack deployments, customers can flexibly scale AI workloads from development to mission‑critical production—balancing performance, reliability, and cost efficiency without re‑architecting their infrastructure.

Get Started

Evaluate Priority Processing for your production AI applications by reviewing Microsoft Foundry documentation or connecting with your Microsoft account team to assess deployment readiness for latency-sensitive workloads.

Updated Mar 23, 2026
Version 2.0