Blog Post
Managing Traffic Jams with Azure OpenAI PTU Spillover
I tried the PTU Spillover feature on a per-request basis by adding the x-ms-spillover-deployment header. However, it did not work as expected: I still received a 429 response and was not automatically redirected to the standard deployment.
import os
import requests

# Endpoint is read from the environment; deployment names are placeholders.
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
url = endpoint + "/openai/deployments/{ptu-deployment}/chat/completions?api-version=2025-02-01-preview"

payload = {
    "messages": [
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"}
    ],
    "model": "gpt-4o",
    "stream": False,
}
headers = {
    "Content-Type": "application/json",
    # Per-request spillover target: a standard deployment in the same resource
    "x-ms-spillover-deployment": "{spillover-standard-deployment}",
    "Authorization": "Bearer YOUR_AUTH_TOKEN",
}
response = requests.post(url, headers=headers, json=payload)
print(response.text)
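Until spillover behaves as expected, one workaround is a manual fallback: send the request to the PTU deployment first, and retry against the standard deployment on HTTP 429. This is only a sketch under the same assumptions as above (placeholder deployment names, bearer-token auth); the injectable post parameter is not part of any Azure API and exists only so the function can be exercised without a live endpoint.

```python
def chat_with_fallback(endpoint, token, primary, fallback, payload,
                       api_version="2025-02-01-preview", post=None):
    """POST to the primary (PTU) deployment; on HTTP 429, retry once
    against the standard deployment. `post` can be injected for testing."""
    if post is None:
        import requests  # deferred so a fake transport can be used instead
        post = requests.post
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    resp = None
    for deployment in (primary, fallback):
        url = (f"{endpoint}/openai/deployments/{deployment}"
               f"/chat/completions?api-version={api_version}")
        resp = post(url, headers=headers, json=payload)
        if resp.status_code != 429:  # not throttled: use this answer
            return resp
    return resp  # both deployments throttled; surface the last 429
```

This reproduces client-side what spillover is supposed to do service-side, at the cost of one extra round trip whenever the PTU deployment is saturated.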
I believe I have met the following prerequisites:
1. A primary provisioned deployment (model: gpt-4o, Provisioned Managed).
2. A standard deployment (model: gpt-4o, Global Standard).
3. Both deployments are part of the same Azure OpenAI Service resource (deployed in the same region).
The only part I'm unsure about is whether a provisioned deployment can spill over to a global standard deployment.
According to the documentation, both deployments must have the same data processing type (e.g., a global provisioned lane can only spill over to a global standard lane).
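That lane-matching rule can be sketched as a quick local check. The helper and its sku strings are illustrative only, not an official API; the point is just that both deployments must sit in the same resource and in the same data-processing lane (global with global, regional with regional).

```python
def can_spill_over(primary, fallback):
    """Rough check of the documented spillover prerequisites.

    `primary`/`fallback` are dicts like {"resource": ..., "sku": ...},
    where `sku` names the deployment type, e.g. "GlobalProvisionedManaged"
    or "GlobalStandard". Illustrative, not an Azure API.
    """
    if primary["resource"] != fallback["resource"]:
        return False  # must live in the same Azure OpenAI resource
    # Data-processing lanes must match: global pairs with global,
    # regional with regional.
    return (primary["sku"].startswith("Global")
            == fallback["sku"].startswith("Global"))
```

Under this reading, a regional Provisioned Managed deployment paired with a Global Standard one (the setup described above) would not qualify, which could explain the 429s.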
By the way, is there a minimum API version requirement for PTU Spillover?
- jakeatmsft, Apr 21, 2025
Microsoft
In the current preview, PTU spillover only supports Azure OpenAI resources with network access set to "Allow All". All configurations will be supported when the feature reaches General Availability (GA).