A practical, enterprise‑grade approach to model upgrades
Model upgrades in Microsoft Foundry are not a one‑time platform operation; they are an ongoing application lifecycle discipline. Microsoft documentation makes this explicit: models evolve continuously, older versions are eventually retired, and customers are expected to evaluate, migrate, and validate applications ahead of retirement deadlines. In addition, Microsoft clearly separates model versioning from API versioning, which means a safe upgrade strategy must validate both runtime behavior and application contracts.
This blog outlines a practical, enterprise‑grade approach to model upgrades that aligns with current Microsoft guidance. The goal is to help teams move from reactive upgrades to a repeatable, evaluation‑driven release process that minimizes risk while keeping applications current.
1. Treat model upgrades as a planned release motion
Microsoft Foundry models release new versions on a regular cadence. Teams can pin to a specific version or configure deployments to automatically move to newer defaults. However, Microsoft also documents that deployments that remain pinned beyond a retirement date will stop accepting requests.
The implication is straightforward: a “set‑and‑forget” approach is not viable. Model upgrades should be treated like any other production dependency upgrade, similar to upgrading a database engine, a Kubernetes version, or a security library.
Recommended baseline
- Every production AI application has an explicit model owner.
- Each deployment has a documented upgrade policy and retirement tracker.
- A tested fallback deployment is always available.
- Upgrades follow a defined rollout and rollback plan.
This mindset turns upgrades into a predictable release motion instead of an emergency response.
2. Separate model versioning from API versioning
Microsoft recommends separating model versioning from API versioning, and treating them as independent concerns is critical for safe upgrades. The API version is an application contract: it defines request and response semantics, schema expectations, and SDK compatibility. The model version, by contrast, controls runtime behavior such as reasoning quality, latency, token usage, and safety characteristics. In other words, every deployment has two independent dimensions:
- The model version (weights and runtime behavior).
- The API version used by the application to invoke the deployment.
This distinction matters because your migration may involve one of three scenarios:
- Scenario A (model version changes only): prompt behavior, latency, safety, formatting, and reasoning may change, but the API contract remains stable.
- Scenario B (API contract changes only): your code may need updates even if application behavior is conceptually similar.
- Scenario C (both change together): this is the highest‑risk migration, because both runtime behavior and application integration can shift.
A model upgrade does not always require an API version change, and a single deployment can often support multiple API versions. However, risk increases when both change at the same time.
What to validate during upgrades
- Prompt and output behavior
- Structured output and function calling conformance
- Latency and token consumption
- API and SDK compatibility
- Downstream parsers and integrations
Explicitly testing both dimensions avoids silent regressions.
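One lightweight way to cover the “structured output and function calling conformance” item is a regression check that a candidate model still emits the JSON shape your downstream parsers expect. This is a minimal sketch; the key names and sample outputs are illustrative, not part of any Foundry API.

```python
import json


def check_structured_output(raw: str, required_keys: set[str]) -> bool:
    """Return True if the model's raw output parses as JSON and
    contains every key the downstream parser depends on."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and required_keys <= payload.keys()


# Example: a downstream integration expects both of these keys.
baseline_output = '{"summary": "ok", "risk_level": "low"}'
candidate_output = '{"summary": "ok"}'  # candidate model dropped a field

print(check_structured_output(baseline_output, {"summary", "risk_level"}))   # True
print(check_structured_output(candidate_output, {"summary", "risk_level"}))  # False
```

Running checks like this over a golden prompt set during an upgrade surfaces silent formatting regressions before they reach production parsers.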
Sample code: API version pinning and model deployment aliases
```python
import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Required in new Foundry: project endpoint (documented format)
# https://<AIFoundryResourceName>.services.ai.azure.com/api/projects/<ProjectName>
project = AIProjectClient(
    credential=DefaultAzureCredential(),
    endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
)

deployment_alias = os.environ["MODEL_DEPLOYMENT_NAME"]  # e.g., "prod-gpt4o" or "candidate-gpt4o"

# Pin the API contract if you are using the versioned inference surface (YYYY-MM-DD).
# (If you move to the v1 API, you typically don't need dated api_version pins.)
api_version = os.environ.get("AZURE_OPENAI_API_VERSION", "2024-10-21")

with project.get_openai_client(api_version=api_version) as client:
    resp = client.chat.completions.create(
        model=deployment_alias,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize the key risks of model upgrades."},
        ],
        temperature=0.2,
    )
    print(resp.choices[0].message.content)
```
3. Build a centralized model inventory
Before upgrading anything, teams should create a central model inventory across environments. In Microsoft Foundry, upgrade policies are configured per deployment, and behavior at retirement depends on that configuration.
At minimum, the inventory should capture:
- Application and business criticality
- Environment and Azure region
- Deployment type (standard vs. provisioned)
- Model name and model version
- API version used in code
- Upgrade policy and fallback deployment
- Retirement stage and owner
Without this inventory, controlled migration planning is not possible.
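The inventory itself can be as simple as one record per deployment. The sketch below mirrors the fields listed above as a Python dataclass; the field values and the example query are illustrative, not tied to any Foundry API.

```python
from dataclasses import dataclass


@dataclass
class ModelDeploymentRecord:
    """One row in a centralized model inventory (fields mirror the list above)."""
    application: str
    business_criticality: str   # e.g., "tier-1"
    environment: str            # e.g., "prod"
    region: str
    deployment_type: str        # "standard" or "provisioned"
    model_name: str
    model_version: str
    api_version: str
    upgrade_policy: str
    fallback_deployment: str
    retirement_stage: str       # Preview / GA / Legacy / Deprecated / Retired
    owner: str


inventory = [
    ModelDeploymentRecord(
        application="support-copilot", business_criticality="tier-1",
        environment="prod", region="eastus2", deployment_type="standard",
        model_name="gpt-4o", model_version="2024-08-06", api_version="2024-10-21",
        upgrade_policy="manual", fallback_deployment="prod-gpt4o-fallback",
        retirement_stage="GA", owner="ai-platform-team",
    ),
]

# Example query: which deployments need migration planning right now?
at_risk = [r for r in inventory if r.retirement_stage in {"Legacy", "Deprecated"}]
print(len(at_risk))
```

Keeping this in a queryable store (even a spreadsheet export) is what makes the planning steps in the following sections possible.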
4. Choose upgrade policies deliberately
For standard deployments, Microsoft documents three upgrade policy options:
- Opt out of automatic upgrades
- Upgrade when a new default becomes available
- Upgrade when the current version expires
Each option represents a trade‑off between stability and agility.
Recommended enterprise pattern
- Tier 1 (high‑risk / regulated): manual upgrades with explicit certification
- Tier 2 (important): upgrade on expiry
- Tier 3 (low‑risk): upgrade on new default
This tiered approach creates consistency and avoids ad‑hoc decisions across teams.
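The tier mapping can be encoded directly in deployment automation so that no team picks a policy by hand. This is a minimal sketch; the tier names and policy strings are our own labels for the three documented options, not Foundry API values.

```python
# Illustrative mapping of internal risk tiers to the three documented
# upgrade policy options for standard deployments.
TIER_POLICY = {
    "tier-1": "opt-out",             # manual upgrades with explicit certification
    "tier-2": "upgrade-on-expiry",   # move only when the current version expires
    "tier-3": "upgrade-on-default",  # track the new default as it becomes available
}


def upgrade_policy_for(tier: str) -> str:
    """Resolve a deployment's upgrade policy from its application tier,
    defaulting to the most conservative option when the tier is unknown."""
    return TIER_POLICY.get(tier, "opt-out")


print(upgrade_policy_for("tier-3"))   # upgrade-on-default
print(upgrade_policy_for("unknown"))  # opt-out
```

Defaulting unknown tiers to the most conservative policy means a misclassified application fails safe rather than auto‑upgrading.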
5. Establish early warning for retirements
Microsoft provides retirement notifications through Azure Service Health, subscription email alerts, and lifecycle information in the documentation. Models progress through Preview → GA → Legacy → Deprecated → Retired stages, with defined notice windows.
Operational best practice
- Monitor Service Health for upgrade and retirement advisories.
- Map Microsoft notices into internal runbooks.
- Begin evaluation and candidate testing as soon as a model enters Legacy or Deprecated status.
Early awareness turns retirements into planned work rather than outages.
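One way to make “map Microsoft notices into internal runbooks” concrete is a small lookup from lifecycle stage to the action your team takes. The action strings below are illustrative runbook entries, not Foundry concepts.

```python
# Sketch: map the documented lifecycle stages to internal runbook actions.
LIFECYCLE_ACTIONS = {
    "Preview": "optional early evaluation",
    "GA": "routine monitoring",
    "Legacy": "start candidate evaluation",
    "Deprecated": "schedule migration and canary rollout",
    "Retired": "deployment no longer serves traffic; fail over immediately",
}


def runbook_action(stage: str) -> str:
    """Translate a model's lifecycle stage into the next operational step."""
    return LIFECYCLE_ACTIONS.get(stage, "unknown stage: escalate to model owner")


print(runbook_action("Legacy"))  # start candidate evaluation
```

Wiring this into whatever consumes your Service Health alerts turns a retirement notice into a ticket with a defined owner and deadline.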
6. Design for version coexistence
Microsoft notes that not all model versions are available in all regions, and N and N+1 versions may not coexist in the same region. In some cases, upgrades can skip versions due to capacity constraints.
Recommended architecture
- Introduce a model abstraction layer in applications.
- Route requests through deployment aliases or configuration.
- Support side‑by‑side baseline and candidate deployments.
This design enables traffic switching without code redeployment and supports regional testing strategies.
7. Use evaluation as the upgrade gate
Microsoft positions Microsoft Foundry Evaluations as the primary mechanism for comparing model versions across quality, safety, and consistency dimensions. Both UI‑based and code‑first workflows are supported.
Best practice
- Maintain a versioned golden evaluation dataset per application.
- Include representative prompts, edge cases, safety‑sensitive inputs, and RAG scenarios.
- Define application‑specific pass/fail criteria rather than relying on ad‑hoc prompting.
Evaluation, not intuition, should determine upgrade readiness.
8. Use a two‑layer validation model
A mature strategy combines:
- Offline certification using Foundry Evaluations before any traffic shift.
- Controlled online validation using canary traffic to observe real‑world behavior.
Offline tests catch regressions early, while online validation captures production variance that test datasets cannot fully represent.
Offline Evaluation Gate (simple harness pattern)
This is a simple evaluation “gate” harness used to test whether a new AI model (or agent) is good enough before deploying it. Think of it as a quality checkpoint in an AI/LLMOps pipeline.
```python
import json
import os

from azure.ai.evaluation import (
    evaluate,
    RelevanceEvaluator,
    GroundednessEvaluator,
    ContentSafetyEvaluator,
)
from azure.identity import DefaultAzureCredential

# Model config (deployment alias, not version)
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["MODEL_DEPLOYMENT_NAME"],
}

# Define evaluators. Note: ContentSafetyEvaluator is backed by the Azure AI
# safety service, so it takes a credential and an Azure AI project scope
# rather than a model config.
evaluators = {
    "relevance": RelevanceEvaluator(model_config),
    "groundedness": GroundednessEvaluator(model_config),
    "safety": ContentSafetyEvaluator(
        credential=DefaultAzureCredential(),
        azure_ai_project=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
    ),
}

# Golden dataset (example). evaluate() reads its input from a JSONL file,
# so persist the records before running.
dataset = [
    {
        "query": "List key risks in model upgrades",
        "response": "Model upgrades may affect latency, output format, and safety behavior.",
        "context": "Model upgrades change runtime behavior and may impact downstream systems.",
    },
    {
        "query": "Explain rollback strategy",
        "response": "Use side-by-side deployments and config-based traffic routing.",
        "context": "Foundry supports deployment aliases and canary routing.",
    },
]
with open("golden_dataset.jsonl", "w") as f:
    for row in dataset:
        f.write(json.dumps(row) + "\n")

# Run evaluation
results = evaluate(
    data="golden_dataset.jsonl",
    evaluators=evaluators,
)
print(results)
```
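The harness prints raw results; the gate itself is a pass/fail decision against thresholds. This is a minimal sketch of that decision, assuming you have extracted per‑metric scores for the baseline and candidate runs. The metric names, threshold values, and regression budget are illustrative and should come from your own application criteria.

```python
def passes_gate(
    candidate: dict,
    thresholds: dict,
    baseline: dict,
    max_regression: float = 0.05,
) -> bool:
    """A candidate passes when every metric meets its absolute threshold
    and does not regress more than `max_regression` below the baseline."""
    for name, threshold in thresholds.items():
        score = candidate.get(name, 0.0)
        if score < threshold:
            return False  # fails the absolute quality bar
        if score < baseline.get(name, 0.0) - max_regression:
            return False  # regresses too far from the current model
    return True


baseline_scores = {"relevance": 4.2, "groundedness": 4.5}
thresholds = {"relevance": 4.0, "groundedness": 4.0}

print(passes_gate({"relevance": 4.3, "groundedness": 4.5}, thresholds, baseline_scores))  # True
print(passes_gate({"relevance": 3.8, "groundedness": 4.6}, thresholds, baseline_scores))  # False
```

Checking both an absolute floor and a relative regression budget matters: a candidate can beat the thresholds yet still be meaningfully worse than the model it replaces.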
9. Standard vs. provisioned deployments
Microsoft explicitly states that documented upgrade policies apply to standard deployments. Provisioned deployments require separate capacity and model‑management planning.
Enterprises should branch their operational playbooks early:
- Standard deployments use policy‑based upgrades.
- Provisioned deployments require explicit cutover and rollback planning.
10. Next steps toward implementation
The next steps to implement this guidance focus on operationalizing ownership, de‑risking model upgrades through architecture and policy, and gating every change with evidence‑based evaluation.
- Operationalize ownership and visibility: Create a centralized model inventory, assign a clear owner per application, and define upgrade, fallback, and retirement policies so upgrades are planned, not reactive.
- De-risk upgrades through architecture and policy: Choose upgrade policies by risk tier and design for version coexistence (baseline + candidate deployments) to enable safe cutover and rollback without code redeployments.
- Gate every upgrade with evidence: Use Foundry Evaluations and a two‑stage validation approach (offline certification + limited canary traffic) to approve upgrades based on measurable quality, safety, and performance outcomes.
Final recommendation
The most effective upgrade strategy in Microsoft Foundry is neither “upgrade immediately” nor “never change.” Microsoft documentation consistently points to a balanced approach:
Architect for flexibility, operate conservatively, and gate every upgrade with evaluation.
Teams that institutionalize this approach turn model upgrades into a routine, low‑risk release motion and avoid the operational and business impact of unplanned retirements.
References
Model lifecycle, deprecation, and retirement
Design to Support Foundation Model Life Cycles - Azure Architecture Center | Microsoft Learn
API versioning vs. model versioning
Azure OpenAI in Microsoft Foundry Models REST API reference - Microsoft Foundry | Microsoft Learn
Evaluations as an upgrade gate
Run evaluations from the Microsoft Foundry portal - Microsoft Foundry | Microsoft Learn