
Microsoft Foundry Blog

Introducing OpenAI's newest chat model in Microsoft Foundry

Naomi Moneypenny
May 05, 2026

OpenAI's GPT-5.5 Instant (or Chat-latest in the API) begins rolling out in Microsoft Foundry today as GPT-chat-latest. Built on GPT-5.4 and GPT-5.3-chat, the new model delivers measurable gains in factual accuracy, tool calling, and response efficiency. These improvements translate directly into more reliable production deployments.

GPT-chat-latest is designed for the workflows builders are actually shipping: multi-turn assistants, agentic systems that orchestrate tools, and retrieval-grounded applications where precision and grounding matter as much as conversational quality.

Why the name is changing

In Microsoft Foundry, we are introducing GPT-chat-latest as the product name for this release, while the model continues to follow the existing Preview lifecycle and standard notice periods. We are also evaluating ways to simplify how customers access continuously updated models over time, but current behavior remains unchanged as that work continues.

Smarter, more factually reliable

GPT-chat-latest closes the factuality gap from prior iterations with significant reductions in hallucinations, especially in domains where accuracy matters most. According to OpenAI, compared to GPT-5.3-chat the new model produces 52.5% fewer hallucinations overall and 37.3% fewer hallucinated claims on conversations previously flagged for factual errors.

These gains extend beyond text. GPT-chat-latest shows improvements in visual reasoning, expert multimodal understanding, and STEM tasks, with measurable lifts across standard benchmarks:

| Benchmark | Description | GPT-5.3-chat | GPT-chat-latest |
|---|---|---|---|
| CharXiv-reasoning | Scientific chart reasoning | 75.0 | 81.6 |
| MMMU-Pro | Expert multimodal reasoning | 69.2 | 76.0 |
| GPQA | PhD-level science questions | 78.5 | 85.6 |
| AIME 2025 | Competition math | 65.4 | 81.2 |

*Data shown comes from OpenAI's testing.*

For builders shipping into regulated workloads such as clinical decision support, legal research, financial advisory, and technical analysis, these improvements raise the bar on the kinds of applications GPT-chat-latest can assist with.

More efficient outputs

GPT-chat-latest produces responses that are more to the point without losing substance. The model may reduce verbosity and over-formatting, ask fewer follow-up questions, and avoid cluttered output patterns that often require post-processing in production UIs.

For builders, this can translate to two concrete benefits: lower output token costs at scale, and cleaner responses that drop into product surfaces with less downstream cleanup. In comparative testing from OpenAI, GPT-chat-latest produced roughly 25–30% fewer words than GPT-5.3-chat across a range of common prompts while preserving response quality, and in many cases improving it.

Improving intelligence and tool calling

GPT-chat-latest introduces measurable improvements in how the model interacts with tools, including better judgment about when and how to invoke them. The model produces more structured and context-aware tool invocation outputs, which is particularly relevant for workflows that rely on function calling, retrieval-augmented generation, and multi-step reasoning.

Equally important, the model is better at deciding whether a tool is needed in the first place, reducing unnecessary tool calls in scenarios where it already has the information to answer directly.
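The function-calling pattern described above can be sketched in a few lines. This is a minimal illustration in the OpenAI-style tools schema; the `get_order_status` tool, its arguments, and the dispatcher are hypothetical stand-ins, not a Foundry API.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def get_order_status(order_id: str) -> dict:
    # Stand-in for a real lookup against an order system.
    return {"order_id": order_id, "status": "shipped"}

def dispatch(tool_call: dict) -> str:
    """Route a model-produced tool call to the matching local function."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    handlers = {"get_order_status": get_order_status}
    return json.dumps(handlers[name](**args))

# Shape of a tool call as it appears in a chat-completions response.
example_call = {"function": {"name": "get_order_status",
                             "arguments": '{"order_id": "A-123"}'}}
result = dispatch(example_call)
```

The dispatcher's string result is what you would append back to the conversation as a `tool` message for the model to ground its final answer on.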

Improved search and context handling

GPT-chat-latest includes targeted improvements to how the model retrieves, interprets, and synthesizes information when search is involved, with enhancements to query formulation, result ranking, and filtering, plus more grounded synthesis of retrieved content into final responses. These changes improve handling of ambiguous or underspecified queries and reduce noise in answers that depend on retrieved content.

The model also makes better use of the context developers pass in, including system prompts, conversation history, retrieved documents, and structured data. Applications that maintain long-running state or stitch together multiple retrieval steps produce more coherent, context-aware outputs without developers having to over-engineer prompt scaffolding.
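The context stitching described above is usually just careful message assembly. A minimal sketch, assuming a retrieval step has already produced document snippets (the helper name, labeling scheme, and prompt wording are illustrative):

```python
def build_messages(system_prompt, history, retrieved_docs, user_query):
    """Assemble a chat request that grounds the model in retrieved content.

    Retrieved documents are injected as labeled context in the system
    message, a common pattern for retrieval-grounded applications.
    """
    context = "\n\n".join(f"[doc {i + 1}] {d}" for i, d in enumerate(retrieved_docs))
    messages = [{"role": "system",
                 "content": f"{system_prompt}\n\nUse only these sources:\n{context}"}]
    messages.extend(history)  # prior user/assistant turns
    messages.append({"role": "user", "content": user_query})
    return messages

msgs = build_messages(
    "You are a support assistant.",
    [{"role": "user", "content": "Hi"},
     {"role": "assistant", "content": "Hello!"}],
    ["Returns are accepted within 30 days."],
    "Can I return an opened item?",
)
```

Keeping retrieval output in one labeled block makes it easy to audit exactly what grounding material each response saw.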

Use Cases: When to choose the chat model

Developers typically choose a chat-optimized model like GPT-chat-latest when the application needs to sustain multi-turn conversations while reliably following instructions and coordinating external tools. This is a fit for assistants and agentic workflows where the model must interpret user intent over time, decide when to retrieve additional context, and produce structured outputs for downstream systems rather than just generate free-form text.

  • Customer support and contact centers: virtual agents that maintain conversational context across a case, retrieve policy or product documentation via search, and hand off to a ticketing or CRM system through tool calls when escalation is needed.
  • Retail and e-commerce: shopping and service assistants that clarify preferences over multiple turns, reference catalogs and policies via retrieval, and generate structured actions such as returns, exchanges, and order lookups through integrated tools.
  • Manufacturing and field service: technician-facing assistants that combine conversational guidance with retrieval of manuals and work instructions, plus structured task creation in maintenance systems.

 

| Use GPT-chat-latest | Use GPT-5.5 Reasoning |
|---|---|
| Multi-turn assistants and customer-facing chat experiences | Harder problems that benefit from more deliberate, step-by-step thinking |
| Agentic workflows that coordinate tools (search, retrieval, ticketing, CRM) and benefit from structured tool outputs | Complex analysis, planning, or decision support where correctness matters more than conversational flow |
| Interactive experiences where you want quick back-and-forth clarification and task completion | Tasks involving multi-constraint reasoning (policy interpretation, detailed requirements, long-horizon plans) |
| RAG-based apps where the model must decide when to retrieve and then synthesize grounded answers | Offline or low-tool scenarios where the main value is deeper reasoning over provided context |

 

Pricing

| Model | Input ($/1M tokens) | Cached input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|
| GPT-chat-latest | $1.75 | $0.175 | $14.00 |
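To make the pricing concrete, here is a quick back-of-the-envelope estimate using the rates above; the token volumes are purely illustrative.

```python
# Rates from the pricing table above, in dollars per 1M tokens.
INPUT, CACHED_INPUT, OUTPUT = 1.75, 0.175, 14.00

def estimate_cost(input_tok: int, cached_tok: int, output_tok: int) -> float:
    """Estimate spend for a given token volume (counts in raw tokens)."""
    return (input_tok * INPUT
            + cached_tok * CACHED_INPUT
            + output_tok * OUTPUT) / 1_000_000

# Example: 10M fresh input, 40M cached input, 5M output tokens.
cost = estimate_cost(10_000_000, 40_000_000, 5_000_000)
# 10 * 1.75 + 40 * 0.175 + 5 * 14.00 = 94.50 dollars
```

Note how heavily cached input shifts the economics: at one tenth the fresh-input rate, prompt caching dominates savings for assistants that resend long system prompts every turn.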

 

Responsible AI in Microsoft Foundry

At Microsoft, our mission to empower people and organizations remains constant. In the age of AI, trust is foundational to adoption, and earning that trust requires a commitment to transparency, safety, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy models responsibly in production environments, aligned with Microsoft's Responsible AI principles.

Getting started

GPT-chat-latest is rolling out in Microsoft Foundry today.
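As a sketch of a first request, the snippet below builds a chat-completions payload; the deployment name, API version, and environment-variable names are assumptions for illustration, so check the Foundry model catalog for the exact identifiers in your deployment.

```python
import os

# Illustrative deployment name; use the name you chose when deploying
# the model in Microsoft Foundry.
DEPLOYMENT = "gpt-chat-latest"

payload = {
    "model": DEPLOYMENT,
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize our return policy."},
    ],
}

# With credentials configured, send it via the OpenAI Python SDK's
# Azure client (api_version shown is an assumption):
# from openai import AzureOpenAI
# client = AzureOpenAI(
#     api_key=os.environ["AZURE_OPENAI_API_KEY"],
#     api_version="2024-06-01",
#     azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
# )
# response = client.chat.completions.create(**payload)
```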

Updated May 05, 2026
Version 1.0