Microsoft Foundry Blog

Introducing OpenAI’s GPT-5.4 mini and GPT-5.4 nano for low-latency AI

Naomi Moneypenny
Mar 17, 2026

Imagine you’re a developer building a research assistant agent on top of GPT‑5.4. The agent retrieves documents, summarizes findings, and answers follow‑up questions across multiple turns. In early testing, the reasoning quality is strong, but as the agent chains together retrieval, tool calls, and generation, latency starts to add up. For interactive experiences, those delays matter—so many teams adopt a multi‑model approach, using a larger model to plan and smaller models to execute subtasks quickly at scale.

This is where GPT‑5.4 mini and GPT‑5.4 nano come in. These smaller variants of GPT-5.4 are optimized for developer workloads where latency, cost savings, and agentic design are top of mind. GPT-5.4 mini and GPT-5.4 nano are available today in Microsoft Foundry, so you can evaluate them in the model catalog and deploy the right option for each workload.

GPT-5.4 mini: efficient reasoning for production workflows

GPT-5.4 mini distills GPT-5.4’s strengths into a smaller, more efficient model for developer workloads where responsiveness matters. It improves significantly over GPT-5 mini across coding, reasoning, multimodal understanding, and tool use while running roughly twice as fast. Key capabilities include:

  • Text and image inputs: build multimodal experiences that combine prompts with screenshots or other images.
  • Tool use and function calling: reliably invoke tools and APIs for agentic workflows.
  • Web search and file search: ground responses in external or enterprise content as part of multi-step tasks.
  • Computer use: support software-interaction loops where the model interprets UI state and takes well-scoped actions.

Where GPT-5.4 mini thrives

  • Developer copilots and coding assistants: latency-sensitive coding help, code review suggestions, and fast iteration loops where turnaround time matters.
  • Multimodal developer workflows: applications that interpret screenshots, understand UI state, or process images as part of coding and debugging loops.
  • Computer-use sub-agents: fast executors that take well-scoped actions in software (for example, navigating UIs or completing repetitive steps) within a larger agent loop coordinated by a planner model.

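The tool-use and function-calling support above can be sketched as a request payload. Everything specific here is an assumption for illustration — the deployment name `gpt-5.4-mini`, the hypothetical `search_docs` tool, and its schema are not documented values — but the payload follows the standard chat-completions tool-calling shape:

```python
# Sketch of a function-calling request for GPT-5.4 mini.
# The deployment name and the "search_docs" tool are illustrative assumptions.

def build_tool_call_request(user_prompt: str) -> dict:
    """Assemble a chat-completions payload that exposes one callable tool."""
    search_docs_tool = {
        "type": "function",
        "function": {
            "name": "search_docs",  # hypothetical tool name
            "description": "Search internal documentation for a query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search terms."},
                },
                "required": ["query"],
            },
        },
    }
    return {
        "model": "gpt-5.4-mini",  # assumed deployment name
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [search_docs_tool],
        "tool_choice": "auto",  # let the model decide when to invoke the tool
    }

payload = build_tool_call_request("Find the retry policy for the orders API.")
```

A planner model in a larger agent loop could hand prompts like this to mini for fast execution of well-scoped subtasks.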
GPT-5.4 nano: ultra-low latency automation at scale

GPT-5.4 nano is the smallest and fastest model in the lineup, designed for low-latency and low-cost API usage at high throughput. It’s optimized for short-turn tasks like classification, extraction, and ranking, plus lightweight sub-agent work where speed and cost are the priority and extended multi-step reasoning isn’t required.

  • Strong instruction following: consistent adherence to developer intent across short, well-defined interactions.
  • Function and tool calling: dependable invocation of tools and APIs for lightweight agent and automation scenarios.
  • Coding support: optimized performance for common coding tasks where fast turnaround is required.
  • Image understanding: multimodal image input support for basic image interpretation alongside text.
  • Low-latency, low-cost execution: designed to deliver responses quickly and efficiently at scale.

Where GPT-5.4 nano thrives

GPT-5.4 nano is a strong fit when you need predictable behavior at very high throughput and the task can be expressed as short, well-scoped instructions.

  • Classification and intent detection: fast labeling and routing decisions for high-volume requests.
  • Extraction and normalization: pull structured fields from text, validate formats, and standardize outputs.
  • Ranking and triage: reorder candidates, prioritize tickets/leads, and select best-next actions under tight latency budgets.
  • Guardrails and policy checks: lightweight safety and policy classification, prompt gating, and enforcement decisions before dispatching to tools or larger models.
  • High-volume text processing pipelines: batch transformation, cleanup, deduping, and normalization steps where unit cost and throughput dominate.
  • Routing and prioritization at the edge: select the right downstream workflow (template, queue, or model) for each request before heavier processing begins.
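
A classification call to nano can be shaped to keep turns short and outputs constrained. The deployment name `gpt-5.4-nano`, the label set, and the system prompt below are all illustrative assumptions:

```python
# Sketch of a short-turn classification payload for GPT-5.4 nano.
# Labels, deployment name, and prompt wording are assumptions, not documented values.

LABELS = ["billing", "bug_report", "feature_request", "other"]

def build_classification_request(ticket_text: str) -> dict:
    """Build a payload that asks for exactly one label, nothing else."""
    system = (
        "Classify the support ticket into exactly one label: "
        + ", ".join(LABELS)
        + ". Reply with the label only."
    )
    return {
        "model": "gpt-5.4-nano",  # assumed deployment name
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": ticket_text},
        ],
        "max_tokens": 5,   # short, well-scoped output keeps latency low
        "temperature": 0,  # deterministic labeling for routing decisions
    }
```

Capping `max_tokens` and pinning `temperature` to 0 keeps responses fast and repeatable, which is what high-volume routing pipelines need.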

Choosing the right GPT-5.4 model

Microsoft Foundry makes it possible to deploy multiple GPT-5.4 variants side by side, so teams can route requests to the model that best fits each task. Here’s a practical way to think about the lineup:

| Model | Best suited for | Typical workloads |
| --- | --- | --- |
| GPT-5.4 | Sustained, multi-step reasoning with reliable follow-through | Agentic workflows, research assistants, document analysis, complex internal tools |
| GPT-5.4 Pro | Deeper, higher-reliability reasoning for complex production scenarios | High-stakes agentic workflows, long-form analysis and synthesis, complex planning, advanced internal copilots |
| GPT-5.4 mini | Balanced reasoning with lower latency for interactive systems | Real-time agents, developer tools, retrieval-augmented applications |
| GPT-5.4 nano | Ultra-low latency and high throughput | High-volume request routing, real-time chat, lightweight automation |
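
One way to encode that guidance as a routing policy is a small dispatcher. The task-profile fields, the decision order, and the model names below are illustrative assumptions, not a prescribed routing scheme:

```python
# Minimal routing sketch based on the model lineup above.
# Fields and decision order are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class TaskProfile:
    multi_step: bool   # needs sustained reasoning across steps
    interactive: bool  # a user is waiting on the response
    high_stakes: bool  # errors are costly; reliability dominates
    short_turn: bool   # classification/extraction-style request

def pick_model(task: TaskProfile) -> str:
    """Map a task profile to an assumed deployment name."""
    if task.short_turn and not task.multi_step:
        return "gpt-5.4-nano"  # ultra-low latency, high throughput
    if task.interactive and not task.high_stakes:
        return "gpt-5.4-mini"  # balanced reasoning, lower latency
    if task.high_stakes:
        return "gpt-5.4-pro"   # deeper, higher-reliability reasoning
    return "gpt-5.4"           # default: sustained multi-step reasoning
```

Because Foundry lets you deploy the variants side by side, a dispatcher like this can sit in front of all four deployments and route each request per call.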

Responsible AI in Microsoft Foundry

At Microsoft, our mission to empower people and organizations remains constant. In the age of AI, trust is foundational to adoption, and earning that trust requires a commitment to transparency, safety, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy GPT-5.4 models responsibly in production environments, aligned with Microsoft's Responsible AI principles.

Pricing

| Model | Deployment | Input (USD $/M tokens) | Cached input (USD $/M tokens) | Output (USD $/M tokens) |
| --- | --- | --- | --- | --- |
| GPT-5.4 mini | Standard Global | $0.75 | $0.075 | $4.50 |
| GPT-5.4 nano | Standard Global | $0.20 | $0.02 | $1.25 |

Both models are also available in Data Zone US and are rolling out to Data Zone EU.
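
A quick way to turn those per-million-token rates into a spend estimate; the token counts in the example are invented for illustration:

```python
# Cost estimate from the Standard Global prices above (USD per 1M tokens).
# Token counts in the example are made up for illustration.

PRICES = {  # model: (input, cached input, output) per million tokens
    "gpt-5.4-mini": (0.75, 0.075, 4.50),
    "gpt-5.4-nano": (0.20, 0.02, 1.25),
}

def estimate_cost(model: str, input_toks: int, cached_toks: int, output_toks: int) -> float:
    """Return the USD cost for a given token mix."""
    inp, cached, out = PRICES[model]
    return (input_toks * inp + cached_toks * cached + output_toks * out) / 1_000_000

# Example: 2M fresh input, 1M cached input, 0.5M output tokens on nano.
cost = estimate_cost("gpt-5.4-nano", 2_000_000, 1_000_000, 500_000)
```

Cached input is an order of magnitude cheaper than fresh input on both models, so pipelines with stable system prompts benefit from structuring requests to maximize cache hits.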

Getting started

Explore the models in Microsoft Foundry. Sign in to the Foundry portal and browse the model catalog to evaluate GPT-5.4 mini and GPT-5.4 nano alongside other options, then deploy the right model for each workload.

Updated Mar 17, 2026
Version 3.0