artificial intelligence
68 Topics
Foundry IQ: Unlocking ubiquitous knowledge for agents
Introducing Foundry IQ by Azure AI Search in Microsoft Foundry. Foundry IQ is a centralized knowledge layer that connects agents to data with the next generation of retrieval-augmented generation (RAG). Foundry IQ includes the following features:
- Knowledge bases: Available directly in the new Foundry portal, knowledge bases are reusable, topic-centric collections that ground multiple agents and applications through a single API.
- Automated indexed and federated knowledge sources: Expand what data an agent can reach by connecting to both indexed and remote knowledge sources. For indexed sources, Foundry IQ delivers automatic indexing, vectorization, and enrichment for text, images, and complex documents.
- Agentic retrieval engine in knowledge bases: A self-reflective query engine that uses AI to plan, select sources, search, rank, and synthesize answers across sources with configurable "retrieval reasoning effort."
- Enterprise-grade security and governance: Support for document-level access control, alignment with existing permissions models, and options for both indexed and remote data.
Foundry IQ is available in public preview through the new Foundry portal and Azure portal with Azure AI Search. Foundry IQ is part of Microsoft's intelligence layer with Fabric IQ and Work IQ.
42K Views, 6 likes, 4 Comments
New Azure OpenAI models bring fast, expressive, and real-time AI experiences in Microsoft Foundry
Modern AI applications, whether they are voice-first experiences or large software systems, rarely fit into a single prompt. Real work unfolds over time: maintaining context, following instructions, invoking tools, and adapting as requirements evolve. When these foundations break down through latency spikes, instruction drift, or unreliable tool calls, both user conversations and developer workflows are impacted. OpenAI's latest models address this shared challenge by prioritizing continuity and reliability across real-time interaction and long-running engineering tasks. Starting today, GPT-Realtime-1.5, GPT-Audio-1.5, and GPT-5.3-Codex are rolling out into Microsoft Foundry. Together, these models reflect the growing needs of the modern developer and move the needle from short, stateless interactions toward AI systems that can reason, act, and collaborate over time.
GPT-5.3-Codex at a glance
GPT-5.3-Codex brings together advanced coding capability with broader reasoning and professional problem solving in a single model built for real engineering work. It unifies the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 in one system. This shifts the experience from optimizing isolated outputs to supporting longer-running development efforts, where repositories are large, changes span multiple steps, and requirements aren't always fully specified at the start.
What's improved
- According to OpenAI, the model runs 25% faster than its predecessors, so developers can accelerate development of new applications.
- Built for long-running tasks that involve research, tool use, and complex, multi-step execution while maintaining context.
- Mid-task steerability and frequent updates allow developers to redirect and collaborate with the model as it works without losing context.
- Stronger computer-use capabilities allow developers to execute across the full spectrum of technical work.
Common use cases
Developers and teams can apply GPT-5.3-Codex across a wide range of scenarios, including:
- Refactoring and modernizing large or legacy applications
- Performing multi-step migrations or upgrades
- Running agentic developer workflows that span analysis, implementation, testing, and remediation
- Automating code reviews, test generation, and defect detection
- Supporting development in security-sensitive or regulated environments
Pricing
| Model | Input Price/1M Tokens | Cached Input Price/1M Tokens | Output Price/1M Tokens |
|---|---|---|---|
| GPT-5.3-Codex | $1.75 | $0.175 | $14.00 |
GPT-Realtime-1.5 and GPT-Audio-1.5 at a glance
The models deliver measurable gains in reasoning and speech understanding for real-time voice interactions on Microsoft Foundry. OpenAI's evaluations show a +5% lift on Big Bench Audio (reasoning), a +10.23% improvement in alphanumeric transcription, and a +7% gain in instruction following, while maintaining low-latency performance. Key improvements include:
What's improved
- More natural-sounding speech: Audio output is smoother and more conversational, with improved pacing and prosody.
- Higher audio quality: Clearer, more consistent audio output across supported voices.
- Improved instruction following: Better alignment with developer-provided system and user instructions during live interactions.
- Function calling support: Enables structured, tool-driven interactions within real-time audio flows.
Common use cases
Developers are using GPT-Realtime-1.5 and GPT-Audio-1.5 for scenarios where low-latency voice interaction is essential, including:
- Conversational voice agents for customer support or internal help desks
- Voice-enabled assistants embedded in applications or devices
- Live voice interfaces for kiosks, demos, and interactive experiences
- Hands-free workflows where audio input and output replace keyboard interaction
Pricing
| Model | Text input | Text cached input | Text output | Audio input | Audio cached input | Audio output | Image input | Image cached input | Image output |
|---|---|---|---|---|---|---|---|---|---|
| GPT-Realtime-1.5 | $4.00 | $0.04 | $16.00 | $32.00 | $0.40 | $64.00 | $4.00 | $0.04 | $16.00 |
| GPT-Audio-1.5 | $2.50 | n/a | $10.00 | $32.00 | n/a | $64.00 | $2.50 | n/a | $10.00 |
Getting started in Microsoft Foundry
Start building in Microsoft Foundry, evaluate performance, and explore Azure OpenAI models today. Foundry brings evaluation, deployment, and governance into a single workflow, helping teams progress from experiments to scalable applications while maintaining security and operational controls.
16K Views, 1 like, 0 Comments
Introducing OpenAI's GPT-image-2 in Microsoft Foundry
Take a small design team running a global social campaign. They have the creative vision to produce localized imagery for every market, but not the resources to reshoot, reformat, or outsource at that scale. Every asset needs to fit a different platform, a different dimension, a different cultural context, and they all need to ship at the same time. This is where flexible image generation comes in handy. OpenAI's GPT-image-2 is now generally available and rolling out today to Microsoft Foundry, introducing a step change in image generation. Developers and designers now get more control over image output, so a small team can execute with the reach and flexibility of a much larger one.
What is new in GPT-image-2?
GPT-image-2 brings real world intelligence, multilingual understanding, improved instruction following, increased resolution support, and an intelligent routing layer, giving developers the tools to scale image generation for production workflows.
Real world intelligence
GPT-image-2 has a knowledge cutoff of December 2025, meaning that it is able to give you more contextually relevant and accurate outputs. The model also comes with enhanced thinking capabilities that allow it to search the web, check its own outputs, and create multiple images from just one prompt. These enhancements shift image generation models away from being simple tools and turn them into creative sidekicks.
Multilingual understanding
GPT-image-2 includes increased language support across Japanese, Korean, Chinese, Hindi, and Bengali, as well as new thinking capabilities. This means the model can create images and render text that feels localized.
Increased resolution support
GPT-image-2 introduces 4K resolution support, giving developers the ability to generate rich, detailed, and photorealistic images at custom dimensions.
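Custom dimensions are subject to the resolution guidelines in this section: a total pixel budget between 655,360 and 8,294,400 pixels, and each dimension a multiple of 16. A minimal client-side pre-check might look like the sketch below. The function name and messages are illustrative and not part of any SDK, and because this post does not say how the service treats requests under the minimum, the check only flags that case.

```python
# Limits quoted in this post's resolution guidelines.
MAX_PIXELS = 8_294_400
MIN_PIXELS = 655_360
ALIGNMENT = 16


def check_image_size(width: int, height: int) -> list[str]:
    """Return guideline issues for a requested size; an empty list means it fits."""
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]

    issues = []
    # Each dimension must be a multiple of 16.
    for name, value in (("width", width), ("height", height)):
        if value % ALIGNMENT != 0:
            issues.append(f"{name} {value} is not a multiple of {ALIGNMENT}")

    # The total pixel count must stay inside the stated budget.
    pixels = width * height
    if pixels > MAX_PIXELS:
        issues.append(f"{pixels} pixels is over the {MAX_PIXELS} budget (would be resized down)")
    elif pixels < MIN_PIXELS:
        issues.append(f"{pixels} pixels is under the {MIN_PIXELS} minimum")
    return issues


if __name__ == "__main__":
    for size in [(1024, 1024), (3840, 2160), (1000, 1000), (512, 512)]:
        print(size, check_image_size(*size) or "ok")
```

Note that 3840x2160 lands exactly on the 8,294,400 pixel maximum, and 1024x640 lands exactly on the 655,360 minimum, so both pass this check.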
Resolution guidelines to keep in mind:
| Constraint | Detail |
|---|---|
| Total pixel budget | Maximum pixels in final image cannot exceed 8,294,400. Minimum pixels in final image cannot be less than 655,360. Requests exceeding this are automatically resized to fit. |
| Resolutions | 4K, 1024x1024, 1536x1024, and 1024x1536 |
| Dimension alignment | Each dimension must be a multiple of 16 |
Note: If your requested resolution exceeds the pixel budget, the service will automatically resize it down.
Intelligent routing layer
GPT-image-2 also includes an expanded routing layer with two distinct modes, allowing the service to intelligently select the right generation configuration for a request without requiring an explicitly set size value.
Mode 1: Legacy size selection
In Mode 1, the routing layer selects one of the three legacy size tiers to use for generation:
| Size tier | Description |
|---|---|
| smimage | Small image output |
| image | Standard image output |
| xlimage | Large image output |
This mode is useful for teams already familiar with the legacy size tiers who want to benefit from automatic selection without making any manual changes.
Mode 2: Token size bucket selection
In Mode 2, the routing layer selects from six token size buckets (16, 24, 36, 48, 64, 96), which map roughly to the legacy size tiers:
| Token bucket | Approximate legacy size |
|---|---|
| 16, 24 | smimage |
| 36, 48 | image |
| 64, 96 | xlimage |
This approach can allow for more flexibility in the number of tokens generated, which in turn helps to better optimize output quality and efficiency for a given prompt.
See it in action
GPT-image-2 shows improved image fidelity across visual styles, generating more detailed and refined images. But don't just take our word for it; let's see the model in action with a few prompts and edits. Here is the example we used:
Prompt: Interior of an empty subway car (no people). Wide-angle view looking down the aisle. Clean, modern subway car with seats, poles, route map strip, and ad frames above the windows.
Realistic lighting with a slight cool fluorescent tone, realistic materials (metal poles, vinyl seats, textured floor).
As you can see, when using the same base prompt, the image quality and realism improved with each model. Now let's take a look at adding incremental changes to the same image:
Prompt: Populate the ad frames with a cohesive ad campaign for "Zava Flower Delivery" and use an array of flower types.
And our subway is now full of ads for the new Zava flower delivery service. Let's ask for another small change:
Prompt: In all Zava Flower Delivery advertisements, change the flowers shown to roses (red and pink roses).
And in three simple prompts, we've created a mockup of a flower delivery ad. From marketing material to website creation to UX design, GPT-image-2 now allows developers to deliver production-grade assets for real business use cases.
Image generation across industries
These new capabilities open the door to richer, more production-ready image generation workflows across a range of enterprise scenarios:
- Retail & e-commerce: Generate product imagery at exact platform-required dimensions, from square thumbnails to wide banners, without post-processing.
- Marketing: Produce crisp, richly colored campaign visuals and social assets localized to different markets.
- Media & entertainment: Generate storyboard panels and scenes at resolutions suited to production pipelines.
- Education & training: Create visual learning aids and course materials formatted to exact display requirements across devices.
- UI/UX design: Accelerate mockup and prototype workflows by generating interface assets at the precise dimensions your design system requires.
Trust and safety
At Microsoft, our mission to empower people and organizations remains constant. As part of this commitment, models made available through Foundry undergo internal reviews and are deployed with safeguards designed to support responsible use at scale. Learn more about responsible AI at Microsoft.
For GPT-image-2, Microsoft applied an in-depth safety approach that addresses disallowed content and misuse while maintaining human oversight. The deployment combines OpenAI's image generation safety mitigations with Azure AI Content Safety, including filters and classifiers for sensitive content.
Pricing
| Model | Offer type | Pricing - Image | Pricing - Text |
|---|---|---|---|
| GPT-image-2 | Standard Global | Input Tokens: $8; Cached Input Tokens: $2; Output Tokens: $30 | Input Tokens: $5; Cached Input Tokens: $1.25 |
Note: All prices are per 1M tokens. There is no billing for output tokens for the GPT-image-2 model.
Getting started
Whether you're building a personalized retail experience, automating visual content pipelines, or accelerating design workflows, GPT-image-2 gives your team the resolution control and intelligent routing to generate images that fit your exact needs. Try GPT-image-2 in Microsoft Foundry today!
- Deploy the model in Microsoft Foundry
- Experiment with the model in the Image playground
- Read the documentation to learn more
15K Views, 3 likes, 3 Comments
Introducing OpenAI's GPT-5.4 mini and GPT-5.4 nano for low-latency AI
Imagine you’re a developer building a research assistant agent on top of GPT‑5.4. The agent retrieves documents, summarizes findings, and answers follow‑up questions across multiple turns. In early testing, the reasoning quality is strong, but as the agent chains together retrieval, tool calls, and generation, latency starts to add up. For interactive experiences, those delays matter—so many teams adopt a multi‑model approach, using a larger model to plan and smaller models to execute subtasks quickly at scale. This is where GPT‑5.4 mini and GPT‑5.4 nano come in. These smaller variants of GPT-5.4 are optimized for developer workloads where latency, cost savings, and agentic design are top of mind. GPT-5.4 mini and GPT-5.4 nano will be rolling out today in Microsoft Foundry, so you can evaluate them in the model catalog and deploy the right option for each workload. GPT-5.4 mini: efficient reasoning for production workflows GPT-5.4 mini distills GPT-5.4’s strengths into a smaller, more efficient model for developer workloads where responsiveness matters. It significantly improves over GPT-5 mini across coding, reasoning, multimodal understanding, and tool use while running about 2X faster. Text and image inputs: build multimodal experiences that combine prompts with screenshots or other images. Tool use and function calling: reliably invoke tools and APIs for agentic workflows. Web search and file search: ground responses in external or enterprise content as part of multi-step tasks. Computer use: support software-interaction loops where the model interprets UI state and takes well-scoped actions. Where GPT-5.4 mini thrives Developer copilots and coding assistants: latency-sensitive coding help, code review suggestions, and fast iteration loops where turnaround time matters. Multimodal developer workflows: applications that interpret screenshots, understand UI state, or process images as part of coding and debugging loops. 
Computer-use sub-agents: fast executors that take well-scoped actions in software (for example, navigating UIs or completing repetitive steps) within a larger agent loop coordinated by a planner model. GPT-5.4 nano: ultra-low latency automation at scale GPT-5.4 nano is the smallest and fastest model in the lineup, designed for low-latency and low-cost API usage at high throughput. It’s optimized for short-turn tasks like classification, extraction, and ranking, plus lightweight sub-agent work where speed and cost are the priority and extended multi-step reasoning isn’t required. Strong instruction following: consistent adherence to developer intent across short, well-defined interactions. Function and tool calling: dependable invocation of tools and APIs for lightweight agent and automation scenarios. Coding support: optimized performance for common coding tasks where fast turnaround is required. Image understanding: multimodal image input support for basic image interpretation alongside text. Low-latency, low-cost execution: designed to deliver responses quickly and efficiently at scale. Where GPT-5.4 nano thrives GPT-5.4 nano is a strong fit when you need predictable behavior at very high throughput and the task can be expressed as short, well-scoped instructions. Classification and intent detection: fast labeling and routing decisions for high-volume requests. Extraction and normalization: pull structured fields from text, validate formats, and standardize outputs. Ranking and triage: reorder candidates, prioritize tickets/leads, and select best-next actions under tight latency budgets. Guardrails and policy checks: lightweight safety and policy classification, prompt gating, and enforcement decisions before dispatching to tools or larger models. High-volume text processing pipelines: batch transformation, cleanup, deduping, and normalization steps where unit cost and throughput dominate. 
Routing and prioritization at the edge: select the right downstream workflow (template, queue, or model) for each request under tight latency budgets.
Choosing the right GPT-5.4 model
Microsoft Foundry makes it possible to deploy multiple GPT-5.4 variants side by side, so teams can route requests to the model that best fits each task. Here's a practical way to think about the lineup:
| Model | Best suited for | Typical workloads |
|---|---|---|
| GPT-5.4 | Sustained, multi-step reasoning with reliable follow-through | Agentic workflows, research assistants, document analysis, complex internal tools |
| GPT-5.4 Pro | Deeper, higher-reliability reasoning for complex production scenarios | High-stakes agentic workflows, long-form analysis and synthesis, complex planning, advanced internal copilots |
| GPT-5.4 mini | Balanced reasoning with lower latency for interactive systems | Real-time agents, developer tools, retrieval-augmented applications |
| GPT-5.4 nano | Ultra-low latency and high throughput | High-volume request routing, real-time chat, lightweight automation |
Responsible AI in Microsoft Foundry
At Microsoft, our mission to empower people and organizations remains constant. In the age of AI, trust is foundational to adoption, and earning that trust requires a commitment to transparency, safety, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy GPT-5.4 models responsibly in production environments, aligned with Microsoft's Responsible AI principles.
Pricing
| Model | Deployment | Input (USD $/M tokens) | Cached input (USD $/M tokens) | Output (USD $/M tokens) |
|---|---|---|---|---|
| GPT-5.4 mini | Standard Global | $0.75 | $0.075 | $4.50 |
| GPT-5.4 nano | Standard Global | $0.20 | $0.02 | $1.25 |
The models are also available in Data Zone US. They are rolling out to Data Zone EU.
Getting started
Explore the models in Microsoft Foundry.
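When weighing which variant to evaluate for a workload, a rough cost comparison can help. The sketch below uses the Standard Global prices from the pricing table in this post. The dictionary keys are illustrative labels rather than confirmed deployment names, and the split between uncached and cached input tokens is an assumption about how usage is reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens, copied from the pricing table in this post."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


# Illustrative labels (not confirmed deployment names), Standard Global pricing.
PRICES = {
    "gpt-5.4-mini": Price(0.75, 0.075, 4.5),
    "gpt-5.4-nano": Price(0.20, 0.02, 1.25),
}


def estimate_cost(model: str, input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost; input_tokens is assumed to exclude cached input tokens."""
    price = PRICES[model]
    total = (
        input_tokens * price.input_per_m
        + cached_input_tokens * price.cached_input_per_m
        + output_tokens * price.output_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M uncached input tokens and 200K output tokens.
    for model in PRICES:
        print(model, round(estimate_cost(model, 1_000_000, 0, 200_000), 4))
```

For that hypothetical workload the estimate comes to about $1.65 on GPT-5.4 mini and about $0.45 on GPT-5.4 nano, which is the kind of gap that makes routing short, well-scoped tasks to the smaller model worth testing.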
Sign in to the Foundry portal and browse the model catalog to evaluate GPT-5.4 mini and GPT-5.4 nano alongside other options, then deploy the right model for each workload.
14K Views, 0 likes, 1 Comment
Introducing OpenAI's GPT-image-1.5 in Microsoft Foundry
Developers building with visual AI can often run into the same frustrations: images that drift from the prompt, inconsistent object placement, text that renders unpredictably, and editing workflows that break when iterating on a single asset. That's why we are excited to announce OpenAI's GPT Image 1.5 is now generally available in Microsoft Foundry. This model can bring sharper image fidelity, stronger prompt alignment, and faster image generation that supports iterative workflows. Starting today, customers can request access to the model and start building in the Foundry platform.
Meet GPT Image 1.5
AI-driven image generation began with early models like OpenAI's DALL-E, which introduced the ability to transform text prompts into visuals. Since then, image generation models have been evolving to enhance multimodal AI across industries. GPT Image 1.5 represents continuous improvement in enterprise-grade image generation. Building on the success of GPT Image 1 and GPT Image 1 mini, this enhanced model introduces advanced capabilities that cater to both creative and operational needs. The new image model offers:
- Text-to-image: Stronger instruction following and highly precise editing.
- Image-to-image: Transform existing images to iteratively refine specific regions.
- Improved visual fidelity: More detailed scenes and realistic rendering.
- Accelerated creation times: Up to 4x faster generation speed.
- Enterprise integration: Deploy and scale securely in Microsoft Foundry.
GPT Image 1.5 delivers stronger image preservation and editing capabilities, maintaining critical details like facial likeness, lighting, composition, and color tone across iterative changes. You'll see more consistent preservation of branded logos and key visuals, making it especially powerful for marketing, brand design, and ecommerce workflows, from graphics and logo creation to generating full product catalogs (variants, environments, and angles) from a single source image.
Benchmarks
Based on an internal Microsoft dataset, GPT Image 1.5 scores higher than other image generation models in prompt alignment and infographics tasks. It focuses on making clear, strong edits, performing best on single-turn modification and delivering the highest visual quality in both single-turn and multi-turn settings. The following results were found across image generation and editing:
Text to image
| Model | Prompt alignment | Diagram / Flowchart |
|---|---|---|
| GPT Image 1.5 | 91.2% | 96.9% |
| GPT Image 1 | 87.3% | 90.0% |
| Qwen Image | 83.9% | 33.9% |
| Nano Banana Pro | 87.9% | 95.3% |
Image editing
| Setting | Model | Modification (BinaryEval) | Preservation (SC, semantic) | Preservation (DINO, visual) | Visual Quality (BinaryEval) | Face Preservation (AuraFace) |
|---|---|---|---|---|---|---|
| Single-turn | GPT Image 1 | 99.2% | 51.0% | 0.14 | 79.5% | 0.30 |
| Single-turn | Qwen Image | 81.9% | 63.9% | 0.44 | 76.0% | 0.85 |
| Single-turn | GPT Image 1.5 | 100% | 56.77% | 0.14 | 89.96% | 0.39 |
| Multi-turn | GPT Image 1 | 93.5% | 54.7% | 0.10 | 82.8% | 0.24 |
| Multi-turn | Qwen Image | 77.3% | 68.2% | 0.43 | 77.6% | 0.63 |
| Multi-turn | GPT Image 1.5 | 92.49% | 60.55% | 0.15 | 89.46% | 0.28 |
Using GPT Image 1.5 across industries
Whether you're creating immersive visuals for campaigns, accelerating UI and product design, or producing assets for interactive learning, GPT Image 1.5 gives modern enterprises the flexibility and scalability they need. Image models can allow teams to drive deeper engagement through compelling visuals, speed up design cycles for apps, websites, and marketing initiatives, and support inclusivity by generating accessible, high-quality content for diverse audiences.
Watch how Foundry enables developers to iterate with multimodal AI across Black Forest Labs, OpenAI, and more:
Microsoft Foundry empowers organizations to deploy these capabilities at scale, integrating image generation seamlessly into enterprise workflows. Explore the use of AI image generation here across industries like:
Retail: Generate product imagery for catalogs, e-commerce listings, and personalized shopping experiences.
Marketing: Create campaign visuals and social media graphics.
Education: Develop interactive learning materials or visual aids.
Entertainment: Edit storyboards, character designs, and dynamic scenes for films and games.
UI/UX: Accelerate design workflows for apps and websites.
Microsoft Foundry provides security and compliance with built-in content safety filters, role-based access, network isolation, and Azure Monitor logging. Integrated governance via Azure Policy, Purview, and Sentinel gives teams real-time visibility and control, so privacy and safety are embedded in every deployment. Learn more about responsible AI at Microsoft.
Pricing
| Model | Pricing (per 1M tokens) - Global |
|---|---|
| GPT-image-1.5 | Input Tokens: $8; Cached Input Tokens: $2; Output Tokens: $32 |
Cost efficiency improves as well: image inputs and outputs are now cheaper compared to GPT Image 1, enabling organizations to generate and iterate on more creative assets within the same budget. For detailed pricing, refer here.
Getting started
Learn more about image generation, explore code samples, and read about responsible AI protections here. Try GPT Image 1.5 in Microsoft Foundry and start building multimodal experiences today. Whether you're designing educational materials, crafting visual narratives, or accelerating UI workflows, these models deliver the flexibility and performance your organization needs.
9.3K Views, 2 likes, 1 Comment
Building Production-Ready, Secure, Observable, AI Agents with Real-Time Voice with Microsoft Foundry
We're excited to announce the general availability of Foundry Agent Service, Observability in Foundry Control Plane, and the Microsoft Foundry portal, plus Voice Live integration with Agent Service in public preview, giving teams a production-ready platform to build, deploy, and operate intelligent AI agents with enterprise-grade security and observability.
9.3K Views, 2 likes, 0 Comments
Foundry IQ: boost response relevance by 36% with agentic retrieval
The latest RAG performance evaluations and results for knowledge bases and the built-in agentic retrieval engine. Foundry IQ by Azure AI Search is a unified knowledge layer for agents, designed to improve response performance, automate RAG workflows, and enable enterprise-ready grounding. These evaluations tested RAG performance for knowledge bases and new features including retrieval reasoning effort and federated sources like web and SharePoint for M365. Foundry IQ and Azure AI Search are part of Microsoft Foundry.
7.9K Views, 5 likes, 0 Comments
Announcing GPT-5-Codex: Redefining Developer Experience in Azure AI Foundry
Today, we're excited to announce OpenAI's GPT-5-Codex is generally available in Azure AI Foundry, and in public preview for GitHub Copilot in Visual Studio Code. This release is the next step in our continuous commitment to empower developers with the latest model innovation, now building on the proven strengths of the earlier Codex generation along with the speed and CLI fluency many teams have adopted with the latest codex-mini.
Next-level features for developers
- Multimodal coding in a single flow: GPT-5-Codex accepts multimodal inputs including text and image. With this multimodal intelligence, developers can tackle complex tasks and deliver context-aware, repository-scale solutions in a single workflow.
- Advanced tool use across various experiences: GPT-5-Codex is built for real-world developer experiences. Developers in Azure AI Foundry can get seamless automation and deep integration via the Responses API, improving productivity and reducing development time.
- Code review expertise: GPT-5-Codex is specially trained to conduct code reviews and surface critical flaws, helping developers catch issues early and improve code quality with AI-powered insights. It transforms code review from a manual bottleneck into an intelligent, adaptive, and integrated process, empowering developers to deliver a high-quality code experience.
How GPT-5-Codex makes your life easier
- Stay in flow, not in friction: With GPT-5-Codex, move smoothly from reading issues to writing code and checking UI, all in one place. It keeps context, so developers stay focused and productive. No more jumping between tools or losing track of what they were doing.
- Refactor and migrate with confidence: Whether cleaning up code or moving to a new framework, GPT-5-Codex helps stage updates, run tests, and fix issues as you go. It's like having a digital colleague for those tricky transitions.
Hero use cases: real impact for developers
- Repo-aware refactoring assistant: Feed repo and architecture diagrams to GPT-5-Codex. Get cohesive refactors, automated builds, and visual verification via screenshots.
- Flaky test hunter: Target failing test matrices. The model executes runs, polls status, inspects logs, and recommends fixes, looping until the tests are stable.
- Cloud migration copilot: Edit IaC scripts, kick off CLI commands, and iterate on errors in a controlled loop, reducing manual toil.
Pricing and Deployment available at GA
| Deployment | Available Region | Input ($/million tokens) | Cached Input ($/million tokens) | Output ($/million tokens) |
|---|---|---|---|---|
| Standard Global | East US 2, Sweden Central | $1.25 | $0.125 | $10.00 |
GPT-5-Codex is bringing developers' coding experience to a new level. Don't just write code. Let's redefine what's possible. Start building with GPT-5-Codex today and turn your bold ideas into reality, now powered by the latest innovation in Azure AI Foundry.
7.3K Views, 2 likes, 2 Comments
Introducing OpenAI's newest chat model in Microsoft Foundry
OpenAI's GPT-5.5 Instant (or Chat-latest in the API) begins rolling out in Microsoft Foundry today as GPT-chat-latest. Built on GPT-5.4 and GPT-5.3-chat, the new model delivers measurable gains in factual accuracy, tool calling, and response efficiency. These improvements translate directly into more reliable production deployments. GPT-chat-latest is designed for the workflows builders are actually shipping: multi-turn assistants, agentic systems that orchestrate tools, and retrieval-grounded applications where precision and grounding matter as much as conversational quality.
Why the name is changing
In Microsoft Foundry, we are introducing GPT-chat-latest as the product name for this release, while the model continues to follow the existing Preview lifecycle and standard notice periods. We are also evaluating ways to simplify how customers access continuously updated models over time, but current behavior remains unchanged as that work continues.
Smarter, more factually reliable
GPT-chat-latest closes the factuality gap from prior iterations with significant reductions in hallucinations, especially in domains where accuracy matters most. According to OpenAI, the new model produces 52.5% fewer hallucinations and reduces hallucinated claims by 37.3% on conversations previously flagged for factual errors when compared to GPT-5.3-chat. These gains extend beyond text.
GPT-chat-latest shows improvements in visual reasoning, expert multimodal understanding, and STEM tasks, with measurable lifts across standard benchmarks:
| Benchmark | Description | GPT-5.3-chat | GPT-chat-latest |
|---|---|---|---|
| CharXiv-reasoning | Scientific chart reasoning | 75.0 | 81.6 |
| MMMU-Pro | Expert multimodal reasoning | 69.2 | 76.0 |
| GPQA | PhD-level science questions | 78.5 | 85.6 |
| AIME 2025 | Competition math | 65.4 | 81.2 |
*Data shown comes from OpenAI's testing.
For builders shipping into regulated workloads such as clinical decision support, legal research, financial advisory, and technical analysis, these improvements raise the bar on the kinds of applications GPT-chat-latest can assist with.
More efficient outputs
GPT-chat-latest produces responses that may be more to the point without losing substance. The model may reduce verbosity and over-formatting, ask fewer follow-up questions, and avoid cluttered output patterns that often require post-processing in production UIs. For builders, this can translate to two concrete benefits: lower output token costs at scale, and cleaner responses that drop into product surfaces with less downstream cleanup. In comparative testing from OpenAI, GPT-chat-latest produced roughly 25–30% fewer words than GPT-5.3-chat across a range of common prompts while preserving response quality, and in many cases improving it.
Improving intelligence and tool calling
GPT-chat-latest introduces measurable improvements in how the model interacts with tools, including better judgment about when and how to invoke them. The model produces more structured and context-aware tool invocation outputs, which is particularly relevant for workflows that rely on function calling, retrieval-augmented generation, and multi-step reasoning. Equally important, the model is better at deciding whether a tool is needed in the first place, reducing unnecessary tool calls in scenarios where it already has the information to answer directly.
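From the application side, structured tool invocation usually means the model returns a tool name plus JSON-encoded arguments, and your code decides what to run. The sketch below illustrates that handoff only; it is not taken from this post or from a specific SDK. The `lookup_order` tool, its schema, and the hand-written tool-call payload are all hypothetical stand-ins, and the model call itself is omitted.

```python
import json

# Hypothetical tool description in a JSON Schema style; the exact shape a given
# API expects may differ, so treat this as an assumption.
LOOKUP_ORDER_TOOL = {
    "type": "function",
    "name": "lookup_order",
    "description": "Look up the status of a customer order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def lookup_order(order_id: str) -> dict:
    """Stand-in implementation; a real system would query an order service."""
    return {"order_id": order_id, "status": "shipped"}


# Map tool names to local handlers so only known tools can run.
HANDLERS = {"lookup_order": lookup_order}


def dispatch(tool_call: dict) -> str:
    """Run one tool call of the form {"name": ..., "arguments": "<json>"}."""
    handler = HANDLERS.get(tool_call["name"])
    if handler is None:
        return json.dumps({"error": f"unknown tool: {tool_call['name']}"})
    arguments = json.loads(tool_call["arguments"])
    return json.dumps(handler(**arguments))


if __name__ == "__main__":
    # Hand-written example of what a model-issued tool call might look like.
    call = {"name": "lookup_order", "arguments": '{"order_id": "A123"}'}
    print(dispatch(call))
```

When the model answers directly instead of requesting a tool, there is simply no call to dispatch, which is where fewer unnecessary tool calls would show up as saved round trips.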
Improved search and context handling GPT-chat-latest includes targeted improvements to how the model retrieves, interprets, and synthesizes information when search is involved, with enhancements to query formulation, result ranking, and filtering, plus more grounded synthesis of retrieved content into final responses. These changes improve handling of ambiguous or underspecified queries and reduce noise in answers that depend on retrieved content. The model also makes better use of the context developers pass in, including system prompts, conversation history, retrieved documents, and structured data. Applications that maintain long-running state or stitch together multiple retrieval steps produce more coherent, context-aware outputs without developers having to over-engineer prompt scaffolding. Use Cases: When to choose the chat model Developers typically choose a chat-optimized model like GPT-5.5-chat when the application needs to sustain multi-turn conversations while reliably following instructions and coordinating external tools. This is a fit for assistants and agentic workflows where the model must interpret user intent over time, decide when to retrieve additional context, and produce structured outputs for downstream systems rather than just generate free-form text. Customer support and contact centers: virtual agents that maintain conversational context across a case, retrieve policy or product documentation via search, and hand off to a ticketing or CRM system through tool calls when escalation is needed. Retail and e-commerce: shopping and service assistants that clarify preferences over multiple turns, reference catalogs and policies via retrieval, and generate structured actions such as returns, exchanges, and order lookups through integrated tools. Manufacturing and field service: technician-facing assistants that combine conversational guidance with retrieval of manuals and work instructions, plus structured task creation in maintenance systems. 
Use GPT-chat-latest | Use GPT-5.5 Reasoning
Multi-turn assistants and customer-facing chat experiences | Harder problems that benefit from more deliberate, step-by-step thinking
Agentic workflows that coordinate tools (search, retrieval, ticketing, CRM) and benefit from structured tool outputs | Complex analysis, planning, or decision support where correctness matters more than conversational flow
Interactive experiences where you want quick back-and-forth clarification and task completion | Tasks involving multi-constraint reasoning (policy interpretation, detailed requirements, long-horizon plans)
RAG-based apps where the model must decide when to retrieve and then synthesize grounded answers | Offline or low-tool scenarios where the main value is deeper reasoning over provided context

Pricing

Model | Input ($/1M tokens) | Cached input ($/1M tokens) | Output ($/1M tokens)
GPT-chat-latest | $5 | $0.50 | $30

Responsible AI in Microsoft Foundry

At Microsoft, our mission to empower people and organizations remains constant. In the age of AI, trust is foundational to adoption, and earning that trust requires a commitment to transparency, safety, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy models responsibly in production environments, aligned with Microsoft's Responsible AI principles.

Getting started

GPT-chat-latest is rolling out in Microsoft Foundry today.

A New Chapter for Realtime AI: Reasoning, Translation, and Real-Time Transcription
Voice can be one of the most direct and productive interfaces for AI, enabling customer support agents that may resolve issues without a single keystroke, live multilingual communication that can break down language barriers as conversations happen, and voice assistants capable of reasoning through complex requests in real time. Developers building these experiences need models that can keep pace with increasingly demanding latency, accuracy, and language coverage requirements. Today, OpenAI's GPT-realtime-translate, GPT-realtime-2, and GPT-realtime-whisper are rolling out into Microsoft Foundry, together representing a significant step forward for the realtime model lineup available to developers on the platform.

GPT-realtime-translate and GPT-realtime-whisper

GPT-realtime-translate and GPT-realtime-whisper together extend the realtime stack for live multilingual audio workflows. GPT-realtime-translate is built for continuous, real-time translation, producing translated output as speech unfolds without relying on segmented pipeline processing, while GPT-realtime-whisper provides low-latency streaming transcription of the original audio in parallel. Used together, they help developers support scenarios such as live events, cross-language customer experiences, captions, monitoring, and archival workflows that require both translated output and visibility into the source speech.

Continuous stream processing: This new model translates live audio without segmenting or buffering, allowing for more natural interactions.
New translation and transcription capabilities: Translate between languages in real time and observe faster text to speech.
Available via the Realtime API

GPT-realtime-2

GPT-realtime-2 is a generational upgrade to OpenAI's speech-to-speech model, bringing internal reasoning and an expanded context window to real-time voice applications.
Where previous speech-to-speech models responded immediately, GPT-realtime-2 can work through a problem before speaking, making it well suited for voice applications that need to handle complex, multi-step queries entirely in the audio layer without routing to a separate text pipeline.

Native reasoning capability: The newest realtime model introduces stronger reasoning capabilities. The model now thinks internally before responding.
Adjustable reasoning effort via {reasoning.effort}: Explicitly request the level of reasoning the model uses (minimal, low, medium, or high) to save on cost and latency.
Audio in, audio out: No need for an intermediary text step; the conversation stays fluid and natural.
Available via the Realtime API

These models are coming soon to Microsoft Foundry. Since May 6, the models have been rolling out into the model catalog. We are excited for you to explore and build with our evolving collection of frontier models.

Use cases

These models work independently, but they're designed to complement each other in real-world pipelines:

Live multilingual events. GPT-realtime-translate enables real-time translation of live audio, producing translated speech along with a transcript in the target language. GPT-realtime-whisper can be used in parallel to capture a transcription of the original speech for captions, monitoring, or archival purposes. Together, they enable multilingual live streaming with both translated experiences and visibility into the source language.
Global customer support. Route inbound calls through GPT-realtime-translate to translate conversations in real time and provide a translated transcript for agents. Use GPT-realtime-whisper alongside it to capture the original conversation as text for compliance, quality review, or analytics. Then pass the interaction to an agent built with GPT-realtime-2 using {reasoning.effort}: high for complex issue resolution, all within a continuous audio pipeline.
International voice assistants.
Build once and deploy across languages. GPT-realtime-translate enables multilingual interaction and provides translated output with a target-language transcript, while GPT-realtime-whisper can optionally capture the original user input as text. GPT-realtime-2 manages reasoning and conversational context, supporting more complex voice interactions.

Pricing

Model | Deployment | Modality | Input ($/1M tokens) | Cached input ($/1M tokens) | Output ($/1M tokens)
GPT-realtime-2 | Global Standard | Audio | $32.00 | $0.40 | $64.00
GPT-realtime-2 | Global Standard | Text | $4.00 | $0.40 | $24.00
GPT-realtime-2 | Global Standard | Image | $5.00 | $0.50 | --
GPT-realtime-translate | Global Standard | Audio | -- | -- | $2.04/hour
GPT-realtime-whisper | Global Standard | Audio | -- | -- | $1.02/hour

*GPT-realtime-translate and GPT-realtime-whisper are priced by the hour rather than per token.

Getting Started

Looking for ways to dive in? GPT-realtime-translate, GPT-realtime-whisper, and GPT-realtime-2 are rolling out into Microsoft Foundry today. Explore the model catalog and start building: https://ai.azure.com
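As a rough illustration of how the realtime pricing above combines in a pipeline, the sketch below estimates the cost of running GPT-realtime-translate and GPT-realtime-whisper side by side, plus GPT-realtime-2 audio usage. It assumes the hourly rates apply per hour of audio processed and that token counts are known from usage reporting; both are assumptions, and the function names are illustrative rather than part of any SDK.

```python
# Back-of-the-envelope cost estimate using the rates in the pricing table.
# Assumption: hourly prices are billed per hour of audio processed.

HOURLY_USD = {
    "gpt-realtime-translate": 2.04,
    "gpt-realtime-whisper": 1.02,
}

# GPT-realtime-2 audio rates, in USD per 1M tokens.
REALTIME2_AUDIO_USD_PER_M = {"input": 32.00, "cached_input": 0.40, "output": 64.00}


def hourly_pipeline_cost(hours: float,
                         models: tuple[str, ...] = ("gpt-realtime-translate",
                                                    "gpt-realtime-whisper")) -> float:
    """Cost of running the given hourly-priced models in parallel for `hours`."""
    return sum(HOURLY_USD[m] for m in models) * hours


def realtime2_audio_cost(input_tokens: int,
                         output_tokens: int,
                         cached_input_tokens: int = 0) -> float:
    """Cost of GPT-realtime-2 audio usage for the given token counts."""
    rates = REALTIME2_AUDIO_USD_PER_M
    return (input_tokens * rates["input"]
            + cached_input_tokens * rates["cached_input"]
            + output_tokens * rates["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical two-hour live event with translation and source transcription.
    print(f"Translate + whisper, 2 hours: ${hourly_pipeline_cost(2.0):.2f}")
    # Hypothetical GPT-realtime-2 session: 1M audio input tokens, 0.5M output tokens.
    print(f"GPT-realtime-2 audio: ${realtime2_audio_cost(1_000_000, 500_000):.2f}")
```

Text and image modalities for GPT-realtime-2 follow the same per-token pattern with their own rates from the table.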