Today, we are excited to announce the general availability of Voice Live API, which enables real-time speech-to-speech conversational experience through a unified API powered by generative AI models. With Voice Live API, developers can easily voice-enable any agent built with the Azure AI Foundry Agent Service.
Azure AI Foundry Agent Service, enables the operation of agents that make decisions, invoke tools, and participate in workflows across development, deployment, and production. By eliminating the need to stitch together disparate components, Voice Live API offers a low latency, end-to-end solution for voice-driven experiences.
As always, a diverse range of customers provided valuable feedback during the preview period. Along with announcing general availability, we are also taking this opportunity to address that feedback and improve the API. Following are some of the new features designed to assist developers and enterprises in building scalable, production-ready voice agents.
More natively integrated GenAI models including GPT-Realtime
Voice Live API enables developers to select from a range of advanced AI models designed for conversational applications, such as GPT-Realtime, GPT-5, GPT-4.1, Phi, and others. These models are natively supported and fully managed, eliminating the need for developers to manage model deployment or plan for capacity.
These natively supported models may each have a distinct stage in their life cycle (e.g. public preview, generally available) and be subject to varying pricing structures. The table below lists the models supported in each pricing tier.
Pricing Tier |
Generally Available |
In Public Preview |
Voice Live Pro |
GPT-Realtime, GPT-4.1, GPT-4o |
GPT-5 |
Voice Live Standard |
GPT-4o-mini, GPT-4.1-mini |
GPT-4o-Mini-Realtime, GPT-5-mini |
Voice Live Lite |
NA |
Phi-4-MM-Realtime, GPT-5-Nano, Phi-4-Mini |
Extended speech languages to 140+
Voice Live API now supports speech input in over 140 languages/locales. View all supported languages by configuration.
Automatic multilingual configuration is enabled by default, using the multilingual model.
Integrated with Custom Speech
Developers need customization to better manage input and output for different use cases. Besides the support for Custom Voice released in May 2025, Voice Live now supports seamless integration with Custom Speech for improved speech recognition results.
Developers can also improve speech input accuracy with phrase lists and refine speech synthesis pronunciation using custom lexicons, all without training a model.
Learn how to customize speech and voice models for Voice Live API.
Natural HD voices upgraded
Neural HD voices in Azure AI Speech are contextually aware and engineered to provide a natural, expressive experience, making them ideal for voice agent applications. The latest V2 upgrade enhances lifelike qualities with features such as natural pauses, filler words, and seamless transitions between speaking styles, all available with Voice Live API.
Check out the latest demo of Ava Neural HD V2.
Improved VAD features for interruption detection
Voice Live API now features semantic Voice Activity Detection (VAD), enabling it to intelligently recognize pauses and filler word interruptions in conversations. In the latest en-US evaluation on Multilingual filler words data, Voice Live API achieved ~20% relative improvement from previous VAD models. This leap in performance is powered by integrating semantic VAD into the n-best pipeline, allowing the system to better distinguish meaningful speech from filler noise and enabling more accurate latency tracking and cleaner segmentation, especially in multilingual and noisy environments.
4K avatar support
Voice Live API enables efficient integration with streaming avatars. With the latest updates, avatar options offer support for high-fidelity 4K video models. Learn more about the avatar update.
Improved function calling and integration with Azure AI Foundry Agent Service
Voice Live API enables function calling to assist developers in building robust voice agents with their chosen generative AI models. This release improves asynchronous function calls and enhances integration with Azure AI Foundry Agent Service for agent creation and operation. Learn more about creating a voice live real-time voice agent with Azure AI Foundry Agent Service.
More developer resources and availability in more regions
Developer resources are available in C# and Python, with more to come. Get started with Voice Live API.
Voice Live API is available in more regions now including Australia East, East US, Japan East, and UK South, besides the previously supported regions such as Central India, East US 2, South East Asia, Sweden Central, and West US 2. Check the features supported in each region.
Customers adopting Voice Live
In healthcare, patient experience is always the top priority. With Voice Live, eClinicalWorks’ healow Genie contact center solution is now taking healthcare modernization a step further. healow is piloting Voice Live API for Genie to inform patients about their upcoming appointments, answer common questions, and return voicemails. Reducing these routine calls saves healthcare staff hours each day and boosts patient satisfaction through timely interactions. “We’re looking forward to using Azure AI Foundry Voice Live API so that when a patient calls, Genie can detect the question and respond in a natural voice in near-real time,” said Sidd Shah, Vice President of Strategy & Business Growth at healow. “The entire roundtrip is all happening in Voice Live API.” If a patient asks about information in their medical chart, Genie can also fetch data from their electronic health record (EHR) and provide answers. Read the full story here.
“If we did multiple hops to go across different infrastructures, that would add up to a diminished patient experience. The Azure AI Foundry Voice Live API is integrated into one single, unified solution, delivering speech-to-text and text-to-speech in the same infrastructure.” Bhawna Batra, VP of Engineering at eClinicalWorks |
Capgemini, a global business and technology transformation partner, is reimagining its global service desk managed operations through its Capgemini Cloud Infrastructure Services (CIS) division. The first phase covers 500,000 users across 45 clients, which is only part of the overall deployment base. The goal is to modernize the service desk to meet changing expectations for speed, personalization, and scale.
To drive this transformation, Capgemini launched the “AI-Powered Service Desk” platform powered by Microsoft technologies including Dynamics 365 Contact Center, Copilot Studio, and Azure AI Foundry. A key enhancement was the integration of Voice Live API for real-time voice interactions, enabling intelligent, conversational support across telephony channels. The new platform delivers a more agile, truly conversational, AI-driven service experience, automating routine tasks and enhancing agent productivity. With scalable voice capabilities and deep integration across Microsoft’s ecosystem, Capgemini is positioned to streamline support operations, reduce response times, and elevate customer satisfaction across its enterprise client base.
"Integrating Microsoft’s Voice Live API into our platform has been transformative. We’re seeing measurable improvements in user engagement and satisfaction thanks to the API’s low-latency, high-quality voice interactions. As a result, we are able to deliver more natural and responsive experiences, which have been positively received by our customers.” Stephen Hilton, EVP Chief Operating Officer at CIS Capgemini |
Astra Tech, a fast-growing UAE-based technology group part of G42, is bringing Voice Live API to its flagship platform, botim, a fintech-first and AI-native platform. Eight out of 10 smartphone users in the UAE already rely on the app. The company is now reshaping botim from a communications tool into a fintech-first service, adding features such as digital wallets, international remittances, and micro-loans. To achieve its broader vision, Astra Tech set out to make botim simpler, more intuitive, and more human. “Voice removes a lot of complexity, and it’s the most natural way to interact,” says Frenando Ansari, Lead Product Manager at Astra Tech. “For users with low digital literacy or language barriers, tapping through traditional interfaces can be difficult. Voice personalizes the experience and makes it accessible in their preferred language.”
" The Voice Live API acts as a connective tissue for AI-driven conversation across every layer of the app. It gives us a standardized framework so that different product teams can incorporate voice without needing to hire deep AI expertise.” Frenando Ansari, Lead Product Manager at Astra Tech
“The most impressive thing about the Voice Live API is the voice activity detection and the noise control algorithm.” Meng Wang, AI Head at Astra Tech |
Get started
Voice Live API is transforming how developers build voice-enabled agent systems by providing an integrated, scalable, and efficient solution. By combining speech recognition, generative AI, and text-to-speech functionalities into a unified interface, it addresses the challenges of traditional implementations, enabling faster development and superior user experiences. From streamlining customer service to enhancing education and public services, the opportunities are endless. The future of voice-first solutions is here—let’s build it together!
Voice Live API introduction (video)
Try Voice Live in Azure AI Foundry
Voice Live Agent code sample in GitHub