Microsoft Foundry Blog

Voice Live API now supports WebRTC (Preview)

ArcherZ
Microsoft
Apr 30, 2026

We are thrilled to announce that Voice Live API now supports WebRTC (Web Real-Time Communication) connections, enabling low-latency, real-time voice interactions directly from web and mobile clients.

Why WebRTC matters for building real-time voice agents

WebRTC enables real-time, bi-directional audio and video streaming directly in the browser without plugins or native installs. Unlike WebSockets, which treat audio as generic data and require custom buffering and timing logic, WebRTC is purpose-built for the media-aware streaming that responsive conversational experiences need. Specifically:

  • Lower latency: WebRTC is designed to minimize delay, which makes it better suited than a generic data transport to audio and video communication, where low latency is critical for maintaining quality and synchronization.
  • Built-in media handling: WebRTC has built-in support for audio and video codecs, providing optimized handling of media streams.
  • Network resilience: WebRTC includes mechanisms for handling packet loss and jitter, which are essential for maintaining the quality of audio streams over unpredictable networks.

How to set up WebRTC in Voice Live

In a typical setup, the client establishes a WebSocket‑based control channel with the Voice Live API to exchange SDP offer and answer messages required for WebRTC session negotiation. Once negotiation completes, audio is transmitted over WebRTC RTP media tracks.
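The negotiation step can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the Voice Live SDK: `negotiate`, `PeerConnectionLike`, and `exchangeSdp` are hypothetical names introduced here. The wire format that carries the offer and answer over the WebSocket control channel is left to the caller-supplied `exchangeSdp` function, because this post does not specify it. The three peer-connection methods mirror the standard browser `RTCPeerConnection` API.

```typescript
// Minimal structural view of the peer-connection methods used during
// offer/answer negotiation (hypothetical interface for this sketch).
interface SessionDescriptionLike {
  type: string;
  sdp?: string;
}

interface PeerConnectionLike {
  createOffer(): Promise<SessionDescriptionLike>;
  setLocalDescription(desc: SessionDescriptionLike): Promise<void>;
  setRemoteDescription(desc: SessionDescriptionLike): Promise<void>;
}

// Runs one offer/answer round trip.
// Assumption: the caller has already added the microphone track to `pc`.
// `exchangeSdp` sends the offer SDP over the control channel and resolves
// with the answer SDP; its message format is not defined in this sketch.
async function negotiate(
  pc: PeerConnectionLike,
  exchangeSdp: (offerSdp: string) => Promise<string>
): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  if (!offer.sdp) {
    throw new Error("The generated offer contains no SDP");
  }
  const answerSdp = await exchangeSdp(offer.sdp);
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });
}
```

In a browser, `pc` would typically be an `RTCPeerConnection`, and `exchangeSdp` would use the WebSocket opened against the calls endpoint. Once `negotiate` resolves, audio can flow over the RTP media tracks.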

Non‑audio events, such as voice activity and response lifecycle signals, are exchanged over WebRTC data channels alongside the media streams. Session configuration, control‑plane messages, and error notifications are delivered through the WebSocket control channel.
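The division of traffic described above can be summarized as a small lookup table. This is only an illustrative sketch: the `TrafficKind` and `Transport` labels are hypothetical names chosen for this example and are not Voice Live event names or API identifiers.

```typescript
// Hypothetical labels for the three transports described in this post.
type Transport = "rtp-media-track" | "data-channel" | "websocket-control";

// Hypothetical labels for the kinds of traffic described in this post.
type TrafficKind =
  | "audio"
  | "voice-activity"
  | "response-lifecycle"
  | "session-configuration"
  | "control-plane"
  | "error-notification";

// Where a client should expect each kind of traffic in a WebRTC session.
const transportFor: Record<TrafficKind, Transport> = {
  "audio": "rtp-media-track",
  "voice-activity": "data-channel",
  "response-lifecycle": "data-channel",
  "session-configuration": "websocket-control",
  "control-plane": "websocket-control",
  "error-notification": "websocket-control",
};
```

A client built this way would attach its audio playback to the media track, its conversational event handlers to the data channel, and its session and error handling to the WebSocket.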

When initiating a WebRTC call session, simply use the voice-live/realtime/calls endpoint instead of voice-live/realtime. For example:

wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime/calls?api-version=2026-01-01-preview&model=gpt-realtime
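As a small sketch, the URL above could be assembled in TypeScript like this. `buildCallsUrl` is a hypothetical helper introduced for illustration; the host pattern, path, and query parameters are taken from the example above, and the default API version is simply the one shown there.

```typescript
// Hypothetical helper: builds the WebSocket URL for a Voice Live
// WebRTC call session from the pattern shown in this post.
function buildCallsUrl(
  resourceName: string,
  model: string,
  apiVersion: string = "2026-01-01-preview"
): string {
  const host = `${resourceName}.services.ai.azure.com`;
  const query =
    `api-version=${encodeURIComponent(apiVersion)}` +
    `&model=${encodeURIComponent(model)}`;
  // Note the /calls suffix, which selects the WebRTC call endpoint
  // instead of the plain voice-live/realtime endpoint.
  return `wss://${host}/voice-live/realtime/calls?${query}`;
}

const callsUrl = buildCallsUrl("my-resource", "gpt-realtime");
```

Here `my-resource` is a placeholder for your own resource name.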

For more information, see the step-by-step instructions.

Learn more

Voice Live API is transforming how developers build voice-enabled agent systems by providing an integrated, scalable, and efficient solution. By combining speech recognition, generative AI, and text-to-speech functionalities into a unified interface, it addresses the challenges of traditional implementations, enabling faster development and superior user experiences. From streamlining customer service to enhancing education and public services, the opportunities are endless. The future of voice-first solutions is here—let’s build it together!

Voice Live API introduction (video)

Try Voice Live in Azure AI Foundry

Voice Live API documentation

Voice Live quickstart

Voice Live Agent code sample in GitHub

Updated Apr 29, 2026
Version 1.0