
Microsoft Foundry Blog

Public preview: Voice-native agents in Microsoft Foundry

QinyingLiao
Microsoft
Mar 16, 2026

We’re excited to introduce native support for Voice Live in the new Foundry Agent Service, making it dramatically easier to build real-time, speech-to-speech AI agents on Azure.

This integration brings agent orchestration and real-time voice interaction together in a unified developer experience, helping you easily build production-ready, voice-native agents that deliver natural conversational AI experiences.

Shift toward real-time AI agents

AI interaction is rapidly evolving beyond chat interfaces. Users increasingly expect systems that can listen, reason, and respond instantly in natural voices — more like collaborators than tools.

Until now, building a production-grade voice agent meant assembling multiple components: speech recognition (STT), language reasoning and agent logic, tool orchestration, speech synthesis (TTS), and real-time streaming infrastructure.

Developers often had to manage latency, synchronization, and conversational state across separate services.

With the general availability (GA) of the new Foundry Agent Service, we are also introducing a public preview of native Voice Live API support, which reduces engineering complexity and accelerates development of voice-enabled agents.

The new Foundry Agent Service is a redesigned API and runtime that helps teams build and operate agents and move them from prototype to production with confidence. It provides out-of-the-box evaluators, custom evaluators, and continuous evaluation to systematically measure quality, groundedness, safety, and risk across the AI lifecycle.

Voice Live is a unified, real-time API that pairs premium speech capabilities with foundation models to support speech-to-speech conversational experiences in use cases such as voice agents, assistants, and chatbots. This comprehensive, managed service consolidates speech recognition, text-to-speech, conversational enhancements (including natural turn detection and interruption handling), and avatars behind a single API, making it ideal for developers seeking low-latency, high-quality voice experiences.

  • Broad locale coverage: Supports more than 140 locales and over 700 out-of-the-box voices, including 40+ highly natural conversational voices optimized with the latest neural HD models.
  • Advanced conversational enhancement features: Keep interactions smooth and natural with noise suppression, which reduces environmental noise so conversations stay clear even in busy settings; echo cancellation, which prevents the agent from picking up its own audio responses and causing feedback loops; robust interruption detection, which accurately identifies when a user interrupts; and advanced end-of-turn detection, which allows natural pauses without prematurely concluding the user's turn.
  • TTS Avatar integration: Provides avatars synchronized with audio output, offering a visual identity for voice agents.
  • Customization: Design unique, brand-aligned voices for audio output and customized avatars to reinforce brand identity. Improve speech input accuracy with phrase lists and a fine-tuned custom speech model.
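To make the end-of-turn idea concrete: a detector has to tell a mid-sentence pause from a finished turn. The post does not describe how Voice Live does this (the detection is managed for you), so the sketch below deliberately swaps in the simplest possible technique, a silence timeout over per-frame energy values. The class name, thresholds, and frame sizes are hypothetical illustrations, not Voice Live parameters.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SilenceTimeoutTurnDetector:
    """Toy end-of-turn detector based on a silence timeout (illustrative only).

    Assumption: audio has already been split into fixed-size frames and each
    frame reduced to a single energy value. These names and defaults are
    hypothetical and are not Voice Live parameters.
    """

    energy_threshold: float = 0.1  # frames at or above this count as speech
    hangover_frames: int = 25      # quiet frames needed to end a turn (e.g. 25 x 20 ms)
    _in_speech: bool = field(default=False, init=False)
    _silent_run: int = field(default=0, init=False)

    def push(self, energy: float) -> Optional[str]:
        """Feed one frame; return "speech_started", "turn_ended", or None."""
        if energy >= self.energy_threshold:
            self._silent_run = 0  # any speech resets the pause counter
            if not self._in_speech:
                self._in_speech = True
                # A barge-in point: an agent would stop its own playback here.
                return "speech_started"
            return None
        if self._in_speech:
            self._silent_run += 1
            if self._silent_run >= self.hangover_frames:
                self._in_speech = False
                self._silent_run = 0
                return "turn_ended"
        return None


def detect_events(frames: List[float], detector: SilenceTimeoutTurnDetector) -> List[str]:
    """Run the detector over a sequence of frame energies and collect events."""
    events = []
    for energy in frames:
        event = detector.push(energy)
        if event is not None:
            events.append(event)
    return events


# A two-frame pause mid-utterance, then a three-frame silence at the end.
frames = [0.0, 0.5, 0.6, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0]
print(detect_events(frames, SilenceTimeoutTurnDetector(hangover_frames=3)))
# prints ['speech_started', 'turn_ended']
```

With a three-frame timeout, the short pause does not end the turn; only the trailing silence does. Shrink the timeout to two frames and the same audio is split into two turns, which is exactly the premature cut-off that better end-of-turn detection is meant to avoid.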

With Voice Live fully integrated into the new Foundry Agent Service, developers benefit from:

  • Faster Time to Value: Pre-integrated services eliminate custom glue code, allowing teams to focus on designing intelligent experiences instead of infrastructure.
  • Real-Time, Natural Conversations: Voice Live API enables low-latency streaming speech input and output, supporting fluid back-and-forth dialogue rather than turn-based interaction.
  • Built-In Agent Reasoning: Foundry Agent Service handles planning, reasoning, and tool execution so voice interactions can trigger real business workflows.
  • Enterprise-Ready Foundation: Enterprise-grade security and compliance, scalable cloud infrastructure, and unified deployment and management.

Example customer scenarios

Voice is becoming a first-class interface for AI. At the same time, agents are becoming the primary abstraction for intelligent applications.

Voice-enabled AI agents are transforming customer service across industries by offering natural, conversational experiences that blend human interaction with automated efficiency. These agents can respond to inquiries, access account details, and take actions, all while keeping track of the ongoing conversation, leading to faster resolutions and greater customer satisfaction. Examples include voice-enabled multi-agent systems for real-time self-service support, and solutions that handle user requests or escalate issues to human representatives as needed. Customers can reach these agents through phone calls, web chats, or front-line assistant platforms.

Organizations can also embed voice agents into workflows as business operations agents that summarize meetings, retrieve internal knowledge, or automate operational tasks. Personal assistants offer a speech-first experience for users who depend on voice interfaces, making access feel more intuitive and natural. Public service organizations are developing voice agents to help citizens with administrative queries, public service information, appointment scheduling, and more. Automotive companies are building hands-free, in-car voice assistants for command execution, navigation assistance, and general inquiries.

Gulf Air, the national carrier of Bahrain, uses Voice Live and Foundry Agent Service to build virtual assistants that support business operations, improving status monitoring and decision-making within the organization.

"Gulf Air’s Falcon Eye is a real-time operational intelligence platform that unifies live flight, passenger, crew, revenue, catering, connection, weather, and disruption data into a single command view. Integrated with Microsoft Foundry Voice Live and Microsoft Foundry Agents, the real-time conversational capability enables executives, including the CEO, leadership team, and operational management, to speak naturally and receive immediate, accurate spoken answers grounded in live operational data. The system can also open dashboards and analyses through voice-driven navigation, transforming airline data from static reporting into a conversational mission-control co-pilot that supports faster, confident decision-making in real time. "

Ahmed Naeemi, Chief Information Officer, Technology and Digital Services

By combining live voice capabilities with agent orchestration, Azure enables developers to move from simple conversational bots to interactive AI systems capable of reasoning and acting in real time.

This integration lowers the barrier to building multimodal, agent-driven applications and accelerates adoption across enterprise scenarios.

How it works

At a high level, the integration connects real-time audio streaming with agent reasoning:

  1. User speaks → audio streams through Voice Live API
  2. Speech is processed in real time → converted into conversational input
  3. Agent Service reasons and calls tools → executes workflows or retrieves knowledge
  4. Response is generated and synthesized → streamed back as natural speech
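The four steps can be sketched as a single loop. This is a minimal, runnable simulation under loudly stated assumptions: `speech_to_text`, `agent_reason`, `text_to_speech`, and the `flight_status` tool are hypothetical stand-ins (here they just pass text around as bytes), not Voice Live or Foundry Agent Service APIs. It also handles one utterance at a time, whereas the real service streams audio and overlaps these stages to keep latency low.

```python
import asyncio
from typing import Callable, Dict, List, Optional

# --- Real-time interaction layer (the role Voice Live plays) -----------------
# Hypothetical stand-ins: the "audio" here is just UTF-8 text in bytes.

async def speech_to_text(audio_chunk: bytes) -> str:
    return audio_chunk.decode("utf-8")

async def text_to_speech(reply: str) -> bytes:
    return reply.encode("utf-8")

# --- Intelligence and orchestration layer (the role Agent Service plays) -----
# Hypothetical tool registry; a real agent would call business systems.
TOOLS: Dict[str, Callable[[str], str]] = {
    "flight_status": lambda flight: f"{flight} is on time",
}

async def agent_reason(user_text: str) -> str:
    # Decide whether a tool is needed, call it, and compose a reply.
    prefix = "status of "
    if user_text.startswith(prefix):
        return TOOLS["flight_status"](user_text[len(prefix):])
    return f"You said: {user_text}"

# --- The loop that ties the two layers together -------------------------------
async def voice_agent_loop(mic: "asyncio.Queue[Optional[bytes]]",
                           speaker: "asyncio.Queue[bytes]") -> None:
    while True:
        chunk = await mic.get()               # 1. user speech streams in
        if chunk is None:                     # sentinel: the session has ended
            break
        text = await speech_to_text(chunk)    # 2. processed into conversational input
        reply = await agent_reason(text)      # 3. agent reasons and calls tools
        await speaker.put(await text_to_speech(reply))  # 4. synthesized and streamed back

async def main() -> List[bytes]:
    mic: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    speaker: "asyncio.Queue[bytes]" = asyncio.Queue()
    for utterance in (b"hello", b"status of XY123", None):  # XY123 is a made-up flight
        mic.put_nowait(utterance)
    await voice_agent_loop(mic, speaker)
    return [speaker.get_nowait() for _ in range(speaker.qsize())]

if __name__ == "__main__":
    print(asyncio.run(main()))
    # prints [b'You said: hello', b'XY123 is on time']
```

Keeping the interaction layer and the orchestration layer behind separate functions means either side of this sketch could be replaced without changing the loop itself.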

📌 Architecture tip: Think of Voice Live API as the real-time interaction layer and Agent Service as the intelligence and orchestration layer.

With the new Foundry portal, it’s straightforward to enable Voice Live with any agent you build.

Once you have built an agent, turn on the Voice mode toggle to connect it to Voice Live. You can then configure Voice Live settings, including languages, voices, avatars, and audio enhancement features, in the right pane.
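If you connect from code instead of using the portal, similar settings can be sent as part of the session configuration. The sketch below only builds a `session.update` JSON message; sending it over a Voice Live connection is left out. The field names and values (`turn_detection` with `azure_semantic_vad`, `input_audio_noise_reduction` with `azure_deep_noise_suppression`, `input_audio_echo_cancellation` with `server_echo_cancellation`, and the example HD voice name) are assumptions to verify against the current Voice Live API reference, not a confirmed schema.

```python
import json
from typing import Any, Dict

def build_session_update(voice_name: str,
                         noise_suppression: bool = True,
                         echo_cancellation: bool = True) -> str:
    """Build a Voice Live `session.update` message as a JSON string.

    Assumption: the field names and enum values below match the current
    Voice Live API reference; verify them before relying on this sketch.
    """
    session: Dict[str, Any] = {
        # End-of-turn detection mode (assumed value name).
        "turn_detection": {"type": "azure_semantic_vad"},
        # Output voice; the name passed in is only an example.
        "voice": {"name": voice_name, "type": "azure-standard"},
    }
    if noise_suppression:
        session["input_audio_noise_reduction"] = {"type": "azure_deep_noise_suppression"}
    if echo_cancellation:
        session["input_audio_echo_cancellation"] = {"type": "server_echo_cancellation"}
    return json.dumps({"type": "session.update", "session": session})

print(build_session_update("en-US-Ava:DragonHDLatestNeural"))
```

Turning a feature off here simply omits its field. Whether omission disables the feature or falls back to a service default is something to confirm in the API reference.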

You can run the quickstart sample to test live conversations; the step-by-step guide covers Python, C#, Java, and JavaScript. If you're using Voice Live with Agent Service (classic), follow the migration instructions to move to the new Foundry Agent Service.

Use this Call Center Voice Agent solution template to build real-time, speech-to-speech voice agents that integrate seamlessly with telephony systems.

Get started

With out-of-box integration between the new Foundry Agent Service and Voice Live API, building intelligent voice agents no longer requires complex system assembly. Developers can now prototype quickly, integrate enterprise workflows, and deploy scalable conversational experiences using Azure AI.

We’re excited to see what you build — try the quickstart today and share your feedback with the community.

Voice Live API Overview

How to use the Voice Live API

How to build a voice agent

Updated Mar 16, 2026