Welcome to Episode 11! This week, we dove into the world of speech AI using the Azure AI Foundry platform. From real-time transcription to custom avatars and multilingual voice agents, this episode showcased how you can bring natural, expressive speech capabilities to your apps—plus a real-world healthcare demo and lots of live Q&A.
1. Weekly Highlights
This week’s top news in the Azure AI ecosystem included:
- Lakuna — Copilot Studio Agent for Product Teams:
A hackathon project built with Copilot Studio and Azure AI Foundry, Lakuna analyzes your requirements and docs to surface hidden assumptions, helping teams reflect, test, and reduce bias in product planning.
- Azure ND H200 v5 VMs for AI:
Azure Machine Learning introduced ND H200 v5 VMs, featuring NVIDIA H200 GPUs (over 1 TB of GPU memory per VM!) for massive models, bigger context windows, and ultra-fast throughput.
- Agent Factory Blog Series:
The next wave of agentic AI is about extensibility: plug your agents into hundreds of APIs and services using the Model Context Protocol (MCP) for portable, reusable tool integrations.
- GPT-5 Tool Calling on Azure AI Foundry:
GPT-5 models now support free-form tool calling—no more rigid JSON! Output SQL, Python, configs, and more in your preferred format for natural, flexible workflows.
- Microsoft Named a Leader in the 2025 Gartner Magic Quadrant:
Azure was again named a Leader for Cloud-Native Application Platforms, validating its end-to-end platform for AI, microservices, DevOps, and more.
2. Spotlight On: Azure AI Foundry Speech Playground
The main segment featured a live demo of the new Azure AI Speech Playground (now part of Foundry), showing how developers can experiment with and deploy cutting-edge voice, transcription, and avatar capabilities.
Key Features & Demos:
- Speech Recognition (Speech-to-Text):
Try real-time transcription directly in the playground—recognizing natural speech, pauses, accents, and domain terms.
Batch and Fast transcription options for large files and blob storage.
Custom Speech: Fine-tune models for your industry, vocabulary, and noise conditions (a minimal SDK sketch follows this feature list).
- Text to Speech (TTS):
Instantly convert text into natural, expressive audio in 150+ languages with 600+ neural voices.
Demo: Listen to pre-built voices, explore whispering, cheerful, angry, and more styles.
Custom Neural Voice: Clone and train your own professional or personal voice (with strict Responsible AI controls).
- Avatars & Video Translation:
Bring your apps to life with prebuilt avatars and video translation, which syncs voice-overs to speakers in multilingual videos.
- Voice Live API:
Voice Live API (Preview) integrates all premium speech capabilities with large language models, enabling real-time, proactive voice agents and chatbots.
Demo: Language learning agent with voice, avatars, and proactive engagement.
One-click code export for deployment in your IDE.
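The playground can export working code for you, but to give a feel for what that code looks like, here is a minimal sketch (not the episode's exact export) using the Azure Speech SDK for Python: it transcribes one utterance from the default microphone, then speaks the transcript back with a neural voice. The environment variable names and the `en-US-JennyNeural` voice are placeholder choices.

```python
# pip install azure-cognitiveservices-speech
import os
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials: set SPEECH_KEY and SPEECH_REGION for your Speech resource.
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ["SPEECH_REGION"],
)

# Speech-to-text: capture a single utterance from the default microphone.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
print("Say something...")
result = recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)

    # Text-to-speech: echo the transcript back with a neural voice.
    # The voice name is an example; pick any voice from the playground catalog.
    speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    synthesizer.speak_text_async(f"You said: {result.text}").get()
else:
    print("No speech recognized:", result.reason)
```

The expressive styles shown in the demo (whispering, cheerful, angry) are applied with SSML's `<mstts:express-as>` element via `speak_ssml_async`, rather than plain-text synthesis.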
3. Customer Story: Hilo Health
This week’s customer spotlight featured Hilo Health—a healthcare technology company using Azure AI to boost efficiency for doctors, staff, and patients.
How Hilo Uses Azure AI:
- Document Management:
Automates fax/document filing, splits multi-page faxes by patient, and reduces staff effort and errors using Azure Computer Vision and Document Intelligence.
- Ambient Listening:
Ambient clinical note transcription captures doctor-patient conversations and summarizes them for easy EHR documentation.
- Genie AI Contact Center:
Agentic voice assistants handle patient calls, book appointments, answer billing/refill questions, escalate to humans, and assist human agents—using Azure Communication Services, Azure Functions, FastAPI (community), and Azure OpenAI. A minimal sketch of this pattern follows this list.
- Conversational Campaigns:
Outbound reminders, procedure preps, and follow-ups all handled by voice AI—freeing up human staff.
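Hilo's own code was not shown on stream, so the sketch below is only an illustrative assumption of how a Genie-style triage endpoint could pair FastAPI with Azure OpenAI. The deployment name, API version, environment variables, and system prompt are hypothetical placeholders; a real system would add Azure Communication Services call automation, authentication, and escalation logic.

```python
# pip install fastapi uvicorn openai
# Hypothetical sketch of a patient-call triage endpoint, not Hilo's actual service.
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import AzureOpenAI

app = FastAPI()

# Placeholder Azure OpenAI settings; the AZURE_OPENAI_* variables and the
# "gpt-4o-mini" deployment name are assumptions for illustration only.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-06-01",
)

class PatientQuery(BaseModel):
    caller_id: str
    transcript: str  # text produced upstream by speech-to-text

@app.post("/triage")
def triage(query: PatientQuery):
    """Classify a transcribed patient request and draft a spoken reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # your Azure OpenAI deployment name
        messages=[
            {"role": "system", "content": (
                "You are a clinic contact-center assistant. Classify the request "
                "as appointment, billing, refill, or escalate, then draft a short, "
                "friendly reply suitable for text-to-speech."
            )},
            {"role": "user", "content": query.transcript},
        ],
    )
    return {
        "caller_id": query.caller_id,
        "reply": response.choices[0].message.content,
    }
```

The reply text could then be passed to the Speech SDK synthesizer from the earlier sketch to speak the response back to the caller.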
Impact:
Hilo reaches 16,000+ physician practices and 180,000 providers, automates millions of communications, and processes $2B+ in payments annually—demonstrating how multimodal AI transforms patient journeys from first call to post-visit care.
4. Key Takeaways
Here’s what you need to know from S2E11:
- Speech AI is Accessible:
The Azure AI Foundry Speech Playground makes experimenting with voice recognition, TTS, and avatars easy for everyone.
- From Playground to Production:
Fine-tune, export code, and deploy speech models in your own apps with Azure Speech Service.
- Responsible AI Built-In:
Custom Neural Voice and avatars require application and approval, ensuring ethical, secure use.
- Agentic AI Everywhere:
Voice Live API brings real-time, multimodal voice agents to any workflow.
- Healthcare Example:
Hilo's use of Azure AI shows the real-world impact of speech and agentic AI, from patient intake to after-visit care.
- Join the Community:
Keep learning and building—join the Discord and Forum.
Sharda's Tips: How I Wrote This Blog
I organize key moments from each episode, highlight product demos and customer stories, and use GitHub Copilot for structure. For this recap, I tested the Speech Playground myself, explored the docs, and summarized answers to common developer questions on security, dialects, and deployment.
Here’s my favorite Copilot prompt this week:
"Generate a technical blog post for Model Mondays S2E11 based on the transcript and episode details. Focus on Azure Speech Playground, TTS, avatars, Voice Live API, and healthcare use cases. Add practical links for developers and students!"
Coming Up Next Week
Next week: Observability!
Learn how to monitor, evaluate, and debug your AI models and workflows using Azure and OpenAI tools.
- Register For The Livestream – Sep 1, 2025
- Register For The AMA – Sep 5, 2025
- Ask Questions & View Recaps – Discussion Forum
About Model Mondays
Model Mondays is your weekly Azure AI learning series:
5-Minute Highlights: Latest AI news and product updates
15-Minute Spotlight: Demos and deep dives with product teams
30-Minute AMA Fridays: Ask anything in Discord or the forum
Start building:
Join The Community
Don’t build alone! The Azure AI Developer Community is here for real-time chats, events, and support:
Join the Discord
Explore the Forum
About Me
I'm Sharda, a Gold Microsoft Learn Student Ambassador focused on cloud and AI. Find me on GitHub, Dev.to, Tech Community, and LinkedIn. In this blog series, I share takeaways from each week’s Model Mondays livestream.