We're excited to announce that Azure AI Embedded Speech is now generally available! Embedded speech is designed for on-device speech to text and text to speech scenarios where cloud connectivity is intermittent or unavailable. It provides an additional way for customers to access Azure AI Speech beyond Azure cloud and connected/disconnected containers.
Scenarios and use cases
Embedded speech is suitable for delivering speech-enabled experiences across a wide range of scenarios. For example:
Enhance the user experience in automotive scenarios where there is intermittent connectivity, such as voice navigation or voice control of car features.
Comply with strict regulations on data privacy in industrial scenarios, such as voice commands for factory workers or hands-free notetaking in lab environments.
Improve accessibility and inclusion in scenarios where users have limited hearing and/or mobility, such as screen reading, real-time captioning of audio, or voice output of alerts.
Use the same technology that powers accessibility on Windows 11
Windows 11 uses the same embedded speech technology, both speech to text and text to speech, to power accessibility experiences like Live Captions, Voice Access, Voice Typing, and Narrator. You can experience the quality of embedded speech today on your Windows 11 PC: Windows 11 Accessibility Features | Microsoft.
Hybrid or on-device only
Embedded speech can be used in two ways:
Hybrid – use Azure AI Speech via the Azure cloud whenever a network connection is available, and fall back to embedded speech only when it is not. This mode ensures the best possible speech quality and accuracy while also providing a robust fallback option.
On-device only – always use embedded speech, regardless of network availability. This mode ensures the highest level of data privacy and security.
The Speech SDK provides built-in support for hybrid usage based on network connectivity and latency optimizations. We encourage customers to determine the best approach for their scenarios and user experiences.
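To make the hybrid mode concrete, here is a minimal, illustrative sketch of the fallback decision in Python. The `CloudRecognizer`, `EmbeddedRecognizer`, and `network_available` names below are hypothetical stand-ins, not Speech SDK types; the Speech SDK implements this routing internally, so you would not write this yourself in production.

```python
# Minimal sketch of the hybrid pattern: prefer the cloud recognizer when the
# network is reachable, fall back to the on-device (embedded) recognizer
# otherwise. All classes and the connectivity probe are illustrative stand-ins,
# not Speech SDK APIs.

import socket


def network_available(host="example.com", port=443, timeout=1.0):
    """Best-effort connectivity probe: can we open a TCP connection?
    Replace host with your actual service endpoint."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class CloudRecognizer:
    """Stand-in for a cloud-backed speech recognizer."""
    def recognize(self, audio):
        return f"cloud:{audio}"


class EmbeddedRecognizer:
    """Stand-in for an on-device (embedded) speech recognizer."""
    def recognize(self, audio):
        return f"embedded:{audio}"


class HybridRecognizer:
    """Route each request to the cloud when online, else to the embedded model."""
    def __init__(self, cloud, embedded, connectivity_check=network_available):
        self.cloud = cloud
        self.embedded = embedded
        self.connectivity_check = connectivity_check

    def recognize(self, audio):
        engine = self.cloud if self.connectivity_check() else self.embedded
        return engine.recognize(audio)
```

For example, `HybridRecognizer(CloudRecognizer(), EmbeddedRecognizer(), connectivity_check=lambda: False).recognize("hello")` routes to the embedded stand-in and returns `"embedded:hello"`. The real SDK additionally applies latency optimizations when deciding which path to use, which a simple reachability probe like this does not capture.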
At the time of general availability, speech to text is offered for 21 locales: German (Germany), English (Australia), English (Canada), English (United Kingdom), English (Ireland), English (India), English (New Zealand), English (United States), Spanish (Spain), Spanish (Mexico), French (Canada), French (France), Italian (Italy), Japanese (Japan), Korean (Republic of Korea), Danish (Denmark), Portuguese (Brazil), Portuguese (Portugal), Chinese (China, Simplified script), Chinese (Hong Kong, Traditional script), and Chinese (Taiwan, Traditional script).
We’re continuously improving locale coverage for our speech to text models and would love to hear your input on the locales needed for your scenarios.
Access to embedded speech is limited and requires an application at https://aka.ms/csgate-embedded-speech. For scenarios where your devices must be in a secure environment like a bank or government entity, we encourage you to first explore containers for Azure AI Speech.