We're excited to announce that Azure AI Embedded Speech is now generally available! Embedded speech is designed for on-device speech to text and text to speech scenarios where cloud connectivity is intermittent or unavailable. It provides an additional way for customers to access Azure AI Speech beyond Azure cloud and connected/disconnected containers.
Scenarios and use cases
Embedded speech is suitable for delivering speech-enabled experiences across a wide range of scenarios. For example:
Enhance the user experience in automotive scenarios where there is intermittent connectivity, such as voice navigation or voice control of car features.
Comply with strict regulations on data privacy in industrial scenarios, such as voice commands for factory workers or hands-free notetaking in lab environments.
Improve accessibility and inclusion in scenarios where users have limited hearing and/or mobility, such as screen reading, real-time captioning of audio, or voice output of alerts.
Use the same technology that powers accessibility on Windows 11
Windows 11 uses the same embedded speech technology, both speech to text and text to speech, to power accessibility experiences like Live Captions, Voice Access, Voice Typing, and Narrator. You can experience the quality of embedded speech today on your Windows 11 PC: Windows 11 Accessibility Features | Microsoft.
Hybrid or on-device only
Embedded speech can be used in two ways:
Hybrid – use Azure AI Speech via the Azure cloud whenever a network connection is available, and fall back to embedded speech only when it is not. This mode ensures the best possible speech quality and accuracy while also providing a robust fallback option.
On-device only – always use embedded speech, regardless of network availability. This mode ensures the highest level of data privacy and security.
The Speech SDK provides built-in support for hybrid usage based on network connectivity and latency optimizations. We encourage customers to determine the best approach for their scenarios and user experiences.
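To make the hybrid mode concrete, here is a minimal, illustrative sketch of the fallback decision in Python. The `CloudRecognizer`, `EmbeddedRecognizer`, and `network_available` names below are hypothetical stand-ins, not Speech SDK types; the Speech SDK implements this routing internally, so you would not write this yourself in production.

```python
# Minimal sketch of the hybrid pattern: prefer the cloud recognizer when the
# network is reachable, fall back to the on-device (embedded) recognizer
# otherwise. All classes and the connectivity probe are illustrative stand-ins,
# not Speech SDK APIs.

import socket


def network_available(host="example.com", port=443, timeout=1.0):
    """Best-effort connectivity probe: can we open a TCP connection?
    Replace host with your actual service endpoint."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class CloudRecognizer:
    """Stand-in for a cloud-backed speech recognizer."""
    def recognize(self, audio):
        return f"cloud:{audio}"


class EmbeddedRecognizer:
    """Stand-in for an on-device (embedded) speech recognizer."""
    def recognize(self, audio):
        return f"embedded:{audio}"


class HybridRecognizer:
    """Route each request to the cloud when online, else to the embedded model."""
    def __init__(self, cloud, embedded, connectivity_check=network_available):
        self.cloud = cloud
        self.embedded = embedded
        self.connectivity_check = connectivity_check

    def recognize(self, audio):
        engine = self.cloud if self.connectivity_check() else self.embedded
        return engine.recognize(audio)
```

For example, `HybridRecognizer(CloudRecognizer(), EmbeddedRecognizer(), connectivity_check=lambda: False).recognize("hello")` routes to the embedded stand-in and returns `"embedded:hello"`. The real SDK additionally applies latency optimizations when deciding which path to use, which a simple reachability probe like this does not capture.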
At the time of general availability, speech to text is offered for 21 locales: German (Germany), English (Australia), English (Canada), English (United Kingdom), English (Ireland), English (India), English (New Zealand), English (United States), Spanish (Spain), Spanish (Mexico), French (Canada), French (France), Italian (Italy), Japanese (Japan), Korean (Republic of Korea), Danish (Denmark), Portuguese (Brazil), Portuguese (Portugal), Chinese (China, Simplified script), Chinese (Hong Kong, Traditional script), and Chinese (Taiwan, Traditional script).
We’re continuously improving locale coverage for our speech to text models and would love to hear your input on the locales needed for your scenarios.
Access to embedded speech is limited and requires an application at https://aka.ms/csgate-embedded-speech. For scenarios where your devices must be in a secure environment like a bank or government entity, we encourage you to first explore containers for Azure AI Speech.