This post is co-authored with Garfield He, Binggong Ding, and Jacky Kang.
Speech is the most natural form of human communication. With the progress of AI and speech technologies in recent years, our computers and mobile phones have started to listen to and speak human languages. At Microsoft, our mission is to empower every organization and every person on the planet to achieve more, without language barriers. Through Azure Cognitive Services, we bring state-of-the-art speech technologies to every developer, making AI products smarter with new capabilities. Language expansion is one of our top priorities. Regardless of where you are in the world or what language you speak, our STT (Speech-to-Text) will recognize your language and our TTS (Text-to-Speech) will speak it, with diversity and inclusion in mind.
Azure Speech Use Cases
As part of Azure Cognitive Services, speech technology boosts human productivity and efficiency, and it is being integrated into most aspects of our life and work. Azure STT and TTS are widely used in a growing range of scenarios worldwide. In our daily meetings, captioning converts speech in different languages into a text transcript of who said what and translates it into other languages in real time. In contact centers, automatic transcription of large volumes of recorded audio can significantly benefit post-call analytics and save a great deal of human effort. On consumer devices, from the smartphone in your hand to the latest electric vehicle on the road, voice assistants listen to and speak your language. With STT and TTS capabilities, they free your hands and carry out tasks precisely. TTS can also help people "read" news, articles, and audiobooks through the Read Aloud feature in the browser, which is also an efficient aid for learning and education. Both STT and TTS help people with disabilities as well, amplifying human capability for millions of people around the world with the power of AI.
Expansion of STT and TTS
Four years ago, Microsoft Speech kicked off a continuous expansion plan. Since then, we have strived to ship more speech models and voices to our developers and customers, adding more than 30 new languages and varieties per year. We are now excited to announce market-leading language coverage for both STT and TTS: Azure STT supports 139 languages and varieties, and Azure TTS supports 140. With the latest modeling technology and growing data, we are confident that we can keep expanding language coverage and bring our speech capabilities to every corner of the world. Over the past year, we continued to add languages and varieties across Europe, Asia, and Africa, making our speech capabilities available to more than 700 million additional speakers. For detailed release dates, please refer to Release notes - STT and Release notes - TTS. The full list is in Language support - Speech service.
There are thousands of endangered minority languages around the world. They are a cultural good with historical and social value to humanity. Although governments and communities are working to save languages and their cultural heritage, a large number of languages have continued to decline in recent years. At Microsoft, language and cultural preservation is one of our top goals. Over the past period, we have released support for minority languages that have suffered decline. Examples include Welsh (Cymraeg), a Celtic language native to the Welsh people and historically their first language, and Galician, a Romance language spoken in Galicia in northwestern Spain and in other regions. Our research team talks with native speakers to learn about their local culture and knowledge, and enables these minority languages in Azure Speech with both STT and TTS capabilities. Their speakers can now speak and listen with Azure Speech technology and be proud of their culture.
User Experience and Customization
A unified user experience is another key feature that Azure Speech offers. With a state-of-the-art multilingual modeling architecture and transfer learning, all new STT languages were developed on large-scale data and share acoustic and language knowledge learned across languages. The models support both dictation and conversation scenarios, across all common language domains. Display format features include ITN (Inverse Text Normalization), capitalization (where applicable), and implicit punctuation (automatically generated basic punctuation) to improve readability. Developers can integrate through either the Speech SDK with the real-time streaming API or Batch Transcription, apply the same set of configurations to every shipped language, and experience the benefits that a unified model brings.
On TTS, all new languages are built with the latest neural technology to deliver the most natural and unified voice experience. Since last December, Azure Neural TTS has been updated with the UniTTSv4 model, which shows no significant difference from natural human recordings at the sentence level, as measured by mean opinion score (MOS).
On customization and personalization, all new STT languages support adaptation through Custom Speech, which can further boost model performance in a specific domain. Custom Neural Voice offers unique brand voices in multiple languages and styles for every organization and individual. Across STT and TTS, language parity delivers a unified speech input and output experience.
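To make the MOS comparison mentioned above more concrete, here is a minimal sketch of how a mean opinion score and a rough 95% confidence interval could be computed for two sets of listener ratings. This is not the evaluation procedure used for UniTTSv4; the ratings are hypothetical, the helper names `mos_with_ci` and `intervals_overlap` are made up for this illustration, and the normal-approximation interval is a simplification.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of a normal-approximation 95% interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


def intervals_overlap(a, b):
    """True if the two (mean, half_width) intervals overlap."""
    (mean_a, hw_a), (mean_b, hw_b) = a, b
    return abs(mean_a - mean_b) <= hw_a + hw_b


# Hypothetical 1-5 listener ratings, for illustration only.
human_ratings = [5, 4, 5, 4, 4, 5, 3, 4, 5, 4]
tts_ratings = [4, 4, 5, 4, 5, 4, 4, 3, 5, 4]

human = mos_with_ci(human_ratings)
tts = mos_with_ci(tts_ratings)

print(f"human MOS = {human[0]:.2f} +/- {human[1]:.2f}")
print(f"TTS MOS   = {tts[0]:.2f} +/- {tts[1]:.2f}")
print("intervals overlap:", intervals_overlap(human, tts))
```

Overlapping intervals are only a quick sanity check. A real listening test would use many more raters and sentences, and a proper significance test, before claiming parity with human recordings.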
Azure Speech Service transcribes millions of hours of speech and synthesizes tens of billions of characters per day across more than 400 neural voices. Get started with a free trial today using the STT Sample Tool and the TTS Voice Gallery in Speech Studio. Developers can also get an Azure subscription and integrate code from the Speech SDK on GitHub. For more speech features, please visit our official documentation.