By Gang Wang, Xi Wang, Lihui Wang, Qinying Liao, Garfield He, Lei He, Binggong Ding, and Sheng Zhao
Thanks to Large Language Models (LLMs) such as Azure OpenAI GPT, AI can now produce more natural, fluent, and high-quality responses in human-bot conversations than ever before. As a result, spoken conversations place higher demands than ever on the naturalness and expressiveness of Text-to-Speech (TTS) voices. We are introducing new voices designed specifically for conversational scenarios. Whether you are creating a speech-based chatbot, a voice assistant, or a conversational agent, these new voices will make your interactions more realistic, lifelike, and engaging.
The new realistic conversational voices are a good fit for any application that needs lifelike speech interactions, including chatbots, voice assistants, gaming, e-learning, entertainment, and more.
Following the introduction of 3 English (United States) voices last month, we are introducing 7 new voices for additional locales, available in the East US, Southeast Asia, and West Europe regions: French (Canada), French (France), German (Germany), Italian (Italy), Korean (Korea), Portuguese (Brazil), and Spanish (Spain). The August 2024 announcement also features 16 brand-new multilingual voices optimized for conversation and enhancements to 14 existing voices, such as upgrading the es-ES female voice Ximena to a multilingual voice, bringing the total number of multilingual voices to 61.
Examples of new voices
| Locale | Language Region | Gender | Voice name | Script | Audio |
| --- | --- | --- | --- | --- | --- |
| de-DE | German (Germany) | Female | Seraphina | Wenn Sie die Informationen nicht finden können, sollten Sie vielleicht die Versicherungsabteilung Ihres Bundesstaates kontaktieren. Sie könnten Ihnen helfen, Lebensversicherungspolicen zu finden, die Ihr Mann abgeschlossen hat. Ich hoffe, das hilft Ihnen. Lassen Sie es mich wissen, wenn Sie weitere Fragen haben. | |
| es-ES | Spanish (Spain) | Female | Ximena | Entiendo. Parece ser un lugar impresionante y aterrador al mismo tiempo. Me pregunto qué tipo de té sirven allí. ¿Está hecho con los rayos del sol o con algo más? ¿Y quiénes son las personas que viven allí? ¿Son leales al Imperio o tienen sus propias agendas? | |
| fr-CA | French (Canada) | Male | Thierry | Je comprends. Cela semble être un endroit à la fois impressionnant et terrifiant. Je me demande quel type de thé ils servent là-bas. Est-il fait avec les rayons du soleil ou avec autre chose ? Et qui sont les personnes qui y vivent ? Sont-elles loyales à l'Empire ou ont-elles leurs propres agendas ? | |
| fr-FR | French (France) | Female | Vivienne | Oui, c'est ce que j'ai dit. Une stratégie maximin est celle qui maximise le paiement minimum d'un joueur, peu importe ce que font les autres joueurs. C'est une façon de garantir que le joueur obtienne au moins un certain montant de paiement, même dans le pire des cas. | |
| it-IT | Italian (Italy) | Male | Giuseppe | Capisco. Sembra essere un luogo impressionante e terrificante allo stesso tempo. Mi chiedo che tipo di tè servano lì. È fatto con i raggi del sole o con qualcos'altro? E chi sono le persone che ci vivono? Sono fedeli all'Impero o hanno le proprie agende? | |
| ko-KR | Korean (Korea) | Male | Hyunsu | 이해합니다. 인상적이고 동시에 무서운 곳인 것 같습니다. 그곳에서는 어떤 차를 파는지 궁금하네요. 태양 광선이나 다른 것으로 만들어진 것입니까? 그리고 그곳에 사는 사람들은 누구입니까? 그들은 제국에 충성합니까, 아니면 그들만의 계획을 가지고 있습니까? | |
| pt-BR | Portuguese (Brazil) | Female | Thalita | Se você não conseguir encontrar as informações, talvez queira considerar entrar em contato com o departamento de seguros do seu estado. Eles podem ajudá-lo a localizar qualquer apólice de seguro de vida que tenha sido feita por seu marido. Espero que isso ajude. Por favor, me avise se tiver outras perguntas. | |
We also updated the zh-CN Xiaoxiao voice with a more natural speaking style:
| Locale | Language Region | Gender | Voice name | Script | Audio |
| --- | --- | --- | --- | --- | --- |
| zh-CN | Mandarin Chinese (China) | Female | Xiaoxiao | 当然可以,那我们来聊一聊音乐吧。音乐是一种无国界的艺术,可以跨越文化和语言的障碍触动人心。你有没有什么特别喜欢的歌手,乐队或者音乐风格呢?或者说你最近有没有听到什么新歌让你感觉到特别喜欢的? 嗯,虽然我没有情感和喜好,但是我知道很多的音乐和歌手。不同的歌手和风格呢,也代表了各种各样的文化和情感。 | |
Additional Updates
Besides these new voices, we also updated 3 existing voices with more expressive prosody.
| Locale | Language Region | Gender | Voice name | Script | Current version | New version |
| --- | --- | --- | --- | --- | --- | --- |
| es-ES | Spanish (Spain) | Male | Alvaro | La temperatura máxima de hoy será de 30 grados. | | |
| en-GB | English (United Kingdom) | Male | Ryan | I took the evening to work more on my business and work on my personal goals. | | |
| ko-KR | Korean (Korea) | Male | Injoon | 유기체론적 생각을 발전시켜 생물학에서의 시스템 이론을 개발하였다. | | |
Integrate Azure TTS and Azure OpenAI with low latency
To minimize latency when integrating a Large Language Model (LLM) service such as Azure OpenAI Service with Azure TTS, we recommend sending text to the TTS service while the LLM is still generating its response, rather than waiting for the full response. You can find a demo sample here that demonstrates generating TTS responses in a streaming manner. For general guidance on reducing latency, see the Microsoft Learn article "How to lower speech synthesis latency using Speech SDK".
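As a rough illustration of the idea above, the sketch below buffers streamed LLM text and hands each complete sentence to a TTS callback as soon as it is available. It is not the demo sample referenced above: `sentence_chunks`, `stream_to_tts`, and the `speak` callback are hypothetical names introduced here, and the sentence-boundary rule is a simplifying assumption. In a real application, `speak` would wrap whichever Speech SDK synthesis call you use.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumed boundary rule: Latin sentence punctuation followed by whitespace
# (so "3.5" is not split), or CJK full-width sentence punctuation.
_SENTENCE_END = re.compile(r"[.!?]\s+|[。!?]")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every complete sentence currently sitting in the buffer.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever text remains once the LLM stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


def stream_to_tts(tokens: Iterable[str], speak: Callable[[str], None]) -> None:
    """Send each sentence to the (hypothetical) TTS callback without
    waiting for the full LLM response."""
    for sentence in sentence_chunks(tokens):
        speak(sentence)
```

Chunking at sentence boundaries is one possible trade-off: smaller chunks start audio sooner, while whole sentences give the TTS voice enough context for natural prosody.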
Get started
Microsoft offers over 400 neural voices covering more than 140 languages and locales. With these Text-to-Speech voices, you can quickly add read-aloud functionality for a more accessible app design or give a voice to chatbots to provide a richer conversational experience to your users. In addition, with the Custom Neural Voice capability, you can easily create a brand voice for your business using professional voice cloning.
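When giving a chatbot a voice, the voice is typically selected by name in an SSML request. The following is a minimal sketch of building such a request; `build_ssml` is a hypothetical helper, and the full voice identifier shown in the usage note is an assumption, so check the service's voice list for the exact name of the voice you want.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_name: str, locale: str) -> str:
    """Wrap plain text in a minimal SSML document for one voice.

    The text is XML-escaped so that characters such as '&' or '<'
    in an LLM response cannot break the markup.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )
```

For example, `build_ssml("Guten Tag!", "de-DE-SeraphinaMultilingualNeural", "de-DE")` would produce a request for the German voice listed above, assuming that identifier matches the one published for Seraphina.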
For more information
- Try our demo to listen to existing neural voices
- Add Text-to-Speech to your apps today
- Apply for access to Custom Neural Voice
- Join Discord to collaborate and share feedback