Introducing super realistic AI voices optimized for conversations
Published Sep 21 2023 01:48 AM
Microsoft

In human-bot conversations, AI can now produce more natural, fluent, and high-quality responses than ever before, thanks to the power of Large Language Models (LLMs) such as Azure OpenAI GPT. As a result, spoken conversations place higher demands than ever on the naturalness and expressiveness of Text-to-Speech (TTS) voices. We are introducing new voices designed specifically for conversational scenarios. Whether you are creating a speech-based chatbot, a voice assistant, or a conversational agent, these new voices will make your interactions more realistic, lifelike, and engaging.

  

The new realistic voices are well suited to any application that needs lifelike speech interactions, including chatbots, voice assistants, gaming, e-learning, entertainment, and more.

 

Meet the new voices optimized for conversational scenarios today. zh-CN-YunjieNeural is in public preview in three regions: East US, Southeast Asia, and West Europe. en-US-AndrewNeural, en-US-BrianNeural, and en-US-EmmaNeural are now generally available in all regions, and their multilingual versions became generally available with the March 2024 announcement.

 

Check out the voice samples

 

Demo of new voices in comparison with other voices

 

Hear how these voices sound in conversations, compared with existing stock voices that are designed for more general purposes.

 

Script: I can help you with a lot of things! I can answer questions, provide information on a wide range of topics, help you find things on the web, and more. If you have a specific question or task in mind, feel free to ask me and I'll do my best to assist you.
New voice optimized for conversations: Emma
Existing voice designed for general purpose: Jenny

Script: I'm not sure what you're asking. If you're asking for a paraphrase of the sentence "I learn about myself that I can lead a team", then it means that the speaker has discovered that they have the ability to lead a team. Is there anything else I can help you with?
New voice optimized for conversations: Andrew
Existing voice designed for general purpose: Guy

Script: 风筝有风,海豚有海,而您有我,感谢您的光临。么么哒! (English: "Kites have the wind, dolphins have the sea, and you have me. Thank you for visiting. Mwah!")
New voice optimized for conversations: Yunjie
Existing voice designed for general purpose: Yunxi

 

More samples

Script: I understand. It sounds like a place that is both impressive and terrifying. I wonder what kind of tea they serve there. Is it made from the sun's rays or from something else? And who are the people who live there? Are they loyal to the Empire or do they have their own agendas?
New voice: Emma

Script: Yes, that is what I said. A maximin strategy is the one that maximizes the minimum payoff of a player, regardless of what the other players do. It is a way of ensuring that the player gets at least a certain amount of payoff, even in the worst case scenario.
New voice: Andrew

Script: If you can't find the information, you may want to consider contacting your state's insurance department. They may be able to help you locate any life insurance policies that were taken out on your husband. I hope this helps. Please let me know if you have any other questions.
New voice: Brian

Script: 好的,让我为您创建一个新的理赔单。请稍等。我已经为您创建了一个新的理赔单。我们会联系您安排修理您的车子。我们还会通过电子邮件给您发送一个链接,以便您可以上传您拍摄的照片。还有什么其他我可以帮助您的吗? (English: "OK, let me create a new claim for you. One moment, please. I have created a new claim for you. We will contact you to arrange the repair of your car. We will also email you a link so that you can upload the photos you took. Is there anything else I can help you with?")
New voice: Yunjie

 

Demo of full conversation

 

Conversations between Andrew and Emma (in English):

 

Conversations between Yunjie and Xiaochen (in Chinese):

 

 

Integrate these new voices with Azure OpenAI

 

You can incorporate these new neural Text-to-Speech (TTS) voices into your applications using the Azure Speech SDK or REST API. You can also use the Azure Bot Framework to build intelligent bots that use these new neural TTS voices for speech synthesis.
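As a rough illustration of the REST route, the sketch below builds an SSML document that selects one of the new voices and prepares, without sending, a synthesis request using only the Python standard library. The endpoint shape and header names reflect the Speech service REST API as we understand it and should be checked against the current documentation. The region `eastus`, the key placeholder `YOUR_SPEECH_KEY`, the output format, and the helper names `build_ssml` and `build_tts_request` are assumptions made for this example.

```python
from urllib.request import Request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-EmmaNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document that selects one voice."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_tts_request(ssml: str, region: str, key: str) -> Request:
    """Prepare (but do not send) a POST request for the TTS REST endpoint."""
    # Assumed endpoint shape; verify against the Speech service documentation.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "tts-voice-sketch",  # hypothetical client name
    }
    return Request(url, data=ssml.encode("utf-8"), headers=headers, method="POST")


ssml = build_ssml("Is there anything else I can help you with?")
request = build_tts_request(ssml, region="eastus", key="YOUR_SPEECH_KEY")
# With a real key and region, urllib.request.urlopen(request).read()
# would return the synthesized audio bytes.
```

Switching voices only requires changing the `voice` argument, for example to `en-US-AndrewNeural` or `zh-CN-YunjieNeural` (with `lang="zh-CN"`).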

To minimize latency when integrating Large Language Models (LLMs) with TTS, we recommend sending text to the TTS service while the LLM is still generating its response. A demo sample is available that shows how to generate TTS responses in a streaming manner.
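One way to picture this streaming recommendation is to buffer the LLM output and hand each sentence to TTS as soon as it is complete, rather than waiting for the whole reply. The sketch below is a minimal, standard-library illustration and is not the demo sample itself. `fake_llm_stream` is a hypothetical stand-in for a streaming LLM client, and splitting on sentence-ending punctuation is a simplifying assumption.

```python
from typing import Iterable, Iterator

# Punctuation that ends a sentence, in English and Chinese.
SENTENCE_END = ".!?。!?"


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as the stream has produced it."""
    buffer = ""
    for chunk in chunks:
        for char in chunk:
            buffer += char
            if char in SENTENCE_END:
                sentence = buffer.strip()
                if sentence:
                    yield sentence
                buffer = ""
    tail = buffer.strip()
    if tail:
        yield tail  # flush text that never reached closing punctuation


def fake_llm_stream() -> Iterator[str]:
    """Hypothetical stand-in for an LLM that streams partial text."""
    yield from ["I under", "stand. It sounds ", "impressive! Who ", "lives there"]


spoken = []
for sentence in sentences_from_stream(fake_llm_stream()):
    spoken.append(sentence)  # a real app would send `sentence` to TTS here
```

Because the generator yields as soon as a sentence closes, synthesis of the first sentence can start while later text is still being generated. A production splitter would also need to handle cases such as decimals and abbreviations, which this sketch treats as sentence boundaries.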

 

The technology behind the voices

 

We began by crafting the persona of each voice as if it were a real person who is friendly and optimistic about life, always eager to assist others and share intriguing or practical knowledge. The speaking style of the voice resembles a conversation with an acquaintance over a cup of tea, maintaining a natural and unexaggerated tone.

Furthermore, we continuously enhance our Text-to-Speech (TTS) modeling techniques to improve the quality of our AI voices. Our most recent projects, such as DelightfulTTS 2 and MuLanTTS, have significantly narrowed the quality gap between AI voices and professional human recordings, producing more natural and realistic voices than ever before. These technological advancements serve as the foundation upon which these new AI voices are built.

 

Get started

 

Microsoft offers over 400 neural voices covering more than 140 languages and locales. With these Text-to-Speech voices, you can quickly add read-aloud functionality for a more accessible app design or give a voice to chatbots to provide a richer conversational experience to your users. In addition, with the Custom Neural Voice capability, you can easily create a brand voice for your business.

 

For more information

 

Last update: Mar 29 2024 10:55 PM