AI - Azure AI services Blog

Introducing 7 new realistic AI voices optimized for conversations in 7 languages for public preview

GarfieldHe
Microsoft
Nov 03, 2023

By Gang Wang, Xi Wang, Lihui Wang, Qinying Liao, Garfield He, Lei He, Binggong Ding, and Sheng Zhao

 

Thanks to Large Language Models (LLMs) such as Azure OpenAI GPT, AI can now produce more natural, fluent, and high-quality responses in human-bot conversations than ever before. As a result, the demand for natural, expressive Text-to-Speech (TTS) voices in spoken conversations is higher than ever. We are introducing new voices designed specifically for conversational scenarios. Whether you are building a speech-based chatbot, a voice assistant, or a conversational agent, these voices help make your interactions more realistic, lifelike, and engaging.

The new realistic conversational voices are well suited to any application that needs lifelike speech interaction, including chatbots, voice assistants, gaming, e-learning, entertainment, and more.

 

Following the introduction of 3 English (United States) voices last month, we are introducing 7 more voices for additional locales, available in the East US, Southeast Asia, and West Europe regions: French (Canada), French (France), German (Germany), Italian (Italy), Korean (Korea), Portuguese (Brazil), and Spanish (Spain). The August 2024 announcement also features 16 brand-new multilingual voices optimized for conversation and enhancements to 14 existing ones, such as upgrading the es-ES female voice Ximena to a multilingual voice, bringing the total number of multilingual voices to 61.

 

Examples of new voices

| Locale | Language (Region) | Gender | Voice name | Script |
| --- | --- | --- | --- | --- |
| de-DE | German (Germany) | Female | Seraphina | Wenn Sie die Informationen nicht finden können, sollten Sie vielleicht die Versicherungsabteilung Ihres Bundesstaates kontaktieren. Sie könnten Ihnen helfen, Lebensversicherungspolicen zu finden, die Ihr Mann abgeschlossen hat. Ich hoffe, das hilft Ihnen. Lassen Sie es mich wissen, wenn Sie weitere Fragen haben. |
| es-ES | Spanish (Spain) | Female | Ximena | Entiendo. Parece ser un lugar impresionante y aterrador al mismo tiempo. Me pregunto qué tipo de té sirven allí. ¿Está hecho con los rayos del sol o con algo más? ¿Y quiénes son las personas que viven allí? ¿Son leales al Imperio o tienen sus propias agendas? |
| fr-CA | French (Canada) | Male | Thierry | Je comprends. Cela semble être un endroit à la fois impressionnant et terrifiant. Je me demande quel type de thé ils servent là-bas. Est-il fait avec les rayons du soleil ou avec autre chose ? Et qui sont les personnes qui y vivent ? Sont-elles loyales à l'Empire ou ont-elles leurs propres agendas ? |
| fr-FR | French (France) | Female | Vivienne | Oui, c'est ce que j'ai dit. Une stratégie maximin est celle qui maximise le paiement minimum d'un joueur, peu importe ce que font les autres joueurs. C'est une façon de garantir que le joueur obtienne au moins un certain montant de paiement, même dans le pire des cas. |
| it-IT | Italian (Italy) | Male | Giuseppe | Capisco. Sembra essere un luogo impressionante e terrificante allo stesso tempo. Mi chiedo che tipo di tè servano lì. È fatto con i raggi del sole o con qualcos'altro? E chi sono le persone che ci vivono? Sono fedeli all'Impero o hanno le proprie agende? |
| ko-KR | Korean (Korea) | Male | Hyunsu | 이해합니다. 인상적이고 동시에 무서운 곳인 것 같습니다. 그곳에서는 어떤 차를 파는지 궁금하네요. 태양 광선이나 다른 것으로 만들어진 것입니까? 그리고 그곳에 사는 사람들은 누구입니까? 그들은 제국에 충성합니까, 아니면 그들만의 계획을 가지고 있습니까? |
| pt-BR | Portuguese (Brazil) | Female | Thalita | Se você não conseguir encontrar as informações, talvez queira considerar entrar em contato com o departamento de seguros do seu estado. Eles podem ajudá-lo a localizar qualquer apólice de seguro de vida que tenha sido feita por seu marido. Espero que isso ajude. Por favor, me avise se tiver outras perguntas. |

 

 

We also updated the zh-CN Xiaoxiao voice with a more natural speaking style:

| Locale | Language (Region) | Gender | Voice name | Script |
| --- | --- | --- | --- | --- |
| zh-CN | Mandarin Chinese (China) | Female | Xiaoxiao | 当然可以,那我们来聊一聊音乐吧。音乐是一种无国界的艺术,可以跨越文化和语言的障碍触动人心。你有没有什么特别喜欢的歌手,乐队或者音乐风格呢?或者说你最近有没有听到什么新歌让你感觉到特别喜欢的? 嗯,虽然我没有情感和喜好,但是我知道很多的音乐和歌手。不同的歌手和风格呢,也代表了各种各样的文化和情感。 |

 

 

Additional Updates

Besides these new voices, we also updated 3 current voices with more expressive prosody.

 

| Locale | Language (Region) | Gender | Voice name | Script |
| --- | --- | --- | --- | --- |
| es-ES | Spanish (Spain) | Male | Alvaro | La temperatura máxima de hoy será de 30 grados. |
| en-GB | English (United Kingdom) | Male | Ryan | I took the evening to work more on my business and work on my personal goals. |
| ko-KR | Korean (Korea) | Male | Injoon | 유기체론적 생각을 발전시켜 생물학에서의 시스템 이론을 개발하였다. |

 

Integrate Azure TTS and Azure OpenAI with low latency

 

To minimize latency when integrating a Large Language Model (LLM) service such as Azure OpenAI Service with Azure TTS, send text to the TTS service while the LLM is still generating its response. A demo sample is available that generates TTS responses in a streaming manner. To reduce latency in general, follow the best practices in this article: How to lower speech synthesis latency using Speech SDK - Azure AI services | Microsoft Learn
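The idea of sending text to TTS while the LLM is still generating can be sketched as follows. This is a minimal illustration, not the demo sample mentioned above: it buffers streamed text fragments and hands each complete sentence to a callback as soon as it is available. The names `sentences_from_stream`, `stream_to_tts`, and the `synthesize` callback are hypothetical; in a real application the callback would wrap whatever TTS call you use, and the fragments would come from a streaming LLM response.

```python
import re
from typing import Callable, Iterable, Iterator, List

# A sentence ends at Latin punctuation followed by whitespace (so "30.5" is
# not split), or at full-width CJK punctuation, which has no trailing space.
_SENTENCE_END = re.compile(r'(?:[.!?]+["\')\]]*\s+|[。!?]+)')


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text fragments."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left once the stream has finished.
    tail = buffer.strip()
    if tail:
        yield tail


def stream_to_tts(chunks: Iterable[str], synthesize: Callable[[str], None]) -> None:
    """Call `synthesize` (a hypothetical TTS callback) once per complete sentence."""
    for sentence in sentences_from_stream(chunks):
        synthesize(sentence)


if __name__ == "__main__":
    # Simulated LLM output arriving in small fragments.
    fake_llm_stream = ["Hel", "lo the", "re. How", " are you", "? I am", " fine"]
    spoken: List[str] = []
    stream_to_tts(fake_llm_stream, spoken.append)
    print(spoken)
```

Splitting on sentence boundaries is one possible trade-off: the first audio can start after the first sentence instead of after the whole response, while each request still carries enough context for natural prosody.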

 

Get started

 

Microsoft offers over 400 neural voices covering more than 140 languages and locales. With these Text-to-Speech voices, you can quickly add read-aloud functionality for a more accessible app design, or give chatbots a voice to provide a richer conversational experience for your users. In addition, with the Custom Neural Voice capability, you can create a brand voice for your business using professional voice cloning.
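As a starting point, a request for one of these voices is typically expressed in SSML. The sketch below only builds the SSML string and makes no service call. The helper name `build_ssml` is hypothetical, and `YOUR-VOICE-NAME` is a placeholder: this post does not list the full voice identifiers, so look them up in the service's voice list. Escaping the text matters because characters such as "&" are not valid in raw XML.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_name: str, locale: str) -> str:
    """Wrap plain text in a minimal SSML document for a single voice.

    `voice_name` must be a full voice identifier from the service's voice
    list; the value used in the example below is only a placeholder.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Placeholder voice identifier (an assumption, not a real voice name).
    print(build_ssml("Kaffee & Kuchen", "YOUR-VOICE-NAME", "de-DE"))
```

The resulting string can then be passed to whichever SSML-capable synthesis call your application uses.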

 

For more information

Updated Nov 26, 2024
Version 11.0
  • LU0038

    I honestly like these new ultra-realistic voices a lot, and I hope you can add them for some less widely spoken languages such as Russian, Mexican Spanish, Finnish, Arabic and others.

  • Paroissien

    Thank you! The possibilities are endless.

    Please note:

    Vivienne or Seraphina (those I use) switch randomly from one separator to the next between English, Spanish and French when reading single or small groups of words.

    Vivienne misreads 猕 mi2 for 猴 hou4 in Mandarin (maybe because they occur together in usage).

    Both Seraphina and Vivienne can't read more than a couple of thousand (single) Chinese characters (many more compounds with great accuracy).

    In Edge they suddenly stop on "&" or jump a line.

    Please post the metalanguage instructions, various reserved signs and their usage.

    You're doing a great job and progressing at a dizzying speed!

  • LU0038

    GarfieldHe, I'm afraid there are some problems with the following voices: Giuseppe (Italian) and Hyunsu (Korean). When I tested them on the Speech Studio page, they just produced a random female voice. I think the voices are corrupted. Could you please double-check the issue with these voices and fix them? Thanks.

  • LU0038, thanks for the report. There is a bug, and it will be fixed soon. Thanks again for sharing with us!