Azure Text-to-Speech updates at //Build 2021
Published May 25 2021 08:11 AM
Microsoft

By: Garfield He, Melinda Ma, Melissa Ma, Bohan Li, Qinying Liao, Sheng Zhao, Yueying Liu

 

Text to Speech (TTS), part of Speech in Azure Cognitive Services, enables developers to convert text to lifelike speech for more natural interfaces with a rich choice of prebuilt voices and powerful customization capabilities. At the //Build 2021 conference, we are excited to announce several new features and improvements to TTS that address a variety of needs from customers globally.

 

Cross-lingual adaptation (preview) enables the same voice to speak multiple languages

 

Now more than ever, developers are expected to build voice-enabled applications that can reach a global audience. With the same voice persona across languages, organizations can keep their brand image more consistent. To support the growing need for a single voice that speaks multiple languages, particularly in scenarios such as localization and translation, we are releasing a multilingual neural TTS voice in public preview.

 

This new Jenny Multilingual voice (preview), with US English as the primary/default language, can speak 13 secondary languages, each at the fluent level: German (Germany), English (Australia), English (Canada), English (United Kingdom), Spanish (Spain), Spanish (Mexico), French (Canada), French (France), Italian (Italy), Japanese (Japan), Korean (Korea), Portuguese (Brazil), Chinese (Mandarin, Simplified).

 

Hear how Jenny Multilingual speaks different languages in the samples below:

 

| Locale | Language | Sample script |
| --- | --- | --- |
| en-US | English (United States), the default language | We look forward to working with you! |
| de-DE | German (Germany) | Wir freuen uns auf die Zusammenarbeit mit Ihnen! |
| en-AU | English (Australia) | We look forward to working with you! |
| en-CA | English (Canada) | We look forward to working with you! |
| en-GB | English (United Kingdom) | We look forward to working with you! |
| es-ES | Spanish (Spain) | ¡Esperamos trabajar con usted! |
| es-MX | Spanish (Mexico) | ¡Esperamos trabajar con usted! |
| fr-CA | French (Canada) | Nous avons hâte de travailler avec vous ! |
| fr-FR | French (France) | Nous avons hâte de travailler avec vous ! |
| it-IT | Italian (Italy) | Non vediamo l'ora di lavorare con voi! |
| ja-JP | Japanese (Japan) | 私たちはあなたと一緒に働くことを楽しみにしています! |
| ko-KR | Korean (Korea) | 우리는 당신과 함께 협정 하는 거를 기대합니다. |
| pt-BR | Portuguese (Brazil) | Será um prazer trabalhar com você! |
| zh-CN | Chinese (Mandarin, Simplified) | 我们期待与您合作! |

 

With this new voice, developers can easily enable their applications to speak multiple languages, without changing the persona. Learn how to use the multi-lingual capability of the voice with SSML.
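As a rough illustration of the SSML approach mentioned above, the sketch below builds a document that switches languages inside a single voice. It assumes the preview voice is addressed as `en-US-JennyMultilingualNeural` and that the secondary language is selected with the `<lang xml:lang="...">` element, as described in the Speech service SSML documentation; the helper name `build_multilingual_ssml` is hypothetical and only assembles the markup, it does not call the service.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed name of the preview voice; check the published voice list before use.
JENNY_MULTILINGUAL = "en-US-JennyMultilingualNeural"


def build_multilingual_ssml(segments, voice=JENNY_MULTILINGUAL):
    """Build an SSML string from (locale, text) pairs spoken by one voice.

    Each pair is wrapped in a <lang> element so the same voice persona
    switches language per segment.
    """
    lang_parts = "".join(
        f"<lang xml:lang={quoteattr(locale)}>{escape(text)}</lang>"
        for locale, text in segments
    )
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>{lang_parts}</voice>"
        "</speak>"
    )


ssml = build_multilingual_ssml([
    ("en-US", "We look forward to working with you!"),
    ("de-DE", "Wir freuen uns auf die Zusammenarbeit mit Ihnen!"),
])
print(ssml)
```

The resulting string can be passed to whichever synthesis entry point the application already uses for SSML input.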

 

What’s more, this powerful feature is also available in public preview on Custom Neural Voice, allowing customers to build a natural-sounding, one-of-a-kind voice that speaks different languages. Custom Neural Voice has enabled a number of global companies, such as BBC, Swisscom, AT&T, and Duolingo, to build realistic voices that resonate with their brands.

 

This cross-lingual adaptation feature (preview) brings new opportunities to light up more compelling scenarios. For example, developers can enable an English virtual assistant’s voice to speak German fluently so the bot can read movie titles in German; or, create a game with the same non-player characters speaking different languages to users from different geographies. In this demo, Julia White presents a keynote in a mixed-reality world, using her virtual voice in Japanese, trained from English data.

 

The samples below show a custom neural voice in different languages, trained from the same speaker’s voice data in UK English.

 

Human recording as the training data (UK English): 

| Data language | Recording sample |
| --- | --- |
| English (United Kingdom) | Is it a copy, or do you need it back? |

 

TTS output samples in 13 other languages:

| TTS locale | Language | TTS sample |
| --- | --- | --- |
| de-DE | German (Germany) | Zwei der vier Eingänge sind nun geöffnet. |
| en-AU | English (Australia) | I've seen this movie already. |
| en-CA | English (Canada) | I am looking forward to the exciting things. |
| en-GB | English (United Kingdom) | The docking was a fully automated process. |
| en-US | English (United States) | I've seen this movie already. |
| es-ES | Spanish (Spain) | El acoplamiento era un proceso totalmente automatizado. |
| es-MX | Spanish (Mexico) | Estoy deseando que lleguen las cosas emocionantes. |
| fr-CA | French (Canada) | Deux des quatre entrées sont maintenant ouvertes. |
| fr-FR | French (France) | Deux des quatre entrées sont maintenant ouvertes. |
| it-IT | Italian (Italy) | Due dei quattro ingressi sono ora aperti. |
| ja-JP | Japanese (Japan) | 私はすでにこの映画を見てきました。 |
| ko-KR | Korean (Korea) | 네개의 입구 중 두개가 이제 열려 있습니다. |
| pt-BR | Portuguese (Brazil) | Eu já vi esse filme. |
| zh-CN | Chinese (Mandarin, Simplified) | 我已经看过这部电影了。 |

 

The cross-lingual adaptation feature (preview) on Custom Neural Voice is available in the latest Speech Studio, a UI-based experience that lets developers explore the Speech service with no-code tools and customize various aspects of it in a guided experience.

 

As part of Microsoft’s commitment to responsible AI, Custom Neural Voice is available with limited access. See this video for details on how to apply for and use Custom Neural Voice.

 

Neural text-to-speech supports 10 more languages

 

We are glad to announce that neural TTS now supports 10 more languages and 32 new voices. With this update, Azure neural TTS provides developers with more than 250 voices across 70+ languages and variants. See the full list of languages and voices.


 

The 10 newly released languages are: `en-HK` English (Hong Kong), `en-NZ` English (New Zealand), `en-SG` English (Singapore), `en-ZA` English (South Africa), `es-AR` Spanish (Argentina), `es-CO` Spanish (Colombia), `es-US` Spanish (US), `gu-IN` Gujarati (India), `mr-IN` Marathi (India) and `sw-KE` Swahili (Kenya).

 

| Locale | Language | Gender | Voice | Sample script |
| --- | --- | --- | --- | --- |
| en-HK | English (Hong Kong) | Male | Sam | The time is 12:05 PM. |
| en-HK | English (Hong Kong) | Female | Yan | We discussed buying a motorhome last night. |
| en-NZ | English (New Zealand) | Male | Mitchell | The development site is situated within a new 12 kilometre touristic zone. |
| en-NZ | English (New Zealand) | Female | Molly | You need to use about 10 grammes of sugar. |
| en-SG | English (Singapore) | Male | Wayne | How long does it take to reheat pot roast? |
| en-SG | English (Singapore) | Female | Luna | For the two friends, this was their first mission together. |
| en-ZA | English (South Africa) | Female | Leah | We have to be there at 6 in the morning. |
| en-ZA | English (South Africa) | Male | Luke | The role of business has never been as important as it is today. |
| es-AR | Spanish (Argentina) | Male | Tomas | Estoy leyendo un blog de viajes. |
| es-AR | Spanish (Argentina) | Female | Elena | El fin de amar es sentirse más vivo. |
| es-CO | Spanish (Colombia) | Male | Gonzalo | Hoy me voy de rumba con mis compañeros. |
| es-CO | Spanish (Colombia) | Female | Salome | ¿Usted conoce a la profesora del salón 101? |
| es-US | Spanish (US) | Male | Alonso | Vamos por unos tequilas mi hermano. |
| es-US | Spanish (US) | Female | Paloma | Quiero comer frijoles con bistec. |
| gu-IN | Gujarati (India) | Female | Dhwani | ગુજરાતના સૌરાષ્ટ્ર વિસ્તારમાં ગીરનું જંગલ આવેલું છે, જે એશીયાઇ સિંહો માટે પ્રખ્યાત છે. |
| gu-IN | Gujarati (India) | Male | Niranjan | ગુજરાત ભારતના પશ્ચિમ તટે આવેલું રાજ્ય છે અને તે પશ્ચિમે અરબી સમુદ્રથી ઘેરાયેલું છે. |
| mr-IN | Marathi (India) | Female | Aarohi | गडचिरोली जिल्हा महाराष्ट्र राज्याच्या उत्तर-पूर्व दिशेला वसलेला असून, तेलंगणा आणि छत्तीसगड राज्याच्या सीमेला लागून आहे. |
| mr-IN | Marathi (India) | Male | Manohar | गोदावरी नदीची गणना भारतातील प्रमुख नद्यांमध्ये केली जाते. या नदीला दक्षिण गंगा असे ही म्हंटले जाते. |
| sw-KE | Swahili (Kenya) | Female | Zuri | Usiwe na wasiwasi; nitakuwa pamoja nawe siku zote. |
| sw-KE | Swahili (Kenya) | Male | Rafiki | Starehe ni mojawapo ya shule zinazofanya vyema zaidi barani Afrika. |

 

You can also check out these voices in our demo on Azure or through the Audio Content Creation tool with your own text.
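For readers who would rather try a voice from code, here is a minimal sketch that prepares, but does not send, a request to the Speech service text-to-speech REST endpoint. The endpoint pattern, header names, and output format are taken from the public REST documentation as best understood here; the voice name `en-NZ-MollyNeural` assumes the usual `<locale>-<Name>Neural` naming, and the helper `build_tts_request`, the region, and the key are placeholders for illustration only.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_tts_request(text, voice, locale, region, key):
    """Prepare (without sending) a POST request that asks for synthesized audio."""
    ssml = (
        '<speak version="1.0" '
        f'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang={quoteattr(locale)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
    return urllib.request.Request(
        url=f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,  # placeholder credential
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
            "User-Agent": "tts-sketch",
        },
    )


request = build_tts_request(
    text="You need to use about 10 grammes of sugar.",
    voice="en-NZ-MollyNeural",  # assumed voice name
    locale="en-NZ",
    region="westus",            # placeholder region
    key="YOUR_SPEECH_KEY",      # placeholder key
)
print(request.full_url)
```

With a real key and region, passing `request` to `urllib.request.urlopen` and writing the response body to a `.wav` file would be the natural next step; that call is left out here so the sketch runs without credentials.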

 

More neural voices are available in English

 

Different use cases often call for different voice characteristics. For example, when creating a customer service bot, developers may prefer a voice that sounds professional, mature, and experienced. When building an app that reads stories to kids, developers may want to use a child's voice so that it resonates better with the audience.

 

Here we introduce 11 new neural voices in public preview that were recently added to the US English portfolio, enabling developers to create even more appealing read-aloud and conversational experiences in different voices. These new voices span different age groups, including a child's voice, and different voice timbres to meet customers’ requirements for voice variety. Together with Aria, Jenny, and Guy, we now offer 14 neural TTS voices in US English.

 

| Gender | Voice | Sample script |
| --- | --- | --- |
| Female | Ashley | The forecast for tomorrow in Austin shows partly sunny skies with a high of 92 and a low of 76. |
| Male | Brandon | It seems clear that SpaceX has a significant lead over its competitors in the commercial space industry. |
| Female | Michelle | Cooking is not about fast or slow, it is about truth. |
| Male | Eric | The latest round of stimulus checks was issued in the form of prepaid debit cards. |
| Female | Cora | We do think this is a substantial change, over time. |
| Female | Elizabeth | Attention Please, Passengers for Delta Airlines flight 6, 3, 0, 1, to Atlanta, now boarding at gate 16. |
| Male | Christopher | To recoup revenue losses from playing without fans, the league proposed a sliding scale of pay cuts for players. |
| Male | Jacob | Scientific studies have evaluated surgical masks, but relatively few have looked at whether cloth masks can stop virus transmission. |
| Female | Ana | For the two friends, this was their first mission together. |
| Female | Monica | Sometimes the notion of going out to a movie theater seems like a dream from previous life. |
| Female | Amber | This process of evolution is fascinating, trying to define what is a motion picture and what is a streamed movie. |

 

Five more Chinese voices are generally available

 

Five Chinese (Mandarin) voices (Yunxi, Xiaomo, Xiaoxuan, Xiaohan, and Xiaorui) were released in public preview in November 2020, optimized for conversational and audiobook scenarios. During the preview, customers used these voices widely across a variety of scenarios. Today we are glad to announce the general availability of these voices across more regions. Together with Xiaoxiao, Xiaoyou, Yunyang, and Yunye, 9 neural voices are now supported in Chinese (Mandarin).

 

| Gender | Voice | Sample script |
| --- | --- | --- |
| Male | Yunxi (云希) | 要不我们先回顾下发生在这里的案子吧,犯罪嫌疑人是什么时候进入的公寓,又是什么时候离开的呢? |
| Female | Xiaomo (晓墨) | 在不同的人那里,对快乐的看法是那么的不一样。 |
| Female | Xiaoxuan (晓萱) | 这是一个瞬息万变的时代,我们每一个人都面临着很大的挑战 |
| Female | Xiaohan (晓涵) | 小人鱼为了能和自己所爱的王子在一起,用自己美妙的嗓音和三百年的生命换来了巫婆的药酒 |
| Female | Xiaorui (晓睿) | 孩子们,你们玩的时候,别去马路上 |

 

Voice quality is further improved for various languages

 

A TTS voice personifies an application. The more natural the voice is, the more convincing it can be. While we continue to support more languages and offer more voice choices, we also keep improving the quality of existing voices to help customers bring a better voice experience to their users.

 

One challenging area of continuous quality improvement is the naturalness of the question tone. When a question is pronounced with the wrong intonation (e.g., a falling tone where a rising tone is expected at the end of the sentence), it may not be understood properly. We have recently improved the question tones for the following voices: Mia and Ryan in English (United Kingdom), Denise and Henri in French (France), Isabella in Italian (Italy), Conrad in German (Germany), Alvaro in Spanish (Spain), and Dalia and Jorge in Spanish (Mexico).

 

| Locale | Language | Voice | Sample script |
| --- | --- | --- | --- |
| en-GB | English (UK) | Mia | Can it really have entered British English from an Australian soap opera? |
| en-GB | English (UK) | Ryan | Do you find it difficult as well? |
| fr-FR | French (France) | Denise | Comment vous sentez-vous à 24 heures de l’élection ? |
| fr-FR | French (France) | Henri | Pourquoi une maison de l’eau mobile ? |
| it-IT | Italian (Italy) | Isabella | Basta Netflix quindi, ma per fare cosa? |
| de-DE | German (Germany) | Conrad | Was sind weitere Schwerpunkte Ihrer Arbeit? |
| es-ES | Spanish (Spain) | Alvaro | Ser joven mola, me gusta ser joven, ¿sabes? |
| es-MX | Spanish (Mexico) | Dalia | ¿Encontrará alguien que le haga sombra en Italia? |
| es-MX | Spanish (Mexico) | Jorge | ¿En esos casos no habría carpetazo? |

 

Text-to-Speech is part of Speech, a Cognitive Service on Azure. To learn more about the Speech service updates, go to this blog.

 

Get started

 

By offering more voices across more languages and locales, we anticipate developers across the world will be able to build applications that change experiences for millions. Whether you are building a voice-enabled chatbot or IoT device, an IVR solution, adding read-aloud features to your app, converting e-books to audio books, or even adding Speech to a translation app, you can make all these experiences natural sounding and fun with Neural TTS.

 

If you find that the language which you are looking for is not supported by Azure TTS, reach out to your sales representative, or file a support ticket on Azure. We'd be happy to engage and discuss how to support the languages you need. You can also customize and create a brand voice with your speech data for your apps using the Custom Neural Voice feature. 

 

Let us know how you are using or plan to use Neural TTS voices in this form. If you prefer, you can also contact us at mstts [at] microsoft.com. We look forward to hearing about your experience and to developing more compelling services together with you for developers around the world.

 

Add voice to your app in 15 minutes

Explore the available voices in this demo

Build a voice-enabled bot

Deploy Azure TTS voices on prem with Speech Containers

Build your custom voice

Apply for access to Custom Neural Voice

Last update: May 25 2021 09:08 AM