Ignite 2020 Neural Text-to-Speech updates: new language support, more voices and flexible deployment options
This post was co-authored by Garfield He, Melinda Ma, Yueying Liu and Yinhe Wei
Neural Text-to-Speech (Neural TTS), a powerful speech synthesis capability of Cognitive Services on Azure, enables you to convert text to lifelike speech that is close to human parity. Since its launch, many Azure customers have adopted it in a variety of scenarios, from voice assistants to audio content creation. We continue to push the envelope to enable more developers to add natural-sounding voices to their applications and solutions.
Today, we are happy to announce a series of updates to Neural TTS that extend its reach globally and allow developers to deploy it wherever their data resides. These updates include new languages, new voices with rich personas, and on-premises deployment through Docker containers.
18 new languages/locales supported
Neural TTS has now been extended to support 18 new languages/locales. They are Bulgarian, Czech, German (Austria), German (Switzerland), Greek, English (Ireland), French (Switzerland), Hebrew, Croatian, Hungarian, Indonesian, Malay, Romanian, Slovak, Slovenian, Tamil, Telugu and Vietnamese.
You can hear samples of these voices below.
Locale | Language | Gender | Voice | Sample |
bg-BG | Bulgarian | Female | Kalina | Архитектурното културно наследство в България е в опасност. |
cs-CZ | Czech | Female | Vlasta | Policisté většinou chodí v uniformě a jsou označeni hodnostmi. |
de-AT | German (Austria) | Female | Ingrid | Ab Herbst werden Lehrer, die sich dafür interessieren, eigens ausgebildet. |
de-CH | German (Switzerland) | Female | Leni | Dreizehn Millionen Liter mehr als im Vorjahr. |
el-GR | Greek | Female | Athina | Για να βρεις ποιος σε εξουσιάζει, απλώς σκέψου ποιος είναι αυτός που δεν επιτρέπεται να κριτικάρεις. |
en-IE | English (Ireland) | Female | Emily | Now we have seventy members and two dragon boats. |
fr-CH | French (Switzerland) | Female | Ariane | Chaque équipe jouera donc 5 matchs de 20 minutes dans sa poule. |
he-IL | Hebrew (Israel) | Female | Hila | הכל פתוח במאבק על המקום האחרון לפלייאוף העליון של ליגת העל בכדורגל. |
hr-HR | Croatian | Female | Gabrijela | Idemo na pobjedu u Maksimiru, pred našem publikom dat ćemo sto posto. |
hu-HU | Hungarian | Female | Noemi | A macska felmászott a tetőre és leugrott. |
id-ID | Indonesian | Male | Ardi | Inflasi dapat digolongkan menjadi empat golongan, yaitu inflasi ringan, sedang, berat, dan hiperinflasi. |
ms-MY | Malay | Female | Yasmin | Beg berkenaan dibawa ke hospital untuk menjalankan proses pengenalan. |
ro-RO | Romanian | Female | Alina | Temperaturile maxime se vor încadra între 15 şi 23 de grade Celsius. |
sk-SK | Slovak | Female | Viktoria | Kúzelné miesta nájdete aj za jej hranicami, v malebnej prírode. |
sl-SI | Slovenian | Female | Petra | Predlagani zakon vključuje tudi načrt nadaljnjega ukrepanja. |
ta-IN | Tamil | Female | Pallavi | உச்சிமீது வானிடிந்து வீழுகின்ற போதினும், அச்சமில்லை அச்சமில்லை அச்சமென்பதில்லையே |
te-IN | Telugu | Female | Shruti | అందం ముఖంలో ఉండదు. సహాయం చేసే మనసులో ఉంటుంది |
vi-VN | Vietnamese | Female | HoaiMy | Hà Nội là thủ đô của Việt Nam. |
With these new voices, Microsoft Azure Neural TTS supports 49 languages/locales in total.
14 additional voices released to enrich the variety
Customers use TTS for different scenarios, and their requirements for voice personas vary. To give developers more options, we continue to create more voices in each language. In addition to supporting new locales, we’ve announced 14 new voices to enrich the variety in the existing languages.
Hear samples of these voices below.
Locale | Language | Gender | Voice | Sample |
de-DE | German | Male | Conrad | Je würziger das Fleisch, desto würziger und kräftiger sollte auch der Wein sein. |
en-AU | English (Australia) | Male | William | They have told me nothing, and probably cannot tell me anything to the purpose. |
en-GB | English (UK) | Male | Ryan | Today’s temperature was a record 26.5 degrees Celsius. |
en-US | English (US) | Female | Jenny | For example, we place a session cookie on your computer each time you visit our Website. |
es-ES | Spanish (Spain) | Male | Alvaro | Dos helicópteros medicalizados tuvieron que acudir al lugar a rescatar a los heridos. |
es-MX | Spanish (Mexico) | Male | Jorge | El niño mencionó que si pudiera caminar, pediría un balón para poder patearlo o una cuerda para poder saltar. |
fr-CA | French (Canada) | Male | Jean | Ce jour tant attendu arrive enfin! |
fr-FR | French (France) | Male | Henri | Jusqu'ici, nous vous avons toujours fait confiance et accordé le bénefice du doute. |
it-IT | Italian | Female | Isabella | I gel igienizzanti sono aumentati di prezzo. |
it-IT | Italian | Male | Diego | Domani preparerò dei biscotti con le gocce di cioccolato. |
ja-JP | Japanese | Male | Keita | キャッシュレス決済を利用して、支払いを簡単にする。 |
ko-KR | Korean | Male | InJoon | 규모가 더욱 확대되었다. |
pt-BR | Portuguese (Brazil) | Male | Antonio | O que você quer ganhar de presente de natal? |
th-TH | Thai | Female | Premwadee | วิกฤตแบบนี้บริษัทยิ่งต้องการคนที่พร้อมเผชิญปัญหา |
With these updates, Microsoft Azure Text-to-Speech service offers 68 neural voices. Hear all these neural voices saying 'Thank you' in 49 languages/locales in the video below.
Across standard and neural TTS capabilities, we now offer 140+ voices in total. Check the 70+ standard voices.
More than 15 speaking styles available in en-US and zh-CN voices
Today, we’re building upon our Neural TTS capabilities in English (US) and Chinese (CN) with new voice styles. By default, the Text-to-Speech service synthesizes text using a neutral speaking style. With neural voices, you can adjust the speaking style to express different emotions such as cheerfulness, empathy, and calm, or optimize the voice for scenarios such as customer service, newscasting, and voice assistants, depending on your needs.
The new English (US) voice, Jenny, was created with a friendly, warm, and comforting persona that focuses on conversational scenarios. For Jenny, we provide additional speaking styles including chat, customer service, and assistant.
You can hear the different speaking styles in Jenny’s voice below:
Style | Style description | Sample |
General | Expresses a neutral tone and available for general use | Valentino Lazaro scored a late winner for Austria to deny Northern Ireland a first Nations League point. |
Chat | Expresses a casual and relaxed tone in conversation | Oh, well, that's quite a change from California to Utah. |
Customer service | Expresses a friendly and helpful tone for customer support | Okay, great. In the meantime, see if you can reach out to Verizon and let them know your issue. And Randy should be calling you back shortly. |
Assistant | Expresses a warm and relaxed tone for digital assistants | United States spans 2 time zones. In Nashville, it's 9:45 PM. |
A new speaking style is also available for the en-US male voice, Guy. Guy’s newscast style is a good choice when you need a male voice to read professional and news-related content.
In addition, 10 new speaking styles are available with our zh-CN voice, Xiaoxiao. These new styles are optimized for audio content creators and intelligent bot developers who want to create more engaging, interactive audio that expresses rich emotions.
You can hear the new speaking styles in Xiaoxiao’s voice below:
Style | Sample |
Calm | 那,那我再问你,你之前有养过宠物嘛? |
Affectionate | 老公,把灯打开好吗,好黑呀,我很怕。 |
Angry | 没想到,我们八年的感情真的完了! |
Disgruntled | 这你都不明白吗?真是个榆木脑袋。 |
Fearful | 先生,你没事吧?要不要我叫医生过来? |
Gentle | 我今天运气特别好,如果没有遇到您,还不知道会怎么样呢! |
Cheerful | 太好了,恭喜你顺利通过考核。 |
Serious | 不要恋战,等待时机,随时准备突围。 |
Sad | 没想到,你居然是这么一个无情无义的的人! |
For the Chinese voice Xiaoxiao, the intensity (‘style degree’) of speaking style can be further adjusted to better fit your use case. You can specify a stronger or softer style with 'style degree' to make the speech more expressive or subdued.
没想到,你居然是这么一个无情无义的的人! |
Sad=0.5 | Sad=1.0 | Sad=1.5 | Sad=2.0 |
The style degree can be set from 0.01 to 2 inclusive, in increments of 0.01. The default value is 1, which applies the predefined style intensity. Values below 1 soften the style and give a flatter tone, while 2, the highest value, makes the style noticeably stronger than the default.
The SSML snippet below illustrates how the styledegree attribute is used to change the intensity of a speaking style.
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
    <voice name="zh-CN-XiaoxiaoNeural">
        <mstts:express-as style="sad" styledegree="2">
            快走吧,路上一定要注意安全,早去早回。
        </mstts:express-as>
    </voice>
</speak>
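As a rough illustration of the rules above, the following Python sketch builds the same kind of SSML document and rejects a style degree outside the 0.01 to 2 range. The helper name `build_styled_ssml` and its parameters are hypothetical and not part of any Azure SDK; only the element names, attribute names, and namespaces are taken from the snippet above.

```python
from xml.sax.saxutils import escape, quoteattr

# Range stated in this post: 0.01 to 2 inclusive, default 1.
MIN_DEGREE = 0.01
MAX_DEGREE = 2.0


def build_styled_ssml(text, style, degree=1.0,
                      voice="zh-CN-XiaoxiaoNeural", lang="zh-CN"):
    """Return an SSML string with an mstts:express-as element.

    Hypothetical helper for illustration only.
    """
    if not MIN_DEGREE <= degree <= MAX_DEGREE:
        raise ValueError(
            f"style degree must be between {MIN_DEGREE} and {MAX_DEGREE}"
        )
    # Round to the 0.01 unit; the 'g' format drops trailing zeros (2.0 -> "2").
    degree_text = format(round(degree, 2), "g")
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<mstts:express-as style={quoteattr(style)} '
        f'styledegree={quoteattr(degree_text)}>'
        f'{escape(text)}'  # escape &, <, > so the SSML stays well-formed
        '</mstts:express-as></voice></speak>'
    )


ssml = build_styled_ssml("快走吧,路上一定要注意安全,早去早回。", "sad", degree=2)
print(ssml)
```

Validating the degree before sending the request keeps an out-of-range value from reaching the service at all; how the service itself treats such values is not covered in this post.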
The 'style degree' feature currently applies only to the Chinese voice Xiaoxiao and will come to more languages and voices soon.
Check SSML for the details on how to use these speaking styles, together with other rich voice tuning capabilities.
Neural TTS Container is in public preview with 16 voices available in 14 languages
We have launched Neural TTS Container in public preview, as we are seeing a clear trend towards a future powered by the intelligent cloud and intelligent edge. With Neural TTS Container, developers can run speech synthesis with the most natural digital voices in their own environment for specific security and data governance requirements. Their Speech apps are portable and scalable with greater consistency whether they run on the edge or in Azure.
Currently 14 languages/locales are supported with 16 voices in Neural TTS Containers, as listed below.
Locale | Voice |
de-de | KatjaNeural |
en-au | NatashaNeural |
en-ca | ClaraNeural |
en-gb | LibbyNeural |
en-gb | MiaNeural |
en-us | AriaNeural |
en-us | GuyNeural |
es-es | ElviraNeural |
es-mx | DaliaNeural |
fr-ca | SylvieNeural |
fr-fr | DeniseNeural |
it-it | ElsaNeural |
ja-jp | NanamiNeural |
ko-kr | SunHiNeural |
pt-br | FranciscaNeural |
zh-cn | XiaoxiaoNeural |
To get started, fill out and submit the request form to request access to the container. Currently, Neural TTS Containers are gated and approved only for enterprise (EA) customers and Microsoft partners, and even then only for qualified customers.
Azure Cognitive Services Containers including Neural TTS Containers aren't licensed to run without being connected to the metering / billing endpoint. You must enable the containers to communicate billing information with the billing endpoint at all times. Cognitive Services containers don't send customer data, such as the image or text that's being analyzed, to Microsoft. Queries to the container are billed at the pricing tier of the Azure resource that's used for the ApiKey.
Here are the steps to install and run the container:
- Make sure your machine to host the container meets the hardware requirements.
- Get the container image with docker pull. For all the supported locales and corresponding voices of the neural text-to-speech container, please see Neural Text-to-speech image tags.
- Run the container with docker run.
- Validate that the container is running.
- Query the container’s endpoint. Taking the AriaNeural voice as an example, you can send the following HTTP POST request to get the TTS output audio:
curl -s -v -X POST http://localhost:5000/speech/synthesize/cognitiveservices/v1 \
  -H 'Accept: audio/*' \
  -H 'Content-Type: application/ssml+xml' \
  -H 'X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm' \
  -d '<speak version="1.0" xml:lang="en-US"><voice name="en-US-AriaNeural">This is a test, only a test.</voice></speak>' > output.wav
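The same query can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the endpoint, headers, and SSML body mirror the curl command above, the container is assumed to be listening on localhost port 5000, and the helper names `build_request` and `synthesize_to_file` are hypothetical.

```python
import urllib.request
from xml.sax.saxutils import escape

# Same endpoint as the curl command; assumes the container maps port 5000.
ENDPOINT = "http://localhost:5000/speech/synthesize/cognitiveservices/v1"


def build_request(text, voice="en-US-AriaNeural", lang="en-US"):
    """Build the HTTP POST request for the container (hypothetical helper)."""
    ssml = (
        f'<speak version="1.0" xml:lang="{lang}">'
        f'<voice name="{voice}">{escape(text)}</voice></speak>'
    )
    return urllib.request.Request(
        ENDPOINT,
        data=ssml.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "audio/*",
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        },
    )


def synthesize_to_file(text, path="output.wav"):
    """Send the request and save the audio; needs a running container."""
    with urllib.request.urlopen(build_request(text), timeout=30) as resp:
        audio = resp.read()
    # A RIFF/WAV payload starts with the bytes b"RIFF".
    if not audio.startswith(b"RIFF"):
        raise RuntimeError("response does not look like a RIFF/WAV file")
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)


# Usage (requires the container to be running locally):
# synthesize_to_file("This is a test, only a test.")
```

Separating request construction from the network call makes it possible to inspect the headers and SSML body without a container running.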
Learn more about Container support in Cognitive Services and visit the Frequently Asked Questions on Azure Cognitive Services Containers.
Get started
With these updates, we’re excited to be powering natural and intuitive voice experiences for more customers globally with flexible deployment options. For more information, see the links below.
- Try the TTS demo
- See our documentation
- Check out our sample code
- Learn about Speech containers