AI - Azure AI services Blog

Introducing new voice styles in Azure Cognitive Services

Melinda Ma (Microsoft)
Apr 02, 2020

 

This post was co-authored by Qinying Liao, Anny Dow, Yueying Liu, and Peter Pan.

 

Neural TTS enables fluid, natural-sounding speech that matches the patterns and intonation of human voices, helping developers bring their solutions to life.

 

Today, we’re building upon our Neural Text to Speech (Neural TTS) capabilities in Azure Cognitive Services with new voice styles. With the new styles—newscast, customer service, and digital assistant—developers can tailor the voice of their apps and services to fit their brand or unique scenario.

 

Built on a powerful base model, our neural TTS voices are very natural, reliable, and expressive. Through transfer learning, the neural TTS model can learn different speaking styles from various speakers, enabling nuanced voices.

 

In addition to our new voice styles optimized for specific scenarios, we are also releasing new emotion styles. These styles allow you to adjust voices to express different emotions to fit the context, like cheerfulness or empathy. Let’s dive in.

 

Introducing Newscast, Customer Service, and Digital Assistant styles

 

Newscast

With neural TTS voices in the newscast style, your users can enjoy listening to news or articles in a professional tone that reflects what you might hear on TV or radio newscasts.
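As a rough illustration of how a style like this might be requested, the sketch below builds an SSML document that wraps the text in an mstts:express-as element. The voice name en-US-AriaNeural, the style identifier newscast, and the helper build_styled_ssml are assumptions made for illustration rather than details stated in this post, so check the Speech service SSML documentation for the identifiers that apply to your resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_styled_ssml(text, voice="en-US-AriaNeural", style="newscast", lang="en-US"):
    """Return an SSML string that asks for `style` on `voice`.

    Voice and style identifiers are assumed values, not taken from the post.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, <, > so the text cannot break the markup
        "</mstts:express-as></voice></speak>"
    )


if __name__ == "__main__":
    print(build_styled_ssml("Heavy snow and strong winds hammered parts of the central U.S."))
```

Escaping the text before embedding it keeps characters such as "&" from producing malformed SSML.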

 

Hear Aria's (English – Female) and Xiaoxiao’s (Chinese – Female) voices in the newscast style:

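Text (Aria, English): Heavy snow and strong winds hammered parts of the central U.S. on Thursday and began moving into the Great Lakes region, knocking out power to tens of thousands of people and creating hazardous travel conditions a day after pummeling Colorado.
[Audio samples: newscast style, default]

Text (Xiaoxiao, Chinese): 现今,大批企业以数字化转型为战略目标,数字化转型可赋能企业重构竞争环境、满足客户期望、增强服务运营。为了真正实现“ being digital ”, 许多企业将人工智能视作实现数字化转型目标的首选技术工具之一。
English translation: Today, many enterprises have made digital transformation a strategic goal. Digital transformation can empower enterprises to reshape the competitive landscape, meet customer expectations, and strengthen service operations. To truly achieve "being digital," many enterprises regard artificial intelligence as one of the preferred technology tools for reaching their digital transformation goals.
[Audio samples: newscast style, default]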

Check out the newscast style in the Bing mobile app. When you search for news with the voice search feature, you can hear news briefs in Aria’s newscast style voice.

 

You can also check out Xiaoxiao’s newscast style voice, which has been adopted in WeChat through the Microsoft Listening Docs app. In Microsoft Listening Docs, users can hear Xiaoxiao’s voice read out multiple document types, such as Word, PowerPoint, and Excel files, as well as images. Users can easily generate audio content for online training, news podcasts, and more, and share it with their social circles.

 

Customer Service

The customer service style features a friendly and engaging tone and is suitable for customer support scenarios, such as checking in for a flight, making a restaurant reservation, or filing a claim.

 

Hear Aria's and Xiaoxiao’s voices in the customer service style:

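Text (Aria, English): Alright, it's going to be right in front of your door, within 30 minutes. Thanks for calling Pizza Loco! Have a great night!
[Audio samples: customer service style, default]

Text (Xiaoxiao, Chinese):
客服:您好,欢迎致电智慧银行,我是您的智能客服晓晓,请问有什么可以帮您?
客户:你好,我想调整信用卡的额度。
客服:嗯,请稍等,我查询一下状态。请问您要调整到多少额度?
客户:帮我调到三万人民币吧。
客服:好的,已经给您变更成功,稍后您会收到短信提醒。
客户:好的,谢谢。
客服:感谢您的来电,祝您生活愉快,再见。

English translation:
Agent: Hello, and welcome to Smart Bank. I'm Xiaoxiao, your intelligent customer service agent. How can I help you?
Customer: Hello, I'd like to adjust my credit card limit.
Agent: OK, one moment please while I check the status. What limit would you like to change it to?
Customer: Please set it to 30,000 RMB.
Agent: All right, the change has been made. You will receive a text message notification shortly.
Customer: OK, thank you.
Agent: Thank you for calling. Have a nice day. Goodbye.
[Audio samples: customer service style, default]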

Digital Assistant

Many customers have been using neural TTS voices for their digital assistant solutions. We are introducing two styles in this area: a chat style for more casual, conversational bots, and a more professional style for scenarios such as in-car digital assistants.

 

The chat style features a conversational tone, simulating casual dialogue.

 

Hear Aria’s voice in the chat style:

Style: Chat
Text (Aria, English): Oh, well that's quite a change from California to Utah.
[Audio samples: chat style, default]

The assistant style features a friendly and helpful tone, which is suitable for scenarios such as smart speakers or in-car assistants. Use the digital assistant voice to hear the weather forecast, search for information, get directions, set reminders, and more.

 

Hear Xiaoxiao’s voice in the assistant style:

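Text (Xiaoxiao, Chinese): 没听到你说话,请再说一次。
English translation: I didn't hear you. Please say that again.
[Audio samples: assistant style, default]

Text (Xiaoxiao, Chinese): 现在听的是:FM88.8,江苏音乐台的节目,滴滴叭叭早上好。
English translation: You're listening to FM 88.8, Jiangsu Music Radio, and the program 滴滴叭叭早上好 (roughly, "Didi Baba Good Morning").
[Audio samples: assistant style, default]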

Bringing new emotions to Neural Text to Speech

To enable you to build nuanced voices for your unique scenario, Neural Text to Speech also offers different emotion styles. You can access cheerful and empathetic styles for Aria’s voice; a lyrical style for Xiaoxiao’s voice, which sounds heartfelt and is optimized for reading prose or poetry; and a cheerful style for Francisca’s voice (Brazilian Portuguese).
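Because each voice supports a different set of styles, an app may want to check a requested style before using it. The sketch below records the voice and style pairings described in this post and falls back to the default voice when a style is not listed. The full voice names, the lowercase style identifiers, and the helper pick_style are assumptions for illustration; the post itself names only Aria, Xiaoxiao, and Francisca.

```python
# Styles per voice as described in this post.
# The identifier spellings are assumed, not confirmed by the post.
STYLES_BY_VOICE = {
    "en-US-AriaNeural": {"newscast", "customerservice", "chat", "cheerful", "empathetic"},
    "zh-CN-XiaoxiaoNeural": {"newscast", "customerservice", "assistant", "lyrical"},
    "pt-BR-FranciscaNeural": {"cheerful"},
}


def pick_style(voice, requested):
    """Return `requested` if it is listed for `voice`, otherwise None.

    None means "use the voice's default speaking style".
    """
    supported = STYLES_BY_VOICE.get(voice, set())
    return requested if requested in supported else None


if __name__ == "__main__":
    print(pick_style("en-US-AriaNeural", "empathetic"))  # empathetic
    print(pick_style("zh-CN-XiaoxiaoNeural", "chat"))    # None
```

Returning None rather than raising an error lets the caller fall back to plain SSML without a style element, which matches the "Default" samples in this post.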

 

Hear the new styles below:

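Style: Cheerful
Text (Aria, English): Great, I hope she will like it!
[Audio samples: cheerful style, default]
Text (Francisca, Brazilian Portuguese): A canadense postou uma música nova no seu perfil oficial do Twitter.
English translation: The Canadian posted a new song on her official Twitter profile.
[Audio samples: cheerful style, default]

Style: Empathetic
Text (Aria, English): I want to let you know that you’re loved. I know things are hard right now and it’s OK. You don’t have to do this alone.
[Audio samples: empathetic style, default]

Style: Lyrical
Text (Xiaoxiao, Chinese): 大家晚上好,我是晓晓。在每一个夜晚来临的时候,我都在这里陪你入睡。忙碌的一天又过去了,现在的你是窝在沙发上看着窗外发呆,还是倒了一杯咖啡继续解决白天没有做完的工作呢?时间过得真快呀,在学校里咬着早餐上课,和同学们嬉戏打闹的日子,仿佛就在昨天。但一转眼,我们都穿着西装变成了大人。
English translation: Good evening, everyone. I'm Xiaoxiao. Every night as darkness falls, I'm here to keep you company while you drift off to sleep. Another busy day has passed. Right now, are you curled up on the sofa, gazing out the window, or have you poured a cup of coffee to keep working on what you didn't finish during the day? Time really flies. The days of munching on breakfast in class and playing around with classmates feel like only yesterday. But in the blink of an eye, we have all put on suits and become grown-ups.
[Audio samples: lyrical style, default]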

These new voice styles are also available for customized brand voices through our Custom Neural Voice capability, allowing you to build a unique voice that can also benefit from our new scenario and emotion styles. As part of Microsoft's commitment to designing AI responsibly, we have developed guidelines for customers in using Custom Neural Voice, in alignment with Microsoft's principles for responsible innovation in AI. Learn more about the process for getting started with Custom Neural Voice here.   

 

Get Started

Get started with the new neural TTS voice styles available in Azure Cognitive Services. Check out our documentation to learn more.
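For readers who want to try a styled voice from code, here is a minimal sketch that prepares an HTTP request for the Speech service text-to-speech REST endpoint using only the Python standard library. The endpoint pattern, header names, and output format reflect the REST API as we understand it and should be treated as assumptions to verify against the current documentation; build_tts_request, the region, and the key are hypothetical placeholders.

```python
import urllib.request


def build_tts_request(ssml, region, key, output_format="riff-24khz-16bit-mono-pcm"):
    """Prepare (but do not send) a POST request carrying SSML.

    Endpoint and header names are assumptions based on the Speech service
    REST API; `region` and `key` come from your own Azure resource.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "voice-style-demo",
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


# Sending requires a real key, so this part is left commented out:
# req = build_tts_request(my_ssml, "westus", "<your-key>")
# with urllib.request.urlopen(req) as resp:
#     open("sample.wav", "wb").write(resp.read())
```

Separating request construction from sending makes it easy to inspect the SSML and headers before spending quota on a real call.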

 

Updated Feb 06, 2023
Version 3.0
  • VFXPro

    Truly and genuinely IMPRESSIVE!!! I have hundreds of hours of voice recording work with humans that are no longer needed!!! Thank you for making this technology sound so perfect.

  • loran2020

    This is awesome! I'm really glad the Azure team is making these Cognitive Services available to developers and the general public. It is very impressive how far these capabilities have come, especially the new speaking style capabilities that allow users to better express emotion. I'm interested in seeing how we can apply these tools. Hirox mentioned video games. I also think about chat agents, virtual assistants, tools for the hearing impaired, and so on. Are there any posts about or information on how these Cognitive Speech Services technologies are being applied currently?

     

    For anyone that is interested, I am working on an application that uses the Microsoft Azure Cognitive Services Speech SDK. The application is type-recorder.

  • Hirox

    Interesting... maybe we can adapt this in games...
    Might be a good idea to include this into game stack, just saying.

  • Fang627426

    Can you please add speaking styles for the Australian English voices "Natasha" and "William"?