AI - Azure AI services Blog

Create personalized voices with Azure AI Speech

QinyingLiao
Microsoft
May 21, 2024

Today we're thrilled to announce that the personal voice feature of Azure AI Speech is now generally available.

Personal voice is designed to enable users to create and use their own AI voices in apps built by our customers. It was first released at Ignite 2023 in November (see the blog). Since then, its model quality has been greatly improved (see the model update blog), and more code samples are available.

As part of Microsoft’s commitment to responsible AI, personal voice is designed to be transparent about human–computer interaction and to incorporate guardrails that prevent misuse. For this reason, personal voice is a Limited Access feature available by registration only, and only for certain use cases. Watermarks are added to speech output created with the personal voice feature.

Example customers and use cases

Personal voice can be used in various scenarios. In this post, we outline a few customer examples and how they use personal voice in their applications.

Voice assistant: 

Create a personalized voice assistant experience. Users can now use their own voice to make conversations more engaging.

For example, Truecaller, a smartphone application, has integrated personal voice into its Truecaller Assistant app to enhance users' experience. The Truecaller Assistant answers calls and asks questions on the user's behalf, detecting spam and letting the user know whether a call is worth answering. With the personal voice capability, users can now choose to create and use their own voice for the AI assistant, giving callers a more engaging experience when they interact with the virtual assistant.

"At Truecaller, we're constantly seeking innovative ways to enhance our users' experience and provide them with cutting-edge features. By integrating Microsoft Azure AI Speech’s personal voice capability into Truecaller, we've taken a significant step towards delivering a truly personalized and engaging communication experience.

The personal voice feature allows our users to use their own voice, enabling the digital assistant to sound just like them when handling incoming calls. This groundbreaking capability not only adds a touch of familiarity and comfort for the users but also showcases the power of AI in transforming the way we interact with our digital assistants.

We're thrilled to collaborate with Microsoft on this exciting project and leverage their expertise in text-to-speech and AI technologies. The seamless integration process and the exceptional support provided by the Microsoft team have been instrumental in bringing this feature to life.

We firmly believe that the personal voice feature will revolutionize the way our users manage their calls and elevate their overall experience with Truecaller Assistant. We look forward to further exploring the potential of AI-powered voice technologies in partnership with Microsoft and delivering even more innovative solutions to our global user base."

— Raphael Mimoun, Product Director & General Manager, Truecaller Israel

Check out the demo from Truecaller.

Speech translation:

Personal voice can be used to enable real-time speech translation scenarios with the speaker’s own voice speaking in another language.

Skype’s truvoice translation feature has been upgraded to the latest Azure AI Speech personal voice model. With Skype, you can talk with someone who speaks a different language, and the conversation is translated automatically, in real time, using your natural voice. The new personal voice model has been vastly improved so that the translated voice sounds more like you.

Check out the demo from Skype.

Creativity and productivity:

With personal voice, your users can create and use their own voices for content creation, such as stories, audiobooks, podcasts, and videos, making the content more relatable and immersive.

Wondershare, a digital creativity company, is integrating personal voice into their video editing tools to improve productivity.

"We're excited to expand our partnership with Microsoft's Azure OpenAI and Speech services. With a focus on creativity and productivity, we're continually staying at the forefront of cutting-edge software solutions. We're specifically leveraging the AI trend to unlock a new chapter in content creation. Integrating Filmora with Microsoft Azure AI Speech’s personal voice capability marks a major milestone, providing Wondershare users with a breakthrough experience in video editing. In the future of content creation, AI technology will play a pivotal role, empowering influencers and creators to thrive. Together with Microsoft, Wondershare is shaping a world where personal expression flourishes."

— Shaan Jahagirdar, Deputy General Manager of Design Center, Wondershare

Check out the story here:

The content creation use case is restricted: it is available only to customers who meet the Limited Access eligibility criteria, and only for specific scenarios. Contact the Microsoft Azure AI Speech team at mstts[at]microsoft.com if you would like to create personal voices for synthetic media content creation or have questions about the eligibility criteria.

Building personal voices responsibly

All customers must agree to our usage policies, which require explicit consent from the original speaker, require disclosure of the synthetic nature of the content created, and prohibit using the personal voice service to impersonate any person or to deceive people. The full code of conduct guides integrations of synthetic speech and personal voice to ensure consistency with our commitment to responsible AI.

Watermarks are automatically added to the speech output generated with personal voices. As the personal voice feature enters general availability, we have updated the watermark technology to be more robust and better at detecting the presence of watermarks. To measure the robustness of the new watermark, we evaluated the accuracy of watermark detection on audio samples generated using personal voice. Our results showed an average accuracy higher than 99.7% for detecting watermarks across various audio editing scenarios. This improvement gives us stronger mitigations against potential misuse.

In addition, with watermarks and the watermark detection service, eligible customers can enable their apps to identify whether speech has been synthesized using Azure AI Speech personal voice capabilities. To request watermark detection for your applications, contact the Azure AI Speech team (mstts[at]microsoft.com).

Get started

Try the personal voice feature in Speech Studio as a test, or apply for full access to the API for business use.
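
Once API access is granted, synthesis requests are typically expressed in SSML. The sketch below is a minimal illustration, not an official sample: it only builds the SSML text and does not call the service. The base voice name `DragonLatestNeural`, the `mstts:ttsembedding` element, and its `speakerProfileId` attribute are not mentioned in this post; they are assumptions taken from our recollection of the Azure AI Speech documentation and should be checked against the current docs. The function name `build_personal_voice_ssml` and the profile ID shown are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "http://www.w3.org/2001/mstts"


def build_personal_voice_ssml(
    text: str,
    speaker_profile_id: str,
    base_voice: str = "DragonLatestNeural",  # assumed base model name; verify in the docs
    locale: str = "en-US",
) -> str:
    """Build an SSML document requesting `text` in a personal voice.

    `speaker_profile_id` stands for the ID returned when a personal voice
    is created (assumption: the service identifies personal voices this way).
    """
    ssml = (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(locale)}>'
        f"<voice name={quoteattr(base_voice)}>"
        f"<mstts:ttsembedding speakerProfileId={quoteattr(speaker_profile_id)}>"
        f"{escape(text)}"  # escape &, <, > so user text cannot break the markup
        "</mstts:ttsembedding></voice></speak>"
    )
    ET.fromstring(ssml)  # raises ParseError if the result is not well-formed XML
    return ssml


if __name__ == "__main__":
    # Placeholder profile ID for illustration only.
    print(build_personal_voice_ssml(
        "Hello, this is my personal voice.",
        "00000000-0000-0000-0000-000000000000",
    ))
```

The resulting string would then be passed to whichever synthesis client you use. Building it separately keeps the text escaping and well-formedness check in one place.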

In addition to creating a personal voice, eligible customers can create a brand voice for their business with Custom Neural Voice’s professional voice feature. Azure AI Speech also offers over 500 neural voices covering more than 140 languages and locales. With these pre-built Text-to-Speech voices, you can quickly add read-aloud functionality for a more accessible app design, or give chatbots a voice to provide a richer conversational experience for your users.

Updated Jun 28, 2024
Version 3.0
  • IntvPrime (Copper Contributor)

    We are looking forward to developing speech for some upcoming Intellivision games, thank you for advancing in this space!