We are excited to announce the public preview release of Azure AI Speech text to speech avatar, a new feature that enables users to create talking avatar videos with text input, and to build real-time interactive bots trained using human images. In this blog post, we will introduce the features, benefits, and technical details of this feature, and show you some examples of how you can use it for various scenarios.
The text to speech avatar system is a text to speech feature with vision capabilities that allows customers to create synthetic videos of a 2D photorealistic avatar speaking. The neural text to speech avatar models are trained by deep neural networks on human video recording samples, and the voice of the avatar is provided by a text to speech voice model.
Why do we build avatars? There are two main reasons:
There are three components in an avatar content generation workflow: the text analyzer, the TTS audio synthesizer, and the TTS avatar video synthesizer. To generate an avatar video, text is first input into the text analyzer, which outputs a phoneme sequence. Then, the TTS audio synthesizer predicts the acoustic features of the input text and synthesizes the voice. These two parts are provided by text to speech voice models. Next, the neural text to speech avatar model predicts the lip-synced images from the acoustic features, so that the synthetic video is generated.
Below is an overview of the workflow:
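As a rough illustration of the data flow described above, the three stages can be sketched as a simple pipeline. This is a toy sketch only, not the service's implementation: every function and class name here is hypothetical, the "phonemes" are just letters, and the "video frames" are mouth-shape labels standing in for what the neural models actually predict.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AcousticFrame:
    """Hypothetical stand-in for the acoustic features of one phoneme."""
    phoneme: str
    duration_ms: int


@dataclass
class VideoFrame:
    """Hypothetical stand-in for one lip-synced image."""
    timestamp_ms: int
    mouth_shape: str


def analyze_text(text: str) -> List[str]:
    # Stage 1 (text analyzer): turn text into a phoneme sequence.
    # Toy assumption: each letter is treated as one phoneme.
    return [ch for ch in text.lower() if ch.isalpha()]


def synthesize_audio(phonemes: List[str], ms_per_phoneme: int = 80) -> List[AcousticFrame]:
    # Stage 2 (TTS audio synthesizer): predict acoustic features.
    # Toy assumption: every phoneme has the same fixed duration.
    return [AcousticFrame(p, ms_per_phoneme) for p in phonemes]


def synthesize_video(frames: List[AcousticFrame]) -> List[VideoFrame]:
    # Stage 3 (avatar video synthesizer): predict lip-synced images
    # from the acoustic features, keeping video aligned with audio timing.
    video: List[VideoFrame] = []
    elapsed_ms = 0
    for frame in frames:
        shape = "open" if frame.phoneme in "aeiou" else "closed"
        video.append(VideoFrame(elapsed_ms, shape))
        elapsed_ms += frame.duration_ms
    return video


def generate_avatar_video(text: str) -> List[VideoFrame]:
    # Chain the three stages in the order the workflow describes.
    return synthesize_video(synthesize_audio(analyze_text(text)))
```

The point of the sketch is the ordering: the video stage consumes the output of the audio stage rather than the raw text, which is what keeps the lip movement synchronized with the synthesized voice.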
We offer two separate text to speech avatar features at this time: prebuilt text to speech avatar and custom text to speech avatar.
Microsoft offers prebuilt text to speech avatars as out of box products on Azure for its subscribers. These avatars can speak different languages and voices based on the text input. Customers can select an avatar from a variety of options and use it to create video content or interactive applications with real time avatar responses.
A custom text to speech avatar feature enables customers to create a personalized avatar for their product or brand. Customers can upload their own video recording of avatar talent, which the feature uses to train a synthetic video of the custom avatar speaking. Customers can choose either a prebuilt or a custom neural voice for their avatar. If the same person's voice and likeness are used for both the custom neural voice and the custom text to speech avatar, the avatar will closely resemble that person.
As part of Microsoft's commitment to responsible AI, text to speech avatar is designed with the intention of protecting the rights of individuals and society, fostering transparent human-computer interaction, and counteracting the proliferation of harmful deepfakes and misleading content. For this reason, custom avatar is a Limited Access feature available by registration only, and only for certain use cases. To access and use the feature in your business applications, register your use case here and apply for access.
We support both a UI tool in Azure AI Speech Studio and API access.
The Text to speech Avatar tool for video content creation on Speech Studio
A Live chat avatar demo tool on Speech Studio
With text to speech avatar, you can create engaging videos, such as training or presentation videos, using a prebuilt avatar or your own custom avatar.
You can also create engaging experiences for customers, employees, and other audiences by providing applications enriched with an interactive avatar.
Batch video content creation | Real time interaction application
Here are examples of video content creation with a custom avatar and a virtual sales application powered by text to speech avatar and Azure OpenAI. For each sample, we describe how it was created and provide the resulting video demo and the sample code.
Engaging avatar video experiences are typically composed of several elements, including the talking avatar video, background images or videos, ambient music, and other elements that make the video more polished.
Here is a simple workflow of creating rich avatar videos:
The following video was generated using the above workflow with a custom text to speech avatar.
Check out our notebook to create your avatar video today: https://github.com/Azure/gen-cv/tree/main/avatar/video
Here is an example with an avatar acting as a virtual sales agent of an outdoor equipment online shop. She answers customer questions in real time about products or customer accounts and can also place an order.
This outdoor demo harnesses the capabilities of the text to speech avatar, Azure OpenAI Service, Azure AI Search, and Azure SQL Database to offer the following features:
The demo application is a static Azure Web App with a JavaScript user interface that communicates with Azure AI Speech and other components. The Python-based backend orchestrates the communication between Azure OpenAI Service and Azure AI Search, which serves as the product knowledge base, as well as Azure Storage for product images and Azure SQL Database for customer data management.
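The backend's orchestration role described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the demo's actual code: the function names, the in-memory sample catalog, and the naive keyword search are all hypothetical stand-ins for the product knowledge base lookup, and the `complete` callable stands in for whatever client calls the language model.

```python
import re
from typing import Callable, Dict, List

# Made-up sample data standing in for the product knowledge base.
SAMPLE_CATALOG: List[Dict[str, str]] = [
    {"name": "Tent A", "description": "two person tent"},
    {"name": "Boots B", "description": "waterproof hiking boots"},
]


def _words(text: str) -> set:
    # Lowercase and keep alphabetic words only, so "tent?" matches "tent".
    return set(re.findall(r"[a-z]+", text.lower()))


def search_products(query: str, catalog: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Hypothetical stand-in for the search step: naive keyword overlap
    # between the customer's question and each product description.
    query_words = _words(query)
    return [p for p in catalog if query_words & _words(p["description"])]


def build_prompt(question: str, products: List[Dict[str, str]]) -> str:
    # Ground the model's answer in the retrieved products.
    context = "\n".join(f"- {p['name']}: {p['description']}" for p in products)
    return f"Answer using only these products:\n{context}\n\nCustomer: {question}"


def handle_question(
    question: str,
    catalog: List[Dict[str, str]],
    complete: Callable[[str], str],
) -> str:
    # Orchestration: retrieve relevant products, build a grounded prompt,
    # and ask the language model for the reply the avatar will speak.
    products = search_products(question, catalog)
    return complete(build_prompt(question, products))
```

Passing the model call in as a plain callable keeps the sketch independent of any particular SDK; in a real application that callable would wrap the chat completion request, and the returned text would be sent to the avatar for speech and video synthesis.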
Here's a glimpse into the outdoor shopping demo experience, showcasing the multilingual capabilities of the avatar feature:
You can find the required resources for creating your own application based on the outdoor shop example here: https://github.com/Azure/gen-cv/tree/main/avatar/interactive. You can customize the solution for your specific needs.
We are happy to have a number of customers working with text to speech avatar, and to be able to share their testimonials at public preview.
“We are using Azure AI Services for our AI Banking Avatar due to the unique combination of leading-edge AI and Visualization services in one platform. By using different Azure AI Speech text to speech avatar we will be able to generate a next level customer experience and really simplify banking and banking interactions.” - Gerald Ertl, Managing Director, Commerzbank AG
“We believe that AI-powered brand assistants will transform the way businesses interact with their customers and manage their brands. It’s for this reason that we are excited by the potential of the text to speech avatar. Whether it’s providing the answers to customer questions, assisting a transaction, or providing entertaining content, the use cases that this technology unlocks are numerous. We’re privileged to be working with Microsoft on this program, as we shape the future of digital experiences together.” - Alex Hamilton, Head of Innovation, UK, Dentsu
To learn more and get started, you can first try out the prebuilt text to speech avatars with the no-code tool provided in Speech Studio, which allows you to explore the avatar feature with an intuitive user interface. You need an Azure account and an Azure AI Speech resource before you can use Speech Studio. Please refer to Quick Start to set up.
We are committed to ensuring that our AI solutions are used in a responsible manner, as this is essential for our and our customers' long-term success. Please read the Responsible AI introduction for text to speech avatar on https://aka.ms/TTS-TN
For more information