Create Custom Video Avatar

Question

I am attempting to build a real-time text-to-speech application using a custom video avatar, but I am struggling to determine how to actually create the avatar.

I have located the following two resources which include an overview of the creation process and instructions for the recording process.

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create#get-consent-file-from-the-avatar-talent

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples

However, I cannot determine how to actually create the avatar from the recorded video samples. I assume it is not as simple as submitting them as a training file. What are the next steps?

Any help would be greatly appreciated.

ayan_chawla · Answer

Hi Andy,

While this service is in preview, Microsoft performs the training manually. You'll be notified once the model has been trained successfully.

To request training, fill out this form: https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xURFZNMk5NQzVHNFNQVzJIWDVWTDZVVVEzMSQlQCN0PWcu

The basic steps are:
1. Get the consent video from the avatar talent, using the given statements: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/customavatar/verbal-statement-all-locales.txt
2. Prepare the training data: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples
3. Train the model: as of April 24, 2024, this is done by Microsoft. Please fill out the form above.
4. Deploy and use the avatar: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-endpoint
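
Once the avatar is deployed (step 4), a synthesis request has to reference it as a custom avatar. Here is a minimal sketch of what a batch synthesis request body might look like. The field names (inputKind, inputs, synthesisConfig, avatarConfig, talkingAvatarCharacter, customized) are my assumption based on the batch avatar synthesis documentation and may differ between API versions, so please check them against the current docs. The helper name build_avatar_batch_request and the character name "my-custom-avatar" are hypothetical.

```python
import json


def build_avatar_batch_request(text, voice, character, style=None):
    """Build a JSON-serializable body for a batch avatar synthesis request.

    Field names are assumptions taken from the batch avatar synthesis docs;
    verify them against the API version you are calling.
    """
    avatar_config = {
        "talkingAvatarCharacter": character,  # name of the deployed avatar (hypothetical here)
        "customized": True,  # marks this as a custom avatar, not a prebuilt one
    }
    # Only include a style when one is supplied.
    if style is not None:
        avatar_config["talkingAvatarStyle"] = style
    return {
        "inputKind": "PlainText",
        "inputs": [{"content": text}],
        "synthesisConfig": {"voice": voice},
        "avatarConfig": avatar_config,
    }


if __name__ == "__main__":
    body = build_avatar_batch_request(
        text="Hello from my custom avatar.",
        voice="en-US-JennyNeural",
        character="my-custom-avatar",  # hypothetical character name
    )
    print(json.dumps(body, indent=2))
```

This only builds the payload; sending it requires your Speech resource key and region, as described in the deployment link in step 4.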

Reference Link - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/what-is-custom-text-to-speech-avatar#how-does-it-work

For real-time synthesis, you need to install the Speech SDK and use it as described here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar

Please mark this answer as the solution if it resolves your question.

user089 · Answer

Apart from the upfront custom avatar generation cost and the synthesis cost, are there any other costs involved? We are planning to occasionally use the custom avatar for batch synthesis.
