Forum Discussion
Andy_LoCascio
Feb 28, 2024 · Copper Contributor
Create Custom Video Avatar
I am attempting to build a real-time text-to-speech application using a custom video avatar, but I am struggling to determine how to actually create the avatar.
I have located the following two resources which include an overview of the creation process and instructions for the recording process.
However, I cannot determine how to actually create the avatar from the recorded video samples. I assume it is not as simple as submitting them as a training file. What are the next steps?
Any help would be greatly appreciated.
- Ayan_Chawla (Microsoft)
Hi Andy,
While this service is in preview, the training is done manually by Microsoft. You'll be notified once the model has been trained successfully.
To request training, fill out this form: https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xURFZNMk5NQzVHNFNQVzJIWDVWTDZVVVEzMSQlQCN0PWcu
The basic steps are:
1. Record the consent video using the provided statements - https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/customavatar/verbal-statement-all-locales.txt
2. Prepare the training data - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples
3. Train the model - Done by Microsoft as of 24th April, 2024. Please fill out the request form for custom avatar training.
4. Deploy and use the avatar - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-endpoint
Reference Link - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/what-is-custom-text-to-speech-avatar#how-does-it-work
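As a rough illustration of what "use the avatar" in step 4 might look like for batch synthesis, here is a minimal Python sketch that only builds the request URL and JSON body, without sending anything. The endpoint path, the `api-version` value, and the field names (`inputKind`, `synthesisConfig`, `avatarConfig`, `customized`) are assumptions based on my reading of the batch avatar synthesis documentation and may differ in your API version. The helper name `build_batch_avatar_request`, the voice, and the avatar name are hypothetical placeholders. Please check the linked docs before relying on it.

```python
import json

def build_batch_avatar_request(region, job_id, text, avatar_name,
                               voice="en-US-JennyNeural",
                               api_version="2024-08-01"):
    """Build (url, body) for a batch avatar synthesis job.

    Assumption: the URL shape and JSON field names follow the batch
    avatar synthesis REST API; verify them against the current docs.
    """
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/avatar/batchsyntheses/{job_id}?api-version={api_version}"
    )
    body = {
        "inputKind": "PlainText",             # assumed: plain text rather than SSML
        "synthesisConfig": {"voice": voice},  # placeholder voice name
        "inputs": [{"content": text}],
        "avatarConfig": {
            "talkingAvatarCharacter": avatar_name,  # your custom avatar's model name
            "customized": True,                     # assumed flag marking a custom avatar
        },
    }
    return url, body

if __name__ == "__main__":
    url, body = build_batch_avatar_request(
        "westus2", "my-job-001", "Hello from my custom avatar.", "my-custom-avatar"
    )
    print(url)
    print(json.dumps(body, indent=2))
```

To actually submit the job you would send this body with an HTTP PUT and your Speech resource key in the `Ocp-Apim-Subscription-Key` header, then poll the job until it completes. That part is left out here because it needs a live resource.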
For real-time synthesis, you need to install the Speech SDK and use it as described here - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar
Please mark the answer as solved.
- user089 (Copper Contributor)
Apart from the upfront custom avatar generation cost and the synthesis cost, are there any other costs involved? We are planning to occasionally use the custom avatar for batch synthesis.