Microsoft Azure TTS Cognitive Service Voice Limit Issue

I am very new to the Text-to-Speech (TTS) cognitive services of Microsoft Azure. I was able to convert a given text into an audio file using the Azure TTS service. It works fine when I have a single voice element in my SSML document. An example of working SSML is:
<speak version="1.0" xml:lang="en-US">
  <voice xml:lang="en-US" xml:gender="Male" name="en-US-Jessa24kRUS"> 
       Hello, this is my sample text to convert into audio? 
  </voice>
</speak>

But when I have multiple voice tags (alternating by gender), it causes an error. The SSML is:

<speak version="1.0" xml:lang="en-US">
  <voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> What’s your name? </voice>
  <voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> My name is Cindy Smith. Do you know John Silver?</voice>
  <voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> John and I are old friends. </voice>
  <voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> John just joined our company as a salesperson. </voice>
  <voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> That’s good news. John has been a salesperson for chemical products for many years. </voice>
  <voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> I heard he really likes his new job.</voice>
</speak>

And the error is:

Response status code does not indicate success: 400 (SSML must contain a maximum of 5 voice elements. Actual 6.).

It would be a great help if someone could explain why it is limiting me to five voice tags when no such limitation is mentioned in the documentation.

1 Reply

@ArsmanAhmad this limit is detailed in the documentation, under the "Quotas and limits" section:

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-services-quotas-and-...

 

Now, for HTTP-specific quotas, the limit is:

"Max number of distinct <voice> tags in SSML: 50"

 

I faced the same 5-voice limit earlier this year when manipulating SSML. I guess they have raised the limit since your tests.
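
If you still run into a voice-element limit (5 in the error above, 50 per the quota quoted here), one possible workaround is to split the SSML into several smaller documents before sending them. Below is a minimal sketch in Python using only the standard library. The function name split_ssml_by_voice and the max_voices parameter are my own, not part of any Azure SDK, and it assumes the <voice> elements are direct children of <speak> with no default xmlns declared, as in the SSML from the question.

```python
import xml.etree.ElementTree as ET
from typing import List


def split_ssml_by_voice(ssml: str, max_voices: int = 5) -> List[str]:
    """Split one SSML document into several documents, each holding
    at most max_voices <voice> elements (hypothetical helper)."""
    if max_voices < 1:
        raise ValueError("max_voices must be at least 1")

    root = ET.fromstring(ssml)
    # Assumption: every <voice> is a direct child of <speak>.
    voices = root.findall("voice")

    chunks = []
    for start in range(0, len(voices), max_voices):
        # Recreate <speak> with the same attributes (version, xml:lang).
        speak = ET.Element(root.tag, dict(root.attrib))
        speak.extend(voices[start:start + max_voices])
        chunks.append(ET.tostring(speak, encoding="unicode"))
    return chunks
```

Each returned string can then be sent as its own request. Joining the resulting audio files back together is not covered by this sketch.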