Cognitive Services Speech SDK gives "The recordings URI contains invalid data" error

Question

Hi, I am using the code provided here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/batch/python/python-client/main.py

It works with a small .m4a file, but if I try to transcribe the same audio in .wav format, it throws the error:

Transcription failed: The recordings URI contains invalid data.

It also fails with a large .wav or .m4a file. The .wav files are obtained by extracting the audio from the video using moviepy with these settings: codec='pcm_s16le', bitrate='256k', fps=16000

Any help would be appreciated, thanks.
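For anyone reproducing this, a quick local check is to confirm that the extracted .wav really has the intended format before uploading it. The sketch below uses only Python's standard `wave` module; the function name `describe_wav` is made up for illustration and is not part of the linked sample. It does not diagnose the service error itself, it only shows what the file header says. Note that 16-bit mono PCM at 16000 Hz works out to 16000 × 16 × 1 = 256000 bit/s, which matches the bitrate='256k' setting only if the file is mono.

```python
import wave


def describe_wav(path):
    """Read basic PCM header fields from a WAV file (standard library only).

    wave.open raises wave.Error if the file is not a WAV that the
    module can parse, which is itself a useful signal.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        frame_rate = wav.getframerate()    # samples per second
        frames = wav.getnframes()
    return {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate_hz": frame_rate,
        "duration_s": frames / frame_rate if frame_rate else 0.0,
        # Uncompressed PCM bitrate: rate * bits per sample * channels
        "pcm_bitrate_bps": frame_rate * sample_width * 8 * channels,
    }
```

Calling `describe_wav("audio.wav")` (the file name is a placeholder) on the extracted file shows whether it came out mono or stereo and at which sample rate, which can then be compared with the m4a version that works.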

rodger_blom · Answer

I am running into the same problem with .m4a files (about 32 MB). I am using the 3.2 preview batch speech-to-text API with the Whisper base model in westeurope (5e075808-d616-4e6b-bd44-2d965db08b99).

federicofkt · Answer

I tried the azure.cognitiveservices.speech library in Python and it works with large files. The problem is that it performs quite badly, both at recognizing the text and, most of all, at identifying the speakers (I have an audio file with 2 speakers, and it transcribes it with 3 speakers, LoL). If you figure out how to make the API work, let me know!

rodger_blom · Answer

Yeah, I only want to use OpenAI's Whisper because of its superiority. The other option is to move over to the OpenAI APIs in Azure, which can handle .m4a and .wav.
