Forum Discussion
Federicofkt
Dec 19, 2023, Copper Contributor
Cognitive services speech sdk gives "The recordings URI contains invalid data" error
Hi,
I am using the code provided here:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/batch/python/python-client/main.py
It works with a small m4a file, but if I try to transcribe the same audio in .wav format, it throws the error "Transcription failed: The recordings URI contains invalid data."
It also fails with a large .wav or .m4a file.
The .wav files are obtained by extracting the audio from a video using moviepy with these settings:
codec='pcm_s16le', bitrate='256k', fps=16000
Any help would be appreciated, thanks.
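One way to narrow this down is to check what the extracted .wav file actually contains before uploading it. The sketch below is only an illustration using Python's standard `wave` module; the helper name `describe_wav` is made up for this example and is not part of the Azure sample or of moviepy. Note that for uncompressed PCM the bitrate follows from sample rate, sample width and channel count (16000 Hz x 16 bit x 1 channel = 256 kbit/s, or 512 kbit/s for stereo), so the reported values show whether the file matches the intended settings. Whether a mismatch is what triggers the service error is an assumption, not something confirmed in this thread.

```python
import wave


def describe_wav(path):
    """Return the basic PCM parameters stored in a WAV file's header.

    Hypothetical helper for illustration only. wave.open raises
    wave.Error if the file is not an uncompressed PCM WAV.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        frame_rate = wav.getframerate()    # samples per second
        frames = wav.getnframes()

    return {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate_hz": frame_rate,
        "duration_s": frames / frame_rate if frame_rate else 0.0,
        # For PCM the bitrate is fully determined by the values above.
        "bitrate_bps": frame_rate * sample_width * 8 * channels,
    }
```

Calling `describe_wav("audio.wav")` (the path is a placeholder) on a file written with codec='pcm_s16le' and fps=16000 should report 16 bits per sample and a 16000 Hz sample rate; the channel count depends on the source video.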
3 Replies
- Rodger_Blom (Copper Contributor): I am running into the same problem with m4a files (about 32 MB). I am using the 3.2 preview batch speech-to-text API with the Whisper base model in westeurope (5e075808-d616-4e6b-bd44-2d965db08b99).
- Federicofkt (Copper Contributor): I tried the azure.cognitiveservices.speech library in Python and it works with large files. The problem is that it performs quite poorly, both at recognizing the text and, most of all, at identifying the speakers (I have an audio file with 2 speakers, and it transcribes it with 3 speakers, lol). If you manage to figure out how to make the API work, let me know!
- Rodger_Blom (Copper Contributor): Yeah, I only want to use OpenAI's Whisper because of its superiority. The other option is to move over to the OpenAI APIs in Azure, which can handle m4a and wav.