Forum Discussion
Whisper-1 Model Transcribes English Audio Incorrectly
Hi everyone,
I'm currently working with the gpt-4o-realtime-preview model from Azure OpenAI and using the whisper-1 model for audio-to-text transcription. However, I'm encountering a recurring issue where the transcription frequently fails to detect the correct language.
Even though I provide clear English audio, the output often comes back in other languages such as Hindi, Urdu, or Chinese. This inconsistency is affecting the reliability of the transcription process.
Here’s a snippet of the code I’m using:
ConversationSessionOptions sessionOptions = new()
{
    Voice = ConversationVoice.Alloy,
    InputAudioFormat = ConversationAudioFormat.Pcm16,
    OutputAudioFormat = ConversationAudioFormat.Pcm16,
    Instructions = instructions,
    // No language is specified here, so Whisper falls back to auto-detection.
    InputTranscriptionOptions = new()
    {
        Model = "whisper-1",
    },
};
Is there a way to explicitly specify or prompt the whisper-1 model to prioritize or lock in English as the transcription language? Any guidance on how to improve language detection accuracy would be greatly appreciated.
Thanks in advance!
4 Replies
- hazemBrass Contributor
Well, in Azure.AI.OpenAI v2.2.0-beta.4, the ConversationInputTranscriptionOptions class does not yet expose a Language property, even though Whisper itself (at the API level) does support it.
However, since the underlying API does support specifying the language, you can bypass the SDK and make a direct POST request with the language parameter in the payload:
https://{your-resource-name}.openai.azure.com/openai/deployments/whisper-1/audio/transcriptions?api-version=2024-02-15-preview
POST body (note that the transcriptions endpoint expects multipart/form-data, not JSON):
file: <your-audio-file>
model: whisper-1
language: en
This will lock the transcription to English and eliminate the language auto-detection issue.
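In C#, that direct call looks roughly like the sketch below. It is untested, and the {your-...} placeholders, api-key header value, and audio file name are assumptions to adapt to your own resource:

using System;
using System.IO;
using System.Net.Http;

// Untested sketch of the direct REST call described above.
string endpoint =
    "https://{your-resource-name}.openai.azure.com/openai/deployments/whisper-1" +
    "/audio/transcriptions?api-version=2024-02-15-preview";

using HttpClient client = new();
client.DefaultRequestHeaders.Add("api-key", "{your-api-key}");

// The transcriptions endpoint takes multipart/form-data, not a JSON body.
using MultipartFormDataContent form = new()
{
    { new ByteArrayContent(File.ReadAllBytes("audio.wav")), "file", "audio.wav" },
    // Pinning the language skips Whisper's auto-detection entirely.
    { new StringContent("en"), "language" },
};

HttpResponseMessage response = await client.PostAsync(endpoint, form);
response.EnsureSuccessStatusCode();
Console.WriteLine(await response.Content.ReadAsStringAsync());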
TL;DR:
- SDK version v2.2.0-beta.4 doesn't yet support the Language field.
- Use the REST API to explicitly set language: "en" for now.
- PrathameshDeshmukhCopper Contributor
Thanks for your suggestion! I appreciate the workaround using the REST API to explicitly set the "language" parameter for Whisper transcription.
However, in my case, I’m using the Azure.AI.OpenAI SDK (v2.2.0-beta.4), which handles API calls internally. Unfortunately, the ConversationInputTranscriptionOptions class in this SDK doesn’t currently expose a Language property, so I can't directly pass the "language" parameter through the SDK.
Since the SDK abstracts away the HTTP layer, I can't inject custom parameters like "language" into the request body unless the SDK itself supports it. So it’s not applicable in my current setup.
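In principle, the only alternative I can see is to skip the SDK for session setup and speak the realtime protocol over the WebSocket directly, sending the session.update event myself. Below is an untested sketch of that idea; the endpoint shape follows the Azure realtime pattern, but whether a given api-version actually accepts a "language" field under input_audio_transcription is an assumption I haven't verified:

using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;

// Untested sketch: connect to the Azure realtime endpoint directly.
Uri uri = new("wss://{your-resource-name}.openai.azure.com/openai/realtime" +
              "?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview");

using ClientWebSocket ws = new();
ws.Options.SetRequestHeader("api-key", "{your-api-key}");
await ws.ConnectAsync(uri, CancellationToken.None);

// session.update carries the transcription settings. The "language" field is
// speculative here; older API versions may reject or ignore it.
string sessionUpdate = """
{
  "type": "session.update",
  "session": {
    "input_audio_transcription": { "model": "whisper-1", "language": "en" }
  }
}
""";
byte[] payload = Encoding.UTF8.GetBytes(sessionUpdate);
await ws.SendAsync(new ArraySegment<byte>(payload),
                   WebSocketMessageType.Text, endOfMessage: true,
                   CancellationToken.None);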
In your InputTranscriptionOptions, you can specify the language using the Language property:
InputTranscriptionOptions = new() { Model = "whisper-1", Language = "en" }
This tells the model to skip language detection and transcribe the audio as English, which should eliminate the issue of it defaulting to Hindi, Urdu, or Chinese.
- PrathameshDeshmukhCopper Contributor
Thanks for the reply.
I tried this, but it gave me a compile-time error:
Error (active) CS0117 'ConversationInputTranscriptionOptions' does not contain a definition for 'Language'
I am using Azure.AI.OpenAI with Version="2.2.0-beta.4".