Forum Discussion
Whisper-1 Model Transcribes English Audio Incorrectly
Hi everyone,
I'm currently working with the gpt-4o-realtime-preview model from Azure OpenAI and using the whisper-1 model for audio-to-text transcription. However, I'm encountering a recurring issue where the transcription frequently fails to detect the correct language.
Even though I provide clear English audio, the output often comes back in other languages such as Hindi, Urdu, or Chinese. This inconsistency is affecting the reliability of the transcription process.
Here’s a snippet of the code I’m using:
ConversationSessionOptions sessionOptions = new()
{
    Voice = ConversationVoice.Alloy,
    InputAudioFormat = ConversationAudioFormat.Pcm16,
    OutputAudioFormat = ConversationAudioFormat.Pcm16,
    Instructions = instructions,
    // No language is specified here, so Whisper falls back to auto-detection.
    InputTranscriptionOptions = new()
    {
        Model = "whisper-1",
    },
};
Is there a way to explicitly specify or prompt the whisper-1 model to prioritize or lock in English as the transcription language? Any guidance on how to improve language detection accuracy would be greatly appreciated.
Thanks in advance!
4 Replies
- hazemBrass Contributor
Well, in Azure.AI.OpenAI v2.2.0-beta.4, the ConversationInputTranscriptionOptions class does not yet expose a Language property, even though Whisper itself (at the API level) does support it.
However, since the underlying API does support specifying the language, you can bypass the SDK and make a direct POST request with the language parameter in the payload:
https://{your-resource-name}.openai.azure.com/openai/deployments/whisper-1/audio/transcriptions?api-version=2024-02-15-preview
POST body (note that the transcriptions endpoint expects multipart/form-data, not JSON):
file: <your-audio-file>
model: whisper-1
language: en
This will lock the transcription to English and eliminate the language auto-detection issue.
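In C#, that direct call looks roughly like the sketch below. It is untested, and the {your-...} placeholders, api-key header value, and audio file name are assumptions to adapt to your own resource:

using System;
using System.IO;
using System.Net.Http;

// Untested sketch of the direct REST call described above.
string endpoint =
    "https://{your-resource-name}.openai.azure.com/openai/deployments/whisper-1" +
    "/audio/transcriptions?api-version=2024-02-15-preview";

using HttpClient client = new();
client.DefaultRequestHeaders.Add("api-key", "{your-api-key}");

// The transcriptions endpoint takes multipart/form-data, not a JSON body.
using MultipartFormDataContent form = new()
{
    { new ByteArrayContent(File.ReadAllBytes("audio.wav")), "file", "audio.wav" },
    // Pinning the language skips Whisper's auto-detection entirely.
    { new StringContent("en"), "language" },
};

HttpResponseMessage response = await client.PostAsync(endpoint, form);
response.EnsureSuccessStatusCode();
Console.WriteLine(await response.Content.ReadAsStringAsync());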
TL;DR:
- SDK version v2.2.0-beta.4 doesn't yet support the Language field.
- Use the REST API to explicitly set language: "en" for now.
- PrathameshDeshmukhCopper Contributor
Thanks for your suggestion! I appreciate the workaround using the REST API to explicitly set the "language" parameter for Whisper transcription.
However, in my case, I’m using the Azure.AI.OpenAI SDK (v2.2.0-beta.4), which handles API calls internally. Unfortunately, the ConversationInputTranscriptionOptions class in this SDK doesn’t currently expose a Language property, so I can't directly pass the "language" parameter through the SDK.
Since the SDK abstracts away the HTTP layer, I can't inject custom parameters like "language" into the request body unless the SDK itself supports it. So it’s not applicable in my current setup.
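In principle, the only alternative I can see is to skip the SDK for session setup and speak the realtime protocol over the WebSocket directly, sending the session.update event myself. Below is an untested sketch of that idea; the endpoint shape follows the Azure realtime pattern, but whether a given api-version actually accepts a "language" field under input_audio_transcription is an assumption I haven't verified:

using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;

// Untested sketch: connect to the Azure realtime endpoint directly.
Uri uri = new("wss://{your-resource-name}.openai.azure.com/openai/realtime" +
              "?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview");

using ClientWebSocket ws = new();
ws.Options.SetRequestHeader("api-key", "{your-api-key}");
await ws.ConnectAsync(uri, CancellationToken.None);

// session.update carries the transcription settings. The "language" field is
// speculative here; older API versions may reject or ignore it.
string sessionUpdate = """
{
  "type": "session.update",
  "session": {
    "input_audio_transcription": { "model": "whisper-1", "language": "en" }
  }
}
""";
byte[] payload = Encoding.UTF8.GetBytes(sessionUpdate);
await ws.SendAsync(new ArraySegment<byte>(payload),
                   WebSocketMessageType.Text, endOfMessage: true,
                   CancellationToken.None);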
In your InputTranscriptionOptions, you can specify the language using the Language property:
InputTranscriptionOptions = new() { Model = "whisper-1", Language = "en" }
This tells the model to skip language detection and transcribe the audio as English, which should eliminate the issue of it defaulting to Hindi, Urdu, or Chinese.
- PrathameshDeshmukhCopper Contributor
Thanks for the reply.
I tried this, but it gave me a compile-time error:
Error (active) CS0117 'ConversationInputTranscriptionOptions' does not contain a definition for 'Language'
I am using Azure.AI.OpenAI with Version="2.2.0-beta.4".