Forum Discussion

PrathameshDeshmukh
Copper Contributor
Jun 24, 2025

Whisper-1 Model Transcribes English Audio Incorrectly

Hi everyone,

I'm currently working with the gpt-4o-realtime-preview model from Azure OpenAI and using the whisper-1 model for audio-to-text transcription. However, I'm encountering a recurring issue where the transcription frequently fails to detect the correct language.

Even though I provide clear English audio, the output is often transcribed in other languages such as Hindi, Urdu, or Chinese. This inconsistency is affecting the reliability of the transcription process.

Here’s a snippet of the code I’m using:

ConversationSessionOptions sessionOptions = new()
{
    Voice = ConversationVoice.Alloy,
    InputAudioFormat = ConversationAudioFormat.Pcm16,
    OutputAudioFormat = ConversationAudioFormat.Pcm16,
    Instructions = instructions,
    InputTranscriptionOptions = new()
    {
        Model = "whisper-1",
    },
};

Is there a way to explicitly specify or prompt the whisper-1 model to prioritize or lock in English as the transcription language? Any guidance on how to improve language detection accuracy would be greatly appreciated.

Thanks in advance!

4 Replies

  • hazem
    Brass Contributor

    Well, in Azure.AI.OpenAI v2.2.0-beta.4, the ConversationInputTranscriptionOptions class does not yet expose a Language property, even though Whisper itself (at the API level) does support it.

    However, since the underlying API does support specifying the language, you can bypass the SDK and make a direct POST request with the language parameter in the payload.

    https://{your-resource-name}.openai.azure.com/openai/deployments/whisper-1/audio/transcriptions?api-version=2024-02-15-preview


    POST body (note: this endpoint expects multipart/form-data fields rather than a raw JSON document):

    file: <your-audio-file>
    model: whisper-1
    language: en

    This will lock the transcription to English and eliminate the language auto-detection issue.
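    If you're calling this from C#, here's a minimal HttpClient sketch of that request. The resource name, api-version, audio file name, and environment variable are placeholders to adjust for your setup; on Azure, the deployment name in the URL selects the model, so the separate model field isn't strictly needed:

    using System;
    using System.IO;
    using System.Net.Http;
    using System.Threading.Tasks;

    class WhisperRestExample
    {
        static async Task Main()
        {
            // Placeholder endpoint: substitute your resource name and a current api-version.
            var endpoint = "https://{your-resource-name}.openai.azure.com" +
                           "/openai/deployments/whisper-1/audio/transcriptions" +
                           "?api-version=2024-02-15-preview";

            using var http = new HttpClient();
            http.DefaultRequestHeaders.Add("api-key",
                Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY"));

            // The transcriptions endpoint expects multipart/form-data, not a JSON body.
            using var form = new MultipartFormDataContent();
            form.Add(new ByteArrayContent(await File.ReadAllBytesAsync("audio.wav")),
                     "file", "audio.wav");
            form.Add(new StringContent("en"), "language"); // lock transcription to English

            var response = await http.PostAsync(endpoint, form);
            response.EnsureSuccessStatusCode();
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }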


    TL;DR:

    • SDK v2.2.0-beta.4 doesn't yet expose a Language property.
    • Use the REST API to set language: "en" explicitly for now.
    • PrathameshDeshmukh
      Copper Contributor

      Thanks for your suggestion! I appreciate the workaround using the REST API to explicitly set the "language" parameter for Whisper transcription.

      However, in my case, I’m using the Azure.AI.OpenAI SDK (v2.2.0-beta.4), which handles API calls internally. Unfortunately, the ConversationInputTranscriptionOptions class in this SDK doesn’t currently expose a Language property, so I can't directly pass the "language" parameter through the SDK.

      Since the SDK abstracts away the HTTP layer, I can't inject custom parameters like "language" into the request body unless the SDK itself supports it. So it’s not applicable in my current setup.
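      That said, since the realtime session is ultimately configured by JSON events over a WebSocket, one option I'm exploring is bypassing the SDK's typed options and sending a raw session.update event myself. This is only a sketch under the assumption that the realtime protocol accepts a language field inside input_audio_transcription; I haven't confirmed that against this api-version, and the URI values are placeholders:

      using System;
      using System.Net.WebSockets;
      using System.Text;
      using System.Threading;
      using System.Threading.Tasks;

      class RealtimeSessionUpdateSketch
      {
          static async Task Main()
          {
              // Placeholder URI: substitute your resource name, deployment, and a current api-version.
              var uri = new Uri("wss://{your-resource-name}.openai.azure.com/openai/realtime" +
                                "?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview");

              using var ws = new ClientWebSocket();
              ws.Options.SetRequestHeader("api-key",
                  Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY"));
              await ws.ConnectAsync(uri, CancellationToken.None);

              // session.update event: the "language" field is an assumption based on the
              // realtime protocol reference and may not be honored by every api-version.
              var sessionUpdate = """
              {
                "type": "session.update",
                "session": {
                  "input_audio_transcription": { "model": "whisper-1", "language": "en" }
                }
              }
              """;

              await ws.SendAsync(Encoding.UTF8.GetBytes(sessionUpdate),
                                 WebSocketMessageType.Text, endOfMessage: true, CancellationToken.None);
          }
      }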

  • In your InputTranscriptionOptions, you can specify the language using the Language property:


    InputTranscriptionOptions = new()
    {
        Model = "whisper-1",
        Language = "en"
    }


    This tells the model to skip language detection and transcribe the audio as English, which should eliminate the issue of it defaulting to Hindi, Urdu, or Chinese.

    • PrathameshDeshmukh
      Copper Contributor

      Thanks for the reply.
      I tried this, but it gave me a compile-time error:
      Error (active)    CS0117    'ConversationInputTranscriptionOptions' does not contain a definition for 'Language'

      I am using Azure.AI.OpenAI with Version="2.2.0-beta.4".
