
AI - Azure AI services Blog

Announcing the Preview of OpenAI Whisper in Azure OpenAI service and Azure AI Speech

HeikoRa
Microsoft
Sep 15, 2023

In July we shared with this audience that OpenAI Whisper would be coming soon to Azure AI services, and today – we are very happy to announce – is the day! Customers of Azure OpenAI service and Azure AI Speech can now use Whisper.

 

The OpenAI Whisper model is an encoder-decoder Transformer that can transcribe audio into text in 57 languages. It can also translate speech from those languages into English, producing English-only output, and it generates transcripts with enhanced readability.

 

OpenAI Whisper model in Azure OpenAI service

Azure OpenAI Service enables developers to run OpenAI’s Whisper model in Azure, mirroring the OpenAI Whisper API in features and functionality, including transcription and translation capabilities.

The Whisper model's REST APIs for transcription and translation are available from the Azure OpenAI Service portal.

See details on how to use the Whisper model with the Azure OpenAI Service here: Speech to text with Azure OpenAI Service - Azure OpenAI | Microsoft Learn
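As a quick sketch of what a call against the Azure OpenAI Whisper transcription endpoint might look like, here is a standard-library-only example. The endpoint, deployment name, and API version below are placeholders and assumptions; consult the linked documentation for the authoritative request shape and current API version.

```python
import json
import uuid
import urllib.request

# Preview API version assumed here; check the docs for the current value.
API_VERSION = "2023-09-01-preview"

def transcription_url(endpoint: str, deployment: str) -> str:
    # endpoint is the Azure OpenAI resource endpoint,
    # e.g. https://<your-resource>.openai.azure.com (placeholder).
    return (f"{endpoint}/openai/deployments/{deployment}"
            f"/audio/transcriptions?api-version={API_VERSION}")

def multipart_body(filename: str, audio_bytes: bytes) -> tuple[bytes, str]:
    """Encode the audio file as a multipart/form-data body (stdlib only)."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def transcribe(endpoint: str, deployment: str, api_key: str, audio_path: str) -> str:
    """POST an audio file to the Whisper deployment and return the transcript text."""
    with open(audio_path, "rb") as f:
        body, content_type = multipart_body(audio_path, f.read())
    req = urllib.request.Request(
        transcription_url(endpoint, deployment),
        data=body,
        headers={"api-key": api_key, "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

In practice you would more likely use the `requests` library or the OpenAI Python SDK; the stdlib version above just makes the request anatomy explicit.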

 

OpenAI Whisper model in Azure AI Speech

Users of Azure AI Speech can use OpenAI’s Whisper model together with the Azure AI Speech batch transcription API, making it easy to transcribe large volumes of audio content at scale. This is particularly useful for processing extensive collections of audio data already stored in Azure.

Users of Whisper in Azure AI Speech benefit from existing features including async processing, speaker diarization, customization (available soon), and larger file sizes.

  • Large file sizes: Azure AI Speech enhances Whisper transcription by accepting files up to 1 GB in size and by letting you batch up to 1000 files in a single request.
  • Time stamps: Using Azure AI Speech, the recognition result includes word-level timestamps, providing the ability to identify where in the audio each word is spoken.
  • Speaker diarization: This is another beneficial feature of Azure AI Speech that identifies individual speakers in an audio file and labels their speech segments. This feature allows customers to distinguish between speakers, accurately transcribe their words, and create a more organized and structured transcription of audio files.
  • Customization/Fine-tuning (available soon): The Custom Speech capability in Azure Speech allows customers to fine-tune Whisper on their own data to improve recognition accuracy and consistency.
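Assuming a batch transcription result file shaped roughly like the Speech service's documented output (a `recognizedPhrases` array whose entries carry a `speaker` label and per-word offsets under `nBest`), the word-level timestamps and diarization described above could be flattened like this. The field names are an assumption based on that shape, not the exact schema:

```python
def word_timeline(result: dict) -> list[tuple[str, str, str]]:
    """Flatten a batch transcription result into (speaker, word, offset) rows."""
    rows = []
    for phrase in result.get("recognizedPhrases", []):
        # Diarization labels each phrase with a speaker number.
        speaker = f"speaker {phrase.get('speaker', '?')}"
        # The top recognition candidate carries the word-level timing.
        for w in phrase["nBest"][0].get("words", []):
            rows.append((speaker, w["word"], w["offset"]))
    return rows
```

Offsets in the result file are ISO 8601 durations (e.g. `PT1.5S`), so each row tells you which speaker said which word and where in the audio.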

See details on how to use the Whisper model with Azure AI Speech here: Create a batch transcription - Speech service - Azure AI services | Microsoft Learn
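Submitting a batch job that uses Whisper might look like the sketch below. The region, model self-link, and property names are assumptions for illustration; the linked article has the authoritative request shape.

```python
import json
import urllib.request

def batch_payload(content_urls: list[str], whisper_model_url: str) -> dict:
    """Build the request body for a batch transcription job (up to 1000 URLs)."""
    return {
        "displayName": "whisper-batch-demo",
        "locale": "en-US",
        "contentUrls": content_urls,
        # Self-link of the Whisper base model in your region (placeholder value;
        # discover real links via the service's base-models listing).
        "model": {"self": whisper_model_url},
        "properties": {
            "wordLevelTimestampsEnabled": True,
            "diarizationEnabled": True,
        },
    }

def create_batch_transcription(region: str, key: str, payload: dict) -> dict:
    """Submit the job to the Speech batch transcription REST API (v3.1)."""
    url = (f"https://{region}.api.cognitive.microsoft.com"
           f"/speechtotext/v3.1/transcriptions")
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The job runs asynchronously: the response includes a self-link you poll until the status reaches a terminal state, then you download the result files.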

 

Getting started

Azure AI Studio

Users can access the Whisper model in Azure OpenAI Service through Azure AI Studio.

  • To gain access to Azure OpenAI Service, users need to apply for access.
  • Once approved, visit the Azure portal and create an Azure OpenAI Service resource.
  • After creating the resource, users can begin using Whisper.

 

Azure AI Speech Studio

Users can experiment with the Whisper model in Azure AI Speech Studio. 

 

In Speech Studio you can find both the Whisper Model in Azure OpenAI Service try-out as well as the Batch speech to text try-out that now includes the Whisper model.

 

Speech to text try-outs and tools in Speech Studio:

 

The Batch speech to text try-out allows you to compare the output of the Whisper model side by side with an Azure Speech model as a quick initial evaluation of which model may work better for your specific scenario.

Comparing Whisper to Azure Speech model in the Batch speech to text try-out:

 

Conclusion

The Whisper model is a great addition to the broad portfolio of capabilities the Azure AI Speech Service offers. We are looking forward to seeing the innovative ways in which developers will take advantage of this new offering to improve business productivity and to delight users.

Updated Sep 15, 2023
Version 1.0
  • rmappillai
    Copper Contributor

    Yesterday, I could get a batch response in Studio after 15-20 mins for a 1.5 min audio against the Whisper Preview model (the results were the same as OpenAI's Whisper API. Yay!). Today I don't even get a response at all even after waiting 30 mins (even tried deleting and recreating my EastUS Speech Service resource)! Unfortunately the Whisper model doesn't seem ready yet for usage. Reverting back to EastUS2 and the 20230315 Batch Transcription Model. OpenAI's Whisper API responds back in 7-15 seconds for this 1.5 mins audio (and others like it).

  • jacobmichalski
    Copper Contributor

    Super excited about this!! 

     

    I went to test it out in my Azure OpenAI Studio and do not have access to the whisper model. Is this an on-request preview?

  • Nik_Jan
    Copper Contributor

    I understand there are 2 different ways of using the Whisper model. In the article, there is even a screenshot where Whisper model can be used within Speech Studio to preview and test. I can't see the Whisper model in the drop-down menu to select it. 

     

    I can try it in Azure OpenAI Studio, but this service is not offering anything new related to OpenAI's Whisper API - no diarization, no word-level stamping etc. 

  • JoseVAC
    Copper Contributor

    Hi,

     

I have an important question: when will there be an SDK version with Whisper?

    And, When is the release date?

    Thank You in advance.

  • Sachin2060
    Copper Contributor

Same here. We would like to know when we can expect a version available in the Speech SDK. We want to use this with Call Automation in ACS.

    Thanks

  • shashwatv
    Copper Contributor

    Thanks a lot, team! I was looking to use Whisper via API for our chatbots.

  • Hi team, may I know if there is any roadmap of the AOAI Whisper Model please? Cx would love to have an overview of what's coming next for the Whisper model. Cheers!