Announcing the Preview of OpenAI Whisper in Azure OpenAI service and Azure AI Speech

Microsoft

Sep 15, 2023

In July we shared with this audience that OpenAI Whisper would be coming soon to Azure AI services. Today we are very happy to announce that the day has come: customers of Azure OpenAI Service and Azure AI Speech can now use Whisper in preview.

The OpenAI Whisper model is an encoder-decoder Transformer that transcribes audio into text in 57 languages. It can also translate speech from those languages into English; English is the only translation target. The transcripts it produces have enhanced readability.

OpenAI Whisper model in Azure OpenAI service

Azure OpenAI Service enables developers to run OpenAI’s Whisper model in Azure, mirroring the OpenAI Whisper API in features and functionality, including transcription and translation capabilities.

The Whisper model's REST APIs for transcription and translation are available from the Azure OpenAI Service portal.

For details on how to use the Whisper model with Azure OpenAI Service, see "Speech to text with Azure OpenAI Service" on Microsoft Learn.
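As a rough illustration of the transcription and translation REST operations mentioned above, here is a minimal Python sketch that uses only the standard library. Several details are assumptions rather than facts from this post: the URL pattern, the `2023-09-01-preview` API version, the `api-key` header, the multipart `file` field, and the `text` field in the response. The resource and deployment names are placeholders, and the helper names (`whisper_url`, `build_request`, `transcribe`) are made up for this example. Please verify all of them against the Microsoft Learn article.

```python
import json
import os
import urllib.request
import uuid

# Assumption: a preview API version from around the time of this announcement.
API_VERSION = "2023-09-01-preview"


def whisper_url(resource: str, deployment: str, operation: str) -> str:
    """Build the endpoint URL; operation is 'transcriptions' or 'translations'."""
    if operation not in ("transcriptions", "translations"):
        raise ValueError(f"unsupported operation: {operation}")
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/audio/{operation}?api-version={API_VERSION}"
    )


def build_request(url: str, api_key: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Wrap the audio bytes in a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "api-key": api_key,  # assumed header name for key-based authentication
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def transcribe(resource, deployment, api_key, path, operation="transcriptions"):
    """Send one audio file and return the 'text' field of the JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    url = whisper_url(resource, deployment, operation)
    request = build_request(url, api_key, os.path.basename(path), audio)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Calling `transcribe("my-resource", "my-whisper", key, "meeting.wav")` would request a transcript in the spoken language. Passing `operation="translations"` would request English output instead.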

OpenAI Whisper model in Azure AI Speech

Users of Azure AI Speech can use OpenAI's Whisper model together with the Azure AI Speech batch transcription API to transcribe large volumes of audio content at scale. This capability is particularly useful for processing extensive collections of audio data stored within the Azure platform.

Users of Whisper in Azure AI Speech benefit from existing features including asynchronous processing, larger file sizes, word-level timestamps, speaker diarization, and customization (available soon).

Large file sizes: Azure AI Speech extends Whisper transcription to files of up to 1 GB each and lets you batch up to 1,000 files in a single request.
Timestamps: With Azure AI Speech, the recognition result includes word-level timestamps, so you can identify where in the audio each word is spoken.
Speaker diarization: Azure AI Speech identifies the individual speakers in an audio file and labels their speech segments, so customers can tell speakers apart and produce a more organized, structured transcript.
Customization/fine-tuning (available soon): The Custom Speech capability in Azure AI Speech will allow customers to fine-tune Whisper on their own data to improve recognition accuracy and consistency.
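To make the combination of word-level timestamps and speaker diarization more concrete, the sketch below merges consecutive words from the same speaker into speaker turns. The `Word` record is a simplified, hypothetical shape invented for this example. It is not the result schema returned by the batch transcription API, so real results would first need to be mapped into it.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    # Hypothetical, simplified record; NOT the service's result schema.
    speaker: int    # speaker label assigned by diarization
    start_s: float  # word start time in seconds
    text: str


def speaker_turns(words):
    """Merge consecutive same-speaker words into (speaker, start, text) turns."""
    ordered = sorted(words, key=lambda w: w.start_s)
    turns = []
    for speaker, group in groupby(ordered, key=lambda w: w.speaker):
        group = list(group)
        turns.append((speaker, group[0].start_s, " ".join(w.text for w in group)))
    return turns


# Made-up sample data for illustration only.
words = [
    Word(1, 0.0, "Hello"), Word(1, 0.4, "there."),
    Word(2, 1.1, "Hi!"),
    Word(1, 1.8, "Shall"), Word(1, 2.0, "we"), Word(1, 2.2, "start?"),
]

for speaker, start, text in speaker_turns(words):
    print(f"[{start:5.1f}s] Speaker {speaker}: {text}")
```

Each turn keeps the timestamp of its first word, so a reader can jump to the point in the audio where a speaker starts talking.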

For details on how to use the Whisper model with Azure AI Speech, see "Create a batch transcription" in the Speech service documentation on Microsoft Learn.
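A batch transcription job that uses Whisper might be prepared along the following lines. This is a sketch under stated assumptions, not the documented request. The `v3.2-preview.1` path, the `Ocp-Apim-Subscription-Key` header, and the property names `diarizationEnabled` and `displayFormWordLevelTimestampsEnabled` should all be checked against the linked article. The model URL is a placeholder for whichever Whisper base model your Speech resource exposes. The 1,000-file limit is the one stated in this post.

```python
import json
import urllib.request

# Assumption: REST API version path; verify against the batch transcription docs.
API_PATH = "speechtotext/v3.2-preview.1/transcriptions"
MAX_FILES_PER_REQUEST = 1000  # limit stated in this post


def batch_payload(content_urls, locale, whisper_model_url, display_name="Whisper batch"):
    """Build the JSON body for one batch transcription job."""
    if not content_urls:
        raise ValueError("at least one audio URL is required")
    if len(content_urls) > MAX_FILES_PER_REQUEST:
        raise ValueError(f"a single request accepts at most {MAX_FILES_PER_REQUEST} files")
    return {
        "displayName": display_name,
        "locale": locale,
        "contentUrls": list(content_urls),
        # Placeholder: URL of a Whisper base model in your Speech resource's region.
        "model": {"self": whisper_model_url},
        "properties": {
            # Assumed property names for diarization and word-level timestamps.
            "diarizationEnabled": True,
            "displayFormWordLevelTimestampsEnabled": True,
        },
    }


def create_request(region, speech_key, payload):
    """Build (but do not send) the POST request that creates the job."""
    url = f"https://{region}.api.cognitive.microsoft.com/{API_PATH}"
    headers = {
        "Ocp-Apim-Subscription-Key": speech_key,
        "Content-Type": "application/json",
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Passing the request to `urllib.request.urlopen` would submit the job. Because batch transcription is asynchronous, the response describes the job rather than containing the transcript, and the results are retrieved once the job has finished.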

Getting started

Azure AI Studio

Users can use the Whisper model in Azure OpenAI through Azure AI Studio.

Apply for access to Azure OpenAI Service.
Once approved, visit the Azure portal and create an Azure OpenAI Service resource.
After creating the resource, begin using Whisper.

Azure AI Speech Studio

Users can experiment with the Whisper model in Azure AI Speech Studio.

In Speech Studio you can find both the try-out for the Whisper model in Azure OpenAI Service and the Batch speech to text try-out, which now includes the Whisper model.

Figure: Speech to text try-outs and tools in Speech Studio

The Batch speech to text try-out allows you to compare the output of the Whisper model side by side with an Azure Speech model as a quick initial evaluation of which model may work better for your specific scenario.

Figure: Comparing Whisper to an Azure Speech model in the Batch speech to text try-out
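Outside Speech Studio, a similar first-pass comparison can be scripted when a hand-checked reference transcript is available. The sketch below uses word error rate (WER), a common speech recognition metric that this post does not itself mention. It is offered as one possible way to compare two model outputs, not as what the try-out does. The function name and the sample sentences are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# Made-up sample transcripts for illustration only.
reference = "the quick brown fox jumps"
model_a_output = "the quick brown fox jumps"
model_b_output = "the quick brown box"

print("Model A WER:", word_error_rate(reference, model_a_output))
print("Model B WER:", word_error_rate(reference, model_b_output))
```

This version only lowercases and splits on whitespace. Because the models format punctuation and numbers differently, a fair comparison would normalize both outputs the same way before scoring.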

Conclusion

The Whisper model is a great addition to the broad portfolio of capabilities the Azure AI Speech Service offers. We are looking forward to seeing the innovative ways in which developers will take advantage of this new offering to improve business productivity and to delight users.

Updated Sep 15, 2023

HeikoRa

8 Comments

samberk
Copper Contributor
Sep 25, 2024
Hi,
Is Whisper Finetuning available on Azure?
corrinechen
Former Employee
Feb 18, 2024
Hi team, may I know if there is any roadmap of the AOAI Whisper Model please? Cx would love to have an overview of what's coming next for the Whisper model. Cheers!
shashwatv
Copper Contributor
Nov 17, 2023
Thanks a lot, team! I was looking to use Whisper via API for our chatbots.
Sachin2060
Copper Contributor
Nov 09, 2023
Same here. We would like to know when we can expect a version to be available in the Speech SDK. We want to use this with Call Automation in ACS.

Thanks
JoseVAC
Copper Contributor
Oct 03, 2023
Hi,

I have an important question: when will we have a version of the SDK with Whisper?
And when is the release date?

Thank You in advance.
Nik_Jan
Copper Contributor
Sep 27, 2023
I understand there are 2 different ways of using the Whisper model. In the article, there is even a screenshot where Whisper model can be used within Speech Studio to preview and test. I can't see the Whisper model in the drop-down menu to select it.

I can try it in Azure OpenAI Studio, but this service is not offering anything new related to OpenAI's Whisper API - no diarization, no word-level stamping etc.
rmappillai
Copper Contributor
Sep 20, 2023
Yesterday, I could get a batch response in Studio after 15-20 mins for a 1.5 min audio against the Whisper Preview model (the results were the same as OpenAI's Whisper API. Yay!). Today I don't even get a response at all even after waiting 30 mins (even tried deleting and recreating my EastUS Speech Service resource)! Unfortunately the Whisper model doesn't seem ready yet for usage. Reverting back to EastUS2 and the 20230315 Batch Transcription Model. OpenAI's Whisper API responds back in 7-15 seconds for this 1.5 mins audio (and others like it).
jacobmichalski
Copper Contributor
Sep 19, 2023
Super excited about this!!

I went to test it out in my Azure OpenAI Studio and do not have access to the whisper model. Is this an on-request preview?
