The OpenAI Whisper model is an encoder-decoder Transformer that transcribes audio into text in 57 languages. It can also translate speech from those languages into English, with English-only output, and it produces transcripts with enhanced readability.
OpenAI Whisper model in Azure OpenAI Service
Azure OpenAI Service enables developers to run OpenAI’s Whisper model in Azure, mirroring the OpenAI Whisper API in features and functionality, including transcription and translation capabilities.
The Whisper model's REST APIs for transcription and translation are available from the Azure OpenAI Service portal.
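As a rough illustration of how the two REST operations are addressed, the sketch below builds the request URL for a transcription or translation call. The resource name, deployment name, helper names, and the `api-version` value are assumptions made for this example and are not taken from this post; check the Azure OpenAI REST reference for the values that apply to your resource.

```python
from urllib.parse import urlencode

# Assumed for illustration only; confirm against the current REST reference.
ASSUMED_API_VERSION = "2023-09-01-preview"

# One path per capability mentioned above: transcription and translation.
TASK_PATHS = {
    "transcribe": "audio/transcriptions",
    "translate": "audio/translations",
}


def whisper_url(resource: str, deployment: str, task: str,
                api_version: str = ASSUMED_API_VERSION) -> str:
    """Build the URL for a Whisper deployment (hypothetical helper)."""
    if task not in TASK_PATHS:
        raise ValueError(f"task must be one of {sorted(TASK_PATHS)}")
    base = f"https://{resource}.openai.azure.com/openai/deployments/{deployment}"
    query = urlencode({"api-version": api_version})
    return f"{base}/{TASK_PATHS[task]}?{query}"


def auth_headers(api_key: str) -> dict:
    """Return the key header; the audio file itself is sent as multipart form data."""
    return {"api-key": api_key}


if __name__ == "__main__":
    # "my-resource" and "whisper" are placeholder names.
    print(whisper_url("my-resource", "whisper", "transcribe"))
    print(whisper_url("my-resource", "whisper", "translate"))
```

The helper only assembles strings, so it can be checked without credentials; the actual call would POST the audio file to the returned URL with the headers shown.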
Users of Azure AI Speech can use OpenAI’s Whisper model together with the Azure AI Speech batch transcription API. This lets customers transcribe large volumes of audio at scale, which is particularly useful for processing extensive collections of audio data stored in Azure.
Users of Whisper in Azure AI Speech benefit from existing features including async processing, speaker diarization, customization (available soon), and larger file sizes.
Large file sizes: Azure AI Speech enhances Whisper transcription by accepting files up to 1 GB in size and by letting you batch up to 1000 files in a single request, so you can process large numbers of files.
Timestamps: With Azure AI Speech, the recognition result includes word-level timestamps, so you can identify where in the audio each word is spoken.
Speaker diarization: Azure AI Speech identifies individual speakers in an audio file and labels their speech segments. This allows customers to distinguish between speakers, accurately transcribe their words, and create a more organized and structured transcription of audio files.
Customization/Fine-tuning (available soon): The Custom Speech capability in Azure AI Speech allows customers to fine-tune Whisper on their own data to improve recognition accuracy and consistency.
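To make the file-size and batch limits above concrete, here is a minimal sketch that splits a list of audio files into requests of at most 1000 files and sets aside anything larger than 1 GB. The helper names are hypothetical, the 1 GB limit is assumed to mean 1024^3 bytes, and the request body fields (`contentUrls`, `locale`, `displayName`, `properties`) are modeled on the batch transcription REST API and should be verified against its reference. Selecting the Whisper model itself needs a model reference that is not shown here.

```python
from typing import Dict, Iterable, List, Tuple

MAX_FILES_PER_REQUEST = 1000      # batch limit described above
MAX_FILE_BYTES = 1024 ** 3        # 1 GB, assuming binary gigabytes


def plan_batches(files: Iterable[Tuple[str, int]],
                 batch_size: int = MAX_FILES_PER_REQUEST
                 ) -> Tuple[List[List[str]], List[str]]:
    """Group (url, size_in_bytes) pairs into batches; return (batches, skipped)."""
    batches: List[List[str]] = []
    skipped: List[str] = []
    current: List[str] = []
    for url, size in files:
        if size > MAX_FILE_BYTES:
            skipped.append(url)   # too large for a single transcription
            continue
        current.append(url)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)   # final, partially filled batch
    return batches, skipped


def request_body(urls: List[str], locale: str = "en-US") -> Dict:
    """Assumed shape of one batch transcription request (verify field names)."""
    return {
        "displayName": "whisper-batch-example",
        "locale": locale,
        "contentUrls": urls,
        "properties": {
            "diarizationEnabled": True,          # speaker labels
            "wordLevelTimestampsEnabled": True,  # per-word timing
        },
    }


if __name__ == "__main__":
    # Placeholder URLs and sizes, not real data.
    files = [(f"https://example.invalid/audio/{i}.wav", 5_000_000) for i in range(2500)]
    batches, skipped = plan_batches(files)
    print(len(batches), "requests,", len(skipped), "files skipped")
```

Returning the skipped files rather than dropping them silently keeps the caller in control of what to do with oversized audio, for example splitting it before resubmitting.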
In Speech Studio you can find both the Whisper model in Azure OpenAI Service try-out and the Batch speech to text try-out, which now includes the Whisper model.
Speech to text try-outs and tools in Speech Studio:
The Batch speech to text try-out lets you compare the output of the Whisper model side by side with that of an Azure Speech model, as a quick initial evaluation of which model may work better for your specific scenario.
Comparing Whisper to Azure Speech model in the Batch speech to text try-out:
The Whisper model is a great addition to the broad portfolio of capabilities the Azure AI Speech Service offers. We are looking forward to seeing the innovative ways in which developers will take advantage of this new offering to improve business productivity and to delight users.