We are thrilled to announce the Public Preview of the Fast Transcription service in Azure AI Speech, which lets customers and developers transcribe audio files to text accurately and synchronously, with a high speed factor.
The Fast Transcription service includes our latest end-to-end model technologies and uses GPU inference to deliver our best quality at a very high speed factor: it can typically transcribe a 20-minute audio file in less than 1 minute.
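To make the speed claim concrete, here is a minimal sketch. It assumes that "speed factor" means audio duration divided by processing time; the post does not define the term, so both that definition and the function name `speed_factor` are assumptions for illustration.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return seconds of audio transcribed per second of processing.

    Assumed definition: audio duration / processing time.
    """
    if audio_seconds < 0 or processing_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / processing_seconds


# A 20-minute file transcribed in exactly 60 seconds.
print(speed_factor(20 * 60, 60))  # 20.0
```

Under this reading, transcribing a 20-minute file in less than 1 minute corresponds to a speed factor above 20.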
Supported Locales: en-US, zh-CN, fr-FR, it-IT, es-ES, es-MX, ja-JP, ko-KR, pt-BR, hi-IN (more coming soon; learn more about language support in the Speech service)
Supported Regions: East US, Southeast Asia, West Europe, Central India (learn more about region support in the Speech service). In some unsupported regions, the service may still return a correct transcription result, but at a slower speed.
Supported Audio Formats / Codecs: WAV, MP3, OPUS/OGG, FLAC, WMA, AAC, ALAW in WAV container, MULAW in WAV container, AMR, WebM, M4A, and SPEEX.
You can use Fast Transcription through the Speech-to-text REST API (version 2024-05-15-preview or later).
The fast transcription API uses multipart/form-data to submit audio files for transcription. The API returns the transcription results synchronously.
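As an illustration of what a multipart/form-data submission looks like on the wire, the following sketch encodes an audio file and a JSON definition using only the Python standard library. The form field names `audio` and `definition` come from the curl example in this post; the helper name `build_multipart_body` and the `application/octet-stream` content type for the audio part are assumptions, not something the service documentation is known to require.

```python
import json
import uuid


def build_multipart_body(audio_bytes: bytes, filename: str, definition: dict):
    """Encode audio plus a JSON definition as multipart/form-data.

    Returns (body, content_type). Hypothetical helper for illustration.
    """
    boundary = uuid.uuid4().hex  # random delimiter between the parts
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",  # assumed part type
        b"",
        audio_bytes,
        delimiter,
        b'Content-Disposition: form-data; name="definition"',
        b"",
        json.dumps(definition).encode("utf-8"),
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    body = b"\r\n".join(lines)
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type


body, content_type = build_multipart_body(
    b"fake-audio-bytes", "sample.wav", {"locales": ["en-US"]}
)
print(content_type.startswith("multipart/form-data; boundary="))  # True
```

The returned body and Content-Type value could then be sent in a POST request with any HTTP client. In practice, tools such as curl perform this encoding for you.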
Construct the request body according to the following instructions:
Set the locales property. This value should match the expected locale of the audio data to transcribe. Automatic language detection is planned for a future version; to enable it, you will need to provide a list of candidate locales.
Use the profanityFilterMode property to specify how to handle profanity in recognition results. Accepted values are None to disable profanity filtering, Masked to replace profanity with asterisks, Removed to remove all profanity from the result, or Tags to add profanity tags. The default value is Masked.
Use the channels property to specify the zero-based indices of the channels to be transcribed separately. If not specified, multiple channels are merged and transcribed jointly. At most two channels are supported. To transcribe the channels of a stereo audio file separately, specify [0,1]. Otherwise, stereo audio is merged to mono, mono audio is left as is, and only a single channel is transcribed. In either of the latter cases, the output has no channel indices for the transcribed text, since only a single audio stream is transcribed.
Use the diarizationSettings property to recognize and separate multiple speakers in a mono-channel audio file. Specify the minimum and maximum number of people who might be speaking in the audio file (for example, "diarizationSettings": {"minSpeakers": 1, "maxSpeakers": 4}). The transcription result then contains a "speaker" entry for each transcribed phrase. This feature isn't available with stereo audio when you set the channels property to [0,1].
Make a multipart/form-data POST request to the endpoint with the audio file and the request body properties. The following example shows how to create a transcription using the fast transcription API.
Replace YourSubscriptionKey with your Speech resource key.
Replace YourServiceRegion with your Speech resource region.
Replace YourAudioFile with the path to your audio file.
curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-05-15-preview' \
--header 'Content-Type: multipart/form-data' \
--header 'Accept: application/json' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{\"locales\":[\"en-US\"], \"diarizationSettings\": {\"minSpeakers\": 1, \"maxSpeakers\": 4}, \"profanityFilterMode\": \"Masked\", \"channels\": [0]}"'
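The property rules described earlier can be checked on the client before sending a request. The sketch below builds the JSON string for the `definition` form field and rejects combinations the post says are unsupported. The helper name `build_definition`, its parameter names, the restriction of channel indices to 0 and 1, and the check that the minimum speaker count does not exceed the maximum are assumptions for illustration; the service may validate inputs differently.

```python
import json

# Accepted values as listed in this post.
ALLOWED_PROFANITY_MODES = {"None", "Masked", "Removed", "Tags"}


def build_definition(locales, profanity_filter_mode="Masked", channels=None,
                     min_speakers=None, max_speakers=None):
    """Return the JSON string for the `definition` form field (hypothetical helper)."""
    if not locales:
        raise ValueError("locales must contain at least one locale")
    if profanity_filter_mode not in ALLOWED_PROFANITY_MODES:
        raise ValueError(f"unsupported profanityFilterMode: {profanity_filter_mode}")

    definition = {
        "locales": list(locales),
        "profanityFilterMode": profanity_filter_mode,
    }

    if channels is not None:
        channels = list(channels)
        # Assumption: "up to two channels" means zero-based indices 0 and 1.
        if len(channels) > 2 or any(c not in (0, 1) for c in channels):
            raise ValueError("channels accepts at most two indices, 0 and 1")
        definition["channels"] = channels

    if min_speakers is not None or max_speakers is not None:
        if min_speakers is None or max_speakers is None:
            raise ValueError("set both min_speakers and max_speakers")
        if min_speakers > max_speakers:  # assumed sanity check
            raise ValueError("min_speakers must not exceed max_speakers")
        # The post states diarization is unavailable when channels is [0,1].
        if channels is not None and sorted(channels) == [0, 1]:
            raise ValueError("diarization is not available with channels [0,1]")
        definition["diarizationSettings"] = {
            "minSpeakers": min_speakers,
            "maxSpeakers": max_speakers,
        }

    return json.dumps(definition)


# Mirrors the definition used in the curl example.
print(build_definition(["en-US"], channels=[0], min_speakers=1, max_speakers=4))
```

The resulting string can be passed as the value of the `definition` form field, in place of the hand-escaped JSON in the curl command.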
To try it in AI Studio, go to AI Studio -> AI Services -> Speech -> Fast Transcription.
With the Fast Transcription service, you can streamline your workflows, enhance productivity, and unlock new possibilities in speech-enabled scenarios such as copilots, audio/video captioning and editing, video translation, and post-call analytics.
Learn more about how ClipChamp uses Fast Transcription for auto-captioning and how OPPO uses it in their AI phones.