Improve speech-to-text accuracy with Azure Custom Speech


Written by Andy Beatman, Sr. Product Marketing Manager, Azure AI


With Microsoft Azure Cognitive Services for Speech, customers can build voice-enabled apps confidently and quickly in more than 140 languages. We make it easy for customers to transcribe speech to text (STT) with high accuracy, produce natural-sounding text-to-speech (TTS) voices, and translate spoken audio. Over the past few years, we have been inspired by the ways customers use our customization features to fine-tune speech recognition for their use cases.


As our speech technology continues to evolve, we want to introduce four custom speech-to-text capabilities and their respective customer use cases. With these features, you can evaluate and improve the speech-to-text accuracy of your applications and products. A custom speech model is trained on top of a base model. With a custom model, you can improve recognition of domain-specific vocabulary by providing text data to train the model. You can also improve recognition under the specific audio conditions of your application by providing audio data with reference transcriptions.
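To make "evaluating accuracy" concrete, here is a minimal sketch of word error rate (WER), a common way to score a transcript against a reference transcription. This excerpt does not name a specific metric, so treat WER as an assumption here; the function name `word_error_rate` and the sample transcripts are made up for illustration and are not taken from any Azure API or real recognition output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Assumption: simple lowercase + whitespace tokenization. Real evaluations
    usually also normalize punctuation, numbers, and casing rules.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented for illustration only.
reference = "the patient was prescribed metoprolol twice daily"
base_output = "the patient was prescribed me toe pro lol twice daily"
custom_output = "the patient was prescribed metoprolol twice daily"

print(f"base model WER:   {word_error_rate(reference, base_output):.2f}")
print(f"custom model WER: {word_error_rate(reference, custom_output):.2f}")
```

Scoring the same test audio against a base model and a custom model in this way gives a like-for-like comparison: a lower WER on domain-specific terms suggests the added text or audio training data is helping.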


Custom Speech data types and use cases

Our Custom Speech features let you customize Microsoft's speech-to-text engine. You can customize the language model by tailoring it to the vocabulary of your application, and customize the acoustic model to adapt to the speaking style of your users. By uploading text and/or audio data through Custom Speech, you can create these custom models, combine them with Microsoft's state-of-the-art speech models, and deploy them to a custom speech-to-text endpoint that can be accessed from any device.


Read the full article