Speech Service Update - Pronunciation Assessment is Generally Available
Published Feb 13 2023 05:26 AM

This post was co-authored by Yinhe Wei, Ke Wang, Lei He, Sheng Zhao, Qinying Liao, Yan Xia, and Nalin Mujumdar


Accurate pronunciation is an important part of language learning. The Speech service on Azure supports Pronunciation Assessment to further empower language learners and educators. Pronunciation Assessment is generally available in American English, British English, Australian English, Chinese, French, German, Japanese and Spanish, with other languages available in preview.


The Pronunciation Assessment capability evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of their speech, which benefits users in several ways.


Comprehensive evaluation near human experts


Pronunciation Assessment, a feature of Speech in Azure Cognitive Services, provides subjective and objective feedback to language learners in computer-assisted language learning. For language learners, practicing pronunciation and getting timely feedback are essential for improving language skills. Assessment is conventionally done by experienced teachers, which takes considerable time and effort and makes high-quality assessment expensive for learners. Pronunciation Assessment, a novel AI-driven speech capability, makes language assessment more engaging and accessible to learners of all backgrounds.


Pronunciation Assessment provides assessment results at different granularities, from individual phonemes to the entire text input. At the phoneme level, Pronunciation Assessment provides an accuracy score for each phoneme, helping learners better understand the pronunciation details of their speech. At the word level, Pronunciation Assessment can automatically detect miscues and provide an accuracy score at the same time, giving more detailed information on omissions, repetitions, insertions, and mispronunciations in the given speech. At the full-text level, Pronunciation Assessment offers additional Fluency and Completeness scores: Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words, and Completeness indicates the proportion of words in the reference text input that are pronounced in the speech. An overall score aggregated from Accuracy, Fluency and Completeness is then given to indicate the overall pronunciation quality of the given speech. With these features, learners can easily identify the weaknesses in their speech and improve with targeted goals.
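To make the word-level miscue labels and the Completeness score more concrete, here is a minimal sketch that aligns the reference text with the recognized words. It is an illustration only, not the method the service uses internally: the function names `detect_miscues` and `completeness_score` are hypothetical, the alignment relies on Python's standard `difflib`, and mispronunciation is left out because it depends on acoustic scoring rather than text comparison. A repeated word would simply show up as an insertion here.

```python
from difflib import SequenceMatcher


def detect_miscues(reference_text, recognized_text):
    """Align reference and recognized words; label each word as
    "None" (read correctly), "Omission" (skipped) or "Insertion" (extra).
    Hypothetical helper: a text-only approximation of miscue detection."""
    ref = reference_text.lower().split()
    hyp = recognized_text.lower().split()
    labels = []
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            labels += [(w, "None") for w in ref[i1:i2]]
        if tag in ("delete", "replace"):
            # Reference words with no counterpart in the speech.
            labels += [(w, "Omission") for w in ref[i1:i2]]
        if tag in ("insert", "replace"):
            # Spoken words with no counterpart in the reference.
            labels += [(w, "Insertion") for w in hyp[j1:j2]]
    return labels


def completeness_score(reference_text, labels):
    """Percentage of reference words that were actually spoken (assumed formula)."""
    ref_count = len(reference_text.split())
    spoken = sum(1 for _, label in labels if label == "None")
    return 100.0 * spoken / ref_count if ref_count else 0.0


labels = detect_miscues("today was a beautiful day",
                        "today was beautiful sunny day")
for word, label in labels:
    print(f"{word:10s} {label}")
print("Completeness:", completeness_score("today was a beautiful day", labels))
```

In this example the learner skips "a" and adds "sunny", so four of the five reference words are matched.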


With Pronunciation Assessment, language learners can practice, get instant feedback, and improve their pronunciation. Online learning solution providers or educators can use the capability to evaluate the pronunciation of multiple speakers in real time.
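As a rough illustration of how an app could turn phoneme-level scores into instant feedback, the sketch below walks a result and lists the phonemes that scored lowest. The result shape (`NBest`, `Words`, `Phonemes`, and a nested `PronunciationAssessment` object with `AccuracyScore`) is an assumption modeled on the JSON the Speech service returns and should be checked against a real response. The sample data, the phoneme labels, the helper name `weakest_phonemes` and the threshold of 60 are all made up for this example.

```python
# Hand-written sample; field names are assumed, scores are invented.
SAMPLE_RESULT = {
    "NBest": [{
        "Words": [
            {"Word": "good", "Phonemes": [
                {"Phoneme": "g", "PronunciationAssessment": {"AccuracyScore": 95.0}},
                {"Phoneme": "uh", "PronunciationAssessment": {"AccuracyScore": 88.0}},
                {"Phoneme": "d", "PronunciationAssessment": {"AccuracyScore": 91.0}},
            ]},
            {"Word": "morning", "Phonemes": [
                {"Phoneme": "m", "PronunciationAssessment": {"AccuracyScore": 90.0}},
                {"Phoneme": "ao", "PronunciationAssessment": {"AccuracyScore": 55.0}},
                {"Phoneme": "ng", "PronunciationAssessment": {"AccuracyScore": 42.0}},
            ]},
        ],
    }],
}


def weakest_phonemes(result, threshold=60.0):
    """Return (word, phoneme, score) triples below the threshold, worst first."""
    weak = []
    for word in result["NBest"][0]["Words"]:
        for phoneme in word.get("Phonemes", []):
            score = phoneme["PronunciationAssessment"]["AccuracyScore"]
            if score < threshold:
                weak.append((word["Word"], phoneme["Phoneme"], score))
    return sorted(weak, key=lambda item: item[2])


for word, phoneme, score in weakest_phonemes(SAMPLE_RESULT):
    print(f"Practice '{phoneme}' in '{word}' (accuracy {score})")
```

Sorting worst-first lets a learning app surface one or two sounds to practice rather than a full score sheet.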

Pearson uses Pronunciation Assessment in Longman English Plus to help both students and teachers improve productivity in language learning, with a personalized placement test feature and learning material recommendations for students at different levels. As the world’s leading learning company, Pearson enables tens of millions of learners every year to maximize their success. Key technologies from Microsoft used in Longman English Plus are Pronunciation Assessment, neural text-to-speech and natural language processing. Check the video below for a demo of the Longman English learning app, and learn more from the customer story.



“We enriched Longman with Cognitive Services to help reduce the workload of the primary school teachers while providing a companion for students to optimize their daily interactions in English.”

Joe Lam, Managing Director, Greater China and Southeast Asia, English Language Learning Division, Pearson


BYJU'S chose Speech service on Azure to build the English Language App (ELA) for its target geographies, where English is used as a second language and is considered an essential skill to acquire. The app blends the best of pedagogy with state-of-the-art speech technology to help children gain command of the language with ease in a judgement-free learning environment. With a conversation-first interface, the app enables students to learn and practice English while working on their language skills in a fun, engaging and effective manner. BYJU'S uses the Speech to Text and Pronunciation Assessment capabilities so that children can practice speaking and receive feedback on pronunciation, with phoneme-, word- and sentence-level pronunciation and fluency scores. BYJU'S ELA assesses students' pronunciation through speaking games, identifies areas for improvement, and provides personalized and adaptive lessons to help students improve in their weak areas.



Mispronunciation detection and diagnosis


Mispronunciation Detection and Diagnosis (MDD) is the core technique employed in Pronunciation Assessment. It scores word-level pronunciation accuracy, which provides judgement on miscues and contributes to the overall assessment. To provide precise and consistent results, Pronunciation Assessment models speech with the latest neural networks, using a hierarchical architecture that exploits information from the lower senone granularity up to the higher word granularity. This design enables Pronunciation Assessment to fully exploit detailed pronunciation information from small patterns, making mispronunciation detection more accurate and robust. With 100,000+ hours of training data covering different accents, regions and ages, Pronunciation Assessment can also handle a wide range of scenarios and users, for example, from kids to adults and from non-native speakers to native speakers, and provide trustworthy and consistent assessment performance.


Teams Reading Progress uses Pronunciation Assessment to help students improve reading fluency, after the pandemic negatively affected students’ reading ability. It can be used inside and outside of the classroom to save teachers' time and improve learning outcomes for students. Learn how to get started.




“Reading Progress is built on the solid scientific foundation of oral repeated reading and close monitoring by the educator. It allows educators to provide personal attention to each student while at the same time dealing with a whole classroom full of students.”

Tim Rasinski, Professor of Literacy Education at Kent State University


Cutting-edge free-style speech assessment


Pronunciation Assessment also supports spontaneous speech scenarios. Spontaneous speech, also known as free-style talk, is the scenario where speakers talk without any predefined reference text, as in presentations and spoken language examinations. Powered by Azure Speech-to-Text, Pronunciation Assessment can automatically and accurately transcribe the given speech and provide assessment results at the granularities described above.


Pronunciation Assessment is used in PowerPoint Presenter Coach to advise presenters on the correct pronunciation of spoken words throughout their rehearsal. When Presenter Coach perceives that you may have mispronounced a word, it will display the word(s) and provide an experience that helps you practice pronouncing the word correctly. You’ll be able to listen to a recorded pronunciation guide of the word as many times as you’d like. 




Get started


To learn more and get started, you can first try out Pronunciation Assessment to evaluate a user’s fluency and pronunciation with the no-code tool provided in Speech Studio, which lets you explore the Speech service through an intuitive user interface. You need an Azure account and a Speech service resource before you can use Speech Studio. If you don't have an account and subscription, try the Speech service for free.
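Once you move from Speech Studio to code, the assessment is configured with a small set of parameters. The sketch below builds that configuration for the Speech-to-Text REST API for short audio, which, to the best of our knowledge, expects the parameters as base64-encoded JSON in a `Pronunciation-Assessment` request header. The parameter names (`ReferenceText`, `GradingSystem`, `Granularity`, `Dimension`, `EnableMiscue`) follow the service documentation as we understand it and should be verified against the current reference. The helper name `build_assessment_header` is hypothetical, and no request is actually sent here.

```python
import base64
import json


def build_assessment_header(reference_text, granularity="Phoneme", enable_miscue=True):
    """Return the request header carrying the assessment configuration.
    Hypothetical helper; parameter names are assumed from the REST docs."""
    params = {
        "ReferenceText": reference_text,   # the text the learner reads aloud
        "GradingSystem": "HundredMark",    # scores on a 0-100 scale
        "Granularity": granularity,        # "Phoneme", "Word" or "FullText"
        "Dimension": "Comprehensive",      # accuracy, fluency and completeness
        "EnableMiscue": enable_miscue,     # flag omissions and insertions
    }
    payload = json.dumps(params).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    return {"Pronunciation-Assessment": encoded}


header = build_assessment_header("Good morning.")
# Decode again to show what the service would receive.
print(json.loads(base64.b64decode(header["Pronunciation-Assessment"])))
```

The header would be sent alongside the usual subscription key and audio content-type headers when posting the recorded audio to your Speech resource.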


Here are more resources to help you add speech to your educational applications: 



Last update: Nov 15 2023 05:22 PM