This post was co-authored by Yinhe Wei, Ke Wang, Lei He, Sheng Zhao, Qinying Liao, Yan Xia, and Nalin Mujumdar
An important element of language learning is being able to pronounce words accurately. Speech service on Azure supports Pronunciation Assessment to further empower language learners and educators. Pronunciation Assessment is generally available in American English, British English, Australian English, Chinese, French, German, Japanese, and Spanish, with other languages available in preview.
The Pronunciation Assessment capability evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of their speech, which benefits users in several ways.
Comprehensive evaluation near human experts
Pronunciation Assessment, a feature of Speech in Azure Cognitive Services, provides subjective and objective feedback to language learners in computer-assisted language learning. For language learners, practicing pronunciation and getting timely feedback are essential for improving language skills. Assessment is conventionally carried out by experienced teachers, which takes a lot of time and effort and makes high-quality assessment expensive for learners. Pronunciation Assessment, a novel AI-driven speech capability, makes language assessment more engaging and accessible to learners of all backgrounds.
Pronunciation Assessment provides assessment results at several granularities, from individual phonemes to the entire text input. At the phoneme level, it provides an accuracy score for each phoneme, helping learners better understand the pronunciation details of their speech. At the word level, it automatically detects miscues and provides an accuracy score at the same time, giving more detailed information on omissions, repetitions, insertions, and mispronunciations in the given speech. At the full-text level, it offers additional Fluency and Completeness scores: Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words, and Completeness indicates how many of the words in the reference text input are pronounced in the speech. An overall score aggregated from Accuracy, Fluency, and Completeness then indicates the overall pronunciation quality of the given speech. With these features, learners can easily identify the weaknesses in their speech and improve with targeted goals.
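To make the word-level miscue and Completeness ideas above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the service's actual algorithm: it aligns the reference words with the recognized words using the standard library's `difflib`, tags only omissions and insertions (repetition and mispronunciation need acoustic information that a plain text comparison does not have), and computes Completeness as the percentage of reference words that were matched. The function names `tag_miscues` and `completeness` and the label strings are hypothetical.

```python
from difflib import SequenceMatcher


def tag_miscues(reference: str, recognized: str) -> list[tuple[str, str]]:
    """Align reference and recognized words; tag each word with a miscue label.

    Illustrative only: labels are "None", "Omission", or "Insertion".
    """
    ref = reference.lower().split()
    hyp = recognized.lower().split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    tagged: list[tuple[str, str]] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Word was read as written in the reference text.
            tagged += [(w, "None") for w in ref[i1:i2]]
        else:
            # "delete", "insert", and "replace" all reduce to:
            # reference words that are missing, plus extra spoken words.
            tagged += [(w, "Omission") for w in ref[i1:i2]]
            tagged += [(w, "Insertion") for w in hyp[j1:j2]]
    return tagged


def completeness(reference: str, recognized: str) -> float:
    """Percentage of reference words that appear, in order, in the speech."""
    ref_count = len(reference.split())
    if ref_count == 0:
        return 0.0
    matched = sum(1 for _, label in tag_miscues(reference, recognized)
                  if label == "None")
    return 100.0 * matched / ref_count


if __name__ == "__main__":
    reference_text = "the quick brown fox jumps"
    recognized_text = "the quick fox really jumps"
    for word, label in tag_miscues(reference_text, recognized_text):
        print(f"{word:8s} {label}")
    print("Completeness:", completeness(reference_text, recognized_text))
```

In this example, "brown" is tagged as an omission and "really" as an insertion, so four of the five reference words are matched. A real assessment works on audio rather than on text alone, so treat this only as a way to reason about what the word-level results mean.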
With Pronunciation Assessment, language learners can practice, get instant feedback, and improve their pronunciation. Online learning solution providers and educators can use the capability to evaluate the pronunciation of multiple speakers in real time.
“We enriched Longman with Cognitive Services to help reduce the workload of the primary school teachers while providing a companion for students to optimize their daily interactions in English.”
— Joe Lam, Managing Director, Greater China and Southeast Asia, English Language Learning Division, Pearson
BYJU'S chose Speech service on Azure to build the English Language App (ELA) for its target geographies, where English is used as a secondary language and is considered an essential skill to acquire. The app blends the best of pedagogy with state-of-the-art speech technology to help children gain command of the language with ease in a judgement-free learning environment. With a conversation-first interface, the app enables students to learn and practice English while working on their language skills in a fun, engaging, and effective manner. BYJU'S uses the Speech to Text and Pronunciation Assessment capabilities to help children master English with ease: they practice speaking and receive feedback on pronunciation through phoneme-, word-, and sentence-level pronunciation and fluency scores. BYJU'S ELA assesses students' pronunciation through speaking games, identifies areas for improvement, and provides personalized and adaptive lessons to help students improve in their weak areas.
Mispronunciation detection and diagnosis
Mispronunciation Detection and Diagnosis (MDD) is the core technique employed in Pronunciation Assessment. It scores word-level pronunciation accuracy, which provides judgement on miscues and contributes to the overall assessment. To provide precise and consistent results, Pronunciation Assessment uses the latest neural networks for modelling, with a hierarchical architecture that exploits information from the lower senone granularity up to the higher word granularity. This design enables Pronunciation Assessment to fully exploit detailed pronunciation information from small patterns, making mispronunciation detection more accurate and robust. Trained on more than 100,000 hours of data covering different accents, regions, and ages, Pronunciation Assessment can also handle different scenarios with various users, for example from kids to adults and from non-native speakers to native speakers, and provide trustworthy and consistent assessment performance.
Teams Reading Progress uses Pronunciation Assessment to help students improve reading fluency, after the pandemic negatively affected students' reading ability. It can be used inside and outside the classroom to save teachers' time and improve learning outcomes for students. Learn how to get started.
“Reading Progress is built on the solid scientific foundation of oral repeated reading and close monitoring by the educator. It allows educators to provide personal attention to each student while at the same time dealing with a whole classroom full of students.”
— Tim Rasinski, Professor of Literacy Education at Kent State University
Cutting-edge free-style speech assessment
Pronunciation Assessment also supports spontaneous speech scenarios. Spontaneous speech, also known as free-style talk, is the scenario where speakers talk without any predefined reference text, as in presentations and spoken language examinations. Powered by Azure Speech-to-Text, Pronunciation Assessment can automatically and accurately transcribe the given speech and provide assessment results at the aforementioned granularities.
Pronunciation Assessment is used in PowerPoint Presenter Coach to advise presenters on the correct pronunciation of spoken words throughout their rehearsal. When Presenter Coach perceives that you may have mispronounced a word, it will display the word(s) and provide an experience that helps you practice pronouncing the word correctly. You’ll be able to listen to a recorded pronunciation guide of the word as many times as you’d like.
To learn more and get started, you can first try out Pronunciation Assessment to evaluate a user’s fluency and pronunciation with the no-code tool provided in Speech Studio, which allows you to explore the Speech service through an intuitive user interface. You need an Azure account and a Speech service resource before you can use Speech Studio. If you don't have an account and subscription, try the Speech service for free.
Here are more resources to help you add speech to your educational applications: