How to recognize and synthesize speech on Azure - GOL Clinics Recap

Microsoft

Aug 15, 2022

Game of Learners Clinics for Machine Learning (ML) and Artificial Intelligence (AI) is a 5-week skilling initiative that helps students level up in AI on Azure. Find out more about previous sessions at http://aka.ms/golaiml-home

Natural Language Processing (NLP) involves analyzing text documents or phrases to gain insights into the content of the text. It is also the ability of a computer program to understand human language as it is spoken and/or written.

In this blog post we explore two natural language processing capabilities: speech recognition and speech synthesis. At the end you will see a demo of speech to text and text to speech.

Azure Resources for Speech Services:

In Microsoft Azure, provision a Speech resource under Cognitive Services. With it, you can perform several actions, including converting speech to text and text to speech, speech translation, and speaker recognition.
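Once the resource is provisioned, you get a key and a region. As a minimal sketch, and not the only way to do it, the snippet below checks that pair by exchanging the key for a short-lived access token over REST. The function names `build_token_request` and `key_is_valid` are hypothetical helpers made up for this example, and the sketch assumes your resource uses the standard regional endpoint.

```python
import urllib.error
import urllib.request


def build_token_request(key: str, region: str) -> urllib.request.Request:
    """Build (but do not send) the POST that trades a resource key for a token."""
    # Assumption: the resource is reachable on the regional Cognitive Services host.
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    return urllib.request.Request(
        url,
        data=b"",  # the token endpoint expects an empty POST body
        headers={"Ocp-Apim-Subscription-Key": key},
        method="POST",
    )


def key_is_valid(key: str, region: str, timeout: float = 10.0) -> bool:
    """Return True if the service accepts the key for the given region."""
    try:
        with urllib.request.urlopen(build_token_request(key, region), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # A rejected key (for example HTTP 401) is reported as invalid.
        return False
```

Calling `key_is_valid(your_key, your_region)` with the values shown for your resource in the Azure portal should return `True` when both are correct.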

Speech to Text API

As the world becomes a global village and organizations collaborate with people in different geographical regions, removing language barriers has become key. Translation is one solution: text translation converts documents from one language to another, whereas speech translation works between spoken languages. Sometimes, speech translation also involves translating speech into text.

Using the speech to text API, you can perform real-time or batch transcription of audio into text. It has the following characteristics:

Supports both real-time and batch transcription
Accepts audio from a live source, such as a microphone, or from an audio file
Based on the Universal Language Model, trained by Microsoft. The model is optimized for both conversational and dictation scenarios.
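To make transcription concrete, here is a minimal sketch that sends a short WAV clip to the speech to text REST endpoint for short audio, using only the Python standard library. The helper names (`build_stt_request`, `extract_text`, `transcribe`) are hypothetical, and the sketch assumes 16 kHz mono PCM audio and a key and region from your own Speech resource. For longer recordings or live microphone input, the Speech SDK or batch transcription is the better fit.

```python
import json
import urllib.parse
import urllib.request


def build_stt_request(wav_bytes: bytes, key: str, region: str,
                      language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a recognition request for a short audio clip."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (f"https://{region}.stt.speech.microsoft.com"
           f"/speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumption: the clip is 16 kHz PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


def extract_text(response_body: bytes) -> str:
    """Return the recognized text, or an empty string if nothing was recognized."""
    payload = json.loads(response_body)
    if payload.get("RecognitionStatus") == "Success":
        return payload.get("DisplayText", "")
    return ""


def transcribe(wav_path: str, key: str, region: str, language: str = "en-US") -> str:
    """Read a WAV file, send it to the service, and return the transcript."""
    with open(wav_path, "rb") as audio_file:
        audio = audio_file.read()
    with urllib.request.urlopen(build_stt_request(audio, key, region, language)) as resp:
        return extract_text(resp.read())
```

Building the request and parsing the response are kept in separate functions so each can be checked without calling the service.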

Text to Speech API

Text to speech, on the other hand, involves generating spoken audio from text. The Speech service's language support covers translation in over 60 languages. The text to speech API converts text input into audible speech that can be played directly on your computer speaker or written to an audio file. It has the following characteristics:

Used to convert text input to audible speech
Supports multiple languages and regional pronunciation
Supports standard voices as well as neural voices.
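As a rough illustration of writing synthesized speech to an audio file, the sketch below wraps the input text in SSML and posts it to the text to speech REST endpoint. The helper names (`build_ssml`, `build_tts_request`, `synthesize_to_file`) and the `User-Agent` value are hypothetical. The voice `en-US-JennyNeural` and the output format are example choices, so swap in whichever voice and format suit your language and region.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document that selects a voice."""
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"  # escape &, < and > in the text
        "</speak>"
    )


def build_tts_request(text: str, key: str, region: str,
                      voice: str = "en-US-JennyNeural") -> urllib.request.Request:
    """Build (but do not send) a synthesis request that returns WAV audio."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # Example output format: 24 kHz, 16-bit, mono PCM in a WAV (RIFF) container.
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "gol-clinics-sample",  # hypothetical application name
    }
    body = build_ssml(text, voice).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def synthesize_to_file(text: str, out_path: str, key: str, region: str) -> None:
    """Send the text to the service and save the returned audio to out_path."""
    with urllib.request.urlopen(build_tts_request(text, key, region)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as out_file:
        out_file.write(audio)
```

For example, `synthesize_to_file("Hello from Azure", "hello.wav", your_key, your_region)` would save the spoken sentence as `hello.wav`, which any audio player can open.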

Reference and Resources:

Follow along and build your Bot with Azure Bot Service at: https://aka.ms/gol-nlp

Updated Aug 03, 2022

Version 1.0

bethanyjep

Microsoft

Educator Developer Blog