Take up the challenge - Build a speech translator app using Azure

VidushiGupta · Apr 04 2024

Introduction

Join Charles and me on this exciting adventure as we dive into creating a speech translator app with Azure! This journey is far more than a simple coding challenge; it's about shattering language barriers and knitting the world a bit closer, one translated sentence at a time. Follow along with this Challenge project and/or our Learn Live session, and build along with us!

What are we building? What are the prerequisites?

Imagine being able to speak in your native language and have your words instantly translated and spoken in another language. That's exactly what we're creating—a speech translator app that seamlessly converts spoken words from one language to another. This app will listen to your voice, transcribe your words into text, translate that text into a language of your choice, and then speak the translation back to you. It's like having a personal interpreter in your pocket, ready to help you communicate with anyone, anywhere.

Our app will leverage Azure's powerful services, including the Speech-to-Text, Translator, and Text-to-Speech APIs, to handle the heavy lifting of transcription, translation, and vocalization. By the end of this guide, you'll have a fully functional app that can break down language barriers and bring people closer together.

Prerequisites:

An Azure account: If you don't have one already, sign up for a free account at Azure's website. This will give you access to all the services we'll be using.
Power Apps and Power Automate: Our app will utilize Power Apps for the user interface and Power Automate for handling the workflow between Azure services. A basic understanding of these tools will be beneficial, but not required. We'll guide you through every step.
Curiosity and creativity: Most importantly, bring your sense of adventure and imagination!

Why is this a challenging project?

Authentication and Security Management
Ensuring secure and effective management of authentication keys was crucial, especially for accessing Azure Blob Storage and the Text-to-Speech connector. Balancing security with accessibility required meticulous planning.
Handling Diverse Connectors
Our project depended on integrating multiple Azure services, each requiring a different language or markup, such as SSML for the Speech API. Coordinating these connectors demanded a deep understanding of each service's specifications.
Collaborative Dynamics
With a team of four distributed across the globe, coordinating our efforts, sharing resources, and aligning our ideas was both a challenge and a learning opportunity. Overcoming the hurdles of remote collaboration was key to our project's success.

Where can you look for help?
Navigating the complexities of our speech translator app required extensive research and support. Here are the invaluable resources that guided us through the process:

Speech Studio on Microsoft.com
For mastering SSML (Speech Synthesis Markup Language) and refining our text-to-speech output, Speech Studio was our go-to resource. It provided the necessary tools and documentation to enhance the app's vocal responses. Speech Studio (microsoft.com)
Power Automate Connectors on Microsoft Learn
With a plethora of connectors to manage, the comprehensive list of Power Automate connectors on Microsoft Learn was indispensable. It helped us understand and implement the connectors essential for our app's functionality. List of all Power Automate connectors | Microsoft Learn
Azure Batch Speech-to-Text on Microsoft Power Automate
The Azure Batch Speech-to-Text templates available on Microsoft Power Automate significantly accelerated our development process. These templates served as a foundational resource, streamlining the transcription component of our app. Azure Batch Speech-to-text | Microsoft Power Automate

How did we build this?

General workflow

Creating this speech translator app involves breaking down the process into three intuitive steps. Here’s how we do it:

Speech-to-Text (Transcription): First, we take the audio we've recorded and convert it into written text. This means whatever we say into the microphone gets turned into words on a screen.
Translation: Next, we take that written text and translate it into the language you want. It’s like having a bilingual friend who can quickly tell us how to say something in another language.
Text-to-Speech: Finally, we transform the translated text back into audio. So, instead of reading the translation, we’ll hear it spoken in the language of our choice.

In essence, you speak in one language, and the app delivers your message in another language, all through audio. It’s a simple journey from speaking to listening, with a bit of tech magic in between!
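The three stages form a simple pipeline in which each stage's output feeds the next. The Python sketch below illustrates only that architecture. The stage functions are hypothetical stubs standing in for the Azure Batch Speech-to-Text, Translator, and Text-to-Speech calls, and the sketch is not how our Power Automate flows are built.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Hypothetical pipeline: each stage is injected so real Azure calls could be swapped in."""
    transcribe: Callable[[bytes], str]       # Speech-to-Text: audio -> source-language text
    translate: Callable[[str, str], str]     # Translator: (text, target code) -> translated text
    synthesize: Callable[[str, str], bytes]  # Text-to-Speech: (text, target code) -> audio

    def run(self, audio: bytes, target_language: str) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text, target_language)
        return self.synthesize(translated, target_language)


# Stub stages for illustration only; none of them calls an Azure service.
pipeline = SpeechTranslator(
    transcribe=lambda audio: audio.decode("utf-8"),
    translate=lambda text, lang: f"[{lang}] {text}",
    synthesize=lambda text, lang: text.encode("utf-8"),
)
print(pipeline.run(b"Hello world", "fr"))  # b'[fr] Hello world'
```

Keeping the stages independent mirrors the app's design: each part below can be built and tested on its own before the three are chained together.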

(P.S. After reading this blog, you can do the tech magic yourself!)

Part 1: Get transcribed text from recorded audio

On the PowerApps canvas, add a microphone, a button, and a text box.

Navigate to the Data section, select 'Add Data', and establish connections to both Azure Blob Storage and Azure Batch Speech-to-Text.

Create a new global variable named newvtext and assign it the output of a Power Automate flow run. Here, textsendback is the name of the flow output that carries the transcribed text back to the app:

Set(newvtext, SwitchCaseAdded.Run().textsendback)

Place this formula in the button's OnSelect property. SwitchCaseAdded is the name of our flow; replace it with the name of your own flow.

Set the text box to display newvtext, the variable that holds the transcription.
On the Power Automate side, the flow starts with the PowerApps (V2) trigger, which links the Power App directly to this flow.

In the flow, initialize a variable named varTextFromAudio to store the transcribed text.

Next, the flow connects to Azure Blob Storage and generates a SAS token and path for the recorded audio file.

Now, use the Blob path obtained in the previous step to start the transcription with the Create transcription action.
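To test this step outside Power Automate (for example, in Postman), it can help to see the kind of REST request that starts a batch transcription. The sketch below only builds the request and never sends it. It assumes the Speech-to-Text REST API v3.1 transcriptions endpoint and its contentUrls, locale, and displayName body fields; the region, key, SAS URL, and display name are placeholders.

```python
import json


def build_create_transcription_request(region: str, key: str, sas_url: str,
                                        locale: str = "en-US") -> dict:
    """Build (but do not send) a batch transcription request.

    Assumes the Speech-to-Text REST API v3.1 'transcriptions' endpoint;
    check which API version your Speech resource and connector use.
    """
    return {
        "method": "POST",
        "url": f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        # The service must be able to read the audio, which is why a SAS URL is passed.
        "body": json.dumps({
            "contentUrls": [sas_url],
            "locale": locale,
            "displayName": "Speech translator recording",  # placeholder name
        }),
    }


request = build_create_transcription_request(
    "westus", "<your-speech-key>",
    "https://<account>.blob.core.windows.net/audio/recording.wav?<sas-token>",
)
print(request["url"])
```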

The 'Apply to each' step needs the transcription's ID to locate the transcription text. This expression extracts that ID, which is the last segment of the URL in the self field of the Create transcription output:

substring(outputs('Create_transcription')?['body/self'], add(lastIndexOf(outputs('Create_transcription')?['body/self'], '/'), 1))

You'll need the same expression a second time, in the step that enumerates all the files generated by the transcription:

substring(outputs('Create_transcription')?['body/self'], add(lastIndexOf(outputs('Create_transcription')?['body/self'], '/'), 1))
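In plain terms, the expression keeps everything after the last / in the self URL. The following Python equivalent may help when you check the values that appear in a flow run. The example URL is made up; real values come from the Create transcription output.

```python
def transcription_id_from_self(self_url: str) -> str:
    """Mirror substring(url, add(lastIndexOf(url, '/'), 1)): return the text after the last '/'."""
    # rfind returns -1 when there is no '/', so the whole string is returned,
    # matching lastIndexOf(...) + 1 == 0 in the Power Automate expression.
    return self_url[self_url.rfind("/") + 1:]


# Hypothetical 'self' value for illustration.
example = "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/0a1b2c3d"
print(transcription_id_from_self(example))  # 0a1b2c3d
```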

Much of this step was adapted from the Azure Batch Speech-to-Text template. Essentially, it iterates through all the output files, and when it encounters an item whose "kind" equals "transcription", it captures that file's text and stores it in the varTextFromAudio variable.
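The same loop can be sketched in Python. The sketch assumes the JSON shapes of the batch transcription REST API: a files listing whose values entries carry kind and links.contentUrl, and a result file containing combinedRecognizedPhrases with display text. The sample data is invented, and fetch_json is a hypothetical stand-in for the HTTP call that downloads each file. The kind comparison ignores case so that it also matches if the service capitalizes the value.

```python
from typing import Callable, List


def transcription_urls(files_response: dict) -> List[str]:
    """Return content URLs of files whose kind is 'transcription' (case-insensitive)."""
    return [
        f["links"]["contentUrl"]
        for f in files_response.get("values", [])
        if f.get("kind", "").lower() == "transcription"
    ]


def text_from_result(result: dict) -> str:
    """Join the display text of every channel in one transcription result file."""
    phrases = result.get("combinedRecognizedPhrases", [])
    return " ".join(p.get("display", "") for p in phrases).strip()


def transcribed_text(files_response: dict, fetch_json: Callable[[str], dict]) -> str:
    # fetch_json is a placeholder for an HTTP GET on each content URL.
    return " ".join(text_from_result(fetch_json(url)) for url in transcription_urls(files_response))


# Invented sample data in the assumed shape.
files = {
    "values": [
        {"kind": "TranscriptionReport", "links": {"contentUrl": "https://example.com/report.json"}},
        {"kind": "Transcription", "links": {"contentUrl": "https://example.com/result.json"}},
    ]
}
fake_results = {
    "https://example.com/result.json": {"combinedRecognizedPhrases": [{"display": "Hello, how are you?"}]}
}
print(transcribed_text(files, fake_results.__getitem__))  # Hello, how are you?
```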

Some recommendations:

Test Frequently: Keep a tab open on make.powerautomate.com dedicated to your flow. Run tests regularly, especially after adding or changing a block, to verify that block and review its output.
Monitor Variables: Keep another tab open on make.powerapps.com and pay close attention to your variables as you run each control on your canvas.
Use Postman for API Testing: Keep Postman running and test each of the API calls behind your connectors. Using ChatGPT to convert the code samples in the Microsoft Learn docs into Postman requests can be both fun and satisfying, and it's rewarding to hit Send and get an actual response!

Here's the overview of the complete Power Automate flow setup to get the first part working:

Part 2: Translate transcribed text

Let's simplify the process of translating transcribed text into another language, making it user-friendly and intuitive. Here’s a step-by-step guide to setting up your app for language translation:

Choose Your Language: To allow users to select their desired translation language, add a dropdown widget to your PowerApp canvas. Populate the dropdown with a list of languages you want to offer, like French, German, and Hindi. Feel free to customize this list to your preference!

Capture the Language Selection: When a user picks a language from the dropdown, you need to map that choice to a language code that your translation service understands. For this, use the following code in the dropdown's “OnChange” property:

Set(SelectedLanguage, Switch(Dropdown1.Selected.Value, "French", "fr", "German", "de", "Hindi", "hi"))

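This code assigns the language code that matches the user's selection to the SelectedLanguage variable. Because OnChange runs only when the selection changes, SelectedLanguage stays blank if the user keeps the dropdown's default choice; running the same formula in the screen's OnVisible property avoids that.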
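The formula is a lookup from display name to language code. Outside Power Apps, the same idea is usually a dictionary. The Python sketch below uses the three codes from the formula (fr, de, hi) and returns None for names it does not know rather than guessing; the names LANGUAGE_CODES and language_code are illustrative.

```python
from typing import Optional

# Display names offered in the dropdown, mapped to Translator language codes.
LANGUAGE_CODES = {"French": "fr", "German": "de", "Hindi": "hi"}


def language_code(display_name: str) -> Optional[str]:
    """Return the language code for a dropdown choice, or None if it is not supported."""
    return LANGUAGE_CODES.get(display_name)


print(language_code("German"))   # de
print(language_code("Klingon"))  # None
```

With a table like this, offering another language means adding one entry instead of extending a chain of conditions.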

Set Up the Backend Flow: Behind the scenes, you'll create a workflow in Power Automate that takes the user's language choice and the text to be translated, processes it, and then returns the translated text. Here’s how to set it up:

Capture Inputs: The flow starts by receiving the selected language and the text from the app.
Initialize a Variable: Set up a variable to store the translated text.
Translate the Text: Use a "Translate Text" action to convert the text into the selected language, looping through the results to extract the translated text.
Return the Output: Finally, send the translated text back to the app.
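If you test the translation call directly, as suggested in the recommendations above, the Translator Text API v3.0 translate operation takes a JSON array of {"Text": ...} objects and returns one result per input, each with a translations list. That list is why the flow loops through the results. The Python sketch below builds the request without sending it and extracts the text from an invented sample response. The key and region are placeholders, and the function names are illustrative.

```python
import json


def build_translate_request(key: str, region: str, text: str, to_lang: str) -> dict:
    """Build (but do not send) a Translator v3.0 'translate' request."""
    return {
        "method": "POST",
        "url": "https://api.cognitive.microsofttranslator.com/translate",
        "params": {"api-version": "3.0", "to": to_lang},
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Ocp-Apim-Subscription-Region": region,  # needed for regional or multi-service resources
            "Content-Type": "application/json",
        },
        "body": json.dumps([{"Text": text}]),
    }


def translated_text(response: list) -> str:
    """Loop over every result, as the flow does, and collect each translation's text."""
    return " ".join(t["text"] for item in response for t in item.get("translations", []))


# Invented sample response in the documented shape.
sample = [{"detectedLanguage": {"language": "en", "score": 1.0},
           "translations": [{"text": "Bonjour", "to": "fr"}]}]
print(translated_text(sample))  # Bonjour
```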

Trigger the Translation: Add a button to your app that, when clicked, runs the translation flow with the selected language and the text input as parameters. Use this code snippet to connect the button to the action:

Set(results,TranslateTranscribedText.Run(SelectedLanguage,TextInput1.Text))

Display the Results: To show the translated text to the user, add a text label to your app and set its Text property to the translated-text field of results. The field name matches the output you return in the flow's last step.

Test this part and woohoo, you've just finished two of the three parts of your app!

Part 3: Convert translated text to speech

To complete our speech translator app, we now focus on transforming translated text back into speech, making the translation audible.

1. Build the Flow: Start by creating a new flow in Power Automate. This flow is responsible for converting text into speech. Set up the actions as shown in the screenshot below.

2. Configure Inputs and Outputs: Ensure your flow can receive the translated text from your app and then output audio. The key steps include:

Setting up a trigger that initiates the flow when translated text is available.
Adding actions within the flow to process the text through the text-to-speech service.
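The text-to-speech step is where SSML comes in. To show what the flow sends, the Python below wraps the translated text in minimal SSML and builds a request for the Text-to-Speech REST endpoint without sending it. The voice names, output format, and User-Agent value are example choices, not necessarily the ones in our flow. The region and key are placeholders.

```python
from xml.sax.saxutils import escape, quoteattr

# Example neural voices per target language; any voice listed in Speech Studio could be used.
VOICES = {
    "fr": ("fr-FR", "fr-FR-DeniseNeural"),
    "de": ("de-DE", "de-DE-KatjaNeural"),
    "hi": ("hi-IN", "hi-IN-SwaraNeural"),
}


def build_ssml(text: str, lang_code: str) -> str:
    """Wrap translated text in minimal SSML for the chosen language's voice."""
    locale, voice = VOICES[lang_code]
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang={quoteattr(locale)}>'
        # escape() keeps characters such as & and < from breaking the XML.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_tts_request(region: str, key: str, ssml: str) -> dict:
    """Build (but do not send) a Text-to-Speech REST request that asks for MP3 audio."""
    return {
        "method": "POST",
        "url": f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
            "User-Agent": "speech-translator-demo",
        },
        "body": ssml.encode("utf-8"),
    }


print(build_ssml("Fish & chips", "fr"))
```

Escaping matters because translated text is user-generated: a single unescaped ampersand would make the whole SSML document invalid.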

3. Run the Flow: With the flow built and configured, trigger it from within your app. This step involves:

Calling the flow with the translated text as its input.
Processing the text through the flow to generate speech.

4. Output the Result in Your App: After the flow runs and converts the text to speech, you'll need to handle the output. This involves:

Receiving the audio file or stream generated by the flow.
Playing back the audio in your app so the user can hear the translation.
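One way to hand the audio back to the app, which we assume here rather than prescribe, is for the flow to return the audio as a base64 string and for the app to play it as a data URI. The Python sketch below shows that encoding. The audio/mpeg MIME type assumes MP3 output, and the helper name is hypothetical.

```python
import base64


def audio_data_uri(audio: bytes, mime_type: str = "audio/mpeg") -> str:
    """Encode raw audio bytes as a data URI that a media player can load."""
    return f"data:{mime_type};base64,{base64.b64encode(audio).decode('ascii')}"


print(audio_data_uri(b"ID3"))  # data:audio/mpeg;base64,SUQz
```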

5. Integrate and Test!

What is the impact of an application like this?

A Unified Development Environment
Utilizing Power Apps, Power Automate, and the Speech API together offers a comprehensive low-code solution for developing powerful applications. This trio allows for rapid prototyping and deployment, demonstrating the potential of Microsoft's ecosystem in facilitating app development.
The Challenge of Specialized Implementation
The most intricate aspects of our project revolved around debugging, managing various connectors, and implementing specialized code, such as SSML. These tasks underscored the complexity behind the seamless operation of our app, highlighting the importance of detailed technical planning and execution.
The Role of AI in Bridging Language Gaps
Employing GPT-4 for real-time language translation showcased the extraordinary capabilities of AI in overcoming language barriers. The irony of using computer-generated translations to solve human communication challenges illustrates the evolving relationship between technology and language. This application not only serves as a practical communication tool but also as a testament to the power of AI in fostering global connectivity.
