Blog Post

AI - Azure AI services Blog
4 MIN READ

Transcribe audio to text from blob storage without writing any code using Power Automate

ArcherZ
Microsoft
Mar 27, 2023

We are happy to introduce the Power Automate Flow template "Transcribe audio files to text from Azure Blob", which automatically transcribes audio files from Azure Blob Storage to text, then saves the transcribed text back to Blob Storage. By leveraging Azure AI Speech batch transcription, it supports more than 100 languages and dialects with best-in-class transcription accuracy.
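Under the hood, the template submits jobs to the Azure AI Speech batch transcription REST API. As a rough sketch of what the connector sends on your behalf (property names follow the public v3.1 batch transcription API; the SAS URL is a placeholder, not a real value):

```python
import json

def build_transcription_request(audio_blob_sas_url: str, locale: str = "en-US") -> dict:
    """Build the JSON body for a batch transcription job.

    Approximates what the Flow template submits via the Azure Batch
    Speech-to-text connector (v3.1 REST API property names).
    """
    return {
        "displayName": "Power Automate transcription",
        "locale": locale,
        # SAS URL of the uploaded audio blob (placeholder shown below)
        "contentUrls": [audio_blob_sas_url],
        "properties": {
            "punctuationMode": "DictatedAndAutomatic",
            "profanityFilterMode": "Masked",
        },
    }

body = build_transcription_request(
    "https://<account>.blob.core.windows.net/mypowercontainer/my%20audio.wav?<sas-token>"
)
print(json.dumps(body, indent=2))
```

The Flow template builds and submits this request for you; the sketch is only to show what the moving parts are.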

 

We have created a step-by-step tutorial below to assist you in getting started with the Power Automate Flow template.

 

Step-by-step tutorial

 

Prerequisites

 

Step 1: Set up Azure Speech service Key and Region

  • After your Speech resource is deployed, you can go to Azure portal -> Go to resource -> Keys and Endpoint to view and manage keys. The Speech resource key and region will be required later for the Connector setup.

 

For more information about Cognitive Services resources, see Get the keys for your resource.
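For reference, the key and region from this step map onto the REST calls the connector makes roughly like this (a sketch; the regional endpoint format and header name follow the public batch transcription v3.1 API, and the connector handles all of this for you):

```python
def speech_endpoint(region: str) -> str:
    # Regional endpoint used by the batch transcription REST API (v3.1)
    return f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"

def auth_headers(key: str) -> dict:
    # The Speech resource key is sent in this header on every request
    return {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}

print(speech_endpoint("eastus"))
# → https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions
```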

 

Step 2: Set up Azure Storage account and Blob container

  • After your Azure Storage resource is deployed, you can go to Azure portal -> Go to resource -> Access Keys to view and manage keys. The storage account name and key will be required later for the Connector setup.

 

  • You also need to create a new container or use an existing one to store your audio files. The container name will also be required later for the Connector setup. Here we create a container named “mypowercontainer” as an example.
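The account, container, and blob names from this step combine into the blob URLs the flow works with. A minimal sketch (the account name is hypothetical; a SAS token would be appended for private access):

```python
from urllib.parse import quote

def blob_url(account: str, container: str, blob_name: str) -> str:
    # URL of a blob in Azure Storage; spaces and other special
    # characters in the blob name must be percent-encoded
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"

print(blob_url("mystorageaccount", "mypowercontainer", "my audio.wav"))
# → https://mystorageaccount.blob.core.windows.net/mypowercontainer/my%20audio.wav
```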

 

Step 3: Create a Power Automate flow from Template

  • Sign in to the Power Automate portal. From the left side menu, select My flows. Then select Automated cloud flow > Start from a template.

 

 

 

  • Set up the Connection for the Azure Blob Storage Connector. You can select an existing connection or add a new one; here we add a new connection as an example. The Authentication type dropdown offers several options, including Access Key and Azure AD Integrated. Here we use “Access Key” for authentication, and you need to input the storage account name and key from the earlier steps.

 

If you want to use alternative authentication types for the connection, learn more from Azure Blob Storage - Connectors | Microsoft Learn.

 

  • Set up the Connection for the Azure Batch Speech-to-text Connector. You can select an existing connection or add a new one; here we add a new connection as an example. In the Authentication type dropdown, two authentication types are supported: Access Key and Azure AD Integrated. Here we use “Access Key” for authentication, and you need to input the key and region from Step 1.

 

If you want to use Azure AD for the connection, learn more from Authentication in Azure Cognitive Services - Azure Cognitive Services | Microsoft Learn.

 

  • Then you should be able to use the template and edit on top of it. The original template includes the actions “When a blob is added or modified” and “Check audio format and transcribe into text”, along with several conditions and variables.

 

  • In the action of “When a blob is added or modified”, input the container name you created in Step 2.

 

  • In the variable of “Input locale”, input a locale that matches your audio content (here we use en-US as an example). Learn more about Speech service supported languages and locales here.

 

Alternatively, in the action of “Create transcription”, you can enable automatic Language Identification as an advanced option to identify the languages spoken in the audio by comparing against a list of candidate languages. Learn more about Language Identification in the Speech service.

 

You can also specify more advanced settings for the batch transcription service (enabling word-level timestamps, number of audio channels, the profanity filter, etc.) per your needs.
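These options all land in the `properties` object of the transcription request. A hedged sketch of how they fit together (property names follow the batch transcription v3.1 API; the candidate locale list is only an example):

```python
def transcription_properties(candidate_locales=None) -> dict:
    """Assemble advanced batch transcription settings.

    Property names per the Speech batch transcription v3.1 API.
    """
    props = {
        "wordLevelTimestampsEnabled": True,   # per-word timestamps in the JSON output
        "profanityFilterMode": "Masked",      # None | Masked | Removed | Tags
        "channels": [0, 1],                   # audio channels to transcribe
        "diarizationEnabled": True,           # speaker separation for mono audio
    }
    if candidate_locales:
        # Automatic language identification instead of a single fixed locale
        props["languageIdentification"] = {"candidateLocales": candidate_locales}
    return props

print(transcription_properties(["en-US", "de-DE", "es-ES"]))
```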

 

The rest is configured automatically. Now you can save the Flow and start using it once it’s successfully saved.

 

Step 4: Run and test your Flow

  • Now let’s quickly test your automation flow. Go back to the Storage container from the Azure portal and upload an audio file. The Speech service supports audio formats such as .wav, .ogg, and .mp3. Here we upload a “my audio.wav” file as an example.

 

  • Wait a few seconds; if everything goes well, you should see two folders (trans, log) created under the same container. The trans folder contains the recognized plain text file as well as the detailed JSON output.
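If you want to post-process the detailed JSON output yourself, the full display text can be pulled out like this (a sketch assuming the batch transcription v3.1 result shape, which includes a `combinedRecognizedPhrases` array with one entry per audio channel):

```python
import json

def extract_display_text(result_json: str) -> str:
    """Return the combined display text from a batch transcription result file."""
    result = json.loads(result_json)
    # One combined phrase per audio channel; join them in order
    return "\n".join(p["display"] for p in result.get("combinedRecognizedPhrases", []))

sample = json.dumps({
    "combinedRecognizedPhrases": [
        {"channel": 0, "display": "Hello from channel zero."}
    ]
})
print(extract_display_text(sample))
# → Hello from channel zero.
```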

 

Enjoy your automation with Azure Speech service :smile:

 

For more information

Updated Mar 27, 2023
Version 3.0
  • SaraHesham
    Copper Contributor

    I followed this tutorial and it worked perfectly for me, thanks for sharing the template.

  • Tom_Murphy
    Copper Contributor

    I followed this and it does not work. There is no txt file of the transcribed text left in the blob.

     

    Edit: removed and added the template again and it worked this time.

  • MHenn2207
    Copper Contributor

    Thank you so much for providing this detailed tutorial. I intend to utilize it for Batch Transcription with Speaker Diarization. However, despite modifying the diarization setting to "yes," the text output does not display any speakers. Although the JSON file does include speaker information towards the end, the current output does not meet my requirements. Currently, it appears as follows:

     

    [{"displayText":"This","offset":"PT0.56S","duration":"PT0.32S","offsetInTicks":5600000.0,"durationInTicks":3200000.0},{"displayText":"episode","offset":"PT0.88S","duration":"PT0.36S","offsetInTicks":8800000.0,"durationInTicks":3600000.0},{"displayText":"of","offset":"PT1.24S","duration":"PT0.08S","offsetInTicks":12400000.0,"durationInTicks":800000.0},{"displayText":"Yap","offset":"PT1.32S","duration":"PT0.32S","offsetInTicks":13200000.0,"durationInTicks":3200000.0},{"displayText":"is","offset":"PT1.64S","duration":"PT0.24S","offsetInTicks":16400000.0,"durationInTicks":2400000.0},{"displayText":"sponsored","offset":"PT1.88S","duration":"PT0.52S","offsetInTicks":18800000.0,"durationInTicks":5200000.0},{"displayText":"by","offset":"PT2.4S","duration":"PT0.24S","offsetInTicks":24000000.0,"durationInTicks":2400000.0},{"displayText":"Shopify.","offset":"PT2.64S","duration":"PT0.8S","offsetInTicks":26400000.0,"durationInTicks":8000000.0}]}]}

     

    It should be like this: 

    [Speaker 1 00:00] This episode of Yap is sponsored by Shopify. Shopify simplifies selling online and in person so you can focus on successfully growing your business. Sign up for a $1.00 per month trial period at shopify.com/profiting.

     

    Do you have any advice on how to solve this problem? Thanks a lot!

  • zenAlex
    Copper Contributor

    I attempted this today. Unfortunately, I am getting 401 unauthorized errors. I tried with our existing speech service and created a new speech service. Have the requirements changed for this to work?