Forum Discussion

voonsionglum
Brass Contributor
Jul 20, 2022

Having an audio conversation with your Teams Bot

Hi,

 

We have recently started exploring ways to let users call a Teams bot and have a meaningful conversation with it. The overall flow would be:

1. User places an audio call to the Teams bot

2. Teams bot answers the call using MS Graph call:answer

3. Once the call has been established, the Teams bot uses the Microsoft Azure Cognitive Services Speech SDK to generate a WAV file and then posts it to MS Graph call:recordResponse. This makes the Teams bot speak a greeting and then wait for the user's speech.

4. With recording started, the user's speech will be saved locally.  The bot will use the Speech SDK to recognize the speech.  

5. The recognized text is then sent to our LUIS App so that an intent can be identified.

6. The bot handles the recognized intent, creates a text response, then uses the Speech SDK to generate a WAV file. The WAV file is then posted to MS Graph call:recordResponse so that the response is spoken to the user, after which the bot goes back into recording mode to capture the user's next speech.
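For reference, the request body posted to call:recordResponse in steps 3 and 6 might look roughly like the sketch below. It is only an illustration under assumptions: the helper name buildRecordResponseBody, the timeout values, and the example URL are hypothetical, and the field names follow the Graph call: recordResponse documentation as we understand it, so they should be checked against the current reference.

```typescript
// Hypothetical types for the recordResponse request body (a sketch, not an official SDK type).
interface MediaPrompt {
  "@odata.type": "#microsoft.graph.mediaPrompt";
  mediaInfo: {
    "@odata.type": "#microsoft.graph.mediaInfo";
    uri: string;        // public URL of the WAV file generated by the Speech SDK
    resourceId: string; // unique id for this media resource, supplied by the caller
  };
}

interface RecordResponseBody {
  bargeInAllowed: boolean;
  clientContext: string;
  prompts: MediaPrompt[];
  maxRecordDurationInSeconds: number;
  initialSilenceTimeoutInSeconds: number;
  maxSilenceTimeoutInSeconds: number;
  playBeep: boolean;
}

// Builds the JSON body for POST /communications/calls/{id}/recordResponse.
// All numeric values are assumed defaults chosen for illustration only.
function buildRecordResponseBody(
  wavUrl: string,
  resourceId: string,
  clientContext: string,
): RecordResponseBody {
  return {
    bargeInAllowed: true, // let the user start talking over the prompt
    clientContext,
    prompts: [
      {
        "@odata.type": "#microsoft.graph.mediaPrompt",
        mediaInfo: {
          "@odata.type": "#microsoft.graph.mediaInfo",
          uri: wavUrl,
          resourceId,
        },
      },
    ],
    maxRecordDurationInSeconds: 15,
    initialSilenceTimeoutInSeconds: 5,
    maxSilenceTimeoutInSeconds: 2, // silence after speech that ends the recording
    playBeep: false,
  };
}
```

The silence timeout is the part of this payload that most directly affects how long the bot waits before it even starts recognizing the speech, so that value is probably worth experimenting with.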

 

With the above approach, there is a considerable pause between the end of the user's speech and the bot's audio response. This is probably due to the number of API calls the bot has to make to recognize the user's speech and play an appropriate audio response. Is there a better way to handle audio calls to a bot with Teams? Is there a better way to use MS Graph with the Speech SDK? We looked at the Speech SDK's AudioConfig.fromDefaultMicrophoneInput(), but it does not seem to work in a Teams channel.
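To check whether the pause really comes from the number of API calls, each stage of a turn could be timed separately. The sketch below is one possible way to do that. The names timeStage, slowestStage and totalMs are hypothetical helpers, not part of the Speech SDK or MS Graph, and the stage labels are only examples.

```typescript
// One measured stage of a conversation turn.
interface StageTiming {
  stage: string;
  ms: number;
}

// Runs an async stage, records how long it took, and passes its result through.
// The timing is recorded even if the stage throws.
async function timeStage<T>(
  stage: string,
  timings: StageTiming[],
  work: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await work();
  } finally {
    timings.push({ stage, ms: Date.now() - start });
  }
}

// Returns the stage with the largest duration, or undefined for an empty list.
function slowestStage(timings: StageTiming[]): StageTiming | undefined {
  return timings.reduce<StageTiming | undefined>(
    (worst, t) => (worst === undefined || t.ms > worst.ms ? t : worst),
    undefined,
  );
}

// Sums the durations of all recorded stages.
function totalMs(timings: StageTiming[]): number {
  return timings.reduce((sum, t) => sum + t.ms, 0);
}
```

Wrapping the recognition, LUIS, speech synthesis and recordResponse calls in timeStage, then logging slowestStage and totalMs after every turn, should show whether the delay sits in our own API calls or in the recording silence timeout on the Teams side.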

 

Using the WindowsVoiceAssistantClient, we connect it to our speech resource and we find the conversation exchange between the user and the bot a lot more responsive.  Is it possible to implement this conversation experience in Teams?

 

Thank You
