Forum Discussion

nsouth1625
Copper Contributor
Feb 10, 2021

Can you clarify the flow of messages from a Direct Line Speech client to the bot service?

I'm having trouble pinning down the message flow when a Direct Line Speech client interacts with a bot service. I think it works roughly as follows; can you confirm or clarify? We would like to build an internal voice-powered virtual assistant, but we need to evaluate the network flow for security concerns, since voice commands could contain references to sensitive IP.

  1. Audio captured on the client is posted to Speech Services, a SaaS offering with a common public endpoint per region.
  2. Speech Services recognizes the request's subscription key and sends its results to the Bot Framework Service (represented by the Bot Channels Registration resource in Azure; I think this is sometimes synonymous with "Bot Services", but that is vague to me). This is also a public SaaS offering, not "ours" in the way that a PaaS resource is.
  3. The Bot Framework Service forwards the request to the Bot App. The Bot App is a web service, and it is the first part of the cloud flow that is "ours", in that some infrastructure can be allocated for it (the app service).
  4. During processing, the Bot App sends its message(s) back to the Bot Framework Service, which sends them on to Speech Services, which in turn sends them back to the client.

  • Hi nsouth1625,

    Here is a link to the Direct Line Speech docs, which include a diagram of the flow:
    Direct Line Speech - Speech service - Azure Cognitive Services | Microsoft Docs
    You are correct that audio is sent to our Azure Speech Service and, from there, via the DL Speech channel to your bot. The bot is hosted in your app service. On the way back, the response is sent via the channel, and any text that should be rendered as audio is sent to our Text to Speech service.

    Our services comply with a range of security and privacy certifications. Have a look here:
    Cognitive Services Compliance and Privacy | Microsoft Azure

    See also: Speech service encryption of data at rest - Azure Cognitive Services | Microsoft Docs

      nsouth1625
      Copper Contributor

      HeikoRa, thank you for the reply and for the info. I have a couple of follow-up questions that the diagram does not answer:

      1. When the Speech Service sends info "via the DL Speech channel" to my bot, does the DL Speech channel imply an intermediary endpoint, namely the Bot Framework Service? My understanding is that the Bot Framework Service is a somewhat behind-the-scenes service responsible for passing information between the channel and the bot app. Is that correct?
      2. You said that, on the way back, text that should be rendered as audio is sent to the speech service for synthesis. To clarify: if text does not need to be rendered as audio, does it flow straight back to the client without going to Speech Services?
        HeikoRa
        Microsoft

        nsouth1625 

        1. Yes, the DL Speech channel facilitates the communication between your bot and the app that uses the Speech SDK to send and receive audio and bot messages. It handles any conversion of audio to text, and of text to audio, where needed. It basically just routes the data to the appropriate place (the speech service or the bot).

        2. If there is no text to render as audio, we won't call the text to speech service.
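
To summarize the flow described in this thread, here is a toy Python model of a single turn. It is only a sketch for reasoning about which hops a message takes: it does not call any Azure API, every name in it (`BotReply`, `Turn`, `route_turn`, the `speak` field, the hop labels) is hypothetical, and the "speech to text" and "text to speech" steps are stand-ins that simply decode and encode bytes. The one behavior it is meant to illustrate is the answer above: the text to speech step runs only when the bot's reply contains text to be spoken.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BotReply:
    text: str                    # text shown to the user
    speak: Optional[str] = None  # text to render as audio, if any (hypothetical field)


@dataclass
class Turn:
    reply_text: str
    reply_audio: Optional[bytes]
    hops: List[str] = field(default_factory=list)


def fake_speech_to_text(audio: bytes) -> str:
    # Stand-in for recognition: the "audio" is just UTF-8 bytes here.
    return audio.decode("utf-8")


def fake_text_to_speech(text: str) -> bytes:
    # Stand-in for synthesis: the "audio" is just UTF-8 bytes here.
    return text.encode("utf-8")


def route_turn(audio_in: bytes, bot: Callable[[str], BotReply]) -> Turn:
    """Model one turn: client audio in, bot reply (and optional audio) out."""
    hops = ["client -> speech service"]
    utterance = fake_speech_to_text(audio_in)

    hops.append("DL Speech channel -> bot")
    reply = bot(utterance)
    hops.append("bot -> DL Speech channel")

    reply_audio = None
    if reply.speak:
        # Only replies that carry text to be spoken reach the synthesis step.
        hops.append("text to speech")
        reply_audio = fake_text_to_speech(reply.speak)

    hops.append("-> client")
    return Turn(reply.text, reply_audio, hops)


def speaking_bot(utterance: str) -> BotReply:
    answer = f"You said: {utterance}"
    return BotReply(text=answer, speak=answer)


def silent_bot(utterance: str) -> BotReply:
    return BotReply(text=f"You said: {utterance}")


if __name__ == "__main__":
    for bot in (speaking_bot, silent_bot):
        turn = route_turn(b"hello", bot)
        print(bot.__name__, turn.hops)
```

Running it prints the hop list for a bot that asks for spoken output and for one that does not; only the first list includes the text to speech step. The model says nothing about which public endpoints sit behind each hop, which is the part a security review would still need to confirm against the official documentation.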
