From a support perspective, a high-level view of a bot solution helps a lot in figuring out the possible causes of issues, or at least in isolating where they happen.
With Bot Framework, a bot application is simply a REST endpoint that uses the Bot Framework SDK libraries, available for several languages including C#, JavaScript (Node.js) and Python. It is the application that drives the bot logic. Since the application has no UI, a user engaging the bot logic has to use other means (channels), with the help of the Bot Connector service.
In fact, we’re talking about at least three sides: three applications that “talk” to each other via web requests.
The who and what
The bot solution may offer users various conversational channels (clients) to engage the bot, such as Facebook Messenger or SMS messages. A custom client may also be created with the DirectLine channel, using the DirectLine Client SDK libraries.
The bot logic is where the developer deploys the custom bot code, which is the actual functionality. It may be hosted either as an Azure App Service app or as any other web application, including an on-premises one, as long as its endpoint is publicly reachable by calls from the “Connector”.
The “Connector” service is a global application owned and operated by Microsoft, hosted on Azure nodes spread geographically to ensure scalability and fast response. It “talks” with both the client and the logic, acting as a proxy for the conversation. While relaying messages, it also transforms their “envelopes”: from a channel-specific message format to the consistent format used by Bot Framework.
Communication between the connector service and the bot logic endpoint happens via HTTP POST requests. Each message from the user, and each reply from the bot, becomes a POST request between the connector and the bot.
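To make the exchange concrete, here is a minimal sketch of an endpoint that accepts such a POST and acknowledges it. It uses only the Python standard library and is a toy stand-in, not the Bot Framework SDK adapter. The `/api/messages` route and port 3978 follow the convention of the SDK samples and are assumptions here; `MessagesHandler` and `make_server` are hypothetical names, and the sketch performs no authentication.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class MessagesHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a bot's messaging endpoint (not the SDK adapter)."""

    def do_POST(self):
        if self.path != "/api/messages":  # assumed, conventional route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        # The connector posts the message as a JSON document (an "activity").
        activity = json.loads(self.rfile.read(length) or b"{}")
        # A real bot would hand the activity to the SDK here.
        print("received:", activity.get("type"), "-", activity.get("text"))
        self.send_response(202)  # acknowledge receipt of the activity
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; real apps should log requests


def make_server(port=3978):
    """Build (but do not start) the server; 3978 is an assumed local port."""
    return HTTPServer(("127.0.0.1", port), MessagesHandler)
```

Calling `make_server().serve_forever()` starts the listener locally. Posting a JSON body such as `{"type": "message", "text": "hi"}` to `/api/messages` should then come back with a 202 status.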
The HTTP requests between connector and logic are secured and authenticated: they carry a specific HTTP header holding a token issued by Azure Active Directory. Azure AD must therefore be aware of both the connector and the bot logic, because both of them “outsource” authentication to it.
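When troubleshooting authentication failures, it can help to look at what such a token claims, for example its issuer and audience. The sketch below assumes the token arrives as a JWT in an `Authorization: Bearer ...` header. The helper names are hypothetical, and the code deliberately does not verify the signature, so it is suitable for diagnostics only, never for deciding whether to trust a request.

```python
import base64
import json


def bearer_token(authorization_header):
    """Return the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("expected a Bearer token")
    return token.strip()


def unverified_claims(token):
    """Decode the JWT payload WITHOUT checking the signature (diagnostics only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

In a real bot the SDK validates the token; a helper like this only answers questions such as "which app ID was this token issued for?" while investigating a 401 response.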
When creating a bot in the Azure portal, we may choose from a couple of provisioning templates:
It is important, when creating bots, to choose Application Insights or at least some other monitoring solution; I’ll detail why shortly. Also, enabling Application Insights needs an extra step; see below in the Monitoring section.
The connector, acting as a proxy between client and logic, has some expectations. When these are not met, issues are recorded.
One of the expectations is that the logic will acknowledge a user message in no more than 15 seconds, a threshold hardwired in the connector that the developer cannot configure.
Why the GatewayTimeout
The connector will relay the user message as a POST request, expecting an HTTP response from the logic with a 202 (Accepted) status code.
However, before the logic can acknowledge the user message with the “202” response, it will usually try to reply to the user, via the connector, with its own POST message to the connector. In many cases, such a reply will involve calls made to dependency services:
Dependencies may add latency
All these dependencies will add to latency. But there is more. As a web application, the bot logic may be “asleep” due to idleness (no requests received in a while), or it may restart due to configuration changes. With ASP.NET Core or Node.js, a process separate from IIS actually executes the app, which adds yet more latency.
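One simple way to see where the time goes is to time each dependency call and compare the total with the 15-second threshold mentioned earlier. The following is a minimal sketch using only the standard library; `timed`, `budget_report` and the dependency names in the usage note are hypothetical.

```python
import time
from contextlib import contextmanager

ACK_BUDGET_SECONDS = 15.0  # the connector threshold described in this article


@contextmanager
def timed(name, timings):
    """Record how long the wrapped block takes, in seconds, under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the dependency call raises.
        timings[name] = time.perf_counter() - start


def budget_report(timings, budget=ACK_BUDGET_SECONDS):
    """Summarize the measured latencies against the acknowledgement budget."""
    total = sum(timings.values())
    return {
        "total": total,
        "remaining": budget - total,
        "over_budget": total > budget,
    }
```

Usage would look like `with timed("language-service", timings): ...` around each outbound call, followed by logging `budget_report(timings)` at the end of the turn.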
The 15-second threshold may easily be exceeded, so all these aspects must be considered when thinking about performance. I’d recommend that the bot reply to the user ASAP, even if the reply is simply something like “I got you, allow me some time”; the full answer may then be provided afterwards. Such a fast reply may also be a simple activity update sent by the bot to the connector.
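The "reply fast, answer later" idea can be sketched with plain `asyncio`. Everything here is an assumption made for illustration: `send` is a hypothetical callback that delivers a message to the user, and `slow_answer` stands in for the slow dependency calls. With the real SDK, sending the full answer after the request has been acknowledged would typically have to go through its proactive-messaging mechanism rather than a bare callback.

```python
import asyncio


async def slow_answer(question):
    """Hypothetical stand-in for slow dependency calls."""
    await asyncio.sleep(0.2)
    return f"Full answer to: {question}"


async def handle_message(question, send):
    """Send a quick holding reply, then finish the slow work in the background."""
    await send("I got you, allow me some time")

    async def finish():
        await send(await slow_answer(question))

    # The caller can acknowledge the incoming request as soon as this returns;
    # it should keep a reference to the task so the work is not lost.
    return asyncio.create_task(finish())
```

The design choice is that the handler returns as soon as the holding reply is out, so the acknowledgement no longer waits on the dependencies.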
In the end, lots of things may have to happen before the bot can reply to the user. So, there are lots of points where things may go wrong. How does one trace an error or a latency, when the bot logic may be a “black-box”? How can we “see” inside the execution?
While the web server logs or application logs may offer some information, troubleshooting and performance tuning at times need to rely on more details. We need a monitoring solution. Enter Application Insights.
Application Insights fits in
When creating a bot in the Azure portal, the wizard does offer the option to employ Application Insights as the monitoring solution, and it even puts the needed configuration info in place for the bot logic. But there is a catch: the bot logic, the app, does not log into Application Insights by default; an extra step is needed.
The bot logic must include the right NuGet or NPM packages. By default these will report on exceptions, for instance, with no extra code needing to be added by the developer. Other telemetry that Application Insights can track, such as custom events or metrics, does need extra code.
Do enable Application Insights!
I found there is an easier way to enable it from the portal, for the App Service or Azure Functions instance, illustrated below:
Quickly enable Application Insights
Best of luck!