Nov 28 2021 11:22 PM
We need help with scaling.
Problem statement: With two instances of the root bot running, a skill bot invocation is unable to return results reliably to the root bot.
With 1 instance of the root bot and 1 instance of the skill bot, everything works as expected.
We configured the root bot with "Scale Out" and "Manual Scale" to set the instance count to 2. When we repeat the interaction, we see that the skill response is not getting returned reliably.
In a successful scenario where the user's skill response is routed back to the root bot that the user interacted with, the user gets the skill results.
In a failed scenario, the user's skill response was routed to the root bot instance that the user did not interact with. Hence, the results are lost.
How do we ensure that the skill results are returned to the correct root bot instance?
We reviewed the "Skills overview" documentation at https://docs.microsoft.com/en-us/azure/bot-service/skills-conceptual?view=azure-bot-service-4.0, but it does not mention anything about scaling and persistence.
We had thought that the lack of persistence could be due to the bots' conversation state. However, our root bot currently stores all conversation state in a database. Since both instances use the same DB connection string, they should have access to the same conversation state.
Dec 06 2021 06:32 PM
Dec 15 2021 01:41 PM
Jan 03 2022 05:15 PM
Jan 13 2022 11:05 PM
Jan 16 2022 04:32 PM
Jan 16 2022 11:11 PM
Jan 17 2022 08:17 AM - edited Jan 17 2022 08:21 AM
We're having this exact scaling issue on 4.10. If we try to scale the root bot instances above one, communication made by a skill back to a root bot instance that is NOT the originating root bot produces a 404 (and the skill errors out).
Any luck with finding out whether this is a version issue?
We found this delivery mode option (ExpectReplies), which will tie the call and the response to the same root bot instance, but it seems like it might just be an alternate workaround.
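To make the difference between the two delivery modes concrete, here is a minimal toy model in plain Python. It does not use the Bot Framework SDK; `RootInstance`, `skill_replies`, `invoke_normal`, and `invoke_expect_replies` are hypothetical names made up for illustration. It only sketches the idea that with normal delivery the skill sends its replies back as separate requests (which a load balancer may hand to any instance), while with ExpectReplies the replies come back in the response to the original call.

```python
from dataclasses import dataclass, field


@dataclass
class RootInstance:
    """One scaled-out copy of the root bot (toy model, not an SDK type)."""
    name: str
    received: list = field(default_factory=list)


def skill_replies(text):
    # Hypothetical skill logic: two reply activities for one input.
    return [f"working on: {text}", f"result for: {text}"]


def invoke_normal(instances, callback_target, text):
    """Normal delivery: each reply is sent back as a separate request.

    `callback_target` stands in for whichever instance the load balancer
    happens to pick for those callback requests.
    """
    for reply in skill_replies(text):
        instances[callback_target].received.append(reply)
    return []  # the response to the calling instance carries no replies


def invoke_expect_replies(caller, text):
    """ExpectReplies: replies are buffered and returned in the response
    body, so they can only arrive at the instance that made the call."""
    replies = skill_replies(text)
    caller.received.extend(replies)
    return replies


instances = [RootInstance("root-0"), RootInstance("root-1")]

# Normal mode: root-0 calls the skill, but the callbacks land on root-1.
invoke_normal(instances, callback_target=1, text="book flight")
misrouted = list(instances[1].received)

# ExpectReplies: root-0 calls the skill and receives the replies itself.
for inst in instances:
    inst.received.clear()
returned = invoke_expect_replies(instances[0], "book flight")
```

In the real SDK the mode is carried on the outgoing activity's `deliveryMode` field; how much of the reply handling the framework does automatically is not something this sketch tries to answer.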
Jan 18 2022 05:29 PM
Jan 18 2022 07:02 PM - edited Jan 18 2022 07:22 PM
Yup, misery loves company :)
We also upgraded to 4.15.1, but it did not solve the problem (within our own application); we still see skills getting 404s when scaling above one root instance. We're going to check whether server affinity or some kind of shared root bot state (i.e. a DB) is even available as an option within the framework, but perhaps delivery mode is the only multi-root option.
It's also not yet clear whether ExpectReplies is built into the sender and receiver framework (i.e. handled automatically by the middleware or other libraries) or is something that will have to be manually coded to keep things synchronous.
FYI, a couple more (semi-useful) references about delivery mode:
Jan 18 2022 07:07 PM
Jan 18 2022 07:31 PM - edited Jan 18 2022 07:34 PM
FYI, I made a few edits to my last reply above.
- We have always had ARR turned on, but it is unclear to us at this point which technology actually tracks the instance each cookie should be routed to. Typically that is some kind of load balancer or HTTP router. Is that just part of the chatbot framework? If it is, it does not seem to be pinning requests correctly (or the cookie is not being passed around correctly).
- We use Cosmos to save conversational state, but whatever lookup operation is producing that 404, the data it wants seems to be pinned in memory for a single root instance. Like you, we were hoping there is something that can be shared between root bots to effectively make that data it is looking for available to all instances. Have not found it yet.
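The two points above can be put into one small simulation. This is a toy model in plain Python, not Bot Framework code: `RootInstance`, `FrontEnd`, and `run` are hypothetical names, and the key assumption (labeled in the code) is that the skill's server-to-server callback does not carry the affinity cookie, so the front end spreads those callbacks across instances. Under that assumption, a lookup table held in each instance's memory produces intermittent 404s, while a table shared by all instances does not.

```python
import itertools


class RootInstance:
    """Toy root bot instance. `lookup` maps skill conversation ids to the
    original user conversation (hypothetical names, not SDK types)."""

    def __init__(self, name, lookup):
        self.name = name
        self.lookup = lookup

    def start_skill_call(self, user_conversation):
        skill_conv_id = f"skill-{user_conversation}"
        self.lookup[skill_conv_id] = user_conversation
        return skill_conv_id

    def handle_callback(self, skill_conv_id):
        # Unknown id -> 404, mirroring the symptom reported in this thread.
        return 200 if skill_conv_id in self.lookup else 404


class FrontEnd:
    """Routes by affinity cookie when present, otherwise round-robin."""

    def __init__(self, instances):
        self.instances = instances
        self._next = itertools.cycle(range(len(instances)))

    def route(self, affinity_cookie=None):
        if affinity_cookie is not None:
            return self.instances[affinity_cookie]
        return self.instances[next(self._next)]


def run(instances, calls=4):
    front = FrontEnd(instances)
    statuses = []
    for i in range(calls):
        # The user's requests carry the cookie and stay on instance 0.
        root = front.route(affinity_cookie=0)
        skill_conv_id = root.start_skill_call(f"user-{i}")
        # Assumption: the skill's callback request has no affinity cookie.
        statuses.append(front.route().handle_callback(skill_conv_id))
    return statuses


# Case 1: each instance keeps its own in-memory dictionary.
per_instance = run([RootInstance("root-0", {}), RootInstance("root-1", {})])

# Case 2: both instances share one store (a dict standing in for a database).
shared = {}
shared_store = run([RootInstance("root-0", shared), RootInstance("root-1", shared)])

print(per_instance)   # [200, 404, 200, 404]
print(shared_store)   # [200, 200, 200, 200]
```

If the root bot follows the SDK skill samples, the component that plays the role of `lookup` is likely the skill conversation ID factory, which some sample versions back with an in-memory dictionary. Whether that is what is happening in either of our deployments is an assumption that would need to be confirmed.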
Our next steps are basically more investigation and trial and error.
Jan 20 2022 07:05 PM