Forum Discussion
Root bot, skill bot and scaling
Hi,
We need help with scaling.
Problem statement: With 2 instances of root bot running, a skill bot invocation is unable to return results reliably to the root bot.
With 1 instance of the root bot and 1 instance of the skill bot, everything works as expected.
We configured the root bot with "Scale Out" and "Manual Scale" to set the instance count to 2. When we repeat the interaction, we see that the skill response is not getting returned reliably.
In a successful scenario where the user's skill response is routed back to the root bot that the user interacted with, the user gets the skill results.
In a failed scenario, the user's skill response was routed to the root bot instance that the user did not interact with. Hence, the results are lost.
How do we ensure that the skill results are returned to the correct root bot instance?
We reviewed the "Skills overview" documentation at https://docs.microsoft.com/en-us/azure/bot-service/skills-conceptual?view=azure-bot-service-4.0, but it does not mention anything about scaling and persistence.
We had thought that the lack of persistence could be due to the bots' conversation state. However, our root bot currently stores all conversation state in a database. Since both instances use the same database connection string, they should have access to the same conversation state.
Thank You
We had James check his data and found this. See if it helps. In the root bot:
- Double-check and be 100% sure that you're using the SkillConversationIdFactory that is part of the MS chatbot framework (NOT one that you may have created). It should have an IStorage constructor parameter that lets you pass in whatever storage you want to use to persist the IDs used for skill communication. You probably need to use the class that the framework provides (i.e. the SkillConversationIdFactory that inherits from "Microsoft.Bot.Builder.Skills.SkillConversationIdFactoryBase").
- For the IStorage object used by SkillConversationIdFactory, if you are using some kind of in-memory-only storage (e.g. a ConcurrentDictionary or another MemoryStorage-type object), that might be the problem. The code in SkillConversationIdFactory might not be persisting the conversation/skill ID lookup data (needed to talk with skills) to a place that other app instances can read.
I have found some old MS examples that give a "demo" of how to use skills and show a SkillConversationIdFactory that uses in-memory storage, which of course won't scale or work across different apps.
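To illustrate the point about in-memory storage, here is a minimal, framework-independent sketch. It does not use the Bot Framework SDK; DictStorage, RootBotInstance and the other names are hypothetical stand-ins, and the 404 is only modelled on the behaviour reported in this thread. It assumes the skill's reply can land on a different root bot instance than the one that started the skill call.

```python
import uuid


class DictStorage:
    """Hypothetical key-value store standing in for an IStorage implementation."""

    def __init__(self):
        self._items = {}

    def write(self, key, value):
        self._items[key] = value

    def read(self, key):
        return self._items.get(key)


class RootBotInstance:
    """Toy model of one root bot instance behind a load balancer."""

    def __init__(self, name, id_storage):
        self.name = name
        self.id_storage = id_storage

    def call_skill(self, user_conversation_id):
        # Create a skill conversation ID and remember which user conversation
        # it maps to, so that the skill's reply can be routed back later.
        skill_conversation_id = str(uuid.uuid4())
        self.id_storage.write(skill_conversation_id, user_conversation_id)
        return skill_conversation_id

    def handle_skill_callback(self, skill_conversation_id):
        # Whichever instance receives the skill's reply must be able to
        # resolve the skill conversation ID from its storage.
        user_conversation_id = self.id_storage.read(skill_conversation_id)
        if user_conversation_id is None:
            return 404, None
        return 200, user_conversation_id


def run_scenario(storage_a, storage_b):
    instance_a = RootBotInstance("A", storage_a)
    instance_b = RootBotInstance("B", storage_b)
    skill_id = instance_a.call_skill("user-conversation-1")
    # Assumed worst case: the reply is routed to the other instance.
    return instance_b.handle_skill_callback(skill_id)


# Each instance has its own in-memory store: the lookup fails.
per_instance_result = run_scenario(DictStorage(), DictStorage())

# Both instances share one store (standing in for a database): the lookup works.
shared = DictStorage()
shared_result = run_scenario(shared, shared)

print(per_instance_result)  # (404, None)
print(shared_result)        # (200, 'user-conversation-1')
```

In this model the only difference between the failing and the working run is whether the ID lookup data lives in storage that both instances can read, which is the thing to verify in the real root bot.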
- HunaidHanfee-MSFT (Microsoft)
We are looking into this, I will get back to you soon.
Thanks
- Prasad_Das-MSFT (Microsoft)
voonsionglum
Could you please provide the sample code/repo that you are following?
- voonsionglum (Brass Contributor)
Hi,
Please refer to https://github.com/microsoft/BotBuilder-Samples/tree/main/samples/csharp_dotnetcore/80.skills-simple-bot-to-bot for the sample code.
Once the root bot and skill bot have been deployed, please SCALE OUT the root bot so that it has more than 1 instance. Once scaling is done, try interacting with the root bot.
Thank You
- HunaidHanfee-MSFT (Microsoft)
Thanks for sharing the steps. I will get back to you once I have the repro.
Thanks
- Slacked2737 (Copper Contributor)
We're having this exact scaling issue on 4.10. If we try to scale the root bot instances above one, communication made by a skill back to a root bot instance that is NOT the originating root bot produces a 404 (and the skill errors out).
Any luck with finding out whether this is a version issue?
We found this delivery mode option (ExpectReplies), which will tie the call and the response to the same root bot instance, but it seems like it might just be an alternate workaround.
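As a rough illustration of why ExpectReplies ties the call and the response together, here is a small framework-independent sketch. It does not use the Bot Framework SDK: all names are hypothetical, the round-robin load balancer is an assumption, and the model only records which instance ends up holding the replies. It assumes that in the default mode the skill sends replies as a separate callback request, while with expectReplies the replies come back in the response to the original call.

```python
from itertools import cycle


class RootInstance:
    """Toy root bot instance that records the skill replies it receives."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def receive(self, replies):
        self.received.extend(replies)


class LoadBalancer:
    """Round-robin stand-in for the front end of a scaled-out app (assumption)."""

    def __init__(self, instances):
        self._next = cycle(instances)

    def route(self):
        return next(self._next)


def skill_handle(text, delivery_mode, callback_balancer):
    replies = [f"echo: {text}"]
    if delivery_mode == "expectReplies":
        # Replies are buffered and returned in the response to the caller.
        return replies
    # Default mode: replies go out as a separate request, and the load
    # balancer decides which root instance receives it.
    callback_balancer.route().receive(replies)
    return []


def call_skill(caller, text, delivery_mode, callback_balancer):
    replies_in_response = skill_handle(text, delivery_mode, callback_balancer)
    caller.receive(replies_in_response)


def run(delivery_mode):
    a, b = RootInstance("A"), RootInstance("B")
    # Ordered so that the first callback lands on B, the instance that did
    # not start the skill call (the failure case described in this thread).
    balancer = LoadBalancer([b, a])
    call_skill(a, "hi", delivery_mode, balancer)
    return a.received, b.received


normal_result = run("normal")
expect_result = run("expectReplies")

print(normal_result)  # ([], ['echo: hi'])
print(expect_result)  # (['echo: hi'], [])
```

In the default mode the reply depends on where the load balancer sends the callback; with expectReplies there is no second request to route, so the calling instance always gets the replies.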
https://docs.microsoft.com/en-us/azure/bot-service/skills-about-skill-consumers?view=azure-bot-service-4.0#using-a-delivery-mode-of-expect-replies
- HunaidHanfee-MSFT (Microsoft)
Slacked2737 - Hello, did you check by installing the manifest I shared? Also, can you elaborate on the repro steps so we can make sure we are not missing anything?
- voonsionglum (Brass Contributor)
HunaidHanfee-MSFT, would it be possible to have access to the actual web apps behind the manifest you have shared? We would like to view the code via Kudu console and examine the scale out settings that have been applied to both the root bot and skill dialog bot.
- voonsionglum (Brass Contributor)
Good to know we are not the only ones having this issue 🙂 We upgraded our root bot and skill dialog bot to use the latest 4.15.0 npm packages. We scaled out both the root and skill dialog bots to 2 instances. Sadly, we still face the same error.
Our plan was to redeploy the 4.15 dialog root bot and dialog skill bot samples and scale out the instances to 2. We have been having some trouble overwriting the existing dialog root bot's web app with the 4.15 sample. I'll update again when we get this resolved and test out scaling.
We were not aware of the delivery mode option. Thank you for bringing that to our attention. A workaround is better than nothing 🙂
- Slacked2737 (Copper Contributor)
Yup, misery loves company 🙂
We also upgraded to 4.15.1 but it did not solve the problem (within our own application); we still see skills getting 404s when scaling above 1 root instance. We're going to try and see if server affinity or some kind of root bot shared state (e.g. a DB) options are even available within the framework, but perhaps delivery mode is the only multi-root option. It's also not yet clear whether ExpectReplies is built into the sender and receiver framework (i.e. handled automatically by the middleware or other libraries) or is something that will have to be manually coded to keep things synchronous.
FYI, a couple more (semi-useful) references about delivery mode:
https://github.com/microsoft/botbuilder-dotnet/pull/5142
https://github.com/microsoft/botbuilder-dotnet/pull/5162