Nov 28 2021 11:22 PM
Hi,
We need help with scaling.
Problem statement: With 2 instances of the root bot running, a skill bot invocation cannot reliably return its results to the root bot.
With 1 instance of the root bot and 1 instance of the skill bot, everything works as expected.
We configured the root bot with "Scale Out" and "Manual Scale" to set the instance count to 2. When we repeat the interaction, we see that the skill response is not getting returned reliably.
In a successful scenario where the user's skill response is routed back to the root bot that the user interacted with, the user gets the skill results.
In a failed scenario, the user's skill response was routed to the root bot instance that the user did not interact with. Hence, the results are lost.
How do we ensure that the skill results are returned to the correct root bot instance?
We reviewed the "Skills overview" documentation at https://docs.microsoft.com/en-us/azure/bot-service/skills-conceptual?view=azure-bot-service-4.0, but it does not mention anything about scaling and persistence.
We initially thought the lack of persistence could be due to the bots' conversation state. However, our root bot currently stores all conversation state in a DB. Since both instances use the same DB connection string, they should have access to the same conversation state.
Thank You
Jan 20 2022 07:06 PM
Jan 21 2022 11:19 AM - edited Jan 21 2022 12:28 PM
Thank you for the update.
I reviewed most of the botbuilder-dotnet code and came to a few conclusions:
- There does not seem to be much code related to ARR at all. My guess is that cookie affinity is not a part of the framework to support pinning root to skill calls.
- Root to skill using DeliveryMode.ExpectReplies looks like it should work (it sounds like you may have tried it already; details would be great :-). Check out the code example below, it's a good template for how it works.
https://github.com/microsoft/botbuilder-dotnet/blob/f28cad18948298f30cb7fc4973c143cf08ad7341/tests/S...
Also, check out how it's handled in the SendActivitiesAsync call:
https://github.com/microsoft/botbuilder-dotnet/blob/f28cad18948298f30cb7fc4973c143cf08ad7341/librari...
This does not explain why the 404 occurs in the first place. It would be nice to get a definitive answer on whether root bot instances are able to share skill bot responses with each other.
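To make the difference between the two delivery modes concrete, here is a minimal self-contained sketch. It is a simulation only: `skillHandle`, `callbackInbox`, and the trimmed-down `Activity` shape are hypothetical names made up for illustration, not Bot Framework APIs. The only parts taken from the framework are the idea that a skill normally posts its replies back to the caller's callback endpoint, and that with `expectReplies` the replies come back in the body of the original HTTP response instead.

```typescript
// Hypothetical, simplified shapes (not the real Bot Framework types).
type DeliveryMode = "normal" | "expectReplies";

interface Activity {
  text: string;
  deliveryMode?: DeliveryMode;
}

interface SkillHttpResponse {
  status: number;
  // Only populated when the caller asked for inline replies.
  body?: { activities: Activity[] };
}

// Simulated skill endpoint. `postToCallback` stands in for a separate HTTP
// call back to the root bot, which a load balancer may route to any instance.
function skillHandle(
  request: Activity,
  postToCallback: (reply: Activity) => void
): SkillHttpResponse {
  const reply: Activity = { text: `echo: ${request.text}` };

  if (request.deliveryMode === "expectReplies") {
    // Reply travels back on the same HTTP response, so it reaches the
    // instance that made the call.
    return { status: 200, body: { activities: [reply] } };
  }

  // Default mode: reply is sent as a new request to the callback endpoint.
  postToCallback(reply);
  return { status: 200 };
}

// Replies that arrived through the (simulated) callback endpoint.
const callbackInbox: Activity[] = [];

const normalResponse = skillHandle({ text: "hi" }, (r) => {
  callbackInbox.push(r);
});

const inlineResponse = skillHandle(
  { text: "hi", deliveryMode: "expectReplies" },
  (r) => {
    callbackInbox.push(r);
  }
);
```

In the default mode the reply depends on a second request landing on an instance that knows about the conversation. In the `expectReplies` mode that second request never happens. Whether this is enough in a real deployment is a separate question that the sketch does not answer.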
Jan 23 2022 04:35 PM
Jan 30 2022 10:57 PM
Feb 06 2022 04:57 PM
Mar 16 2022 11:23 PM
Apologies for the delay in responding. We have set up the above sample and retested it with scaling. It is working fine on our end.
Dec 12 2022 11:39 AM
Dec 12 2022 04:11 PM
@padamide Unfortunately, we are still stuck. We did not get good results from the ExpectReplies workaround. I'd be interested to know how it goes on your end.
Dec 12 2022 04:20 PM
@voonsionglum - that's unfortunate news. We haven't started work on the ExpectReplies workaround yet. We're still in the process of researching possible solutions. That's how I came across this thread.
What was the result with that workaround? Did you still run into scaling issues or did it bring on a whole new can of worms entirely?
We're also still using some of the deprecated types within the bot framework. Refactoring those is a tech debt item; however, there's no indication that they have anything to do with this scaling issue.
Dec 13 2022 04:09 PM
Dec 13 2022 04:42 PM - edited Dec 14 2022 06:43 AM
Solution
We had James check his data and found this. See if it helps. In the root bot:
I have found some old MS examples that give a "demo" of how to use skills and show a SkillConversationIdFactory that uses in-memory storage, which of course won't scale or work across different apps.
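For anyone following along, here is a rough sketch of why an in-memory conversation ID factory breaks once the root bot is scaled out. Everything in it is a hypothetical stand-in (`RootBotInstance`, `MapStore`, `KeyValueStore`, and the 200/404 return values are made up for illustration); it does not use the real SkillConversationIdFactory or any Bot Framework storage class. It only assumes that the root bot records a mapping when it calls the skill and needs to find that mapping again when the skill replies.

```typescript
// Hypothetical stand-ins for illustration; none of these are Bot Framework APIs.
interface ConversationReference {
  userConversationId: string;
  serviceUrl: string;
}

interface KeyValueStore {
  read(key: string): ConversationReference | undefined;
  write(key: string, value: ConversationReference): void;
}

// A Map-backed store. One MapStore per instance models in-memory storage;
// one MapStore shared by all instances models an external store.
class MapStore implements KeyValueStore {
  private readonly items = new Map<string, ConversationReference>();

  read(key: string): ConversationReference | undefined {
    return this.items.get(key);
  }

  write(key: string, value: ConversationReference): void {
    this.items.set(key, value);
  }
}

class RootBotInstance {
  constructor(
    readonly name: string,
    private readonly store: KeyValueStore
  ) {}

  // Called before forwarding a user activity to the skill.
  createSkillConversationId(ref: ConversationReference): string {
    const id = `skill-${ref.userConversationId}`;
    this.store.write(id, ref);
    return id;
  }

  // Called when the skill's reply reaches this instance.
  // Returns an HTTP-like status: 200 if the mapping is known, 404 if not.
  handleSkillReply(skillConversationId: string): number {
    return this.store.read(skillConversationId) ? 200 : 404;
  }
}

const ref: ConversationReference = {
  userConversationId: "user-1",
  serviceUrl: "https://example.invalid",
};

// Case 1: each instance keeps its own in-memory store.
const a1 = new RootBotInstance("A", new MapStore());
const b1 = new RootBotInstance("B", new MapStore());
const id1 = a1.createSkillConversationId(ref);
const perInstanceResult = b1.handleSkillReply(id1); // reply lands on B

// Case 2: both instances use the same store.
const shared = new MapStore();
const a2 = new RootBotInstance("A", shared);
const b2 = new RootBotInstance("B", shared);
const id2 = a2.createSkillConversationId(ref);
const sharedResult = b2.handleSkillReply(id2); // reply lands on B

console.log(perInstanceResult, sharedResult);
```

In the first case instance B has never seen the skill conversation ID, so it cannot map the reply back to the user. In the second case any instance can resolve it, which is the property a persistent storage backend passed to the factory would give you.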
Jan 02 2023 08:45 PM
@Slacked2737 Thank You for this AWESOME reply. With your suggestion, we made sure to use the SkillConversationIdFactory that is part of the MS chatbot framework and ensured that we have the same non-memory storage passed into it.
Note: we have also switched from our previous CosmosDB Mongo API storage implementation to BlobsStorage. Long story short, with CosmosDB Mongo API, we were getting this.storage.read and this.storage.write errors. This was not a Microsoft issue; it was caused by the DBStorage class we implemented ourselves.
We scaled out our root bot to 3 instances and our skill bot to 3 instances. We restarted both bots and ALL skill results are now getting returned successfully!
I can't wait to share this with our Team.
Thank You and Happy New Year!