Forum Discussion
Peter Nilsson (Microsoft)
Apr 28, 2021
Capacity Concerns, Load Testing, and Concurrent Call Limitations on ACS and Bot Framework
Can anyone give me approximate capacity limits for a conversational bot (built with Bot Framework) that uses a single communications resource within our resource group? Could we handle 1000 concurrent calls, and could we scale up to 5000 concurrent calls at times of peak load? I'm trying to work out whether we need to create multiple communications resources within a single resource group, or even multiple subscriptions, to handle very large call volumes.
Secondly, what tooling could be used to validate bot call quality and to run load tests? We will be using a quality and load testing tool from Cyara, but does Azure have any suggestions on the best way to load test our calling applications?
- Christopher Polanish (Brass Contributor): Very curious about this as well. I'm not expecting an extremely high load, but even in dev/testing my bot has been stuttering a lot, missing prompts, etc., and I'm having a hard time nailing down whether it's a capacity issue or something else. I would be very surprised if it's capacity, as the bot is barely being hit, but it definitely has me concerned about performance at scale.
- peterswimm (Microsoft): Bot apps are generally just web applications, so the typical load expectations should transfer over; the bottleneck will likely occur in your Cognitive Services calls before the bot itself tops out.
There are also community efforts for load testing, such as:
https://github.com/damadei/BotServiceStressToolkit
Note: Please let us know if you are planning any stress tests beforehand!
This is something we are looking to invest in soon, in terms of providing more guidance and best practices.
- peterswimm (Microsoft): With regard to the Telephony channel specifically, at GA we expect to have a fully scalable platform that can scale to large enterprise capacity. While it is in Public Preview, it is best to send your bot ID to the telephony-preview@microsoft.com alias so that the engineering team can ensure the service scales correctly in real time as load changes.
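
For anyone looking for a rough starting point on the load-testing question, here is a minimal sketch in Python (standard library only) of a concurrency-capped load generator that records per-call latency. It is not an Azure or Cyara tool and not a recommendation from this thread. The names `fake_bot_turn`, `run_load`, and `summarize` are hypothetical, and `fake_bot_turn` only sleeps, so you would replace it with a real request to your own bot endpoint.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List


async def fake_bot_turn() -> None:
    """Hypothetical stand-in for one request to a bot endpoint (assumption: ~10 ms)."""
    await asyncio.sleep(0.01)


async def run_load(
    send_turn: Callable[[], Awaitable[None]],
    total_calls: int,
    max_concurrency: int,
) -> List[float]:
    """Run total_calls turns with at most max_concurrency in flight.

    Returns the latency of each turn in seconds.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    latencies: List[float] = []

    async def one_call() -> None:
        # The semaphore caps how many turns run at the same time.
        async with semaphore:
            start = time.perf_counter()
            await send_turn()
            latencies.append(time.perf_counter() - start)

    await asyncio.gather(*(one_call() for _ in range(total_calls)))
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Summarize latencies with count, median, nearest-rank p95, and max."""
    ordered = sorted(latencies)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "count": len(ordered),
        "median_s": statistics.median(ordered),
        "p95_s": ordered[p95_index],
        "max_s": ordered[-1],
    }


if __name__ == "__main__":
    results = asyncio.run(
        run_load(fake_bot_turn, total_calls=200, max_concurrency=50)
    )
    print(summarize(results))
```

Raising `max_concurrency` step by step and watching where the p95 latency starts to climb is one way to tell whether the bot or a downstream dependency is saturating first. As noted above, let the team know before pointing anything like this at a real service.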