Performance tuning is often harder than it should be. To help make this task a little easier, the Azure Cognitive Search team recently released new benchmarks, documentation, and a solution that you can use to bootstrap your own performance tests. Together, these additions will give you a deeper understanding of performance factors, how you can meet your scalability and latency requirements, and help set you up for success in the long term.
The goal of this blog post is to give you an overview of performance in Azure Cognitive Search and to point you to resources so you can explore the concept more deeply. We’ll walk through some of the key factors that determine performance in Azure Cognitive Search, show you some performance benchmarks and how you can run your own performance tests, and ultimately provide some tips on how you can diagnose and fix performance issues you might be experiencing.
Key Performance Factors in Azure Cognitive Search
First, it’s important to understand the factors that impact performance. We outline these factors in more depth in this article but at a high level, these factors can be broken down into three categories:
It’s also important to know that both queries and indexing operations compete for the same resources on your search service. Search services are heavily read-optimized to enable fast retrieval of documents. The bias towards query workloads makes indexing more computationally expensive. As a result, a high indexing load will limit the query capacity of your service.
While every scenario is different and we always recommend running your own performance tests (see the next section), it’s helpful to have a benchmark for the performance you can expect. We have created two sets of performance benchmarks that represent realistic workloads that can help you understand how Cognitive Search might work in your scenario.
These benchmarks cover two common scenarios we see from our customers:
E-commerce search - this benchmark is based on a real customer, CDON, the Nordic region's largest online marketplace
Document search - this benchmark is based on queries against the Semantic Scholar dataset
The benchmarks will show you the range of performance you might expect based on your scenario, search service tier, and the number of replicas/partitions you have. For example, in the document search scenario which included 22 GB of documents, the maximum queries per second (QPS) we saw for different configurations of an S1 can be seen in the graph below:
As you can see, the maximum QPS achieved tends to scale linearly with the number of replicas. In this case, there was enough data that adding an additional partition significantly improved the maximum QPS as well.
Above all, it’s important to run your own performance tests to validate that your current setup meets your performance requirements. To make it easier to run your own tests, we created a solution containing all the assets needed for you to run scalable load tests. You can find those assets here: Azure-Samples/azure-search-performance-testing.
The solution assumes you have a search service with data already loaded into the search index. We provide a couple of default test strategies that you can use to run the performance test as well as instructions to help you tailor the test to your needs. The test will send a variety of queries to your search service based on a CSV file containing sample queries and you can tune the query volume based on your production requirements.
Apache JMeter is used to run the tests giving you access to industry standard tooling and a rich ecosystem of plugins. The solution also leverages Azure DevOps build pipelines and Terraform to run the tests and deploy the necessary infrastructure on demand. With this, you can scale to as many worker nodes as you need so you won’t be limited by the throughput of the performance testing solution.
After running the tests, you’ll have access to rich telemetry on the results. The test results are integrated with Azure DevOps and you can also download a dashboard from JMeter that allows you to see a range of statistics and graphs on the test results:
If you find your current levels of performance aren’t meeting your needs, there are several different ways to improve performance. The first step to improve performance is understanding why your service isn’t performing as you expect. By turning on diagnostic logging, you can gain access to a rich set of telemetry about your search service—this is the same telemetry that Microsoft Azure engineers use to diagnose performance issues. Once you have diagnostic logs available, there’s step by step documentation on how to analyze your performance.