This post is co-authored by @adieldar (Principal Data Scientist, Microsoft)
In a previous blog post, Azure Data Explorer for Vector Similarity Search, we showed how Azure Data Explorer (Kusto) is well suited for storing and searching vector embeddings.
In this post, we focus on performance tuning and optimizations for running vector similarity searches at scale.
We continue with the Wikipedia scenario, in which we generate embeddings of wiki pages using OpenAI and store them in Kusto. We then use the series_cosine_similarity_fl Kusto function to perform similarity searches.
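To make the search step concrete, here is a minimal Python sketch of what a brute-force cosine similarity search computes. It is not how Kusto implements series_cosine_similarity_fl; the helper names (`cosine_similarity`, `top_k`) and the toy 3-dimensional vectors are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """Score every (title, embedding) row against the query; best matches first."""
    scored = [(cosine_similarity(query, emb), title) for title, emb in rows]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical toy data; real embeddings have many more dimensions.
rows = [
    ("A", [1.0, 0.0, 0.0]),
    ("B", [0.0, 1.0, 0.0]),
    ("C", [1.0, 1.0, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
print(results)
```

Every row is scored against the query, so the cost grows linearly with the number of vectors. That is why spreading the rows across nodes, as described below, matters.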
Demo scenario
Optimizing for scale
To optimize the cosine similarity search, we need to split the vectors table into many extents (data shards) that are evenly distributed among all cluster nodes. This can be done by setting a partitioning policy on the embedding table using the .alter-merge policy partitioning command:
.alter-merge table WikipediaEmbeddingsTitleD policy partitioning
```
{
  "PartitionKeys": [
    {
      "ColumnName": "vector_id_str",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 2048, // set to the maximum value to create smaller partitions, and thus a more balanced spread among all cluster nodes
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    }
  ],
  "EffectiveDateTime": "2000-01-01" // set to an old date so that partitioning is also applied to existing data
}
```
In the example above we modified the partitioning policy for WikipediaEmbeddingsTitleD. This table was created from WikipediaEmbeddings by projecting each document's title and embedding.
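The intuition behind hash partitioning on a high-cardinality string key can be sketched in a few lines of Python. This is an illustration only: Kusto uses XxHash64 and its own internal extent-to-node assignment, whereas the sketch below substitutes `hashlib.blake2b` (XxHash64 is not in the Python standard library) and assumes a simple "partition modulo node count" placement. The key values are hypothetical.

```python
import hashlib
from collections import Counter

MAX_PARTITIONS = 2048  # mirrors MaxPartitionCount in the policy above


def partition_of(key: str, seed: int = 1) -> int:
    """Map a string key to a partition. blake2b is a stand-in for XxHash64."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % MAX_PARTITIONS


def rows_per_node(keys, node_count):
    """Assumed placement for this sketch: partition p lives on node p % node_count."""
    counts = Counter(partition_of(k) % node_count for k in keys)
    return [counts[n] for n in range(node_count)]


# Hypothetical vector_id_str values: one unique string per embedding row.
keys = [str(i) for i in range(100_000)]
per_node = rows_per_node(keys, node_count=8)
print(per_node)
```

With many small partitions and a well-mixed hash, each node ends up with close to the same number of rows, so every node does a similar share of the similarity computation.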
Notes:
To test the effect of partitioning, we created multiple tables in a similar manner, containing up to 1M embedding vectors each, and measured cosine similarity search performance on clusters with 1, 2, 4, 8 and 20 nodes (SKU Standard_E4d_v5).
The following table and chart compare search performance (in seconds) before and after partitioning:
| Number of vectors | 1 node* (no partitioning) | 2 nodes | 4 nodes | 8 nodes | 20 nodes |
|---|---|---|---|---|---|
| 25,000 | 3.4 | 0.95 | 0.67 | 0.57 | 0.51 |
| 50,000 | 6.2 | 1.5 | 0.92 | 0.65 | 0.55 |
| 100,000 | 12.4 | 2.6 | 1.55 | 1 | 0.57 |
| 200,000 | 24.2 | 5.2 | 2.8 | 1.65 | 0.63 |
| 400,000 | 48.5 | 10.3 | 5.4 | 2.95 | 0.87 |
| 800,000 | 96.5 | 20.5 | 10.5 | 6 | 1.2 |
| 1,000,000 | 102 | 26 | 13.3 | 7.2 | 1.4 |

* Note that the cluster has 2 nodes, but the tables are stored on a single node (this is our baseline before applying the partitioning policy).
Even on the smallest cluster (2 nodes), partitioning makes the search roughly 4x faster than the unpartitioned baseline. In general, search time is approximately inversely proportional to the number of nodes, although for the smaller tables the gains level off at around 0.5 seconds. The number of embedding vectors needed for common LLM scenarios (e.g. Retrieval Augmented Generation) rarely exceeds 100K, so with 8 nodes a search can complete in about 1 second.
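The speedups discussed above can be recomputed directly from the measurements. The short Python sketch below copies the timings from the table and divides the single-node baseline by each partitioned result; the variable and function names are ours, chosen for illustration.

```python
# Search times in seconds, copied from the table above.
node_counts = [1, 2, 4, 8, 20]
timings = {
    25_000: [3.4, 0.95, 0.67, 0.57, 0.51],
    50_000: [6.2, 1.5, 0.92, 0.65, 0.55],
    100_000: [12.4, 2.6, 1.55, 1, 0.57],
    200_000: [24.2, 5.2, 2.8, 1.65, 0.63],
    400_000: [48.5, 10.3, 5.4, 2.95, 0.87],
    800_000: [96.5, 20.5, 10.5, 6, 1.2],
    1_000_000: [102, 26, 13.3, 7.2, 1.4],
}


def speedups(times):
    """Speedup of each configuration relative to the unpartitioned baseline."""
    baseline = times[0]
    return [round(baseline / t, 1) for t in times]


print("vectors".rjust(9), node_counts)
for n_vectors, times in timings.items():
    print(f"{n_vectors:>9,}", speedups(times))
```

For the 100,000-vector table, for example, this gives a speedup of about 4.8x on 2 nodes and 8x on 4 nodes.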
How can you get started?
If you would like to try this demo, head to the azure_kusto_vector GitHub repository and follow the instructions.
The Notebook in the repo will allow you to -
You can start by -
We look forward to your feedback and all the exciting things you build with kusto & vectors!