Azure Architecture Blog

Harnessing Generative AI with Weaviate on Azure Kubernetes Service and Azure NetApp Files

GeertVanTeylingen

Microsoft

Sep 10, 2024

Introduction

Prerequisites

Install Weaviate

Approximate Nearest Neighbor (ANN) Benchmarks

ANN Benchmarks Setup

ANN Benchmarks – Glove 100 Angular

ANN Benchmarks – Sift 128 Euclidean

ANN Benchmarks Analysis

Results

Conclusion

Additional Information

Introduction

In this era of generative AI, the ability to process and analyze large datasets with precision and speed is not just advantageous—it’s essential. Vector databases, such as Weaviate, play a pivotal role in the infrastructure that powers generative AI applications, from natural language processing to image generation. These databases efficiently handle the similarity search operations at the core of generative models, enabling them to parse vast datasets and identify patterns that drive the creation of new, synthetic content.

Deploying Weaviate on Azure Kubernetes Service (AKS) with high-performance Azure NetApp Files (ANF) as the back-end storage creates a scalable foundation that meets the demanding requirements of generative AI models. This blog post guides you through setting up Weaviate on AKS, backed by Azure NetApp Files. We then benchmark the setup with ANN-Benchmarks, an established framework for testing approximate nearest neighbor search algorithms and vector databases, to measure Weaviate's performance quantitatively in a controlled environment.

Follow along as we streamline the deployment process and benchmarking steps, providing a clear view of Weaviate's performance in a cloud environment. By the end of our journey, you'll have a comprehensive understanding of how to deploy a scalable vector search solution and what to expect from its performance on Azure's robust infrastructure.

Co-authors: Michael Haigh, Senior Technical Marketing Engineer, Kyle Radder, Technical Marketing Engineer (NetApp)

Prerequisites

If you’ll be following along step by step, be sure to have the following resources at your disposal:

An Azure Kubernetes Service cluster with at least one node with 64 vCPU (as called out in the ANN-Benchmarks readme), like Standard_D64_v4
An Azure NetApp Files capacity pool of service level Ultra with at least 30TiB available (A 30TiB volume provides 30Gbps throughput, roughly equivalent to the expected bandwidth of the Standard_D64_v4 node.)
NetApp Astra Trident™ installed on the AKS cluster, with a back-end configuration and storage class referencing the Azure NetApp Files capacity pool
An Azure Linux VM in the same virtual network as the AKS cluster, with helm and kubectl installed and configured to access your AKS cluster

Install Weaviate

We use the Kubernetes Helm chart to install Weaviate on the AKS cluster. First, SSH to the Linux VM that’s deployed in the same virtual network as your AKS cluster, then add the Weaviate repository:

helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update

To view the possible configuration values for the Weaviate Helm chart, run the following command:

helm show values weaviate/weaviate

Depending on your generative AI application, you may want to configure additional Weaviate replica pods or enable local machine learning (ML) models. For our performance benchmarking, we leave all the defaults except for the following settings:

cat <<EOF > values.yaml
storage:
  size: 30Ti
service:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
grpcService:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
EOF

As mentioned in the prerequisites, a 30TiB Azure NetApp Files Ultra volume provides 30Gbps of throughput, which is roughly equivalent to the 30,000Mbps of bandwidth provided by the Standard_D64_v4 AKS node. If you’re using a smaller AKS node, you can reduce your volume size to result in an equivalent throughput (each TiB of an Ultra volume provides 1Gbps of throughput).
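
The sizing rule above reduces to simple arithmetic. As a minimal sketch, assuming the approximate figure of 1Gbps of throughput per TiB of an Ultra volume quoted in this post (the function name and the rounding up to whole TiB are illustrative choices, not part of any Azure tooling), the volume size that matches a node's network bandwidth could be estimated like this:

```python
import math

# Approximate throughput per provisioned TiB on the Ultra service level,
# as quoted in this post (an approximation, not an exact service figure).
ULTRA_GBPS_PER_TIB = 1.0


def volume_size_tib(node_bandwidth_mbps: float) -> int:
    """Return the whole-TiB volume size whose throughput matches the node bandwidth."""
    node_gbps = node_bandwidth_mbps / 1000  # Mbps -> Gbps
    return max(1, math.ceil(node_gbps / ULTRA_GBPS_PER_TIB))


# Standard_D64_v4 is listed in this post with 30,000Mbps of bandwidth.
print(volume_size_tib(30_000))
```

The result is the value to place in the storage.size field of values.yaml (with the Ti suffix) if you use a different node size.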

The other two Helm settings assign internal IP addresses to the Weaviate HTTP and gRPC services, so network traffic stays confined to our internal virtual network.

To deploy Weaviate with these values, run the following command:

helm install weaviate -n weaviate --create-namespace weaviate/weaviate -f values.yaml

To check on the status of the deployment, run the following command:

kubectl -n weaviate get all,pvc

It takes less than a minute to get the external IPs populated, and about 5 to 10 minutes for the volume to go into a Bound state:

$ kubectl -n weaviate get all,pvc
NAME             READY   STATUS    RESTARTS   AGE
pod/weaviate-0   1/1     Running   0          8m21s
 
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
service/weaviate            LoadBalancer   172.16.213.188   10.20.0.8     80:31961/TCP      8m21s
service/weaviate-grpc       LoadBalancer   172.16.23.238    10.20.0.9     50051:30943/TCP   8m21s
service/weaviate-headless   ClusterIP      None             <none>        80/TCP            8m21s
 
NAME                        READY   AGE
statefulset.apps/weaviate   1/1     8m21s
 
NAME                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
persistentvolumeclaim/weaviate-data-weaviate-0   Bound    pvc-4c354c0d-29fe-4af8-a611-35e993c9ecab   30Ti       RWO            azure-netapp-files-ultra   8m21s

Depending on your virtual network settings, your external IPs will probably be different, but verify that they’re RFC 1918 internal IP addresses to ensure that network traffic stays on the internal virtual network. Take note of these IPs for use in the next section. Once the volume is bound and the weaviate-0 pod is in a Running state, we’re ready to start performance testing.
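
The RFC 1918 check can also be done programmatically. The following is a small sketch using Python's standard ipaddress module; the helper name is_rfc1918 is our own, and the sample addresses are the ones from the example output above.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


# External IPs of the weaviate and weaviate-grpc services in the example output.
for service_ip in ("10.20.0.8", "10.20.0.9"):
    print(service_ip, is_rfc1918(service_ip))
```

The explicit network list is used instead of the is_private attribute because is_private also covers other reserved ranges, such as loopback and link-local addresses.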

Approximate Nearest Neighbor (ANN) Benchmarks

ANN-Benchmarks is a benchmarking environment for approximate nearest neighbor (ANN) algorithms. ANN algorithms are used to find the nearest neighbors to a point in a dataset, where approximate means that the algorithm is allowed to return points that are close to the nearest neighbors, rather than the exact ones. This trade-off enables significantly faster processing times, which is especially useful when dealing with very large datasets.
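
To make the trade-off concrete, here is a minimal sketch of the exact search that ANN algorithms approximate. It is a plain brute-force scan written for illustration (the function name and sample points are ours) and is not how Weaviate or ANN-Benchmarks performs searches.

```python
import math


def exact_nearest_neighbors(dataset, query, k):
    """Return the indices of the k points closest to query (Euclidean distance).

    Every point is compared against the query, so the cost grows linearly
    with the dataset size. ANN indexes exist to avoid this full scan.
    """
    distances = [(math.dist(point, query), index) for index, point in enumerate(dataset)]
    distances.sort()
    return [index for _, index in distances[:k]]


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
print(exact_nearest_neighbors(points, (0.4, 0.1), k=2))
```

An exact scan like this is also what provides the ground truth against which the recall of an approximate search is measured.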

ANN Benchmarks Setup

ANN algorithms are an effective tool for testing vector databases because they are efficient with the large datasets that are typical of real-world applications such as recommendation systems and natural language processing. By simulating practical use cases, ANN-Benchmarks allows you to evaluate a vector database's ability to balance accuracy and speed, a critical aspect of user experience. These tests also offer insights into the scalability and resource efficiency of the databases, revealing how performance evolves with growing data volumes and complexity. ANN testing can also reveal the impact of the underlying infrastructure on the database's performance, which is vital for optimizing deployments.

The Weaviate test module in ANN-Benchmarks uses the v3 Weaviate client and an embedded Weaviate instance. Because the v3 client is deprecated and is no longer recommended, and we’re using an external (running on AKS) Weaviate instance, the test module must be modified. A fork of the ANN-Benchmarks repository has been created with these modifications. If you’re curious about the specific changes, see this diff.

From your workstation VM, run the following commands to clone the forked ANN-Benchmarks repository and change into the created directory:

git clone https://github.com/MichaelHaigh/ann-benchmarks.git
cd ann-benchmarks

We now install Python 3.10, which is the validated Python version for ANN-Benchmarks:

sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
sudo apt install -y python3.10 python3.10-distutils python3.10-venv

(This example is for Ubuntu; if you’re running a different flavor of Linux, the commands will be different.)

Next, we create our Python virtual environment and install the necessary packages:

python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install weaviate-client

Finally, open the weaviate module.py file with your favorite text editor:

vim ann_benchmarks/algorithms/weaviate/module.py

Take note of lines 14-21, and especially lines 15 and 18:

14            self.client = weaviate.connect_to_custom(
15                http_host="10.20.0.8",
16                http_port="80",
17                http_secure=False,
18                grpc_host="10.20.0.9",
19                grpc_port="50051",
20                grpc_secure=False,
21            )

Lines 15 and 18 must be updated with the external IPs of the weaviate and weaviate-grpc services, respectively, from the previous section. When complete, save the file and exit the text editor.
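
Editing the IPs in place works, but one possible alternative is to read the connection settings from environment variables. The sketch below is not part of the forked repository: the variable names (WEAVIATE_HTTP_HOST, WEAVIATE_GRPC_HOST, and the optional port variables) and both helper functions are hypothetical. It assumes the v4 weaviate-client, whose connect_to_custom function is the one called in module.py.

```python
import os


def connection_kwargs(env=os.environ):
    """Build keyword arguments for weaviate.connect_to_custom from environment variables.

    The variable names are hypothetical choices for this sketch; they are
    not read by ANN-Benchmarks or by Weaviate itself.
    """
    return {
        "http_host": env["WEAVIATE_HTTP_HOST"],  # external IP of service/weaviate
        "http_port": int(env.get("WEAVIATE_HTTP_PORT", "80")),
        "http_secure": False,
        "grpc_host": env["WEAVIATE_GRPC_HOST"],  # external IP of service/weaviate-grpc
        "grpc_port": int(env.get("WEAVIATE_GRPC_PORT", "50051")),
        "grpc_secure": False,
    }


def connect():
    """Open a client connection; requires the weaviate-client package (v4)."""
    import weaviate  # imported lazily so connection_kwargs works without the package

    return weaviate.connect_to_custom(**connection_kwargs())
```

With this approach you would export the two host variables in your shell before starting the benchmark, instead of changing the file each time the service IPs change.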

ANN Benchmarks – Glove 100 Angular

We're now ready to start our performance benchmarking with the following command:

python run.py --algorithm weaviate --local

Note

This task will take 1 to 2 days to complete, depending on your setup; it took 30 hours with the configuration just described.

The --algorithm argument instructs ANN-Benchmarks to run only the Weaviate tests, and the --local argument instructs it to run the tests “locally” rather than with the default Docker method. Because we’ve modified the test module to connect to our external Weaviate instance running on AKS, it’s not truly a “local” test.

The first action of the benchmark is to download the GloVe 100 Angular dataset. (A different dataset can be selected with the --dataset argument, as shown in the next section.) The benchmark then prints a long list showing the order in which the tests will run:

$ python run.py --algorithm weaviate --local
downloading https://ann-benchmarks.com/glove-100-angular.hdf5 -> data/glove-100-angular.hdf5...
2024-08-19 19:52:11,777 - annb - INFO - running only weaviate
2024-08-19 19:52:12,622 - annb - INFO - Order: [Definition(algorithm='weaviate', constructor='Weaviate', module='ann_benchmarks.algorithms.weaviate', docker_tag='ann-benchmarks-weaviate', arguments=['angular', 64, 128], query_argument_groups=[[16], [32], [48], [64], [96], [128], [256], [512], [768]], disabled=False), Definition(algorithm='weaviate', constructor='Weaviate',
...

In this example (the order varies because the tests are randomized), the first test breaks down as follows:

GloVe. Global vectors for word representation, where the vector representations of words are learned in such a way that the geometric relationships between the vectors capture semantic meaning.
100. The number of dimensions of the vectors in the dataset (100-dimensional).
Angular. The distance metric used to measure the similarity between vectors when searching for nearest neighbors. When the angular distance is used, the focus is on the orientation of the vectors, not their length, which is particularly useful for comparing word embeddings where the direction of the vector is more meaningful than its magnitude.
Arguments (64 and 128). The index build parameters. Weaviate builds an HNSW graph index, and in the upstream ANN-Benchmarks Weaviate module these two values set the maximum number of connections per node (maxConnections) and the size of the candidate list used while building the graph (efConstruction). Higher values produce a denser, more accurate graph, which can lead to a longer preprocessing stage because more candidate links must be evaluated for each inserted vector.
Query arguments (16, 32, 48, 64, 96, 128, 256, 512, and 768). The ef value, which is the size of the candidate list that is kept while searching the graph for the nearest neighbors to a query vector. A higher value means that the algorithm explores more candidates, which can increase the computational effort during the query phase but may also improve the likelihood of finding the true nearest neighbors, thus increasing recall.

These tests are controlled by the config.yml file located in the algorithm directory, so feel free to modify that file to reduce the number of tests, if desired.

After the tests have been running for a few hours, open the overview page of the Azure NetApp Files volume that backs the PVC in the Azure portal to view the volume’s metrics. Make sure that the “throughput limit reached” chart stays at 0; otherwise, your volume is too small relative to the bandwidth of your selected node.

After 1 to 2 days, the GloVe 100 Angular benchmarking will be complete, and we can move on to our next dataset.

ANN Benchmarks – Sift 128 Euclidean

Because the GloVe 100 Angular dataset is geared toward word vectors, we’ll next use the Sift 128 Euclidean dataset, which is geared toward image vectors:

Sift. Scale Invariant Feature Transform vectors capture local features of images and are widely used in computer vision tasks. Each vector in the dataset represents a distinct feature extracted from an image.
128. The number of dimensions of the vectors in the dataset (128-dimensional).
Euclidean. This term specifies the distance metric used to measure the similarity between vectors in the dataset; the smaller the distance the more similar the vectors. The Euclidean distance, also known as L2 norm or L2 distance, is the "ordinary" straight-line distance between two points in Euclidean space.
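
The difference between the two distance metrics can be shown in a few lines. This is an illustrative sketch in plain Python, with cosine distance used as a common stand-in for the angular metric; it is not the implementation used by Weaviate or ANN-Benchmarks.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


v = [1.0, 2.0]
scaled = [10.0, 20.0]  # same direction, ten times the length

print(euclidean_distance(v, scaled))  # large: the points are far apart
print(cosine_distance(v, scaled))     # approximately 0: the directions match
```

Scaling a vector changes its Euclidean distance to other points but leaves the cosine distance essentially unchanged, which is why angular metrics suit word embeddings and Euclidean distance suits feature vectors whose magnitude carries meaning.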

This time when we execute the benchmark, we’ll use the --dataset argument to specify this dataset:

python run.py --algorithm weaviate --local --dataset sift-128-euclidean

Again, this command can take 1 to 2 days to complete; in our testing it took roughly 24 hours. When complete, you can continue to run additional benchmarks with more datasets, if desired. However, we’ll now move on to analysis.

ANN Benchmarks Analysis

There are a handful of ways to analyze the results of our benchmark testing:

1. Run python plot.py, which creates a single image of the vector database(s) and dataset(s). This image can be heavily customized to specify the X and Y axis units and scales, in addition to several other options. (Run python plot.py --help to view all available configuration options.)
2. Run python create_website.py, which creates an HTML page with about a dozen graphs.
3. Run python data_export.py --out res.csv, which exports all results to a CSV file, which can be useful when additional post-processing is needed.
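
As an example of the post-processing that the CSV export makes possible, the sketch below picks, for each dataset, the fastest run that still meets a minimum recall. The column names (dataset, k-nn for recall, and qps) are assumptions about the export format, so check them against the header of your own res.csv. The sample rows are made-up numbers for illustration, not measured results.

```python
import csv
import io


def best_qps_at_recall(rows, min_recall, recall_col="k-nn", qps_col="qps"):
    """For each dataset, return the highest QPS among runs meeting min_recall."""
    best = {}
    for row in rows:
        if float(row[recall_col]) < min_recall:
            continue  # this run is not accurate enough
        qps = float(row[qps_col])
        dataset = row["dataset"]
        if qps > best.get(dataset, 0.0):
            best[dataset] = qps
    return best


# Made-up rows for illustration only; these are not measured results.
SAMPLE_CSV = """dataset,k-nn,qps
dataset-a,0.91,1200
dataset-a,0.97,800
dataset-a,0.99,300
dataset-b,0.96,1500
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(best_qps_at_recall(rows, min_recall=0.95))
```

To run it against a real export, replace the io.StringIO sample with an open file handle for res.csv and adjust the column names if your header differs.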

We’ll go with option 2 here, because a single command yields many interesting images. However, feel free to experiment with options 1 and 3 in your environment. On your workstation, run the following command:

python create_website.py

If your workstation has a desktop environment, open the weaviate.html file that was generated. Otherwise, run the following command to copy the file to your physical machine:

scp <user>@<ip>:/home/<user>/ann-benchmarks/weaviate.html weaviate.html

Once opened, scroll through the page to view the results. The entire page of results is included in the results section of this blog, but let’s dig into just two of the images.

The first chart plots recall against queries per second. Its axes represent:

Recall. The fraction of true nearest neighbors that are returned by the approximate nearest neighbor search. For example, if the ANN search is supposed to return the 10 nearest neighbors to a query point, but only 7 of those are among the true 10 nearest neighbors, the recall would be 0.7.
Queries per second. The number of queries that can be processed per second, indicating the performance or efficiency of the vector database. In general, spending more time on each query (that is, accepting fewer queries per second) should result in a higher recall value.
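
The recall example above can be reproduced with a few lines of Python. This is a simplified, ID-based sketch with a function name of our own choosing; the metric that ANN-Benchmarks computes internally may differ in detail.

```python
def recall_at_k(approximate_ids, true_ids):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    true_set = set(true_ids)
    hits = sum(1 for item in set(approximate_ids) if item in true_set)
    return hits / len(true_set)


true_neighbors = list(range(10))                       # IDs 0 through 9
approx_neighbors = [0, 1, 2, 3, 4, 5, 6, 42, 43, 44]   # 7 of the 10 are correct
print(recall_at_k(approx_neighbors, true_neighbors))
```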

Values toward the upper right are better, so in these runs Weaviate performed better with the Sift 128 Euclidean dataset than with the GloVe 100 Angular dataset. This suggests that Weaviate may be better suited to computer vision tasks than to natural language processing. However, we recommend testing against additional datasets and vector databases to find the best match for your specific application.
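
"Toward the upper right is better" can be made precise as a Pareto frontier: the set of runs that no other run beats on both recall and queries per second. The following is a minimal sketch of that selection. The function name is ours and the input pairs are made-up numbers, not measured results.

```python
def pareto_frontier(points):
    """Keep the (recall, qps) points that no other point beats on both axes."""
    frontier = []
    best_qps = float("-inf")
    # Walk from highest recall to lowest; keep a point only if it is faster
    # than every point with equal or higher recall seen so far.
    for recall, qps in sorted(points, key=lambda p: (-p[0], -p[1])):
        if qps > best_qps:
            frontier.append((recall, qps))
            best_qps = qps
    return sorted(frontier)


# Made-up (recall, queries per second) pairs, not measured results.
runs = [(0.80, 2000), (0.90, 1500), (0.85, 1000), (0.99, 400), (0.95, 300)]
print(pareto_frontier(runs))
```

Runs that fall off the frontier, such as the (0.85, 1000) pair here, are dominated: another configuration gives both higher recall and higher throughput.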

Let’s investigate one more chart:

While the previous chart focused purely on the query phase, this chart focuses on the trade-off between the quality of the search results and the memory footprint of the index. The X axis (Recall) is the same; however, the Y axis represents the amount of memory used by the vector database to store the data structure that facilitates the neighbor search. As we can see, for certain levels of recall, Weaviate has a lower memory footprint for the GloVe 100 Angular dataset, but for other levels of recall the Sift 128 Euclidean dataset’s memory footprint is lower.

Note

For more information about the tooltip, see this page of the Weaviate documentation.

Depending on your generative AI application, you may value memory footprint over query speed—for example, a computer vision application in embedded systems. Other applications, like a chatbot or coding assistant, may value query speed over memory footprint. Performing benchmarks against potential vector databases with relevant datasets can help determine the ideal configuration for your generative AI applications.

Results

The remaining results of the ANN-Benchmarks testing are shown here.

Conclusion

The deployment and benchmarking of Weaviate on Azure Kubernetes Service with Azure NetApp Files demonstrates the platform's robust capabilities in handling generative AI workloads. The detailed walk-through in this blog simplifies the setup process, and it also equips users with the necessary insights to make informed decisions about their vector database deployments.

The results from the ANN-Benchmarks reveal valuable performance metrics that are essential for optimizing AI applications. Weaviate's impressive handling of the Sift 128 Euclidean dataset suggests a strong suit in computer vision tasks, and its performance with the GloVe 100 Angular dataset opens avenues for natural language processing applications. However, users must consider the specific requirements of their applications, because trade-offs can significantly impact the user experience and operational costs.

By leveraging Azure's scalable infrastructure and Weaviate's vector search capabilities, developers and organizations can confidently scale their AI solutions, knowing that they have a reliable and efficient system in place. The benchmarks are a testament to the potential of Weaviate on AKS and Azure NetApp Files, providing a solid foundation for future generative AI endeavors. Whether your focus is on maximizing recall, query throughput, or maintaining a minimal memory footprint, this setup means that you can achieve your goals with efficiency and precision.