If 2023 was the year of GenAI prototypes, 2024 is the year RAG applications go into production. To be production-ready, your retrieval system must deliver on two fronts: it must provide highly relevant results, and it must be cost-effective so it can grow with your app’s success.
Today, we are happy to announce several improvements to Azure AI Search. We have dramatically increased storage capacity and vector index size for new services at no additional cost, positioning Azure AI Search as one of the most cost-effective options on the market. In addition, vector search now supports quantization, narrow numeric types for vectors, and options to reduce vector field storage overhead (in preview).
With these announcements, Azure AI Search delivers an enterprise-ready, full-featured retrieval system with advanced search technology, without sacrificing cost or performance. As a result, your app can deliver high-quality experiences for every user and interaction, with no compromises.
New services in the Basic and Standard tiers in select regions now have more storage capacity and compute for high performance retrieval of vectors, text, and metadata. On average, cost per vector is reduced by 85% and you’ll save on total storage costs per GB by up to 75% or more. For example, in an S1 search service you can store 28M vectors with 768 dimensions for $1/hour, a savings of 91% over our previous vector limits.
New services will have:
We’re also announcing a new set of options for vector search, in preview, to control performance and reduce storage cost:
These vector search enhancements are available in existing search services using the new 2024-03-01-Preview release of the data plane REST API.
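As a rough illustration of how these options might appear in an index definition, the sketch below builds a request body and URL without sending anything. The property names (`compressions`, `scalarQuantization`, `stored`, `Collection(Edm.Half)`) reflect our understanding of the preview API and should be checked against the REST reference; the service, index, field, and profile names are hypothetical.

```python
import json

# Hypothetical names used only for this sketch.
ALGORITHM = "hnsw-default"
COMPRESSION = "sq-int8"
PROFILE_QUANTIZED = "profile-quantized"
PROFILE_PLAIN = "profile-plain"

# Assumed shape of an index definition using the preview vector options.
INDEX_DEFINITION = {
    "name": "docs-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            # float32 vectors with quantization applied through the profile;
            # "stored": False is assumed to skip the retrievable copy.
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "retrievable": False,
            "stored": False,
            "dimensions": 768,
            "vectorSearchProfile": PROFILE_QUANTIZED,
        },
        {
            # Narrow numeric type: half-precision floats instead of float32.
            "name": "titleVector",
            "type": "Collection(Edm.Half)",
            "searchable": True,
            "dimensions": 768,
            "vectorSearchProfile": PROFILE_PLAIN,
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": ALGORITHM, "kind": "hnsw"}],
        "compressions": [
            {
                "name": COMPRESSION,
                "kind": "scalarQuantization",
                "scalarQuantizationParameters": {"quantizedDataType": "int8"},
            }
        ],
        "profiles": [
            {
                "name": PROFILE_QUANTIZED,
                "algorithm": ALGORITHM,
                "compression": COMPRESSION,
            },
            {"name": PROFILE_PLAIN, "algorithm": ALGORITHM},
        ],
    },
}


def build_request(service_name, api_version="2024-03-01-Preview"):
    """Return the (url, json_body) pair for a create-or-update index call."""
    index_name = INDEX_DEFINITION["name"]
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}?api-version={api_version}"
    )
    return url, json.dumps(INDEX_DEFINITION, indent=2)
```

The body would be sent with an HTTP `PUT` and an `api-key` header; that call is left out here so the sketch stays runnable offline.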
The table below details the change in total storage per partition for each service tier:
| Service Tier | Current Storage per Partition | New Storage per Partition | Storage Increase per Partition | Current Total Storage per Service | New Total Storage per Service |
| --- | --- | --- | --- | --- | --- |
| Basic | 2 GB | 15 GB | 7.5x | 2 GB | 45 GB |
| S1 | 25 GB | 160 GB | 6.4x | 300 GB | 1.88 TB |
| S2 | 100 GB | 350 GB | 3.5x | 1.17 TB | 4.1 TB |
| S3 | 200 GB | 700 GB | 3.5x | 2.34 TB | 8.2 TB |
| L1 | 1 TB | No change | N/A | 12 TB | No change |
| L2 | 2 TB | No change | N/A | 24 TB | No change |
The table below details the change in vector index size for each service tier:
| Service Tier | Current Vector Index Size per Partition | New Vector Index Size per Partition | Vector Index Size Increase per Partition | Current Total Vector Index Size per Service | New Total Vector Index Size per Service |
| --- | --- | --- | --- | --- | --- |
| Basic | 1 GB | 5 GB | 5x | 1 GB | 15 GB |
| S1 | 3 GB | 35 GB | 11.5x | 36 GB | 420 GB |
| S2 | 12 GB | 100 GB | 8.3x | 144 GB | 1.17 TB |
| S3 | 36 GB | 200 GB | 5.6x | 432 GB | 2.34 TB |
| L1 | 12 GB | No change | N/A | 144 GB | No change |
| L2 | 36 GB | No change | N/A | 432 GB | No change |
Based on the new limits, here are some estimates of maximum vector workload sizes you can expect:
| SKU | Max Vector Count (1536 dims, 1 partition) | Max Vector Count (256 dims, 1 partition) | Max Vector Count (1536 dims, 12 partitions) | Max Vector Count (256 dims, 12 partitions) |
| --- | --- | --- | --- | --- |
| Basic | 700k | 4.7M | 2.4M | 14M |
| S1 | 5.5M | 33M | 66M | 396M |
| S2 | 15M | 94M | 188M | 1B |
| S3 | 31M | 189M | 377M | 2B |
Several factors affect the number of vectors your service can hold, such as your choice of HNSW parameters and the number of deleted documents. The figures above are estimates that assume float32 vectors with 10% overhead from the HNSW vector index. Learn more about the factors that affect vector index size in the Azure AI Search documentation.
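The arithmetic behind estimates like these can be sketched as follows. This is a back-of-the-envelope calculation under the stated assumptions (4 bytes per float32 component, 10% HNSW overhead), not the service's actual accounting; the function name is hypothetical, and treating 1 GB as 1024³ bytes is our assumption.

```python
def estimate_max_vectors(index_size_gb, dimensions, bytes_per_component=4,
                         overhead=0.10, partitions=1):
    """Estimate how many vectors fit in a vector index quota.

    index_size_gb       -- vector index size per partition, in GB
    dimensions          -- number of components per vector
    bytes_per_component -- 4 for float32 (the assumption used above)
    overhead            -- fractional HNSW overhead (0.10 = 10%)
    partitions          -- number of partitions in the service
    """
    # Assumption: 1 GB = 1024**3 bytes.
    total_bytes = index_size_gb * 1024**3 * partitions
    bytes_per_vector = dimensions * bytes_per_component * (1 + overhead)
    return int(total_bytes / bytes_per_vector)


if __name__ == "__main__":
    # New per-partition vector index sizes from the table above.
    new_limits_gb = {"Basic": 5, "S1": 35, "S2": 100, "S3": 200}
    for sku, size_gb in new_limits_gb.items():
        for dims in (1536, 256):
            count = estimate_max_vectors(size_gb, dims)
            print(f"{sku}: ~{count / 1e6:.1f}M vectors at {dims} dims, 1 partition")
```

For an S1 partition (35 GB) this lands at roughly 5.5M vectors at 1536 dimensions and about 33M at 256 dimensions, in line with the table. Passing a smaller `bytes_per_component` shows how a narrower numeric type would change the raw estimate.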
Additional details about the changes we announced today: