Announcing cost-effective RAG at scale with Azure AI Search
Published Apr 04 2024 09:00 AM
Microsoft

If 2023 was the year of GenAI prototypes, 2024 is the year RAG applications go into production. To be production ready, your retrieval system must deliver on two fronts: it must provide highly relevant results and be cost effective so it’s ready to grow with your app’s success.

 

Today, we are happy to announce several improvements to Azure AI Search. We have dramatically increased storage capacity and vector index size for new services at no additional cost, positioning Azure AI Search as one of the most cost-effective options on the market. In addition, vector search now supports quantization, narrow numeric types for vectors, and options to reduce vector field storage overhead (all in preview).

 

With these announcements, Azure AI Search delivers an enterprise-ready, full-featured retrieval system with advanced search technology without sacrificing cost or performance. As a result, your app can deliver high-quality experiences for every user and interaction, with no compromises.

 

Support for larger vector workloads  

 

New services in the Basic and Standard tiers in select regions now have more storage capacity and compute for high performance retrieval of vectors, text, and metadata. On average, cost per vector is reduced by 85% and you’ll save on total storage costs per GB by up to 75% or more. For example, in an S1 search service you can store 28M vectors with 768 dimensions for $1/hour, a savings of 91% over our previous vector limits.

 

New services will have: 

  • 3x to 6x increase in total storage per partition 
  • 5x to 11x increase in vector index size per partition 
  • Additional compute backing the service, which supports more vectors at high performance and delivers up to 2x improvement in indexing and query throughput. 

 

New vector search features to optimize vector storage

 

We’re also announcing a new set of options for vector search, in preview, to control performance and reduce storage cost: 

  • Use quantization and oversampling to compress and optimize vector data storage. This reduces vector index size by 75% and vector storage on disk by ~25%. 
  • Set the Stored property on vector fields to reduce vector storage overhead, with an expected storage reduction of ~50% for vector fields using exhaustive KNN and ~25% for vector fields using HNSW. 
  • Use narrow vector field primitive types such as int8, int16, or float16 to reduce vector index size and vector storage on disk by up to 75%. 
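The "up to 75%" figure for narrow types follows from element width alone. Here is a minimal sketch, assuming vector size scales linearly with bytes per dimension relative to a float32 baseline and ignoring index overhead. The `BYTES_PER_DIM` table and the function name are illustrative and are not part of any Azure API.

```python
# Bytes used per vector dimension for each primitive type.
BYTES_PER_DIM = {"float32": 4, "float16": 2, "int16": 2, "int8": 1}


def reduction_vs_float32(dtype: str) -> float:
    """Fractional size reduction relative to a float32 vector of the same dimensions."""
    return 1 - BYTES_PER_DIM[dtype] / BYTES_PER_DIM["float32"]


for dtype in ("float16", "int16", "int8"):
    print(f"{dtype}: {reduction_vs_float32(dtype):.0%} smaller than float32")
```

Only the one-byte type reaches the 75% ceiling; the two-byte types halve the size. The same arithmetic is consistent with the 75% index-size reduction quoted for quantization, if float32 values are quantized to one byte each.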

These vector search enhancements are available in existing search services using the new 2024-03-01-Preview release of the data plane REST API.
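As a rough illustration, the sketch below builds an index definition that combines a narrow field type, the Stored property, and scalar quantization with oversampling. Apart from `stored`, the property names (`compressions`, `scalarQuantization`, `defaultOversampling`, `rerankWithOriginalVectors`, `Collection(Edm.Half)`) are recalled from the preview REST reference and should be treated as assumptions to verify there. The index, field, and profile names are hypothetical. The code only prints the JSON payload and does not call the service.

```python
import json

API_VERSION = "2024-03-01-Preview"  # data plane API version named in the announcement

index_definition = {
    "name": "example-index",  # hypothetical index name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {
            "name": "embedding",             # hypothetical field name
            "type": "Collection(Edm.Half)",  # assumed type name for float16 vectors
            "searchable": True,
            "retrievable": False,
            "stored": False,                 # skip the extra retrievable copy of the vector
            "dimensions": 768,
            "vectorSearchProfile": "compressed-hnsw",
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "hnsw-default", "kind": "hnsw"}],
        "compressions": [
            {
                "name": "int8-quantization",
                "kind": "scalarQuantization",
                "rerankWithOriginalVectors": True,
                "defaultOversampling": 10.0,  # illustrative value, tune for your data
                "scalarQuantizationParameters": {"quantizedDataType": "int8"},
            }
        ],
        "profiles": [
            {
                "name": "compressed-hnsw",
                "algorithm": "hnsw-default",
                "compression": "int8-quantization",
            }
        ],
    },
}

print(f"Index definition for api-version={API_VERSION}:")
print(json.dumps(index_definition, indent=2))
```

Whether all three options are worth combining depends on the workload. Each one trades some recall, retrievability, or precision for storage, so it is worth measuring relevance after enabling each.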

 

Details about increased capacity 

 

The table below details the change in total storage per partition for each service tier:

| Service Tier | Current Storage per Partition | New Storage per Partition | Storage Increase per Partition | Current Total Storage per Service | New Total Storage per Service |
|---|---|---|---|---|---|
| Basic | 2 GB | 15 GB | 7.5x | 2 GB | 45 GB |
| S1 | 25 GB | 160 GB | 6.4x | 300 GB | 1.88 TB |
| S2 | 100 GB | 350 GB | 3.5x | 1.17 TB | 4.1 TB |
| S3 | 200 GB | 700 GB | 3.5x | 2.34 TB | 8.2 TB |
| L1 | 1 TB | No change | N/A | 12 TB | No change |
| L2 | 2 TB | No change | N/A | 24 TB | No change |

 

The table below details the change in vector index size for each service tier:

| Service Tier | Current Vector Index Size per Partition | New Vector Index Size per Partition | Vector Index Size Increase per Partition | Current Total Vector Index Size per Service | New Total Vector Index Size per Service |
|---|---|---|---|---|---|
| Basic | 1 GB | 5 GB | 5x | 1 GB | 15 GB |
| S1 | 3 GB | 35 GB | 11.5x | 36 GB | 420 GB |
| S2 | 12 GB | 100 GB | 8.3x | 144 GB | 1.17 TB |
| S3 | 36 GB | 200 GB | 5.6x | 432 GB | 2.34 TB |
| L1 | 12 GB | No change | N/A | 144 GB | No change |
| L2 | 36 GB | No change | N/A | 432 GB | No change |
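The cost-per-vector savings quoted earlier can be reproduced from the per-partition vector index limits in this table. The sketch assumes the price per partition is unchanged (the announcement says the increase comes at no additional cost), so cost per vector falls in proportion to the old-to-new capacity ratio. The function and variable names are illustrative.

```python
# (current GB, new GB) of vector index size per partition, copied from the table above.
VECTOR_INDEX_GB = {
    "Basic": (1, 5),
    "S1": (3, 35),
    "S2": (12, 100),
    "S3": (36, 200),
}


def cost_per_vector_savings(old_gb: float, new_gb: float) -> float:
    """Fractional drop in cost per vector when capacity grows at a fixed price."""
    return 1 - old_gb / new_gb


savings = {
    tier: cost_per_vector_savings(old, new)
    for tier, (old, new) in VECTOR_INDEX_GB.items()
}
average = sum(savings.values()) / len(savings)

for tier, value in savings.items():
    print(f"{tier}: {value:.1%}")
print(f"average: {average:.1%}")
```

Under that assumption, S1 comes out at roughly 91% and the four-tier average at roughly 85%, in line with the figures stated in the announcement.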

 

Based on the new limits, here are some estimates of maximum vector workload sizes you can expect:

| SKU | Max Vector Count, 1536 dims, 1 partition | Max Vector Count, 256 dims, 1 partition | Max Vector Count, 1536 dims, 12 partitions | Max Vector Count, 256 dims, 12 partitions |
|---|---|---|---|---|
| Basic | 700k | 4.7M | 2.4M | 14M |
| S1 | 5.5M | 33M | 66M | 396M |
| S2 | 15M | 94M | 188M | 1B |
| S3 | 31M | 189M | 377M | 2B |

 

Several factors affect the number of vectors your service can hold, such as your choice of HNSW parameters and the number of deleted documents. The figures above are estimates that assume float32 vectors with 10% overhead from the HNSW vector index. Learn more about the factors that affect vector index size in the Azure AI Search documentation.
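For a back-of-the-envelope estimate on your own dimensions, the stated assumptions (4 bytes per float32 dimension plus 10% HNSW overhead) can be turned into a small calculator. This is a sketch, not the service's actual accounting. It assumes the limits are binary gigabytes (1024^3 bytes), which is a guess that happens to land near the S1 row of the table, and the function name is illustrative.

```python
GIB = 1024 ** 3  # assumption: quota gigabytes are binary (GiB)


def max_vectors(index_gb: float, dims: int,
                bytes_per_dim: int = 4, overhead: float = 0.10) -> int:
    """Estimate how many vectors fit in a vector index quota.

    bytes_per_dim=4 corresponds to float32; overhead is the HNSW index
    overhead expressed as a fraction of the raw vector size.
    """
    bytes_per_vector = dims * bytes_per_dim * (1 + overhead)
    return int(index_gb * GIB / bytes_per_vector)


# S1: 35 GB of vector index per partition.
print(f"S1, 1536 dims, 1 partition: {max_vectors(35, 1536):,}")
print(f"S1, 256 dims, 1 partition:  {max_vectors(35, 256):,}")
print(f"S1, 1536 dims, 12 partitions: {max_vectors(35 * 12, 1536):,}")
```

For S1 this gives about 5.5 million vectors at 1536 dimensions and about 33 million at 256 dimensions per partition. Other rows of the table are rounded and may reflect factors this sketch does not model, so treat the output as a starting point only.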

 

Additional details about the changes we announced today: 

  • Search services created before April 3, 2024 will not see any changes to their storage limits. 
  • The Basic service tier now supports up to 3 partitions with up to 45 GB of total storage, up from a previous maximum of 2 GB. 
  • Storage limits for the Storage Optimized (L-series) service tiers will not change at this time.
  • Per index storage limits apply for new services in some service tiers. See the Azure AI Search documentation for more information.

 

Getting started with Azure AI Search 

 

Last update: Apr 04 2024 08:44 AM