If 2023 was the year of GenAI prototypes, 2024 is the year RAG applications go into production. To be production ready, your retrieval system must deliver on two fronts: it must provide highly relevant results and be cost effective so it’s ready to grow with your app’s success.
Today, we are happy to announce several improvements to Azure AI Search. We have dramatically increased storage capacity and vector index size for new services at no additional cost, positioning Azure AI Search as one of the most cost-effective options on the market. In addition, vector search now supports quantization, narrow numeric types for vectors, and has options to reduce vector field storage utilization for use cases where some capabilities aren’t required (in preview).
With these announcements, Azure AI Search delivers an enterprise-ready, full-featured retrieval system with advanced search technology, without sacrificing cost or performance. The result: your app can deliver high-quality experiences for every user and interaction, with no compromises.
Customer Momentum
Azure AI Search enables more customers, like KPMG and AT&T, to bring their GenAI applications into production at scale.
KPMG
“Building ‘Advisory Content Chat’ with Azure AI Search allows us to deliver scalable, high quality knowledge access to more than 10,000 U.S. KPMG Advisory employees and soon 40,000 KPMG Advisory employees globally. By implementing enterprise RAG, leaving the data in place and honoring entitlements, we have created Advisory Content Chat as a solution to serve our people and bring to our clients as well.”
- Matt Bishop, Chief Technology Officer, Advisory, KPMG LLP
AT&T
AT&T, a pioneer in the telecommunications industry, has helped shape machine learning and AI technology for decades. Over the past year, AT&T has built a robust generative AI platform, Ask AT&T, to improve productivity and deliver better results for its employees and customers. The platform applies question answering, summarization, and classification of documents, data, and images across various areas of the business. Ask AT&T is being used by more than 80,000 internal and external users across its developer teams, supply chain, human resources, and more. Given its massive scale, AT&T needed an information retrieval system that could handle its retrieval augmented generation (RAG) workloads and grow with the business.
“To teach Ask AT&T about AT&T, we rely on Azure and Azure AI Search to support our in-production RAG-based applications at scale, and use search capabilities like vector, text, hybrid, and filtered search to quickly retrieve answers no matter if they are in images, tables or text in documents. With increased vector capacity, we will continue to expand our GenAI use cases and ensure high performance for our applications without compromising cost."
- Mark Austin, Vice President, Data Science, AT&T
Support for larger vector workloads
New services in the Basic and Standard tiers in select regions now have more storage capacity and compute for high-performance retrieval of vectors, text, and metadata. On average, cost per vector is reduced by 88%, and total storage cost per GB drops by 75% or more. For example, an S1 search service can store 28M vectors with 768 dimensions for $1/hour, a savings of 91% over our previous vector limits.
New services will have:
- 3x to 6x increase in total storage per partition
- 5x to 12x increase in vector index size per partition
- Additional compute backing the service, supporting more vectors at high performance and up to 2x higher indexing and query throughput.
New vector search features to optimize vector storage
We’re also announcing a new set of options for vector search, in preview, to control performance and reduce storage cost:
- Use quantization and oversampling to compress and optimize vector data storage. This reduces vector index size for Edm.Single fields by 75% and vector storage on disk by ~25%.
- Set the stored property on vector fields to reduce vector storage overhead, with an approximate storage reduction of ~50% for vector fields using exhaustive KNN and ~25% for vector fields using HNSW.
- Use narrow vector field primitive types such as binary, int8, int16, or float16, to reduce vector index size and vector storage on disk by up to 95%.
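To see where savings on that order come from, here is a back-of-envelope comparison of per-vector storage across element types (a sketch using the standard byte widths of each type; the packed binary type stores one bit per dimension):

```python
# Approximate bytes needed to store one 1536-dimension vector,
# by vector field element type.
DIMS = 1536

bytes_per_vector = {
    "float32 (Edm.Single)": DIMS * 4,       # 4 bytes per dimension (baseline)
    "float16 (Edm.Half)":   DIMS * 2,       # 2 bytes per dimension
    "int16 (Edm.Int16)":    DIMS * 2,
    "int8 (Edm.SByte)":     DIMS * 1,
    "packed binary (Edm.Byte)": DIMS // 8,  # 1 bit per dimension
}

baseline = bytes_per_vector["float32 (Edm.Single)"]
for name, size in bytes_per_vector.items():
    savings = 100 * (1 - size / baseline)
    print(f"{name:26s} {size:5d} bytes  ({savings:.1f}% smaller)")
```

Relative to a float32 baseline, int8 cuts per-vector storage by 75% and packed binary by over 95%, which is where the headline figure comes from.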
The binary vector data type is available in the 2024-05-01-Preview release of the data plane REST API. The other vector search enhancements listed above are available in existing search services using the new 2024-03-01-Preview release of the data plane REST API.
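To make this concrete, here is a sketch of how scalar quantization, the stored property, and a vector field might appear together in an index definition under the 2024-03-01-Preview REST API. The index, field, algorithm, and profile names are illustrative, and the exact schema may differ; consult the REST API reference for the authoritative shape:

```json
{
  "name": "my-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    {
      "name": "contentVector",
      "type": "Collection(Edm.Single)",
      "dimensions": 1536,
      "searchable": true,
      "stored": false,
      "vectorSearchProfile": "compressed-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [ { "name": "hnsw-1", "kind": "hnsw" } ],
    "compressions": [
      {
        "name": "sq-1",
        "kind": "scalarQuantization",
        "rerankWithOriginalVectors": true,
        "defaultOversampling": 10,
        "scalarQuantizationParameters": { "quantizedDataType": "int8" }
      }
    ],
    "profiles": [
      { "name": "compressed-profile", "algorithm": "hnsw-1", "compression": "sq-1" }
    ]
  }
}
```

A narrow type such as Collection(Edm.Half) would replace Collection(Edm.Single) in the field definition when you choose a narrow primitive type instead of quantization.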
Details about increased capacity
The table below details the change in total storage per partition for each service tier:
| Service Tier | Previous Storage per Partition | New Storage per Partition | Storage Increase per Partition | Previous Total Storage per Service | New Total Storage per Service |
|---|---|---|---|---|---|
| Basic | 2 GB | 15 GB | 7.5x | 2 GB | 45 GB |
| S1 | 25 GB | 160 GB | 6.4x | 300 GB | 1.88 TB |
| S2 | 100 GB | 512 GB | 5.0x | 1.17 TB | 6 TB |
| S3 | 200 GB | 1 TB | 5.0x | 2.34 TB | 12 TB |
| L1 | 1 TB | 2 TB | 2.0x | 12 TB | 24 TB |
| L2 | 2 TB | 4 TB | 2.0x | 24 TB | 48 TB |
The table below details the change in vector index size for each service tier:
| Service Tier | Previous Vector Index Size per Partition | New Vector Index Size per Partition | Vector Index Size Increase per Partition | Previous Total Vector Index Size per Service | New Total Vector Index Size per Service |
|---|---|---|---|---|---|
| Basic | 1 GB | 5 GB | 5x | 1 GB | 15 GB |
| S1 | 3 GB | 35 GB | 11.5x | 36 GB | 420 GB |
| S2 | 12 GB | 150 GB | 12.5x | 144 GB | 1.75 TB |
| S3 | 36 GB | 300 GB | 8.3x | 432 GB | 3.52 TB |
| L1 | 12 GB | 150 GB | 12.5x | 144 GB | 1.75 TB |
| L2 | 36 GB | 300 GB | 8.3x | 432 GB | 3.52 TB |
Based on the new limits, here are some estimates of maximum vector workload sizes you can expect:
| Service Tier | Max Vector Count (1536 dims, 1 partition) | Max Vector Count (256 dims, 1 partition) | Max Vector Count (1536 dims, 12 partitions) | Max Vector Count (256 dims, 12 partitions) |
|---|---|---|---|---|
| Basic | 700k | 4.7M | 2.4M | 14M |
| S1 | 5.5M | 33M | 66M | 396M |
| S2 | 22M | 141M | 264M | 1.5B |
| S3 | 46M | 283M | 552M | 3B |
| L1 | 22M | 141M | 264M | 1.5B |
| L2 | 46M | 283M | 552M | 3B |
There are several factors that can affect the number of vectors your service can hold, such as your choice of HNSW parameters and deleted document count. These are estimates assuming an Edm.Single vector field with 10% overhead from the HNSW vector index. These estimates scale up by a factor of ~4x when using scalar quantization or an Edm.SByte vector field, or ~32x when using an Edm.Byte packed binary field. Learn more about the factors that affect vector index size in the Azure AI Search documentation.
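The arithmetic behind these estimates can be sketched as follows, assuming 4 bytes per dimension for an Edm.Single field and the 10% HNSW overhead noted above (a rough upper bound; real services also account for deleted documents and HNSW parameter choices):

```python
def estimate_max_vectors(vector_index_gb, dims, partitions=1,
                         bytes_per_dim=4, hnsw_overhead=0.10):
    """Rough upper bound on vector count for a given vector index quota."""
    quota_bytes = vector_index_gb * partitions * 2**30   # quota, treating GB as GiB
    bytes_per_vector = dims * bytes_per_dim * (1 + hnsw_overhead)
    return int(quota_bytes / bytes_per_vector)

# S1: 35 GB vector index per partition, 1536-dim Edm.Single vectors
print(estimate_max_vectors(35, 1536))  # roughly 5.5 million, in line with the S1 row above
```

Scaling `bytes_per_dim` down reproduces the multipliers mentioned earlier: 1 byte per dimension (scalar quantization or Edm.SByte) gives ~4x more vectors, and 1 bit per dimension (Edm.Byte packed binary) gives ~32x more.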
Additional details about the changes announced here:
- Search services created before April 3, 2024 will not see any changes to their storage limits.
- Storage optimized tier search services created before May 17, 2024 will not see any changes to their storage limits.
- Basic service tier now supports up to 3 partitions with up to 45 GB of total storage, up from a previous maximum of 2 GB.
- Per index storage limits apply for new services in some service tiers. See the Azure AI Search documentation for more information.
Changes were made to this post since it was originally published. Storage limits for S2 and S3 services created after April 3, 2024 have been increased a total of 5x over their initial limits. New storage optimized tier services created after May 17, 2024 have 2x their initial storage limits. Also, added support for packed binary vectors.
Getting started with Azure AI Search
- More information about the Azure AI Search service limits.
- Learn more about vector quantization and narrow data type enhancements.
- Learn more about Azure AI Search and all the latest features.
- Start creating a search service in the Azure Portal, Azure CLI, the Management REST API, ARM template, or a Bicep file.
- Go from zero to hero with our RAG Solution Accelerator
- Read the blog: Outperforming vector search with hybrid retrieval and ranking capabilities
- Watch a video on Microsoft Mechanics: How vector search and semantic ranking improve your AI prompts
Updated May 22, 2024
pablocastro, Microsoft
AI - Azure AI services Blog