Introduction
Azure AI Search has recently introduced two features aimed at optimizing vector search implementations: Scalar Quantization and the "stored" property for vector fields. Together they address one of the most significant challenges in vector search: managing storage costs while maintaining search quality. In this guide, we'll explore these features, how to implement them, and their impact on storage and search performance.
Understanding the Optimization Features
Azure AI Search now offers several complementary approaches to optimize vector storage:
- Vector Compression
  - Scalar Quantization: converts high-dimensional floating-point vectors to lower-bit integer representations
  - Binary Quantization: transforms vectors into binary (bit-based) representations
- Dimension Reduction
  - Allows truncating high-dimensional vectors to fewer dimensions
  - Can be combined with compression techniques
- Storage Controls
  - The "stored" property controls whether original vectors are preserved
  - Vectors can be searchable without being stored
Scalar Quantization
Scalar Quantization is a compression technique that transforms high-dimensional floating-point vectors into lower-bit integer representations. Think of it like image compression: just as JPEG reduces file size while preserving visual quality, Scalar Quantization compresses vector data while maintaining search relevance (a conceptual sketch follows the list below).
Key benefits:
- Reduced storage requirements
- Lower memory usage
- Maintained search accuracy with proper configuration
- Cost savings for large-scale deployments
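To build intuition before diving into the Azure implementation, here is a minimal, illustrative numpy sketch of int8 scalar quantization. This is not how the service implements it internally (Azure AI Search handles quantization for you); it only shows the core idea of mapping each float component to one of 256 integer buckets between the observed minimum and maximum values.

import numpy as np

# A toy float32 "embedding" (real embeddings would have e.g. 1536 dimensions)
vector = np.random.default_rng(0).standard_normal(8).astype(np.float32)

# Map each component into one of 256 buckets between the min and max values
v_min, v_max = float(vector.min()), float(vector.max())
scale = (v_max - v_min) / 255.0
buckets = np.round((vector - v_min) / scale)   # values in [0, 255]
quantized = (buckets - 128).astype(np.int8)    # shift into int8 range [-128, 127]

# Approximate reconstruction, conceptually what reranking with originals avoids
dequantized = (quantized.astype(np.float32) + 128) * scale + v_min

print(f"original: {vector.nbytes} bytes, quantized: {quantized.nbytes} bytes")  # 4x smaller
print(f"max reconstruction error: {np.abs(vector - dequantized).max():.4f}")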
Implementation Guide
Let's walk through a complete implementation of these features using Python. We'll cover everything from setup to optimization and testing.
1. Environment Setup
First, let's install the necessary dependencies:
pip install azure-search-documents==11.6.0b3
pip install azure-identity
pip install numpy
pip install python-dotenv
2. Initial Configuration
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Configuration
SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
SEARCH_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")
DIMENSIONS = 1536 # Vector dimensions (e.g., for OpenAI embeddings)
# Initialize clients
credential = AzureKeyCredential(SEARCH_KEY)
index_client = SearchIndexClient(endpoint=SEARCH_ENDPOINT, credential=credential)
3. Creating Optimized Indexes
Here's a comprehensive implementation for creating indexes with different optimization configurations:
from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchField, SearchFieldDataType,
    VectorSearch, VectorSearchProfile, HnswAlgorithmConfiguration,
    HnswParameters, ScalarQuantizationCompressionConfiguration,
    ScalarQuantizationParameters,
)
def create_optimized_index(scenario: str) -> SearchIndex:
    """Create an optimized search index for one of four test scenarios:
    "baseline", "stored" (stored=False), "scalar" (quantization),
    or "both" (stored=False plus quantization)."""
    use_compression = scenario in ("scalar", "both")

    # Base HNSW configuration shared by every scenario
    hnsw_params = HnswParameters(
        m=4,
        metric="cosine",
        ef_construction=400,
        ef_search=500
    )

    # Define fields; "stored" and "both" skip storing the original vectors
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SimpleField(name="title", type=SearchFieldDataType.String),
        SearchField(
            name="embedding",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=DIMENSIONS,
            vector_search_profile_name=f"profile-{scenario}",
            stored=scenario not in ("stored", "both")
        )
    ]

    # Configure vector search: one profile and one HNSW algorithm per scenario
    vector_search_config = {
        "profiles": [
            VectorSearchProfile(
                name=f"profile-{scenario}",
                algorithm_configuration_name=f"hnsw-{scenario}",
                compression_configuration_name=(
                    f"compression-{scenario}" if use_compression else None
                )
            )
        ],
        "algorithms": [
            HnswAlgorithmConfiguration(
                name=f"hnsw-{scenario}",
                parameters=hnsw_params
            )
        ]
    }

    # Add scalar quantization for the "scalar" and "both" scenarios
    if use_compression:
        vector_search_config["compressions"] = [
            ScalarQuantizationCompressionConfiguration(
                name=f"compression-{scenario}",
                rerank_with_original_vectors=True,
                default_oversampling=10,
                parameters=ScalarQuantizationParameters(
                    quantized_data_type="int8"
                )
            )
        ]

    return SearchIndex(
        name=f"vector-index-{scenario}",
        fields=fields,
        vector_search=VectorSearch(**vector_search_config)
    )
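A quick usage sketch, assuming the index client from the setup above: create one index per scenario and push them to the service.

# Create all four test indexes: vector-index-baseline, -stored, -scalar, -both
for scenario in ["baseline", "stored", "scalar", "both"]:
    index = create_optimized_index(scenario)
    index_client.create_or_update_index(index)
    print(f"Created index: {index.name}")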
The "stored" Property
The new "stored" property for vector fields provides granular control over storage management. When set to False, vector data remains searchable but isn't stored in the index. This is particularly useful when you:
- Have vectors stored in another system
- Don't need to retrieve original vectors
- Want to optimize storage costs
- Need to maintain searchability while reducing storage footprint
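As a minimal query sketch, assuming the "stored" scenario index created earlier: vector queries still work against a stored=False field, but the embedding itself can no longer be returned, so we select only the retrievable fields.

from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint=SEARCH_ENDPOINT,
    index_name="vector-index-stored",  # index whose embedding field has stored=False
    credential=credential
)

# In practice, query_vector comes from your embedding model (DIMENSIONS floats)
query_vector = [0.0] * DIMENSIONS

results = search_client.search(
    search_text=None,
    vector_queries=[
        VectorizedQuery(vector=query_vector, k_nearest_neighbors=5, fields="embedding")
    ],
    select=["id", "title"]  # "embedding" cannot be selected when stored=False
)
for result in results:
    print(result["id"], result["@search.score"])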
Performance Analysis
We conducted extensive testing with a dataset of 1 million vectors to evaluate the impact of these optimizations:
Storage Impact
def analyze_storage_metrics():
    """Collect storage statistics for each of the four test indexes."""
    scenarios = ["baseline", "stored", "scalar", "both"]
    metrics = {}
    for scenario in scenarios:
        index_name = f"vector-index-{scenario}"
        # get_index_statistics returns a dict of index-level statistics
        stats = index_client.get_index_statistics(index_name)
        metrics[scenario] = {
            "storage_size": stats["storage_size"],
            "vector_index_size": stats["vector_index_size"]
        }
    return metrics

metrics = analyze_storage_metrics()
Results:
- Stored Property Only: 48% storage reduction
- Scalar Quantization: 74% vector index size reduction
- Combined Optimizations: 66% total storage reduction
Search Quality Impact
We measured search quality across different configurations:
def compare_search_quality(query_vector, k=5):
    """Compare each optimized index against the baseline index.

    OptimizedSearcher is a small helper wrapper around SearchClient
    (not shown here); a sketch of compare_results follows below.
    """
    searcher = OptimizedSearcher(SEARCH_ENDPOINT, credential)

    baseline_results = searcher.search(
        "vector-index-baseline",
        query_vector,
        k=k
    )

    for scenario in ["stored", "scalar", "both"]:
        results = searcher.search(
            f"vector-index-{scenario}",
            query_vector,
            k=k,
            scenario=scenario
        )
        # Compare results with baseline
        score_diff = compare_results(baseline_results, results)
        print(f"{scenario} average score difference: {score_diff}")
Results showed minimal impact on search quality:
- Average score difference < 0.01 across all configurations
- Top-5 result sets maintained 95%+ consistency
- Reranking effectively mitigated quantization effects
Bonus: Putting it all together
Here's how to implement these optimizations using the Azure Search SDK:
# Note: RescoringOptions and the rescore storage methods below require a newer
# preview of azure-search-documents than the 11.6.0b3 version pinned earlier.
from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchField, SearchFieldDataType,
    VectorSearch, VectorSearchProfile, HnswAlgorithmConfiguration,
    HnswParameters, ScalarQuantizationCompression, ScalarQuantizationParameters,
    BinaryQuantizationCompression, RescoringOptions,
    VectorSearchCompressionRescoreStorageMethod,
)

def create_optimized_index(index_client, name, config):
    """
    Creates a search index with the specified optimization settings.

    config keys: "stored" (bool), "compression_type" ("scalar" or "binary"),
    "discard_originals" (bool), and "truncate_dims" (int).
    """
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(name="title", type=SearchFieldDataType.String, searchable=True),
        SearchField(
            name="embedding",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=DIMENSIONS,
            vector_search_profile_name="default-profile",
            stored=config.get("stored", True)  # Control vector storage
        )
    ]

    # Configure compression if specified
    compression = None
    if config.get("compression_type"):
        compression_params = {
            "compression_name": f"{config['compression_type']}-compression",
            "rescoring_options": RescoringOptions(
                enable_rescoring=not config.get("discard_originals", False),
                rescore_storage_method=(
                    VectorSearchCompressionRescoreStorageMethod.DISCARD_ORIGINALS
                    if config.get("discard_originals")
                    else VectorSearchCompressionRescoreStorageMethod.PRESERVE_ORIGINALS
                )
            )
        }

        # Add dimension truncation if specified
        if config.get("truncate_dims"):
            compression_params["truncation_dimension"] = config["truncate_dims"]

        # Create the appropriate compression object
        if config["compression_type"] == "scalar":
            compression = ScalarQuantizationCompression(
                parameters=ScalarQuantizationParameters(
                    quantized_data_type="int8"
                ),
                **compression_params
            )
        elif config["compression_type"] == "binary":
            compression = BinaryQuantizationCompression(**compression_params)

    # Create the index with the optimization settings
    index = SearchIndex(
        name=name,
        fields=fields,
        vector_search=VectorSearch(
            algorithms=[
                HnswAlgorithmConfiguration(
                    name="hnsw-config",
                    parameters=HnswParameters(
                        m=4,
                        metric="cosine",
                        ef_construction=400,
                        ef_search=500
                    )
                )
            ],
            profiles=[
                VectorSearchProfile(
                    name="default-profile",
                    algorithm_configuration_name="hnsw-config",
                    compression_name=compression.compression_name if compression else None
                )
            ],
            compressions=[compression] if compression else None
        )
    )

    return index_client.create_or_update_index(index)
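For example, a hypothetical config matching the scalar "Truncated + Discard" row in the results below (the index name here is illustrative):

# Scalar quantization, truncated to 1024 dimensions, originals discarded
create_optimized_index(
    index_client,
    name="vector-index-scalar-truncated-discard",
    config={
        "compression_type": "scalar",
        "truncate_dims": 1024,
        "discard_originals": True
    }
)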
Performance Analysis
We conducted extensive testing using a dataset of 100,000 documents with 3,072-dimensional embeddings from the OpenAI text-embedding-3-large model.
Here are the key findings:
Optimization Strategy | Storage Size (MB) | Storage Reduction | Vector Index Size (MB) | Vector Index Reduction | Total Size (MB) | Total Reduction
---|---|---|---|---|---|---
Baseline (No Optimization) | 3,813.29 | - | 1,177.15 | - | 4,990.44 | -
Scalar Quantization: Full Dimensions | 1,606.84 | 57.86% | 300.60 | 74.46% | 1,907.44 | 61.78%
Scalar Quantization: Truncated (1024) | 1,409.21 | 63.04% | 103.77 | 91.18% | 1,512.98 | 69.68%
Scalar Quantization: Truncated + Discard | 232.59 | 93.90% | 102.91 | 91.26% | 335.50 | 93.28%
Binary Quantization: Full Dimensions | 1,347.59 | 64.66% | 41.88 | 96.44% | 1,389.47 | 72.16%
Binary Quantization: Truncated (1024) | 1,323.33 | 65.30% | 17.47 | 98.52% | 1,340.80 | 73.13%
Binary Quantization: Truncated + Discard | 145.44 | 96.19% | 17.45 | 98.52% | 162.89 | 96.74%
Conclusion
Azure AI Search's vector storage optimization features represent a significant advancement in managing the costs and efficiency of vector search implementations. Our detailed analysis and testing using a dataset of 100,000 documents with 3,072-dimensional embeddings revealed impressive optimization potential:
Key Findings:
- Binary quantization with dimension reduction and original vector discarding achieved up to 96.74% total storage reduction
- Scalar quantization with dimension reduction provided a balanced 69.68% reduction while maintaining search quality
- Even basic scalar quantization without dimension reduction delivered a substantial 61.78% storage saving
Implementation Recommendations (example configurations follow the list below):
- For cost-sensitive applications:
  - Use binary quantization with dimension reduction
  - Discard original vectors
  - Accept a potential minor impact on search quality
- For balanced performance:
  - Implement scalar quantization with dimension reduction
  - Retain original vectors for reranking
  - Benefit from significant storage savings while maintaining search quality
- For quality-critical systems:
  - Use scalar quantization without dimension reduction
  - Keep original vectors
  - Still achieve substantial storage optimization
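Expressed as config dictionaries for the create_optimized_index helper from the bonus section (values here are illustrative starting points, not prescriptions):

# Cost-sensitive: binary quantization + truncation, originals discarded
cost_sensitive = {"compression_type": "binary", "truncate_dims": 1024,
                  "discard_originals": True}

# Balanced: scalar quantization + truncation, originals kept for reranking
balanced = {"compression_type": "scalar", "truncate_dims": 1024,
            "discard_originals": False}

# Quality-critical: scalar quantization only, full dimensions, originals kept
quality_critical = {"compression_type": "scalar", "discard_originals": False}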
When implementing these optimizations, it's crucial to:
- Test search quality impact with representative queries
- Monitor performance metrics
- Balance storage savings against search quality requirements
- Regularly review and adjust configurations based on application needs
These new features make vector search more cost-effective and scalable while maintaining high performance. Organizations can now confidently implement vector search at scale with optimized storage costs, choosing the approach that best matches their specific requirements for search quality, performance, and cost optimization.
Looking ahead, we expect to see continued improvements in vector optimization techniques, making vector search even more accessible and cost-effective for organizations of all sizes.
Additional Resources
- Azure AI Search Documentation
- Vector Search Overview
- Sample Code Repository