When building RAG solutions, it's important to ground your LLM with the highest-quality retrieval results to get the best response quality. As part of our 2024-05-01-Preview API version, we're launching several new updates to the Azure AI Search relevance stack that give you more control over your retriever.
Support for Binary Vector Types
We've added support for binary vectors (bit vectors), enabling Azure AI Search to store and process embeddings from models that produce binary outputs, such as Cohere Embed V3. This new feature allows for larger vector datasets at a lower cost while keeping search fast. According to Cohere, binary embeddings can retain up to 95% of search quality, and the net space used for vector data is reduced by 32x.
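As a rough illustration, a binary vector field in the preview index schema might look like the sketch below. The field name, dimensionality, and vector search profile name are placeholders, and the `Collection(Edm.Byte)` type with `packedBit` encoding reflects our reading of the preview API, not a definitive schema.

```python
# Hedged sketch of a binary (bit) vector field definition for the
# 2024-05-01-Preview indexes API. Names ("contentVector",
# "binary-hnsw-profile") are illustrative, not required values.
binary_vector_field = {
    "name": "contentVector",
    "type": "Collection(Edm.Byte)",   # bytes holding packed bits
    "vectorEncoding": "packedBit",    # interpret each byte as 8 bit-dimensions
    "dimensions": 1024,               # dimensionality of the original bit vector
    "searchable": True,
    "vectorSearchProfile": "binary-hnsw-profile",
}
# This field object would be added to the "fields" array of an index
# created or updated with ?api-version=2024-05-01-Preview.
```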
Score Threshold Filtering
The new "threshold" property in Azure AI Search queries allows customers to improve search result relevance for vector and hybrid search queries. This feature filters out documents with low similarity scores before combining results from different recall sets. Whether you prefer filtering by 'searchScore' or 'vectorSimilarity,' you have the flexibility to prioritize the most relevant documents.
Vector Weighting
The new vector weighting feature in Azure AI Search specifies the relative weight (importance) of vector queries versus the term query in hybrid search scenarios. For example, you can favor vector similarity over keyword similarity. You can also assign relative weights to the different vector queries in a multi-vector search request to favor similarity on one vector field over another. The specified weights are applied when each document's score is calculated, giving you more control over the final result set.
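A multi-vector request with per-query weights might look roughly like the following sketch; the field names and weight values are illustrative, and the per-query `weight` property reflects our understanding of the preview API.

```python
# Hedged sketch: weighting two vector queries differently in a multi-vector
# hybrid request so body similarity counts more than title similarity.
search_request = {
    "search": "quarterly revenue guidance",
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.11, -0.02, 0.07],   # truncated title embedding
            "fields": "titleVector",
            "k": 50,
            "weight": 0.5,                   # de-emphasize title similarity
        },
        {
            "kind": "vector",
            "vector": [0.03, 0.19, -0.41],   # truncated body embedding
            "fields": "contentVector",
            "k": 50,
            "weight": 2.0,                   # favor body similarity in the fused score
        },
    ],
}
```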
MaxTextRecallSize for Hybrid Search
Tailor your hybrid search experience by specifying the maximum number of documents recalled from the keyword search recall set in hybrid search queries. You can also adjust the 'count' property to include either all matching documents or only those retrieved within the defined window. This enhancement improves relevance and gives you control over the number of documents retrieved; in addition, reducing the number of text documents recalled can significantly improve performance.
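A hybrid query that caps the keyword recall set might look like the sketch below. The `hybridSearch.maxTextRecallSize` property name reflects our reading of the preview API, and the values shown are illustrative.

```python
# Hedged sketch: limiting the keyword (BM25) recall set in a hybrid query.
search_request = {
    "search": "data residency requirements",
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.21, -0.08, 0.33],   # truncated embedding
            "fields": "contentVector",
            "k": 50,
        }
    ],
    "hybridSearch": {
        # Recall at most 100 documents from the keyword side before fusion;
        # lowering this can noticeably reduce latency on large indexes.
        "maxTextRecallSize": 100
    },
}
```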
Document Boosting support for Vector/Hybrid Search
Boost your search results with scoring profiles tailored to vector and hybrid search queries. Whether you prioritize freshness, geolocation, or specific keywords, our new feature allows for targeted document boosting, ensuring more relevant results for your needs.
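As an example, a freshness-based scoring profile defined on the index and referenced at query time might look like this sketch; the profile name, field name, boost value, and boosting duration are placeholders.

```python
# Hedged sketch: a scoring profile that boosts recently updated documents,
# now applied to vector and hybrid queries as well as keyword queries.
freshness_profile = {
    "name": "boost-recent",
    "functions": [
        {
            "type": "freshness",
            "fieldName": "lastUpdated",                  # DateTimeOffset field in the index
            "boost": 2.0,                                # up to 2x boost for the freshest docs
            "interpolation": "linear",
            "freshness": {"boostingDuration": "P30D"},   # boost tapers off over 30 days
        }
    ],
}

# Reference the profile in the query body at search time:
search_request = {
    "search": "release notes",
    "scoringProfile": "boost-recent",
}
```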
Experience up to a 50% Decrease in Query Latency for Hybrid Search
We've also been working hard on performance. With the latest set of improvements, customers are seeing up to 50% lower latency on hybrid search queries with no changes to their applications.
More news from Azure AI Search:
- Announcing cost-effective RAG at scale with Azure AI Search
- Streamlined multimodal search with AI Studio embedding models and Azure AI Search
- Automatically index Microsoft Fabric OneLake files in Azure AI Search, now in preview
Getting started with Azure AI Search
- Learn more about Azure AI Search
- Start creating a search service in the Azure Portal, or with the Azure CLI, the Management REST API, an ARM template, or a Bicep file.
- Review Azure AI Search pull (indexer-based) and push data ingestion approaches.
- Learn about Retrieval Augmented Generation in Azure AI Search
- Learn more about integrated vectorization and why chunking and vectors are important in your RAG solutions.
- Read the blog: Outperforming vector search with hybrid retrieval and ranking capabilities
- Watch a video on Microsoft Mechanics: How vector search and semantic ranking improve your AI prompts