Azure AI Search
VoiceRAG: An App Pattern for RAG + Voice Using Azure AI Search and the GPT-4o Realtime API for Audio
In this blog post we present a simple architecture for voice-based generative AI applications that implements the RAG pattern by combining the new gpt-4o-realtime-preview model with Azure AI Search.

Introduction to OCR Free Vision RAG using ColPali for Complex Documents
Explore the cutting-edge world of document retrieval with "From Pixels to Intelligence: Introduction to OCR Free Vision RAG using ColPali for Complex Documents." This blog post delves into how ColPali revolutionizes the way we interact with documents by leveraging Vision Language Models (VLMs) to enhance Retrieval-Augmented Generation (RAG) processes.

Improving RAG performance with Azure AI Search and Azure AI prompt flow in Azure AI Studio
Explore how Azure AI Studio leverages Retrieval Augmented Generation (RAG) to enhance the accuracy and context of large language model responses. This article discusses the challenges in evaluating RAG performance and how the integration of Azure AI Search and Azure AI prompt flow can overcome these hurdles. Learn about the role of hybrid search in improving document retrieval and the importance of user feedback in refining system performance. Dive into practical scenarios of RAG evaluation and gain insights into how Azure AI Studio offers a robust platform for enhancing and evaluating RAG applications.

Build Intelligent RAG For Multimodality and Complex Document Structure
Struggling to implement RAG for documents with multiple tables, figures, and plots in complex, structured scanned documents? Looking for a solution? In this blog you will learn how to implement an end-to-end solution using Azure AI services, including Document Intelligence, AI Search, Azure OpenAI, and our favorite LangChain 🙂 I have taken advantage of the multimodal capability of the Azure OpenAI GPT-4-Vision model for figures and plots.
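For the figure-handling step mentioned above, the general idea is to have a vision model describe each extracted figure or plot so the description can be embedded and indexed alongside the surrounding text. The minimal sketch below is illustrative only and not the article's exact code; the deployment name, endpoint, API version, and file path are placeholders.

```python
import base64
from openai import AzureOpenAI  # openai >= 1.x

# Placeholder endpoint, key, and deployment name for a GPT-4 with Vision deployment
client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-02-15-preview",
)

def describe_figure(figure_png_path: str) -> str:
    """Ask the vision model for a text description of a figure or plot
    so it can be embedded and indexed like any other chunk of text."""
    with open(figure_png_path, "rb") as f:
        figure_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="<gpt-4-vision-deployment>",  # your deployment name
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this figure in detail, including axes, trends, and key numbers."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{figure_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# The returned description can be appended to the chunk text before embedding and indexing.
```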
Binary quantization in Azure AI Search: optimized storage and faster search

As organizations continue to harness the power of generative AI for building Retrieval-Augmented Generation (RAG) applications and agents, the need for efficient, high-performance, and scalable solutions has never been greater. Today, we're excited to introduce Binary Quantization, a new feature that reduces vector sizes by up to 96% while reducing search latency by up to 40%.

What is Binary Quantization?

Binary Quantization (BQ) is a technique that compresses high-dimensional vectors by representing each dimension as a single bit. This method drastically reduces the memory footprint of a vector index and accelerates vector comparison operations at the cost of recall. The loss of recall can be compensated for with two techniques, oversampling and reranking, giving you tools to choose what to prioritize in your application: recall, speed, or cost.

Why should I use Binary Quantization?

Binary quantization is most applicable to customers who want to store a very large number of vectors at a low cost. Azure AI Search keeps vector indexes in memory to offer the best possible search performance. Binary Quantization (BQ) allows you to reduce the size of the in-memory vector index, which in turn reduces the number of Azure AI Search partitions you need to fit your data, leading to cost reductions.

By converting 32-bit floating point numbers into 1-bit values, binary quantization can achieve up to a 28x reduction in vector index size (slightly less than the theoretical 32x due to overheads introduced by the index data structures). The table below shows the impact of binary quantization on vector index size and storage use.

Table 1.1: Vector Index Storage Benchmarks

| Compression Configuration | Document Count | Vector Index Size (GB) | Total Storage Size (GB) | % Vector Index Savings | % Storage Savings |
|---|---|---|---|---|---|
| Uncompressed | 1M | 5.77 | 24.77 | | |
| SQ | 1M | 1.48 | 20.48 | 74% | 17% |
| BQ | 1M | 0.235 | 19.23 | 96% | 22% |

Table 1.1 compares the storage metrics of three different vector compression configurations: Uncompressed, Scalar Quantization (SQ), and Binary Quantization (BQ). The data illustrates significant storage and performance improvements with Binary Quantization, showing up to 96% savings in vector index size and 22% in overall storage. MTEB/dbpedia was used with default vector search settings and OpenAI text-embedding-ada-002 @ 1536 dimensions.

Increased Performance

Binary Quantization (BQ) enhances performance, reducing query latencies by 10-40% compared to uncompressed indexes in our testing. The improvement will vary based on oversampling rate, dataset size, vector dimensionality, and service configuration. BQ is fast for a few reasons, such as Hamming distance being faster to compute than cosine similarity, and packed bit vectors being smaller, yielding improved locality. This makes it a great choice where speed is critical, and it allows moderate oversampling to be applied to balance speed with relevance.

Quality Retainment

The reduction in storage use and the improvement in search performance come at the cost of recall when binary quantization is used. However, the tradeoff can be managed effectively using techniques like oversampling and reranking. Oversampling retrieves a greater set of potential documents to offset the resolution loss due to quantization. Reranking recalculates similarity scores using the full-resolution vectors.
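To make the oversampling and reranking options concrete, here is a minimal sketch of an index definition that enables binary quantization, sent to the REST API with Python requests. The service name, admin key, and schema are placeholders, and the property names follow the 2024-07-01 GA API version; they may differ in other API versions.

```python
import requests

endpoint = "https://<your-search-service>.search.windows.net"  # placeholder service
headers = {"Content-Type": "application/json", "api-key": "<admin-api-key>"}

index = {
    "name": "docs-bq",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "contentVector", "type": "Collection(Edm.Single)",
         "searchable": True, "dimensions": 1536,
         "vectorSearchProfile": "bq-profile"},
    ],
    "vectorSearch": {
        "algorithms": [{"name": "hnsw-1", "kind": "hnsw"}],
        "compressions": [{
            "name": "bq-1",
            "kind": "binaryQuantization",
            # Keep full-precision vectors for rescoring and over-fetch candidates
            "rerankWithOriginalVectors": True,
            "defaultOversampling": 2.0,
        }],
        "profiles": [{"name": "bq-profile",
                      "algorithm": "hnsw-1",
                      "compression": "bq-1"}],
    },
}

resp = requests.put(f"{endpoint}/indexes/docs-bq?api-version=2024-07-01",
                    headers=headers, json=index)
resp.raise_for_status()
```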
The table below shows a subset of the MTEB datasets for OpenAI and Cohere embeddings, reporting binary quantization mean NDCG@10 with and without reranking/oversampling.

Table 1.2: Impact of Binary Quantization on Mean NDCG@10 Across MTEB Subset

| Model | No Rerank (Δ / %) | Rerank 2x Oversampling (Δ / %) |
|---|---|---|
| Cohere Embed V3 (1024d) | -4.883 (-9.5%) | -0.393 (-0.76%) |
| OpenAI text-embedding-3-small (1536d) | -2.312 (-4.55%) | +0.069 (+0.14%) |
| OpenAI text-embedding-3-large (3072d) | -1.024 (-1.86%) | +0.006 (+0.01%) |

Table 1.2 compares the relative point differences in mean NDCG@10 between Binary Quantization and an uncompressed index across different embedding models on a subset of the MTEB datasets.

Key takeaways:

- BQ + reranking yields higher retrieval quality than BQ without reranking.
- The impact of reranking is more pronounced for models with lower dimensions, while for higher dimensions the effect is smaller and sometimes negligible.
- Strongly consider reranking with full-precision vectors to minimize or even eliminate the recall loss caused by quantization.

When to Use Binary Quantization

Binary Quantization is recommended for applications with high-dimensional vectors and large datasets, where storage efficiency and fast search performance are critical. It is particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing BQ's quality or considering SQ as an alternative. Additionally, BQ performs exceptionally well when embeddings are centered around zero, as seen in popular embedding models like OpenAI and Cohere.

BQ + reranking/oversampling works by searching over a compressed vector index in memory and reranking using full-precision vectors stored on disk, allowing you to significantly reduce costs while maintaining strong search quality. This approach makes it possible to operate efficiently in memory-constrained settings by leveraging both memory and SSDs to deliver high performance and scalability with large datasets.

BQ adds to our price-performance enhancements made over the past several months, offering storage savings and performance improvements. By adopting this feature, organizations can achieve faster search results and lower operational costs, ultimately driving better outcomes and user experiences.

More Functionality now Generally Available

We're pleased to share that several vector search enhancements are now generally available in Azure AI Search. These updates provide users with more control over their retriever in RAG solutions and optimize LLM performance. Here are the key highlights:

- Integrated vectorization with Azure OpenAI for Azure AI Search is now generally available!
- Support for Binary Vector Types: Azure AI Search supports narrow vector types, including binary vectors. This feature enables the storage and processing of larger vector datasets at lower cost while maintaining fast search capabilities.
- Vector Weighting: This feature allows users to assign relative importance to vector queries over term queries in hybrid search scenarios. It gives more control over the final result set by enabling users to favor vector similarity over keyword similarity (see the query sketch after this list).
- Document Boosting: Boost your search results with scoring profiles tailored to vector and hybrid search queries. Whether you prioritize freshness, geolocation, or specific keywords, our new feature allows for targeted document boosting, ensuring more relevant results for your needs.
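As an illustration of vector weighting, here is a hedged sketch of a hybrid search request that favors the vector leg over the keyword leg. The endpoint, key, field names, and embedding values are placeholders, and the weight property follows the 2024-07-01 GA API version.

```python
import requests

endpoint = "https://<your-search-service>.search.windows.net"  # placeholder service
headers = {"Content-Type": "application/json", "api-key": "<query-api-key>"}

body = {
    # Keyword leg of the hybrid query
    "search": "binary quantization storage savings",
    # Vector leg; the vector below stands in for a real query embedding
    "vectorQueries": [{
        "kind": "vector",
        "vector": [0.0123, -0.0456],   # truncated placeholder embedding
        "fields": "contentVector",
        "k": 50,
        "weight": 2.0                  # weight vector similarity more heavily than keyword relevance
    }],
    "top": 10,
}

resp = requests.post(f"{endpoint}/indexes/docs-bq/docs/search?api-version=2024-07-01",
                     headers=headers, json=body)
resp.raise_for_status()
for doc in resp.json()["value"]:
    print(doc["@search.score"], doc["id"])
```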
Getting started with Azure AI Search

To get started with binary quantization, visit our official documentation here: Reduce vector size - Azure AI Search | Microsoft Learn

- Learn more about Azure AI Search and about all the latest features.
- Start creating a search service in the Azure Portal, Azure CLI, the Management REST API, ARM template, or a Bicep file.
- Learn about Retrieval Augmented Generation in Azure AI Search.
- Explore our preview client libraries in Python, .NET, Java, and JavaScript, offering diverse integration methods to cater to varying user needs.
- Explore how to create end-to-end RAG applications with Azure AI Studio.
Boost RAG Performance: Enhance Vector Search with Metadata Filters in Azure AI Search

In Retrieval-Augmented Generation (RAG) setups, user-specified filters are often overlooked during vector searches, as indices mainly focus on semantic similarity. Ensuring specific queries are answered using a predefined set of documents is crucial. By incorporating metadata or tags, you can enforce this restriction effectively. This blog demonstrates how to tag documents with metadata, such as genre, releaseYear, and director, to create a secure overlay for vector or hybrid searches in Azure AI Search. Learn how to classify documents, create an Azure AI Search index, embed and upload document chunks with metadata, and perform filtered vector searches to improve search results. Discover how metadata filtering can enhance your RAG setup for more accurate and relevant results. Explore the full implementation and step-by-step guide in our latest blog post.
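To show what such a filtered vector search can look like, here is a minimal sketch of a query that restricts the vector search to documents matching metadata tags like genre and releaseYear. It is illustrative rather than the post's exact code; the index name, fields, and embedding are placeholders, and the metadata fields are assumed to be marked filterable in the index.

```python
import requests

endpoint = "https://<your-search-service>.search.windows.net"  # placeholder service
headers = {"Content-Type": "application/json", "api-key": "<query-api-key>"}

body = {
    # OData filter over metadata tags; these fields must be filterable in the index
    "filter": "genre eq 'sci-fi' and releaseYear ge 2010",
    "vectorFilterMode": "preFilter",   # apply the filter before the vector search runs
    "vectorQueries": [{
        "kind": "vector",
        "vector": [0.0123, -0.0456],   # placeholder query embedding
        "fields": "contentVector",
        "k": 5
    }],
    "select": "title,genre,releaseYear,director",
}

resp = requests.post(f"{endpoint}/indexes/movies/docs/search?api-version=2024-07-01",
                     headers=headers, json=body)
resp.raise_for_status()
for doc in resp.json()["value"]:
    print(doc["title"], doc["genre"], doc["releaseYear"])
```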
Automate RAG Indexing: Azure Logic Apps & AI Search for Source Document Processing

Azure Logic Apps has introduced new functionality that streamlines the entire process of handling documents from unstructured data connectors and pushing them to Azure AI Search. This new feature simplifies file data extraction, data parsing, chunking, vectorizing, and indexing, consolidating these steps into one seamless, integrated flow.