Azure Cognitive Search AMA: Vector search, Azure OpenAI Service, generative apps, plugins & more
Introduction: This is Adam Koch and Todd Meinershagen from Paycor. We are in technical roles, working to deliver prototype AI features across our application suite.
Some topics on our team's mind in preparation for the live session:
- Multi-tenancy 1: Are there any formal recommendations for building a multi-tenant Cognitive Search + LLM solution via Azure AI Studio (beyond provisioning a full instance per tenant)?
- Multi-tenancy 2: We are proving out the idea of having multiple indexes in a single Cognitive Search resource, one for each of our customers. We would then have a single LLM that processes the prompt along with the results of the particular index search for that customer. Are there any limits on the number of indexes within one Search resource? Are there any risks or challenges we should be aware of with this approach?
- In all of the samples, the pattern leverages a Blob Storage container whose documents are indexed, with the index being set up automatically by Azure OpenAI Studio. We are wondering how we would do that purely from code/automation. What commands/SDKs do we use to create a new index for a Blob container that pulls out the correct 5 pieces of metadata?
- Since Azure Cognitive Search can handle databases and JSON data, does Search + Azure OpenAI also support pure data from SQL Server or JSON documents? Or are documents (Word, PDF, etc.) the only items supported in that scenario?
- What is the difference between regular search and the higher-priced semantic search with regard to the RAG pattern?
I can help with #5.
What is the difference between regular search and the higher-priced semantic search with regard to the RAG pattern?
The RAG pattern broadly consists of two steps:
[the summary of the RAG pattern below, numbered points 1 and 2, is sourced from https://vitalflux.com/retrieval-augmented-generation-rag-llm-examples/]
1. Retrieval Phase: Given an input query (like a question), the RAG system first retrieves relevant documents or passages from a large corpus using a retriever. This is often done using efficient dense vector space methods, like Dense Passage Retrieval (DPR), which embeds both the query and documents into a continuous vector space and retrieves documents based on distance metrics.
2. Generation Phase: Once the top-k relevant documents or passages are retrieved, they are fed into a sequence-to-sequence generator along with the original query. The generator is then responsible for producing the desired output (like an answer to the question) using both the query and the retrieved passages as context.
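The two phases above can be sketched end to end. This is a toy, runnable illustration only: the "embedding" is a bag-of-words vector and the "generator" is a string template, where a real system would call an embedding model and an LLM (e.g. via Azure OpenAI). The corpus and question are made up for the example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words term-frequency vector.
    # A real RAG system would call an embedding model here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Similarity metric used to rank candidate documents.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # 1. Retrieval phase: rank documents by similarity to the query vector
    #    and keep the top-k.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(query: str, passages: list[str]) -> str:
    # 2. Generation phase: a real system would send the query plus the
    #    retrieved passages to an LLM as grounding context.
    return f"Answer to {query!r}, grounded in: {' '.join(passages)}"

corpus = [
    "Azure Cognitive Search supports vector search over embeddings.",
    "Paycor builds payroll and HR software.",
    "Vector search embeds queries and documents into one vector space.",
]
question = "How does vector search work?"
print(generate(question, retrieve(question, corpus)))
```

The point of the sketch is the data flow: whatever `retrieve` returns is the only context `generate` sees, which is why retrieval quality bounds answer quality.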
During the retrieval phase, the candidate documents that are returned directly affect the generation phase: its output will only be as good as the retrieved documents and the completion model.
As a result, it is beneficial to improve the quality of the retrieval phase, and this is where vector search and/or semantic search can improve on the RAG pattern. Both features (used together, or one on its own) can return more semantically relevant information than traditional keyword search, because they search on the meaning of the query and the candidate documents rather than relying on the keyword matches, term frequency, document length, and term saturation signals that TF-IDF and BM25 keyword ranking use.
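The difference is easiest to see when a query shares meaning but no keywords with a document. A toy contrast, where the documents, the query, and the two-axis "concept vectors" are all hand-assigned for illustration (a real system would get embeddings from a model):

```python
import math

# Two made-up HR documents for the example.
docs = {
    "doc_pto":  "Employees accrue paid time off each pay period.",
    "doc_401k": "The company matches 401(k) contributions up to 4 percent.",
}

# Hand-assigned vectors over two invented axes: (leave-related, benefits-related).
doc_vectors = {"doc_pto": (0.9, 0.3), "doc_401k": (0.1, 0.95)}
query = "vacation policy"            # shares no keywords with either document
query_vector = (0.85, 0.25)          # but "vacation" sits near the leave concept

def keyword_hits(query: str, docs: dict) -> list:
    # Keyword search: a document matches only if it shares a literal term.
    terms = set(query.lower().split())
    return [d for d, text in docs.items()
            if terms & set(text.lower().replace(".", "").split())]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def semantic_best(qv, vectors):
    # Vector search: return the document nearest to the query in meaning.
    return max(vectors, key=lambda d: cosine(qv, vectors[d]))

print(keyword_hits(query, docs))                  # no literal term overlap
print(semantic_best(query_vector, doc_vectors))   # nearest document by meaning
```

Keyword search finds nothing for "vacation policy" because neither document contains those terms, while the vector comparison still surfaces the paid-time-off document, which is exactly the gap vector/semantic search closes in the retrieval phase.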