Event details
Retrieval-augmented generation (RAG) allows you to build GenAI applications that use your own data to optimize LLM performance.
Join our AMA to ask us about RAG, vector databases, running RAG...
EricStarker
Updated Feb 14, 2024
gia_mondragon
Microsoft
Feb 14, 2024
Q1: With the Azure OpenAI "on your data" feature in the Azure OpenAI Playground (https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-quickstart?tabs=command-line%2Cpython&pivots=programming-language-studio), you can currently only add a single data source at a time. However, there are other options to get the data into the AI Search index so you can use that index directly in the feature.
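For reference, here is roughly what pointing "on your data" at an existing AI Search index looks like outside the Playground, via the chat completions API. This is a minimal sketch assuming the openai Python package (v1.x) and an API version that accepts the data_sources extension; the deployment name, index name, and environment variable names are placeholders.

```python
import os
from openai import AzureOpenAI

# Assumes AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_AI_SEARCH_ENDPOINT,
# and AZURE_AI_SEARCH_API_KEY are set in the environment (placeholder names).
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="my-gpt-4-deployment",  # your Azure OpenAI deployment name
    messages=[{"role": "user", "content": "What do my documents say about micro-credentials?"}],
    # "On your data": attach a single Azure AI Search index as the data source.
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_AI_SEARCH_ENDPOINT"],
                    "index_name": "my-combined-index",
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_AI_SEARCH_API_KEY"],
                    },
                },
            }
        ]
    },
)
print(response.choices[0].message.content)
```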
From the AI Search end, you could use integrated vectorization (Integrated vectorization - Azure AI Search | Microsoft Learn) to chunk and vectorize files from different Blob containers with a single index as the target, and then point the Azure OpenAI "on your data" feature at that index. Note that the number of indexers you can have in a single service is limited by the SKU you use: https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity
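As a sketch of the "multiple containers, one index" setup using the azure-search-documents Python SDK: each Blob container gets its own data source connection and indexer, but every indexer writes to the same target index. The service URL, keys, container names, index name, and skillset name below are placeholders, and the chunking/vectorization skillset is assumed to already exist.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)

indexer_client = SearchIndexerClient(
    "https://<your-search-service>.search.windows.net",
    AzureKeyCredential("<admin-api-key>"),
)

# One data source connection and one indexer per Blob container,
# all writing into the same target index.
for container_name in ["contracts", "reports"]:
    data_source = SearchIndexerDataSourceConnection(
        name=f"{container_name}-ds",
        type="azureblob",
        connection_string="<storage-connection-string>",
        container=SearchIndexerDataContainer(name=container_name),
    )
    indexer_client.create_or_update_data_source_connection(data_source)

    indexer = SearchIndexer(
        name=f"{container_name}-indexer",
        data_source_name=data_source.name,
        target_index_name="my-combined-index",           # single shared index
        skillset_name="my-chunk-and-vectorize-skillset",  # integrated vectorization skillset
    )
    indexer_client.create_or_update_indexer(indexer)
```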
Q3: If you’re using the Azure OpenAI “on your data” functionality, you can control the number of documents retrieved in the advanced options.
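If you are calling the API rather than using the Studio UI, the same setting is exposed as a parameter on the data source. I believe the name is top_n_documents in recent API versions (topNDocuments in older ones); treat the name as an assumption and check the API reference for your version. A sketch of the payload, reusing the shape from the earlier example:

```python
# Same data_sources payload as in the earlier sketch, with the retrieved-document
# count set explicitly (parameter name assumed for recent API versions).
data_sources = [
    {
        "type": "azure_search",
        "parameters": {
            "endpoint": "https://<your-search-service>.search.windows.net",
            "index_name": "my-combined-index",
            "authentication": {"type": "api_key", "key": "<query-api-key>"},
            "top_n_documents": 10,  # equivalent of the "Retrieved documents" advanced option
        },
    }
]
```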
Q4: We would need to understand the scenario better: where are you asking the questions (in which console/system)? What kind of questions are you asking? What is in your documents that would help the LLM answer the question? This would help us determine the next steps to answer this properly. Thanks.
Q8: The first run of an indexer may take multiple hours, depending on the size and number of the documents in the blob container. If it is this creation/first-run state that is taking long, that may be expected. However, you should already be able to start searching your index against the documents that have been indexed so far.
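A quick way to see how far that first run has progressed is to query the indexer status. This is a minimal sketch assuming the azure-search-documents Python SDK; the service URL, key, and indexer name are placeholders.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient

indexer_client = SearchIndexerClient(
    "https://<your-search-service>.search.windows.net",
    AzureKeyCredential("<admin-api-key>"),
)

# Check overall indexer status and the progress of the most recent run.
status = indexer_client.get_indexer_status("contracts-indexer")
print("Indexer status:", status.status)
if status.last_result is not None:
    print(
        "Last run:", status.last_result.status,
        "| items processed:", status.last_result.item_count,
        "| failed:", status.last_result.failed_item_count,
    )
```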
CPS
Feb 14, 2024
Occasional Reader
Re. Q4, we were asking the questions from the basic "Contoso" chat application generated and deployed by the Studio.
Example question: "How many micro-credentials are available from the University of Toronto?" The chatbot responds with 5, and we know that there are 210 in the dataset that we indexed.
(If we ask the same question in our Custom GPT with the same dataset it responds correctly.)
Note that we are using a structured dataset (CSV), not a bunch of loose documents.
However, since your examples and documentation are mostly around indexing documents, we even created separate files (one per CSV row) and included a document with statistics about the dataset to try to help it along, but it didn't help.