RAG with Azure AI: why your retrieval strategy matters AMA

Event details

Retrieval-augmented generation (RAG) allows you to build GenAI applications that use your own data, to optimize LLM performance. Join our AMA to ask us about RAG, vector databases, running RAG...

EricStarker

Updated Feb 14, 2024

CPS

Occasional Reader

Feb 14, 2024

Context:

We intend to use Azure OpenAI Chat with the following deployment properties:
- Model name: gpt-4-32k
- Model version: 0613
- Version update policy: Once a new default version is available.
- Deployment type: Standard
- Content Filter: Default
- Tokens per Minute Rate Limit (thousands): 30
- Rate limit (Tokens per minute): 30000
- Rate limit (Requests per minute): 180
We configured a data source based on structured data (an Azure Search Service with an index that has Semantic Search configured). In our case, the data is a list of Micro Credentials offered by Higher Education Institutions, with the corresponding details. The dataset we tested is not large: about 2000 records and about 3 MB of data in total.

Questions:

Q1: We need to have one source with structured data and one that is a BLOB Storage with PDF files. The PDF files are meant to offer guidelines to the Azure OpenAI Chat. How can we add more than one data source?

Q2: How can we change the way some of the responses are phrased? Often the response starts with “Based on the retrieved documents, the institutions that ….” Ideally it would say “Based on my knowledge base, the institutions that…”

Q3: We run into functionality issues with basic questions (see screenshot): Azure OpenAI is not able to retrieve a complete list, even though the list is not extensive and the data source was set to have no data content limits. NOTE: in the OpenAI custom ChatGPT the results returned are correct.

Q4: All the responses to questions that require some analytics (nothing complicated, just counts) return incorrect results. NOTE: in the OpenAI custom ChatGPT the results returned are correct.

Q5: One of our requirements is to allow a user to upload a file as part of their request (in our case the user will upload a brief resume file and the Azure OpenAI Chat is expected to quickly analyze it and return a relevant list of Micro Credentials). NOTE: this functionality is available in the OpenAI custom ChatGPT.

Q6: How can we get around quota limitations in Azure OpenAI Service?

Q7: Are there any limitations on Azure Search Service side?

Q8: We were not able to create an index for an Azure Search Service that relies on JSON files. It gets stuck on the last step, when the indexer is created: it just displays “Validating” and never leaves that state.

gia_mondragon

Microsoft

Feb 14, 2024

Q1: With the Azure OpenAI “on your data” feature in the Azure OpenAI Playground (https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-quickstart?tabs=command-line%2Cpython&pivots=programming-language-studio), you can currently add only a single data source at a time. However, there are other ways to get the data into an AI Search index so that you can use the index directly in that feature. On the AI Search side, you could use integrated vectorization (see “Integrated vectorization - Azure AI Search” on Microsoft Learn) to chunk and vectorize files from different Blob containers with a single index as the target, and then point the Azure OpenAI “on your data” feature at that index. The number of indexers you can have in a single instance is limited by the SKU you use: https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity

Q3: If you’re using the Azure OpenAI “on your data” functionality, you can control the number of documents retrieved in the advanced options.

Q4: We would need to understand the scenario better. Where are you asking the questions (in which console/system)? What kind of questions are you asking? What is in your documents to help the LLM answer the question? This would help us with the next steps to answer this properly. Thanks.

Q8: The first run of an indexer may take multiple hours, depending on the size and number of the documents in the blob container. If it is the creation state that is taking long, this may be expected. However, you should be able to start searching your index with the documents already indexed.
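
For the Q1 suggestion above (several Blob containers feeding one index), the shape of the setup can be sketched roughly as follows. This is a minimal illustration, not a full integrated vectorization configuration: it only builds the JSON bodies for two indexers that share one target index. The indexer, data source, and index names are hypothetical, and skillsets, field mappings, and the actual REST call are left out.

```python
import json


def indexer_definition(name: str, data_source: str, target_index: str) -> dict:
    """Build a minimal JSON body for creating an Azure AI Search indexer."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
    }


# Hypothetical names: one data source per Blob container, one shared index.
SHARED_INDEX = "microcredentials-index"

indexers = [
    indexer_definition("structured-data-indexer", "structured-data-container", SHARED_INDEX),
    indexer_definition("pdf-guidelines-indexer", "pdf-guidelines-container", SHARED_INDEX),
]

for body in indexers:
    print(json.dumps(body, indent=2))
```

Both indexers write into the same index, so both sources have to fit one index schema, and the “on your data” feature then only needs to be pointed at that single index.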

CPS
Occasional Reader
Feb 14, 2024
Re. Q4, we were asking the questions from the basic "Contoso" chat application generated and deployed by the Studio. Example of a question: "How many micro-credentials are available from University of Toronto?" The chatbot responds with 5, and we know that there are 210 in the dataset that we indexed. (If we ask the same question in our Custom GPT with the same dataset, it responds correctly.) Note that we are using a structured dataset (CSV), not a bunch of loose documents. However, since your examples and documentation are mostly around indexing documents, we even created separate files (one per CSV row) and included a document with statistics about the dataset to try to help it along, but it didn't help.
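One plausible explanation for the count of 5, which this thread does not confirm, is that the model only sees the handful of records that retrieval passes to it, so it can only count those. The sketch below uses synthetic data shaped like the numbers in the post (2000 records, 210 for one institution) and a hypothetical `retrieve_top_k` helper that stands in for the search step. It is not the actual Azure pipeline.

```python
from collections import Counter

# Synthetic stand-in for the indexed dataset: 2000 records, 210 from one institution.
records = (
    [{"institution": "University of Toronto", "title": f"Credential {i}"} for i in range(210)]
    + [{"institution": "Other Institution", "title": f"Credential {i}"} for i in range(1790)]
)


def retrieve_top_k(institution: str, k: int = 5) -> list:
    """Mimic retrieval: return only the first k matching records (k=5 is an assumption)."""
    matches = [r for r in records if r["institution"] == institution]
    return matches[:k]


def count_from_retrieved(institution: str, k: int = 5) -> int:
    """The count a model can produce when it only sees the retrieved records."""
    return len(retrieve_top_k(institution, k))


def count_from_full_data(institution: str) -> int:
    """Exact count computed over the whole dataset, outside the model."""
    return Counter(r["institution"] for r in records)[institution]


print(count_from_retrieved("University of Toronto"))  # 5
print(count_from_full_data("University of Toronto"))  # 210
```

If this is what is happening, counting questions would need an aggregation over the full dataset rather than over the retrieved subset.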
CPS
Occasional Reader
Feb 14, 2024
Re. Q8, we tried six times, and we even waited overnight for the index to be created, but it was still stuck on validating. The input was a single JSON test file with only 50 records (60 KB), in a Storage Blob container containing only that file. When we use CSV input with 2000 records, it is indexed in less than 1 minute.
- gia_mondragon
  Microsoft
  Feb 15, 2024
  Are you following the guidance to index JSON files? Depending on the data structure you need to choose a different parsing mode: https://learn.microsoft.com/en-us/azure/search/search-howto-index-json-blobs
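As a rough sketch of what choosing a parsing mode looks like, the snippet below builds an indexer definition with `parsingMode` set under `parameters.configuration`, using the three JSON modes described in the linked guidance (`json`, `jsonArray`, `jsonLines`). The names are hypothetical, and `jsonArray` is only an assumption about the 50-record test file (one file holding an array of records); the thread does not say how the file is structured.

```python
import json

# JSON parsing modes described in the Azure AI Search guidance for JSON blobs.
JSON_PARSING_MODES = {"json", "jsonArray", "jsonLines"}


def json_indexer_definition(name: str, data_source: str, target_index: str,
                            parsing_mode: str) -> dict:
    """Build an indexer definition body that sets the JSON parsing mode."""
    if parsing_mode not in JSON_PARSING_MODES:
        raise ValueError(f"unsupported parsing mode: {parsing_mode}")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "parameters": {"configuration": {"parsingMode": parsing_mode}},
    }


# Hypothetical names; "jsonArray" assumes the blob is a single JSON array of records.
definition = json_indexer_definition(
    "json-test-indexer", "json-test-container", "json-test-index", "jsonArray"
)
print(json.dumps(definition, indent=2))
```

With `json`, each blob is treated as one document, while `jsonArray` and `jsonLines` produce one search document per array element or per line.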
CPS
Occasional Reader
Feb 14, 2024
Re. Q3: we are aware of the Advanced options and we did uncheck the option to remove the limitations, but it is still applying the limits.
CPS
Occasional Reader
Feb 14, 2024
Re. Q1: we are not doing it from the AI Studio Playground because we want to use Semantic ranking. We create the Azure Search Service with an index that points to a blob storage with our structured data, add Semantic ranking, and we ask the Studio to use that Search Service.