Event details
Context:
- The intention is to leverage the Azure OpenAI Chat
- With the following Properties for the deployment:
- Model name: gpt-4-32k
- Model version: 0613
- Version update policy: Once a new default version is available.
- Deployment type: Standard
- Content Filter: Default
- Tokens per Minute Rate Limit (thousands): 30
- Rate limit (Tokens per minute): 30000
- Rate limit (Requests per minute): 180
- We configured a data source based on structured data (an Azure Search Service with an index that has Semantic Search configured). In our case it is a list of Micro Credentials offered by Higher Education Institutions, with the corresponding details for each. The dataset we tested is not large: about 2000 records and about 3 MB of data in total.
Questions:
Q1: We need to have one source with structured data and one that is a BLOB Storage with PDF files. The PDF files are meant to offer guidelines to the Azure OpenAI Chat. How can we add more than one data source?
Q2: How can we get around the way some of the responses are formulated? Often the response starts with “Based on the retrieved documents, the institutions that ….”. Ideally it would say “Based on my knowledge base, the institutions that…”
Q3: We ran into functionality issues for basic questions (see screenshot) where Azure OpenAI is not able to retrieve a complete list, even though the list is not an extensive one and the data source was configured to have no data content limits. NOTE: in the OpenAI custom ChatGPT the results returned are correct.
Q4: All the responses to questions that require some analytics (nothing complicated, just counts) are returning incorrect results. NOTE: in the OpenAI custom ChatGPT the results returned are correct.
Q5: One of our requirements is to allow a user to upload a file as part of their request (in our case the user will upload a brief resume file and the Azure OpenAI Chat is expected to quickly analyze it and return a relevant list of Micro Credentials). NOTE: this functionality is available in the OpenAI custom ChatGPT.
Q6: How can we get around quota limitations in Azure OpenAI Service?
Q7: Are there any limitations on Azure Search Service side?
Q8: We were not able to create an Index for an Azure Search Service that relies on JSON files. It gets stuck on the last step, when the indexer is created: it just displays “Validating” and never leaves that state.
- gia_mondragon, Feb 14, 2024
Microsoft
Q1: With the Azure OpenAI “on your data” feature in the Azure OpenAI Playground (https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-quickstart?tabs=command-line%2Cpython&pivots=programming-language-studio) you can, for now, only add a single data source at a time. However, there are other options to get the data into the AI Search index so you can use the index directly in that feature. From the AI Search end, you could use Integrated vectorization - Azure AI Search | Microsoft Learn to chunk and vectorize files from different Blob containers into a single target index, and then point the Azure OpenAI “on your data” feature at that index. The number of indexers you can have in a single instance is limited by the SKU you use: https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity
Q3: If you’re using the Azure OpenAI “on your data” functionality, you can control the number of documents retrieved in the advanced options (see the sketch after these replies).
Q4: We would need to understand the scenario better: where are you asking the questions (in which console/system)? What kind of questions are you asking? What is in your documents to help the LLM answer the question? This would help us with the next steps to answer this properly. Thanks.
Q8: The first run of an indexer may take multiple hours, depending on the size and number of the documents in the blob container. If the creation step is what is taking long, this may be expected. However, you should be able to start searching your index with the documents already indexed.
- CPS, Feb 14, 2024
Occasional Reader
Re. Q4: we were asking the questions from the basic "Contoso" chat application generated and deployed by the Studio. Example of question: "How many micro-credentials are available from University of Toronto?" The chatbot responds with 5, and we know that there are 210 in the dataset that we indexed. (If we ask the same question in our Custom GPT with the same dataset, it responds correctly.) Note that we are using a structured dataset (CSV), not a bunch of loose documents. However, since your examples and documentation are mostly around indexing documents, we even created separate files (one per CSV row) and included a document with statistics about the dataset to try to help it along, but it didn't help.
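Following up on the Q1 and Q3 answers above: a minimal sketch of calling a single combined Azure AI Search index through the Azure OpenAI “on your data” extension and raising the document-retrieval cap. The endpoint, key, index and deployment names are placeholders, and the exact field names (data_sources, azure_search, top_n_documents) depend on the api-version in use, so treat this as an illustration rather than a verified configuration.

```python
# Minimal sketch (assumed names/versions): query an Azure AI Search index
# through the Azure OpenAI "on your data" extension with the openai 1.x SDK.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # check which version your region/feature set supports
)

response = client.chat.completions.create(
    model="gpt-4-32k",  # deployment name from the context above
    messages=[{"role": "user", "content": "Which institutions offer micro-credentials?"}],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_SEARCH_ENDPOINT"],
                    "index_name": "micro-credentials-index",  # single combined index (Q1)
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_SEARCH_KEY"],
                    },
                    # Q3: raise the number of retrieved documents if lists come back truncated
                    "top_n_documents": 20,
                },
            }
        ]
    },
)
print(response.choices[0].message.content)
```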
- CPS, Feb 14, 2024
Occasional Reader
Re. Q8: we tried six times, and we even waited overnight for the index to be created; it was still stuck on “Validating”. The input was a single JSON test file with only 50 records (60 KB), in a Blob Storage container containing just that file. When we use CSV input with 2000 records it is indexed in less than 1 minute.
- gia_mondragon, Feb 15, 2024
Microsoft
Are you following the guidance to index JSON files? Depending on the data structure you need to choose a different parsing mode: https://learn.microsoft.com/en-us/azure/search/search-howto-index-json-blobs
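For Q8, a minimal sketch of creating the indexer through the Azure AI Search REST API with an explicit parsingMode, which is the setting the guidance above refers to. The service name, admin key, data source and index names are placeholders, and the api-version shown is one of the stable Search versions and may differ from yours.

```python
# Minimal sketch (assumed names): create a blob indexer for JSON content and
# set parsingMode so each element of a JSON array becomes its own search document.
import requests

SEARCH_SERVICE = "https://<your-search-service>.search.windows.net"  # placeholder
HEADERS = {"Content-Type": "application/json", "api-key": "<admin-api-key>"}  # placeholder

indexer = {
    "name": "micro-credentials-json-indexer",
    "dataSourceName": "micro-credentials-blob-ds",   # existing blob data source
    "targetIndexName": "micro-credentials-index",    # existing index
    "parameters": {
        "configuration": {
            # "json" = one document per blob, "jsonArray" = one document per array element,
            # "jsonLines" = one document per newline-delimited JSON record
            "parsingMode": "jsonArray"
        }
    },
}

resp = requests.put(
    f"{SEARCH_SERVICE}/indexers/{indexer['name']}?api-version=2023-11-01",
    headers=HEADERS,
    json=indexer,
)
resp.raise_for_status()
print("Indexer created/updated:", resp.status_code)
```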
- CPS, Feb 14, 2024
Occasional Reader
Re. Q3: we are aware of the Advanced options and we did uncheck the option to remove the limitations, but it is still applying the limits.
- danquirk, Feb 14, 2024
Former Employee
Q2: Prompt engineering is the part of Retrieval Augmented Generation with Azure AI Search that lets you influence how output responses are formulated. The Azure OpenAI Service documentation has content on prompt engineering (ranging from introductory to advanced) to help you with this topic: Azure OpenAI Service - Azure OpenAI | Microsoft Learn
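To make Q2 concrete, here is a minimal sketch of steering the phrasing with a system message; the wording and deployment name are only illustrative. When the “on your data” extension is attached, the same kind of instruction can go in the role_information field of the data source parameters (the exact field name may vary by api-version).

```python
# Minimal sketch (assumed wording): use a system message to change how the
# model introduces grounded answers, e.g. "Based on my knowledge base ..."
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

system_prompt = (
    "You are an assistant for a catalogue of Micro Credentials. "
    "When you answer from retrieved content, introduce the answer with "
    "'Based on my knowledge base' and never with 'Based on the retrieved documents'."
)

response = client.chat.completions.create(
    model="gpt-4-32k",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Which institutions offer micro-credentials in data science?"},
    ],
)
print(response.choices[0].message.content)
```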
- allisonsparrow, Feb 14, 2024
Microsoft
+1 to Dan's comment - fine-tuning is also an effective method for changing the LLM's tone/manner of speaking: "Good use cases for fine-tuning include steering the model to output content in a specific and customized style, tone, or format, or scenarios where the information needed to steer the model is too long or complex to fit into the prompt window." https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/fine-tuning-considerations
- CPS, Feb 14, 2024
Occasional Reader
We are using the gpt-4 model, which does not support fine-tuning (https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo%2Cpython&pivots=programming-language-studio).
- fsunavala-msft, Feb 14, 2024
Microsoft
Q6: Quota limits exist for capacity reasons and to maintain the health of your service. For further information on quota limitations, please visit the Azure OpenAI Service documentation: Azure OpenAI Service quotas and limits - Azure AI services | Microsoft Learn. Additionally, you can find how to manage your quota here: Manage Azure OpenAI Service quota - Azure AI services | Microsoft Learn
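Quota itself can only be raised through quota management or by spreading load across deployments, but on the client side the usual pattern is to back off and retry when the service returns 429. A minimal sketch, assuming the openai 1.x SDK and the deployment name from the context above:

```python
# Minimal sketch (assumed retry policy): back off and retry chat calls that
# hit the deployment's tokens/requests-per-minute limits (HTTP 429).
import os
import time
from openai import AzureOpenAI, RateLimitError

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def chat_with_backoff(messages, deployment="gpt-4-32k", max_retries=5):
    delay = 2.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(delay)  # wait before retrying
            delay *= 2         # exponential backoff

answer = chat_with_backoff([{"role": "user", "content": "List micro-credentials in data science."}])
print(answer.choices[0].message.content)
```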
- CPS, Feb 14, 2024
Occasional Reader
Re. Q6: we are hitting the limit with just two human users doing basic and simple testing in the "Contoso" chatbot created and deployed by the Studio. The index was created from a 2000-record CSV, i.e. not a big dataset. This would make it unusable for a production environment accessible to the public, even with only a few visitors.