Event details
Context:
- The intention is to leverage the Azure OpenAI Chat
- With the following Properties for the deployment:
- Model name: gpt-4-32k
- Model version: 0613
- Version update policy: Once a new default version is available.
- Deployment type: Standard
- Content Filter: Default
- Tokens per Minute Rate Limit (thousands): 30
- Rate limit (Tokens per minute): 30000
- Rate limit (Requests per minute): 180
- We configured a data source based on structured data (an Azure Search Service with an index that has Semantic Search configured). In our case it is a list of Micro Credentials offered by Higher Education Institutions, together with the corresponding details. The dataset we tested is not large: about 2,000 records and roughly 3 MB of data in total. A request sketch for this setup is shown below.
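For reference, here is a minimal sketch of how this configuration could be exercised through the Chat Completions API with the Azure AI Search extension ("on your data"), using the Python SDK. The `data_sources` field names assume a 2024-02 preview API version (they differ in older versions), and the endpoint, index name, and key variables are illustrative, not taken from the actual deployment:

```python
import os
from openai import AzureOpenAI

# Client for the Azure OpenAI resource (endpoint, key, and API version are illustrative).
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

response = client.chat.completions.create(
    model="gpt-4-32k",  # deployment name, matching the properties above
    messages=[
        {"role": "user", "content": "Which institutions offer micro credentials in data science?"}
    ],
    # "On your data": attach the Azure AI Search index as the grounding source.
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": os.environ["AZURE_SEARCH_ENDPOINT"],
                "index_name": "micro-credentials-index",  # illustrative index name
                "authentication": {"type": "api_key", "key": os.environ["AZURE_SEARCH_KEY"]},
                "query_type": "semantic",
                "semantic_configuration": "default",  # the semantic configuration on the index
            },
        }]
    },
)
print(response.choices[0].message.content)
```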
Questions:
Q1: We need to have one source with structured data and one that is a Blob Storage container with PDF files. The PDF files are meant to provide guidelines to the Azure OpenAI Chat. How can we add more than one data source?
Q2: How can we influence the way some of the responses are formulated? Often the response starts with "Based on the retrieved documents, the institutions that ...". Ideally it would say "Based on my knowledge base, the institutions that ...".
Q3: We ran into functionality issues with basic questions (see screenshot): Azure OpenAI is not able to retrieve a complete list, even though the list is not extensive and the data source was configured without data content limits. NOTE: in the OpenAI custom ChatGPT the results returned are correct.
Q4: All responses to questions that require some analytics (nothing complicated, just counts) return incorrect results. NOTE: in the OpenAI custom ChatGPT the results returned are correct.
Q5: One of our requirements is to allow a user to upload a file as part of their request (in our case the user will upload a brief resume file and the Azure OpenAI Chat is expected to quickly analyze it and return a relevant list of Micro Credentials). NOTE: this functionality is available in the OpenAI custom ChatGPT.
Q6: How can we get around quota limitations in Azure OpenAI Service?
Q7: Are there any limitations on Azure Search Service side?
Q8: We were not able to create an index for an Azure Search Service that relies on JSON files. It gets stuck on the last step, when the indexer is created: it just displays "Validating" and never leaves that state.
Answer (Q2): Prompt engineering is the component of Retrieval Augmented Generation with Azure AI Search that lets you influence how output responses are formulated. The Azure OpenAI Service documentation has content on prompt engineering (ranging from introductory to advanced) to help with this topic: Azure OpenAI Service - Azure OpenAI | Microsoft Learn
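As a hedged illustration of that suggestion, the wording of grounded answers can be steered through the system message and the `role_information` parameter of the "on your data" request. The field names again assume a 2024-02 preview API version, and the index and instruction text are illustrative:

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

# Instructions that steer how grounded answers are phrased.
role_information = (
    "You answer questions about Micro Credentials offered by Higher Education "
    "Institutions. When answering from retrieved content, begin with "
    "'Based on my knowledge base, ...' instead of 'Based on the retrieved documents, ...'."
)

response = client.chat.completions.create(
    model="gpt-4-32k",  # deployment name
    messages=[
        {"role": "system", "content": role_information},
        {"role": "user", "content": "Which institutions offer micro credentials in nursing?"},
    ],
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": os.environ["AZURE_SEARCH_ENDPOINT"],
                "index_name": "micro-credentials-index",  # illustrative index name
                "authentication": {"type": "api_key", "key": os.environ["AZURE_SEARCH_KEY"]},
                # role_information passes the "system message" to the grounding pipeline
                # in this API version.
                "role_information": role_information,
            },
        }]
    },
)
print(response.choices[0].message.content)
```

Whether the model always honours such phrasing instructions still depends on the model and the retrieval prompt, so some experimentation is expected.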
- allisonsparrow (Microsoft), Feb 14, 2024:
  +1 to Dan's comment - fine-tuning is also an effective method for changing the LLM's tone/manner of speaking: "Good use cases for fine-tuning include steering the model to output content in a specific and customized style, tone, or format, or scenarios where the information needed to steer the model is too long or complex to fit into the prompt window." https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/fine-tuning-considerations
- CPS (Occasional Reader), Feb 14, 2024:
  We are using the gpt-4 model, which does not support fine-tuning (https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo%2Cpython&pivots=programming-language-studio)
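Since gpt-4 did not support fine-tuning at the time of this thread, the approach only applies to model families that do (for example gpt-35-turbo). Purely as a sketch, a fine-tuning job could be started with the Python SDK roughly as follows; the training file name and base model are illustrative, and the resource is assumed to be in a fine-tuning-enabled region:

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

# Upload JSONL training data in chat format (illustrative file name),
# then start a fine-tuning job on a model family that supports it.
training_file = client.files.create(
    file=open("tone_examples.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo-0613",  # gpt-4 was not fine-tunable at the time
)
print(job.id, job.status)
```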