Retrieval Augmented Generation (RAG) is a popular technique for getting LLMs to provide answers that are grounded in a data source. When we use RAG, we use the user's question to search a knowledge base (like Azure AI Search), then pass both the question and the relevant content to the LLM (gpt-3.5-turbo or gpt-4), with a directive to answer only according to the sources. In pseudo-code:
user_query = "what's in the Northwind Plus plan?"
user_query_vector = create_embedding(user_query, "ada-002")
results = search(user_query, user_query_vector)
response = create_chat_completion(system_prompt, user_query, results)
If the search function can find the right results in the index (assuming the answer is somewhere in the index), then the LLM can typically do a pretty good job of synthesizing the answer from the sources.
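To make that pseudo-code more concrete, here is a minimal sketch of the same flow using the openai and azure-search-documents Python SDKs. The endpoint, index name, and field names ("content", "embedding") are placeholders and assumptions, not the actual schema of the app described in this post:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import OpenAI

openai_client = OpenAI()
search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="docs",
    credential=AzureKeyCredential("<your-key>"),
)

user_query = "what's in the Northwind Plus plan?"

# Embed the query with ada-002
user_query_vector = openai_client.embeddings.create(
    model="text-embedding-ada-002", input=user_query
).data[0].embedding

# Hybrid retrieval: keyword search plus vector search over the "embedding" field
results = search_client.search(
    search_text=user_query,
    vector_queries=[
        VectorizedQuery(vector=user_query_vector, k_nearest_neighbors=3, fields="embedding")
    ],
    top=3,
)
sources = "\n".join(doc["content"] for doc in results)

# Ask the model to answer only from the retrieved sources
response = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer ONLY using the provided sources. If the answer isn't in the sources, say you don't know."},
        {"role": "user", "content": f"{user_query}\n\nSources:\n{sources}"},
    ],
)
print(response.choices[0].message.content)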
This simple RAG approach works best for "unstructured" queries, like the Northwind Plus question above.
When using Azure AI Search as the knowledge base, the search call will perform both a vector and keyword search, finding all the relevant document chunks that match the keywords and concepts in the query.
But you may find that users instead ask more "structured" queries, like 'Summarize the document called "perksplus.pdf"'.
We can think of these as structured queries, because they're trying to filter on specific metadata about a document. You could imagine a query syntax for specifying that metadata filtering explicitly, but we don't want to introduce a query syntax into a RAG chat application if we don't need to: only power users tend to learn specialized query syntax, and ideally our RAG flow would just do the right thing on its own.
Fortunately, we can use the OpenAI function-calling feature to recognize that a user's query would benefit from a more structured search, and perform that search instead.
If you've never used function calling before, it's an alternative way of asking an OpenAI GPT model to respond to a chat completion request. In addition to sending our usual system prompt, chat history, and user message, we also send along a list of possible functions that could be called to answer the question. We can define those in JSON or as a Pydantic model dumped to JSON. Then, when the response comes back from the model, we can see what function it decided to call, and with what parameters. At that point, we can actually call that function, if it exists, or just use that information in our code in some other way.
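As a sketch of the Pydantic route (the JSON-only approach is what's shown below), you can define the parameters as a model and dump its JSON schema into the tool definition. The class and field names here are purely illustrative, not from the app in this post:

from pydantic import BaseModel, Field

class SearchSources(BaseModel):
    """Retrieve sources from the Azure AI Search index"""
    search_query: str = Field(description="Query string to retrieve documents from the search index")

search_sources_tool = {
    "type": "function",
    "function": {
        "name": "search_sources",
        "description": SearchSources.__doc__,
        "parameters": SearchSources.model_json_schema(),
    },
}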
To use function calling in RAG, we first need to introduce an LLM pre-processing step to handle user queries, as I described in my previous blog post. That will give us an opportunity to intercept the query before we even perform the search step of RAG.
For that pre-processing step, we can start off with a function to handle the general case of unstructured queries:
from typing import List

from openai.types.chat import ChatCompletionToolParam

tools: List[ChatCompletionToolParam] = [
    {
        "type": "function",
        "function": {
            "name": "search_sources",
            "description": "Retrieve sources from the Azure AI Search index",
            "parameters": {
                "type": "object",
                "properties": {
                    "search_query": {
                        "type": "string",
                        "description": "Query string to retrieve documents from azure search eg: 'Health care plan'",
                    }
                },
                "required": ["search_query"],
            },
        },
    }
]
Then we send off a request to the chat completion API, letting it know it can use that function.
chat_completion: ChatCompletion = self.openai_client.chat.completions.create(
    messages=messages,
    model=model,
    temperature=0.0,
    max_tokens=100,
    n=1,
    tools=tools,
    tool_choice="auto",
)
When the response comes back, we process it to see if the model decided to call the function, and extract the search_query parameter if so.
response_message = chat_completion.choices[0].message
if response_message.tool_calls:
    for tool in response_message.tool_calls:
        if tool.type != "function":
            continue
        function = tool.function
        if function.name == "search_sources":
            arg = json.loads(function.arguments)
            search_query = arg.get("search_query", self.NO_RESPONSE)
If the model didn't include the function call in its response, that's not a big deal, as we just fall back to using the user's original question as the search query. We then proceed with the rest of the RAG flow as usual, sending the original question along with whatever results came back in our final LLM call.
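One way to express that fallback is to seed the search query with the original question before checking for tool calls. This is just a sketch; the original_user_query variable is illustrative and not from the code above:

# Default to the user's original question; only override it if the model
# actually called search_sources with a usable argument.
search_query = original_user_query
if response_message.tool_calls:
    for tool in response_message.tool_calls:
        if tool.type == "function" and tool.function.name == "search_sources":
            arg = json.loads(tool.function.arguments)
            search_query = arg.get("search_query", original_user_query)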
Now that we've introduced one function into the RAG flow, we can more easily add additional functions to recognize structured queries. For example, this function recognizes when a user wants to search by a particular filename:
{
    "type": "function",
    "function": {
        "name": "search_by_filename",
        "description": "Retrieve a specific filename from the Azure AI Search index",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {
                    "type": "string",
                    "description": "The filename, like 'PerksPlus.pdf'",
                }
            },
            "required": ["filename"],
        },
    },
},
We need to extend the function parsing code to extract the filename argument:
if function.name == "search_by_filename":
    arg = json.loads(function.arguments)
    filename = arg.get("filename", "")
    filename_filter = filename
Then we can decide how to use that filename filter. In the case of Azure AI Search, I build a filter that checks that a particular index field matches the filename argument, and pass that to my search call. If using a relational database, it would become an additional WHERE clause.
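As a sketch of what that can look like with Azure AI Search, the filter is an OData expression passed to the search call. The "sourcefile" field name is an assumption about the index schema, and search_client and search_query are the variables from the earlier sketch; the SQL equivalent is shown as a comment:

filter_expr = None
if filename_filter:
    # Escape single quotes for the OData filter syntax
    safe_name = filename_filter.replace("'", "''")
    filter_expr = f"sourcefile eq '{safe_name}'"

results = search_client.search(
    search_text=search_query,
    filter=filter_expr,
    top=3,
)

# Relational-database equivalent (illustrative):
#   SELECT ... FROM chunks WHERE sourcefile = %s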
Simply by adding that function, I was able to get much better answers to questions in my RAG app like 'Summarize the document called "perksplus.pdf"', since my search results were truly limited to chunks from that file. You can see my full code changes to add this function to our RAG starter app repo in this PR.
This can be a very powerful technique, but as with all things LLM, there are gotchas:
Here are additional approaches you can try: