Thanks for the reply, markremmey!
I have a few follow-up questions:
Extra Context: We are currently implementing our solution in Semantic Kernel.
1. Regarding RAG Approaches:
How accurate can Retrieval-Augmented Generation (RAG) get for this use case? Several existing works suggest the accuracy still falls short in practice. Do you have any insights on improving the accuracy of RAG?
Here are a couple of references for context:
https://www.cidrdb.org/cidr2024/papers/p74-floratou.pdf
https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm
I am curious whether applying pre-processing or post-processing to the RAG context would yield better results, for example by adding more metadata such as column descriptions (a rough sketch of what I have in mind is below). What is your take on this?
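To make the idea concrete, here is a minimal sketch of the kind of pre-processing I mean, not tied to Semantic Kernel. The names `TABLE_METADATA` and `describe_table` are hypothetical placeholders for however our schema catalog ends up being stored; the point is just that each retrievable chunk carries the table description plus per-column descriptions before it is embedded.

```python
# Hypothetical schema catalog; in our system this would come from the
# database metadata / a curated data dictionary, not be hard-coded.
TABLE_METADATA = {
    "orders": {
        "description": "One row per customer order.",
        "columns": {
            "order_id": "Primary key of the order.",
            "customer_id": "Foreign key to customers.customer_id.",
            "order_total": "Total order value in USD.",
        },
    },
}

def describe_table(table_name: str) -> str:
    """Flatten the table and column descriptions into one retrievable chunk."""
    meta = TABLE_METADATA[table_name]
    lines = [f"Table {table_name}: {meta['description']}"]
    for column, description in meta["columns"].items():
        lines.append(f"  - {column}: {description}")
    return "\n".join(lines)

# Each enriched chunk would then be embedded and stored in the vector
# store that the RAG step queries.
print(describe_table("orders"))
```

The hope is that richer chunks give the retriever more to match against, so the generator sees the right tables and columns more often.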
2. LLM Agent schema selector:
Some approaches mention using schema information in step 2. However, passing all of the schema information into an LLM to select the top tables can quickly run into token limits.
I am trying to replicate the selector described in https://arxiv.org/abs/2312.11242. However, even a single table can have a few hundred columns, each with its own description, which makes it easy to exceed the model's context window.
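As a workaround I am experimenting with batching: send compact per-table summaries in chunks that fit the context window, collect candidate tables from each chunk, then run a final selection pass over the merged candidates. Below is a rough sketch of the batching step only; `count_tokens` is a placeholder (in practice it would be the model's tokenizer), the summaries are dummies, and the per-batch LLM call is not shown.

```python
from typing import Callable

def batch_schemas(
    table_summaries: list[str],
    count_tokens: Callable[[str], int],
    max_tokens_per_batch: int,
) -> list[list[str]]:
    """Greedily pack per-table summaries into batches under a token budget."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for summary in table_summaries:
        tokens = count_tokens(summary)
        # Start a new batch once adding this summary would exceed the budget.
        if current and current_tokens + tokens > max_tokens_per_batch:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(summary)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches

# Example with a naive whitespace token count; each batch would then be sent
# to the LLM to shortlist tables, and the shortlists merged in a final pass.
summaries = [f"Table t{i}: ... column summaries ..." for i in range(50)]
batches = batch_schemas(summaries, lambda s: len(s.split()), max_tokens_per_batch=200)
print(len(batches), "batches")
```

Is something along these lines reasonable, or is there a better way to keep the selector within token limits?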
Looking forward to your insights!