Why Should Businesses Adopt RAG and Migrate from LLMs?


In this blog we are going to discuss why it's worth migrating your product or startup project from plain LLMs to RAG. Adopting RAG empowers businesses to leverage external knowledge, enhance accuracy, and build more robust AI applications. It's a strategic move toward intelligent systems that bridge the gap between generative capabilities and authoritative information. These are the topics this blog covers:

  • Brief history of AI
  • What are large language models (LLMs)?
  • Limitations of LLMs
  • How can we incorporate domain knowledge?
  • What is Retrieval Augmented Generation (RAG)?
  • Robust retrieval for RAG apps

Once we are done with these concepts, I hope to convince you to adopt RAG in your project. 

 

Brief History of AI 

 

The idea of intelligent systems emerged in the 1950s, when artificial intelligence was introduced as a field of computer science. In 1959, machine learning was introduced as a subset of AI. By 2017, deep learning had taken over as the dominant way of using neural networks to process data and make decisions. Since around 2021 (the birth of mainstream generative AI) we have been in the era of generative AI, which creates responses (images, text, audio, or video) to prompts (queries) based on the data it has been trained on. In short, today's generative AI systems are powered by large language models (LLMs) capable of generating coherent, contextual responses.

 


What are Large Language Models (LLMs)

 

An LLM is a language model so large that it achieves general-purpose language understanding and generation. For example, an LLM trained for sentiment analysis should, when prompted with a review, produce a positive or negative sentiment as its output.


On Azure, for example, we have various models that can be deployed via Azure OpenAI or Azure AI Studio and are readily available to consume, fine-tune, and even train with different parameters.
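To make this concrete, here is a minimal sketch of consuming a deployed Azure OpenAI chat model for the sentiment task above, using the openai Python package. The deployment name gpt-35-turbo and the environment variable names are assumptions; substitute your own resource values.

```python
import os
from openai import AzureOpenAI

# Connect to your Azure OpenAI resource (endpoint/key variable names are assumptions).
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

# Ask the deployed model to classify a review's sentiment.
response = client.chat.completions.create(
    model="gpt-35-turbo",  # your deployment name (assumption)
    messages=[
        {"role": "system", "content": "Classify the review as Positive or Negative."},
        {"role": "user", "content": "The battery dies after an hour. Very disappointed."},
    ],
)
print(response.choices[0].message.content)  # expected: Negative
```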

 


Generative Pre-trained Transformer (GPT) models, one family of models available on Azure, are trained on the next-word prediction task.
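To see what next-word prediction means in practice, here is a toy sketch. The probability table is invented for illustration and stands in for the distribution a real GPT model learns from its training data.

```python
# Hypothetical next-word probabilities standing in for a trained model.
next_word_probs = {
    "the students opened their": {"books": 0.55, "laptops": 0.30, "minds": 0.15},
}

def predict_next_word(context: str) -> str:
    """Greedily pick the most probable next word for a known context."""
    probs = next_word_probs[context]
    return max(probs, key=probs.get)

print(predict_next_word("the students opened their"))  # -> books
```

A real LLM does the same thing at a vastly larger scale, scoring every token in its vocabulary at every step and feeding each prediction back in to generate whole passages.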

 


Limitations of LLMs

 

LLMs face various limitations, for example:

  • Bias and hallucination
  • Unforeseen consequences, i.e. harmful information

Some of these have largely been addressed. Microsoft, for instance, adheres to strong ethical guidelines and policies for responsible AI, which helps resolve the issue of unforeseen consequences, while fine-tuning and customization help reduce the bias and hallucinations of LLMs.

The biggest limitation of all LLMs, however, is outdated public knowledge and the absence of internal (private) knowledge.


To solve this, we must incorporate domain knowledge into the model.

 

Incorporating Domain Knowledge 

 

 


Incorporating domain knowledge into LLMs is crucial for enhancing their performance and making them more contextually relevant. Common techniques include:

  • Prompt engineering – relies on in-context learning, whereby an LLM like GPT-3, having been trained on vast amounts of text to predict the next token from the preceding text, picks up a task from examples given directly in the prompt (see the sketch after this list).
  • Fine-tuning – involves training an LLM on a smaller dataset specific to a particular domain. By fine-tuning, the model adapts its pre-learned knowledge to the nuances of the target domain.
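Here is a minimal sketch of in-context learning via prompt engineering, reusing the Azure OpenAI client from the earlier snippet. The ticket-routing task and its labels are hypothetical; the point is that the labeled examples in the prompt, not any retraining, teach the model the pattern.

```python
# Few-shot prompt: the labeled examples below are the model's only
# "training" for this task -- no weights are updated.
few_shot_messages = [
    {"role": "system", "content": "You label support tickets by department."},
    {"role": "user", "content": "Ticket: 'I can't reset my password.'"},
    {"role": "assistant", "content": "Department: IT"},
    {"role": "user", "content": "Ticket: 'My invoice total looks wrong.'"},
    {"role": "assistant", "content": "Department: Billing"},
    {"role": "user", "content": "Ticket: 'The projector in room 4 is broken.'"},
]

response = client.chat.completions.create(
    model="gpt-35-turbo",  # same deployment-name assumption as before
    messages=few_shot_messages,
)
print(response.choices[0].message.content)  # e.g. "Department: Facilities"
```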

Both in-context learning and fine-tuning fail to address the issue of outdated public knowledge. This brings us to Retrieval Augmented Generation (RAG).

 

What is Retrieval Augmented Generation (RAG) 

 

RAG is based on the concept of an LLM learning new facts temporarily. RAG with Azure OpenAI allows developers to use supported AI chat models that can reference specific sources of information to ground the response. Adding this information lets the model draw on both the specific data provided and its pretrained knowledge to produce more effective responses.

 


Azure OpenAI enables RAG by connecting pretrained models to your own data sources. Azure OpenAI on your data uses the search capability of Azure AI Search to add the relevant data chunks to the prompt. Once your data is in an Azure AI Search index, Azure OpenAI on your data goes through the following steps (sketched in code after the list):

  • Receive the user prompt.
  • Determine the relevant content and intent of the prompt.
  • Query the search index with that content and intent.
  • Insert the retrieved chunks into the Azure OpenAI prompt, along with the system message and the user prompt.
  • Send the entire prompt to Azure OpenAI.
  • Return the response and data references (if any) to the user.
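Here is a minimal, hand-rolled sketch of those steps so the moving parts are visible; Azure OpenAI on your data automates all of this. The index name docs-index and its content field are assumptions, and the Azure OpenAI client is reused from the earlier snippet.

```python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Client for the Azure AI Search index that holds your data chunks.
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="docs-index",  # assumption: your index name
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

user_prompt = "What is our refund policy for annual plans?"  # 1. receive the prompt

# 2-3. query the index with the prompt (intent extraction omitted for brevity).
results = search_client.search(search_text=user_prompt, top=3)
chunks = "\n\n".join(doc["content"] for doc in results)  # assumption: "content" field

# 4. insert the retrieved chunks alongside the system message and user prompt.
messages = [
    {"role": "system", "content": f"Answer using only these sources:\n{chunks}"},
    {"role": "user", "content": user_prompt},
]

# 5-6. send the grounded prompt to Azure OpenAI and return the response.
response = client.chat.completions.create(model="gpt-35-turbo", messages=messages)
print(response.choices[0].message.content)
```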

This approach lets an LLM pick up new facts quickly and efficiently, beating fine-tuning, which is both costly and time-intensive and should only be used for use cases where it's truly necessary.

 

Robust retrieval for RAG Apps 

 

To achieve robust retrieval for RAG apps, we must first appreciate the importance of the search step in the flow above. The key point to keep in mind: responses from RAG apps are only as good as the data they retrieve.


We can also achieve robust retrieval in our RAG apps by incorporating vector-based search and vector databases, which match on semantic meaning rather than exact keywords, as the sketch below illustrates.
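Here is a minimal sketch of the vector-search variant with Azure AI Search, reusing the client and search_client from the earlier snippets. The embedding deployment name text-embedding-ada-002 and the index's contentVector field are assumptions.

```python
from azure.search.documents.models import VectorizedQuery

query = "refund policy for annual plans"

# Embed the query with an Azure OpenAI embedding deployment (name is an assumption).
embedding = client.embeddings.create(
    model="text-embedding-ada-002",
    input=query,
).data[0].embedding

# Retrieve the nearest chunks by vector similarity rather than keyword overlap.
results = search_client.search(
    search_text=None,  # pure vector search; pass keywords as well for hybrid search
    vector_queries=[VectorizedQuery(
        vector=embedding,
        k_nearest_neighbors=3,
        fields="contentVector",  # assumption: the index's vector field
    )],
)
for doc in results:
    print(doc["content"][:80])
```

Because vectors capture meaning, a query about a 'refund policy' can match a chunk that says 'money-back guarantee' even when the two share no keywords.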

 


 
