
What is retrieval-augmented generation (RAG)?

sourabhkv
Nov 12, 2024

Retrieval-Augmented Generation (RAG) is a pattern that combines pretrained large language models (LLMs) with your own data to generate grounded responses.

 

 

The Need for RAG

LLMs have outdated knowledge and lack access to private, internal data

Large language models such as GPT-3 and Llama are trained on vast datasets that include a wide range of publicly available text. However, they have notable limitations:

 

  1. Lack of Updated Knowledge: LLMs are only as current as the data they were trained on. For instance, if a model was trained on data up to October 2021, it wouldn't be aware of developments or facts that emerged after that date. This lack of temporal awareness makes them less effective for tasks requiring up-to-date information.

  2. Outdated Public Knowledge: Even within the timeframe of their training data, LLMs may rely on outdated or incorrect public information, because the data they were trained on reflects the knowledge available at the time, which may have since evolved or been corrected. In addition, these models have no awareness of private data, since they were never trained on it.

Techniques to Address Limitations: RAG and Fine-Tuning

To overcome these limitations, two primary techniques can be employed: RAG and Fine-Tuning.

 

Two ways to add domain knowledge.

 

Retrieval-Augmented Generation (RAG)

RAG is a method that combines the strengths of traditional information retrieval systems with the generative capabilities of LLMs. It works by:

  1. Retrieval: When a user query is received, the system searches a large, up-to-date database or corpus for relevant documents. This ensures that the latest information is considered in generating the response.

  2. Augmentation and Generation: The retrieved documents are used to inform and guide the generation process of the LLM, allowing it to produce responses that integrate the most current and relevant information available.

RAG thus allows LLMs to provide more accurate and up-to-date answers by leveraging external databases that can be continuously updated.

While fine-tuning can achieve similar or better results, it is often criticized for being expensive in terms of cost and maintenance.
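Conceptually, the retrieve-augment-generate loop described above can be sketched in a few lines of Python. This is only a minimal illustration: the search_documents retriever is a hypothetical placeholder, and the OpenAI SDK and model name are assumptions rather than part of any particular product.

    # Minimal retrieve-augment-generate loop (illustrative sketch).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def search_documents(query: str, top_k: int = 3) -> list[str]:
        # Hypothetical retriever: swap in your own vector, keyword, or hybrid search here.
        return ["<retrieved document 1>", "<retrieved document 2>", "<retrieved document 3>"][:top_k]

    def answer(query: str) -> str:
        # 1. Retrieval: fetch documents relevant to the user query.
        context = "\n\n".join(search_documents(query))
        # 2. Augmentation + generation: ground the model's answer in the retrieved context.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name; use whichever deployment you have
            messages=[
                {"role": "system", "content": "Answer using only these sources:\n" + context},
                {"role": "user", "content": query},
            ],
        )
        return response.choices[0].message.content

    print(answer("What changed in the latest software update?"))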

 

Real-World Example of RAG

A practical example of RAG can be seen in customer support systems. Consider a tech company using RAG to enhance its AI-driven customer support chatbot. When a customer asks about the latest software updates or troubleshooting steps, the RAG system can retrieve the most recent documentation and support articles from the company's knowledge base. The LLM then uses this information to generate an accurate and updated response, providing the customer with the latest solutions and guidance.

This approach ensures that the support system remains relevant and effective, reducing the need for manual updates and allowing the company to maintain high-quality customer service with minimal resource investment.

 

Embeddings

Embeddings are numerical representations of data, typically in the form of dense vectors, used to capture the semantic meaning of words, sentences, or even entire documents. They are a core component in many machine learning and natural language processing (NLP) applications. Here's a more detailed explanation of what embeddings are and why they are important:

Importance of Embeddings:

  1. Semantic Understanding: Enable models to grasp meaning and context beyond simple keyword matching.
  2. Efficiency: Reduce dimensionality, making computations more efficient and scalable.
  3. Transfer Learning: Allow re-use of pre-trained knowledge across tasks, improving performance and reducing training time.
  4. Improved Generalization: Help models recognize related concepts even with limited training data.
  5. Contextualization: Modern embeddings capture context, allowing for better handling of polysemy.
  6. Cross-Linguistic Applications: Facilitate tasks like machine translation by providing a common representation space.

 

Hybrid search combines both vector search and traditional keyword search to leverage the strengths of each approach, offering several advantages over using vector search alone:

  1. Complementary Strengths:
    • Vector Search: Excels at capturing semantic similarity, finding documents that are conceptually related to the query even if they don't share the same keywords.
    • Keyword Search: Ensures precision by retrieving documents that contain exact matches to the query terms, which can be crucial for specific or technical terms.
  2. Precision and Recall:
    • Hybrid search can improve both precision and recall. Vector search increases recall by finding semantically similar documents, while keyword search enhances precision by ensuring relevant keywords are present.
  3. Handling Ambiguity:
    • Pure vector search might retrieve documents that are semantically similar but not contextually relevant due to polysemy (words with multiple meanings). Keyword search can help disambiguate by ensuring the presence of specific terms.
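A common way to fuse the keyword and vector result lists into one ranking is Reciprocal Rank Fusion (RRF), which is also the fusion method Azure AI Search uses for its hybrid mode. The snippet below is a small, self-contained sketch; the document IDs are made up.

    # Reciprocal Rank Fusion: merge keyword and vector rankings into a single list (illustrative sketch).
    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                # Each list contributes 1 / (k + rank); k = 60 is the constant commonly used for RRF.
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    keyword_hits = ["doc3", "doc1", "doc7"]  # made-up results from full-text search
    vector_hits = ["doc1", "doc5", "doc3"]   # made-up results from vector similarity search
    print(rrf([keyword_hits, vector_hits]))  # doc1 and doc3, found by both, rise to the top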

An embedding is like a GPS coordinate: it marks a particular point in a vector space, and the closer two sentences are in meaning, the closer their vectors lie. A vector is simply an ordered array of numbers, and its length (dimensionality) varies from model to model.
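To make the idea concrete, the sketch below embeds two sentences and measures how close their vectors are with cosine similarity. The OpenAI SDK and the text-embedding-3-small model are assumptions used purely for illustration; any embedding model from the MTEB leaderboard linked below works the same way.

    # Embed two sentences and compare them with cosine similarity (illustrative sketch).
    import math
    from openai import OpenAI

    client = OpenAI()  # assumes an API key in the environment

    def embed(text: str) -> list[float]:
        resp = client.embeddings.create(model="text-embedding-3-small", input=text)
        return resp.data[0].embedding

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    v1 = embed("How do I reset my password?")
    v2 = embed("I forgot my login credentials.")
    print(cosine_similarity(v1, v2))  # semantically related sentences produce nearby vectors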

 

MTEB Leaderboard - a Hugging Face Space by mteb

Types of RAG

 

In RAG systems, the choice of data sources is crucial as it directly influences the quality and relevance of the information retrieved and, subsequently, the responses generated. Data sources for RAG can be broadly categorized into structured and unstructured data.

 

Various sources of RAG

  1. Structured data (PostgreSQL with pgvector, Azure SQL)
  2. Documents (PDF, Markdown) stored in Azure Blob Storage or Cosmos DB and indexed with Azure AI Search
  3. GraphRAG (knowledge graph)
  4. VoiceRAG (real-time voice) using the GPT-4o Realtime API

 

 

RAG flows:

  • Simple RAG
    Flow of simple RAG

    Code: aka.ms/rag-postgres

    Demo: aka.ms/rag-postgres/demo

  • Advanced RAG with Query rewriting
    Flow of Advanced RAG with query re-writing

    Results are often retrieved using hybrid search.

    Hybrid search in PostgreSQL typically combines traditional keyword-based search with vector-based similarity search. This approach is useful in applications like document retrieval, where you want to find results that match both specific keywords and semantic similarity.

    -- Filter by keyword first, then rank the matches by vector distance (pgvector).
    WITH keyword_search AS (
        SELECT id, content, embedding
        FROM documents
        WHERE content_tsvector @@ plainto_tsquery('PostgreSQL')
    )
    SELECT id, content
    FROM keyword_search
    ORDER BY embedding <-> '[0.15, 0.25, ..., 0.35]'::vector
    LIMIT 5;
    Example of a simple hybrid query: a keyword filter combined with vector ranking

    In this method, an embedding column is added to the table; it stores the embedding of a particular field of that row, or of several fields combined. A sketch of how such a column can be populated follows after this list.
    RAG on structured data with PostgreSQL
    RAG with PostgreSQL | Chat with Azure Maps
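As a rough illustration of populating that embedding column, the sketch below computes an embedding for each row and writes it back. It assumes a PostgreSQL table created with the pgvector extension, the psycopg2 driver, and the OpenAI embeddings API; the table name, column names, and connection string are placeholders, and this is not the aka.ms/rag-postgres sample code.

    # Populate an embedding column in PostgreSQL/pgvector (illustrative sketch).
    # Assumes: CREATE EXTENSION vector; and a table documents(id serial, content text, embedding vector(1536)).
    import psycopg2
    from openai import OpenAI

    client = OpenAI()
    conn = psycopg2.connect("dbname=ragdb")  # placeholder connection string
    cur = conn.cursor()

    cur.execute("SELECT id, content FROM documents WHERE embedding IS NULL")
    for doc_id, content in cur.fetchall():
        emb = client.embeddings.create(model="text-embedding-3-small", input=content).data[0].embedding
        # pgvector accepts the '[x, y, ...]' text form and casts it to the vector type.
        cur.execute("UPDATE documents SET embedding = %s::vector WHERE id = %s", (str(emb), doc_id))

    conn.commit()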

 

GraphRAG

GraphRAG, short for graph-based Retrieval-Augmented Generation, is an approach that improves LLMs by incorporating structured, context-rich data from knowledge graphs. By enabling AI systems to access and process interconnected information, this method enhances the retrieval process, resulting in more accurate and relevant responses than those produced by traditional retrieval methods.

How Graph RAG Differs from Traditional RAG

Traditional RAG models typically retrieve isolated facts from unstructured data sources. In contrast, Graph RAG leverages structured data from knowledge graphs, enabling it to handle complex queries more effectively. This difference allows for:

  • Enhanced Accuracy: Responses are more likely to be contextually accurate because they are grounded in well-defined relationships within the knowledge graph.
  • Improved Reasoning: The structured nature of knowledge graphs facilitates better inference and reasoning capabilities, which is crucial for industries requiring intricate data analysis, such as finance and healthcare.
  • Reduced Hallucination: By relying on structured data, Graph RAG minimizes the risk of generating incorrect or irrelevant information, a common issue with traditional models that depend solely on unstructured text.

GraphRAG: Unlocking LLM discovery on narrative private data - Microsoft Research

Using GraphRAG, a global search over the graph gives a high-level overview of the document, while a local search drills into low-level details.

 

In this demonstration of GraphRAG, I've indexed the story of "Alice in Wonderland" into a knowledge graph that the LLM can query, and I will showcase how local and global search capabilities can be used to explore it. By leveraging the structured, interconnected data from a knowledge graph, the LLM can perform efficient local searches to find specific details, as well as global searches to understand broader themes and connections within the story. This approach highlights the enhanced retrieval process enabled by GraphRAG, providing more accurate and relevant insights.
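Before these queries can be run, the source text has to be indexed into a knowledge graph. With the 0.x command-line interface of the graphrag Python package (the same interface the query commands below use), that step looks roughly like this; the working-directory layout is an assumption and the exact flags may differ between versions:

    python -m graphrag.index --init --root .   # scaffold settings.yaml and .env in the project folder
    python -m graphrag.index --root .          # build the graph from the text files under ./input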

 

Global search

Question: what is the theme of the story?

python -m graphrag.query --root . --method global "what is theme of the story"

Output:
The story presents several prominent themes that intertwine to create a rich narrative tapestry. Below are the key themes identified:

### Authority and Power Dynamics
A significant theme in the story is the exploration of authority and power dynamics, particularly illustrated through the character of the Queen of Hearts. Her authoritarian rule and the fear she instills in others highlight the chaotic nature of her court. This absurdity of authority is evident in her arbitrary commands and the conflicts that arise from them, especially during pivotal events like the croquet game and the trial [Data: Reports (29, 30, 26, 27, 25, +more)].

### Identity and Transformation
The theme of identity is central to Alice's journey as she navigates her experiences in Wonderland. Her encounters with various characters, such as the Cheshire Cat and the Caterpillar, challenge her understanding of herself and prompt reflections on her sense of self. This exploration of identity is crucial for Alice's character development, particularly as she undergoes physical transformations throughout her adventures [Data: Reports (27, 5, 9, 3, +more)].

### Absurdity and Chaos
Absurdity permeates the narrative, exemplified by nonsensical interactions and whimsical events, such as the Mad Tea Party. The chaotic nature of Wonderland defies logic, creating a surreal atmosphere where characters engage in bizarre dialogues. This theme is further emphasized through the nonsensical trial proceedings, which critique the legal system and highlight the whimsical yet critical perspective on authority and justice [Data: Reports (5, 27, 26, 24, +more)].

### Exploration and Curiosity
Alice's desire for exploration and curiosity drives her character throughout the story. Her longing to enter the garden and her various adventures reflect the importance of seeking new experiences. This theme underscores the notion of discovery as a vital aspect of her journey in Wonderland [Data: Reports (10, 14)].

### Childhood Innocence and Connection
The narrative also explores themes of childhood innocence and the longing for connection, particularly through Alice's reflections on her pet cat, Dinah. This relationship symbolizes comfort amidst the chaos of Wonderland, contrasting with the bizarre and often hostile interactions she faces. Additionally, the friendships Alice forms with characters like the Gryphon and the Mock Turtle highlight the importance of support and companionship in navigating her whimsical surroundings [Data: Reports (27, +more), (3, 16)].

### Conclusion
In summary, the themes of authority, identity, absurdity, exploration, and connection are intricately woven into the fabric of the story. They not only shape Alice's experiences but also reflect broader critiques of societal norms and the nature of reality itself. Each theme contributes to a deeper understanding of the narrative and the complexities of Alice's journey through Wonderland.

 

Local search

Question: "what is ending of the story"

python -m graphrag.query --root . --method local "what is ending of the story"

Output:
The ending of "Alice's Adventures in Wonderland" culminates in a chaotic trial where the Knave of Hearts is accused of stealing the Queen's tarts. Throughout the trial, Alice's experiences and interactions with various characters lead her to challenge the absurdity of the proceedings. The Queen of Hearts, representing arbitrary authority, demands justice, but the trial itself is characterized by nonsensical rules and a lack of coherent evidence [Data: Reports (27); Entities (42, 121, 124); Relationships (122, 123, 216)].

As the trial progresses, Alice grows larger, both physically and in her confidence. She ultimately stands up to the Queen, declaring that she is not afraid of her. This act of defiance symbolizes Alice's journey toward self-assertion and understanding of her own identity amidst the chaos of Wonderland. The Queen's threats and the absurdity of the court do not intimidate her, and she boldly asserts that the Queen is nothing but a pack of cards [Data: Reports (27); Relationships (12, 121)].

In the final moments, as the court descends into further chaos, Alice awakens from her dream, realizing that her adventures in Wonderland were a product of her imagination. This awakening signifies a return to reality and a reflection on her experiences, leaving her with a sense of wonder and curiosity about the world around her [Data: Reports (27); Entities (145); Relationships (68)]. The story concludes with Alice sharing her adventures with her sister, who contemplates the whimsical nature of Wonderland and the lessons learned from Alice's journey. This ending emphasizes themes of growth, identity, and the transition from childhood innocence to a more complex understanding of the world.

 

VoiceRAG

VoiceRAG is an application pattern that combines RAG with voice capabilities, leveraging Azure AI's search and GPT-4o Realtime API for audio to enhance user interactions. This approach allows users to engage with AI systems through natural voice queries, which are processed to retrieve and generate more relevant and accurate responses. It uses the power of Azure's AI search capabilities to access structured data efficiently, improving the overall user experience by providing contextually rich and precise information.

 

Ensuring Security in RAG Implementations

Whether you are an individual or an enterprise, security is a crucial aspect of implementing RAG. Here are some key considerations:

  1. Data Privacy and Security: Protecting sensitive information in RAG processes is paramount. Utilize robust encryption and implement strict access controls to safeguard against unauthorized access and potential data breaches.
  2. Inaccurate Outputs: RAG systems might produce incorrect or misleading responses if there is a disconnect between the AI model's understanding and the grounding data. It is vital to ensure that data alignment is maintained to uphold accuracy.
  3. Compliance Challenges: Adhering to data protection regulations such as GDPR and CCPA is essential. This requires maintaining comprehensive audit trails and implementing strong data governance practices to ensure compliance.
  4. Security Vulnerabilities: RAG systems can be susceptible to prompt injection attacks and data poisoning, where malicious actors manipulate data sources to introduce harmful content or misinformation. Implementing content filtering and other preventive measures is necessary.
  5. Operational Complexity: Managing the diverse and extensive data sources required for RAG can be resource intensive. Ensuring that infrastructure is scalable to meet performance demands is critical.

 

  • Safety is crucial, especially when a large amount of data sits in the database behind the AI. Inputs or outputs can be biased, harmful, or misleading, so safeguards are needed both to make the AI's output trustworthy and to prevent misuse.

    Using Azure AI Content Safety, different content filters can be applied to real-time input as well as output to block misinformation and harmful content. Filtering is configured by adjusting the severity thresholds for each category (violence, hate, sexual, self-harm) to suit the use case; a minimal sketch of the text-analysis call appears after this list.

  • Additionally, a stronger system prompt can serve as a layer on top to proactively prevent users from undermining safety instructions, which helps defend against prompt injection attacks.
    There are two types of prompt injection attacks:
    Direct jailbreak attacks - when attackers try to work around responsible AI guardrails.
    Indirect jailbreak attacks - when attackers poison the grounding data of a RAG app, so that the poisoned content is later referenced during retrieval and the AI service ends up violating its own policies.

    These attacks can be filtered in Azure AI Studio, where Prompt Shields filters both direct and indirect attacks.


  • Once the application is deployed, Risks & Safety monitoring can be used to see and analyze what is being blocked by both the input and output filters; on top of that, alerts can be set up to send notifications when such content is detected.
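As a rough illustration of the content-filtering step mentioned above, the sketch below sends one piece of text to Azure AI Content Safety and prints the severity returned for each category. The endpoint and key are placeholders, and the azure-ai-contentsafety Python package is assumed.

    # Analyze text with Azure AI Content Safety (illustrative sketch; endpoint and key are placeholders).
    from azure.ai.contentsafety import ContentSafetyClient
    from azure.ai.contentsafety.models import AnalyzeTextOptions
    from azure.core.credentials import AzureKeyCredential

    client = ContentSafetyClient(
        endpoint="https://<your-resource>.cognitiveservices.azure.com",
        credential=AzureKeyCredential("<your-key>"),
    )

    result = client.analyze_text(AnalyzeTextOptions(text="text to screen before it reaches the model"))

    # Each category (hate, self-harm, sexual, violence) comes back with a severity score;
    # block or flag the request if any severity exceeds the threshold chosen for the use case.
    for item in result.categories_analysis:
        print(item.category, item.severity)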

To address these risks, both individuals and enterprises should enforce stringent access controls, utilize tools to verify the accuracy of AI responses, implement content filtering to thwart attacks, and ensure adherence to data protection laws. By doing so, you can secure RAG implementations and enhance the effectiveness of AI, whether on a personal or organizational level.

 

Try this for FREE

Azure for Students – Free Account Credit | Microsoft Azure

Marketplace · GitHub

phi3 - Phi-3 running locally on Ollama

LlamaIndex, Data Framework for LLM Applications

Using LLMs in Python

 

Useful links

RAG and generative AI - Azure AI Search | Microsoft Learn

What is Azure OpenAI Service? - Azure AI services | Microsoft Learn

Project GraphRAG - Microsoft Research

VoiceRAG: An App Pattern for RAG + Voice Using Azure AI Search and the GPT-4o Realtime API for Audio | Microsoft Community Hub

Control Safety, Privacy & Security in AI apps - with Mark Russinovich - YouTube

Announcing EAP for Vector Support in Azure SQL Database - Azure SQL Devs’ Corner
