Authors: Maxim Lukiyanov, PhD, Principal PM Manager; Serdar Mumcu, Principal Software Engineer; Abe Omorogbe, Senior PM; Joshua Johnson, Principal PM; Jonathon Frost, Principal PM.
Many businesses today are being reimagined from the ground up on the new foundation of Generative AI technology. The key challenge these innovators face is earning users' trust in the new platform. To achieve that, GenAI apps need to become much more accurate and precise in their responses than they are today. This is why we are excited to announce the Public Preview of the GraphRAG Solution Accelerator for Azure Database for PostgreSQL! It offers an end-to-end example of applying the GraphRAG technique to a Postgres-based Legal Research Copilot application to boost the quality of LLM responses and the accuracy of the information retrieval pipeline by leveraging graph-like relationships in the legal data. Additionally, it simplifies the application architecture by using PostgreSQL as both a relational and a graph database.
The problem of accuracy
Innovation in the GenAI space is moving fast. Only recently were we introduced to ChatGPT, and we learned that it can hallucinate and make up answers. This put an initial dent in our trust in GenAI apps. But the industry quickly came up with a solution: the Retrieval Augmented Generation (RAG) approach. RAG apps use facts gathered from the user's enterprise or internal data sources to ground LLM responses in factual data and improve their accuracy. The technique delivers great immediate results in initial POCs and small-scale deployments. But when the number of documents grows, or the documents are too similar to each other, the fact-gathering engine of RAG apps, vector search, starts to fail. We hear from many customers who progressed from the initial POC step to production deployment that the solution is almost there, but its accuracy is still too low. The productivity loss from incorrect answers and the loss of user trust are just too big to ignore. For these experienced customers, the question of deploying RAG apps quickly becomes the question of whether they can boost the accuracy of the GenAI app to an acceptable level.
Improving the accuracy of Generative AI apps
There are many techniques that can be used to improve the accuracy of vector search and of information retrieval pipelines in general. Some are basic and generally applicable, like chunking strategies, bigger embeddings, query rewriting, hybrid search, and metadata filtering. Others are more advanced and more dataset-specific: semantic ranking, hierarchical summarization (RAPTOR), knowledge graphs (GraphRAG), and agentic systems. In general, the performance of each technique depends on the dataset it is applied to, and the reader is well advised to invest in a robust evaluation framework to systematically measure the quality of GenAI responses and how well these advanced techniques work on their own data.
In this blog, we focus on the GraphRAG technique, one of the more powerful techniques that works well with graph data, and discuss the details of the provided Solution Accelerator for Azure Database for PostgreSQL. In an accompanying blog post, Introducing Semantic Ranker Solution Accelerator for Azure Database for PostgreSQL, we dive deep into another powerful technique: semantic ranking.
An overview of the solution accelerator
This solution accelerator is designed as an end-to-end example of a Legal Research Copilot application. It demonstrates the implementation of three information retrieval techniques on Azure Database for PostgreSQL: vector search, semantic ranking, and GraphRAG, and illustrates how they can be combined to deliver high-quality responses to legal research questions. The app uses the U.S. Case Law dataset of 0.5 million legal cases as the source of factual data. The source code of the Solution Accelerator is provided in the following repo: https://github.com/Azure-Samples/graphrag-legalcases-postgres
The application has the following architecture:
Semantic ranking
Semantic ranking improves the accuracy of vector search by re-ranking its results with a semantic ranker model, which brings more relevant items to the top of the ranked list. For example, measured by the NDCG@10 metric, which characterizes the accuracy of the top 10 results, and depending on the dataset used, semantic ranking can deliver up to a 10 to 20 percentage point increase in accuracy. The semantic ranker component used in this solution accelerator is published as an independent solution accelerator module. You can learn more about it in the corresponding blog post: Introducing Semantic Ranker Solution Accelerator for Azure Database for PostgreSQL.
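For reference, a common formulation of this metric is

$$\mathrm{NDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}, \qquad \mathrm{DCG@10} = \sum_{i=1}^{10} \frac{rel_i}{\log_2(i+1)}$$

where rel_i is the graded relevance of the result at position i and IDCG@10 is the DCG of the ideal ordering, so scores range from 0 (poor ordering) to 1 (ideal ordering).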
GraphRAG
GraphRAG is an advanced RAG technique proposed by Microsoft Research to improve the quality of RAG system responses by extracting a knowledge graph from the source data and leveraging it to provide better context to the LLM. The GraphRAG technique consists of three high-level steps:
- Graph extraction
- Entity summarization
- Graph query generation at query time
Step 1: Graph extraction
The GraphRAG technique fits the Legal Research use case especially well because the U.S. Case Law dataset has a natural graph structure. Legal cases explicitly cite prior legal cases used as precedents or examples in the proceedings. These citations carry an important informational signal about the prominence of the legal cases: prominent cases typically receive more citations. We leverage that signal in our application to improve the relevance of the information retrieval pipeline.
We use citations to simplify the graph extraction process. Instead of extracting the graph structure from the unstructured text of the documents using LLM processing, we use pre-extracted citation references to directly form the citation graph of the legal cases. We store the citation graph in a Postgres database using the Apache AGE extension. This approach allows us to simplify the application architecture and use Postgres as both a relational store for the app and as a graph database for GraphRAG. You can learn more about Apache AGE on Azure Database for PostgreSQL in its announcement blog post: Introducing support for Graph data in Azure Database for PostgreSQL (Preview).
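To illustrate, below is a minimal sketch of how a citation graph can be created and populated with Apache AGE. The cases_citations source table and the case/CITES labels are hypothetical, shown for illustration only; see the accelerator repo for the actual data loading code.

-- Minimal sketch: creating and populating a citation graph with Apache AGE.
-- The case/CITES labels and the case IDs below are hypothetical.
CREATE EXTENSION IF NOT EXISTS age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Create an empty graph for the legal cases.
SELECT create_graph('case_graph');

-- Create one node per legal case.
SELECT * FROM cypher('case_graph', $$
    CREATE (:case {case_id: '782330'}), (:case {case_id: '615468'})
$$) AS (v agtype);

-- Create a citation edge: case 782330 cites case 615468.
SELECT * FROM cypher('case_graph', $$
    MATCH (a:case {case_id: '782330'}), (b:case {case_id: '615468'})
    CREATE (a)-[:CITES]->(b)
$$) AS (e agtype);

In practice, the node and edge creation statements would be generated by iterating over the pre-extracted citation pairs in the relational data.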
Step 2: Entity summarization
After we form the graph structure using the citations between legal cases, we use the Microsoft Research GraphRAG library (https://github.com/microsoft/graphrag) to build multi-level summaries of the legal cases in the graph. This multi-level summarization step is the key step of the GraphRAG technique and is central to its accuracy gains.
Step 3: Graph query generation at query time
We leverage the structure of the citation graph at query time by using a specialized graph query. The graph query is designed to use the prominence of the legal cases as a signal to improve the accuracy of the information retrieval pipeline. It is expressed as a mixture of a traditional relational query and an openCypher graph query, and it is executed on Postgres using the Apache AGE extension. The resulting information retrieval pipeline, which combines vector search, semantic ranking, and GraphRAG, is shown below.
The Cypher graph query is used to find the number of relevant citations of the cases identified by the semantic ranker. A high number of citations indicates high prominence of the legal case. At the final stage of the pipeline, the two rankings, based on semantic relevance and case prominence, are fused together using Reciprocal Rank Fusion (RRF). This fusion improves the ranking position of relevant cases that also have high prominence. The top 10 cases are used as context for the LLM.
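As a quick illustration of how RRF behaves, the toy query below (with made-up case IDs and ranks, and the same k = 60 constant used in the pipeline query later in this post) shows that a case ranking well on both signals outscores a case that ranks first on only one of them:

-- Toy illustration of Reciprocal Rank Fusion (RRF); values are made up.
-- case_a: 2nd semantically, 1st by prominence -> 1/62 + 1/61 ~= 0.0325
-- case_b: 1st semantically, 30th by prominence -> 1/61 + 1/90 ~= 0.0275
SELECT case_id,
       1.0 / (60 + semantic_rank) + 1.0 / (60 + graph_rank) AS rrf_score
FROM (VALUES ('case_a', 2, 1), ('case_b', 1, 30))
     AS t(case_id, semantic_rank, graph_rank)
ORDER BY rrf_score DESC;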
Due to the extensible nature of PostgreSQL, we are able to express the complete information retrieval pipeline purely in SQL, including the vector search, semantic ranking, graph query, and RRF steps (see the examples in the following sections).
Comparison of vector search, semantic ranking and GraphRAG
We found that combining the three retrieval techniques (vector search, semantic ranking, and GraphRAG) results in a significant improvement in the quality of LLM responses and the accuracy of the underlying information retrieval pipeline, compared to using vector search alone or vector search combined with semantic ranking. Let's compare the three techniques side by side on an example query.
Vector search
We ask the Legal Research Copilot the following question: “Water leaking into the apartment from the floor above. What are the prominent legal precedents in Washington on this problem?”
The app uses vector search to retrieve relevant legal cases and passes them as context to the LLM to generate a response:
The quality of the response is low. The LLM incorrectly identifies the second legal principle, goes off topic, and struggles to find relevant references. Let's take a look at the accuracy of the underlying vector search results. For this question, we have built a golden dataset of the ten best legal cases for answering it, and we will compare the retrieved results against that golden dataset. The golden dataset is represented by the orange nodes in the Citation Graph view shown below.
The Citation Graph window shows a graph representation of the legal cases returned by vector search. The nodes represent legal cases, and the edges represent citations between cases. The number of citation nodes displayed is reduced by 5x compared to the actual number to improve the readability of the graph. The size of a node represents the prominence of the legal case and is proportional to the number of relevant citations to the case. Green circles represent cases that were selected by the information retrieval pipeline (in this case, vector search).
As we can see, vector search was able to correctly identify only 4 out of the 10 legal cases, resulting in 40% recall for this question. Vector search is unaware of the relationships between cases and can't assess the prominence of a case from its text alone. As a result, the 6 incorrect legal cases returned by vector search caused the LLM to produce an incorrect response. This example showcases the importance of the accuracy of the information retrieval pipeline. But let's move forward and improve it.
For the reader's reference, below is the PostgreSQL query that was used to execute the vector search:
-- Vector query
-- Note: query is the user's question, bound as a parameter by the application.
WITH
embedding_query AS (
    SELECT azure_openai.create_embeddings('text-embedding-3-small', query)::vector AS embedding
),
vector AS (
    SELECT cases.id, cases.data#>>'{opinion, text}' AS case_text
    FROM cases, embedding_query
    WHERE (cases.data#>>'{court, id}')::integer IN (9029) -- Washington Supreme Court (9029)
    ORDER BY description_vector <=> embedding
    LIMIT 60 -- taking top 60 for subsequent reranking
)
SELECT * FROM vector
LIMIT 10;
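The query above assumes a pgvector column holding the case embeddings. As a hypothetical sketch (the accelerator's actual schema and index choices may differ), the column and an approximate nearest neighbor index could be set up as follows:

-- Hypothetical setup for the description_vector column used above;
-- the accelerator's actual schema and index parameters may differ.
CREATE EXTENSION IF NOT EXISTS vector;
-- text-embedding-3-small produces 1536-dimensional embeddings by default.
ALTER TABLE cases ADD COLUMN IF NOT EXISTS description_vector vector(1536);
-- HNSW index for fast approximate cosine-distance search,
-- matching the <=> operator used in the vector query.
CREATE INDEX IF NOT EXISTS cases_description_vector_idx
    ON cases USING hnsw (description_vector vector_cosine_ops);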
Semantic Ranker
To improve the quality of the retrieval results, we will switch the app into Semantic Ranker mode. In this mode, the app uses an open-source semantic ranker model (BGE-reranker-v2-m3) to improve the accuracy of the vector search results. Let's first take a look at the quality of the results returned by the semantic ranker.
We can see that the semantic ranker was able to correctly identify 6 out of the 10 cases, improving the recall metric to 60%. But the semantic ranker is also unaware of the graph relationships between legal cases and can't identify prominent ones. Let's assess how this improvement in recall affected the accuracy of the LLM response.
The response quality has improved over the vector search response, but there is still an error in the identification of the second legal principle; this time the LLM went off topic in a different direction and discussed flooding caused by city infrastructure.
For the reader's reference, below is the PostgreSQL query used in the app for semantic ranking of the vector search results. It extends the vector query above, reusing its vector CTE:
-- Semantic ranker query (reuses the vector CTE from the vector query above)
WITH
semantic AS (
    SELECT text AS case_text, relevance,
           -- capturing the rank of items according to their semantic relevance score
           RANK() OVER (ORDER BY relevance DESC) AS semantic_rank
    FROM semantic_reranking(query,
             ARRAY(SELECT case_text FROM vector))
    ORDER BY relevance DESC
)
SELECT * FROM semantic;
GraphRAG
As a final step, let’s execute the experiment in GraphRAG mode. First, we’ll take a look at the search results from the GraphRAG information retrieval pipeline.
We can see that the introduction of the prominence signal into the query allowed GraphRAG to correctly capture the most prominent legal cases, such as "Foisy v. Wyman" and "Stuart v. Coldwell". Overall recall also improved, to 70%. Let's take a look at the response from the LLM.
The results are qualitatively the best of the three examples. Both legal principles, "Duty of Care in Maintaining Premises" and "Implied Warranty of Habitability", are identified correctly. We can also see that the prominent legal cases retrieved by the search are now used as examples illustrating the legal principles. This is the type of answer the user expects for the given question.
For the reader's reference, below is the PostgreSQL query used in the app for GraphRAG retrieval. It extends the semantic ranker query above, reusing its semantic CTE (which is assumed to also carry the case id for the join):
-- Graph query (reuses the semantic CTE from the semantic ranker query above;
-- semantic is assumed to also expose the case id for the join below)
WITH
graph AS (
    SELECT *,
           RANK() OVER (ORDER BY graph_query.refs DESC) AS graph_rank
    FROM semantic
    JOIN cypher('case_graph', $$
        MATCH ()-[r]->(n)
        RETURN n.case_id, COUNT(r) AS refs
    $$) AS graph_query(case_id TEXT, refs BIGINT)
    ON semantic.id = graph_query.case_id
),
rrf AS (
    SELECT *,
           COALESCE(1.0 / (60 + graph_rank), 0.0) +
           COALESCE(1.0 / (60 + semantic_rank), 0.0) AS score
    FROM graph
    ORDER BY score DESC -- keep the top 10 by fused RRF score
    LIMIT 10
)
SELECT * FROM rrf;
Here, we use a Cypher query with the Apache AGE extension to compute the number of citations to each legal case. We then seamlessly join the results back into the relational query, where we use the RRF formula to fuse the semantic ranking and the graph ranking into a unified rank. This illustrates the flexibility of Postgres and its ability to combine graph and relational queries in one database.
Conclusion
The GraphRAG Solution Accelerator for Azure Database for PostgreSQL demonstrates a significant improvement in the accuracy of the information retrieval pipeline in the Legal Research Copilot application. The application successfully applies the GraphRAG technique to the legal domain, where it leverages graph relationships in the legal data to improve the accuracy of the GenAI application. By leveraging the power of semantic ranking, GraphRAG, and Azure Database for PostgreSQL, businesses can set their Generative AI investments up for success. As this technology continues to evolve, it promises to unlock new levels of productivity and drive GenAI innovation across various sectors.
We encourage businesses to take advantage of this GraphRAG Solution Accelerator for Azure Database for PostgreSQL to explore the potential of the GraphRAG technique and experience firsthand how it can revolutionize their Generative AI apps. Stay tuned for further updates and enhancements as we continue to refine and expand this solution.