Authors: Maxim Lukiyanov, PhD, Principal PM Manager, and Binnur Gorer, PhD, Senior Software Engineer.
The success of Generative AI apps in the enterprise is frequently decided by how much trust users can place in them, which in turn is driven by the accuracy of their responses. When GenAI apps deliver precise, fact-based responses, users can rely on them for insightful, accurate information. This is why we are excited to announce General Availability of the new Semantic Ranking Solution Accelerator for Azure Database for PostgreSQL, offering a powerful boost to the accuracy of GenAI apps’ information retrieval pipelines by reranking vector search results with semantic ranking models.
Addressing the problem of accuracy
GenAI has come far, but accuracy remains a challenge. Retrieval Augmented Generation (RAG) helps by grounding responses in factual data, but as datasets grow or when documents are very similar to each other, its vector search can falter; users lose trust, and the promise of improved productivity evaporates. Improving accuracy requires optimizing the information retrieval pipeline with techniques ranging from general methods like chunking, larger embeddings, and hybrid search, to advanced, dataset-specific approaches like semantic ranking, RAPTOR summarization, and GraphRAG. The effectiveness of these techniques depends on the dataset, so investing in a robust evaluation framework is critical as well.
In this blog we focus on semantic ranking, one of the more universally applicable techniques, and discuss the details of the Solution Accelerator we provide for Azure Database for PostgreSQL. In an accompanying blog post we dive deep into another powerful technique, GraphRAG, for which we also provide a Solution Accelerator: Introducing GraphRAG Solution for Azure Database for PostgreSQL.
An overview of the solution accelerator
This solution accelerator is designed to extend your PostgreSQL instance on Azure with the ability to perform semantic ranking directly in the SQL query language. The solution accelerator provides two components:
- Automated Deployment Script: This script provisions the Semantic Ranker model as an Azure Machine Learning (AML) inference endpoint in your subscription.
- SQL Integration: A SQL user-defined function (UDF) integrates the Semantic Ranker model directly into SQL queries. The function uses the azure_ai extension to make remote calls to the AML inference endpoint.
The architecture of the Solution Accelerator is shown below:
Semantic ranking
A semantic ranker works by comparing two strings of text: the search query and the text of one of the items being searched over. The ranker produces a relevance score indicating how relevant these two text strings are to each other, or, in other words, whether the text holds an answer to the query. The semantic ranker is a machine learning model. Usually it is a variant of the BERT language model fine-tuned to perform the ranking task, as illustrated below. The ranker model can also be an LLM. The ranker model takes two strings as input and outputs a single relevance score, usually a number in the range of 0 to 1. For that reason, this type of model is also called a cross-encoder model.
Compared to vector search, which simply measures the similarity between two vector embeddings, the semantic ranker model goes down to the level of the actual text and performs a deeper analysis of the semantic relevance between the two text strings. This gives the semantic ranker the potential to produce more accurate results. The actual accuracy of a semantic ranker model depends on its size, the data it was fine-tuned on, and how well it matches the dataset it is applied to. For this Solution Accelerator we benchmarked open-source semantic ranker models to pick the best one for deployment.
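For intuition, here is a minimal sketch of this pair scoring expressed with the semantic_reranking SQL function provided by this Solution Accelerator (its deployment and setup are covered in the sections below). The query, passages, and expected scores are illustrative assumptions:
-- Illustrative sketch: score two candidate passages against a single query.
-- Column names follow the accelerator's demo; passages and expected scores are made up.
SELECT article, relevance
FROM semantic_reranking(
    'What is the tallest mountain on Earth?',
    ARRAY['Mount Everest is Earth''s highest mountain above sea level.',
          'The Pacific Ocean is the largest and deepest ocean on Earth.'])
ORDER BY relevance DESC;
-- A well-trained cross-encoder scores the first passage close to 1 and the second close to 0.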
Benchmarking open-source semantic ranker models
There are many OSS semantic ranker models available. For testing, we selected three of the strongest models with permissive, commercially friendly licenses.
| Ranker model name | Developer of the model | Model size | Base model type | Average latency of ranking 100 query-item pairs |
|---|---|---|---|---|
| BGE-reranker-large | Beijing Academy of Artificial Intelligence | 560M | xlm-roberta-large | 0.12 sec |
| BGE-reranker-v2-m3 | Beijing Academy of Artificial Intelligence | 568M | bge-m3 | 0.15 sec |
| mxbai-rerank-large-v1 | Mixedbread.AI | 435M | NA | 0.06 sec |
We tested the models on 9 datasets of different types from the BEIR benchmark. First, we retrieve 100 items using vector search (HNSW index with Azure OpenAI text-embedding-3-small embeddings, 1536 dimensions), and then apply the ranker models to re-rank the results and return the 10 most relevant items out of the 100. You can see the results in the table below (NDCG@10, scaled to 0-100; higher is better).
| Dataset | Dataset size | Vector search | BGE-reranker-large | BGE-reranker-v2-m3 | mxbai-rerank-large-v1 | Perfect Ranker |
|---|---|---|---|---|---|---|
| dbpedia-entity | 400 | 37.3 | 41.5 | 44.2 | 46.3 | 74.0 |
| climate_fever | 1535 | 25.8 | 36.9 | 32.5 | 25.9 | 66.0 |
| arguana | 1406 | 40.0 | 23.6 | 56.4 | 18.7 | 98.4 |
| scidocs | 1000 | 19.2 | 15.4 | 16.4 | 18.7 | 53.6 |
| webis-touche2020 | 49 | 30.5 | 19.2 | 40.3 | 33.8 | 81.5 |
| trec_covid | 50 | 77.7 | 76.9 | 82.7 | 84.7 | 100.0 |
| nq | 3452 | 46.3 | 31.1 | 57.1 | 54.5 | 87.4 |
| fever | 6666 | 72.7 | 85.4 | 83.3 | 78.8 | 91.4 |
| hotpotqa | 7405 | 56.1 | 70.1 | 70.0 | 62.1 | 74.5 |
| Average | | 54.4 | 60.2 | 66.3 | 58.4 | 81.7 |
| Gain over vector search | | +0.0 | +5.8 | +11.9 | +4.0 | +27.3 |
We use the NDCG@10 metric to measure the accuracy of the retrieval process. On average, BGE-reranker-v2-m3 achieves the best results, improving on vector search accuracy by 11.9 NDCG percentage points. This is a significant improvement, as ideal search results correspond to an NDCG value of 100 (we scaled values from the 0-1 range to the 0-100 range).
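For reference, a common formulation of the metric is shown below (gain and discount conventions vary slightly between implementations):
$$\mathrm{DCG@10}=\sum_{i=1}^{10}\frac{2^{rel_i}-1}{\log_2(i+1)},\qquad \mathrm{NDCG@10}=\frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}$$
where rel_i is the graded relevance of the result returned at position i, and IDCG@10 is the DCG@10 of the ideal ordering of the same results.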
You can also notice that the results are heavily dependent on the dataset. BGE-reranker-v2-m3 produces the best results on only 3 of the 9 datasets, but it delivers good results on the large and challenging ones, which yields a significantly better average. This reinforces the importance of measuring the accuracy of GenAI apps and retrieval systems on your own data.
We also provide a Perfect Ranker score to illustrate the potential of ranker models to improve relevance over the vector search results. It estimates what the metric would be if the top 10 results returned by the ranker were the best available among the 100 items returned by the vector search. As you can see, there is still significant headroom for stronger ranker models: by improving ranking quality, they could improve NDCG@10 by up to 27 percentage points.
Based on these results, we configured this Solution Accelerator to deploy the BGE-reranker-v2-m3 model by default. But you can easily adjust that setting, as all of the ranker models listed here are available from the Hugging Face model catalog.
Getting started with semantic ranker on Azure Database for PostgreSQL
It’s easy to extend your Postgres instance on Azure with semantic ranker capabilities using the provided Solution Accelerator.
Step 1: Deploy Semantic Ranker model
Clone the Solution Accelerator repo (aka.ms/pg-ranker-repo) and use the Azure Developer CLI to deploy the ranker model as an AML inference endpoint. From the cloned repo folder, run:
azd up
./deploy_model.sh
Please refer to the Solution Accelerator repo for instructions on installing the Azure Developer CLI if you don't have it already.
Step 2: Connect your instance of Azure Database for PostgreSQL to the semantic ranker model
psql -f setup_azure_ai.sql
This script will prompt you to enter the URL of your Postgres instance and will take steps to connect it to the semantic ranker model:
- Enable the azure_ai and vector extensions.
- Configure azure_ai credentials in the database to connect it to the OpenAI embedding model and semantic ranker model endpoints, as sketched below.
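For reference, the configuration applied by the script is roughly equivalent to the following SQL (a sketch with placeholder values; the script prompts you for the actual endpoint URLs and keys):
-- Sketch of the configuration performed by setup_azure_ai.sql; all values are placeholders.
CREATE EXTENSION IF NOT EXISTS azure_ai;
CREATE EXTENSION IF NOT EXISTS vector;
-- Azure OpenAI endpoint used to embed search queries
SELECT azure_ai.set_setting('azure_openai.endpoint', 'https://<your-openai-resource>.openai.azure.com');
SELECT azure_ai.set_setting('azure_openai.subscription_key', '<your-azure-openai-key>');
-- Azure Machine Learning inference endpoint hosting the semantic ranker model
SELECT azure_ai.set_setting('azure_ml.scoring_endpoint', 'https://<your-aml-endpoint>.inference.ml.azure.com/score');
SELECT azure_ai.set_setting('azure_ml.endpoint_key', '<your-aml-endpoint-key>');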
Step 3: Perform semantic ranking directly in your Postgres SQL query
Now that your Postgres instance is connected to the ranker model endpoint, you can perform semantic ranking in SQL. The example below shows how to re-rank the vector search results for the query “Latest news on artificial intelligence” over a CNN news dataset, using the semantic_reranking user-defined function provided by the solution accelerator:
WITH vector_results AS (
SELECT article
FROM cnn_daily c
ORDER BY embedding <=> azure_openai.create_embeddings('text-embedding-3-small',
'Latest news on artificial intelligence')::vector
LIMIT 10
)
SELECT article, relevance
FROM semantic_reranking('Latest news on artificial intelligence',
ARRAY(SELECT article FROM vector_results))
ORDER BY relevance DESC
LIMIT 3;
The full demo script is available in the solution accelerator repo: semantic_reranker_demo.sql.
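As a variation (a sketch, not part of the demo script), the same pattern can re-rank a larger candidate set and keep only the results the ranker considers sufficiently relevant; the 100-item candidate set and the 0.5 cutoff below are illustrative assumptions:
-- Sketch: re-rank 100 vector search candidates and keep only those above a relevance threshold.
WITH vector_results AS (
SELECT article
FROM cnn_daily
ORDER BY embedding <=> azure_openai.create_embeddings('text-embedding-3-small',
'Latest news on artificial intelligence')::vector
LIMIT 100
)
SELECT article, relevance
FROM semantic_reranking('Latest news on artificial intelligence',
ARRAY(SELECT article FROM vector_results))
WHERE relevance > 0.5  -- illustrative threshold; tune for your dataset
ORDER BY relevance DESC;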
Conclusion
The semantic ranker solution accelerator for Azure Database for PostgreSQL enables a significant improvement in the accuracy of the information retrieval pipelines of Generative AI apps. By leveraging the power of semantic ranking, businesses can achieve substantially higher accuracy in data retrieval and ensure the success of their Generative AI investments. As this technology continues to evolve, it promises to unlock new opportunities and drive GenAI innovation across various sectors.
We encourage businesses to take advantage of this Solution Accelerator to explore the capabilities of the semantic ranker model and experience firsthand how it can revolutionize their Generative AI apps. Stay tuned for further updates and enhancements as we continue to refine and expand this solution.