Microsoft Developer Community Blog


Advanced RAG with Azure Cosmos DB and Cohere Rerank 3.5

Paul_VicenteH

Microsoft

May 20, 2025

The effectiveness of a Retrieval-Augmented Generation (RAG) application depends heavily on its search step, which can become a single point of failure: if the relevant documents are not retrieved, the language model cannot use them. Vector search, particularly when implemented with Azure Cosmos DB’s NoSQL API and embedding models such as text-embedding-3-small or text-embedding-3-large, can deliver strong results. Adding a reranking step with the Cohere Rerank 3.5 model can refine those results further, improving the relevance of the retrieved information and the overall quality of RAG applications.

Vector Search

Vector search enables querying large datasets by comparing vector representations of data points. It is particularly useful in applications such as recommendation systems, image search, and natural language processing. Reranking the search results can further improve the relevance and accuracy of the retrieved information.
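To make "comparing vector representations" concrete, here is a minimal sketch of similarity-based lookup in plain Python. It assumes cosine similarity as the comparison (the article does not say which distance function was used), and the three-dimensional vectors and the names `cosine_similarity`, `top_k`, and `toy_items` are made up for illustration. A database does this at scale with an index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k):
    """Return the k (id, score) pairs most similar to query_vec.

    items maps an identifier to its vector. This is a brute-force scan.
    """
    scored = [(item_id, cosine_similarity(query_vec, vec))
              for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, invented for this example only.
toy_items = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.0, 0.0], toy_items, k=2)
```

Here `nearest` holds the two items whose vectors point in the direction closest to the query vector.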

Why Rerank Results?

Reranking enhances the quality of search outcomes by adjusting the order of results based on additional criteria such as relevance or user preferences. This ensures that the most pertinent results are prioritised, improving user engagement and satisfaction.

What Is a Rerank Model?

A rerank model takes the query together with the initial search results and assigns each result a relevance score, which is then used to reorder them. These models typically use machine learning to assess how well each result answers the query, rather than relying on vector distance alone.

Value Proposition and Problem Solving

Implementing vector search and rerank models in Azure Cosmos DB offers several advantages:
• Improved Search Accuracy: Vector representations capture semantic similarities, leading to more relevant results.
• Scalability: The NoSQL API in Azure Cosmos DB efficiently handles large datasets, ensuring fast and reliable performance.
• Customisation: Rerank models can be adapted to specific business needs, improving the overall user experience.

Example: Vectorising Data Using text-embedding-3-small and Cosmos DB

The following example uses the text-embedding-3-small model to vectorise data stored in Azure Cosmos DB.

Process Overview:

Process overview of document ingestion using embeddings in Azure Cosmos DB

The HotpotQA dataset was used as test data. This dataset is designed for multi-hop reasoning, where each question requires synthesising information from multiple documents. A reduced version containing 100,000 documents was used. Sample questions were selected, and relevant corpora were retained to maintain dataset integrity while making it more manageable.

HotpotQA includes:
• A corpus dataset (with identifiers, titles, and text)
• A list of questions
• A mapping dataset linking questions to relevant corpora

Example corpus structure (Python dictionary):

{
  '12': {'text': 'Anarchism is a political philosophy ...', 'title': 'Anarchism'},
  '25': {'text': 'Autism is a neurodevelopmental disorder ...', 'title': 'Autism'},
  '39': {'text': 'Albedo (...) is a measure for ...', 'title': 'Albedo'}
}

Document design is straightforward: use the corpus ID as the document ID, include fields for text and title, and vectorise the concatenated title and text. Example document in the database:

{
  "id": "25",
  "text": "Autism is a neurodevelopmental disorder ...",
  "title": "Autism",
  "vectorized_text": [0.00988, -0.00505, 0.05237, 0.01458, -0.03818, 0.00907]
}
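The document design above can be sketched as a small helper. This is an illustration under stated assumptions, not the code used for the article: `build_document` and `fake_embed` are hypothetical names, the newline used to join title and text is a guess, and `embed` stands for whatever function calls the embedding model. The two policy dictionaries show one plausible container configuration for the `vectorized_text` field; the cosine distance function and the `quantizedFlat` index type are assumptions, while 1536 is the default output size of text-embedding-3-small. Check them against the current Azure Cosmos DB documentation before use.

```python
def build_document(corpus_id, entry, embed):
    """Turn one corpus entry into a document shaped like the example above.

    embed is any callable mapping a string to a list of floats.
    """
    # Assumption: title and text are joined with a newline before embedding.
    text_to_embed = f"{entry['title']}\n{entry['text']}"
    return {
        "id": str(corpus_id),
        "text": entry["text"],
        "title": entry["title"],
        "vectorized_text": list(embed(text_to_embed)),
    }


# Assumed container policies for the vector field (not taken from the article).
VECTOR_EMBEDDING_POLICY = {
    "vectorEmbeddings": [
        {
            "path": "/vectorized_text",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536,
        }
    ]
}
INDEXING_POLICY = {
    "includedPaths": [{"path": "/*"}],
    # Keep the raw vector out of the regular index; it gets a vector index.
    "excludedPaths": [{"path": '/"_etag"/?'}, {"path": "/vectorized_text/*"}],
    "vectorIndexes": [{"path": "/vectorized_text", "type": "quantizedFlat"}],
}


def fake_embed(text):
    """Stand-in for the real embedding call so the sketch runs offline."""
    return [float(len(text)), 0.0]


corpus = {"25": {"text": "Autism is a neurodevelopmental disorder ...",
                 "title": "Autism"}}
documents = [build_document(cid, entry, fake_embed)
             for cid, entry in corpus.items()]
```

With the `azure-cosmos` SDK, each resulting dictionary could then be written with `container.upsert_item(document)`.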

Evaluating Vector Search

In a typical RAG scenario, the top n results from a search are used. If documents are chunked, the top n chunks closest to the input question are selected. However, language models have token limits, so typically only 3–10 chunks are included—sometimes up to 100 if feasible.
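As a rough sketch of how the top n results might be requested from Cosmos DB, the helper below builds a NoSQL query that orders documents by `VectorDistance` to the embedded question. The function name `build_vector_query` is hypothetical, and the field name `vectorized_text` follows the example document; the exact query used for the article is not shown in the source.

```python
def build_vector_query(query_embedding, n=10):
    """Build a Cosmos DB NoSQL query returning the n nearest documents.

    Returns (query_text, parameters) in the shape expected by the
    azure-cosmos SDK's query_items method.
    """
    n = int(n)  # n is inlined into the query text, so force it to an integer
    if n < 1:
        raise ValueError("n must be at least 1")
    query = (
        f"SELECT TOP {n} c.id, c.title, c.text, "
        "VectorDistance(c.vectorized_text, @embedding) AS score "
        "FROM c "
        "ORDER BY VectorDistance(c.vectorized_text, @embedding)"
    )
    parameters = [{"name": "@embedding", "value": list(query_embedding)}]
    return query, parameters


# Toy embedding for illustration; a real one would come from the same
# embedding model that was used to vectorise the documents.
query_text, query_parameters = build_vector_query([0.1, 0.2, 0.3], n=10)
```

The pair can be passed to `container.query_items(query=query_text, parameters=query_parameters, enable_cross_partition_query=True)`. Raising `n` trades prompt size for recall, which is the tension the evaluation examples illustrate.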

Evaluation Example 1

Question: The director of the romantic comedy Big Stone Gap is based in what New York city?

Required corpora:

Big Stone Gap (film): mentions Adriana Trigiani as the director.
Adriana Trigiani: states she is based in Greenwich Village, NYC.

The Big Stone Gap corpus appears first, but Adriana Trigiani ranks 16th—outside the top 10—preventing a correct answer unless more results are included.

Top 10 results for Example 1

Evaluation Example 2

Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

Required corpora:

Kiss and Tell (1945 film)
Shirley Temple

While the first corpus ranks first, Shirley Temple ranks 273rd—far too low to be included in typical result sets. As the knowledge base grows, retrieving the right information becomes increasingly difficult.

The corpus with title "Shirley Temple" appears in position 273
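The check behind both evaluation examples is simple: a multi-hop question is only answerable if every required corpus lands inside the top n. A minimal sketch, with hypothetical helper names `rank_of` and `answerable_within`, and a ranked list whose filler titles are placeholders arranged to reproduce the positions reported for Example 1:

```python
def rank_of(title, ranked_titles):
    """1-based position of title in ranked_titles, or None if absent."""
    try:
        return ranked_titles.index(title) + 1
    except ValueError:
        return None


def answerable_within(required_titles, ranked_titles, n):
    """Return (ok, ranks): ok is True only if every required title is in the top n."""
    ranks = {title: rank_of(title, ranked_titles) for title in required_titles}
    ok = all(rank is not None and rank <= n for rank in ranks.values())
    return ok, ranks


# Placeholder ranking: only the two required titles and their positions
# (1st and 16th) come from the article; the other entries are fillers.
ranked = (["Big Stone Gap (film)"]
          + [f"Other document {i}" for i in range(2, 16)]
          + ["Adriana Trigiani"])
required = ["Big Stone Gap (film)", "Adriana Trigiani"]

ok_top10, ranks = answerable_within(required, ranked, n=10)
ok_top16, _ = answerable_within(required, ranked, n=16)
```

With n = 10 the check fails because one required corpus sits at position 16; it only passes once n reaches 16.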

Rerank to the Rescue

Reranking improves accuracy by reordering results based on relevance. To use Cohere Rerank 3.5, provision it as a Pay-As-You-Go API in Azure AI Foundry. This provides an endpoint and API key for integration:

Cohere Rerank 3.5 in Azure AI Foundry

This is what the optimised RAG application looks like:

Vector Search with Re-ranked Results

For evaluation, the top 300 vector search results were reranked using the Cohere Python SDK.

With the Python SDK, the call to the reranking service looks like this:
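The original listing is not reproduced here, so what follows is a sketch under stated assumptions rather than the article's code. It assumes the Cohere Python SDK's `ClientV2` with its `rerank` method, an endpoint URL and API key taken from the Azure AI Foundry deployment, and a model name of `rerank-v3.5`; the model name and the exact URL format should be checked against the deployment details. The helpers `rerank_documents`, `make_cohere_rerank_fn`, and `keyword_overlap_rerank_fn` are hypothetical; the last one is a crude offline stand-in so the reordering logic can be tried without the service.

```python
def rerank_documents(query, documents, rerank_fn, top_n=10):
    """Reorder documents by the scores rerank_fn assigns to them.

    rerank_fn(query, texts) must return (index, relevance_score) pairs,
    where index points into texts.
    """
    # Assumption: title and text are joined with a newline, as at ingestion.
    texts = [f"{doc['title']}\n{doc['text']}" for doc in documents]
    scored = sorted(rerank_fn(query, texts), key=lambda pair: pair[1],
                    reverse=True)
    return [dict(documents[index], rerank_score=score)
            for index, score in scored[:top_n]]


def make_cohere_rerank_fn(endpoint, api_key, model="rerank-v3.5"):
    """Wrap the Cohere SDK rerank call (needs the cohere package and network)."""
    import cohere  # imported lazily so the rest of the sketch runs without it

    client = cohere.ClientV2(api_key=api_key, base_url=endpoint)

    def rerank_fn(query, texts):
        response = client.rerank(model=model, query=query, documents=texts,
                                 top_n=len(texts))
        return [(result.index, result.relevance_score)
                for result in response.results]

    return rerank_fn


def keyword_overlap_rerank_fn(query, texts):
    """Offline stand-in scorer: fraction of query words found in each text."""
    query_words = set(query.lower().split())
    return [(i, len(query_words & set(text.lower().split()))
             / max(len(query_words), 1))
            for i, text in enumerate(texts)]


candidates = [
    {"title": "A", "text": "x"},
    {"title": "B", "text": "shirley temple ambassador"},
]
reranked = rerank_documents("temple ambassador", candidates,
                            keyword_overlap_rerank_fn, top_n=1)
```

In a real run, `make_cohere_rerank_fn(endpoint, api_key)` would replace the stand-in scorer, and `candidates` would be the 300 documents returned by vector search.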

Rerank Evaluation Results

Question 1: Big Stone Gap director’s NYC location

Adriana Trigiani moved from 16th to 12th position. It is still outside the top 10, but a slightly larger result set would now include it.

Question 2: Shirley Temple’s government role

Shirley Temple moved from 273rd to 5th position—making a correct answer feasible.

Conclusion

Integrating vector search and a reranking model with Azure Cosmos DB’s NoSQL API can improve search accuracy and, with it, the quality of RAG answers. In the two evaluation examples, reranking the top 300 vector search results moved a required document from 273rd to 5th position in one case and from 16th to 12th in the other, so the gain varies by question but can decide whether a correct answer is possible at all.