ReRanking

Introduction

In previous chapters, you learned keyword search and dense retrieval, and you were able to apply them by querying a large Wikipedia article dataset. You noticed that keyword search performed well with some queries, and not so well with others. Dense retrieval, on the other hand, worked well with all the queries.

For both, keyword search and dense retrieval, and in fact, for any other search mechanism we use, there is a very powerful method called ReRank, which can enhance it. ReRank works as follows: For each pair (query, response), it assigns a relevance score. As the name hints, relevance scores are high for pairs in which the response is relevant to the query, and low otherwise. In this chapter, you’ll learn how to use Reranking to improve the wikipedia search results you found previously in this module.

Colab Notebook

This chapter uses the same Colab notebook, as the previous chapter, and we encourage you to follow it along as you read the chapter.

Using Rerank to Improve Keyword Search

Rerank is a very powerful method which can significantly boost any existing search system. In short, rerank takes a query and a response, and outputs a relevance score between them. In that way, one can use any search system to surface a number of documents that can potentially contain the answer to a query, and then sort them using Rerank.

The results from any search system get reranked based on their relevance to the query

The results from any search system get reranked based on their relevance to the query

Remember that the results we obtained for the query “Who was the first person to win two Nobel prizes” using the keyword_search function were the following (for the full text, please check out the Colab notebook):

Query: “Who was the first person to win two Nobel prizes?”

Responses:

  • Neutrino
  • Western culture
  • Reality television

These could contain the answer somewhere in the document, but they are certainly not the best documents for this query. Let’s dig in a bit more, and find the first 100 results. To save space, I’ll only note the top 20 titles.

  1. Neutrino
  2. Western culture
  3. Reality television
  4. Peter Mullan
  5. Indiana Pacers
  6. William Regal
  7. Nobel Prize
  8. Nobel Prize
  9. Nobel Prize
  10. Noble gas
  11. Nobel Prize in Literature
  12. D.C. United
  13. Nobel Prize in Literature
  14. 2021-2022 Manchester United F.C. season
  15. Nobel Prize
  16. Nobel Prize
  17. Zach LaVine
  18. 2011 Formula One World Championship
  19. 2021-2022 Manchester United F.C. season
  20. Christians

Ok, there’s a high chance that the answer is there. Let’s see if Rerank can help us find it. The following function calls the Rerank endpoint. Its inputs are the query, the responses, and the number of responses we’d like to retrieve.

def rerank_responses(query, responses, num_responses=3):
    reranked_responses = co.rerank(
        query = query,
        documents = responses,
        top_n = num_responses,
        model = 'rerank-english-v3.0',
        return_documents=True
    )
    return reranked_responses

Rerank will output the result, as well as the relevance score. Let’s look at the top 3 results.

Query: “Who was the first person to win two Nobel prizes?”

Responses:

  • Nobel Prize: “Five people have received two Nobel Prizes. Marie Curie received the …”
    Relevance score: 1.00
  • Nobel Prize: “In terms of the most prestigious awards in STEM fields, only a small …”
    Relevance score: 0.97
  • Nobel Prize in Literature: “There are also prizes for honouring the lifetime achievement of writers …”
    Relevance score: 0.87

Well, that certainly improved the keyword search results! Even though the third result doesn’t work, the first two retrieved the correct article that contains the answer. Notice that the relevance score for both is close to 1.

Conclusion

ReRank is a very useful method to find the most relevant responses to a particular query. It is very useful as a way to improve keyword search for dense retrieval. In this lab, we used it to vastly improve the results of keyword search, by first using keyword search to retrieve 100 potential documents that may contain the answer, and then using ReRank to retrieve the top 3 among those. We encourage you to try ReRank to improve the other searches we performed in the previous labs, and check your results!

Original Source

This material comes from the post Using LLMs for Search with Dense Retrieval and Reranking.


What’s Next

Now that you know how to search, learn to combine a search system with a generative model, in order to answer questions in sentence format!