Reranking

Introduction

In the previous chapters, you learned about keyword search and dense retrieval, and applied both by querying a large dataset of Wikipedia articles. You saw that keyword search performed well on some queries and poorly on others, while dense retrieval worked well on all the queries we tried.

Keyword search, dense retrieval, and in fact any other search mechanism can be enhanced with a powerful method called Rerank. Rerank works as follows: for each (query, response) pair, it assigns a relevance score. As the name suggests, the relevance score is high when the response is relevant to the query, and low otherwise. In this chapter, you’ll learn how to use Rerank to improve the Wikipedia search results you obtained previously in this module.

Colab Notebook

This chapter uses the same Colab notebook as the previous chapter, and we encourage you to follow along in it as you read.

Using Rerank to Improve Keyword Search

Rerank can significantly boost an existing search system. In short, Rerank takes a query and a response, and outputs a relevance score for the pair. This means you can use any search system to surface a set of documents that might contain the answer to a query, and then sort them by relevance score using Rerank.

The results from any search system get reranked based on their relevance to the query
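To make the score-and-sort idea concrete, here is a minimal sketch. The scoring function `toy_relevance` is a hypothetical stand-in based on word overlap; it is not how the Rerank model computes relevance, and it is only there so the example runs without an API key. The helper name `rerank_by_score` is also an assumption made for illustration.

```python
def toy_relevance(query: str, document: str) -> float:
    """Hypothetical scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank_by_score(query, documents, score_fn=toy_relevance, top_n=3):
    """Score every (query, document) pair, then keep the top_n highest scores."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


documents = [
    "Neutrinos are elementary particles",
    "Marie Curie was the first person to win two Nobel prizes",
    "Reality television is a genre",
]
top = rerank_by_score("first person to win two Nobel prizes", documents, top_n=1)
print(top)
```

A real reranker replaces `toy_relevance` with a learned model, but the surrounding logic stays the same: score each pair, sort, and keep the best few.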

Recall the top results we obtained for the query “Who was the first person to win two Nobel prizes?” using the keyword_search function (for the full text, please check out the Colab notebook):

Query: “Who was the first person to win two Nobel prizes?”

Responses:

Neutrino
Western culture
Reality television

These could contain the answer somewhere in the document, but they are certainly not the best documents for this query. Let’s dig in a bit more and retrieve the first 100 results. To save space, we list only the top 20 titles. A title can appear more than once, because several retrieved passages can come from the same article.

Neutrino
Western culture
Reality television
Peter Mullan
Indiana Pacers
William Regal
Nobel Prize
Nobel Prize
Nobel Prize
Noble gas
Nobel Prize in Literature
D.C. United
Nobel Prize in Literature
2021-2022 Manchester United F.C. season
Nobel Prize
Nobel Prize
Zach LaVine
2011 Formula One World Championship
2021-2022 Manchester United F.C. season
Christians

Ok, there’s a high chance that the answer is there. Let’s see if Rerank can help us find it. The following function calls the Rerank endpoint. Its inputs are the query, the responses, and the number of responses we’d like to retrieve.

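def rerank_responses(query, responses, num_responses=3):
    # `co` is assumed to be the Cohere client created earlier in the notebook.
    # Score every response against the query and keep the top `num_responses`.
    reranked_responses = co.rerank(
        query=query,
        documents=responses,
        top_n=num_responses,
        model='rerank-english-v3.0',
        return_documents=True,  # include each document's text in the results
    )
    return reranked_responses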

Rerank returns each result along with its relevance score. Let’s look at the top 3 results.

Query: “Who was the first person to win two Nobel prizes?”

Responses:

Nobel Prize: “Five people have received two Nobel Prizes. Marie Curie received the …”
Relevance score: 1.00
Nobel Prize: “In terms of the most prestigious awards in STEM fields, only a small …”
Relevance score: 0.97
Nobel Prize in Literature: “There are also prizes for honouring the lifetime achievement of writers …”
Relevance score: 0.87
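A listing like the one above can be produced by walking over the reranked results. The sketch below assumes the response object exposes a `results` list whose items carry `relevance_score` and, because `return_documents=True` was set, a `document.text` field; check the SDK version you have installed, since field names may differ. The helper `format_reranked` and the hand-built `fake_response` are hypothetical, used here so the example runs without calling the API.

```python
from types import SimpleNamespace


def format_reranked(reranked_responses, max_chars=70):
    """Return one display line per result: rank, relevance score, text snippet."""
    lines = []
    for rank, result in enumerate(reranked_responses.results, start=1):
        # Assumed fields: result.relevance_score and result.document.text
        snippet = result.document.text[:max_chars]
        lines.append(f"{rank}. [{result.relevance_score:.2f}] {snippet}")
    return lines


# Hand-built stand-in that mimics the assumed response shape.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(
        relevance_score=1.0,
        document=SimpleNamespace(text="Five people have received two Nobel Prizes."),
    ),
])

for line in format_reranked(fake_response):
    print(line)
```

In the notebook, you would pass the value returned by `rerank_responses` instead of `fake_response`.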

Well, that certainly improved the keyword search results! Even though the third result doesn’t contain the answer, the first two come from the correct article, which does. Notice that the relevance scores of both are close to 1.

Conclusion

Rerank is a useful method for finding the most relevant responses to a particular query, and a way to improve keyword search or dense retrieval. In this lab, we used it to vastly improve the results of keyword search: we first used keyword search to retrieve 100 candidate documents that might contain the answer, and then used Rerank to pick the top 3 among them. We encourage you to try Rerank on the other searches we performed in the previous labs, and check your results!
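The two-stage pattern summarized above (retrieve broadly, then rerank narrowly) can be sketched as a small function that accepts any retriever and any reranker. Everything here is illustrative: `search_then_rerank`, `fake_retrieve`, `fake_rerank`, and the tiny `corpus` are hypothetical names and data. In the notebook, the two callables would be thin wrappers around keyword_search and rerank_responses, whose exact signatures may differ from the ones assumed here.

```python
def search_then_rerank(query, retrieve_fn, rerank_fn, num_candidates=100, top_n=3):
    """Stage 1: retrieve a broad candidate set. Stage 2: rerank and keep top_n."""
    candidates = retrieve_fn(query, num_candidates)
    return rerank_fn(query, candidates, top_n)


# Hypothetical stand-ins so the sketch runs offline.
corpus = [
    "Neutrino: an elementary particle",
    "Western culture: heritage of social norms",
    "Nobel Prize: five people have received two Nobel prizes",
]


def fake_retrieve(query, num_results):
    # A real retriever would search an index; this one just returns the corpus.
    return corpus[:num_results]


def fake_rerank(query, documents, top_n):
    # Rank by the number of lowercase words shared with the query.
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]


result = search_then_rerank(
    "two Nobel prizes", fake_retrieve, fake_rerank, num_candidates=3, top_n=1
)
print(result)
```

Keeping the two stages behind separate callables makes it easy to swap keyword search for dense retrieval without touching the reranking step.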