Rerank Best Practices
Optimizing Performance
In the sections below, you'll find recommendations for getting the best Rerank performance, organized by model family.
Document Chunking
For rerank-v3.0, the model breaks documents into 4094-token chunks. For example, if your query is 100 tokens and your document is 10,000 tokens, your document will be broken into the following chunks:
relevance_score_1 = <query[0,99], document[0,3994]>
relevance_score_2 = <query[0,99], document[3995,7989]>
relevance_score_3 = <query[0,99], document[7990,9999]>
relevance_score = max(relevance_score_1, relevance_score_2, relevance_score_3)
If you would like more control over how chunking is done, we recommend that you chunk your documents yourself.
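For instance, a simple client-side pass might split each document before sending the pieces to the endpoint. The sketch below is illustrative: chunk_document is a hypothetical helper, and word-based splitting only approximates token counts; token-accurate chunking would use a tokenizer matched to the model.

```python
# A minimal client-side chunking sketch; chunk_document is a hypothetical
# helper, and word counts only approximate token counts.

def chunk_document(text: str, max_words: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping word-based chunks."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks

# Chunk each long document before sending the pieces to the endpoint.
long_documents = ["...a very long document...", "...another long document..."]
chunks = [c for doc in long_documents for c in chunk_document(doc)]
```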
Max Number of Documents
Rerank-v3.0 Models
When using rerank-v3.0 models, the endpoint will return an error if you attempt to pass more than 1,000 documents at a time. More precisely, the number of documents passed in a single call must satisfy Number of documents * max_chunks_per_doc ≤ 1,000; if the product exceeds 1,000, the endpoint will return an error. By default, max_chunks_per_doc is set to 1 for rerank-v3.0 models; given that the model has a context length of 4096 tokens, the maximum number of tokens for each call would be 1,000 * 4,096 = 4,096,000.
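If your corpus is larger than the per-call limit, one common workaround is to split it into batches and merge the scored results for the same query. Below is a minimal sketch assuming the Cohere Python SDK; the rerank_large_corpus helper and the model name are illustrative.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

def rerank_large_corpus(query: str, documents: list[str], batch_size: int = 1000):
    """Rerank more than 1,000 documents by batching calls and merging scores."""
    scored = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start : start + batch_size]
        response = co.rerank(
            model="rerank-english-v3.0",
            query=query,
            documents=batch,
        )
        # Each result carries the index of the document within its batch.
        for result in response.results:
            scored.append((result.relevance_score, batch[result.index]))
    # Sort the merged results by descending relevance.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```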
Rerank-v2.0 Models
When using rerank-v2.0 models, the endpoint will return an error if you attempt to pass more than 10,000 documents at a time. More precisely, the number of documents passed in a single call must satisfy Number of documents * max_chunks_per_doc ≤ 10,000; if the product exceeds 10,000, the endpoint will return an error. By default, max_chunks_per_doc is set to 10 for rerank-v2.0 models; given that the model has a context length of 512 tokens, the maximum number of tokens for each call would be 10,000 * 512 = 5,120,000.
Queries
Our rerank-v3.0 models are trained with a context length of 4096 tokens, which covers both the query and the document. If your query is longer than 2048 tokens, it will be truncated to the first 2048 tokens. For v2.0 models, queries longer than 256 tokens are truncated to the first 256 tokens.
Semi-Structured Data Support
Our rerank-v3.0 models support reranking semi-structured data passed as a list of JSON objects. The rank_fields parameter defaults to a field called text unless otherwise specified. If the rank_fields parameter is unspecified and none of your JSON objects have a text field, the endpoint will return an error.
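For instance, semi-structured documents might look like the sketch below, assuming the Cohere Python SDK; the Title and Content field names and the document contents are illustrative, not part of the API.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Illustrative semi-structured documents with Title and Content fields.
documents = [
    {"Title": "Q2 results", "Content": "Revenue grew 12% quarter over quarter."},
    {"Title": "Team offsite", "Content": "The offsite is scheduled for early June."},
]

response = co.rerank(
    model="rerank-english-v3.0",
    query="How did revenue change last quarter?",
    documents=documents,
    rank_fields=["Title", "Content"],
)
```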
Looking at the example above, passing in rank_fields=["Title","Content"] would mean the model considers both the title and the content for ranking. Note that the rank_fields parameter is not commutative: the order of the fields matters, so rank_fields=["Title","Content"] can lead to different results than rank_fields=["Content","Title"].
Interpreting Results
The most important output from the Rerank API endpoint is the absolute rank exposed in the response object. The score itself is query dependent, and could be higher or lower depending on the query and passages sent in. In the example below, what matters is that Ottawa is ranked as more relevant than Toronto; the user should not assume that Ottawa is two times more relevant than Ontario just because its score is twice as high.
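The example below is a sketch using the Cohere Python SDK; the scores shown are made-up values for demonstration, not real model output.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

response = co.rerank(
    model="rerank-english-v3.0",
    query="What is the capital of Canada?",
    documents=["Ottawa", "Toronto", "Ontario"],
)

# A response might look like this (scores invented for illustration):
#   index=0 (Ottawa)   relevance_score=0.98
#   index=1 (Toronto)  relevance_score=0.56
#   index=2 (Ontario)  relevance_score=0.49
```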
Relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate high relevance to the query, and scores closer to 0 indicate low relevance. To find a threshold on the scores to determine whether a document is relevant or not, we recommend going through the following process:
- Select a set of 30-50 representative queries Q = [q_0, …, q_n] from your domain.
- For each query, provide a document that is considered borderline relevant to the query for your specific use case, and create a list of (query, document) pairs: sample_inputs = [(q_0, d_0), …, (q_n, d_n)].
- Pass all pairs in sample_inputs through the rerank endpoint in a loop, and gather the relevance scores sample_scores = [s_0, …, s_n] (see the sketch after this list).
The average of sample_scores can then be used as a reference when deciding a threshold for filtering out irrelevant documents.
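A minimal sketch of this procedure, assuming the Cohere Python SDK; the sample_inputs entries and the model name are illustrative placeholders.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Borderline (query, document) pairs from the steps above; these two
# entries are illustrative placeholders.
sample_inputs = [
    ("What is the capital of Canada?", "Ontario is a province of Canada."),
    ("When was Cohere founded?", "Cohere is an AI company."),
]

sample_scores = []
for query, document in sample_inputs:
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=[document],
    )
    sample_scores.append(response.results[0].relevance_score)

# Use the average borderline score as the relevance threshold.
threshold = sum(sample_scores) / len(sample_scores)
```

At query time, documents scoring below this threshold can then be treated as irrelevant and filtered out.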