Best Practices for using Rerank
Optimizing Performance
In the following sections, you’ll find recommendations for getting the best Rerank performance.
Rerank-v3.5 and Rerank-v3.0
Document Chunking
For rerank-v3.5 and rerank-v3.0, the model breaks documents into 4093-token chunks. For example, if your query is 100 tokens and your document is 10,000 tokens, your document will be broken into the following chunks:
relevance_score_1 = <padding_tokens, query[0,99], document[0,3992]>
relevance_score_2 = <padding_tokens, query[0,99], document[3993,7985]>
relevance_score_3 = <padding_tokens, query[0,99], document[7986,9999]>
relevance_score = max(relevance_score_1, relevance_score_2, relevance_score_3)
If you would like more control over how chunking is done, we recommend that you chunk your documents yourself.
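One way to take control of chunk boundaries is to split each document yourself and pass each chunk to the endpoint as its own document. The sketch below uses a simple word-count heuristic; chunk_document is a hypothetical helper, and the chunk size and overlap shown are illustrative choices, since the model's own tokenizer may count tokens differently.

```python
def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping word-based chunks (rough heuristic)."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

long_document = "..."  # your full document text
chunks = chunk_document(long_document)
# Each chunk can then be passed to the Rerank endpoint as a separate document.
```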
Max Number of Documents
When using the rerank-v3.5 and rerank-v3.0 models, the endpoint will throw an error if you attempt to pass more than 10,000 documents at a time. The maximum number of documents that can be passed to the endpoint is governed by the following constraint: Number of documents * max_chunks_per_doc must not exceed 10,000. If Number of documents * max_chunks_per_doc exceeds 10,000, the endpoint will return an error. By default, max_chunks_per_doc is set to 1 for rerank models.
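If you have more documents than the limit allows, one approach is to rerank them in batches and merge the results by relevance score. This is a minimal sketch, assuming the Cohere Python SDK's co.rerank(...) call and a placeholder API key; the MAX_DOCS constant and the batching helper are illustrative, not part of the SDK.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

MAX_DOCS = 10_000  # endpoint limit with max_chunks_per_doc = 1

def rerank_in_batches(query: str, documents: list[str], top_n: int = 10):
    """Rerank a long document list by splitting it into endpoint-sized batches."""
    scored = []
    for start in range(0, len(documents), MAX_DOCS):
        batch = documents[start:start + MAX_DOCS]
        response = co.rerank(
            model="rerank-v3.5",
            query=query,
            documents=batch,
            top_n=len(batch),
        )
        for result in response.results:
            # result.index refers to the position within the current batch
            scored.append((batch[result.index], result.relevance_score))
    # Scores are query-dependent but comparable across batches for the same query,
    # so we can merge by score and keep the overall top_n.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```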
Queries
Our rerank-v3.5 and rerank-v3.0 models are trained with a context length of 4096 tokens. The model takes both the query and the document into account when calculating against this limit, and the query can account for up to half of the full context length. In other words, if your query is longer than 2048 tokens, it will be truncated to the first 2048 tokens (leaving the other 2048 for the document(s)).
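If you want a rough pre-flight warning before a long query gets silently truncated, you can estimate its token count. The sketch below assumes roughly 4 characters per token for English text; this is only a heuristic, and the model's own tokenizer is what actually determines when the 2048-token query limit is reached.

```python
MAX_QUERY_TOKENS = 2048
CHARS_PER_TOKEN = 4  # coarse approximation for English text

def query_likely_truncated(query: str) -> bool:
    """Return True if the query probably exceeds the 2048-token limit."""
    estimated_tokens = len(query) / CHARS_PER_TOKEN
    return estimated_tokens > MAX_QUERY_TOKENS

if query_likely_truncated("very long query ..."):
    print("Warning: query may exceed 2048 tokens and will be truncated.")
```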
Semi-Structured Data Support
Our rerank-v3.5 and rerank-v3.0 models support semi-structured data reranking through a list of JSON objects. The rank_fields parameter will default to a field called text unless otherwise specified. If the rank_fields parameter is unspecified and none of your JSON objects have a text field, the endpoint will return an error.
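As an illustration, here is a minimal sketch of reranking JSON documents on Title and Content fields, assuming the Cohere Python SDK's co.rerank(...) call with a placeholder API key; the document contents and field names are made up for this example.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Illustrative semi-structured documents; the field names are arbitrary.
documents = [
    {"Title": "Capital of Canada", "Content": "Ottawa is the capital city of Canada."},
    {"Title": "Largest city in Canada", "Content": "Toronto is the most populous city in Canada."},
    {"Title": "Canadian provinces", "Content": "Ontario is a province in east-central Canada."},
]

response = co.rerank(
    model="rerank-v3.5",
    query="What is the capital of Canada?",
    documents=documents,
    rank_fields=["Title", "Content"],  # rank on both fields, in this order
    top_n=3,
)
```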
Looking at the example above, passing in rank_fields=["Title","Content"] would mean the model considers both the title and the content for ranking. The rank_fields parameter is not commutative, which means rank_fields=["Title","Content"] can lead to different results than rank_fields=["Content","Title"].
Interpreting Results
The most important output from the Rerank API endpoint is the absolute rank exposed in the response object. The score is query dependent, and could be higher or lower depending on the query and passages sent in. In the example below, what matters is that Ottawa is more relevant than Toronto, but the user should not assume that Ottawa is two times more relevant than Ontario.
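For the query "What is the capital of Canada?" with documents about Ottawa, Toronto, and Ontario, a response might look like the sketch below; the relevance scores shown are made up for illustration and are not real model output.

```python
# Illustrative response shape for the query "What is the capital of Canada?"
# The indices refer to the order of the documents in the request.
{
    "results": [
        {"index": 0, "relevance_score": 0.92},  # "Ottawa is the capital city of Canada."
        {"index": 2, "relevance_score": 0.61},  # "Ontario is a province in east-central Canada."
        {"index": 1, "relevance_score": 0.55},  # "Toronto is the most populous city in Canada."
    ]
}
```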
Relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate high relevance to the query, and scores closer to 0 indicate low relevance. To find a threshold on the scores for deciding whether a document is relevant or not, we recommend going through the following process:
- Select a set of 30-50 representative queries Q=[q_0, …, q_n] from your domain.
- For each query, provide a document that is considered borderline relevant to the query for your specific use case, and create a list of (query, document) pairs: sample_inputs=[(q_0, d_0), …, (q_n, d_n)].
- Pass all tuples in sample_inputs through the rerank endpoint in a loop, and gather the relevance scores sample_scores=[s_0, …, s_n].
The average of sample_scores can then be used as a reference when deciding on a threshold for filtering out irrelevant documents.
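A minimal sketch of this calibration loop, assuming the Cohere Python SDK's co.rerank(...) call with a placeholder API key; the sample_inputs pairs below are placeholders you would replace with your own queries and borderline-relevant documents.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# (query, borderline-relevant document) pairs from your own domain.
sample_inputs = [
    ("What is the capital of Canada?", "Ontario is a province in east-central Canada."),
    # ... 30-50 pairs in total
]

sample_scores = []
for query, document in sample_inputs:
    response = co.rerank(model="rerank-v3.5", query=query, documents=[document], top_n=1)
    sample_scores.append(response.results[0].relevance_score)

# Use the average borderline score as a starting threshold for filtering.
threshold = sum(sample_scores) / len(sample_scores)
print(f"Suggested relevance threshold: {threshold:.3f}")
```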