Rerank Best Practices
Optimizing Performance
In the sections below, you'll find recommendations for getting the best Rerank performance, organized by model family.
Document Chunking
For rerank-v3.0, the model breaks documents into 4094-token chunks. For example, if your query is 100 tokens and your document is 10,000 tokens, your document will be broken into the following chunks:
relevance_score_1 = <query[0,99], document[0,3994]>
relevance_score_2 = <query[0,99], document[3995,7989]>
relevance_score_3 = <query[0,99], document[7990,9999]>
relevance_score = max(relevance_score_1, relevance_score_2, relevance_score_3)
If you would like more control over how chunking is done, we recommend that you chunk your documents yourself.
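For instance, a simple client-side pass might split each document before sending the pieces to the endpoint. The sketch below is illustrative: chunk_document is a hypothetical helper, and word-based splitting only approximates token counts; token-accurate chunking would use a tokenizer matched to the model.

```python
# A minimal client-side chunking sketch; chunk_document is a hypothetical
# helper, and word counts only approximate token counts.

def chunk_document(text: str, max_words: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping word-based chunks."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks

# Chunk each long document before sending the pieces to the endpoint.
long_documents = ["...a very long document...", "...another long document..."]
chunks = [c for doc in long_documents for c in chunk_document(doc)]
```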
Max Number of Documents
Rerank-v3.0 Models
When using rerank-v3.0 models, the endpoint will return an error if you attempt to pass more than 1,000 documents at a time. More precisely, the number of documents passed in a single call must satisfy Number of documents * max_chunks_per_doc ≤ 1,000; if the product exceeds 1,000, the endpoint will return an error. By default, max_chunks_per_doc is set to 1 for rerank-v3.0 models; given that the model has a context length of 4096 tokens, the maximum number of tokens for each call would be 1,000 * 4,096 = 4,096,000.
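If your corpus is larger than the per-call limit, one common workaround is to split it into batches and merge the scored results for the same query. Below is a minimal sketch assuming the Cohere Python SDK; the rerank_large_corpus helper and the model name are illustrative.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

def rerank_large_corpus(query: str, documents: list[str], batch_size: int = 1000):
    """Rerank more than 1,000 documents by batching calls and merging scores."""
    scored = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start : start + batch_size]
        response = co.rerank(
            model="rerank-english-v3.0",
            query=query,
            documents=batch,
        )
        # Each result carries the index of the document within its batch.
        for result in response.results:
            scored.append((result.relevance_score, batch[result.index]))
    # Sort the merged results by descending relevance.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```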
Rerank-v2.0 Models
When using rerank-v2.0 models, the endpoint will return an error if you attempt to pass more than 10,000 documents at a time. More precisely, the number of documents passed in a single call must satisfy Number of documents * max_chunks_per_doc ≤ 10,000; if the product exceeds 10,000, the endpoint will return an error. By default, max_chunks_per_doc is set to 10 for rerank-v2.0 models; given that the model has a context length of 512 tokens, the maximum number of tokens for each call would be 10,000 * 512 = 5,120,000.
Queries
Our rerank-v3.0 models are trained with a context length of 4096 tokens, which covers both the query and the document. If your query is longer than 2048 tokens, it will be truncated to the first 2048 tokens. For v2.0 models, queries longer than 256 tokens are truncated to the first 256 tokens.
Semi-Structured Data Support
Our rerank-v3.0 models support reranking semi-structured data passed as a list of JSON objects. The rank_fields parameter defaults to a field called text unless otherwise specified. If the rank_fields parameter is unspecified and none of your JSON objects have a text field, the endpoint will return an error.
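For instance, semi-structured documents might look like the sketch below, assuming the Cohere Python SDK; the Title and Content field names and the document contents are illustrative, not part of the API.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Illustrative semi-structured documents with Title and Content fields.
documents = [
    {"Title": "Q2 results", "Content": "Revenue grew 12% quarter over quarter."},
    {"Title": "Team offsite", "Content": "The offsite is scheduled for early June."},
]

response = co.rerank(
    model="rerank-english-v3.0",
    query="How did revenue change last quarter?",
    documents=documents,
    rank_fields=["Title", "Content"],
)
```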
Looking at the example above, passing in rank_fields=["Title","Content"] would mean the model considers both the title and the content for ranking. Note that the rank_fields parameter is not commutative: the order of the fields matters, so rank_fields=["Title","Content"] can lead to different results than rank_fields=["Content","Title"].
Interpreting Results
The most important output from the Rerank API endpoint is the absolute rank exposed in the response object. The score itself is query dependent, and could be higher or lower depending on the query and passages sent in. In the example below, what matters is that Ottawa is ranked as more relevant than Toronto; the user should not assume that Ottawa is two times more relevant than Ontario just because its score is twice as high.
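The example below is a sketch using the Cohere Python SDK; the scores shown are made-up values for demonstration, not real model output.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

response = co.rerank(
    model="rerank-english-v3.0",
    query="What is the capital of Canada?",
    documents=["Ottawa", "Toronto", "Ontario"],
)

# A response might look like this (scores invented for illustration):
#   index=0 (Ottawa)   relevance_score=0.98
#   index=1 (Toronto)  relevance_score=0.56
#   index=2 (Ontario)  relevance_score=0.49
```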
Relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate high relevance to the query, and scores closer to 0 indicate low relevance. To find a threshold on the scores to determine whether a document is relevant or not, we recommend going through the following process:
- Select a set of 30-50 representative queries Q = [q_0, …, q_n] from your domain.
- For each query, provide a document that is considered borderline relevant to the query for your specific use case, and create a list of (query, document) pairs: sample_inputs = [(q_0, d_0), …, (q_n, d_n)].
- Pass all pairs in sample_inputs through the rerank endpoint in a loop, and gather the relevance scores sample_scores = [s_0, …, s_n] (see the sketch after this list).
The average of sample_scores can then be used as a reference when deciding a threshold for filtering out irrelevant documents.
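A minimal sketch of this procedure, assuming the Cohere Python SDK; the sample_inputs entries and the model name are illustrative placeholders.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Borderline (query, document) pairs from the steps above; these two
# entries are illustrative placeholders.
sample_inputs = [
    ("What is the capital of Canada?", "Ontario is a province of Canada."),
    ("When was Cohere founded?", "Cohere is an AI company."),
]

sample_scores = []
for query, document in sample_inputs:
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=[document],
    )
    sample_scores.append(response.results[0].relevance_score)

# Use the average borderline score as the relevance threshold.
threshold = sum(sample_scores) / len(sample_scores)
```

At query time, documents scoring below this threshold can then be treated as irrelevant and filtered out.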