Rerank Best Practices
Document Chunking
Under the hood, the Rerank API turns user input into text chunks. Every chunk will include the query
and a portion of the document text. Chunk size depends on the model.
For example, if:
- the selected model is rerank-v3.5, which has a context length (i.e., maximum chunk size) of 4096 tokens,
- the query is 100 tokens,
- there is one document and it is 10,000 tokens long, and
- document truncation is disabled by setting the max_tokens_per_doc parameter to 10,000 tokens,
Then the document will be broken into three chunks, each containing the 100-token query and up to about 3,996 tokens of the document (4096 minus the 100 query tokens).
The final relevance score for that document is the highest score among those three chunks.
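The arithmetic behind this example can be sketched in Python. This is a simplified model, not Cohere's implementation: it assumes every chunk repeats the whole query and fills the rest of the context window with document tokens, ignoring any special or padding tokens the model may reserve and the query-length cap, so real chunk boundaries may differ slightly. `chunk_spans` and `document_score` are hypothetical helpers, and the chunk scores are made-up placeholders.

```python
import math


def chunk_spans(doc_tokens: int, query_tokens: int, context_length: int = 4096) -> list[tuple[int, int]]:
    """Return [start, end) document-token spans, one per chunk.

    Simplifying assumption: each chunk holds the full query plus as many
    document tokens as fit in the remaining context (no special tokens).
    """
    per_chunk = context_length - query_tokens  # document tokens that fit beside the query
    if per_chunk <= 0:
        raise ValueError("query leaves no room for document tokens")
    n_chunks = math.ceil(doc_tokens / per_chunk)
    return [(i * per_chunk, min((i + 1) * per_chunk, doc_tokens)) for i in range(n_chunks)]


def document_score(chunk_scores: list[float]) -> float:
    """The document's final relevance score is the highest score among its chunks."""
    return max(chunk_scores)


spans = chunk_spans(doc_tokens=10_000, query_tokens=100)
print(spans)  # [(0, 3996), (3996, 7992), (7992, 10000)]
print(document_score([0.12, 0.87, 0.40]))  # 0.87 (placeholder chunk scores)
```

Taking the maximum means a single highly relevant passage is enough to rank a long document near the top, even if the rest of it is off-topic.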
If you would like more control over how chunking is done, we recommend that you chunk your documents yourself.
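One way to chunk documents yourself is sketched below. It splits on whitespace and counts words as a rough stand-in for tokens (an assumption; real token counts depend on the model's tokenizer), uses overlapping windows so context is not lost at chunk boundaries, and records which document each chunk came from. `chunk_documents`, `max_words`, and `overlap` are hypothetical names.

```python
def chunk_documents(docs: list[str], max_words: int = 200, overlap: int = 50) -> list[tuple[int, str]]:
    """Split each document into overlapping windows of at most max_words words.

    Returns (doc_index, chunk_text) pairs; doc_index points back to the
    source document so chunk scores can be regrouped after reranking.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for doc_index, doc in enumerate(docs):
        words = doc.split()
        if not words:
            continue  # skip empty documents
        # Stop once a new window would contain no words beyond the previous one.
        for start in range(0, max(len(words) - overlap, 1), step):
            chunks.append((doc_index, " ".join(words[start:start + max_words])))
    return chunks


for doc_index, text in chunk_documents(["a b c d e f g h", "x y"], max_words=4, overlap=1):
    print(doc_index, text)
```

The chunk texts can then be reranked as individual documents; taking the highest chunk score for each doc_index mirrors how the API scores a document it chunks internally.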
Queries
Our rerank-v3.5 and rerank-v3.0 models are trained with a context length of 4096 tokens. Both the query and the document count toward this limit, and the query can take up at most half of it. In other words, if your query is longer than 2048 tokens, it will be truncated to its first 2048 tokens, leaving the remaining 2048 for the document(s).
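The budget split described above can be written as a small calculation. This sketch only models the token arithmetic stated here (a 4096-token context with the query capped at half of it); `split_budget` is a hypothetical helper that neither calls the API nor counts tokens itself.

```python
def split_budget(query_tokens: int, context_length: int = 4096) -> tuple[int, int]:
    """Return (query tokens kept, tokens left for the document) within one context window."""
    query_cap = context_length // 2            # the query may use at most half the context
    kept_query = min(query_tokens, query_cap)  # longer queries keep only their first query_cap tokens
    return kept_query, context_length - kept_query


print(split_budget(100))   # (100, 3996)
print(split_budget(3000))  # (2048, 2048)
```

The first call matches the 100-token query from the chunking example; the second shows a long query being cut to 2048 tokens.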
Structured Data Support
Our Rerank models support reranking structured data formatted as a list of YAML strings. Because long document strings get truncated, the order of the keys is especially important: put the most important fields first. When constructing a YAML string from a dictionary, make sure to preserve the key order. In Python, that is done by setting sort_keys=False when calling yaml.dump.
Example:
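A minimal sketch, assuming PyYAML is installed and using made-up records: each dictionary is serialized with sort_keys=False so its fields keep their insertion order (Title, then Sender, then Content), which keeps the field listed first at the start of the string, where truncation is least likely to reach it. The resulting list of strings would then be passed as the documents to rerank.

```python
import yaml  # PyYAML

# Hypothetical structured records; the most important field is listed first.
records = [
    {"Title": "Quarterly report", "Sender": "finance team", "Content": "Revenue grew in Q3."},
    {"Title": "Team offsite", "Sender": "hr team", "Content": "The offsite is moved to Friday."},
]

# sort_keys=False preserves insertion order; the default (True) would sort keys
# alphabetically and put Content first.
yaml_docs = [yaml.dump(record, sort_keys=False) for record in records]
for doc in yaml_docs:
    print(doc)
```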
Interpreting Results
The most important output from the Rerank API endpoint is the rank exposed in the response object. Scores are query dependent: the same passage can score higher or lower depending on the query it is paired with. For instance, in a result where Ottawa ranks above Toronto and Ontario, what matters is that ordering; even if Ottawa's score is twice Ontario's, the user should not assume that Ottawa is two times more relevant.
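The point about ranks versus score ratios can be illustrated with placeholder numbers. The sketch assumes each result carries the document's original index and a relevance_score, and the scores are invented for illustration, not real model output. Squaring every score changes Ottawa-to-Toronto from a 2x ratio to roughly 4x, yet the ranking stays the same, which is why the order rather than the ratio carries the meaning. `ranked_documents` is a hypothetical helper.

```python
documents = ["Ottawa", "Toronto", "Ontario"]

# Placeholder results with made-up scores, shaped as (index, relevance_score) records.
results = [
    {"index": 0, "relevance_score": 0.80},
    {"index": 1, "relevance_score": 0.40},
    {"index": 2, "relevance_score": 0.25},
]


def ranked_documents(results: list[dict], documents: list[str]) -> list[str]:
    """Return documents ordered from most to least relevant."""
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ordered]


# An order-preserving transform: ratios between scores change, the ranking does not.
squared = [{**r, "relevance_score": r["relevance_score"] ** 2} for r in results]
print(ranked_documents(results, documents))
print(ranked_documents(squared, documents))
```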
Relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate high relevance to the query, and scores closer to 0 indicate low relevance. To find a score threshold for deciding whether a document is relevant, we recommend the following process:
- Select a set of 30-50 representative queries Q=[q_0, …, q_n] from your domain.
- For each query, provide a document that is considered borderline relevant to the query for your specific use case, and create a list of (query, document) pairs: sample_inputs=[(q_0, d_0), …, (q_n, d_n)].
- Pass all tuples in sample_inputs through the rerank endpoint in a loop, and gather the relevance scores sample_scores=[s_0, …, s_n].
The average of sample_scores can then be used as a reference when deciding on a threshold for filtering out irrelevant documents.
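The calibration loop can be sketched as follows. `score_pair` is a hypothetical stand-in for a call to the rerank endpoint that returns the relevance score of a single (query, document) pair; it is stubbed with made-up scores so the example runs offline, and only three pairs are used to keep it short, not the recommended 30-50. `calibrate_threshold` and `filter_relevant` are also hypothetical names.

```python
from statistics import mean
from typing import Callable


def calibrate_threshold(
    sample_inputs: list[tuple[str, str]],
    score_pair: Callable[[str, str], float],
) -> float:
    """Score each borderline (query, document) pair and return the average score."""
    sample_scores = [score_pair(q, d) for q, d in sample_inputs]
    return mean(sample_scores)


def filter_relevant(scored_docs: list[tuple[str, float]], threshold: float) -> list[str]:
    """Keep documents whose relevance score is at or above the threshold."""
    return [doc for doc, score in scored_docs if score >= threshold]


def fake_score_pair(query: str, document: str) -> float:
    """Stub for the rerank endpoint; the scores are made up for illustration."""
    return {"d0": 0.30, "d1": 0.40, "d2": 0.50}[document]


threshold = calibrate_threshold([("q0", "d0"), ("q1", "d1"), ("q2", "d2")], fake_score_pair)
print(threshold)
print(filter_relevant([("doc A", 0.9), ("doc B", 0.2), ("doc C", 0.45)], threshold))
```

Using the average of borderline scores as the cutoff is a starting point; it can be nudged up or down depending on whether false positives or false negatives are more costly for your use case.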