Reranking Best Practices
Optimizing Performance
Cohere recommends the following tips for optimal endpoint performance:
| Constraint | Minimum | Maximum |
| --- | --- | --- |
| Number of Documents | 1 | 1000 |
| Number of Tokens per Document | 1 | N/A (see below for more info) |
| Number of Tokens per Query | 1 | 256 |
Document Chunking
Cohere breaks documents into chunks that, together with the query, fit the model's 510-token context window. For example, if your query is 50 tokens and your document is 1024 tokens, your document will be broken into the following chunks:
relevance_score_1 = <query[0,50], document[0,460]>
relevance_score_2 = <query[0,50], document[460,920]>
relevance_score_3 = <query[0,50], document[920,1024]>
relevance_score = max(relevance_score_1, relevance_score_2, relevance_score_3)
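The chunk boundaries in the example above can be derived from the 510-token context window shared between the query and each chunk. A minimal sketch (`chunk_spans` is a hypothetical helper for illustration, not part of the Cohere SDK):

```python
def chunk_spans(doc_len, query_len, context=510):
    """Return (start, end) token offsets for each document chunk.

    Each chunk shares the 510-token context window with the query,
    so the chunk size is the context length minus the query length.
    """
    size = context - query_len
    return [(start, min(start + size, doc_len))
            for start in range(0, doc_len, size)]

print(chunk_spans(1024, 50))  # [(0, 460), (460, 920), (920, 1024)]
```

This reproduces the three chunks scored above: the final relevance score is the maximum over the per-chunk scores.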
If you would like more control over how chunking is done, we recommend that you chunk your documents yourself.
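One way to do your own chunking is sketched below, using whitespace tokenization as a stand-in for the model's tokenizer; `chunk_document` and the `overlap` parameter are illustrative choices (overlapping chunks are a common way to avoid splitting relevant passages at a boundary), not part of the Cohere SDK:

```python
def chunk_document(text, max_tokens=460, overlap=50):
    """Split a document into overlapping chunks of at most max_tokens tokens.

    Whitespace tokenization is a rough proxy here; swap in the model's
    actual tokenizer for accurate token counts.
    """
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

Each chunk can then be sent to the endpoint as its own document, and the per-document scores aggregated (e.g. with `max`) as shown above.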
Queries
Our models are trained with a context length of 510 tokens; the model takes into account both the query and the document. If your query is longer than 256 tokens, it will be truncated to the first 256 tokens.
Interpreting Results
The most important output from co.rerank() is the absolute rank exposed in the response object. The relevance score is query-dependent and can be higher or lower depending on the query and passages sent in. In the example below, what matters is that Ottawa is ranked as more relevant than Toronto; the user should not assume that Ottawa is two times more relevant than Toronto.
[
RerankResult<text: Ottawa, index: 1, relevance_score: 0.9109375>,
RerankResult<text: Toronto, index: 2, relevance_score: 0.7128906>,
RerankResult<text: Ontario, index: 3, relevance_score: 0.04421997>
]
Relevance scores are normalized to be in [0, 1]. Scores close to 1 indicate a high relevance to the query, and scores closer to zero indicate low relevance. To find a threshold on the scores to determine whether a document is relevant or not, we recommend going through the following process:
- Select a set of 30-50 representative queries Q=[q_0, …, q_n] from your domain.
- For each query, provide a document that is considered borderline relevant to the query for your specific use case, and create a list of (query, document) pairs: sample_inputs=[(q_0, d_0), …, (q_n, d_n)].
- Pass all tuples in sample_inputs through the rerank endpoint in a loop, and gather the relevance scores sample_scores=[s_0, …, s_n].

The average of sample_scores can then be used as a reference when deciding a threshold for filtering out irrelevant documents.
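The calibration procedure above can be sketched as follows. The commented-out loop shows where the rerank endpoint would be called; the scores below are placeholder values, and `relevance_threshold` is a hypothetical helper, not part of the Cohere SDK:

```python
def relevance_threshold(sample_scores):
    """Average score over borderline-relevant (query, document) pairs.

    This average is a starting point for a relevance cutoff;
    tune it on your own data.
    """
    return sum(sample_scores) / len(sample_scores)

# In practice, sample_scores would be gathered from the endpoint, e.g.:
# co = cohere.Client("YOUR_API_KEY")
# sample_scores = [
#     co.rerank(query=q, documents=[d], top_n=1).results[0].relevance_score
#     for q, d in sample_inputs
# ]

# Placeholder scores for five borderline pairs:
sample_scores = [0.42, 0.55, 0.38, 0.61, 0.47]
threshold = relevance_threshold(sample_scores)
```

Documents scoring below `threshold` can then be treated as irrelevant and filtered out.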