Reranking Best Practices

Optimizing Performance

Cohere recommends the following tips for optimal endpoint performance:

  • Number of Documents: 1,000
  • Number of Tokens per Document: N/A (see below for more info)
  • Number of Tokens per Query: 256

Document Chunking

Cohere breaks documents into chunks sized so that a chunk plus the query fits within the model's 510-token context. For example, if your query is 50 tokens and your document is 1,024 tokens, your document will be broken into the following chunks:

  1. relevance_score_1 = <query[0,50], document[0,460]>
  2. relevance_score_2 = <query[0,50], document[460,920]>
  3. relevance_score_3 = <query[0,50], document[920,1024]>
  4. relevance_score = max(relevance_score_1, relevance_score_2, relevance_score_3)

If you would like more control over how chunking is done, we recommend that you chunk your documents yourself.
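If you do chunk documents yourself, a minimal sketch looks like the following. It uses whitespace-separated words as a rough stand-in for tokens (the endpoint's own tokenizer determines real token counts), and the 460-token chunk size assumes a 50-token query against the 510-token context, as in the example above:

```python
# Sketch of client-side chunking. Word count is only a proxy for token
# count; swap in a real tokenizer for production use.
def chunk_document(text: str, max_tokens: int = 460) -> list[str]:
    """Split a document into pieces of at most `max_tokens` words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

doc = " ".join(["word"] * 1000)
chunks = chunk_document(doc, max_tokens=460)
# Produces chunks of 460, 460, and 80 words; each chunk can then be
# passed to the rerank endpoint as its own document.
```

Chunking yourself also lets you split on natural boundaries (paragraphs, sentences) rather than arbitrary token offsets.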

Max Number of Documents

By default, the endpoint will return an error if you try to pass more than 1,000 documents at a time, because max_chunks_per_doc has a default of 10. The limit is governed by the product Number of documents * max_chunks_per_doc: if it exceeds 10,000, the endpoint returns an error.
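The limit above can be checked client-side before making a request. This is a sketch of the documented inequality, not an SDK function:

```python
# The endpoint errors when num_documents * max_chunks_per_doc exceeds
# 10,000; with the default max_chunks_per_doc of 10, that caps a single
# request at 1,000 documents.
def within_rerank_limit(num_documents: int, max_chunks_per_doc: int = 10) -> bool:
    return num_documents * max_chunks_per_doc <= 10_000

within_rerank_limit(1000)                         # True: 1000 * 10 = 10,000
within_rerank_limit(1001)                         # False: exceeds 10,000
within_rerank_limit(2000, max_chunks_per_doc=5)   # True: 2000 * 5 = 10,000
```

Lowering max_chunks_per_doc lets you pass proportionally more documents per request.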

Queries

Our models are trained with a context length of 510 tokens; the model takes into account the input from both the query and the document. If your query is larger than 256 tokens, it will be truncated to the first 256 tokens.
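If you want the truncation to happen on your terms rather than the endpoint's, you can pre-truncate queries yourself. The sketch below uses whitespace words as a stand-in for real tokens, so the cutoff is approximate; the endpoint's own tokenizer determines the actual 256-token boundary:

```python
# Approximate pre-truncation of a query to the documented 256-token
# limit, using words as a token proxy.
def truncate_query(query: str, max_tokens: int = 256) -> str:
    tokens = query.split()
    return " ".join(tokens[:max_tokens])

long_query = " ".join(["token"] * 300)
short_query = truncate_query(long_query)   # keeps the first 256 words
```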

Interpreting Results

The most important output from co.rerank() is the absolute rank that is exposed in the response object. The score is query dependent and could be higher or lower depending on the query and passages sent in. In the example below, what matters is that Ottawa is ranked as more relevant than Toronto; the user should not treat the scores as ratios, e.g., by concluding that Ottawa is some fixed multiple more relevant than Ontario.

    RerankResult<text: Ottawa, index: 1, relevance_score: 0.9109375>,
    RerankResult<text: Toronto, index: 2, relevance_score: 0.7128906>,
    RerankResult<text: Ontario, index: 3, relevance_score: 0.04421997>

Relevance scores are normalized to be in [0, 1]. Scores close to 1 indicate a high relevance to the query, and scores closer to zero indicate low relevance. To find a threshold on the scores to determine whether a document is relevant or not, we recommend going through the following process:

  • Select a set of 30-50 representative queries Q=[q_0, … q_n] from your domain.
  • For each query, provide a document that is considered borderline relevant to the query for your specific use case, and create a list of (query, document) pairs sample_inputs = [(q_0, d_0), …, (q_n, d_n)].
  • Pass each tuple in sample_inputs through the rerank endpoint in a loop, and gather the relevance scores sample_scores = [s_0, …, s_n].

The average of sample_scores can then be used as a reference when deciding a threshold for filtering out irrelevant documents.
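The calibration loop above can be sketched as follows. `rerank_score` is a hypothetical stand-in for a call to the rerank endpoint that returns the relevance score of a single (query, document) pair; the example drives the loop with canned scores instead of real API calls:

```python
# Stand-in for a real endpoint call, e.g. reading
# response.results[0].relevance_score from co.rerank(...).
def rerank_score(query: str, document: str) -> float:
    ...

def calibrate_threshold(sample_inputs: list[tuple[str, str]], score_fn) -> float:
    """Average the relevance scores of borderline (query, document) pairs."""
    sample_scores = [score_fn(q, d) for q, d in sample_inputs]
    return sum(sample_scores) / len(sample_scores)

# Example with canned scores in place of endpoint calls:
pairs = [("q0", "d0"), ("q1", "d1"), ("q2", "d2")]
canned = {("q0", "d0"): 0.35, ("q1", "d1"): 0.42, ("q2", "d2"): 0.28}
threshold = calibrate_threshold(pairs, lambda q, d: canned[(q, d)])
# Documents scoring below `threshold` can then be filtered as irrelevant.
```

Because the borderline pairs define what "just barely relevant" means for your use case, the averaged score is a domain-specific cutoff rather than a universal one.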