Cohere recommends the following tips for optimal endpoint performance:
|                               | Minimum | Maximum                       |
| ----------------------------- | ------- | ----------------------------- |
| Number of Documents           | 1       | 1000                          |
| Number of Tokens per Document | 1       | N/A (see below for more info) |
| Number of Tokens per Query    | 1       | 256                           |
Cohere splits long documents into chunks so that each chunk, together with the query, fits within the model's 510-token context. For example, if your query is 50 tokens and your document is 1024 tokens, the document is broken into 460-token chunks (510 - 50) and scored as follows:

```
relevance_score_1 = <query[0,50], document[0,460]>
relevance_score_2 = <query[0,50], document[460,920]>
relevance_score_3 = <query[0,50], document[920,1024]>
relevance_score   = max(relevance_score_1, relevance_score_2, relevance_score_3)
```
If you would like more control over how chunking is done, we recommend that you chunk your documents yourself.
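The chunking scheme above can be sketched as follows (a minimal illustration: `chunk_spans` is a hypothetical helper, token counts stand in for real tokenization, and the per-chunk scores are made up):

```python
def chunk_spans(query_len: int, doc_len: int, context_len: int = 510):
    """Split a document into (start, end) token spans that each fit
    alongside the query within the model's context."""
    chunk_len = context_len - query_len  # tokens left over for the document
    return [(start, min(start + chunk_len, doc_len))
            for start in range(0, doc_len, chunk_len)]

# A 50-token query against a 1024-token document yields three chunks:
spans = chunk_spans(50, 1024)
# spans == [(0, 460), (460, 920), (920, 1024)]

# The document's overall relevance is the max over its chunk scores:
chunk_scores = [0.2, 0.7, 0.1]  # made-up per-chunk relevance scores
relevance_score = max(chunk_scores)
```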
By default, the endpoint returns an error if you try to pass more than 1000 documents at a time, because `max_chunks_per_doc` defaults to 10. The maximum number of documents that can be passed to the endpoint is governed by this inequality: if `Number of documents * max_chunks_per_doc` exceeds 10,000, the endpoint will return an error.
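This check can be sketched as (a minimal illustration; `validate_request` is a hypothetical client-side helper, not part of the SDK):

```python
def validate_request(num_documents: int, max_chunks_per_doc: int = 10) -> None:
    """Raise if the request would exceed the endpoint's 10,000-chunk budget."""
    # The endpoint errors when num_documents * max_chunks_per_doc > 10,000.
    if num_documents * max_chunks_per_doc > 10_000:
        raise ValueError(
            f"{num_documents} documents x {max_chunks_per_doc} chunks/doc "
            "exceeds the 10,000-chunk limit"
        )

validate_request(1000)    # OK: 1000 * 10 == 10,000
# validate_request(1001)  # would raise: 10,010 > 10,000
```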
Our models are trained with a context length of 510 tokens; the model takes into account input from both the query and the document. If your query is longer than 256 tokens, it will be truncated to the first 256 tokens.
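Query truncation can be sketched as (a minimal illustration, with integer token IDs standing in for a real tokenizer):

```python
MAX_QUERY_TOKENS = 256

def truncate_query(query_tokens: list) -> list:
    """Keep only the first 256 query tokens, as the endpoint does."""
    return query_tokens[:MAX_QUERY_TOKENS]

tokens = list(range(300))  # a hypothetical 300-token query
truncated = truncate_query(tokens)  # only the first 256 tokens survive
```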
The most important output from co.rerank() is the absolute rank exposed in the response object. The score is query-dependent and could be higher or lower depending on the query and passages sent in. In the example below, what matters is that Ottawa is ranked as more relevant than Toronto; the user should not read the scores as ratios and conclude, for example, that Ottawa is some fixed multiple more relevant than Ontario.
```
[
  RerankResult<text: Ottawa, index: 1, relevance_score: 0.9109375>,
  RerankResult<text: Toronto, index: 2, relevance_score: 0.7128906>,
  RerankResult<text: Ontario, index: 3, relevance_score: 0.04421997>
]
```
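As a sketch of how one might consume such a response (the `RerankResult` class here is a hypothetical stand-in mirroring the fields shown above, not the official SDK type):

```python
from dataclasses import dataclass

@dataclass
class RerankResult:
    """Hypothetical stand-in for one rerank result."""
    text: str
    index: int
    relevance_score: float

results = [
    RerankResult("Ottawa", 1, 0.9109375),
    RerankResult("Toronto", 2, 0.7128906),
    RerankResult("Ontario", 3, 0.04421997),
]

# The rank order is meaningful; the ratios between scores are not.
ranked = sorted(results, key=lambda r: r.relevance_score, reverse=True)
best = ranked[0].text  # "Ottawa"
```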
Relevance scores are normalized to be in [0, 1]. Scores close to 1 indicate a high relevance to the query, and scores closer to zero indicate low relevance. To find a threshold on the scores to determine whether a document is relevant or not, we recommend going through the following process:
- Select a set of 30-50 representative queries `Q = [q_0, ..., q_n]` from your domain.
- For each query, provide a document that is considered borderline relevant to the query for your specific use case, and create a list of (query, document) pairs: `sample_inputs = [(q_0, d_0), ..., (q_n, d_n)]`.
- Pass all tuples in `sample_inputs` through the rerank endpoint in a loop, and gather the relevance scores `sample_scores = [s_0, ..., s_n]`.

The average of `sample_scores` can then be used as a reference when deciding a threshold for filtering out irrelevant documents.
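The procedure can be sketched as (a minimal illustration; the scores are made up, and in practice `sample_scores` would be gathered from the rerank endpoint):

```python
def pick_threshold(sample_scores):
    """Average of borderline-relevant scores serves as the cutoff."""
    return sum(sample_scores) / len(sample_scores)

def filter_relevant(scored_docs, threshold):
    """Keep only (doc, score) pairs at or above the threshold."""
    return [(doc, s) for doc, s in scored_docs if s >= threshold]

# Made-up relevance scores for borderline (query, document) pairs:
sample_scores = [0.32, 0.41, 0.28, 0.39]
threshold = pick_threshold(sample_scores)  # 0.35

docs = [("a", 0.91), ("b", 0.36), ("c", 0.12)]
kept = filter_relevant(docs, threshold)    # "a" and "b" survive
```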