Optimizing Performance

In the following two tables, you’ll find recommendations for getting the best Rerank performance, organized by model family.

Rerank-v3.0

| Constraint | Minimum | Maximum | Default Value |
| --- | --- | --- | --- |
| Number of Documents | 1 | 1000 | N/A |
| Max Number of Chunks | 1 | N/A | 1 |
| Number of Tokens per Document | 1 | N/A (see below for more info) | N/A |
| Number of Tokens per Query | 1 | 2048 | N/A |

Rerank-v2.0

| Constraint | Minimum | Maximum | Default Value |
| --- | --- | --- | --- |
| Number of Documents | 1 | 10,000 | N/A |
| Max Number of Chunks | 1 | N/A | 10 |
| Number of Tokens per Document | 1 | N/A (see below for more info) | N/A |
| Number of Tokens per Query | 1 | 256 | N/A |

Document Chunking

For rerank-v3.0, the model breaks documents into 4094-token chunks. For example, if your query is 100 tokens and your document is 10,000 tokens, your document will be broken into the following chunks:

  1. relevance_score_1 = <query[0,99], document[0,3994]>
  2. relevance_score_2 = <query[0,99], document[3995,7989]>
  3. relevance_score_3 = <query[0,99], document[7990,9999]>
  4. relevance_score = max(relevance_score_1, relevance_score_2, relevance_score_3)

If you would like more control over how chunking is done, we recommend that you chunk your documents yourself.
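If you do chunk documents yourself, a minimal client-side sketch might look like the following. It splits on whitespace, which is only a rough stand-in for the model's actual tokenizer, and the chunk_size and overlap values are illustrative defaults rather than endpoint requirements:

PYTHON
def chunk_document(text: str, chunk_size: int = 4094, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping chunks of roughly chunk_size tokens."""
    tokens = text.split()  # whitespace split approximates true tokenization
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(" ".join(tokens[start:end]))
        if end == len(tokens):
            break
        start = end - overlap  # step back so neighboring chunks overlap slightly
    return chunks

Each chunk can then be passed to the endpoint as its own document, with the parent document's score taken as the maximum over its chunks, mirroring the behavior described above.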

Max Number of Documents

Rerank-v3.0 Models

When using rerank-v3.0 models, the endpoint will throw an error if more than 1000 documents are passed in a single call. More precisely, a request must satisfy the constraint Number of Documents * max_chunks_per_doc ≤ 1000; if this product exceeds 1000, the endpoint returns an error.

By default, max_chunks_per_doc is set to 1 for rerank-v3.0 models; given that the model has a context length of 4096 tokens, the maximum number of tokens for each call is 1000 × 1 × 4096 = 4,096,000.

Rerank-v2.0 Models

When using rerank-v2.0 models, the endpoint will throw an error if more than 10,000 documents are passed in a single call. More precisely, a request must satisfy the constraint Number of Documents * max_chunks_per_doc ≤ 10,000; if this product exceeds 10,000, the endpoint returns an error.

By default, max_chunks_per_doc is set to 10 for rerank-v2.0 models; given that the model has a context length of 512 tokens, the maximum number of tokens for each call is 10,000 × 512 = 5,120,000.
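To stay within these limits, you can split a large document collection into compliant batches before calling the endpoint. The sketch below is an illustrative helper, not part of the API; the limit values come from the constraints above:

PYTHON
from typing import Iterator, Sequence

def batch_documents(
    documents: Sequence[str],
    max_chunks_per_doc: int = 1,
    limit: int = 1000,  # 1000 for rerank-v3.0, 10,000 for rerank-v2.0
) -> Iterator[Sequence[str]]:
    """Yield batches satisfying len(batch) * max_chunks_per_doc <= limit."""
    batch_size = max(1, limit // max_chunks_per_doc)
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]

For example, 2,500 documents against rerank-v3.0 with the default max_chunks_per_doc of 1 would be split into three calls of 1,000, 1,000, and 500 documents.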

Queries

Our rerank-v3.0 models are trained with a context length of 4096 tokens, which is shared between the query and the document. If your query is longer than 2048 tokens, it will be truncated to the first 2048 tokens. For rerank-v2.0 models, if your query is longer than 256 tokens, it will be truncated to the first 256 tokens.
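If you want to catch oversized queries before they are silently truncated, a rough pre-flight check might look like the sketch below. The token counts here come from whitespace splitting, which only approximates the model's real tokenizer, so treat the result as an estimate:

PYTHON
# Query token limits from the tables above.
QUERY_TOKEN_LIMITS = {"rerank-v3.0": 2048, "rerank-v2.0": 256}

def query_may_be_truncated(query: str, model_family: str = "rerank-v3.0") -> bool:
    """Return True if the query likely exceeds the family's query token limit."""
    approx_tokens = len(query.split())  # rough stand-in for a real token count
    return approx_tokens > QUERY_TOKEN_LIMITS[model_family]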

Semi-Structured Data Support

Our rerank-v3.0 models support reranking semi-structured data passed as a list of JSON objects. The rank_fields parameter defaults to a field called text unless otherwise specified. If the rank_fields parameter is unspecified and none of your JSON objects have a text field, the endpoint will return an error.

JSON
[
  {
    "Title": "How to fix a dishwasher",
    "Author": "John Smith",
    "Date": "August 1st 2023",
    "Content": "Fixing a dishwasher depends on the specific problem you're facing. Here are some common issues and their potential solutions:...."
  },
  {
    "Title": "How to fix a leaky sink",
    "Date": "July 25th 2024",
    "Content": "Fixing a leaky sink will depend on the source of the leak. Here are general steps you can take to address common types of sink leaks:....."
  },
  .....
]

Looking at the example above, passing in rank_fields=["Title","Content"] would mean the model considers both the title and content for ranking. The rank_fields parameter is order-sensitive, which means rank_fields=["Title","Content"] can lead to different results than rank_fields=["Content","Title"].
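As a concrete illustration, a call using rank_fields might look like the sketch below. This assumes the Cohere Python SDK's co.rerank method; the API key, model name, and query are placeholders to replace with your own:

PYTHON
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

documents = [
    {
        "Title": "How to fix a dishwasher",
        "Author": "John Smith",
        "Date": "August 1st 2023",
        "Content": "Fixing a dishwasher depends on the specific problem you're facing...",
    },
    {
        "Title": "How to fix a leaky sink",
        "Date": "July 25th 2024",
        "Content": "Fixing a leaky sink will depend on the source of the leak...",
    },
]

# Only the listed fields are considered, in the order given.
response = co.rerank(
    model="rerank-english-v3.0",  # illustrative model name
    query="How do I fix my dishwasher?",
    documents=documents,
    rank_fields=["Title", "Content"],
)

for result in response.results:
    print(result.index, result.relevance_score)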

Interpreting Results

The most important output from the Rerank API endpoint is the rank ordering exposed in the response object, rather than the raw relevance score. The score is query dependent and can be higher or lower depending on the query and passages sent in. In the example below, what matters is that Ottawa is ranked as more relevant than Toronto; the user should not assume that Ottawa is two times more relevant than Ontario.

PYTHON
[
  RerankResult<text: Ottawa, index: 1, relevance_score: 0.9109375>,
  RerankResult<text: Toronto, index: 2, relevance_score: 0.7128906>,
  RerankResult<text: Ontario, index: 3, relevance_score: 0.04421997>
]

Relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate a high relevance to the query, and scores closer to 0 indicate low relevance. To find a threshold on the scores to determine whether a document is relevant or not, we recommend going through the following process:

  • Select a set of 30-50 representative queries Q=[q_0, …, q_n] from your domain.
  • For each query, provide a document that is considered borderline relevant to the query for your specific use case, and create a list of (query, document) pairs: sample_inputs=[(q_0, d_0), …, (q_n, d_n)].
  • Pass all tuples in sample_inputs through the rerank endpoint in a loop, and gather the relevance scores sample_scores=[s_0, …, s_n].

The average of sample_scores can then be used as a reference when deciding a threshold for filtering out irrelevant documents.
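A minimal sketch of this calibration loop, assuming the Cohere Python SDK's co.rerank method and an illustrative model name, might look like the following (the sample pairs shown are placeholders for your own borderline examples):

PYTHON
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# (query, borderline-relevant document) pairs drawn from your own domain
sample_inputs = [
    ("What is the capital of Canada?", "Ottawa is a city in southern Ontario."),
    ("How do I fix a leaky sink?", "Most sink leaks come from worn washers or loose fittings."),
    # ... 30-50 representative pairs in total
]

sample_scores = []
for query, document in sample_inputs:
    response = co.rerank(
        model="rerank-english-v3.0",  # illustrative model name
        query=query,
        documents=[document],
    )
    sample_scores.append(response.results[0].relevance_score)

# The average borderline score can serve as a starting threshold for filtering.
threshold = sum(sample_scores) / len(sample_scores)
print(f"Suggested relevance threshold: {threshold:.3f}")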