Improving the Rerank Fine-tuning Results


Cohere’s fine-tuning feature was deprecated on September 15, 2025.

There are several things to take into account to achieve the best fine-tuned rerank model, most of which revolve around refining the quality of your data.

Refining Data Quality

  • Text cleaning: Improving the quality of the data is often the best investment you can make when solving a problem with machine learning. For example, if the text contains symbols, URLs, or HTML code that are not needed for your task, remove them from the training file (and from the text you later send to the trained model).
  • Number of examples: The minimum number of labeled examples is 256. Be aware, however, that the more examples you can include, the better!
  • Diversity of queries: It is important that the queries used to fine-tune the model are diverse in nature. Simply paraphrasing the same query across different examples will not lead to an increase in model quality. Moreover, the queries should mirror the queries you expect from users.
  • Diversity of documents: It is important that the documents used to fine-tune the model are diverse in nature. Try to make sure the model sees all the different types of documents it might encounter in practice.
  • Hard negatives: While hard negatives are optional, including more hard negatives will greatly improve the quality of the fine-tuned model. What’s more, we advise choosing hard negatives that are semantically similar rather than obviously incorrect documents (e.g. we would not recommend mining hard negatives using BM25 or other non-semantic approaches).
  • Length of texts: The context size for text is 510 tokens, which includes both the query and the relevant passage. Currently, if the combined token count of the query and relevant passage exceeds 510 tokens, we will include the entire query and as many relevant passage tokens as possible until 510 is reached. As a result, if you are trying to fine-tune with longer documents, we recommend making sure that the relevant passage contains the most relevant snippet from the long document, and that the snippet and query together are less than 510 tokens when concatenated.
  • High quality test set: In the data upload step, include a separate test set of examples that you want to see the model benchmarked on. These can be examples that were manually written or verified.
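
As a rough illustration of the text cleaning, number of examples, and length of texts points above, here is a minimal Python sketch that cleans text and flags examples likely to be truncated. It rests on several assumptions that are not part of this guide: the `query` and `relevant_passages` field names, the helper names `clean_text` and `check_examples`, and the regular expressions chosen for cleaning. The whitespace-based `rough_token_count` is only a crude stand-in; real token counts come from the model's tokenizer and will usually be higher, so pass your own counting function where you have one.

```python
import html
import re

MIN_EXAMPLES = 256  # minimum number of labeled examples, from the list above
MAX_TOKENS = 510    # context size shared by the query and the relevant passage

_TAG_RE = re.compile(r"<[^>]+>")       # HTML tags
_URL_RE = re.compile(r"https?://\S+")  # bare http(s) URLs
_WS_RE = re.compile(r"\s+")            # runs of whitespace


def clean_text(text):
    """Strip HTML tags and URLs, decode HTML entities, collapse whitespace."""
    text = _TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = _URL_RE.sub(" ", text)
    return _WS_RE.sub(" ", text).strip()


def rough_token_count(text):
    """Whitespace word count: an ASSUMED stand-in for a real tokenizer."""
    return len(text.split())


def check_examples(examples, count_tokens=rough_token_count):
    """Return a list of warnings for a list of training examples.

    Each example is assumed to be a dict with a "query" string and a
    "relevant_passages" list of strings (hypothetical field names).
    """
    warnings = []
    if len(examples) < MIN_EXAMPLES:
        warnings.append(
            f"only {len(examples)} examples; minimum is {MIN_EXAMPLES}"
        )
    for i, example in enumerate(examples):
        query_tokens = count_tokens(example["query"])
        for passage in example["relevant_passages"]:
            total = query_tokens + count_tokens(passage)
            if total > MAX_TOKENS:
                warnings.append(
                    f"example {i}: query + passage is {total} tokens; "
                    f"the passage will be truncated to fit {MAX_TOKENS}"
                )
    return warnings
```

Running `clean_text` over both the training file and the text you later send to the model keeps the two consistent. Any warning from `check_examples` about length is a cue to replace the passage with the most relevant snippet from the long document.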

Troubleshooting

We have a dedicated guide for troubleshooting fine-tuned models, which applies to all model types and endpoints. Check it out here.