For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN
Guides and conceptsAPI ReferenceRelease NotesLLMUCookbooks
Guides and conceptsAPI ReferenceRelease NotesLLMUCookbooks
  • Get Started
    • Introduction
    • Installation
    • Creating a client
    • Playground
    • FAQs
  • Models
    • An Overview of Cohere's Models
    • Aya
    • Embed
    • Rerank
  • Text Generation
    • Introduction to Text Generation at Cohere
    • Using the Chat API
    • Reasoning
    • Image Inputs
    • Streaming Responses
    • Predictable Outputs
    • Advanced Generation Parameters
    • Tool Use
    • Tokens and Tokenizers
    • Summarizing Text
    • Safety Modes
  • Embeddings (Vectors, Search, Retrieval)
    • Introduction to Embeddings at Cohere
    • Semantic Search with Embeddings
    • Multimodal Embeddings
    • Batch Embedding Jobs
  • Going to Production
    • API Keys and Rate Limits
    • Going Live
    • Deprecations
    • How Does Cohere's Pricing Work?
  • Integrations
    • Integrating Embedding Models with Other Tools
    • Cohere and LangChain
    • LlamaIndex and Cohere
  • Deployment Options
    • Overview
    • SDK Compatibility
  • Tutorials
    • Cookbooks
    • LLM University
    • Build Things with Cohere!
    • Agentic RAG
    • Cohere on Azure
  • Responsible Use
    • Security
    • Usage Policy
    • Command A Technical Report
    • Command R and Command R+ Model Card
  • Cohere Labs
    • Cohere Labs Acceptable Use Policy
  • More Resources
    • Cohere Toolkit
    • Datasets
    • Improve Cohere Docs
    • Improving the Classify Fine-tuning Results
LogoLogodocs
DASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN
On this page
  • Refining data quality
  • Troubleshooting

Improving the Classify Fine-tuning Results

Was this page helpful?
Edit this page
Previous
Built with

Cohere’s fine-tuning feature was deprecated on September 15, 2025

There are several things you need to take into account to achieve the best fine-tuned model for Classification, all of which are based on giving the model higher-quality data.

Refining data quality

  • Text cleaning: Improving the quality of the data is often the best investment you can make when solving a problem with machine learning. If the text contains symbols, URLs, or HTML code which are not needed for a specific task, for example, make sure to remove them from the trained file (and from the text you later send to the trained model).
  • Number of examples: The minimum number of labeled examples is 40. Be aware, however, that the more examples you can include, the better!
  • Number of examples per class: In addition to the overall number of examples, it’s important to have many examples of each class in the dataset. You should include at least five examples per label.
  • Mix of examples in dataset: We recommend that you have a balanced (roughly equal) number of examples per class in your dataset. Imbalanced data is a well-known problem in machine learning, and can lead to sub-optimal results.
  • Length of texts: The context size for text is currently 512 tokens. Any tokens beyond this limit are truncated.
  • Deduplication: Ensure that each labeled example in your dataset is unique.
  • High quality test set: In the data upload step, upload a separate test set of examples that you want to see the model benchmarked on. These can be examples that were manually written or verified.

Troubleshooting

We have a dedicated guide for troubleshooting fine-tuned models which is consistent for all the different model types and endpoints. Check it out here.