For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN
Guides and conceptsAPI ReferenceRelease NotesLLMUCookbooks
Guides and conceptsAPI ReferenceRelease NotesLLMUCookbooks
  • Get Started
    • Introduction
    • Installation
    • Creating a client
    • Playground
    • FAQs
  • Models
    • An Overview of Cohere's Models
    • Embed
    • Rerank
    • Aya
  • Text Generation
    • Introduction to Text Generation at Cohere
    • Using the Chat API
    • Streaming Responses
    • Structured Outputs
    • Predictable Outputs
    • Advanced Generation Parameters
    • Retrieval Augmented Generation (RAG)
    • Tool Use
    • Tokens and Tokenizers
      • Prompt Truncation
    • Migrating from the Generate API to the Chat API
    • Summarizing Text
    • Safety Modes
  • Embeddings (Vectors, Search, Retrieval)
    • Introduction to Embeddings at Cohere
    • Semantic Search with Embeddings
    • Multimodal Embeddings
    • Batch Embedding Jobs
  • Going to Production
    • API Keys and Rate Limits
    • Going Live
    • Deprecations
    • How Does Cohere's Pricing Work?
  • Integrations
    • Integrating Embedding Models with Other Tools
    • Cohere and LangChain
    • LlamaIndex and Cohere
  • Deployment Options
    • Overview
    • SDK Compatibility
  • Tutorials
    • Cookbooks
    • LLM University
    • Build Things with Cohere!
  • Responsible Use
    • Security
    • Usage Policy
    • Command A Technical Report
    • Command R and Command R+ Model Card
  • Cohere Labs
    • Cohere Labs Acceptable Use Policy
  • More Resources
    • Cohere Toolkit
    • Datasets
    • Improve Cohere Docs
LogoLogodocs
DASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN
Text GenerationPrompt Engineering

How Does Prompt Truncation Work?

LLMs come with limitations; specifically, they can only handle so much text as input. This means that you will often need to figure out which part of a document or chat history to keep, and which ones to omit.

To make this easier, the Chat API comes with a helpful prompt_truncation parameter. When prompt_truncation is set to AUTO, the API will automatically break up the documents into smaller chunks, rerank those chunks according to how relevant they are, and then start dropping the least relevant documents until the text fits within the model’s context length limit.

Note: The last few messages in the chat history will never be truncated or dropped. The RAG API will throw a 400 Too Many Tokens error if it can’t fit those messages along with a single document under the context limit.

Was this page helpful?
Edit this page
Previous

Migrating from the Generate API to the Chat API

Next
Built with