A Guide to Tokens and Tokenizers

What is a Token?

Our language models understand “tokens” rather than characters or bytes. One token can be a part of a word, an entire word, or punctuation. Very common words like “water” will have their own unique tokens. A longer, less frequent word might be encoded into 2-3 tokens, e.g. “waterfall” gets encoded into two tokens, one for “water” and one for “fall”. Note that tokenization is sensitive to whitespace and capitalization.

Here are some references to calibrate how many tokens are in a text:

  • One word tends to be about 2-3 tokens.
  • A paragraph is about 128 tokens.
  • This short article you’re reading now has about 300 tokens.

The number of tokens per word depends on the complexity of the text. Simple text may approach one token per word on average, while complex texts may use less common words that require 3-4 tokens per word on average.

Our vocabulary of tokens is created using byte pair encoding (BPE), which builds tokens by repeatedly merging the most frequent adjacent pairs of symbols in a training corpus.
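To make the idea of byte pair encoding concrete, here is a toy sketch of how merges are learned. It is an illustration only, not Cohere's tokenizer: it works on characters rather than bytes, and the function names (`most_frequent_pair`, `merge_pair`, `learn_merges`) are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_merges(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated corpus."""
    # Start with every word split into single characters.
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


merges, words = learn_merges("water water water waterfall fall", 7)
print(words)
```

On this tiny corpus, seven merges are enough for "water" and "fall" to become single symbols, so "waterfall" ends up as the two symbols "water" and "fall", mirroring the example above. Real vocabularies are learned from far larger corpora with many more merges.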

Tokenizers

A tokenizer is a tool used to convert text into tokens and vice versa. Tokenizers are model specific: the tokenizer for one Cohere model is not compatible with a different Cohere model, because they were trained using different tokenization methods.

Tokenizers are often used to count how many tokens a text contains. This is useful because models can handle only a certain number of tokens in one go. This limitation is known as “context length,” and the number varies from model to model.
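As a rough sketch of how a token count is used against a context length, the helpers below check whether a list of token IDs fits and truncate it if it does not. The function names and the `reserved_for_output` parameter are hypothetical, and the context length must come from the model you are using.

```python
def fits_in_context(token_ids, context_length, reserved_for_output=0):
    """Return True if the tokens fit, leaving room for the model's reply."""
    budget = context_length - reserved_for_output
    if budget <= 0:
        raise ValueError("reserved_for_output must be smaller than context_length")
    return len(token_ids) <= budget


def truncate_to_budget(token_ids, context_length, reserved_for_output=0):
    """Keep only the leading tokens that fit within the budget."""
    budget = context_length - reserved_for_output
    if budget <= 0:
        raise ValueError("reserved_for_output must be smaller than context_length")
    return list(token_ids[:budget])
```

Truncating from the end is only one possible policy. Depending on the application, dropping the oldest tokens or summarizing the input may be a better choice.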

The tokenize and detokenize API endpoints

Cohere offers the tokenize and detokenize API endpoints for converting between text and tokens for the specified model. The hosted tokenizer saves users from needing to download their own tokenizer, but this may result in higher latency from a network call.
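The sketch below shows one way to combine the two endpoints through the Python SDK: count the tokens in a text, then convert the tokens back to text. The helper names are hypothetical. It assumes the client's `tokenize` response exposes a `tokens` list and the `detokenize` response exposes a `text` string.

```python
def count_tokens(client, text, model):
    """Return the number of tokens `model` uses to encode `text`."""
    response = client.tokenize(text=text, model=model)
    return len(response.tokens)


def round_trip(client, text, model):
    """Tokenize `text`, then detokenize the result back into a string."""
    tokens = client.tokenize(text=text, model=model).tokens
    return client.detokenize(tokens=tokens, model=model).text
```

With a client created as `co = cohere.Client(api_key="<API KEY>")`, you could call `count_tokens(co, "caterpillar", "command-a-03-2025")`. Passing the client in as an argument keeps the helpers easy to test with a stand-in object.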

Tokenization in Python SDK

Cohere tokenizers are publicly hosted and can be used locally to avoid network calls. If you are using the Python SDK, the tokenize and detokenize functions will take care of downloading and caching the tokenizer for you.

PYTHON
import cohere

co = cohere.Client(api_key="<API KEY>")

co.tokenize(text="caterpillar", model="command-a-03-2025")
# -> [74, 2340, 107771]

Notice that this downloads the tokenizer config for the specified model, which might take a couple of seconds for the initial request.

Caching and Optimization

The cache for the tokenizer configuration is held by each client instance. This means that creating a new client, for example by starting a new process, will download the configuration again.
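Because the cache belongs to the client instance, one way to avoid repeated downloads within a single process is to create the client once and reuse it. The helper below is a minimal sketch of that pattern. `get_shared_client` and its `factory` argument are hypothetical names, not part of the SDK.

```python
_shared_client = None


def get_shared_client(factory):
    """Return one client per process, creating it with `factory` on first use."""
    global _shared_client
    if _shared_client is None:
        _shared_client = factory()
    return _shared_client
```

For example, `co = get_shared_client(lambda: cohere.Client(api_key="<API KEY>"))` returns the same client on every call, so its tokenizer cache is reused. This does not help across processes: each new process still starts with an empty cache.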

During development, this can be slow if you repeatedly re-create the client while experimenting. The Cohere API offers tokenize and detokenize endpoints, which avoid downloading the tokenizer configuration file. In the Python SDK, you can use them by setting offline=False, like so:

PYTHON
import cohere

co = cohere.Client(api_key="<API KEY>")

co.tokenize(
    text="caterpillar", model="command-a-03-2025", offline=False
)
# -> [74, 2340, 107771], no tokenizer config was downloaded

Downloading a Tokenizer

Alternatively, the latest version of the tokenizer can be downloaded manually:

PYTHON
# pip install tokenizers requests

from tokenizers import Tokenizer
import requests

# Download the tokenizer config.
# Use the /models/<id> endpoint to get the latest URL.
tokenizer_url = "https://..."

response = requests.get(tokenizer_url)
response.raise_for_status()  # fail early on an HTTP error
tokenizer = Tokenizer.from_str(response.text)

# Encode text without adding special tokens.
tokenizer.encode(sequence="...", add_special_tokens=False)

The URL for the tokenizer should be obtained dynamically by calling the Models API. Here is a sample response for the Command A 03-2025 model:

JSON
{
  "name": "command-a-03-2025",
  ...
  "tokenizer_url": "https://storage.googleapis.com/cohere-public/tokenizers/command-a-03-2025.json"
}
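The following sketch shows one way to look up the tokenizer URL from a Models API response using only the standard library. The helper names are hypothetical, and the base URL `https://api.cohere.com/v1` and the Bearer authorization header are assumptions, so check them against the API reference before relying on them.

```python
import json
import urllib.request

# Assumption: base URL of the Cohere REST API.
API_BASE = "https://api.cohere.com/v1"


def build_model_request(model_id, api_key):
    """Build (but do not send) a GET request for /models/<id>."""
    return urllib.request.Request(
        f"{API_BASE}/models/{model_id}",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


def extract_tokenizer_url(model_info):
    """Pull the tokenizer URL out of a parsed Models API response."""
    url = model_info.get("tokenizer_url")
    if not url:
        raise KeyError("response does not contain a tokenizer_url")
    return url


def fetch_tokenizer_url(model_id, api_key):
    """Call the Models API and return the tokenizer URL for `model_id`."""
    request = build_model_request(model_id, api_key)
    with urllib.request.urlopen(request) as response:
        return extract_tokenizer_url(json.load(response))
```

The returned URL can then be used as `tokenizer_url` when downloading the tokenizer manually, which avoids hard-coding a location that may change between model versions.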

Getting a Local Tokenizer

We often receive requests for local tokenizers that do not require the Cohere API. Hugging Face hosts options for the command-nightly and multilingual embedding models.