How Does Cohere's Pricing Work?

If you’re looking to scale use cases in production, Cohere models are some of the most cost-efficient options on the market today. This page contains information about how Cohere’s pricing model operates, for each of our major model offerings.

You can find up-to-date prices for each of our generation, rerank, and embed models on the dedicated pricing page.

How Are Costs Calculated for Different Cohere Models?

Our generative models, such as Command A+, Command A, Command R7B, Command R, and Command R+, are priced on a per-token basis. Be aware that input tokens (i.e. tokens derived from the text sent to the model) and output tokens (i.e. tokens generated by the model) are priced differently.

Our Rerank models are priced based on the quantity of searches, and our Embedding models are priced based on the number of tokens embedded.
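
As a rough illustration of per-token pricing with separate input and output rates, the sketch below estimates the cost of a single generation request. The two rates are hypothetical placeholders, not Cohere's actual prices, and the function name is made up for this example; substitute the current figures from the pricing page.

```python
from decimal import Decimal

# Hypothetical placeholder rates in USD per one million tokens.
# These are NOT Cohere's real prices; look up current values on the pricing page.
INPUT_PRICE_PER_M = Decimal("2.50")
OUTPUT_PRICE_PER_M = Decimal("10.00")


def estimate_generation_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: Decimal = INPUT_PRICE_PER_M,
    output_price_per_m: Decimal = OUTPUT_PRICE_PER_M,
) -> Decimal:
    """Return the estimated cost in USD, pricing input and output tokens separately."""
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * input_price_per_m / million
    output_cost = Decimal(output_tokens) * output_price_per_m / million
    return input_cost + output_cost


# Example: 10,000 billed input tokens and 2,000 billed output tokens.
cost = estimate_generation_cost(10_000, 2_000)
print(f"Estimated cost: ${cost:.4f}")
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding drift.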

What’s the Difference Between “billed” Tokens and Generic Tokens?

When using the Chat API endpoint, the response will contain the total count of input and output tokens, as well as the count of billed tokens. Here’s an example:

JSON
{
  "billed_units": {
    "input_tokens": 6772,
    "output_tokens": 248
  },
  "tokens": {
    "input_tokens": 7596,
    "output_tokens": 645
  }
}

The Rerank and Embed models return their own, slightly different versions of these counts. It may not be obvious why billed_units has input and output values that are separate from those under "tokens". To clarify, the billed input and output tokens are the tokens that you’re actually billed for. These values can differ from the overall "tokens" values because in some situations Cohere adds tokens under the hood, and in others a particular model has been trained to emit extra tokens itself (e.g. when outputting special tokens). Since these are tokens you don’t have control over, you are not charged for them.
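
To make the distinction concrete, the sketch below takes the example usage object shown above and computes how many tokens were counted but not billed. It works on the raw JSON only; where this object sits inside a full Chat API response, and how an SDK exposes it, is not covered here, and the helper name is hypothetical.

```python
import json

# The example usage object from this page, as a JSON string.
usage_json = """
{
  "billed_units": {"input_tokens": 6772, "output_tokens": 248},
  "tokens": {"input_tokens": 7596, "output_tokens": 645}
}
"""


def unbilled_tokens(usage: dict) -> dict:
    """Return total tokens minus billed tokens, for input and output separately."""
    billed = usage["billed_units"]
    total = usage["tokens"]
    return {
        key: total[key] - billed[key]
        for key in ("input_tokens", "output_tokens")
    }


usage = json.loads(usage_json)
print(unbilled_tokens(usage))  # {'input_tokens': 824, 'output_tokens': 397}
```

In this example, 824 input tokens and 397 output tokens were processed without being charged.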

Trial Usage and Production Usage

Cohere makes a distinction between “trial” and “production” usage of an API key.

With respect to pricing, the key thing to understand is that trial API key usage is free, but limited. Developers who want to test different applications or build proofs of concept can use all of Cohere’s models and APIs with a trial key, simply by signing up for a Cohere account.