How Does Cohere’s Pricing Work?
If you’re looking to scale use cases in production, Cohere models are some of the most cost-efficient options on the market today. This page explains how Cohere’s pricing works for each of our major model offerings.
How Are Costs Calculated for Different Cohere Models?
Our generative models, such as Command R and Command R+, are priced on a per-token basis. Be aware that input tokens (i.e. tokens generated from text sent to the model) and output tokens (i.e. tokens generated by the model) are priced differently.
Our Rerank models are priced based on the quantity of searches, and our Embedding models are priced based on the number of tokens embedded.
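To make the three billing bases concrete, here is a minimal Python sketch of how a cost estimate could be put together. Every rate in it is a made-up placeholder, not a real Cohere price, and the function names and pricing units (per million tokens, per thousand searches) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical USD prices per one million tokens (NOT real Cohere prices)."""
    input_per_million: float
    output_per_million: float


def generative_cost(input_tokens: int, output_tokens: int, rates: TokenRates) -> float:
    """Per-token pricing, with input and output tokens priced at different rates."""
    total = (input_tokens * rates.input_per_million
             + output_tokens * rates.output_per_million)
    return total / 1_000_000


def rerank_cost(searches: int, price_per_thousand_searches: float) -> float:
    """Rerank is priced by the number of searches (unit size is an assumption)."""
    return searches * price_per_thousand_searches / 1_000


def embed_cost(tokens: int, price_per_million: float) -> float:
    """Embedding is priced by the number of tokens embedded."""
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    # Placeholder rates, chosen only to keep the arithmetic easy to follow.
    rates = TokenRates(input_per_million=1.0, output_per_million=2.0)
    print(generative_cost(500_000, 250_000, rates))
    print(rerank_cost(2_000, price_per_thousand_searches=1.5))
    print(embed_cost(1_000_000, price_per_million=0.25))
```

For a real estimate, swap the placeholder rates for the current figures on the pricing page.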
You can find up-to-date prices on our dedicated pricing page.
What’s the Difference Between “billed” Tokens and Generic Tokens?
When using the Chat API endpoint, the response will contain the total count of input and output tokens, as well as the count of billed tokens. Here’s an example:
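As a rough illustration, the usage metadata in a response might be shaped like the Python dictionary below. The numbers are invented, and the input_tokens and output_tokens key names are assumptions; only the billed_units and "tokens" groupings come from this page. The small helper, which is hypothetical, shows how the two groups can be compared.

```python
# Hypothetical usage metadata; all values are invented for illustration.
meta = {
    "billed_units": {"input_tokens": 5, "output_tokens": 40},
    "tokens": {"input_tokens": 70, "output_tokens": 42},
}


def unbilled_tokens(meta: dict) -> dict:
    """Return, per direction, how many counted tokens were not billed."""
    return {
        key: meta["tokens"][key] - meta["billed_units"][key]
        for key in ("input_tokens", "output_tokens")
    }


if __name__ == "__main__":
    print("billed:", meta["billed_units"])
    print("total:", meta["tokens"])
    print("not billed:", unbilled_tokens(meta))
```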
The rerank and embed models have their own, slightly different versions, and it may not be obvious why there are separate input and output values under billed_units. To clarify, the billed input and output tokens are the tokens that you’re actually billed for. The reason these values can be different from the overall "tokens" value is that there are situations in which Cohere adds tokens under the hood, and there are others in which a particular model has been trained to do so (i.e. when outputting special tokens). Since these are tokens you don’t have control over, you are not charged for them.
Trial Usage and Production Usage
Cohere makes a distinction between “trial” and “production” usage of an API key.
With respect to pricing, the key thing to understand is that trial API key usage is free, but limited. Developers who want to test different applications or build proofs of concept can use all of Cohere’s models and APIs with a trial key by signing up for a Cohere account.