Reasoning Capabilities

Reasoning models work through a problem step by step before answering, which lets them solve more complex problems. Cohere’s reasoning models are hybrid, meaning reasoning can be enabled (in which case they generate internal reasoning before delivering their final responses) or disabled (in which case they function the way any other LLM would).

How Reasoning Models Work

When a reasoning model processes a request, it first works internally to break the problem down step by step. This reasoning happens in dedicated “thinking” content blocks, where the model works through its analysis, planning, and logical steps. Only after completing this internal reasoning does the model produce its final text response, which allows it to tackle complex tasks with deeper analysis.

The key benefit is that reasoning models can handle complex problems—such as leveraging tools and agentic problem solving in the 23 supported languages—by first working through the problem internally before presenting a well-reasoned solution. This approach leads to more accurate and thorough responses, while pushing the boundary for the complexity of problems the model is able to solve.

Getting Started

Models with reasoning capabilities are accessible via the Chat API. Here’s an example:

from cohere import ClientV2

co = ClientV2(api_key="<YOUR_API_KEY>")

prompt = """
Alice has 3 brothers and she also has 2 sisters. How many sisters does Alice's brother have?
"""

response = co.chat(
    model="command-a-reasoning-08-2025",
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
)

# The response content is a list of blocks: "thinking" blocks hold the
# model's reasoning, and "text" blocks hold the final answer.
for content in response.message.content:
    if content.type == "thinking":
        print("Thinking:", content.thinking)

    if content.type == "text":
        print("Response:", content.text)
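
If the reasoning and the answer need to be handled separately (for example, logging the former and displaying the latter), the loop above can be wrapped in a small helper. This is only a sketch: `split_thinking_and_text` is a hypothetical name, not part of the SDK, and the stand-in blocks below merely imitate the shape of the items in `response.message.content` so the snippet runs without an API key.

```python
from types import SimpleNamespace


def split_thinking_and_text(content_blocks):
    """Return (thinking, text) strings joined from a list of content blocks."""
    thinking_parts = []
    text_parts = []
    for block in content_blocks:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Stand-in blocks (hypothetical data) shaped like response.message.content.
example_blocks = [
    SimpleNamespace(type="thinking", thinking="Alice is one of the sisters."),
    SimpleNamespace(type="text", text="Each brother has 3 sisters."),
]

thinking, text = split_thinking_and_text(example_blocks)
print("Thinking:", thinking)
print("Response:", text)
```

With a real response, the call would be `split_thinking_and_text(response.message.content)`.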

Enabling / Disabling Reasoning Capabilities

For reasoning models, thinking is enabled by default. To disable it, pass the following value as the "thinking" parameter:

PYTHON

thinking = {
    "type": "disabled"  # turns off thinking; the default is "enabled"
}
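
For context, here is one way that value might be passed in a full request. It is a minimal sketch that assumes `thinking` is accepted as a keyword argument of `co.chat`, alongside `model` and `messages` as in the Getting Started example; `chat_without_thinking` is a hypothetical wrapper name, and the client is passed in as an argument so the function can be exercised without network access.

```python
def chat_without_thinking(co, prompt, model="command-a-reasoning-08-2025"):
    """Send a single user message with reasoning turned off.

    `co` is expected to be a cohere.ClientV2 instance (or any object that
    exposes a compatible `chat` method).
    """
    return co.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        # Assumption: the dict shown above is passed as the `thinking` argument.
        thinking={"type": "disabled"},
    )


# Usage with a real client (requires the cohere package and an API key):
#   from cohere import ClientV2
#   co = ClientV2(api_key="<YOUR_API_KEY>")
#   response = chat_without_thinking(co, "What is 17 * 23?")
```

With thinking disabled, the response content should contain only "text" blocks, so the loop from the Getting Started example would print just the final response.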

Thinking Budgets

You can also specify a thinking token budget, which sets an upper limit on the number of thinking tokens the model can produce. We recommend leaving thinking unlimited (that is, enabled with no budget). If you do set a budget, leave at least 1K tokens for the response. For example, to let the model reason up to the maximum limit, we recommend a token budget of 31K.

When the budget is exceeded, the model will immediately proceed with the final response.

PYTHON

thinking = {
    "token_budget": 500  # limits the model's thinking output to at most 500 tokens
}
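
The "leave at least 1K tokens for the response" guidance amounts to simple arithmetic, sketched below. `thinking_with_budget` is a hypothetical helper, 1K is taken to mean 1,000 tokens, and the 32,000-token output limit is an assumption inferred from the 31K recommendation above rather than a documented value.

```python
def thinking_with_budget(max_output_tokens, response_reserve=1_000):
    """Build a `thinking` value that leaves room for the final response."""
    budget = max_output_tokens - response_reserve
    if budget <= 0:
        raise ValueError("max_output_tokens must exceed response_reserve")
    return {"token_budget": budget}


# Assumed 32K output limit: 32,000 - 1,000 reserved for the response.
thinking = thinking_with_budget(32_000)
print(thinking)  # {'token_budget': 31000}
```

The resulting dictionary would then be supplied as the "thinking" parameter of the chat request.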

Use Cases and Applications

Reasoning models excel at tasks that benefit from step-by-step analysis, including:

Agentic Use Cases: Taking autonomous actions and interacting with the environment to solve problems.
Tool Use: Leveraging a variety of tools, such as search engines and APIs.
Multilingual: Reasoning over multilingual inputs and supporting user queries in 23 different languages.

Technical Implementation

The reasoning process is controlled through specific parameters that allow developers to:

Enable or disable reasoning capabilities
Set token budgets to control the depth of reasoning
Stream responses to see reasoning and final answers in real-time
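
As an illustration of the streaming option, the sketch below accumulates reasoning and answer text as events arrive. The event shape is an assumption, not something this page confirms: it supposes that `co.chat_stream(...)` yields events whose `type` is "content-delta" and whose `delta.message.content` carries either a `thinking` or a `text` fragment. `collect_stream` and `_fake_delta` are hypothetical names, and the fake events exist only so the snippet runs offline.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate (thinking, text) from an iterable of streaming events."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.type != "content-delta":
            continue  # skip start/end and other bookkeeping events
        content = event.delta.message.content
        # Assumed field names; a delta is expected to carry one or the other.
        thinking_piece = getattr(content, "thinking", None)
        text_piece = getattr(content, "text", None)
        if thinking_piece:
            thinking_parts.append(thinking_piece)
        if text_piece:
            text_parts.append(text_piece)
    return "".join(thinking_parts), "".join(text_parts)


def _fake_delta(**fields):
    """Build a stand-in event shaped like the assumed content-delta event."""
    content = SimpleNamespace(**fields)
    return SimpleNamespace(
        type="content-delta",
        delta=SimpleNamespace(message=SimpleNamespace(content=content)),
    )


fake_events = [
    SimpleNamespace(type="message-start"),
    _fake_delta(thinking="Alice counts "),
    _fake_delta(thinking="as a sister."),
    _fake_delta(text="3"),
    SimpleNamespace(type="message-end"),
]

streamed_thinking, streamed_answer = collect_stream(fake_events)
print("Thinking:", streamed_thinking)
print("Response:", streamed_answer)
```

With a real client, the iterable would come from something like `co.chat_stream(model=..., messages=...)`; to display output in real time, print each fragment inside the loop instead of collecting it.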

This level of control makes reasoning models particularly valuable for applications that require high accuracy, transparency in reasoning, and the ability to handle complex, multi-faceted problems that benefit from systematic analysis.