Usage patterns for tool use (function calling)
The tool use feature of the Chat endpoint comes with a set of capabilities that enable developers to implement a variety of tool use scenarios. This section describes the different patterns of tool use implementation supported by these capabilities. Each pattern can be implemented on its own or in combination with the others.
Setup
First, import the Cohere library and create a client.
Cohere platform
Private deployment
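A minimal setup sketch, assuming the Cohere Python SDK's v2 client (the API key and the private-deployment URL are placeholders):

```python
import cohere

# Cohere platform: authenticate with an API key (placeholder shown).
co = cohere.ClientV2(api_key="YOUR_API_KEY")

# Private deployment (assumed setup): point the client at your deployment's
# URL instead, and adjust authentication to your deployment's scheme.
# co = cohere.ClientV2(api_key="", base_url="<YOUR_DEPLOYMENT_URL>")
```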
We’ll use the same `get_weather` tool as in the previous example.
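As a reminder, here is a minimal sketch of that tool: a stub that returns a hardcoded temperature (a real implementation would call a weather API), plus a tool schema in the JSON-schema shape the Chat endpoint expects.

```python
def get_weather(location: str) -> dict:
    # Stub for illustration: a real implementation would query a weather API.
    return {"temperature": "20°C"}

# Maps a tool name to the function that implements it.
functions_map = {"get_weather": get_weather}

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the weather of a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the weather for, e.g. Toronto",
                    }
                },
                "required": ["location"],
            },
        },
    }
]
```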
Parallel tool calling
The model can determine that more than one tool call is required and generate multiple tool calls in parallel. These can be calls to the same tool multiple times or to different tools, for any number of calls.
In the example below, the user asks for the weather in Toronto and New York. This requires calling the `get_weather` function twice, once for each location. This is reflected in the model’s response, where two parallel tool calls are generated.
Example response:
State management
When tools are called in parallel, we append to the messages list one single `assistant` message containing all the tool calls, followed by one `tool` message for each tool call.
The sequence of messages is represented in the diagram below.
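A sketch of that sequence using plain dicts (the field names follow a typical tool-calling message format, and the tool call IDs and argument values are illustrative):

```python
messages = [
    {"role": "user", "content": "What's the weather in Toronto and New York?"},
    # One single assistant message carrying BOTH tool calls.
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "tc_0",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"location": "Toronto"}'},
            },
            {
                "id": "tc_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"location": "New York"}'},
            },
        ],
    },
    # One tool message per tool call, linked back by tool_call_id.
    {"role": "tool", "tool_call_id": "tc_0", "content": '{"temperature": "20°C"}'},
    {"role": "tool", "tool_call_id": "tc_1", "content": '{"temperature": "25°C"}'},
]
```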
Directly answering
A key attribute of tool use systems is the model’s ability to choose the right tools for a task. This includes the model’s ability to decide to not use any tool, and instead, respond to a user message directly.
In the example below, the user asks a simple arithmetic question. The model determines that it does not need to use any of the available tools (only one, `get_weather`, in this case) and instead answers the user directly.
Example response:
State management
When the model opts to respond directly to the user, there will be no items 2 and 3 above (the tool calling and tool response messages). Instead, the final `assistant` message will contain the model’s direct response to the user.
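So for a direct answer, the sequence reduces to a user message followed by a plain `assistant` message (sketched with plain dicts; the content values are illustrative):

```python
messages = [
    {"role": "user", "content": "What's 2 + 2?"},
    # No tool_calls field and no tool messages: the assistant answers directly.
    {"role": "assistant", "content": "2 + 2 equals 4."},
]
```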
Note: you can force the model to directly answer every time using the `tool_choice` parameter, described here.
Multi-step tool use
The Chat endpoint supports multi-step tool use, which enables the model to perform sequential reasoning. This is especially useful in agentic workflows that require multiple steps to complete a task.
As an example, suppose a tool use application has access to a web search tool. Given the question “What was the revenue of the most valuable company in the US in 2023?”, it will need to perform a series of steps in a specific order:
- Identify the most valuable company in the US in 2023
- Then, once the company has been identified, retrieve its revenue figure
To illustrate this, let’s start with the same weather example and add another tool called `get_capital_city`, which returns the capital city of a given country.
Here are the function definitions for the tools:
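A minimal sketch, with both tools returning hardcoded values for illustration (real implementations would call external APIs):

```python
def get_weather(location: str) -> dict:
    # Stub: a real implementation would query a weather API.
    return {"temperature": "28°C"}

def get_capital_city(country: str) -> dict:
    # Stub: a real implementation would query a geography API or lookup table.
    capitals = {"Brazil": "Brasilia", "Canada": "Ottawa"}
    return {"capital_city": capitals.get(country, "unknown")}

# Maps each tool name to the function that implements it.
functions_map = {
    "get_weather": get_weather,
    "get_capital_city": get_capital_city,
}
```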
And here are the corresponding tool schemas:
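Sketched in the same JSON-schema shape as the earlier `get_weather` example (the descriptions are illustrative):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the weather of a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the weather for, e.g. Toronto",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_capital_city",
            "description": "Gets the capital city of a given country",
            "parameters": {
                "type": "object",
                "properties": {
                    "country": {
                        "type": "string",
                        "description": "The country to get the capital city of",
                    }
                },
                "required": ["country"],
            },
        },
    },
]
```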
Next, we implement the four-step tool use workflow as described in the previous page.
The key difference here is that the second (tool calling) and third (tool execution) steps are placed in a `while` loop, so this pair of steps can repeat any number of times. The loop stops when the model decides, in the tool calling step, that no more tool calls are needed, which then triggers the fourth step (response generation).
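To illustrate the loop’s control flow without calling the API, here is a sketch in which a scripted `chat` function stands in for the model: it requests the capital city, then the weather, then stops. In a real application the `chat` call would be the SDK’s chat method and the response shapes would come from the SDK; the tool stubs and message dicts below are illustrative.

```python
import json

def get_weather(location: str) -> dict:
    return {"temperature": "28°C"}  # stub

def get_capital_city(country: str) -> dict:
    return {"capital_city": "Brasilia"}  # stub

functions_map = {"get_weather": get_weather, "get_capital_city": get_capital_city}

# Scripted stand-in for the model: first it asks for Brazil's capital,
# then for the weather there, then it answers directly (no more tool calls).
script = [
    [{"name": "get_capital_city", "arguments": {"country": "Brazil"}}],
    [{"name": "get_weather", "arguments": {"location": "Brasilia"}}],
    None,
]

def chat(messages):
    tool_calls = script.pop(0)
    if tool_calls is None:
        return {"role": "assistant", "content": "It is 28°C in Brasilia."}
    return {"role": "assistant", "tool_calls": tool_calls}

messages = [{"role": "user", "content": "What's the temperature in Brazil's capital city?"}]

# Steps 2 (tool calling) and 3 (tool execution) repeat in a while loop
# until the model stops requesting tools.
response = chat(messages)
while response.get("tool_calls"):
    messages.append(response)
    for tc in response["tool_calls"]:
        result = functions_map[tc["name"]](**tc["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    response = chat(messages)

# Step 4: the final assistant message is the model's answer to the user.
messages.append(response)
```

Note how each iteration appends an `assistant` message (the tool calls) followed by `tool` messages (the results), so the next iteration sees the full history.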
In this example, the user asks for the temperature in Brazil’s capital city.
The model first determines that it needs to find out the capital city of Brazil. Once it has this information, it proceeds with the next step in the sequence, which is to look up the temperature of that city.
This is reflected in the model’s response, where two tool calling-result pairs are generated in a sequence.
Example response:
State management
In a multi-step tool use scenario, instead of just one occurrence of `assistant`-`tool` messages, there will be a sequence of `assistant`-`tool` messages reflecting the multiple steps of tool calling involved.
Forcing tool usage
As shown in the previous examples, during the tool calling step, the model may decide to either:
- make tool call(s)
- or, respond to a user message directly.
You can, however, force the model to choose one of these options. This is done via the `tool_choice` parameter.
- You can force the model to make tool call(s), i.e. to not respond directly, by setting the `tool_choice` parameter to `REQUIRED`.
- Alternatively, you can force the model to respond directly, i.e. to not make tool call(s), by setting the `tool_choice` parameter to `NONE`.
By default, if you don’t specify the `tool_choice` parameter, the model decides whether to make tool calls or respond directly.
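Sketched as the keyword arguments you would pass to the chat call (the model name and messages are placeholders; the two `tool_choice` values are the options described above):

```python
# Arguments for a chat call that forces tool calling
# (pass these to the SDK's chat method, e.g. co.chat(**kwargs_required)).
kwargs_required = {
    "model": "<MODEL_NAME>",  # placeholder
    "messages": [{"role": "user", "content": "What's the weather in Toronto?"}],
    "tools": [],  # your tool schemas go here
    "tool_choice": "REQUIRED",
}

# The same call, but forcing a direct answer instead of tool calls.
kwargs_none = {**kwargs_required, "tool_choice": "NONE"}
```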
State management
Here’s the sequence of messages when `tool_choice` is set to `REQUIRED`.
Here’s the sequence of messages when `tool_choice` is set to `NONE`.
Chatbots (multi-turn)
Building chatbots requires maintaining the memory or state of a conversation over multiple turns. To do this, we can keep appending each turn of a conversation to the `messages` list.
As an example, here’s the `messages` list from the first turn of a conversation.
Then, in the second turn, when provided with a rather vague follow-up user message, the model correctly infers that the context is about the weather.
Example response:
State management
The sequence of messages is represented in the diagram below.
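The pattern above can be sketched with plain dicts: each turn appends the new user message plus everything generated in that turn, so the full history is available to the model (the content values and message shapes are illustrative):

```python
# Turn 1: the user asks about Toronto; the turn produces a tool call,
# a tool result, and a final assistant answer.
messages = [
    {"role": "user", "content": "What's the weather in Toronto?"},
    {
        "role": "assistant",
        "tool_calls": [{"name": "get_weather", "arguments": '{"location": "Toronto"}'}],
    },
    {"role": "tool", "content": '{"temperature": "20°C"}'},
    {"role": "assistant", "content": "It's 20°C in Toronto."},
]

# Turn 2: a vague follow-up. Because the whole history is kept in
# `messages`, the model can infer the context is still the weather.
messages.append({"role": "user", "content": "What about New York?"})
# ...the new turn's assistant/tool messages are appended the same way.
```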