Performing Tasks Sequentially with Cohere’s RAG


Compare two user queries to a RAG chatbot: “What was Apple’s revenue in 2023?” and “What was the revenue of the most valuable company in the US in 2023?”

While the first query is straightforward to handle, the second must be broken down into two steps:

  1. Identify the most valuable company in the US in 2023
  2. Get the revenue of the company in 2023

These steps must happen in sequence rather than all at once, because the second step depends on the information retrieved in the first.
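To make the dependency concrete, here is a minimal sketch with two hypothetical lookup functions. The function names and the placeholder data are assumptions made up for illustration; they are not real figures and not part of the tutorial's tools.

```python
# Hypothetical stand-ins for two retrieval steps. The data is placeholder only.
COMPANY_BY_RANK = {1: "ExampleCorp"}
REVENUE_BY_COMPANY = {"ExampleCorp": "<revenue figure>"}


def find_most_valuable_company() -> str:
    """Step 1: identify the company."""
    return COMPANY_BY_RANK[1]


def get_revenue(company: str) -> str:
    """Step 2: look up revenue. Needs the company name produced by step 1."""
    return REVENUE_BY_COMPANY[company]


company = find_most_valuable_company()  # must finish first
revenue = get_revenue(company)  # cannot start until step 1 has returned
print(company, revenue)
```

The second call cannot be issued in parallel with the first, because its argument does not exist until the first call returns.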

This is an example of sequential reasoning. In this tutorial, we’ll learn how agentic RAG with Cohere handles sequential reasoning, and in particular:

  • Multi-step tool calling
  • Multi-step, parallel tool calling
  • Self-correction

We’ll learn these by building an agent that answers questions about using Cohere.

Setup

To get started, first we need to install the cohere library and create a Cohere client.

We also need to import the tool definitions that we’ll use in this tutorial.

Important: the source code for tool definitions can be found here. Make sure to have the tool_def.py file in the same directory as this notebook for the imports to work correctly.
PYTHON
! pip install cohere langchain langchain-community pydantic -qq
PYTHON
import json
import os
import cohere

from tool_def import (
    search_developer_docs,
    search_developer_docs_tool,
    search_internet,
    search_internet_tool,
    search_code_examples,
    search_code_examples_tool,
)

co = cohere.ClientV2(
    "COHERE_API_KEY"
)  # Get your free API key: https://dashboard.cohere.com/api-keys

os.environ["TAVILY_API_KEY"] = (
    "TAVILY_API_KEY"  # We'll need the Tavily API key to perform internet search. Get your API key: https://app.tavily.com/home
)

Setting up the tools

We set up the same set of tools as in Part 1. If you want further details on how to set up the tools, check out Part 1.

PYTHON
functions_map = {
    "search_developer_docs": search_developer_docs,
    "search_internet": search_internet,
    "search_code_examples": search_code_examples,
}
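The actual tool definitions live in tool_def.py and are not reproduced here. As a rough illustration only, a tool pairs a Python function with a JSON schema that describes it to the model, and the function is then looked up by name. Everything below (the `_stub` names, the placeholder documents, the description text) is hypothetical and may differ from the real file.

```python
import json


# Hypothetical stand-in for a docs search tool. The real one is in tool_def.py.
def search_developer_docs_stub(query: str) -> list:
    """Return placeholder documents whose text contains any word of the query."""
    sample_docs = [
        {"text": "Placeholder document about reranking search results."},
        {"text": "Placeholder document about embeddings."},
    ]
    words = query.lower().split()
    return [d for d in sample_docs if any(w in d["text"].lower() for w in words)]


# Assumed schema shape: a JSON-schema style function description.
search_developer_docs_stub_tool = {
    "type": "function",
    "function": {
        "name": "search_developer_docs_stub",
        "description": "Searches the developer documentation.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."}
            },
            "required": ["query"],
        },
    },
}

stub_functions_map = {"search_developer_docs_stub": search_developer_docs_stub}

# Dispatch by tool name, with arguments arriving as a JSON string.
tool_name = search_developer_docs_stub_tool["function"]["name"]
args = json.loads('{"query": "reranking"}')
result = stub_functions_map[tool_name](**args)
print(result)
```

The mapping from name to function is what lets the agent turn a tool call emitted by the model, which carries only a name and a JSON string of arguments, into an actual Python call.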

Running an agentic RAG workflow

We create a run_agent function to run the agentic RAG workflow, the same as in Part 1. If you want further details on how this function works, check out Part 1.

PYTHON
tools = [
    search_developer_docs_tool,
    search_internet_tool,
    search_code_examples_tool,
]
PYTHON
system_message = """## Task and Context
You are an assistant who helps developers use Cohere. You are equipped with a number of tools that can provide different types of information. If you can't find the information you need from one tool, you should try other tools if there is a possibility that they could provide the information you need."""
PYTHON
model = "command-a-03-2025"


def run_agent(query, messages=None):
    if messages is None:
        messages = []

    if "system" not in {m.get("role") for m in messages}:
        messages.append({"role": "system", "content": system_message})

    # Step 1: get user message
    print(f"QUESTION:\n{query}")
    print("=" * 50)

    messages.append({"role": "user", "content": query})

    # Step 2: Generate tool calls (if any)
    response = co.chat(
        model=model, messages=messages, tools=tools, temperature=0.3
    )

    # Keep looping for as long as the model asks for more tool calls
    while response.message.tool_calls:

        print("TOOL PLAN:")
        print(response.message.tool_plan, "\n")
        print("TOOL CALLS:")
        for tc in response.message.tool_calls:
            print(
                f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}"
            )
        print("=" * 50)

        messages.append(
            {
                "role": "assistant",
                "tool_calls": response.message.tool_calls,
                "tool_plan": response.message.tool_plan,
            }
        )

        # Step 3: Get tool results
        for tc in response.message.tool_calls:
            tool_result = functions_map[tc.function.name](
                **json.loads(tc.function.arguments)
            )
            tool_content = []
            for data in tool_result:
                tool_content.append(
                    {
                        "type": "document",
                        "document": {"data": json.dumps(data)},
                    }
                )
                # Optional: add an "id" field in the "document" object, otherwise IDs are auto-generated
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": tool_content,
                }
            )

        # Step 4: Generate response and citations
        response = co.chat(
            model=model,
            messages=messages,
            tools=tools,
            temperature=0.3,
        )

    messages.append(
        {
            "role": "assistant",
            "content": response.message.content[0].text,
        }
    )

    # Print final response
    print("RESPONSE:")
    print(response.message.content[0].text)
    print("=" * 50)

    # Print citations (if any)
    verbose_source = (
        False  # Change to True to display the contents of a source
    )
    if response.message.citations:
        print("CITATIONS:\n")
        for citation in response.message.citations:
            print(
                f"Start: {citation.start}| End:{citation.end}| Text:'{citation.text}' "
            )
            print("Sources:")
            for idx, source in enumerate(citation.sources):
                print(f"{idx+1}. {source.id}")
                if verbose_source:
                    print(f"{source.tool_output}")
            print("\n")

    return messages
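The function above keeps looping for as long as the model requests tool calls, which is what makes multi-step behavior possible. The sketch below isolates that control flow so it can run offline. Note that `make_fake_chat`, `run_bounded`, and `max_steps` are hypothetical names, not part of the Cohere SDK, and the step cap is an optional safeguard that the tutorial's function does not include.

```python
from types import SimpleNamespace


def make_fake_chat(scripted_tool_calls):
    """Build a fake chat function that replays scripted batches of tool calls.

    Hypothetical stand-in for a model call: each invocation returns the next
    batch, and an empty batch means the model answers directly.
    """
    script = list(scripted_tool_calls)

    def fake_chat(messages):
        calls = script.pop(0) if script else []
        return SimpleNamespace(tool_calls=calls, text="final answer")

    return fake_chat


def run_bounded(chat, tool_fns, max_steps=5):
    """Loop while tools are requested, stopping after max_steps rounds."""
    messages, steps = [], 0
    response = chat(messages)
    while response.tool_calls and steps < max_steps:
        for name, kwargs in response.tool_calls:
            result = tool_fns[name](**kwargs)
            messages.append({"role": "tool", "name": name, "content": result})
        steps += 1
        response = chat(messages)  # the model sees the new results and decides again
    return steps, response.text, messages


tool_fns = {"search": lambda query: f"results for {query}"}
chat = make_fake_chat(
    [[("search", {"query": "a"})], [("search", {"query": "b"})], []]
)
steps, answer, history = run_bounded(chat, tool_fns)
print(steps, answer)
```

A real implementation would also need to decide what to return when the cap is reached while the model is still asking for tools; this sketch simply returns whatever text the last response carries.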

Multi-step tool calling

Let’s ask the agent a few questions, starting with this one about a specific feature. The user is asking about two things: a feature to reorder search results and code examples for that feature.

In this case, the agent first needs to identify what that feature is before it can answer the second part of the question.

This is reflected in the agent’s tool plan, which describes the steps it will take to answer the question.

So, it first calls the search_developer_docs tool to find the feature.

It then discovers that the feature is Rerank. Using this information, it calls the search_code_examples tool to find code examples for that feature.

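Finally, it uses the retrieved information to respond to both parts of the user’s question. In the run shown here, the code example search returned nothing relevant, and the agent says so rather than inventing an example.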

PYTHON
messages = run_agent(
    "What's the Cohere feature to reorder search results? Do you have any code examples on that?"
)

QUESTION:
What's the Cohere feature to reorder search results? Do you have any code examples on that?
==================================================
TOOL PLAN:
I will search for the Cohere feature to reorder search results. Then I will search for code examples on that.

TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"reorder search results"}
==================================================
TOOL PLAN:
I found that the Rerank endpoint is the feature that reorders search results. I will now search for code examples on that.

TOOL CALLS:
Tool name: search_code_examples | Parameters: {"query":"rerank endpoint"}
==================================================
RESPONSE:
The Rerank endpoint is the feature that reorders search results. Unfortunately, I could not find any code examples on that.
==================================================
CITATIONS:

Start: 4| End:19| Text:'Rerank endpoint'
Sources:
1. search_developer_docs_53tfk9zgwgzt:0
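Because run_agent returns the full message history, the sequence of steps can also be inspected after the fact. The helper below is a small sketch; `summarize_tool_steps` is a hypothetical name, and the example messages use simple stand-in objects that expose only `.function.name` instead of real SDK tool-call objects.

```python
from types import SimpleNamespace


def summarize_tool_steps(messages):
    """Return one list of tool names per assistant turn that requested tools."""
    steps = []
    for m in messages:
        if m.get("role") == "assistant" and m.get("tool_calls"):
            steps.append([tc.function.name for tc in m["tool_calls"]])
    return steps


def _tc(name):
    # Stand-in for a tool-call object, exposing only .function.name
    return SimpleNamespace(function=SimpleNamespace(name=name))


# Shape of the history produced by a two-step run (contents elided as "...")
example_messages = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "tool_calls": [_tc("search_developer_docs")], "tool_plan": "..."},
    {"role": "tool", "tool_call_id": "1", "content": []},
    {"role": "assistant", "tool_calls": [_tc("search_code_examples")], "tool_plan": "..."},
    {"role": "tool", "tool_call_id": "2", "content": []},
    {"role": "assistant", "content": "..."},
]
print(summarize_tool_steps(example_messages))
```

Each inner list corresponds to one round of the loop, so two single-item lists indicate two sequential steps with one tool call each.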

Multi-step, parallel tool calling

In Part 2, we saw how the Cohere API supports parallel tool calling, and in this tutorial, we have looked at sequential tool calling. The two can also be combined in a single workflow: one step can depend on an earlier step and itself consist of several parallel tool calls.

Here’s an example. Suppose we ask the agent to find the CEOs of the three companies with the highest market capitalization.

In the first step, it searches the Internet for information about the 3 companies with the highest market capitalization.

And in the second step, it performs parallel searches for the CEOs of the 3 identified companies.

PYTHON
messages = run_agent(
    "Who are the CEOs of the companies with the top 3 highest market capitalization."
)

QUESTION:
Who are the CEOs of the companies with the top 3 highest market capitalization.
==================================================
TOOL PLAN:
I will search for the top 3 companies with the highest market capitalization. Then, I will search for the CEOs of those companies.

TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"top 3 companies with highest market capitalization"}
==================================================
TOOL PLAN:
The top 3 companies with the highest market capitalization are Apple, Microsoft, and Nvidia. I will now search for the CEOs of these companies.

TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"Apple CEO"}
Tool name: search_internet | Parameters: {"query":"Microsoft CEO"}
Tool name: search_internet | Parameters: {"query":"Nvidia CEO"}
==================================================
RESPONSE:
The CEOs of the top 3 companies with the highest market capitalization are:
1. Tim Cook of Apple
2. Satya Nadella of Microsoft
3. Jensen Huang of Nvidia
==================================================
CITATIONS:

Start: 79| End:87| Text:'Tim Cook'
Sources:
1. search_internet_0f8wyxfc3hmn:0
2. search_internet_0f8wyxfc3hmn:1
3. search_internet_0f8wyxfc3hmn:2


Start: 91| End:96| Text:'Apple'
Sources:
1. search_internet_kb9qgs1ps69e:0


Start: 100| End:113| Text:'Satya Nadella'
Sources:
1. search_internet_wy4mn7286a88:0
2. search_internet_wy4mn7286a88:1
3. search_internet_wy4mn7286a88:2


Start: 117| End:126| Text:'Microsoft'
Sources:
1. search_internet_kb9qgs1ps69e:0


Start: 130| End:142| Text:'Jensen Huang'
Sources:
1. search_internet_q9ahz81npfqz:0
2. search_internet_q9ahz81npfqz:1
3. search_internet_q9ahz81npfqz:2
4. search_internet_q9ahz81npfqz:3


Start: 146| End:152| Text:'Nvidia'
Sources:
1. search_internet_kb9qgs1ps69e:0
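When the model emits several tool calls in one step, as in the second step above, run_agent executes them one after another in a for loop. As an optional variation, independent calls could be run concurrently. The sketch below uses Python's standard ThreadPoolExecutor; `execute_tool_calls_concurrently`, `fake_search`, and the stand-in tool-call objects are hypothetical and only mimic the `id`, `function.name`, and `function.arguments` fields that run_agent reads.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def execute_tool_calls_concurrently(tool_calls, functions_map, max_workers=4):
    """Run each tool call in a thread; return (tool_call_id, result) in call order."""

    def run_one(tc):
        fn = functions_map[tc.function.name]
        return tc.id, fn(**json.loads(tc.function.arguments))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs
        return list(pool.map(run_one, tool_calls))


def fake_search(query):
    # Placeholder tool: a real one would call a search API
    return [{"text": f"placeholder result for {query}"}]


def _tc(call_id, name, args):
    # Stand-in for a tool-call object with the fields used above
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(args)),
    )


calls = [
    _tc("a", "search_internet", {"query": "Apple CEO"}),
    _tc("b", "search_internet", {"query": "Microsoft CEO"}),
    _tc("c", "search_internet", {"query": "Nvidia CEO"}),
]
results = execute_tool_calls_concurrently(calls, {"search_internet": fake_search})
for call_id, docs in results:
    print(call_id, docs)
```

Keeping the results in call order makes it straightforward to append one tool message per tool_call_id afterwards, as the sequential version does.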

Self-correction

Sequential reasoning is also useful in a broader sense, particularly when the agent needs to adapt and change its plan partway through a task.

In other words, it allows the agent to self-correct.

To illustrate this, let’s look at an example. Here, the user asks about the authors of the Sentence-BERT paper.

The agent first tries to find the required information via the search_developer_docs tool.

However, we know that the tool doesn’t contain this information because we have only added a small sample of documents.

As a result, the agent, having received documents without any relevant information, decides to search the internet instead. This is helped by the fact that we have added specific instructions in the search_internet tool to search the internet for information not found in the developer documentation.

It now has the information it needs, and uses it to answer the user’s question.

This highlights another important aspect of agentic RAG: flexibility. Because the retrieval component is driven by an LLM, the system can decide at run time which source to try next.

A standard RAG system, on the other hand, would typically have this routing hand-engineered, and hence is more rigid.

PYTHON
messages = run_agent(
    "Who are the authors of the sentence BERT paper?"
)

QUESTION:
Who are the authors of the sentence BERT paper?
==================================================
TOOL PLAN:
I will search for the authors of the sentence BERT paper.

TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"authors of the sentence BERT paper"}
==================================================
TOOL PLAN:
I was unable to find any information about the authors of the sentence BERT paper. I will now search for 'sentence BERT paper authors'.

TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"sentence BERT paper authors"}
==================================================
RESPONSE:
The authors of the Sentence-BERT paper are Nils Reimers and Iryna Gurevych.
==================================================
CITATIONS:

Start: 43| End:55| Text:'Nils Reimers'
Sources:
1. search_internet_z8t19852my9q:0
2. search_internet_z8t19852my9q:1
3. search_internet_z8t19852my9q:2
4. search_internet_z8t19852my9q:3
5. search_internet_z8t19852my9q:4


Start: 60| End:75| Text:'Iryna Gurevych.'
Sources:
1. search_internet_z8t19852my9q:0
2. search_internet_z8t19852my9q:1
3. search_internet_z8t19852my9q:2
4. search_internet_z8t19852my9q:3
5. search_internet_z8t19852my9q:4
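The Start and End values in the citations above are character offsets into the response text, so each cited span can be recovered with a plain string slice. The sketch below uses the response and offsets from this run. Plain dictionaries stand in for the SDK's citation objects, and `annotate` is a hypothetical helper, not part of the Cohere SDK.

```python
response_text = (
    "The authors of the Sentence-BERT paper are Nils Reimers and Iryna Gurevych."
)

# Offsets copied from the output above; dicts stand in for SDK citation objects
citations = [
    {"start": 43, "end": 55, "text": "Nils Reimers"},
    {"start": 60, "end": 75, "text": "Iryna Gurevych."},
]


def annotate(text, citations):
    """Wrap each cited span in brackets, right to left so offsets stay valid."""
    for c in sorted(citations, key=lambda c: c["start"], reverse=True):
        span = text[c["start"]:c["end"]]
        text = text[:c["start"]] + "[" + span + "]" + text[c["end"]:]
    return text


# Each slice reproduces the citation's text exactly
for c in citations:
    assert response_text[c["start"]:c["end"]] == c["text"]

annotated = annotate(response_text, citations)
print(annotated)
```

Working from the last citation to the first avoids having to shift the offsets of earlier spans after each insertion.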

Summary

In this tutorial, we learned about:

  • How multi-step tool calling works
  • How multi-step, parallel tool calling works
  • How multi-step tool calling enables an agent to self-correct, and hence, be more flexible

However, up until now, we have only worked with purely unstructured data, the type of data we typically encounter in a standard RAG system.

In the coming parts, we’ll add another layer of complexity to the agentic RAG system: working with semi-structured and structured data. This adds another dimension to the agent’s flexibility, namely the ability to deal with a more diverse set of data sources.

In Part 4, we’ll learn how to build an agent that can perform faceted queries over semi-structured data.
