Performing Tasks Sequentially with Cohere's RAG


Compare two user queries to a RAG chatbot: “What was Apple’s revenue in 2023?” and “What was the revenue of the most valuable company in the US in 2023?”

While the first query is straightforward to handle, the second must be broken down into two steps:

1. Identify the most valuable company in the US in 2023
2. Get the revenue of that company in 2023

These steps must happen in sequence rather than all at once, because the second step depends on the information retrieved in the first.
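As a minimal sketch of this dependency, the snippet below chains two lookups so that the second cannot run until the first has returned. The lookup tables, the company name, and the revenue figure are hypothetical placeholders, not real data.

```python
# Hypothetical stand-ins for two retrieval steps; all data here is made up.
MOST_VALUABLE = {("US", 2023): "ExampleCorp"}
REVENUE = {("ExampleCorp", 2023): "USD 100B"}


def find_most_valuable_company(country: str, year: int) -> str:
    """Step 1: identify the company."""
    return MOST_VALUABLE[(country, year)]


def get_revenue(company: str, year: int) -> str:
    """Step 2: look up revenue; needs the company name from step 1."""
    return REVENUE[(company, year)]


def answer(country: str, year: int) -> str:
    company = find_most_valuable_company(country, year)  # must finish first
    revenue = get_revenue(company, year)  # depends on step 1's output
    return f"{company} reported revenue of {revenue} in {year}."


print(answer("US", 2023))
```

In an agentic RAG system, the model plays the role of the answer function: it decides which step to run next based on what the previous step returned.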

This is an example of sequential reasoning. In this tutorial, we’ll learn how agentic RAG with Cohere handles sequential reasoning, and in particular:

Multi-step tool calling
Multi-step, parallel tool calling
Self-correction

We’ll learn these by building an agent that answers questions about using Cohere.

Setup

To get started, first we need to install the cohere library and create a Cohere client.

We also need to import the tool definitions that we’ll use in this tutorial.

Important: the source code for tool definitions can be found here. Make sure to have the tool_def.py file in the same directory as this notebook for the imports to work correctly.

PYTHON

! pip install cohere langchain langchain-community pydantic -qq

PYTHON

import json
import os
import cohere

from tool_def import (
    search_developer_docs,
    search_developer_docs_tool,
    search_internet,
    search_internet_tool,
    search_code_examples,
    search_code_examples_tool,
)

co = cohere.ClientV2(
    "COHERE_API_KEY"
)  # Get your free API key: https://dashboard.cohere.com/api-keys

os.environ["TAVILY_API_KEY"] = (
    "TAVILY_API_KEY"  # We'll need the Tavily API key to perform internet search. Get your API key: https://app.tavily.com/home
)

Setting up the tools

We set up the same set of tools as in Part 1. If you want further details on how to set up the tools, check out Part 1.

PYTHON

functions_map = {
    "search_developer_docs": search_developer_docs,
    "search_internet": search_internet,
    "search_code_examples": search_code_examples,
}
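To show how this mapping is used, here is a minimal sketch of dispatching one tool call by name. The stub function and the hard-coded name and argument string are assumptions for illustration; in the tutorial they come from the model's response as tc.function.name and tc.function.arguments.

```python
import json


def search_developer_docs_stub(query: str) -> list[dict]:
    """Hypothetical stand-in for the real tool in tool_def.py."""
    return [{"text": f"stub result for: {query}"}]


# Same shape as functions_map above, but pointing at the stub.
stub_functions_map = {"search_developer_docs": search_developer_docs_stub}


def dispatch(name: str, arguments: str) -> list[dict]:
    """Look up the tool by name and call it with JSON-decoded arguments."""
    return stub_functions_map[name](**json.loads(arguments))


result = dispatch("search_developer_docs", '{"query": "reorder search results"}')
print(result)
```

Keeping the mapping in a plain dictionary means a new tool can be supported by adding one entry, without changing the dispatch logic.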

Running an agentic RAG workflow

We create a run_agent function to run the agentic RAG workflow, the same as in Part 1. If you want further details on how the function works, check out Part 1.

PYTHON

tools = [
    search_developer_docs_tool,
    search_internet_tool,
    search_code_examples_tool,
]

PYTHON

system_message = """## Task and Context
You are an assistant who helps developers use Cohere. You are equipped with a number of tools that can provide different types of information. If you can't find the information you need from one tool, you should try other tools if there is a possibility that they could provide the information you need."""

PYTHON

model = "command-a-03-2025"


def run_agent(query, messages=None):
    if messages is None:
        messages = []

    if "system" not in {m.get("role") for m in messages}:
        messages.append({"role": "system", "content": system_message})

    # Step 1: get user message
    print(f"QUESTION:\n{query}")
    print("=" * 50)

    messages.append({"role": "user", "content": query})

    # Step 2: Generate tool calls (if any)
    response = co.chat(
        model=model, messages=messages, tools=tools, temperature=0.3
    )

    while response.message.tool_calls:

        print("TOOL PLAN:")
        print(response.message.tool_plan, "\n")
        print("TOOL CALLS:")
        for tc in response.message.tool_calls:
            print(
                f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}"
            )
        print("=" * 50)

        messages.append(
            {
                "role": "assistant",
                "tool_calls": response.message.tool_calls,
                "tool_plan": response.message.tool_plan,
            }
        )

        # Step 3: Get tool results
        for tc in response.message.tool_calls:
            tool_result = functions_map[tc.function.name](
                **json.loads(tc.function.arguments)
            )
            tool_content = []
            for data in tool_result:
                tool_content.append(
                    {
                        "type": "document",
                        "document": {"data": json.dumps(data)},
                    }
                )
                # Optional: add an "id" field in the "document" object, otherwise IDs are auto-generated
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": tool_content,
                }
            )

        # Step 4: Generate response and citations
        response = co.chat(
            model=model,
            messages=messages,
            tools=tools,
            temperature=0.3,
        )

    messages.append(
        {
            "role": "assistant",
            "content": response.message.content[0].text,
        }
    )

    # Print final response
    print("RESPONSE:")
    print(response.message.content[0].text)
    print("=" * 50)

    # Print citations (if any)
    verbose_source = (
        False  # Change to True to display the contents of a source
    )
    if response.message.citations:
        print("CITATIONS:\n")
        for citation in response.message.citations:
            print(
                f"Start: {citation.start}| End:{citation.end}| Text:'{citation.text}' "
            )
            print("Sources:")
            for idx, source in enumerate(citation.sources):
                print(f"{idx+1}. {source.id}")
                if verbose_source:
                    print(f"{source.tool_output}")
            print("\n")

    return messages
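Step 3 of run_agent wraps each tool result in a document object before sending it back to the model. As a minimal sketch, that formatting can be isolated in a helper. The helper name to_document_content, the sample result, and the "call_0" ID are hypothetical.

```python
import json


def to_document_content(tool_result: list[dict]) -> list[dict]:
    """Wrap each result item as a document, mirroring Step 3 of run_agent."""
    return [
        {"type": "document", "document": {"data": json.dumps(item)}}
        for item in tool_result
    ]


# Hypothetical tool output, for illustration only.
sample_result = [{"title": "Doc A"}, {"title": "Doc B"}]
content = to_document_content(sample_result)

# A tool message as appended in run_agent; "call_0" is a placeholder for tc.id.
tool_message = {"role": "tool", "tool_call_id": "call_0", "content": content}
print(tool_message)
```

Each tool message carries the ID of the call it answers, which is how the model matches results to calls when several tools run in the same step.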

Multi-step tool calling

Let’s ask the agent a few questions, starting with this one about a specific feature. The user is asking about two things: a feature to reorder search results and code examples for that feature.

In this case, the agent first needs to identify what that feature is before it can answer the second part of the question.

This is reflected in the agent’s tool plan, which describes the steps it will take to answer the question.

So, it first calls the search_developer_docs tool to find the feature.

It then discovers that the feature is Rerank. Using this information, it calls the search_code_examples tool to find code examples for that feature.

Finally, it uses the retrieved information to answer both parts of the user’s question.

PYTHON

messages = run_agent(
    "What's the Cohere feature to reorder search results? Do you have any code examples on that?"
)

QUESTION:
What's the Cohere feature to reorder search results? Do you have any code examples on that?
==================================================
TOOL PLAN:
I will search for the Cohere feature to reorder search results. Then I will search for code examples on that.

TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"reorder search results"}
==================================================
TOOL PLAN:
I found that the Rerank endpoint is the feature that reorders search results. I will now search for code examples on that.

TOOL CALLS:
Tool name: search_code_examples | Parameters: {"query":"rerank endpoint"}
==================================================
RESPONSE:
The Rerank endpoint is the feature that reorders search results. Unfortunately, I could not find any code examples on that.
==================================================
CITATIONS:

Start: 4| End:19| Text:'Rerank endpoint'
Sources:
1. search_developer_docs_53tfk9zgwgzt:0
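Because run_agent returns the full message history, the sequence of steps can also be recovered programmatically. The sketch below assumes each assistant tool-call message has the shape appended in run_agent. The helper names and the SimpleNamespace stand-ins for the SDK's tool call objects are hypothetical.

```python
from types import SimpleNamespace


def tool_steps(messages: list[dict]) -> list[list[str]]:
    """Return the tool names called at each step, in order."""
    steps = []
    for m in messages:
        if m.get("role") == "assistant" and m.get("tool_calls"):
            steps.append([tc.function.name for tc in m["tool_calls"]])
    return steps


def fake_call(name: str) -> SimpleNamespace:
    """Hypothetical stand-in exposing only .function.name."""
    return SimpleNamespace(function=SimpleNamespace(name=name))


# History shaped like the run above: two sequential steps, one call each.
history = [
    {"role": "user", "content": "question"},
    {"role": "assistant", "tool_calls": [fake_call("search_developer_docs")]},
    {"role": "tool", "tool_call_id": "0", "content": []},
    {"role": "assistant", "tool_calls": [fake_call("search_code_examples")]},
    {"role": "tool", "tool_call_id": "1", "content": []},
    {"role": "assistant", "content": "answer"},
]
print(tool_steps(history))
```

Each inner list is one step. A list with more than one name would indicate parallel calls within that step.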

Multi-step, parallel tool calling

In Part 2, we saw how the Cohere API supports parallel tool calling, and earlier in this tutorial, we looked at sequential tool calling. The two can also be combined in a single workflow, where one or more of the sequential steps consists of several tool calls made in parallel.

Here’s an example. Suppose we ask the agent to find the CEOs of the companies with the top 3 highest market capitalization.

In the first step, it searches the Internet for information about the 3 companies with the highest market capitalization.

And in the second step, it performs parallel searches for the CEOs of the 3 identified companies.

PYTHON

messages = run_agent(
    "Who are the CEOs of the companies with the top 3 highest market capitalization."
)

QUESTION:
Who are the CEOs of the companies with the top 3 highest market capitalization.
==================================================
TOOL PLAN:
I will search for the top 3 companies with the highest market capitalization. Then, I will search for the CEOs of those companies.

TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"top 3 companies with highest market capitalization"}
==================================================
TOOL PLAN:
The top 3 companies with the highest market capitalization are Apple, Microsoft, and Nvidia. I will now search for the CEOs of these companies.

TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"Apple CEO"}
Tool name: search_internet | Parameters: {"query":"Microsoft CEO"}
Tool name: search_internet | Parameters: {"query":"Nvidia CEO"}
==================================================
RESPONSE:
The CEOs of the top 3 companies with the highest market capitalization are:
1. Tim Cook of Apple
2. Satya Nadella of Microsoft
3. Jensen Huang of Nvidia
==================================================
CITATIONS:

Start: 79| End:87| Text:'Tim Cook'
Sources:
1. search_internet_0f8wyxfc3hmn:0
2. search_internet_0f8wyxfc3hmn:1
3. search_internet_0f8wyxfc3hmn:2


Start: 91| End:96| Text:'Apple'
Sources:
1. search_internet_kb9qgs1ps69e:0


Start: 100| End:113| Text:'Satya Nadella'
Sources:
1. search_internet_wy4mn7286a88:0
2. search_internet_wy4mn7286a88:1
3. search_internet_wy4mn7286a88:2


Start: 117| End:126| Text:'Microsoft'
Sources:
1. search_internet_kb9qgs1ps69e:0


Start: 130| End:142| Text:'Jensen Huang'
Sources:
1. search_internet_q9ahz81npfqz:0
2. search_internet_q9ahz81npfqz:1
3. search_internet_q9ahz81npfqz:2
4. search_internet_q9ahz81npfqz:3


Start: 146| End:152| Text:'Nvidia'
Sources:
1. search_internet_kb9qgs1ps69e:0
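In run_agent, the tool calls from a single step are executed one after another in a for loop. Since the calls within one step do not depend on each other, they could instead be run concurrently. The sketch below uses Python's ThreadPoolExecutor with a stub search function. This is an optional variation, not what the tutorial's code does, and the stub and its return values are placeholders.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def search_internet_stub(query: str) -> list[dict]:
    """Hypothetical stand-in for the real search_internet tool."""
    return [{"query": query, "text": "stub"}]


stub_functions_map = {"search_internet": search_internet_stub}

# (name, JSON arguments) pairs, as the model might emit in one step.
calls = [
    ("search_internet", '{"query": "Apple CEO"}'),
    ("search_internet", '{"query": "Microsoft CEO"}'),
    ("search_internet", '{"query": "Nvidia CEO"}'),
]


def run_call(call: tuple[str, str]) -> list[dict]:
    """Execute one tool call by name with JSON-decoded arguments."""
    name, arguments = call
    return stub_functions_map[name](**json.loads(arguments))


# executor.map preserves input order, so results line up with calls.
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(run_call, calls))

print([r[0]["query"] for r in results])
```

Threads suit this case because real search tools spend most of their time waiting on network I/O. Calls from different steps must still run in sequence, since later steps depend on earlier results.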

Self-correction

The concept of sequential reasoning is useful in a broader sense, particularly where the agent needs to adapt and change its plan midway in a task.

In other words, it allows the agent to self-correct.

To illustrate this, let’s look at an example. Here, the user is asking about the authors of the Sentence-BERT paper.

The agent first attempts to find the required information via the search_developer_docs tool.

However, we know that the tool doesn’t contain this information because we have only added a small sample of documents.

As a result, the agent, having received the documents back without any relevant information, decides to search the internet instead. This is also helped by the fact that we have added specific instructions in the search_internet tool to search the internet for information not found in the developer documentation.
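The actual tool definition lives in tool_def.py and is not reproduced here. As a hypothetical sketch of what such an instruction might look like, the description field of a tool schema can tell the model when to use the tool. The description wording below is an assumption, and the real definition may differ.

```python
# Hypothetical tool schema; the real one in tool_def.py may differ.
search_internet_tool_sketch = {
    "type": "function",
    "function": {
        "name": "search_internet",
        "description": (
            "Performs an internet search. Use this for information "
            "not found in the developer documentation."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query.",
                }
            },
            "required": ["query"],
        },
    },
}

print(search_internet_tool_sketch["function"]["description"])
```

The model reads the description alongside the system message when planning, so guidance placed here can influence which tool it tries after another tool comes back empty.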

It finally has the information it needs, and uses it to answer the user’s question.

This highlights another important aspect of agentic RAG: flexibility. Because an LLM drives the retrieval component, the agent can decide at run time which source to query next based on what it has retrieved so far.

A standard RAG system, on the other hand, would typically need this kind of fallback logic to be hand-engineered, and hence is more rigid.
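For contrast, a hand-engineered fallback might look like the sketch below, where the order of sources and the fallback condition are fixed in code rather than decided by the model. Both search functions are stubs, and their contents are placeholders.

```python
def search_docs_stub(query: str) -> list[dict]:
    """Hypothetical docs index that only covers a small sample of topics."""
    index = {"rerank": [{"text": "Rerank reorders search results."}]}
    return [
        doc
        for key, docs in index.items()
        if key in query.lower()
        for doc in docs
    ]


def search_internet_stub(query: str) -> list[dict]:
    """Hypothetical internet search that always returns something."""
    return [{"text": f"web result for: {query}"}]


def retrieve_with_fallback(query: str) -> tuple[str, list[dict]]:
    """Fixed rule: try the docs first, fall back to the internet if empty."""
    docs = search_docs_stub(query)
    if docs:
        return "docs", docs
    return "internet", search_internet_stub(query)


print(retrieve_with_fallback("How does Rerank work?")[0])
print(retrieve_with_fallback("Sentence-BERT paper authors")[0])
```

Every new source or fallback condition in this style requires a code change, whereas the agent only needs a tool definition and a description of when to use it.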

PYTHON

messages = run_agent(
    "Who are the authors of the sentence BERT paper?"
)

QUESTION:
Who are the authors of the sentence BERT paper?
==================================================
TOOL PLAN:
I will search for the authors of the sentence BERT paper.

TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"authors of the sentence BERT paper"}
==================================================
TOOL PLAN:
I was unable to find any information about the authors of the sentence BERT paper. I will now search for 'sentence BERT paper authors'.

TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"sentence BERT paper authors"}
==================================================
RESPONSE:
The authors of the Sentence-BERT paper are Nils Reimers and Iryna Gurevych.
==================================================
CITATIONS:

Start: 43| End:55| Text:'Nils Reimers'
Sources:
1. search_internet_z8t19852my9q:0
2. search_internet_z8t19852my9q:1
3. search_internet_z8t19852my9q:2
4. search_internet_z8t19852my9q:3
5. search_internet_z8t19852my9q:4


Start: 60| End:75| Text:'Iryna Gurevych.'
Sources:
1. search_internet_z8t19852my9q:0
2. search_internet_z8t19852my9q:1
3. search_internet_z8t19852my9q:2
4. search_internet_z8t19852my9q:3
5. search_internet_z8t19852my9q:4

Summary

In this tutorial, we learned about:

How multi-step tool calling works
How multi-step, parallel tool calling works
How multi-step tool calling enables an agent to self-correct, and hence, be more flexible

However, up until now, we have only worked with purely unstructured data, the type of data we typically encounter in a standard RAG system.

In the coming chapters, we’ll add another layer of complexity to the agentic RAG system: working with semi-structured and structured data. This adds another dimension to the agent’s flexibility, namely dealing with a more diverse set of data sources.

In Part 4, we’ll learn how to build an agent that can perform faceted queries over semi-structured data.