Performing Tasks Sequentially

Open in Colab

Compare two user queries posed to a RAG chatbot:

  • “What was Apple’s revenue in 2023?"
  • "What was the revenue of the most valuable company in the US in 2023?”

While the first query is straightforward, the second query requires breaking down into two steps:

  1. Identify the most valuable company in the US in 2023
  2. Get the revenue of the company in 2023

These steps need to happen in a sequence rather than all at once, because the information retrieved from the first step is required for the second step.

This is an example of sequential reasoning. In this tutorial, we’ll learn how agentic RAG with Cohere handles sequential reasoning, and in particular:

  • Multi-step tool calling
  • Multi-step, parallel tool calling
  • Self-correction

We’ll learn these by building an agent that answers questions about using Cohere.

Setup

First, we need to install the cohere library and create a Cohere client.

PYTHON
1import json
2import os
3import cohere
4
5from tool_def import (
6 search_developer_docs,
7 search_developer_docs_tool,
8 search_internet,
9 search_internet_tool,
10 search_code_examples,
11 search_code_examples_tool,
12)
13
14co = cohere.ClientV2("COHERE_API_KEY") # Get your free API key: https://dashboard.cohere.com/api-keys

Note: the source code for tool definitions can be found here

Setting up the tools

We set up the same set of tools as in Part 1, so check that out if you want further details on how to set up the tools.

PYTHON
1functions_map = {
2 "search_developer_docs": search_developer_docs,
3 "search_internet": search_internet,
4 "search_code_examples": search_code_examples,
5}

Running an agentic RAG workflow

We create a run_agent function to run the agentic RAG workflow, as in Part 1.

PYTHON
1tools = [
2 search_developer_docs_tool,
3 search_internet_tool,
4 search_code_examples_tool
5]
PYTHON
1system_message="""## Task and Context
2You are an assistant who helps developers use Cohere. You are equipped with a number of tools that can provide different types of information. If you can't find the information you need from one tool, you should try other tools if there is a possibility that they could provide the information you need."""
PYTHON
1model = "command-r-plus-08-2024"
2
3def run_agent(query, messages=None):
4 if messages is None:
5 messages = []
6
7 if "system" not in {m.get("role") for m in messages}:
8 messages.append({"role": "system", "content": system_message})
9
10 # Step 1: get user message
11 print(f"QUESTION:\n{query}")
12 print("="*50)
13
14 messages.append({"role": "user", "content": query})
15
16 # Step 2: Generate tool calls (if any)
17 response = co.chat(
18 model=model,
19 messages=messages,
20 tools=tools,
21 temperature=0.1
22 )
23
24 while response.message.tool_calls:
25
26 print("TOOL PLAN:")
27 print(response.message.tool_plan,"\n")
28 print("TOOL CALLS:")
29 for tc in response.message.tool_calls:
30 print(f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}")
31 print("="*50)
32
33 messages.append({"role": "assistant", "tool_calls": response.message.tool_calls, "tool_plan": response.message.tool_plan})
34
35 # Step 3: Get tool results
36 for tc in response.message.tool_calls:
37 tool_result = functions_map[tc.function.name](**json.loads(tc.function.arguments))
38 tool_content = []
39 for data in tool_result:
40 tool_content.append({"type": "document", "document": {"data": json.dumps(data)}})
41 # Optional: add an "id" field in the "document" object, otherwise IDs are auto-generated
42 messages.append({"role": "tool", "tool_call_id": tc.id, "content": tool_content})
43
44 # Step 4: Generate response and citations
45 response = co.chat(
46 model=model,
47 messages=messages,
48 tools=tools,
49 temperature=0.1
50 )
51
52 messages.append({"role": "assistant", "content": response.message.content[0].text})
53
54 # Print final response
55 print("RESPONSE:")
56 print(response.message.content[0].text)
57 print("="*50)
58
59 # Print citations (if any)
60 verbose_source = False # Change to True to display the contents of a source
61 if response.message.citations:
62 print("CITATIONS:\n")
63 for citation in response.message.citations:
64 print(f"Start: {citation.start}| End:{citation.end}| Text:'{citation.text}' ")
65 print("Sources:")
66 for idx, source in enumerate(citation.sources):
67 print(f"{idx+1}. {source.id}")
68 if verbose_source:
69 print(f"{source.tool_output}")
70 print("\n")
71
72 return messages

Multi-step tool calling

Let’s ask the agent a few questions, starting with this one about a specific feature. The user is asking about two things:

  • A feature to reorder search results, and;
  • Code examples for that feature;

In this case, the agent first needs to identify what that feature is before it can answer the second part of the question.

This is reflected in the agent’s tool plan, which describes the steps it will take to answer the question.

So, it first calls the search_developer_docs tool to find the feature.

It then discovers that the feature is Rerank. Using this information, it calls the search_code_examples tool to find code examples for that feature.

Finally, it uses the retrieved information to answer both parts of the user’s question.

PYTHON
1messages = run_agent("What's the Cohere feature to reorder search results? Do you have any code examples on that?")
QUESTION:
What's the Cohere feature to reorder search results? Do you have any code examples on that?
==================================================
TOOL PLAN:
I will search for the Cohere feature to reorder search results. Then I will search for code examples on that.
TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"reorder search results"}
==================================================
TOOL PLAN:
I found that the Cohere feature to reorder search results is called the Rerank endpoint. I will now search for code examples on this.
TOOL CALLS:
Tool name: search_code_examples | Parameters: {"query":"Rerank endpoint"}
==================================================
RESPONSE:
The Cohere feature to reorder search results is the Rerank endpoint. This endpoint takes in a query and a list of texts and produces an ordered array with each text assigned a relevance score.
Here is a code example that uses the Rerank endpoint:
RAG With Chat Embed and Rerank via Pinecone
==================================================
CITATIONS:
Start: 52| End:68| Text:'Rerank endpoint.'
Sources:
1. search_developer_docs_07rw24b2sa29:0
Start: 83| End:192| Text:'takes in a query and a list of texts and produces an ordered array with each text assigned a relevance score.'
Sources:
1. search_developer_docs_07rw24b2sa29:0
Start: 249| End:292| Text:'RAG With Chat Embed and Rerank via Pinecone'
Sources:
1. search_code_examples_p6g6g21ev0re:2

Multi-step, parallel tool calling

In Part 2, we saw how the Cohere API supports tool calling in parallel and now in a sequence. That also means that both scenarios can happen at the same time.

Here’s an examples. Suppose we ask the agent to find the leaders of the top 3 countries with the largest oil reserves.

In the first step, it searches the Internet for information about the 3 countries with the largest oil reserves.

And in the second step, it performs parallel searches for the leaders of the 3 identified countries.

PYTHON
1messages = run_agent("Who are the leaders of the top 3 countries with the largest oil reserves.")
QUESTION:
Who are the leaders of the top 3 countries with the largest oil reserves.
==================================================
TOOL PLAN:
I will search for the top 3 countries with the largest oil reserves. Then I will search for the leaders of each of these countries.
TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"top 3 countries with the largest oil reserves"}
==================================================
TOOL PLAN:
I have found that the top three countries with the largest oil reserves are Venezuela, Saudi Arabia and Canada. Now I need to find out who the leaders of these countries are.
TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"Who is the leader of Venezuela?"}
Tool name: search_internet | Parameters: {"query":"Who is the leader of Saudi Arabia?"}
Tool name: search_internet | Parameters: {"query":"Who is the leader of Canada?"}
==================================================
RESPONSE:
The top three countries with the largest oil reserves are Venezuela, Saudi Arabia, and Canada.
The leader of Venezuela is Nicolás Maduro. Maduro was born on November 23, 1962, in Caracas, Venezuela. He won a special election in April 2013 to serve out the remainder of the term of Venezuelan President Hugo Chávez, who had died in March. Maduro, a former labor leader, became the interim president following Chávez's death.
The leader of Saudi Arabia is Mohammed bin Salman. He was born on August 31, 1985, and is the eldest child of Salman bin Abdulaziz Al Saud and his third wife, Fahda bint Falah bin Sultan bin Hathleen al-Ajmi, the daughter of the head of a powerful Arabian tribe, known as the Al Ajman.
The leader of Canada is Justin Trudeau. He was born on December 25, 1971, in Ottawa, Canada. He is the oldest son of former prime minister Pierre Trudeau and his wife, Margaret. Trudeau is the 23rd Prime Minister of Canada and the proud father of Xavier, Ella-Grace, and Hadrien.
==================================================
CITATIONS:
Start: 58| End:67| Text:'Venezuela'
Sources:
1. search_internet_bf18bye7vnst:0
2. search_internet_bf18bye7vnst:1
3. search_internet_bf18bye7vnst:3
Start: 69| End:81| Text:'Saudi Arabia'
Sources:
1. search_internet_bf18bye7vnst:0
2. search_internet_bf18bye7vnst:1
3. search_internet_bf18bye7vnst:3
Start: 87| End:94| Text:'Canada.'
Sources:
1. search_internet_bf18bye7vnst:0
2. search_internet_bf18bye7vnst:3
Start: 123| End:138| Text:'Nicolás Maduro.'
Sources:
1. search_internet_m3014vh1k2sn:0
2. search_internet_m3014vh1k2sn:1
3. search_internet_m3014vh1k2sn:2
4. search_internet_m3014vh1k2sn:3
Start: 158| End:199| Text:'November 23, 1962, in Caracas, Venezuela.'
Sources:
1. search_internet_m3014vh1k2sn:0
Start: 209| End:239| Text:'special election in April 2013'
Sources:
1. search_internet_m3014vh1k2sn:0
Start: 303| End:314| Text:'Hugo Chávez'
Sources:
1. search_internet_m3014vh1k2sn:0
Start: 324| End:338| Text:'died in March.'
Sources:
1. search_internet_m3014vh1k2sn:0
Start: 349| End:368| Text:'former labor leader'
Sources:
1. search_internet_m3014vh1k2sn:0
Start: 381| End:398| Text:'interim president'
Sources:
1. search_internet_m3014vh1k2sn:0
Start: 456| End:476| Text:'Mohammed bin Salman.'
Sources:
1. search_internet_sph85190s567:1
2. search_internet_sph85190s567:2
3. search_internet_sph85190s567:3
4. search_internet_sph85190s567:4
Start: 492| End:507| Text:'August 31, 1985'
Sources:
1. search_internet_sph85190s567:2
Start: 520| End:711| Text:'eldest child of Salman bin Abdulaziz Al Saud and his third wife, Fahda bint Falah bin Sultan bin Hathleen al-Ajmi, the daughter of the head of a powerful Arabian tribe, known as the Al Ajman.'
Sources:
1. search_internet_sph85190s567:2
Start: 737| End:752| Text:'Justin Trudeau.'
Sources:
1. search_internet_b3xre9say1kk:0
2. search_internet_b3xre9say1kk:1
3. search_internet_b3xre9say1kk:2
4. search_internet_b3xre9say1kk:3
5. search_internet_b3xre9say1kk:4
Start: 768| End:805| Text:'December 25, 1971, in Ottawa, Canada.'
Sources:
1. search_internet_b3xre9say1kk:0
2. search_internet_b3xre9say1kk:2
3. search_internet_b3xre9say1kk:4
Start: 816| End:890| Text:'oldest son of former prime minister Pierre Trudeau and his wife, Margaret.'
Sources:
1. search_internet_b3xre9say1kk:4
Start: 906| End:935| Text:'23rd Prime Minister of Canada'
Sources:
1. search_internet_b3xre9say1kk:0
2. search_internet_b3xre9say1kk:1
3. search_internet_b3xre9say1kk:2
Start: 944| End:992| Text:'proud father of Xavier, Ella-Grace, and Hadrien.'
Sources:
1. search_internet_b3xre9say1kk:0
2. search_internet_b3xre9say1kk:2

Self-correction

The concept of sequential reasoning is useful in a broader sense, particularly where the agent needs to adapt and change its plan midway in a task.

In other words, it allows the agent to self-correct.

To illustrate this, let’s look at an example. Here, the user is asking about the Cohere safety mode feature.

Given the nature of the question, the agent correctly identifies that it needs to find required information via the search_developer_docs tool.

However, we know that the tool doesn’t contain this information because we have only added a small sample of documents.

As a result, the agent, having received the documents back without any relevant information, decides to search the internet instead. This is also helped by the fact that we have added specific instructions in the search_internet tool to search the internet for information not found in the developer documentation.

It finally has the information it needs, and uses it to answer the user’s question.

This highlights another important aspect of agentic RAG, which allows a RAG system to be flexible. This is achieved by powering the retrieval component with an LLM.

On the other hand, a standard RAG system would typically hand-engineer this, and hence, is more rigid.

PYTHON
1messages = run_agent("How does the Cohere safety mode feature work.")
QUESTION:
How does the Cohere safety mode feature work.
==================================================
TOOL PLAN:
I will search for 'How does the Cohere safety mode feature work?'
TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"How does the Cohere safety mode feature work?"}
==================================================
TOOL PLAN:
I could not find any information about the Cohere safety mode feature in the developer documentation. I will now search the internet to see if I can find any information.
TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"How does the Cohere safety mode feature work?"}
==================================================
RESPONSE:
Cohere's Safety Modes aim to illustrate what model behaviours will look like under specific scenarios, thereby introducing a nuanced approach that is sensitive to context. By transparently communicating the strengths and boundaries of each mode, Cohere intends to set clear usage expectations while keeping safety as its top priority.
Safety Modes work with Cohere's newest refreshed models, but not with older iterations. Users can switch between modes by simply adding the safety_mode parameter and choosing one of the options below.
==================================================
CITATIONS:
Start: 29| End:101| Text:'illustrate what model behaviours will look like under specific scenarios'
Sources:
1. search_internet_64qs25r4ssd6:1
Start: 125| End:171| Text:'nuanced approach that is sensitive to context.'
Sources:
1. search_internet_64qs25r4ssd6:1
Start: 175| End:244| Text:'transparently communicating the strengths and boundaries of each mode'
Sources:
1. search_internet_64qs25r4ssd6:1
Start: 264| End:292| Text:'set clear usage expectations'
Sources:
1. search_internet_64qs25r4ssd6:1
Start: 299| End:334| Text:'keeping safety as its top priority.'
Sources:
1. search_internet_64qs25r4ssd6:1
Start: 349| End:423| Text:'work with Cohere's newest refreshed models, but not with older iterations.'
Sources:
1. search_internet_64qs25r4ssd6:0
Start: 424| End:536| Text:'Users can switch between modes by simply adding the safety_mode parameter and choosing one of the options below.'
Sources:
1. search_internet_64qs25r4ssd6:0

Summary

In this tutorial, we learned about:

  • How multi-step tool calling works
  • How multi-step, parallel tool calling works
  • How multi-step tool calling enables an agent to self-correct, and hence, be more flexible

However, up until now, we have only worked with purely unstructured data, the type of data we typically encounter in a standard RAG system.

In the coming chapters, we’ll add another layer of complexity to the agentic RAG system – working with semi-structured and structured data. This adds another dimension to the agent’s flexibility, which is dealing with a more diverse set of data sources.

In Part 4, we’ll learn how to build an agent that can perform faceted queries over semi-structured data.