Performing Tasks Sequentially with Cohere’s RAG
Compare two user queries to a RAG chatbot, “What was Apple’s revenue in 2023?” and “What was the revenue of the most valuable company in the US in 2023?“.
While the first query is straightforward to handle, the second query requires breaking down into two steps:
- Identify the most valuable company in the US in 2023
- Get the revenue of the company in 2023
These steps need to happen in a sequence rather than all at once. This is because the information retrieved from the first step is required to inform the second step.
This is an example of sequential reasoning. In this tutorial, we’ll learn how agentic RAG with Cohere handles sequential reasoning, and in particular:
- Multi-step tool calling
- Multi-step, parallel tool calling
- Self-correction
We’ll learn these by building an agent that answers questions about using Cohere.
Setup
To get started, first we need to install the cohere
library and create a Cohere client.
We also need to import the tool definitions that we’ll use in this tutorial.
tool_def.py
file in the same directory as this notebook for the imports to work correctly. Setting up the tools
We set up the same set of tools as in Part 1. If you want further details on how to set up the tools, check out Part 1.
Running an agentic RAG workflow
We create a run_agent
function to run the agentic RAG workflow, the same as in Part 1. If you want further details on how to set up the tools, check out Part 1.
Multi-step tool calling
Let’s ask the agent a few questions, starting with this one about a specific feature. The user is asking about two things: a feature to reorder search results and code examples for that feature.
In this case, the agent first needs to identify what that feature is before it can answer the second part of the question.
This is reflected in the agent’s tool plan, which describes the steps it will take to answer the question.
So, it first calls the search_developer_docs
tool to find the feature.
It then discovers that the feature is Rerank. Using this information, it calls the search_code_examples
tool to find code examples for that feature.
Finally, it uses the retrieved information to answer both parts of the user’s question.
Multi-step, parallel tool calling
In Part 2, we saw how the Cohere API supports parallel tool calling, and in this tutorial, we looked at sequential tool calling. That also means that both scenarios can happen at the same time.
Here’s an example. Suppose we ask the agent to find the CEOs of the companies with the top 3 highest market capitalization.
In the first step, it searches the Internet for information about the 3 companies with the highest market capitalization.
And in the second step, it performs parallel searches for the CEOs of the 3 identified companies.
Self-correction
The concept of sequential reasoning is useful in a broader sense, particularly where the agent needs to adapt and change its plan midway in a task.
In other words, it allows the agent to self-correct.
To illustrate this, let’s look at an example. Here, the user is asking about the authors of the sentence BERT paper.
The agent attempted to find required information via the search_developer_docs
tool.
However, we know that the tool doesn’t contain this information because we have only added a small sample of documents.
As a result, the agent, having received the documents back without any relevant information, decides to search the internet instead. This is also helped by the fact that we have added specific instructions in the search_internet
tool to search the internet for information not found in the developer documentation.
It finally has the information it needs, and uses it to answer the user’s question.
This highlights another important aspect of agentic RAG, which allows a RAG system to be flexible. This is achieved by powering the retrieval component with an LLM.
On the other hand, a standard RAG system would typically hand-engineer this, and hence, is more rigid.
Summary
In this tutorial, we learned about:
- How multi-step tool calling works
- How multi-step, parallel tool calling works
- How multi-step tool calling enables an agent to self-correct, and hence, be more flexible
However, up until now, we have only worked with purely unstructured data, the type of data we typically encounter in a standard RAG system.
In the coming chapters, we’ll add another complexity to the agentic RAG system – working with semi-structured and structured data. This adds another dimension to the agent’s flexibility, which is dealing with a more diverse set of data sources.
In Part 4, we’ll learn how to build an agent that can perform faceted queries over semi-structured data.