Generating Parallel Queries
Compare two user queries posed to a RAG chatbot:
- “What was Apple’s revenue in 2023?"
- "What were Apple’s and Google’s revenue in 2023?“.
The first query is straightforward, as we can perform retrieval using pretty much the same query.
But the second query is more complex. We need to break it down into two separate queries, one for Apple and one for Google.
This is an example that requires query expansion, in which the agentic RAG application will expand a single query into a set of queries and then run them in parallel. Here are some of the ways in which that can work:
- Basic query expansion
- Query expansion over multiple data sources
- Query expansion in multi-turn conversations
We’ll learn about these options by building an agent that answers questions about using Cohere.
Setup
To get started, first we need to install the cohere
library and create a Cohere client.
We also need to import the tool definitions from the tool_def.py
file.
tool_def.py
file in the same directory as this notebook for the imports to work correctly. Note: the source code for tool definitions can be found here
Setting up the tools
We set up the same set of tools as in Part 1. If you want further details on how to set up the tools, check out Part 1.
Running an agentic RAG workflow
We create a run_agent
function to run the agentic RAG workflow, just as we did in Part 1. If you want further details on how to set up the tools, check out Part 1.
Query expansion
Let’s ask the agent a few questions, starting with this one about the Chat endpoint and the RAG feature.
Firstly, the agent rightly chooses the search_developer_docs
tool to retrieve the information it needs.
Additionally, because the question asks about two different things, retrieving information using the same query as the user’s may not be the most optimal approach. Instead, the query needs to be expanded or split into multiple parts, each retrieving its own set of documents.
Thus, the agent expands the original query into two queries.
This is enabled by the parallel tool calling feature that comes with the Chat endpoint.
This results in a richer and more representative list of documents retrieved, and therefore a more accurate and comprehensive answer.
Query expansion over multiple data sources
The earlier example focused on a single data source, the Cohere developer documentation. However, the agentic RAG can also perform query expansion over multiple data sources.
Here, the agent is asked a question that contains two parts: first asking for an explanation of the Embed endpoint and then asking for code examples.
It correctly identifies that this requires both searching the developer documentation and the code examples. Thus, it generates two queries, one for each data source, and performs two separate searches in parallel.
Its response then contains information referenced from both data sources.
Query expansion in multi-turn conversations
A RAG chatbot needs to be able to infer the user’s intent for a given query, which is sometimes based on vague context.
This is especially important in multi-turn conversations, where the user’s intent may not be clear from a single query.
For example, in the first turn, a user might ask “What is A” and in the second, they might ask “Compare that with B and C”. So, the agent needs to be able to infer that the user’s intent is to compare A with B and C.
Let’s see an illustration of this. First, note that the run_agent
function is already set up to handle multi-turn conversations. It can take messages from the previous conversation turns and append them to the messages
list.
In the first turn, the user asks about the Chat endpoint, to which the agent duly responds.
In the second turn, the user asks a question that has two parts: first, how it’s different from RAG, and then, for code examples.
We pass the messages from the previous conversation turn to the run_agent
function.
Because of this, the agent is able to infer that the question is referring to the Chat endpoint even though the user didn’t explicitly mention it.
The agent then expands the query into two separate queries, one for the search_code_examples
tool and one for the search_developer_docs
tool.
Summary
In this tutorial, we learned about:
- How query expansion works in an agentic RAG system
- How query expansion works over multiple data sources
- How query expansion works in multi-turn conversations
Having said that, we may encounter even more complex queries—especially those that require sequential reasoning over multiple steps.
In Part 3, we’ll learn how the agentic RAG system can perform sequential reasoning.