Generating Parallel Queries

Open in Colab

Compare two user queries posed to a RAG chatbot:

  • “What was Apple’s revenue in 2023?"
  • "What were Apple’s and Google’s revenue in 2023?“.

The first query is straightforward, as we can perform retrieval using pretty much the same query.

But the second query is more complex. We need to break it down into two separate queries, one for Apple and one for Google.

This is an example that requires query expansion, in which the agentic RAG application will expand a single query into a set of queries and then run them in parallel. Here are some of the ways in which that can work:

  • Basic query expansion
  • Query expansion over multiple data sources
  • Query expansion in multi-turn conversations

We’ll learn about these options by building an agent that answers questions about using Cohere.

Setup

To get started, first we need to install the cohere library and create a Cohere client.

We also need to import the tool definitions from the tool_def.py file.

Important: the source code for tool definitions can be found here. Make sure to have the tool_def.py file in the same directory as this notebook for the imports to work correctly.
PYTHON
1import json
2import os
3import cohere
4
5from tool_def import (
6 search_developer_docs,
7 search_developer_docs_tool,
8 search_internet,
9 search_internet_tool,
10 search_code_examples,
11 search_code_examples_tool,
12)
13
14# Get your free API key: https://dashboard.cohere.com/api-keys
15co = cohere.ClientV2("COHERE_API_KEY")

Note: the source code for tool definitions can be found here

Setting up the tools

We set up the same set of tools as in Part 1. If you want further details on how to set up the tools, check out Part 1.

PYTHON
1functions_map = {
2 "search_developer_docs": search_developer_docs,
3 "search_internet": search_internet,
4 "search_code_examples": search_code_examples,
5}

Running an agentic RAG workflow

We create a run_agent function to run the agentic RAG workflow, just as we did in Part 1. If you want further details on how to set up the tools, check out Part 1.

PYTHON
1tools = [
2 search_developer_docs_tool,
3 search_internet_tool,
4 search_code_examples_tool,
5]
PYTHON
1system_message = """## Task and Context
2You are an assistant who helps developers use Cohere. You are equipped with a number of tools that can provide different types of information. If you can't find the information you need from one tool, you should try other tools if there is a possibility that they could provide the information you need."""
PYTHON
1model = "command-r-plus-08-2024"
2
3
4def run_agent(query, messages=None):
5 if messages is None:
6 messages = []
7
8 if "system" not in {m.get("role") for m in messages}:
9 messages.append({"role": "system", "content": system_message})
10
11 # Step 1: get user message
12 print(f"QUESTION:\n{query}")
13 print("=" * 50)
14
15 messages.append({"role": "user", "content": query})
16
17 # Step 2: Generate tool calls (if any)
18 response = co.chat(
19 model=model, messages=messages, tools=tools, temperature=0.1
20 )
21
22 while response.message.tool_calls:
23
24 print("TOOL PLAN:")
25 print(response.message.tool_plan, "\n")
26 print("TOOL CALLS:")
27 for tc in response.message.tool_calls:
28 print(
29 f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}"
30 )
31 print("=" * 50)
32
33 messages.append(
34 {
35 "role": "assistant",
36 "tool_calls": response.message.tool_calls,
37 "tool_plan": response.message.tool_plan,
38 }
39 )
40
41 # Step 3: Get tool results
42 for tc in response.message.tool_calls:
43 tool_result = functions_map[tc.function.name](
44 **json.loads(tc.function.arguments)
45 )
46 tool_content = []
47 for data in tool_result:
48 tool_content.append(
49 {
50 "type": "document",
51 "document": {"data": json.dumps(data)},
52 }
53 )
54 # Optional: add an "id" field in the "document" object, otherwise IDs are auto-generated
55 messages.append(
56 {
57 "role": "tool",
58 "tool_call_id": tc.id,
59 "content": tool_content,
60 }
61 )
62
63 # Step 4: Generate response and citations
64 response = co.chat(
65 model=model,
66 messages=messages,
67 tools=tools,
68 temperature=0.1,
69 )
70
71 messages.append(
72 {
73 "role": "assistant",
74 "content": response.message.content[0].text,
75 }
76 )
77
78 # Print final response
79 print("RESPONSE:")
80 print(response.message.content[0].text)
81 print("=" * 50)
82
83 # Print citations (if any)
84 verbose_source = (
85 False # Change to True to display the contents of a source
86 )
87 if response.message.citations:
88 print("CITATIONS:\n")
89 for citation in response.message.citations:
90 print(
91 f"Start: {citation.start}| End:{citation.end}| Text:'{citation.text}' "
92 )
93 print("Sources:")
94 for idx, source in enumerate(citation.sources):
95 print(f"{idx+1}. {source.id}")
96 if verbose_source:
97 print(f"{source.tool_output}")
98 print("\n")
99
100 return messages

Query expansion

Let’s ask the agent a few questions, starting with this one about the Chat endpoint and the RAG feature.

Firstly, the agent rightly chooses the search_developer_docs tool to retrieve the information it needs.

Additionally, because the question asks about two different things, retrieving information using the same query as the user’s may not be the most optimal approach. Instead, the query needs to be expanded or split into multiple parts, each retrieving its own set of documents.

Thus, the agent expands the original query into two queries.

This is enabled by the parallel tool calling feature that comes with the Chat endpoint.

This results in a richer and more representative list of documents retrieved, and therefore a more accurate and comprehensive answer.

PYTHON
1messages = run_agent("Explain the Chat endpoint and the RAG feature")
QUESTION:
Explain the Chat endpoint and the RAG feature
==================================================
TOOL PLAN:
I will search for 'Chat endpoint' and 'RAG feature' to find the information I need to answer the question.
TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"Chat endpoint"}
Tool name: search_developer_docs | Parameters: {"query":"RAG feature"}
==================================================
RESPONSE:
The Chat endpoint facilitates a conversational interface, allowing users to send messages to the model and receive text responses.
Retrieval Augmented Generation (RAG) is a method for generating text using additional information fetched from an external data source, which can greatly increase the accuracy of the response.
==================================================
CITATIONS:
Start: 4| End:17| Text:'Chat endpoint'
Sources:
1. search_developer_docs_7qmwn64d2e8v:3
2. search_developer_docs_0tyvan5t0f7t:3
Start: 32| End:56| Text:'conversational interface'
Sources:
1. search_developer_docs_7qmwn64d2e8v:3
2. search_developer_docs_0tyvan5t0f7t:3
Start: 67| End:102| Text:'users to send messages to the model'
Sources:
1. search_developer_docs_7qmwn64d2e8v:3
2. search_developer_docs_0tyvan5t0f7t:3
Start: 107| End:130| Text:'receive text responses.'
Sources:
1. search_developer_docs_7qmwn64d2e8v:3
2. search_developer_docs_0tyvan5t0f7t:3
Start: 132| End:162| Text:'Retrieval Augmented Generation'
Sources:
1. search_developer_docs_7qmwn64d2e8v:4
2. search_developer_docs_0tyvan5t0f7t:4
Start: 163| End:168| Text:'(RAG)'
Sources:
1. search_developer_docs_7qmwn64d2e8v:4
2. search_developer_docs_0tyvan5t0f7t:4
Start: 174| End:200| Text:'method for generating text'
Sources:
1. search_developer_docs_7qmwn64d2e8v:4
2. search_developer_docs_0tyvan5t0f7t:4
Start: 207| End:266| Text:'additional information fetched from an external data source'
Sources:
1. search_developer_docs_7qmwn64d2e8v:4
2. search_developer_docs_0tyvan5t0f7t:4
Start: 278| End:324| Text:'greatly increase the accuracy of the response.'
Sources:
1. search_developer_docs_7qmwn64d2e8v:4
2. search_developer_docs_0tyvan5t0f7t:4

Query expansion over multiple data sources

The earlier example focused on a single data source, the Cohere developer documentation. However, the agentic RAG can also perform query expansion over multiple data sources.

Here, the agent is asked a question that contains two parts: first asking for an explanation of the Embed endpoint and then asking for code examples.

It correctly identifies that this requires both searching the developer documentation and the code examples. Thus, it generates two queries, one for each data source, and performs two separate searches in parallel.

Its response then contains information referenced from both data sources.

PYTHON
1messages = run_agent(
2 "What is the Embed endpoint? Give me some code tutorials"
3)
QUESTION:
What is the Embed endpoint? Give me some code tutorials
==================================================
TOOL PLAN:
I will search for 'What is the Embed endpoint?' and 'code tutorials for Embed endpoint'.
TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"What is the Embed endpoint?"}
Tool name: search_code_examples | Parameters: {"query":"code tutorials for Embed endpoint"}
==================================================
RESPONSE:
The Embed endpoint returns text embeddings. An embedding is a list of floating point numbers that captures semantic information about the text that it represents.
In addition to embed-english-v3.0, we offer a best-in-class multilingual model embed-multilingual-v3.0 with support for over 100 languages.
Here are some code tutorials for the Embed endpoint:
- Wikipedia Semantic Search with Cohere Embedding Archives
- RAG With Chat Embed and Rerank via Pinecone
==================================================
CITATIONS:
Start: 19| End:43| Text:'returns text embeddings.'
Sources:
1. search_developer_docs_n6n5ka20389e:1
Start: 47| End:162| Text:'embedding is a list of floating point numbers that captures semantic information about the text that it represents.'
Sources:
1. search_developer_docs_n6n5ka20389e:1
Start: 179| End:197| Text:'embed-english-v3.0'
Sources:
1. search_developer_docs_n6n5ka20389e:2
Start: 210| End:303| Text:'best-in-class multilingual model embed-multilingual-v3.0 with support for over 100 languages.'
Sources:
1. search_developer_docs_n6n5ka20389e:2
Start: 360| End:416| Text:'Wikipedia Semantic Search with Cohere Embedding Archives'
Sources:
1. search_code_examples_680znd4ycmm3:1
Start: 419| End:462| Text:'RAG With Chat Embed and Rerank via Pinecone'
Sources:
1. search_code_examples_680znd4ycmm3:2

Query expansion in multi-turn conversations

A RAG chatbot needs to be able to infer the user’s intent for a given query, which is sometimes based on vague context.

This is especially important in multi-turn conversations, where the user’s intent may not be clear from a single query.

For example, in the first turn, a user might ask “What is A” and in the second, they might ask “Compare that with B and C”. So, the agent needs to be able to infer that the user’s intent is to compare A with B and C.

Let’s see an illustration of this. First, note that the run_agent function is already set up to handle multi-turn conversations. It can take messages from the previous conversation turns and append them to the messages list.

In the first turn, the user asks about the Chat endpoint, to which the agent duly responds.

PYTHON
1messages = run_agent("What is the Chat endpoint?")
QUESTION:
What is the Chat endpoint?
==================================================
TOOL PLAN:
I will search for 'What is the Chat endpoint?'
TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"What is the Chat endpoint?"}
==================================================
RESPONSE:
The Chat endpoint facilitates a conversational interface, allowing users to send messages to the model and receive text responses.
==================================================
CITATIONS:
Start: 18| End:130| Text:'facilitates a conversational interface, allowing users to send messages to the model and receive text responses.'
Sources:
1. search_developer_docs_91yqedvwtgkj:3

In the second turn, the user asks a question that has two parts: first, how it’s different from RAG, and then, for code examples.

We pass the messages from the previous conversation turn to the run_agent function.

Because of this, the agent is able to infer that the question is referring to the Chat endpoint even though the user didn’t explicitly mention it.

The agent then expands the query into two separate queries, one for the search_code_examples tool and one for the search_developer_docs tool.

PYTHON
1messages = run_agent(
2 "How is it different from RAG? Also any code tutorials?", messages
3)
QUESTION:
How is it different from RAG? Also any code tutorials?
==================================================
TOOL PLAN:
I will search for 'How is Chat endpoint different from RAG?' and 'Chat endpoint code tutorials'.
TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"How is Chat endpoint different from RAG?"}
Tool name: search_code_examples | Parameters: {"query":"Chat endpoint code tutorials"}
==================================================
RESPONSE:
The Chat endpoint facilitates a conversational interface, allowing users to send messages to the model and receive text responses. Retrieval Augmented Generation (RAG) is a method for generating text using additional information fetched from an external data source, which can greatly increase the accuracy of the response.
Here are some code tutorials:
- RAG with Chat Embed and Rerank via Pinecone
- Build chatbots that know your business with MongoDB and Cohere
==================================================
CITATIONS:
Start: 32| End:56| Text:'conversational interface'
Sources:
1. search_developer_docs_93b1gz27dq9d:3
Start: 76| End:102| Text:'send messages to the model'
Sources:
1. search_developer_docs_93b1gz27dq9d:3
Start: 107| End:130| Text:'receive text responses.'
Sources:
1. search_developer_docs_93b1gz27dq9d:3
Start: 131| End:161| Text:'Retrieval Augmented Generation'
Sources:
1. search_developer_docs_93b1gz27dq9d:4
Start: 162| End:167| Text:'(RAG)'
Sources:
1. search_developer_docs_93b1gz27dq9d:4
Start: 184| End:199| Text:'generating text'
Sources:
1. search_developer_docs_93b1gz27dq9d:4
Start: 206| End:265| Text:'additional information fetched from an external data source'
Sources:
1. search_developer_docs_93b1gz27dq9d:4
Start: 277| End:323| Text:'greatly increase the accuracy of the response.'
Sources:
1. search_developer_docs_93b1gz27dq9d:4
Start: 357| End:400| Text:'RAG with Chat Embed and Rerank via Pinecone'
Sources:
1. search_code_examples_qj3q45zxk8gz:2
Start: 403| End:465| Text:'Build chatbots that know your business with MongoDB and Cohere'
Sources:
1. search_code_examples_qj3q45zxk8gz:3

Summary

In this tutorial, we learned about:

  • How query expansion works in an agentic RAG system
  • How query expansion works over multiple data sources
  • How query expansion works in multi-turn conversations

Having said that, we may encounter even more complex queries—especially those that require sequential reasoning over multiple steps.

In Part 3, we’ll learn how the agentic RAG system can perform sequential reasoning.

Built with