Routing Queries to Data Sources

Imagine a RAG system that can search over diverse sources, such as a website, a database, and a set of documents.

In a standard RAG setting, the application would aggregate retrieved documents from all the different sources it is connected to, which may contribute noise from less relevant documents.

Additionally, it doesn’t take into consideration that a given data source might be more or less relevant to a query than others.

An agentic RAG system can solve this problem by routing queries to the most relevant tools based on the query’s nature. This is done by leveraging the tool use capabilities of the Chat endpoint.

In this tutorial, we’ll cover:

  • Setting up the tools
  • Running an agentic RAG workflow
  • Routing queries to tools

We’ll build an agent that can answer questions about using Cohere, equipped with a number of different tools.

Setup

To get started, first we need to install the cohere library and create a Cohere client.

We also need to import the tool definitions from the tool_def.py file.

Important: the source code for tool definitions can be found here. Make sure to have the tool_def.py file in the same directory as this notebook for the imports to work correctly.
PYTHON
import json
import os
import cohere

from tool_def import (
    search_developer_docs,
    search_developer_docs_tool,
    search_internet,
    search_internet_tool,
    search_code_examples,
    search_code_examples_tool,
)

# Get your free API key: https://dashboard.cohere.com/api-keys
co = cohere.ClientV2("COHERE_API_KEY")
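
If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: `load_api_key` is a hypothetical helper, and the variable name `COHERE_API_KEY` is an assumption rather than something this tutorial requires.

```python
import os


def load_api_key(var_name: str = "COHERE_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage (requires the cohere package and a valid key):
# co = cohere.ClientV2(load_api_key())
```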

Setting up the tools

In an agentic RAG system, each data source is represented as a tool. Broadly, a tool is any function or service that can receive objects from the model and send objects back to it. In RAG, a tool is more specific: it takes a query as input and returns a set of documents.

Here, we define a Python function for each tool:

  • search_developer_docs: Searches Cohere developer documentation. In this tutorial, we are creating a small list of sample documents for simplicity, and will return the same list for every query. In practice, you will want to implement a search function, probably leveraging semantic search.
  • search_internet: Performs an internet search using Tavily search, which we take from LangChain’s implementation.
  • search_code_examples: Searches for Cohere code examples and tutorials. Here we are also creating a small list of sample documents for simplicity.

These functions are mapped to a dictionary called functions_map for easy access.

Check out this documentation on parameter types in tool use for further reading.

PYTHON
from langchain_community.tools.tavily_search import (
    TavilySearchResults,
)


def search_developer_docs(query: str) -> list[dict]:
    # Sample documents for simplicity; the same list is returned for every query
    developer_docs = [
        {
            "text": "## The Rerank endpoint\nThis endpoint takes in a query and a list of texts and produces an ordered array with each text assigned a relevance score."
        },
        {
            "text": "## The Embed endpoint\nThis endpoint returns text embeddings. An embedding is a list of floating point numbers that captures semantic information about the text that it represents."
        },
        {
            "text": "## Embed endpoint multilingual support\nIn addition to embed-english-v3.0 we offer a best-in-class multilingual model embed-multilingual-v3.0 with support for over 100 languages."
        },
        {
            "text": "## The Chat endpoint\nThis endpoint facilitates a conversational interface, allowing users to send messages to the model and receive text responses."
        },
        {
            "text": "## Retrieval Augmented Generation (RAG)\nRAG is a method for generating text using additional information fetched from an external data source, which can greatly increase the accuracy of the response."
        },
        {
            "text": "## The temperature parameter\nTemperature is a number used to tune the degree of randomness of a generated text."
        },
    ]

    return developer_docs


def search_internet(query: str) -> list[dict]:
    # Internet search through Tavily, using LangChain's implementation
    tool = TavilySearchResults(
        max_results=5,
        search_depth="advanced",
        include_answer=True,
        include_raw_content=True,
    )
    documents = tool.invoke({"query": query})

    return documents


def search_code_examples(query: str) -> list[dict]:
    # Sample documents for simplicity; the same list is returned for every query
    code_examples = [
        {"content": "Calendar Agent with Native Multi Step Tool"},
        {
            "content": "Wikipedia Semantic Search with Cohere Embedding Archives"
        },
        {"content": "RAG With Chat Embed and Rerank via Pinecone"},
        {
            "content": "Build Chatbots That Know Your Business with MongoDB and Cohere"
        },
        {"content": "Advanced Document Parsing For Enterprises"},
    ]

    return code_examples


# Map tool names to the functions that implement them
functions_map = {
    "search_developer_docs": search_developer_docs,
    "search_internet": search_internet,
    "search_code_examples": search_code_examples,
}
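
The sample search_developer_docs function returns the same list regardless of the query. As a rough illustration of what a query-dependent version could look like, here is a minimal sketch that ranks documents by word overlap with the query. This is plain lexical matching, not semantic search, and `tokenize`, `rank_documents`, and the shortened `sample_docs` are hypothetical names introduced only for this example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_documents(query: str, documents: list[dict], top_k: int = 2) -> list[dict]:
    """Rank documents by the number of words they share with the query."""
    query_terms = tokenize(query)
    scored = [
        (len(query_terms & tokenize(doc["text"])), doc) for doc in documents
    ]
    # Keep only documents with at least one shared word, best matches first
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Shortened versions of the sample documents, for illustration only
sample_docs = [
    {"text": "## The Rerank endpoint\nThis endpoint takes in a query and a list of texts."},
    {"text": "## Embed endpoint multilingual support\nembed-multilingual-v3.0 supports over 100 languages."},
    {"text": "## The temperature parameter\nTemperature tunes the randomness of generated text."},
]

results = rank_documents(
    "How many languages does Embed support?", sample_docs, top_k=1
)
print(results[0]["text"].splitlines()[0])  # → ## Embed endpoint multilingual support
```

In a real system, the overlap score would typically be replaced by embedding similarity or a reranking step, while the function's shape (query in, list of documents out) stays the same.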

The second and final setup step is to define the tool schemas in a format that can be passed to the Chat endpoint. A tool schema must contain the name, description, and parameters fields, in the format shown below.

This schema tells the LLM what the tool does, which enables it to decide whether to use a particular tool. Therefore, the more descriptive and specific the schema, the more likely the LLM is to make the right tool call decisions.

PYTHON
search_developer_docs_tool = {
    "type": "function",
    "function": {
        "name": "search_developer_docs",
        "description": "Searches the Cohere developer documentation. Use this tool for queries related to the Cohere API, SDKs, or other developer resources.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query.",
                }
            },
            "required": ["query"],
        },
    },
}

search_internet_tool = {
    "type": "function",
    "function": {
        "name": "search_internet",
        "description": "Searches the internet. Use this tool for general queries that would not be found in the developer documentation.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query.",
                }
            },
            "required": ["query"],
        },
    },
}

search_code_examples_tool = {
    "type": "function",
    "function": {
        "name": "search_code_examples",
        "description": "Searches code examples and tutorials on using Cohere.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query.",
                }
            },
            "required": ["query"],
        },
    },
}
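
Because a missing field in a schema can be easy to overlook, it may help to sanity-check each schema before passing it to the Chat endpoint. The sketch below checks only the fields named in this tutorial (name, description, parameters) plus the consistency of the required list. `check_tool_schema` and `example_tool` are hypothetical and not part of the Cohere SDK.

```python
def check_tool_schema(tool: dict) -> list[str]:
    """Return a list of problems found in a tool schema (empty if none)."""
    problems = []
    if tool.get("type") != "function":
        problems.append('"type" should be "function"')

    function = tool.get("function", {})
    for field in ("name", "description", "parameters"):
        if field not in function:
            problems.append(f'missing "function.{field}"')

    # Every required parameter should also be described under "properties"
    parameters = function.get("parameters", {})
    properties = parameters.get("properties", {})
    for name in parameters.get("required", []):
        if name not in properties:
            problems.append(f'required parameter "{name}" is not defined')
    return problems


# A schema in the same shape as the ones above (illustrative only)
example_tool = {
    "type": "function",
    "function": {
        "name": "search_developer_docs",
        "description": "Searches the Cohere developer documentation.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."}
            },
            "required": ["query"],
        },
    },
}

print(check_tool_schema(example_tool))  # → []
```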

Running an agentic RAG workflow

We can now run an agentic RAG workflow using tools. We can think of the system as consisting of four components:

  • The user
  • The application
  • The LLM
  • The tools

At its most basic, these four components interact in a workflow through four steps:

  • Step 1: Get user message – The LLM gets the user message (via the application)
  • Step 2: Tool planning and calling – The LLM decides which tools to call (if any) and generates the tool calls
  • Step 3: Tool execution – The application executes the tools and sends the results to the LLM
  • Step 4: Response and citation generation – The LLM generates the response and citations, which are sent back to the user

We wrap all these steps in a function called run_agent.

PYTHON
tools = [
    search_developer_docs_tool,
    search_internet_tool,
    search_code_examples_tool,
]
PYTHON
system_message = """## Task and Context
You are an assistant who helps developers use Cohere. You are equipped with a number of tools that can provide different types of information. If you can't find the information you need from one tool, you should try other tools if there is a possibility that they could provide the information you need."""
PYTHON
model = "command-r-plus-08-2024"


def run_agent(query, messages=None):
    if messages is None:
        messages = []

    if "system" not in {m.get("role") for m in messages}:
        messages.append({"role": "system", "content": system_message})

    # Step 1: get user message
    print(f"QUESTION:\n{query}")
    print("=" * 50)

    messages.append({"role": "user", "content": query})

    # Step 2: Generate tool calls (if any)
    response = co.chat(
        model=model, messages=messages, tools=tools, temperature=0.1
    )

    while response.message.tool_calls:

        print("TOOL PLAN:")
        print(response.message.tool_plan, "\n")
        print("TOOL CALLS:")
        for tc in response.message.tool_calls:
            print(
                f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}"
            )
        print("=" * 50)

        messages.append(
            {
                "role": "assistant",
                "tool_calls": response.message.tool_calls,
                "tool_plan": response.message.tool_plan,
            }
        )

        # Step 3: Get tool results
        for tc in response.message.tool_calls:
            tool_result = functions_map[tc.function.name](
                **json.loads(tc.function.arguments)
            )
            tool_content = []
            for data in tool_result:
                tool_content.append(
                    {
                        "type": "document",
                        "document": {"data": json.dumps(data)},
                    }
                )
                # Optional: add an "id" field in the "document" object, otherwise IDs are auto-generated
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": tool_content,
                }
            )

        # Step 4: Generate response and citations
        response = co.chat(
            model=model,
            messages=messages,
            tools=tools,
            temperature=0.1,
        )

    messages.append(
        {
            "role": "assistant",
            "content": response.message.content[0].text,
        }
    )

    # Print final response
    print("RESPONSE:")
    print(response.message.content[0].text)
    print("=" * 50)

    # Print citations (if any)
    verbose_source = (
        False  # Change to True to display the contents of a source
    )
    if response.message.citations:
        print("CITATIONS:\n")
        for citation in response.message.citations:
            print(
                f"Start: {citation.start}| End:{citation.end}| Text:'{citation.text}' "
            )
            print("Sources:")
            for idx, source in enumerate(citation.sources):
                print(f"{idx+1}. {source.id}")
                if verbose_source:
                    print(f"{source.tool_output}")
            print("\n")

    return messages
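
To see Step 3 in isolation, without calling the API, the sketch below runs the same tool-execution logic on a hand-built tool call. The names `fake_search`, `demo_functions_map`, `build_tool_messages`, and `fake_call` are hypothetical; `fake_call` only mimics the attributes that run_agent reads (`id`, `function.name`, `function.arguments`) and is not a real SDK object.

```python
import json
from types import SimpleNamespace


def fake_search(query: str) -> list[dict]:
    """Stand-in tool that returns two documents for any query."""
    return [
        {"text": f"First result for: {query}"},
        {"text": f"Second result for: {query}"},
    ]


demo_functions_map = {"fake_search": fake_search}


def build_tool_messages(tool_calls, functions_map):
    """Run each tool call and wrap its results as a "tool" message."""
    tool_messages = []
    for tc in tool_calls:
        # The arguments arrive as a JSON string, so parse them first
        arguments = json.loads(tc.function.arguments)
        results = functions_map[tc.function.name](**arguments)
        content = [
            {"type": "document", "document": {"data": json.dumps(doc)}}
            for doc in results
        ]
        tool_messages.append(
            {"role": "tool", "tool_call_id": tc.id, "content": content}
        )
    return tool_messages


# Hand-built object with the attributes that run_agent reads from a tool call
fake_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(
        name="fake_search", arguments='{"query": "embed languages"}'
    ),
)

tool_messages = build_tool_messages([fake_call], demo_functions_map)
print(json.dumps(tool_messages, indent=2))
```

Each tool call produces exactly one "tool" message, linked to the call through tool_call_id, with one document entry per returned result.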

Routing queries to tools

Let’s ask the agent a few questions, starting with one about the Embed endpoint.

Because this concerns a specific feature, the agent decides to use the search_developer_docs tool (instead of retrieving from all the data sources it’s connected to).

It first generates a tool plan that describes how it will handle the query. Then, it generates a call to the search_developer_docs tool with the associated query parameter.

The tool's results do contain the information the user asked for, and the agent uses them to generate its response.

PYTHON
messages = run_agent("How many languages does Embed support?")
QUESTION:
How many languages does Embed support?
==================================================
TOOL PLAN:
I will search for 'How many languages does Embed support?'
TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"How many languages does Embed support?"}
==================================================
RESPONSE:
The Embed endpoint supports over 100 languages.
==================================================
CITATIONS:
Start: 28| End:47| Text:'over 100 languages.'
Sources:
1. search_developer_docs_1s5qxhyswydy:2
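
The Start and End values in a citation are character offsets into the response text. The sketch below reproduces the citation from the output above using plain Python data. The dictionary layout and the `split_source_id` helper are hypothetical, and reading the number after the colon as a document position is inferred from this output rather than taken from a documented guarantee.

```python
response_text = "The Embed endpoint supports over 100 languages."

# Values copied from the printed citation above
citation = {
    "start": 28,
    "end": 47,
    "text": "over 100 languages.",
    "source_id": "search_developer_docs_1s5qxhyswydy:2",
}

# Slicing the response with the offsets recovers the cited text
cited_span = response_text[citation["start"]:citation["end"]]
print(cited_span)  # → over 100 languages.


def split_source_id(source_id: str) -> tuple[str, int]:
    """Split an ID such as 'name_xyz:2' into its prefix and trailing number."""
    prefix, _, index = source_id.rpartition(":")
    return prefix, int(index)


print(split_source_id(citation["source_id"]))
# → ('search_developer_docs_1s5qxhyswydy', 2)
```

Here the trailing 2 lines up with the third sample document (counting from zero), which is the one describing multilingual support.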

Let’s now ask the agent a question about the authors of the sentence BERT paper. This information is not likely to be found in the developer documentation or code examples because it is not Cohere-specific, so we can expect the agent to use the internet search tool.

And this is exactly what the agent does. This time, it decides to use the search_internet tool, triggers the search through Tavily search, and uses the results to generate its response.

PYTHON
messages = run_agent(
    "Who are the authors of the sentence BERT paper?"
)
QUESTION:
Who are the authors of the sentence BERT paper?
==================================================
TOOL PLAN:
I will search for the authors of the sentence BERT paper.
TOOL CALLS:
Tool name: search_internet | Parameters: {"query":"authors of the sentence BERT paper"}
==================================================
RESPONSE:
Nils Reimers and Iryna Gurevych are the authors of the sentence BERT paper.
==================================================
CITATIONS:
Start: 0| End:4| Text:'Nils'
Sources:
1. search_internet_5am6cjesgdry:1
Start: 5| End:12| Text:'Reimers'
Sources:
1. search_internet_5am6cjesgdry:0
2. search_internet_5am6cjesgdry:1
3. search_internet_5am6cjesgdry:3
Start: 17| End:22| Text:'Iryna'
Sources:
1. search_internet_5am6cjesgdry:1
Start: 23| End:31| Text:'Gurevych'
Sources:
1. search_internet_5am6cjesgdry:0
2. search_internet_5am6cjesgdry:1
3. search_internet_5am6cjesgdry:3

Let’s ask the agent a final question, this time about tutorials that are relevant for enterprises.

Again, the agent uses the context of the query to decide on the most relevant tool. In this case, it selects the search_code_examples tool and provides a response based on the information found.

PYTHON
messages = run_agent(
    "Any tutorials that are relevant for enterprises?"
)
QUESTION:
Any tutorials that are relevant for enterprises?
==================================================
TOOL PLAN:
I will search for 'tutorials for enterprises'.
TOOL CALLS:
Tool name: search_code_examples | Parameters: {"query":"tutorials for enterprises"}
==================================================
RESPONSE:
I found one tutorial that is relevant for enterprises: Advanced Document Parsing For Enterprises.
==================================================
CITATIONS:
Start: 55| End:97| Text:'Advanced Document Parsing For Enterprises.'
Sources:
1. search_code_examples_zkx3c2z7gzrs:4
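
To check routing decisions programmatically rather than by reading the printed output, one option is to inspect the message history that run_agent returns. The sketch below counts tool calls per tool name. `count_tool_usage` and the hand-built `demo_history` are hypothetical; the history only mimics the shape of the assistant messages that run_agent appends.

```python
from collections import Counter
from types import SimpleNamespace


def count_tool_usage(messages: list[dict]) -> Counter:
    """Count how often each tool was called in a message history."""
    counts = Counter()
    for message in messages:
        if message.get("role") != "assistant":
            continue
        # Assistant messages without tool calls (final answers) are skipped
        for tc in message.get("tool_calls") or []:
            counts[tc.function.name] += 1
    return counts


def make_call(name: str) -> SimpleNamespace:
    """Build a stand-in for a tool call object with a function name."""
    return SimpleNamespace(function=SimpleNamespace(name=name))


# Hand-built history in the shape produced by run_agent (illustrative only)
demo_history = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "How many languages does Embed support?"},
    {"role": "assistant", "tool_calls": [make_call("search_developer_docs")], "tool_plan": "..."},
    {"role": "assistant", "content": "The Embed endpoint supports over 100 languages."},
    {"role": "user", "content": "Who are the authors of the sentence BERT paper?"},
    {"role": "assistant", "tool_calls": [make_call("search_internet")], "tool_plan": "..."},
]

usage = count_tool_usage(demo_history)
print(dict(usage))  # → {'search_developer_docs': 1, 'search_internet': 1}
```

Note that each run_agent call in this tutorial starts a fresh history, so counting across questions would require either passing the returned messages back in or combining the returned lists.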

Summary

In this tutorial, we learned about:

  • How to set up tools in an agentic RAG system
  • How to run an agentic RAG workflow
  • How to automatically route queries to the most relevant data sources

So far, however, we have only seen fairly simple queries. In practice, we may run into a complex query that needs to be simplified, optimized, split, or otherwise transformed before we can perform the retrieval.

In Part 2, we’ll learn how to build an agentic RAG system that can expand user queries into parallel queries.
