Routing Queries to Data Sources
Imagine a RAG system that can search over diverse sources, such as a website, a database, and a set of documents.
In a standard RAG setting, the application would aggregate retrieved documents from all the different sources it is connected to, which may contribute noise from less relevant documents.
Additionally, it doesn’t take into consideration that the a given data source might be less or more relevant to a query than others.
An agentic RAG system can solve this problem by routing queries to the most relevant tools based on the query’s nature. This is done by leveraging the tool use capabilities of the Chat endpoint.
In this tutorial, we’ll cover:
- Setting up the tools
- Running an agentic RAG workflow
- Routing queries to tools
We’ll build an agent that can answer questions about using Cohere, equipped with a number of different tools.
Setup
To get started, first we need to install the cohere
library and create a Cohere client.
We also need to import the tool definitions from the tool_def.py
file.
tool_def.py
file in the same directory as this notebook for the imports to work correctly. Setting up the tools
In an agentic RAG system, each data source is represented as a tool. A tool is broadly any function or service that can receive and send objects to the model. But in the case of RAG, this becomes a more specific case of a tool that takes a query as input and returns a set of documents.
Here, we are defining a Python function for each tool, but more broadly, the tool can be any function or service that can receive and send objects. Here are some specifics:
search_developer_docs
: Searches Cohere developer documentation. In this tutorial, we are creating a small list of sample documents for simplicity, and will return the same list for every query. In practice, you will want to implement a search function, probably leveraging semantic search.search_internet
: Performs an internet search using Tavily search, which we take from LangChain’s implementation.search_code_examples
: Searches for Cohere code examples and tutorials. Here we are also creating a small list of sample documents for simplicity.
These functions are mapped to a dictionary called functions_map
for easy access.
Check out this documentation on parameter types in tool use for further reading.
The second and final setup step is to define the tool schemas in a format that can be passed to the Chat endpoint. A tool schema must contain the name
, description
, and parameters
fields, in the format shown below.
This schema informs the LLM about what the tool does, which enables an LLM to decide whether to use a particular tool. Therefore, the more descriptive and specific the schema, the more likely the LLM will make the right tool call decisions.
Running an agentic RAG workflow
We can now run an agentic RAG workflow using tools. We can think of the system as consisting of four components:
- The user
- The application
- The LLM
- The tools
At its most basic, these four components interact in a workflow through four steps:
- Step 1: Get user message – The LLM gets the user message (via the application)
- Step 2: Tool planning and calling – The LLM makes a decision on the tools to call (if any) and generates - the tool calls
- Step 3: Tool execution - The application executes the tools and the sends the results to the LLM
- Step 4: Response and citation generation – The LLM generates the response and citations to back to the user
We wrap all these steps in a function called run_agent
.
Routing queries to tools
Let’s ask the agent a few questions, starting with one about the Embed endpoint.
Because this concerns a specific feature, the agent decides to use the search_developer_docs
tool (instead of retrieving from all the data sources it’s connected to).
It first generates a tool plan that describes how it will handle the query. Then, it generates a call to the search_developer_docs
tool with the associated query
parameter.
The tool does indeed contain the information asked by the user, which the agent then uses to generate its response.
Let’s now ask the agent a question about the authors of the sentence BERT paper. This information is not likely to be found in the developer documentation or code examples because it is not Cohere-specific, so we can expect the agent to use the internet search tool.
And this is exactly what the agent does. This time, it decides to use the search_internet
tool, triggers the search through Tavily search, and uses the results to generate its response.
Let’s ask the agent a final question, this time about tutorials that are relevant for enterprises.
Again, the agent uses the context of the query to decide on the most relevant tool. In this case, it selects the search_code_examples
tool and provides a response based on the information found.
Summary
In this tutorial, we learned about:
- How to set up tools in an agentic RAG system
- How to run an agentic RAG workflow
- How to automatically route queries to the most relevant data sources
However, so far we have only seen rather simple queries. In practice, we may run into a complex query that needs to simplified, optimized, or split (etc.) before we can perform the retrieval.
In Part 2, we’ll learn how to build an agentic RAG system that can expand user queries into parallel queries.