Agentic Multi-Stage RAG with Cohere Tools API
Motivation
Retrieval-augmented generation (RAG) has been a go-to use case for enterprises adopting large language models (LLMs). Although it works well in general, there are edge cases where it fails. Most commonly, when the retrieved document mentions the query but actually refers to another document for the answer, the model will fail to generate the correct answer.
We propose an agentic RAG system that leverages tool use to keep retrieving documents if the correct ones were not found on the first try. This is ideal for use cases where accuracy is a top priority and latency is not. For example, lawyers searching their contracts for the most accurate answer are willing to wait a few extra seconds rather than get a wrong answer fast.
Objective
In this notebook, we will explore how to build a simple agentic RAG system using Cohere's native API. We have prepared a fake dataset to demonstrate the use case, and we ask two questions that require different depths of retrieval. We will compare how simple and agentic RAG answer each question.
Disclaimer
One of the challenges in building a RAG system is that it has many moving pieces: vector database, type of embedding model, use of a reranker, number of retrieved documents, chunking strategy, and more. These components can make debugging and evaluating RAG systems difficult. Since this notebook focuses on the concept of agentic RAG, it simplifies the other parts of the RAG system. For example, we retrieve only the top document, to demonstrate what happens when the retrieved document does not contain the answer needed.
Result
As you will see below, multi-stage retrieval is achieved by adding a new function, reference_extractor(), that extracts references to other documents from the retrieved documents, and by updating the instruction so that the agent continues to retrieve more documents.
Setup
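The original setup cell is not reproduced here. A minimal sketch, assuming the cohere Python SDK is installed and the API key is stored in a COHERE_API_KEY environment variable (both assumptions):

```python
import os

import cohere

# Initialize the Cohere client; the environment-variable name is an assumption.
co = cohere.Client(os.environ["COHERE_API_KEY"])
```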
Data
We leveraged data from the Washington Department of Transportation and modified it to fit the needs of this demo.
|   | title | body | combined | embeddings |
|---|---|---|---|---|
| 0 | Bicycle law | \n Traffic Infractions and fees - For a… | Title: Bicycle law\nBody: \n Traffic In… | [-0.024673462, -0.034729004, 0.0418396, 0.0121… |
| 1 | Bicycle helmet requirement | Currently, there is no state law requiring hel… | Title: Bicycle helmet requirement\nBody: Curre… | [-0.019180298, -0.037384033, 0.0027389526, -0… |
| 2 | Section 21a | helmet rules by location: These are city and c… | Title: Section 21a\nBody: helmet rules by loca… | [0.031097412, 0.0007619858, -0.023010254, -0.0… |
| 3 | Section 3b | Traffic infraction - A person operating a bicy… | Title: Section 3b\nBody: Traffic infraction - … | [0.015602112, -0.016143799, 0.032958984, 0.000… |
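The embeddings column above comes from embedding each row's combined text. A minimal sketch of how such a frame could be built is below; the document bodies are elided, and the embedding model name and input_type are assumptions:

```python
import pandas as pd

# Toy frame mirroring the table above; bodies elided for brevity.
df = pd.DataFrame(
    {
        "title": ["Bicycle law", "Bicycle helmet requirement", "Section 21a", "Section 3b"],
        "body": ["...", "...", "...", "..."],
    }
)
df["combined"] = "Title: " + df["title"] + "\nBody: " + df["body"]

# Embed the combined text of every document in one call.
embed_response = co.embed(
    texts=df["combined"].tolist(),
    model="embed-english-v3.0",  # assumed model
    input_type="search_document",
)
df["embeddings"] = embed_response.embeddings
```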
Tools
The following functions and tools will be used in the subsequent tasks.
RAG function
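A minimal sketch of the RAG function, consistent with the disclaimer above: top-1 retrieval by cosine similarity, followed by grounded generation. The function names and the command-r model choice are assumptions:

```python
import numpy as np

def retrieve(query: str, n_docs: int = 1) -> list[dict]:
    """Embed the query and return the top-n documents by cosine similarity."""
    query_emb = np.array(
        co.embed(
            texts=[query],
            model="embed-english-v3.0",
            input_type="search_query",
        ).embeddings[0]
    )
    doc_embs = np.array(df["embeddings"].tolist())
    scores = doc_embs @ query_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
    )
    top = np.argsort(scores)[::-1][:n_docs]
    return [{"title": df.iloc[i]["title"], "body": df.iloc[i]["body"]} for i in top]

def simple_rag(query: str) -> str:
    """Answer the query grounded on the single top retrieved document."""
    documents = retrieve(query, n_docs=1)
    response = co.chat(message=query, model="command-r", documents=documents)
    return response.text
```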
Agentic RAG - cohere_agent()
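cohere_agent() drives the tool-use loop: it sends the message and tool definitions to the Chat API, executes whatever tool calls come back, and feeds the results into the next turn until the model answers directly. A sketch is below; the exact signature in the notebook may differ:

```python
def cohere_agent(
    message: str,
    preamble: str,
    tools: list[dict],
    functions_map: dict,
) -> str:
    """Run the Cohere tool-use loop until the model stops calling tools."""
    response = co.chat(
        message=message, preamble=preamble, tools=tools, model="command-r"
    )
    while response.tool_calls:
        tool_results = []
        for call in response.tool_calls:
            # Execute the Python function the model asked for.
            output = functions_map[call.name](**call.parameters)
            outputs = output if isinstance(output, list) else [{"result": output}]
            tool_results.append({"call": call, "outputs": outputs})
        # Return the tool outputs to the model for the next step.
        response = co.chat(
            message="",
            preamble=preamble,
            tools=tools,
            tool_results=tool_results,
            chat_history=response.chat_history,
            model="command-r",
        )
    return response.text
```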
Question 1 - single-stage retrieval
Here we ask a question that can be answered easily with single-stage retrieval. Both regular and agentic RAG should be able to answer it easily. Below is a comparison of the responses.
Simple RAG
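The simple RAG path is a single call. The question wording here is illustrative, not the notebook's exact text:

```python
question_1 = "Is there a state law requiring bicycle helmets?"  # illustrative wording
print(simple_rag(question_1))
```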
Agentic RAG
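The agentic path exposes the same retrieval as a tool. The schema follows Cohere's tool-use format; the tool description and preamble wording are assumptions:

```python
tools = [
    {
        "name": "retrieve",
        "description": "Retrieves the most relevant documents for a search query.",
        "parameter_definitions": {
            "query": {
                "description": "The search query.",
                "type": "str",
                "required": True,
            }
        },
    }
]
functions_map = {"retrieve": retrieve}

preamble = "Answer the user's question using the retrieve tool."  # assumed wording
print(cohere_agent(question_1, preamble, tools, functions_map))
```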
Question 2 - double-stage retrieval
The second question requires double-stage retrieval, because the top-matched document references another document. You will see below that the agentic RAG is initially unable to produce the correct answer, but when given the proper tools and instructions, it finds the correct answer.
Simple RAG
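Same call as before, with the second question (illustrative wording):

```python
question_2 = "I live in Orting; do I need to wear a helmet when riding a bike?"  # illustrative
print(simple_rag(question_2))
```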
Agentic RAG
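And the agentic path, still with the single retrieve tool from before:

```python
print(cohere_agent(question_2, preamble, tools, functions_map))
```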
This produces the same quality of answer as the simple RAG.
Agentic RAG - New Tools
In order for the model to retrieve the correct documents, we do two things (a sketch of both follows this list):
- A new reference_extractor() function is added. Given the query and the retrieved documents, this function finds references to other documents.
- We update the instruction so that it directs the agent to keep retrieving relevant documents.
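A sketch of both changes: reference_extractor() prompts the model to pull out the titles of referenced documents, the tool list is extended, and the preamble now tells the agent to follow references before answering. The prompt, preamble wording, and parameter types are assumptions:

```python
def reference_extractor(query: str, documents: list[str]) -> str:
    """Ask the model which other documents the retrieved ones refer to."""
    prompt = f"""From the documents below, extract the titles of any other documents
they refer to that may contain the answer to the query. Return only the titles,
or an empty string if there are none.

Query: {query}

Documents:
{documents}"""
    return co.chat(message=prompt, model="command-r").text

# Register the new tool alongside retrieve.
tools.append(
    {
        "name": "reference_extractor",
        "description": "Extracts titles of other documents referenced by the given documents.",
        "parameter_definitions": {
            "query": {
                "description": "The user query.",
                "type": "str",
                "required": True,
            },
            "documents": {
                "description": "Bodies of the retrieved documents.",
                "type": "List[str]",
                "required": True,
            },
        },
    }
)
functions_map["reference_extractor"] = reference_extractor

# Updated instruction: keep retrieving until no new references turn up (assumed wording).
preamble_new = (
    "Answer the user's question using the retrieve tool. After each retrieval, "
    "call reference_extractor to check whether the documents mention other "
    "documents; if they do, retrieve those as well before giving a final answer."
)
print(cohere_agent(question_2, preamble_new, tools, functions_map))
```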