Agentic RAG for PDFs with mixed data
Motivation
Retrieval-augmented generation (RAG) allows language models to generate grounded answers to questions about documents. However, the complexity of the documents can significantly influence overall RAG performance. For instance, the documents may be PDFs that contain a mix of text and tables.
More broadly, the implementation of a RAG pipeline - including parsing and chunking of documents, along with the embedding and retrieval of the chunks - is critical to the accuracy of grounded answers. Additionally, it is sometimes not sufficient to merely retrieve the answers; a user may want further postprocessing performed on the output. This use case would benefit from giving the model access to tools.
Objective
In this notebook, we will guide you through best practices for setting up a RAG pipeline to process documents that contain both tables and text. We will also demonstrate how to create a ReAct agent with a Cohere model, and then give the agent access to a RAG pipeline tool to improve accuracy. The general structure of the notebook is as follows:
- individual components around parsing, retrieval and generation are covered for documents with mixed tabular and textual data
- a class object is created that can be used to instantiate the pipeline with parametric input
- the RAG pipeline is then used as a tool for a Cohere ReAct agent
Reference Documents
We recommend the following notebook as a guide to semi-structured RAG.
We also recommend the following notebook to explore various parsing techniques for PDFs.
Various LangChain-supported parsers can be found here.
Install Dependencies
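The exact dependency set is an assumption based on the libraries used below; pin versions as needed for your environment:

```python
# Assumed dependencies for this notebook; adjust to your environment.
%pip install cohere langchain langchain-cohere langchain-community \
    langchain-experimental "unstructured[pdf]" chromadb
```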
Parsing
To improve RAG performance on PDFs with mixed types (text and tables), we investigated a number of parsing and chunking strategies from various libraries:
- PyPDFLoader (LangChain)
- LlamaParse (LlamaIndex)
- Unstructured
We have found that the best option for parsing is unstructured.io since the parser can:
- separate tables from text
- automatically chunk the tables and text by title during the parsing step so that similar elements are grouped (see the sketch after this list)
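A minimal parsing sketch using unstructured's `partition_pdf` with the `by_title` chunking strategy; the file path and the chunk-size parameters here are illustrative, not recommendations:

```python
from unstructured.partition.pdf import partition_pdf

# Parse the PDF, keep table structure, and chunk elements by title so that
# text and tables under the same heading stay together.
elements = partition_pdf(
    filename="example.pdf",          # illustrative path
    strategy="hi_res",               # layout model needed to detect tables
    infer_table_structure=True,      # keep tables as structured elements
    chunking_strategy="by_title",    # group elements under their section title
    max_characters=4000,             # illustrative chunk-size limits
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
)

# Separate tables from narrative text for downstream summarization/embedding.
tables = [el for el in elements if el.category == "Table"]
texts = [el for el in elements if el.category != "Table"]
```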
Vector Store Setup
There are many options for setting up a vector store. Here, we show how to do so using Chroma and LangChain's multi-vector retrieval. As the name implies, multi-vector retrieval allows us to store multiple vectors per document; for instance, for a single document chunk, one could keep embeddings for both the chunk itself and a summary of that chunk. A summary may distill more accurately what a chunk is about, leading to better retrieval.
You can read more about this here: https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector/
Below, we demonstrate the following process:
- summaries of each chunk are embedded
- during inference, multi-vector retrieval matches the query against the summaries but returns the full document chunk associated with each matched summary (see the sketch after this list)
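A minimal sketch of this setup, assuming `chunks` and `summaries` lists produced by the parsing and summarization steps; the embedding model name is an assumption:

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_cohere import CohereEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

id_key = "doc_id"

# The vector store indexes the summaries; the docstore holds the full chunks.
vectorstore = Chroma(
    collection_name="summaries",
    embedding_function=CohereEmbeddings(model="embed-english-v3.0"),  # assumed model
)
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=InMemoryStore(),
    id_key=id_key,
)

# Embed each summary, keyed back to its full chunk via a shared doc_id.
doc_ids = [str(uuid.uuid4()) for _ in chunks]
summary_docs = [
    Document(page_content=summary, metadata={id_key: doc_ids[i]})
    for i, summary in enumerate(summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, chunks)))
```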
RAG Pipeline
With our database in place, we can run queries against it. The query process can be broken down into the following steps:
- augment the query: generating several candidate search queries helps retrieve all of the relevant information
- use each augmented query to retrieve the top-k documents, then rerank the combined results
- concatenate the shortlisted/reranked documents and pass them to the generation model (see the sketch after this list)
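A minimal sketch of the query path, assuming the `retriever` built in the previous section; the model names are assumptions and `co` is a `cohere.Client()`:

```python
import cohere

co = cohere.Client()  # reads the API key from the environment

query = "What was the total revenue in 2023?"  # illustrative query

# 1. Augment the query: ask the model for candidate search queries.
response = co.chat(message=query, search_queries_only=True)
queries = [q.text for q in response.search_queries] or [query]

# 2. Retrieve top-k documents for each augmented query, then rerank.
docs = []
for q in queries:
    docs.extend(retriever.get_relevant_documents(q))
texts = list({d.page_content for d in docs})  # de-duplicate

reranked = co.rerank(
    query=query,
    documents=texts,
    top_n=3,
    model="rerank-english-v3.0",  # assumed rerank model
)
top_docs = [texts[r.index] for r in reranked.results]

# 3. Pass the shortlisted documents to the generation model.
answer = co.chat(
    message=query,
    documents=[{"snippet": d} for d in top_docs],
    model="command-r",  # assumed generation model
)
print(answer.text)
```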
Example
We can now test out a query. In this example, the final answer can be found on page 12 of the PDF, which aligns with the response provided by the model:
Chat History Management
In the example below, we ask a follow-up question that relies on the chat history but does not require a rerun of the RAG pipeline.
We detect questions that do not require RAG by examining the `search_queries` object returned by calling `co.chat` to generate candidate queries for answering our question. If this object is empty, then the model has determined that a document query is not needed to answer the question. In the example below, the `else` statement is invoked based on `query2`. We still pass in the chat history, allowing the question to be answered with only the prior context.
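A minimal sketch of this check, reusing `query` and `answer` from the previous section; the follow-up question is illustrative:

```python
# Earlier turns, in the shape the Cohere chat API expects.
chat_history = [
    {"role": "USER", "message": query},
    {"role": "CHATBOT", "message": answer.text},
]

query2 = "And what about the year before that?"  # illustrative follow-up

# Ask only for candidate search queries, without generating an answer.
check = co.chat(message=query2, search_queries_only=True)

if check.search_queries:
    # The model wants fresh documents: rerun the retrieval, rerank, and
    # generation steps from the previous section with query2.
    ...
else:
    # No retrieval needed: answer from the prior context alone.
    response = co.chat(message=query2, chat_history=chat_history)
    print(response.text)
```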
RAG Pipeline Class
Here, we connect all of the pieces discussed above into one class object, which is then used as a tool for a Cohere ReAct agent. This class definition consolidates and clarifies the key parameters used to define the RAG pipeline.
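A skeleton of what such a class might look like; the class name, method names, and parameters are illustrative, and each method corresponds to a step shown earlier:

```python
import cohere


class RAGPipeline:
    """Illustrative end-to-end pipeline: parse -> index -> retrieve -> generate."""

    def __init__(
        self,
        files,                                  # paths to the PDFs to index
        top_n=3,                                # documents kept after reranking
        generation_model="command-r",           # assumed model names
        rerank_model="rerank-english-v3.0",
    ):
        self.co = cohere.Client()
        self.top_n = top_n
        self.generation_model = generation_model
        self.rerank_model = rerank_model
        self.retriever = self.build_retriever(files)

    def build_retriever(self, files):
        # Parse and chunk with unstructured, then build the multi-vector
        # retriever over chunk summaries, as in the sections above.
        ...

    def query(self, question, chat_history=None):
        # Augment the query, retrieve and rerank, then generate a grounded
        # answer; fall back to the chat history when no search queries are
        # returned.
        ...
```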
Cohere ReAct Agent with RAG Tool
Finally, we build a simple agent that utilizes the RAG pipeline defined above. We do this by granting the agent access to two tools:
- the end-to-end RAG pipeline
- a Python interpreter
The intention behind coupling these tools is to enable the model to perform mathematical and other postprocessing operations on RAG outputs using Python, as in the sketch below.
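A minimal sketch of the agent wiring, assuming the `RAGPipeline` class sketched above; the tool names, descriptions, file path, and model name are assumptions:

```python
from langchain.agents import AgentExecutor, Tool
from langchain_cohere import ChatCohere, create_cohere_react_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_experimental.utilities import PythonREPL

rag = RAGPipeline(files=["example.pdf"])  # illustrative path

tools = [
    Tool(
        name="rag_pipeline",
        description="Answers questions grounded in the PDF's text and tables.",
        func=lambda q: rag.query(q),
    ),
    Tool(
        name="python_interpreter",
        description="Executes Python code, e.g. arithmetic on retrieved values.",
        func=PythonREPL().run,
    ),
]

llm = ChatCohere(model="command-r-plus")  # assumed model
prompt = ChatPromptTemplate.from_template("{input}")

agent = create_cohere_react_agent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke(
    {"input": "Sum the revenue figures from the table and report the total."}
)
```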
Just like earlier, we can also pass chat history to the LangChain agent so that follow-up queries can draw on the prior context.
Conclusion
As you can see, the RAG pipeline can be used as a tool for a Cohere ReAct agent. This gives the agent access to the RAG pipeline for document retrieval and generation, as well as a Python interpreter for mathematical and other postprocessing operations. This setup can be used to improve the accuracy of grounded answers to questions about documents that contain both tables and text.