Migrating away from deprecated create_csv_agent in langchain-cohere

How to build a CSV Agent without using deprecated create_csv_agent - langchain-cohere

Starting from version 0.3.5 of the langchain-cohere package, the create_csv_agent abstraction has been deprecated. In this guide, we will demonstrate how to build a custom CSV agent without it.

!pip install langchain langchain-core langchain-experimental langchain-cohere pandas -qq
1# Import packages
2from datetime import datetime
3from io import IOBase
4from typing import List, Optional, Union
5import pandas as pd
6from pydantic import BaseModel, Field
7
8from langchain.agents import AgentExecutor, create_tool_calling_agent
9from langchain_core.language_models import BaseLanguageModel
10from langchain_core.messages import (
11 BaseMessage,
12 HumanMessage,
13 SystemMessage,
14)
15from langchain_core.prompts import (
16 ChatPromptTemplate,
17 MessagesPlaceholder,
18)
19
20from langchain_core.tools import Tool, BaseTool
21from langchain_core.prompts.chat import (
22 BaseMessagePromptTemplate,
23 HumanMessagePromptTemplate,
24)
25
26from langchain_experimental.tools.python.tool import PythonAstREPLTool
27from langchain_cohere.chat_models import ChatCohere
1# Replace this cell with your actual cohere api key
2os.env["COHERE_API_KEY"] = "cohere_api_key"
1# Define prompts that we want to use in the csv agent
2FUNCTIONS_WITH_DF = """
3This is the result of `print(df.head())`:
4{df_head}
5
6Do note that the above df isn't the complete df. It is only the first {number_of_head_rows} rows of the df.
7Use this as a sample to understand the structure of the df. However, donot use this to make any calculations directly!
8
9The complete path for the csv files are for the corresponding dataframe is:
10{csv_path}
11""" # noqa E501
12
13FUNCTIONS_WITH_MULTI_DF = """
14This is the result of `print(df.head())` for each dataframe:
15{dfs_head}
16
17Do note that the above dfs aren't the complete df. It is only the first {number_of_head_rows} rows of the df.
18Use this as a sample to understand the structure of the df. However, donot use this to make any calculations directly!
19
20The complete path for the csv files are for the corresponding dataframes are:
21{csv_paths}
22""" # noqa E501
23
24PREFIX_FUNCTIONS = """
25You are working with a pandas dataframe in Python. The name of the dataframe is `df`.""" # noqa E501
26
27MULTI_DF_PREFIX_FUNCTIONS = """
28You are working with {num_dfs} pandas dataframes in Python named df1, df2, etc.""" # noqa E501
29
30CSV_PREAMBLE = """## Task And Context
31You use your advanced complex reasoning capabilities to help people by answering their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You may need to use multiple tools in parallel or sequentially to complete your task. You should focus on serving the user's needs as best you can, which will be wide-ranging. The current date is {current_date}
32
33## Style Guide
34Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling
35""" # noqa E501

Define tools necessary for the agent

The below cell introduces a suite of tools that are provided to the csv agent. These tools allow the agent to fcilitate meaningful interactions with uploaded files and providing Python code execution functionality. The toolkit comprises three main components:

File Peek Tool: Offers a convenient way to inspect a CSV file by providing a quick preview of the first few rows in a Markdown format, making it easy to get a glimpse of the data.

File Read Tool: Allows for a comprehensive exploration of the CSV file by reading and presenting its full contents in a user-friendly Markdown format.

Python Interpreter Tool: Enables secure execution of Python code within a sandboxed environment, providing users with the output of the code execution.

1# Define tools that we want the csv agent to have access to
2
3
4def get_file_peek_tool() -> Tool:
5 def file_peek(filename: str, num_rows: int = 5) -> str:
6 """Returns the first textual contents of an uploaded file
7
8 Args:
9 table_path: the table path
10 num_rows: the number of rows of the table to preview.
11 """ # noqa E501
12 if ".csv" in filename:
13 return pd.read_csv(filename).head(num_rows).to_markdown()
14 else:
15 return "the table_path was not recognised"
16
17 class file_peek_inputs(BaseModel):
18 filename: str = Field(
19 description="The name of the attached file to show a peek preview."
20 )
21
22 file_peek_tool = Tool(
23 name="file_peek",
24 description="The name of the attached file to show a peek preview.", # noqa E501
25 func=file_peek,
26 args_schema=file_peek_inputs,
27 )
28
29 return file_peek_tool
30
31
32def get_file_read_tool() -> Tool:
33 def file_read(filename: str) -> str:
34 """Returns the textual contents of an uploaded file, broken up in text chunks
35
36 Args:
37 filename (str): The name of the attached file to read.
38 """ # noqa E501
39 if ".csv" in filename:
40 return pd.read_csv(filename).to_markdown()
41 else:
42 return "the table_path was not recognised"
43
44 class file_read_inputs(BaseModel):
45 filename: str = Field(
46 description="The name of the attached file to read."
47 )
48
49 file_read_tool = Tool(
50 name="file_read",
51 description="Returns the textual contents of an uploaded file, broken up in text chunks", # noqa E501
52 func=file_read,
53 args_schema=file_read_inputs,
54 )
55
56 return file_read_tool
57
58
59def get_python_tool() -> Tool:
60 """Returns a tool that will execute python code and return the output."""
61
62 def python_interpreter(code: str) -> str:
63 """A function that will return the output of the python code.
64
65 Args:
66 code: the python code to run.
67 """
68 return python_repl.run(code)
69
70 python_repl = PythonAstREPLTool()
71 python_tool = Tool(
72 name="python_interpreter",
73 description="Executes python code and returns the result. The code runs in a static sandbox without interactive mode, so print output or save output to a file.", # noqa E501
74 func=python_interpreter,
75 )
76
77 class PythonToolInput(BaseModel):
78 code: str = Field(description="Python code to execute.")
79
80 python_tool.args_schema = PythonToolInput
81 return python_tool

Create helper functions

In the cell below, we will create some important helper functions that we can call to properly assemble the full prompt that the csv agent can utilize.

1def create_prompt(
2 system_message: Optional[BaseMessage] = SystemMessage(
3 content="You are a helpful AI assistant."
4 ),
5 extra_prompt_messages: Optional[
6 List[BaseMessagePromptTemplate]
7 ] = None,
8) -> ChatPromptTemplate:
9 """Create prompt for this agent.
10
11 Args:
12 system_message: Message to use as the system message that will be the
13 first in the prompt.
14 extra_prompt_messages: Prompt messages that will be placed between the
15 system message and the new human input.
16
17 Returns:
18 A prompt template to pass into this agent.
19 """
20 _prompts = extra_prompt_messages or []
21 messages: List[Union[BaseMessagePromptTemplate, BaseMessage]]
22 if system_message:
23 messages = [system_message]
24 else:
25 messages = []
26
27 messages.extend(
28 [
29 *_prompts,
30 HumanMessagePromptTemplate.from_template("{input}"),
31 MessagesPlaceholder(variable_name="agent_scratchpad"),
32 ]
33 )
34 return ChatPromptTemplate(messages=messages)
35
36
37def _get_csv_head_str(path: str, number_of_head_rows: int) -> str:
38 with open(path, "r") as file:
39 lines = []
40 for _ in range(number_of_head_rows):
41 lines.append(file.readline().strip("\n"))
42 # validate that the head contents are well formatted csv
43
44 return " ".join(lines)
45
46
47def _get_prompt(
48 path: Union[str, List[str]], number_of_head_rows: int
49) -> ChatPromptTemplate:
50 if isinstance(path, str):
51 lines = _get_csv_head_str(path, number_of_head_rows)
52 prompt_message = f"The user uploaded the following attachments:\nFilename: {path}\nWord Count: {count_words_in_file(path)}\nPreview: {lines}" # noqa: E501
53
54 elif isinstance(path, list):
55 prompt_messages = []
56 for file_path in path:
57 lines = _get_csv_head_str(file_path, number_of_head_rows)
58 prompt_messages.append(
59 f"The user uploaded the following attachments:\nFilename: {file_path}\nWord Count: {count_words_in_file(file_path)}\nPreview: {lines}" # noqa: E501
60 )
61 prompt_message = " ".join(prompt_messages)
62
63 prompt = create_prompt(
64 system_message=HumanMessage(prompt_message)
65 )
66 return prompt
67
68
69def count_words_in_file(file_path: str) -> int:
70 try:
71 with open(file_path, "r") as file:
72 content = file.readlines()
73 words = [len(sentence.split()) for sentence in content]
74 return sum(words)
75 except FileNotFoundError:
76 print("File not found.")
77 return 0
78 except Exception as e:
79 print("An error occurred:", str(e))
80 return 0

Build the core csv agent abstraction

The cells below outline the assembly of the various components to build an agent abstraction tailored for intelligent CSV file interactions. We use langchain to provide the agent that has access to the tools declared above, along with additional capabilities to provide additional tools if needed, and put together an agentic abstraction using the prompts defined earlier to deliver an abstraction that can easily be called for natural language querying over csv files!

1# Build the agent abstraction itself
2def create_csv_agent(
3 llm: BaseLanguageModel,
4 path: Union[str, List[str]],
5 extra_tools: List[BaseTool] = [],
6 pandas_kwargs: Optional[dict] = None,
7 prompt: Optional[ChatPromptTemplate] = None,
8 number_of_head_rows: int = 5,
9 verbose: bool = True,
10 return_intermediate_steps: bool = True,
11) -> AgentExecutor:
12 """Create csv agent with the specified language model.
13
14 Args:
15 llm: Language model to use for the agent.
16 path: A string path, or a list of string paths
17 that can be read in as pandas DataFrames with pd.read_csv().
18 number_of_head_rows: Number of rows to display in the prompt for sample data
19 include_df_in_prompt: Display the DataFrame sample values in the prompt.
20 pandas_kwargs: Named arguments to pass to pd.read_csv().
21 prefix: Prompt prefix string.
22 suffix: Prompt suffix string.
23 prompt: Prompt to use for the agent. This takes precedence over the other prompt arguments, such as suffix and prefix.
24 temp_path_dir: Temporary directory to store the csv files in for the python repl.
25 delete_temp_path: Whether to delete the temporary directory after the agent is done. This only works if temp_path_dir is not provided.
26
27 Returns:
28 An AgentExecutor with the specified agent_type agent and access to
29 a PythonREPL and any user-provided extra_tools.
30
31 Example:
32 .. code-block:: python
33
34 from langchain_cohere import ChatCohere, create_csv_agent
35
36 llm = ChatCohere(model="command-r-plus", temperature=0)
37 agent_executor = create_csv_agent(
38 llm,
39 "titanic.csv"
40 )
41 resp = agent_executor.invoke({"input":"How many people were on the titanic?"})
42 print(resp.get("output"))
43 """ # noqa: E501
44 try:
45 import pandas as pd
46 except ImportError:
47 raise ImportError(
48 "pandas package not found, please install with `pip install pandas`."
49 )
50
51 _kwargs = pandas_kwargs or {}
52 if isinstance(path, (str)):
53 df = pd.read_csv(path, **_kwargs)
54
55 elif isinstance(path, list):
56 df = []
57 for item in path:
58 if not isinstance(item, (str, IOBase)):
59 raise ValueError(
60 f"Expected str or file-like object, got {type(path)}"
61 )
62 df.append(pd.read_csv(item, **_kwargs))
63 else:
64 raise ValueError(
65 f"Expected str, list, or file-like object, got {type(path)}"
66 )
67
68 if not prompt:
69 prompt = _get_prompt(path, number_of_head_rows)
70
71 final_tools = [
72 get_file_read_tool(),
73 get_file_peek_tool(),
74 get_python_tool(),
75 ] + extra_tools
76 if "preamble" in llm.__dict__ and not llm.__dict__.get(
77 "preamble"
78 ):
79 llm = ChatCohere(**llm.__dict__)
80 llm.preamble = CSV_PREAMBLE.format(
81 current_date=datetime.now().strftime(
82 "%A, %B %d, %Y %H:%M:%S"
83 )
84 )
85
86 agent = create_tool_calling_agent(
87 llm=llm, tools=final_tools, prompt=prompt
88 )
89 agent_executor = AgentExecutor(
90 agent=agent,
91 tools=final_tools,
92 verbose=verbose,
93 return_intermediate_steps=return_intermediate_steps,
94 )
95 return agent_executor

Using the CSV agent

Let’s create a dummy CSV file for demo

1import csv
2
3# Data to be written to the CSV file
4data = [
5 ["movie", "name", "num_tickets"],
6 ["The Shawshank Redemption", "John", 2],
7 ["The Shawshank Redemption", "Jerry", 2],
8 ["The Shawshank Redemption", "Jack", 4],
9 ["The Shawshank Redemption", "Jeremy", 2],
10 ["Finding Nemo", "Darren", 3],
11 ["Finding Nemo", "Jones", 2],
12 ["Finding Nemo", "King", 1],
13 ["Finding Nemo", "Penelope", 5],
14]
15
16file_path = "movies_tickets.csv"
17
18with open(file_path, "w", newline="") as file:
19 writer = csv.writer(file)
20 writer.writerows(data)
21
22print(f"CSV file created successfully at {file_path}.")

CSV file created successfully at movies_tickets.csv.

Let’s use our CSV Agent to interact with the CSV file

1# Try out an example
2llm = ChatCohere(model="command-r-plus", temperature=0)
3agent_executor = create_csv_agent(llm, "movies_tickets.csv")
4resp = agent_executor.invoke(
5 {"input": "Who all watched Shawshank redemption?"}
6)
7print(resp.get("output"))

The output returned is:

John, Jerry, Jack and Jeremy watched Shawshank Redemption.
Built with