How to build a CSV Agent without using deprecated `create_csv_agent` - langchain-cohere

Starting from version 0.3.5 of the langchain-cohere package, the create_csv_agent abstraction has been deprecated. In this guide, we will demonstrate how to build a custom CSV agent without it.

!pip install langchain langchain-core langchain-experimental langchain-cohere pandas -qq

1 # Import packages
2 from datetime import datetime
3 from io import IOBase
4 from typing import List, Optional, Union
5 import pandas as pd
6 from pydantic import BaseModel, Field
7 
8 from langchain.agents import AgentExecutor, create_tool_calling_agent
9 from langchain_core.language_models import BaseLanguageModel
10 from langchain_core.messages import (
11     BaseMessage,
12     HumanMessage,
13     SystemMessage,
14 )
15 from langchain_core.prompts import (
16     ChatPromptTemplate,
17     MessagesPlaceholder,
18 )
19 
20 from langchain_core.tools import Tool, BaseTool
21 from langchain_core.prompts.chat import (
22     BaseMessagePromptTemplate,
23     HumanMessagePromptTemplate,
24 )
25 
26 from langchain_experimental.tools.python.tool import PythonAstREPLTool
27 from langchain_cohere.chat_models import ChatCohere

1 # Replace this cell with your actual cohere api key
2 os.env["COHERE_API_KEY"] = "cohere_api_key"

1 # Define prompts that we want to use in the csv agent
2 FUNCTIONS_WITH_DF = """
3 This is the result of `print(df.head())`:
4 {df_head}
5 
6 Do note that the above df isn't the complete df. It is only the first {number_of_head_rows} rows of the df.
7 Use this as a sample to understand the structure of the df. However, donot use this to make any calculations directly!
8 
9 The complete path for the csv files are for the corresponding dataframe is:
10 {csv_path}
11 """  # noqa E501
12 
13 FUNCTIONS_WITH_MULTI_DF = """
14 This is the result of `print(df.head())` for each dataframe:
15 {dfs_head}
16 
17 Do note that the above dfs aren't the complete df. It is only the first {number_of_head_rows} rows of the df.
18 Use this as a sample to understand the structure of the df. However, donot use this to make any calculations directly!
19 
20 The complete path for the csv files are for the corresponding dataframes are:
21 {csv_paths}
22 """  # noqa E501
23 
24 PREFIX_FUNCTIONS = """
25 You are working with a pandas dataframe in Python. The name of the dataframe is `df`."""  # noqa E501
26 
27 MULTI_DF_PREFIX_FUNCTIONS = """
28 You are working with {num_dfs} pandas dataframes in Python named df1, df2, etc."""  # noqa E501
29 
30 CSV_PREAMBLE = """## Task And Context
31 You use your advanced complex reasoning capabilities to help people by answering their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You may need to use multiple tools in parallel or sequentially to complete your task. You should focus on serving the user's needs as best you can, which will be wide-ranging. The current date is {current_date}
32 
33 ## Style Guide
34 Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling
35 """  # noqa E501

Define tools necessary for the agent

The below cell introduces a suite of tools that are provided to the csv agent. These tools allow the agent to fcilitate meaningful interactions with uploaded files and providing Python code execution functionality. The toolkit comprises three main components:

File Peek Tool: Offers a convenient way to inspect a CSV file by providing a quick preview of the first few rows in a Markdown format, making it easy to get a glimpse of the data.

File Read Tool: Allows for a comprehensive exploration of the CSV file by reading and presenting its full contents in a user-friendly Markdown format.

Python Interpreter Tool: Enables secure execution of Python code within a sandboxed environment, providing users with the output of the code execution.

1 # Define tools that we want the csv agent to have access to
2 
3 
4 def get_file_peek_tool() -> Tool:
5     def file_peek(filename: str, num_rows: int = 5) -> str:
6         """Returns the first textual contents of an uploaded file
7 
8         Args:
9             table_path: the table path
10             num_rows: the number of rows of the table to preview.
11         """  # noqa E501
12         if ".csv" in filename:
13             return pd.read_csv(filename).head(num_rows).to_markdown()
14         else:
15             return "the table_path was not recognised"
16 
17     class file_peek_inputs(BaseModel):
18         filename: str = Field(
19             description="The name of the attached file to show a peek preview."
20         )
21 
22     file_peek_tool = Tool(
23         name="file_peek",
24         description="The name of the attached file to show a peek preview.",  # noqa E501
25         func=file_peek,
26         args_schema=file_peek_inputs,
27     )
28 
29     return file_peek_tool
30 
31 
32 def get_file_read_tool() -> Tool:
33     def file_read(filename: str) -> str:
34         """Returns the textual contents of an uploaded file, broken up in text chunks
35 
36         Args:
37             filename (str): The name of the attached file to read.
38         """  # noqa E501
39         if ".csv" in filename:
40             return pd.read_csv(filename).to_markdown()
41         else:
42             return "the table_path was not recognised"
43 
44     class file_read_inputs(BaseModel):
45         filename: str = Field(
46             description="The name of the attached file to read."
47         )
48 
49     file_read_tool = Tool(
50         name="file_read",
51         description="Returns the textual contents of an uploaded file, broken up in text chunks",  # noqa E501
52         func=file_read,
53         args_schema=file_read_inputs,
54     )
55 
56     return file_read_tool
57 
58 
59 def get_python_tool() -> Tool:
60     """Returns a tool that will execute python code and return the output."""
61 
62     def python_interpreter(code: str) -> str:
63         """A function that will return the output of the python code.
64 
65         Args:
66             code: the python code to run.
67         """
68         return python_repl.run(code)
69 
70     python_repl = PythonAstREPLTool()
71     python_tool = Tool(
72         name="python_interpreter",
73         description="Executes python code and returns the result. The code runs in a static sandbox without interactive mode, so print output or save output to a file.",  # noqa E501
74         func=python_interpreter,
75     )
76 
77     class PythonToolInput(BaseModel):
78         code: str = Field(description="Python code to execute.")
79 
80     python_tool.args_schema = PythonToolInput
81     return python_tool

Create helper functions

In the cell below, we will create some important helper functions that we can call to properly assemble the full prompt that the csv agent can utilize.

1 def create_prompt(
2     system_message: Optional[BaseMessage] = SystemMessage(
3         content="You are a helpful AI assistant."
4     ),
5     extra_prompt_messages: Optional[
6         List[BaseMessagePromptTemplate]
7     ] = None,
8 ) -> ChatPromptTemplate:
9     """Create prompt for this agent.
10 
11     Args:
12         system_message: Message to use as the system message that will be the
13             first in the prompt.
14         extra_prompt_messages: Prompt messages that will be placed between the
15             system message and the new human input.
16 
17     Returns:
18         A prompt template to pass into this agent.
19     """
20     _prompts = extra_prompt_messages or []
21     messages: List[Union[BaseMessagePromptTemplate, BaseMessage]]
22     if system_message:
23         messages = [system_message]
24     else:
25         messages = []
26 
27     messages.extend(
28         [
29             *_prompts,
30             HumanMessagePromptTemplate.from_template("{input}"),
31             MessagesPlaceholder(variable_name="agent_scratchpad"),
32         ]
33     )
34     return ChatPromptTemplate(messages=messages)
35 
36 
37 def _get_csv_head_str(path: str, number_of_head_rows: int) -> str:
38     with open(path, "r") as file:
39         lines = []
40         for _ in range(number_of_head_rows):
41             lines.append(file.readline().strip("\n"))
42         # validate that the head contents are well formatted csv
43 
44         return " ".join(lines)
45 
46 
47 def _get_prompt(
48     path: Union[str, List[str]], number_of_head_rows: int
49 ) -> ChatPromptTemplate:
50     if isinstance(path, str):
51         lines = _get_csv_head_str(path, number_of_head_rows)
52         prompt_message = f"The user uploaded the following attachments:\nFilename: {path}\nWord Count: {count_words_in_file(path)}\nPreview: {lines}"  # noqa: E501
53 
54     elif isinstance(path, list):
55         prompt_messages = []
56         for file_path in path:
57             lines = _get_csv_head_str(file_path, number_of_head_rows)
58             prompt_messages.append(
59                 f"The user uploaded the following attachments:\nFilename: {file_path}\nWord Count: {count_words_in_file(file_path)}\nPreview: {lines}"  # noqa: E501
60             )
61         prompt_message = " ".join(prompt_messages)
62 
63     prompt = create_prompt(
64         system_message=HumanMessage(prompt_message)
65     )
66     return prompt
67 
68 
69 def count_words_in_file(file_path: str) -> int:
70     try:
71         with open(file_path, "r") as file:
72             content = file.readlines()
73             words = [len(sentence.split()) for sentence in content]
74             return sum(words)
75     except FileNotFoundError:
76         print("File not found.")
77         return 0
78     except Exception as e:
79         print("An error occurred:", str(e))
80         return 0

Build the core csv agent abstraction

The cells below outline the assembly of the various components to build an agent abstraction tailored for intelligent CSV file interactions. We use langchain to provide the agent that has access to the tools declared above, along with additional capabilities to provide additional tools if needed, and put together an agentic abstraction using the prompts defined earlier to deliver an abstraction that can easily be called for natural language querying over csv files!

1 # Build the agent abstraction itself
2 def create_csv_agent(
3     llm: BaseLanguageModel,
4     path: Union[str, List[str]],
5     extra_tools: List[BaseTool] = [],
6     pandas_kwargs: Optional[dict] = None,
7     prompt: Optional[ChatPromptTemplate] = None,
8     number_of_head_rows: int = 5,
9     verbose: bool = True,
10     return_intermediate_steps: bool = True,
11 ) -> AgentExecutor:
12     """Create csv agent with the specified language model.
13 
14     Args:
15         llm: Language model to use for the agent.
16         path: A string path, or a list of string paths
17             that can be read in as pandas DataFrames with pd.read_csv().
18         number_of_head_rows: Number of rows to display in the prompt for sample data
19         include_df_in_prompt: Display the DataFrame sample values in the prompt.
20         pandas_kwargs: Named arguments to pass to pd.read_csv().
21         prefix: Prompt prefix string.
22         suffix: Prompt suffix string.
23         prompt: Prompt to use for the agent. This takes precedence over the other prompt arguments, such as suffix and prefix.
24         temp_path_dir: Temporary directory to store the csv files in for the python repl.
25         delete_temp_path: Whether to delete the temporary directory after the agent is done. This only works if temp_path_dir is not provided.
26 
27     Returns:
28         An AgentExecutor with the specified agent_type agent and access to
29         a PythonREPL and any user-provided extra_tools.
30 
31     Example:
32         .. code-block:: python
33 
34             from langchain_cohere import ChatCohere, create_csv_agent
35 
36             llm = ChatCohere(model="command-a-03-2025", temperature=0)
37             agent_executor = create_csv_agent(
38                 llm,
39                 "titanic.csv"
40             )
41             resp = agent_executor.invoke({"input":"How many people were on the titanic?"})
42             print(resp.get("output"))
43     """  # noqa: E501
44     try:
45         import pandas as pd
46     except ImportError:
47         raise ImportError(
48             "pandas package not found, please install with `pip install pandas`."
49         )
50 
51     _kwargs = pandas_kwargs or {}
52     if isinstance(path, (str)):
53         df = pd.read_csv(path, **_kwargs)
54 
55     elif isinstance(path, list):
56         df = []
57         for item in path:
58             if not isinstance(item, (str, IOBase)):
59                 raise ValueError(
60                     f"Expected str or file-like object, got {type(path)}"
61                 )
62             df.append(pd.read_csv(item, **_kwargs))
63     else:
64         raise ValueError(
65             f"Expected str, list, or file-like object, got {type(path)}"
66         )
67 
68     if not prompt:
69         prompt = _get_prompt(path, number_of_head_rows)
70 
71     final_tools = [
72         get_file_read_tool(),
73         get_file_peek_tool(),
74         get_python_tool(),
75     ] + extra_tools
76     if "preamble" in llm.__dict__ and not llm.__dict__.get(
77         "preamble"
78     ):
79         llm = ChatCohere(**llm.__dict__)
80         llm.preamble = CSV_PREAMBLE.format(
81             current_date=datetime.now().strftime(
82                 "%A, %B %d, %Y %H:%M:%S"
83             )
84         )
85 
86     agent = create_tool_calling_agent(
87         llm=llm, tools=final_tools, prompt=prompt
88     )
89     agent_executor = AgentExecutor(
90         agent=agent,
91         tools=final_tools,
92         verbose=verbose,
93         return_intermediate_steps=return_intermediate_steps,
94     )
95     return agent_executor

Using the CSV agent

Let’s create a dummy CSV file for demo

1 import csv
2 
3 # Data to be written to the CSV file
4 data = [
5     ["movie", "name", "num_tickets"],
6     ["The Shawshank Redemption", "John", 2],
7     ["The Shawshank Redemption", "Jerry", 2],
8     ["The Shawshank Redemption", "Jack", 4],
9     ["The Shawshank Redemption", "Jeremy", 2],
10     ["Finding Nemo", "Darren", 3],
11     ["Finding Nemo", "Jones", 2],
12     ["Finding Nemo", "King", 1],
13     ["Finding Nemo", "Penelope", 5],
14 ]
15 
16 file_path = "movies_tickets.csv"
17 
18 with open(file_path, "w", newline="") as file:
19     writer = csv.writer(file)
20     writer.writerows(data)
21 
22 print(f"CSV file created successfully at {file_path}.")

CSV file created successfully at movies_tickets.csv.

Let’s use our CSV Agent to interact with the CSV file

1 # Try out an example
2 llm = ChatCohere(model="command-a-03-2025", temperature=0)
3 agent_executor = create_csv_agent(llm, "movies_tickets.csv")
4 resp = agent_executor.invoke(
5     {"input": "Who all watched Shawshank redemption?"}
6 )
7 print(resp.get("output"))

The output returned is:

John, Jerry, Jack and Jeremy watched Shawshank Redemption.

How to build a CSV Agent without using deprecated create_csv_agent - langchain-cohere

Define tools necessary for the agent

Create helper functions

Build the core csv agent abstraction

Using the CSV agent

How to build a CSV Agent without using deprecated `create_csv_agent` - langchain-cohere