Migrating away from deprecated create_csv_agent in langchain-cohere
How to build a CSV Agent without using deprecated create_csv_agent
- langchain-cohere
Starting from version 0.3.5 of the langchain-cohere package, the create_csv_agent
abstraction has been deprecated.
In this guide, we will demonstrate how to build a custom CSV agent without it.
!pip install langchain langchain-core langchain-experimental langchain-cohere pandas tabulate -qq
# Import packages
from datetime import datetime
from io import IOBase
from typing import List, Optional, Union
import pandas as pd
from pydantic import BaseModel, Field

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.language_models import BaseLanguageModel
from langchain_core.messages import (
    BaseMessage,
    HumanMessage,
    SystemMessage,
)
from langchain_core.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
)

from langchain_core.tools import Tool, BaseTool
from langchain_core.prompts.chat import (
    BaseMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

from langchain_experimental.tools.python.tool import PythonAstREPLTool
from langchain_cohere.chat_models import ChatCohere
import os

# Replace the placeholder below with your actual Cohere API key
os.environ["COHERE_API_KEY"] = "cohere_api_key"
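Rather than hard-coding the key in a cell, one option is to read it from the environment and prompt only when it is missing. The following is a minimal sketch; the helper name `ensure_cohere_api_key` is hypothetical and not part of langchain-cohere.

```python
import os
from getpass import getpass


def ensure_cohere_api_key(env_var: str = "COHERE_API_KEY") -> str:
    """Return the API key from the environment, prompting once if it is unset.

    Hypothetical helper for illustration; not part of langchain-cohere.
    """
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value so the key is not echoed in the output.
        key = getpass(f"Enter value for {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `ensure_cohere_api_key()` once before constructing `ChatCohere` keeps the key out of the saved notebook while still exposing it through the `COHERE_API_KEY` environment variable used above.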
# Define prompts that we want to use in the csv agent
FUNCTIONS_WITH_DF = """
This is the result of `print(df.head())`:
{df_head}

Do note that the above df isn't the complete df. It is only the first {number_of_head_rows} rows of the df.
Use this as a sample to understand the structure of the df. However, do not use this to make any calculations directly!

The complete path of the csv file for the corresponding dataframe is:
{csv_path}
"""  # noqa E501

FUNCTIONS_WITH_MULTI_DF = """
This is the result of `print(df.head())` for each dataframe:
{dfs_head}

Do note that the above dfs aren't the complete dfs. Each is only the first {number_of_head_rows} rows of its df.
Use this as a sample to understand the structure of the dfs. However, do not use this to make any calculations directly!

The complete paths of the csv files for the corresponding dataframes are:
{csv_paths}
"""  # noqa E501

PREFIX_FUNCTIONS = """
You are working with a pandas dataframe in Python. The name of the dataframe is `df`."""  # noqa E501

MULTI_DF_PREFIX_FUNCTIONS = """
You are working with {num_dfs} pandas dataframes in Python named df1, df2, etc."""  # noqa E501

CSV_PREAMBLE = """## Task And Context
You use your advanced complex reasoning capabilities to help people by answering their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You may need to use multiple tools in parallel or sequentially to complete your task. You should focus on serving the user's needs as best you can, which will be wide-ranging. The current date is {current_date}

## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling
"""  # noqa E501
Define tools necessary for the agent
The cell below defines the tools provided to the CSV agent. They let the agent inspect uploaded files and execute Python code. The toolkit has three components:
File Peek Tool: Previews the first few rows of a CSV file as a Markdown table, for a quick look at the data.
File Read Tool: Reads a CSV file and returns its full contents as a Markdown table.
Python Interpreter Tool: Executes Python code through LangChain's PythonAstREPLTool and returns the output. The code runs in the same Python process as the agent rather than in an isolated sandbox, so use it only with inputs you trust.
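The difference between peeking and reading can be sketched without any third-party packages. This is a standard-library illustration only: the function names `rows_to_markdown` and `preview_csv` are hypothetical, and the actual tools below rely on pandas, whose `to_markdown()` output is formatted differently (it also includes the index column).

```python
import csv
import io
from itertools import islice
from typing import List, Optional


def rows_to_markdown(rows: List[List[str]]) -> str:
    """Render a header row plus data rows as a simple Markdown table."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def preview_csv(text: str, num_rows: Optional[int] = None) -> str:
    """Return the CSV as Markdown; limit to `num_rows` data rows when given."""
    reader = csv.reader(io.StringIO(text))
    # +1 keeps the header row in addition to the requested data rows.
    limit = None if num_rows is None else num_rows + 1
    return rows_to_markdown(list(islice(reader, limit)))


sample = "movie,name\nFinding Nemo,Darren\nFinding Nemo,Jones\nFinding Nemo,King\n"
peek = preview_csv(sample, num_rows=2)  # "peek": header plus the first 2 rows
full = preview_csv(sample)              # "read": every row
print(peek)
```

Peeking keeps the tool output short, which matters because the full contents of a large file may not fit in the model's context.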
# Define tools that we want the csv agent to have access to


def get_file_peek_tool() -> Tool:
    def file_peek(filename: str, num_rows: int = 5) -> str:
        """Returns the first rows of an uploaded file as a Markdown table

        Args:
            filename: the name of the attached file to preview.
            num_rows: the number of rows of the table to preview.
        """  # noqa E501
        if ".csv" in filename:
            return pd.read_csv(filename).head(num_rows).to_markdown()
        else:
            return "the table_path was not recognised"

    class file_peek_inputs(BaseModel):
        filename: str = Field(
            description="The name of the attached file to show a peek preview."
        )

    file_peek_tool = Tool(
        name="file_peek",
        description="Returns a preview of the first rows of an attached file.",  # noqa E501
        func=file_peek,
        args_schema=file_peek_inputs,
    )

    return file_peek_tool


def get_file_read_tool() -> Tool:
    def file_read(filename: str) -> str:
        """Returns the textual contents of an uploaded file, broken up in text chunks

        Args:
            filename (str): The name of the attached file to read.
        """  # noqa E501
        if ".csv" in filename:
            return pd.read_csv(filename).to_markdown()
        else:
            return "the table_path was not recognised"

    class file_read_inputs(BaseModel):
        filename: str = Field(
            description="The name of the attached file to read."
        )

    file_read_tool = Tool(
        name="file_read",
        description="Returns the textual contents of an uploaded file, broken up in text chunks",  # noqa E501
        func=file_read,
        args_schema=file_read_inputs,
    )

    return file_read_tool


def get_python_tool() -> Tool:
    """Returns a tool that will execute python code and return the output."""

    def python_interpreter(code: str) -> str:
        """A function that will return the output of the python code.

        Args:
            code: the python code to run.
        """
        return python_repl.run(code)

    python_repl = PythonAstREPLTool()
    python_tool = Tool(
        name="python_interpreter",
        description="Executes python code and returns the result. The code runs in a static sandbox without interactive mode, so print output or save output to a file.",  # noqa E501
        func=python_interpreter,
    )

    class PythonToolInput(BaseModel):
        code: str = Field(description="Python code to execute.")

    python_tool.args_schema = PythonToolInput
    return python_tool
Create helper functions
In the cell below, we define helper functions that assemble the full prompt for the CSV agent.
def create_prompt(
    system_message: Optional[BaseMessage] = SystemMessage(
        content="You are a helpful AI assistant."
    ),
    extra_prompt_messages: Optional[
        List[BaseMessagePromptTemplate]
    ] = None,
) -> ChatPromptTemplate:
    """Create prompt for this agent.

    Args:
        system_message: Message to use as the system message that will be the
            first in the prompt.
        extra_prompt_messages: Prompt messages that will be placed between the
            system message and the new human input.

    Returns:
        A prompt template to pass into this agent.
    """
    _prompts = extra_prompt_messages or []
    messages: List[Union[BaseMessagePromptTemplate, BaseMessage]]
    if system_message:
        messages = [system_message]
    else:
        messages = []

    messages.extend(
        [
            *_prompts,
            HumanMessagePromptTemplate.from_template("{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ]
    )
    return ChatPromptTemplate(messages=messages)


def _get_csv_head_str(path: str, number_of_head_rows: int) -> str:
    with open(path, "r") as file:
        lines = []
        for _ in range(number_of_head_rows):
            lines.append(file.readline().strip("\n"))
        # validate that the head contents are well formatted csv

    return " ".join(lines)


def _get_prompt(
    path: Union[str, List[str]], number_of_head_rows: int
) -> ChatPromptTemplate:
    if isinstance(path, str):
        lines = _get_csv_head_str(path, number_of_head_rows)
        prompt_message = f"The user uploaded the following attachments:\nFilename: {path}\nWord Count: {count_words_in_file(path)}\nPreview: {lines}"  # noqa: E501

    elif isinstance(path, list):
        prompt_messages = []
        for file_path in path:
            lines = _get_csv_head_str(file_path, number_of_head_rows)
            prompt_messages.append(
                f"The user uploaded the following attachments:\nFilename: {file_path}\nWord Count: {count_words_in_file(file_path)}\nPreview: {lines}"  # noqa: E501
            )
        prompt_message = " ".join(prompt_messages)

    prompt = create_prompt(
        system_message=HumanMessage(prompt_message)
    )
    return prompt


def count_words_in_file(file_path: str) -> int:
    try:
        with open(file_path, "r") as file:
            content = file.readlines()
            words = [len(sentence.split()) for sentence in content]
            return sum(words)
    except FileNotFoundError:
        print("File not found.")
        return 0
    except Exception as e:
        print("An error occurred:", str(e))
        return 0
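To see what the attachment message built by these helpers looks like, here is a condensed, standard-library sketch of the same idea. The names `head_lines` and `word_count` are hypothetical stand-ins for `_get_csv_head_str` and `count_words_in_file`, and the sketch is an approximation rather than the exact code above (for example, it does not pad with empty strings when the file is shorter than the requested number of lines).

```python
import os
import tempfile
from itertools import islice


def head_lines(path: str, n: int) -> str:
    """Join the first `n` raw lines of a file with spaces (the header counts as a line)."""
    with open(path, "r") as f:
        return " ".join(line.rstrip("\n") for line in islice(f, n))


def word_count(path: str) -> int:
    """Count whitespace-separated tokens across the whole file."""
    with open(path, "r") as f:
        return sum(len(line.split()) for line in f)


# Write a tiny throwaway CSV so the sketch is runnable on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("movie,name,num_tickets\nFinding Nemo,Darren,3\nFinding Nemo,Jones,2\n")
    demo_path = tmp.name

preview = head_lines(demo_path, 2)
words = word_count(demo_path)
message = (
    "The user uploaded the following attachments:\n"
    f"Filename: {demo_path}\nWord Count: {words}\nPreview: {preview}"
)
print(message)
os.remove(demo_path)
```

Two details are worth noticing: the header line counts toward the number of head rows, and the word count splits on whitespace only, so a comma-separated row without spaces counts as a single word.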
Build the core csv agent abstraction
The cell below assembles these components into an agent abstraction for querying CSV files in natural language. We use LangChain to create a tool-calling agent that has access to the tools declared above, plus any extra tools you supply, and wrap it in an AgentExecutor that uses the prompts and helpers defined earlier.
# Build the agent abstraction itself
def create_csv_agent(
    llm: BaseLanguageModel,
    path: Union[str, List[str]],
    extra_tools: Optional[List[BaseTool]] = None,
    pandas_kwargs: Optional[dict] = None,
    prompt: Optional[ChatPromptTemplate] = None,
    number_of_head_rows: int = 5,
    verbose: bool = True,
    return_intermediate_steps: bool = True,
) -> AgentExecutor:
    """Create csv agent with the specified language model.

    Args:
        llm: Language model to use for the agent.
        path: A string path, or a list of string paths
            that can be read in as pandas DataFrames with pd.read_csv().
        extra_tools: Additional tools to give the agent alongside the built-in ones.
        pandas_kwargs: Named arguments to pass to pd.read_csv().
        prompt: Prompt to use for the agent. If omitted, one is built from a preview of the file(s).
        number_of_head_rows: Number of rows to display in the prompt for sample data.
        verbose: Whether the AgentExecutor logs its steps.
        return_intermediate_steps: Whether to include intermediate steps in the result.

    Returns:
        An AgentExecutor with a tool-calling agent and access to the file tools,
        a Python interpreter tool, and any user-provided extra_tools.

    Example:
        .. code-block:: python

            from langchain_cohere import ChatCohere

            llm = ChatCohere(model="command-r-plus", temperature=0)
            agent_executor = create_csv_agent(
                llm,
                "titanic.csv"
            )
            resp = agent_executor.invoke({"input":"How many people were on the titanic?"})
            print(resp.get("output"))
    """  # noqa: E501
    try:
        import pandas as pd
    except ImportError:
        raise ImportError(
            "pandas package not found, please install with `pip install pandas`."
        )

    # Read the file(s) up front so an unreadable CSV fails early
    _kwargs = pandas_kwargs or {}
    if isinstance(path, (str)):
        df = pd.read_csv(path, **_kwargs)

    elif isinstance(path, list):
        df = []
        for item in path:
            if not isinstance(item, (str, IOBase)):
                raise ValueError(
                    f"Expected str or file-like object, got {type(item)}"
                )
            df.append(pd.read_csv(item, **_kwargs))
    else:
        raise ValueError(
            f"Expected str, list, or file-like object, got {type(path)}"
        )

    if not prompt:
        prompt = _get_prompt(path, number_of_head_rows)

    final_tools = [
        get_file_read_tool(),
        get_file_peek_tool(),
        get_python_tool(),
    ] + (extra_tools or [])
    # Set the default preamble only when the model has none configured
    if "preamble" in llm.__dict__ and not llm.__dict__.get(
        "preamble"
    ):
        llm = ChatCohere(**llm.__dict__)
        llm.preamble = CSV_PREAMBLE.format(
            current_date=datetime.now().strftime(
                "%A, %B %d, %Y %H:%M:%S"
            )
        )

    agent = create_tool_calling_agent(
        llm=llm, tools=final_tools, prompt=prompt
    )
    agent_executor = AgentExecutor(
        agent=agent,
        tools=final_tools,
        verbose=verbose,
        return_intermediate_steps=return_intermediate_steps,
    )
    return agent_executor
Using the CSV agent
Let’s create a dummy CSV file for the demo.
import csv

# Data to be written to the CSV file
data = [
    ["movie", "name", "num_tickets"],
    ["The Shawshank Redemption", "John", 2],
    ["The Shawshank Redemption", "Jerry", 2],
    ["The Shawshank Redemption", "Jack", 4],
    ["The Shawshank Redemption", "Jeremy", 2],
    ["Finding Nemo", "Darren", 3],
    ["Finding Nemo", "Jones", 2],
    ["Finding Nemo", "King", 1],
    ["Finding Nemo", "Penelope", 5],
]

file_path = "movies_tickets.csv"

with open(file_path, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)

print(f"CSV file created successfully at {file_path}.")
CSV file created successfully at movies_tickets.csv.
Let’s use our CSV agent to interact with the CSV file.
# Try out an example
llm = ChatCohere(model="command-r-plus", temperature=0)
agent_executor = create_csv_agent(llm, "movies_tickets.csv")
resp = agent_executor.invoke(
    {"input": "Who all watched Shawshank redemption?"}
)
print(resp.get("output"))
The output returned is:
John, Jerry, Jack and Jeremy watched Shawshank Redemption.
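Since the answer comes from a language model, it can be worth cross-checking it against the data directly. The sketch below does this with the standard library only; it inlines the same rows as the demo file so that it runs on its own, and the variable names are illustrative.

```python
import csv
import io

# Same rows as the demo file above, inlined so the check runs on its own.
csv_text = """movie,name,num_tickets
The Shawshank Redemption,John,2
The Shawshank Redemption,Jerry,2
The Shawshank Redemption,Jack,4
The Shawshank Redemption,Jeremy,2
Finding Nemo,Darren,3
Finding Nemo,Jones,2
Finding Nemo,King,1
Finding Nemo,Penelope,5
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Keep only the rows for the movie the question asked about.
shawshank = [r for r in rows if r["movie"] == "The Shawshank Redemption"]
watchers = [r["name"] for r in shawshank]
tickets = sum(int(r["num_tickets"]) for r in shawshank)

print(watchers)
print(tickets)
```

The names match the agent's answer above, which suggests the agent filtered on the movie column rather than guessing from the preview.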