Jason Jung

Notebook Overview

Motivation

Tabular data reasoning continues to be an exciting application of Agents, particularly in the financial domain, where multiple reasoning steps may be needed to identify the right answer. For example, a financial analyst working with financial statements may be interested in computing various financial ratios with natural language queries.

Some examples may include:

ROE (Return on Equity) = Net Income / Shareholder’s Equity
Net Profit Margin = Net Income / Revenue
Asset Turnover = Revenue / Average Total Assets
Financial Leverage = Average Total Assets / Shareholder’s Equity

Having an Agent which is able to correctly compute these and other ratios would be a great help for any analyst in the field of Finance.

Objective

In this notebook we explore how to setup a Cohere Agent to answer questions over tables in Apple’s SEC10K 2020 form. Financial CSV Agent already showed how to use Langchain to ask questions about your data. This notebook will demonstrate how you can build the same agent using Cohere’s native API with Langchain Python tool. We will also explore how to make your agent more resilient to errors.

Setup

PYTHON

1 import os
2 from typing import List
3 
4 import cohere
5 import langchain
6 import langchain_core
7 import langchain_experimental
8 import pandas as pd
9 from langchain.agents import Tool
10 from langchain_core.pydantic_v1 import BaseModel, Field
11 from langchain_experimental.utilities import PythonREPL

PYTHON

1 # Uncomment if you need to install the following packages
2 # !pip install --quiet langchain langchain_experimental cohere --upgrade

PYTHON

1 # versions
2 print('cohere version:', cohere.__version__)
3 print('langchain version:', langchain.__version__)
4 print('langchain_core version:', langchain_core.__version__)
5 print('langchain_experimental version:', langchain_experimental.__version__)

Output

cohere version: 5.5.1
langchain version: 0.2.0
langchain_core version: 0.2.0
langchain_experimental version: 0.0.59

API Key

PYTHON

1 COHERE_API_KEY = os.environ["COHERE_API_KEY"]
2 CHAT_URL= "https://api.cohere.ai/v1/chat"
3 COHERE_MODEL = 'command-r-plus'
4 co = cohere.Client(api_key=COHERE_API_KEY)

Data Loading

PYTHON

1 income_statement = pd.read_csv('income_statement.csv')
2 balance_sheet = pd.read_csv('balance_sheet.csv')

PYTHON

1 income_statement.head(2)

	Unnamed: 0	index	RevenueFromContractWithCustomerExcludingAssessedTax	CostOfGoodsAndServicesSold	GrossProfit	ResearchAndDevelopmentExpense	SellingGeneralAndAdministrativeExpense	OperatingExpenses	OperatingIncomeLoss	NonoperatingIncomeExpense	IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest	IncomeTaxExpenseBenefit	NetIncomeLoss	EarningsPerShareBasic	EarningsPerShareDiluted	WeightedAverageNumberOfSharesOutstandingBasic	WeightedAverageNumberOfDilutedSharesOutstanding
0	0	2017-10-01-2018-09-29	265595000000	1.637560e+11	101839000000	1.423600e+10	1.670500e+10	3.094100e+10	7.089800e+10	2.005000e+09	7.290300e+10	1.337200e+10	59531000000	3.00	2.98	1.982151e+10	2.000044e+10
1	1	2018-09-30-2018-12-29	84310000000	NaN	32031000000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	19965000000	1.05	1.05	NaN	NaN

PYTHON

1 balance_sheet.head(2)

	Unnamed: 0	index	CashAndCashEquivalentsAtCarryingValue	MarketableSecuritiesCurrent	AccountsReceivableNetCurrent	InventoryNet	NontradeReceivablesCurrent	OtherAssetsCurrent	AssetsCurrent	MarketableSecuritiesNoncurrent	…	LongTermDebtNoncurrent	OtherLiabilitiesNoncurrent	LiabilitiesNoncurrent	Liabilities	CommitmentsAndContingencies	CommonStocksIncludingAdditionalPaidInCapital	RetainedEarningsAccumulatedDeficit	AccumulatedOtherComprehensiveIncomeLossNetOfTax	StockholdersEquity	LiabilitiesAndStockholdersEquity
0	0	2017-09-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	…	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	134047000000	NaN
1	1	2018-09-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	…	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	107147000000	NaN

2 rows × 30 columns

Define Python Tool

Here we define the python tool using langchain’s PythonREPL. We also define functions_map that will later be used by the Cohere Agent to correctly map function name to the actual function. Lastly, we define the tools that will be passed in the Cohere API.

PYTHON

1 python_repl = PythonREPL()
2 python_tool = Tool(
3     name="python_repl",
4     description="Executes python code and returns the result. The code runs in a static sandbox without interactive mode, so print output or save output to a file.",
5     func=python_repl.run,
6 )
7 python_tool.name = "python_interpreter"
8 
9 class ToolInput(BaseModel):
10     code: str = Field(description="Python code to execute.")
11 python_tool.args_schema = ToolInput
12 
13 def run_python_code(code: str) -> dict:
14     """
15     Function to run given python code
16     """
17     input_code = ToolInput(code=code)
18     return {'python_answer': python_tool.func(input_code.code)}
19 
20 functions_map = {
21     "run_python_code": run_python_code,
22 }
23 
24 tools = [
25     {
26         "name": "run_python_code",
27         "description": "given a python code, runs it",
28         "parameter_definitions": {
29             "code": {
30                 "description": "executable python code",
31                 "type": "str",
32                 "required": True
33             }
34         }
35     },]

Create Cohere Agent

As Multi-Step Tool Use shows, you have a lot of flexiblity on how you can customize and interact with the cohere agent. Here I am creating a wrapper so that it automatically determines when to stop calling the tools and output final answer. It will run maximum of 15 steps.

PYTHON

1 def cohere_agent(
2     message: str,
3     preamble: str,
4     tools: List[dict],
5     force_single_step=False,
6     verbose: bool = False,
7 ) -> str:
8     """
9     Function to handle multi-step tool use api.
10 
11     Args:
12         message (str): The message to send to the Cohere AI model.
13         preamble (str): The preamble or context for the conversation.
14         tools (list of dict): List of tools to use in the conversation.
15         verbose (bool, optional): Whether to print verbose output. Defaults to False.
16 
17     Returns:
18         str: The final response from the call.
19     """
20 
21     counter = 1
22 
23     response = co.chat(
24         model=COHERE_MODEL,
25         message=message,
26         preamble=preamble,
27         tools=tools,
28         force_single_step=force_single_step,
29     )
30 
31     if verbose:
32         print(f"\nrunning 0th step.")
33         print(response.text)
34 
35     while response.tool_calls:
36         tool_results = []
37 
38         if verbose:
39             print(f"\nrunning {counter}th step.")
40 
41         for tool_call in response.tool_calls:
42             output = functions_map[tool_call.name](**tool_call.parameters)
43             outputs = [output]
44             tool_results.append({"call": tool_call, "outputs": outputs})
45 
46             if verbose:
47                 print(
48                     f"= running tool {tool_call.name}, with parameters: {tool_call.parameters}"
49                 )
50                 print(f"== tool results: {outputs}")
51 
52         response = co.chat(
53             model=COHERE_MODEL,
54             message="",
55             chat_history=response.chat_history,
56             preamble=preamble,
57             tools=tools,
58             force_single_step=force_single_step,
59             tool_results=tool_results,
60         )
61 
62         if verbose:
63             print(response.text)
64 
65             counter += 1
66 
67     return response.text
68 
69 
70 # test
71 output = cohere_agent("can you use python to answer 1 + 1", None, tools, verbose=True)

Output

running 0th step.
I will use Python to answer this question.
running 1th step.
= running tool run_python_code, with parameters: {'code': 'print(1 + 1)'}
== tool results: [{'python_answer': '2\n'}]
The answer is **2**.

QnA over Single Table

In the example below, we show how the Python tool can be used to load a dataframe and extract information from it. To do this successfully we need to:

Pass the file name to the preamble so the model knows how to load the dataframe
Pass a preview of the dataframe in the preamble so the model knows which columns/rows to query

We will ask the following questions given income statement data.

What is the highest value of cost of goods and service?
What is the largest gross profit margin?
What is the minimum ratio of operating income loss divided by non operating income expense?

PYTHON

1 question_dict ={
2     'q1': ['what is the highest value of cost of goods and service?',169559000000],
3     'q2': ['what is the largest gross profit margin?',0.3836194330595236],
4     'q3': ['what is the minimum ratio of operating income loss divided by non operating income expense?',35.360599]
5 }

PYTHON

1 preamble = """
2 You are an expert who answers the user's question. You are working with a pandas dataframe in Python. The name of the dataframe is `income_statement.csv`.
3 Here is a preview of the dataframe:
4 {head_df}
5 """.format(head_df=income_statement.head(3).to_markdown())
6 
7 print(preamble)

You are an expert who answers the user’s question. You are working with a pandas dataframe in Python. The name of the dataframe is income_statement.csv. Here is a preview of the dataframe:

	Unnamed: 0	index	RevenueFromContractWithCustomerExcludingAssessedTax	CostOfGoodsAndServicesSold	GrossProfit	ResearchAndDevelopmentExpense	SellingGeneralAndAdministrativeExpense	OperatingExpenses	OperatingIncomeLoss	NonoperatingIncomeExpense	IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest	IncomeTaxExpenseBenefit	NetIncomeLoss	EarningsPerShareBasic	EarningsPerShareDiluted	WeightedAverageNumberOfSharesOutstandingBasic	WeightedAverageNumberOfDilutedSharesOutstanding
0	0	2017-10-01-2018-09-29	265595000000	1.63756e+11	101839000000	1.4236e+10	1.6705e+10	3.0941e+10	7.0898e+10	2.005e+09	7.2903e+10	1.3372e+10	59531000000	3	2.98	1.98215e+10	2.00004e+10
1	1	2018-09-30-2018-12-29	84310000000	nan	32031000000	nan	nan	nan	nan	nan	nan	nan	19965000000	1.05	1.05	nan	nan
2	2	2018-09-30-2019-09-28	260174000000	1.61782e+11	98392000000	1.6217e+10	1.8245e+10	3.4462e+10	6.393e+10	1.807e+09	6.5737e+10	1.0481e+10	55256000000	2.99	2.97	1.84713e+10	1.85957e+10

PYTHON

1 for qsn,val in question_dict.items():
2     print(f'question:{qsn}')
3     question = val[0]
4     answer = val[1]
5     output = cohere_agent(question, preamble, tools, verbose=True)
6     print(f'GT Answer:{val[1]}')
7     print('-'*50)

Output

question:q1
running 0th step.
I will use Python to find the highest value of 'CostOfGoodsAndServicesSold' in the 'income_statement.csv' dataframe.
running 1th step.
= running tool run_python_code, with parameters: {'code': 'import pandas as pd\n\ndf = pd.read_csv(\'income_statement.csv\')\n\n# Find the highest value of \'CostOfGoodsAndServicesSold\'\nhighest_cost = df[\'CostOfGoodsAndServicesSold\'].max()\n\nprint(f"The highest value of \'CostOfGoodsAndServicesSold\' is {highest_cost}")'}
== tool results: [{'python_answer': "The highest value of 'CostOfGoodsAndServicesSold' is 169559000000.0\n"}]
The highest value of 'CostOfGoodsAndServicesSold' is 169559000000.0.
GT Answer:169559000000
--------------------------------------------------
question:q2
running 0th step.
I will write and execute Python code to find the largest gross profit margin.
running 1th step.
= running tool run_python_code, with parameters: {'code': 'import pandas as pd\n\ndf = pd.read_csv(\'income_statement.csv\')\n\n# Calculate gross profit margin\ndf[\'GrossProfitMargin\'] = df[\'GrossProfit\'] / df[\'RevenueFromContractWithCustomerExcludingAssessedTax\'] * 100\n\n# Find the largest gross profit margin\nlargest_gross_profit_margin = df[\'GrossProfitMargin\'].max()\n\nprint(f"The largest gross profit margin is {largest_gross_profit_margin:.2f}%")'}
== tool results: [{'python_answer': 'The largest gross profit margin is 38.36%\n'}]
The largest gross profit margin is 38.36%.
GT Answer:0.3836194330595236
--------------------------------------------------
question:q3
running 0th step.
I will use Python to find the minimum ratio of operating income loss divided by non-operating income expense.
running 1th step.
= running tool run_python_code, with parameters: {'code': 'import pandas as pd\n\ndf = pd.read_csv("income_statement.csv")\n\n# Calculate the ratio of operating income loss to non-operating income expense\ndf["OperatingIncomeLossRatio"] = df["OperatingIncomeLoss"] / df["NonoperatingIncomeExpense"]\n\n# Find the minimum ratio\nmin_ratio = df["OperatingIncomeLossRatio"].min()\n\nprint(f"The minimum ratio of operating income loss to non-operating income expense is: {min_ratio:.2f}")'}
== tool results: [{'python_answer': 'The minimum ratio of operating income loss to non-operating income expense is: 35.36\n'}]
The minimum ratio of operating income loss to non-operating income expense is 35.36.
GT Answer:35.360599
--------------------------------------------------

QnA over Multiple Tables

We now make the task for the Agent more complicated by asking a question that can be only answered by retrieving relevant information from multiple tables:

Q: What is the ratio of the largest stockholders equity to the smallest revenue?

As you will see below, this question can be obtained only by accessing both the balance sheet and the income statement.

PYTHON

1 question_dict ={
2     'q1': ['what is the ratio of the largest stockholders equity to the smallest revenue'],
3 }

PYTHON

1 # get the largest stockholders equity
2 x = balance_sheet['StockholdersEquity'].astype(float).max()
3 print(f"The largest stockholders equity value is: {x}")
4 
5 # get the smallest revenue
6 y = income_statement['RevenueFromContractWithCustomerExcludingAssessedTax'].astype(float).min()
7 print(f"The smallest revenue value is: {y}")
8 
9 # compute the ratio
10 ratio = x/y
11 print(f"Their ratio is: {ratio}")

Output

The largest stockholders equity value is: 134047000000.0
The smallest revenue value is: 53809000000.0
Their ratio is: 2.4911631883142227

PYTHON

1 preamble = """
2 You are an expert who answers the user's question in complete sentences. You are working with two pandas dataframe in Python. Ensure your output is a string.
3 
4 Here is a preview of the `income_statement.csv` dataframe:
5 {table_1}
6 
7 Here is a preview of the `balance_sheet.csv` dataframe:
8 {table_2}
9 """.format(table_1=income_statement.head(3).to_markdown(),table_2=balance_sheet.head(3).to_markdown())
10 
11 
12 print(preamble)

Output

You are an expert who answers the user's question in complete sentences. You are working with two pandas dataframe in Python. Ensure your output is a string.
Here is a preview of the `income_statement.csv` dataframe:
|    |   Unnamed: 0 | index                 |   RevenueFromContractWithCustomerExcludingAssessedTax |   CostOfGoodsAndServicesSold |   GrossProfit |   ResearchAndDevelopmentExpense |   SellingGeneralAndAdministrativeExpense |   OperatingExpenses |   OperatingIncomeLoss |   NonoperatingIncomeExpense |   IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest |   IncomeTaxExpenseBenefit |   NetIncomeLoss |   EarningsPerShareBasic |   EarningsPerShareDiluted |   WeightedAverageNumberOfSharesOutstandingBasic |   WeightedAverageNumberOfDilutedSharesOutstanding |
|---:|-------------:|:----------------------|------------------------------------------------------:|-----------------------------:|--------------:|--------------------------------:|-----------------------------------------:|--------------------:|----------------------:|----------------------------:|----------------------------------------------------------------------------------------------:|--------------------------:|----------------:|------------------------:|--------------------------:|------------------------------------------------:|--------------------------------------------------:|
|  0 |            0 | 2017-10-01-2018-09-29 |                                          265595000000 |                  1.63756e+11 |  101839000000 |                      1.4236e+10 |                               1.6705e+10 |          3.0941e+10 |            7.0898e+10 |                   2.005e+09 |                                                                                    7.2903e+10 |                1.3372e+10 |     59531000000 |                    3    |                      2.98 |                                     1.98215e+10 |                                       2.00004e+10 |
|  1 |            1 | 2018-09-30-2018-12-29 |                                           84310000000 |                nan           |   32031000000 |                    nan          |                             nan          |        nan          |          nan          |                 nan         |                                                                                  nan          |              nan          |     19965000000 |                    1.05 |                      1.05 |                                   nan           |                                     nan           |
|  2 |            2 | 2018-09-30-2019-09-28 |                                          260174000000 |                  1.61782e+11 |   98392000000 |                      1.6217e+10 |                               1.8245e+10 |          3.4462e+10 |            6.393e+10  |                   1.807e+09 |                                                                                    6.5737e+10 |                1.0481e+10 |     55256000000 |                    2.99 |                      2.97 |                                     1.84713e+10 |                                       1.85957e+10 |
Here is a preview of the `balance_sheet.csv` dataframe:
|    |   Unnamed: 0 | index      |   CashAndCashEquivalentsAtCarryingValue |   MarketableSecuritiesCurrent |   AccountsReceivableNetCurrent |   InventoryNet |   NontradeReceivablesCurrent |   OtherAssetsCurrent |   AssetsCurrent |   MarketableSecuritiesNoncurrent |   PropertyPlantAndEquipmentNet |   OtherAssetsNoncurrent |   AssetsNoncurrent |        Assets |   AccountsPayableCurrent |   OtherLiabilitiesCurrent |   ContractWithCustomerLiabilityCurrent |   CommercialPaper |   LongTermDebtCurrent |   LiabilitiesCurrent |   LongTermDebtNoncurrent |   OtherLiabilitiesNoncurrent |   LiabilitiesNoncurrent |   Liabilities |   CommitmentsAndContingencies |   CommonStocksIncludingAdditionalPaidInCapital |   RetainedEarningsAccumulatedDeficit |   AccumulatedOtherComprehensiveIncomeLossNetOfTax |   StockholdersEquity |   LiabilitiesAndStockholdersEquity |
|---:|-------------:|:-----------|----------------------------------------:|------------------------------:|-------------------------------:|---------------:|-----------------------------:|---------------------:|----------------:|---------------------------------:|-------------------------------:|------------------------:|-------------------:|--------------:|-------------------------:|--------------------------:|---------------------------------------:|------------------:|----------------------:|---------------------:|-------------------------:|-----------------------------:|------------------------:|--------------:|------------------------------:|-----------------------------------------------:|-------------------------------------:|--------------------------------------------------:|---------------------:|-----------------------------------:|
|  0 |            0 | 2017-09-30 |                            nan          |                  nan          |                   nan          |    nan         |                 nan          |         nan          |   nan           |                    nan           |                   nan          |            nan          |      nan           | nan           |             nan          |               nan         |                            nan         |        nan        |           nan         |        nan           |             nan          |                 nan          |            nan          | nan           |                           nan |                                   nan          |                         nan          |                                        nan        |         134047000000 |                      nan           |
|  1 |            1 | 2018-09-29 |                            nan          |                  nan          |                   nan          |    nan         |                 nan          |         nan          |   nan           |                    nan           |                   nan          |            nan          |      nan           | nan           |             nan          |               nan         |                            nan         |        nan        |           nan         |        nan           |             nan          |                 nan          |            nan          | nan           |                           nan |                                   nan          |                         nan          |                                        nan        |         107147000000 |                      nan           |
|  2 |            2 | 2019-09-28 |                              4.8844e+10 |                    5.1713e+10 |                     2.2926e+10 |      4.106e+09 |                   2.2878e+10 |           1.2352e+10 |     1.62819e+11 |                      1.05341e+11 |                     3.7378e+10 |              3.2978e+10 |        1.75697e+11 |   3.38516e+11 |               4.6236e+10 |                 3.772e+10 |                              5.522e+09 |          5.98e+09 |             1.026e+10 |          1.05718e+11 |               9.1807e+10 |                   5.0503e+10 |              1.4231e+11 |   2.48028e+11 |                             0 |                                     4.5174e+10 |                           4.5898e+10 |                                         -5.84e+08 |          90488000000 |                        3.38516e+11 |

PYTHON

1 for qsn,val in question_dict.items():
2     print(f'question:{qsn}')
3     question = val[0]
4     output = cohere_agent(question, preamble, tools, verbose=True)

Output

question:q1
running 0th step.
I will use the provided code to find the ratio of the largest stockholders equity to the smallest revenue.
running 1th step.
= running tool run_python_code, with parameters: {'code': 'import pandas as pd\n\n# Read the CSV files into dataframes\nincome_statement = pd.read_csv(\'income_statement.csv\')\nbalance_sheet = pd.read_csv(\'balance_sheet.csv\')\n\n# Find the smallest revenue\nsmallest_revenue = income_statement[\'RevenueFromContractWithCustomerExcludingAssessedTax\'].min()\n\n# Find the largest stockholders equity\nlargest_stockholders_equity = balance_sheet[\'StockholdersEquity\'].max()\n\n# Calculate the ratio\nratio = largest_stockholders_equity / smallest_revenue\nprint(f"The ratio of the largest stockholders equity to the smallest revenue is {ratio:.2f}")'}
== tool results: [{'python_answer': 'The ratio of the largest stockholders equity to the smallest revenue is 2.49\n'}]
The ratio of the largest stockholders equity to the smallest revenue is 2.49.

Error Resilience

In the previous example over single table, the model successfully answered your questions. However, the model may not always have access to the preview of the data. You will see that when we remove the preview from the preamble, the model runs into an error and is not produce the answer. We will solve this problem with two different ways:

Asking the model to keep trying until it fixes the issue.
Giving the model another tool to view the data and telling it to preview the data before writing code.

You will see that the second method is able to come to the answer with fewer steps.

PYTHON

1 preamble = """
2 You are an expert who answers the user's question. You are working with a pandas dataframe in Python. The name of the dataframe is `income_statement.csv`.
3 """
4 
5 question1 = "what is the highest value of cost of goods and service?"
6 
7 output = cohere_agent(question1, preamble, tools, verbose=True)

Output

running 0th step.
I will use Python to find the highest value of 'Cost of Goods and Service' in the 'income_statement.csv' file.
running 1th step.
= running tool run_python_code, with parameters: {'code': 'import pandas as pd\n\ndf = pd.read_csv(\'income_statement.csv\')\n\n# Find the highest value of \'Cost of Goods and Service\'\nhighest_cost = df[\'Cost of Goods and Service\'].max()\n\nprint(f"The highest value of \'Cost of Goods and Service\' is: {highest_cost}")'}
== tool results: [{'python_answer': "KeyError('Cost of Goods and Service')"}]
Sorry, there is no column named 'Cost of Goods and Service' in the 'income_statement.csv' file.

As you see above, the model failed to execute because it assumed certain column names but they turned out to be wrong. One simple fix is to tell the model to continue to solve the problem in the face of error.

PYTHON

1 preamble = """
2 You are an expert who answers the user's question. You are working with a pandas dataframe in Python. The name of the dataframe is `income_statement.csv`.
3 If you run into error, keep trying until you fix it. You may need to view the data to understand the error.
4 """
5 
6 question1 = "what is the highest value of cost of goods and service?"
7 
8 output = cohere_agent(question1, preamble, tools, verbose=True)

Output

running 0th step.
I will use Python to find the highest value of cost of goods and service.
running 1th step.
= running tool run_python_code, with parameters: {'code': 'import pandas as pd\n\ndf = pd.read_csv(\'income_statement.csv\')\n\n# Find the highest value of \'Cost of Goods and Services\'\nhighest_cost = df[\'Cost of Goods and Services\'].max()\n\nprint(f"The highest value of \'Cost of Goods and Services\' is: {highest_cost}")'}
== tool results: [{'python_answer': "KeyError('Cost of Goods and Services')"}]
I encountered an error. I will now view the data to understand the error.
running 2th step.
= running tool run_python_code, with parameters: {'code': "import pandas as pd\n\ndf = pd.read_csv('income_statement.csv')\n\nprint(df.columns)"}
== tool results: [{'python_answer': "Index(['Unnamed: 0', 'index',\n       'RevenueFromContractWithCustomerExcludingAssessedTax',\n       'CostOfGoodsAndServicesSold', 'GrossProfit',\n       'ResearchAndDevelopmentExpense',\n       'SellingGeneralAndAdministrativeExpense', 'OperatingExpenses',\n       'OperatingIncomeLoss', 'NonoperatingIncomeExpense',\n       'IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest',\n       'IncomeTaxExpenseBenefit', 'NetIncomeLoss', 'EarningsPerShareBasic',\n       'EarningsPerShareDiluted',\n       'WeightedAverageNumberOfSharesOutstandingBasic',\n       'WeightedAverageNumberOfDilutedSharesOutstanding'],\n      dtype='object')\n"}]
I found that the column name is 'CostOfGoodsAndServicesSold'. I will now use this to find the highest value.
running 3th step.
= running tool run_python_code, with parameters: {'code': 'import pandas as pd\n\ndf = pd.read_csv(\'income_statement.csv\')\n\n# Find the highest value of \'Cost of Goods and Services\'\nhighest_cost = df[\'CostOfGoodsAndServicesSold\'].max()\n\nprint(f"The highest value of \'Cost of Goods and Services\' is: {highest_cost}")'}
== tool results: [{'python_answer': "The highest value of 'Cost of Goods and Services' is: 169559000000.0\n"}]
The highest value of 'Cost of Goods and Services' is 169559000000.0.

Add Viewing Tool

What if we directly give the model the ability to view the data as a tool so that it can explicitly use it instead of indirectly figuring it out?

PYTHON

1 def view_csv_data(path: str) -> dict:
2     """
3     Function to view the head, tail and shape of a given csv file.
4     """
5     df = pd.read_csv(path)
6 
7     return {
8         "head": df.head().to_string(),
9         "tail": df.tail().to_string(),
10         "shape": str(df.shape),
11     }
12 
13 functions_map = {
14     "run_python_code": run_python_code,
15     "view_csv_data": view_csv_data
16 }
17 
18 tools = [
19     {
20         "name": "run_python_code",
21         "description": "given a python code, runs it",
22         "parameter_definitions": {
23             "code": {
24                 "description": "executable python code",
25                 "type": "str",
26                 "required": True
27             }
28         }
29     },
30     {
31         "name": "view_csv_data",
32         "description": "give path to csv data and get head, tail and shape of the data",
33         "parameter_definitions": {
34             "path": {
35                 "description": "path to csv",
36                 "type": "str",
37                 "required": True
38             }
39         }
40     },
41 ]

PYTHON

1 preamble = """
2 You are an expert who answers the user's question. You are working with a pandas dataframe in Python. The name of the dataframe is `income_statement.csv`.
3 Always view the data first to write flawless code.
4 """
5 
6 question1 = "what is the highest value of cost of goods and service?"
7 
8 output = cohere_agent(question1, preamble, tools, verbose=True)

Output

running 0th step.
I will first view the data and then write and execute Python code to find the highest value of cost of goods and service.
running 1th step.
= running tool view_csv_data, with parameters: {'path': 'income_statement.csv'}
== tool results: [{'head': '   Unnamed: 0                  index  RevenueFromContractWithCustomerExcludingAssessedTax  CostOfGoodsAndServicesSold   GrossProfit  ResearchAndDevelopmentExpense  SellingGeneralAndAdministrativeExpense  OperatingExpenses  OperatingIncomeLoss  NonoperatingIncomeExpense  IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest  IncomeTaxExpenseBenefit  NetIncomeLoss  EarningsPerShareBasic  EarningsPerShareDiluted  WeightedAverageNumberOfSharesOutstandingBasic  WeightedAverageNumberOfDilutedSharesOutstanding\n0           0  2017-10-01-2018-09-29                                         265595000000                1.637560e+11  101839000000                   1.423600e+10                            1.670500e+10       3.094100e+10         7.089800e+10               2.005000e+09                                                                                 7.290300e+10             1.337200e+10    59531000000                   3.00                     2.98                                   1.982151e+10                                     2.000044e+10\n1           1  2018-09-30-2018-12-29                                          84310000000                         NaN   32031000000                            NaN                                     NaN                NaN                  NaN                        NaN                                                                                          NaN                      NaN    19965000000                   1.05                     1.05                                            NaN                                              NaN\n2           2  2018-09-30-2019-09-28                                         260174000000                1.617820e+11   98392000000                   1.621700e+10                            1.824500e+10       3.446200e+10         6.393000e+10               1.807000e+09                                                                                 6.573700e+10             1.048100e+10    55256000000                   2.99                     2.97                                   1.847134e+10                                     1.859565e+10\n3           3  2018-12-30-2019-03-30                                          58015000000                         NaN   21821000000                            NaN                                     NaN                NaN                  NaN                        NaN                                                                                          NaN                      NaN    11561000000                   0.62                     0.61                                            NaN                                              NaN\n4           4  2019-03-31-2019-06-29                                          53809000000                         NaN   20227000000                            NaN                                     NaN                NaN                  NaN                        NaN                                                                                          NaN                      NaN    10044000000                   0.55                     0.55                                            NaN                                              NaN', 'tail': '    Unnamed: 0                  index  RevenueFromContractWithCustomerExcludingAssessedTax  CostOfGoodsAndServicesSold   GrossProfit  ResearchAndDevelopmentExpense  SellingGeneralAndAdministrativeExpense  OperatingExpenses  OperatingIncomeLoss  NonoperatingIncomeExpense  IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest  IncomeTaxExpenseBenefit  NetIncomeLoss  EarningsPerShareBasic  EarningsPerShareDiluted  WeightedAverageNumberOfSharesOutstandingBasic  WeightedAverageNumberOfDilutedSharesOutstanding\n6            6  2019-09-29-2019-12-28                                          91819000000                         NaN   35217000000                            NaN                                     NaN                NaN                  NaN                        NaN                                                                                          NaN                      NaN    22236000000                   1.26                     1.25                                            NaN                                              NaN\n7            7  2019-09-29-2020-09-26                                         274515000000                1.695590e+11  104956000000                   1.875200e+10                            1.991600e+10       3.866800e+10         6.628800e+10                803000000.0                                                                                 6.709100e+10             9.680000e+09    57411000000                   3.31                     3.28                                   1.735212e+10                                     1.752821e+10\n8            8  2019-12-29-2020-03-28                                          58313000000                         NaN   22370000000                            NaN                                     NaN                NaN                  NaN                        NaN                                                                                          NaN                      NaN    11249000000                   0.64                     0.64                                            NaN                                              NaN\n9            9  2020-03-29-2020-06-27                                          59685000000                         NaN   22680000000                            NaN                                     NaN                NaN                  NaN                        NaN                                                                                          NaN                      NaN    11253000000                   0.65                     0.65                                            NaN                                              NaN\n10          10  2020-06-28-2020-09-26                                          64698000000                         NaN   24689000000                            NaN                                     NaN                NaN                  NaN                        NaN                                                                                          NaN                      NaN    12673000000                   0.74                     0.73                                            NaN                                              NaN', 'shape': '(11, 17)'}]
The column name is 'CostOfGoodsAndServicesSold'. I will now write and execute Python code to find the highest value in this column.
running 2th step.
= running tool run_python_code, with parameters: {'code': "import pandas as pd\n\ndf = pd.read_csv('income_statement.csv')\n\nprint(df['CostOfGoodsAndServicesSold'].max())"}
== tool results: [{'python_answer': '169559000000.0\n'}]
The highest value of cost of goods and services is 169559000000.0.

By being prescriptive, we were able to cut down a step and get to the answer faster.

1	import os
2	from typing import List
3
4	import cohere
5	import langchain
6	import langchain_core
7	import langchain_experimental
8	import pandas as pd
9	from langchain.agents import Tool
10	from langchain_core.pydantic_v1 import BaseModel, Field
11	from langchain_experimental.utilities import PythonREPL

1	# Uncomment if you need to install the following packages
2	# !pip install --quiet langchain langchain_experimental cohere --upgrade

1	# versions
2	print('cohere version:', cohere.__version__)
3	print('langchain version:', langchain.__version__)
4	print('langchain_core version:', langchain_core.__version__)
5	print('langchain_experimental version:', langchain_experimental.__version__)

1	COHERE_API_KEY = os.environ["COHERE_API_KEY"]
2	CHAT_URL= "https://api.cohere.ai/v1/chat"
3	COHERE_MODEL = 'command-r-plus'
4	co = cohere.Client(api_key=COHERE_API_KEY)

1	income_statement = pd.read_csv('income_statement.csv')
2	balance_sheet = pd.read_csv('balance_sheet.csv')

1	python_repl = PythonREPL()
2	python_tool = Tool(
3	name="python_repl",
4	description="Executes python code and returns the result. The code runs in a static sandbox without interactive mode, so print output or save output to a file.",
5	func=python_repl.run,
6	)
7	python_tool.name = "python_interpreter"
8
9	class ToolInput(BaseModel):
10	code: str = Field(description="Python code to execute.")
11	python_tool.args_schema = ToolInput
12
13	def run_python_code(code: str) -> dict:
14	"""
15	Function to run given python code
16	"""
17	input_code = ToolInput(code=code)
18	return {'python_answer': python_tool.func(input_code.code)}
19
20	functions_map = {
21	"run_python_code": run_python_code,
22	}
23
24	tools = [
25	{
26	"name": "run_python_code",
27	"description": "given a python code, runs it",
28	"parameter_definitions": {
29	"code": {
30	"description": "executable python code",
31	"type": "str",
32	"required": True
33	}
34	}
35	},]

1	def cohere_agent(
2	message: str,
3	preamble: str,
4	tools: List[dict],
5	force_single_step=False,
6	verbose: bool = False,
7	) -> str:
8	"""
9	Function to handle multi-step tool use api.
10
11	Args:
12	message (str): The message to send to the Cohere AI model.
13	preamble (str): The preamble or context for the conversation.
14	tools (list of dict): List of tools to use in the conversation.
15	verbose (bool, optional): Whether to print verbose output. Defaults to False.
16
17	Returns:
18	str: The final response from the call.
19	"""
20
21	counter = 1
22
23	response = co.chat(
24	model=COHERE_MODEL,
25	message=message,
26	preamble=preamble,
27	tools=tools,
28	force_single_step=force_single_step,
29	)
30
31	if verbose:
32	print(f"\nrunning 0th step.")
33	print(response.text)
34
35	while response.tool_calls:
36	tool_results = []
37
38	if verbose:
39	print(f"\nrunning {counter}th step.")
40
41	for tool_call in response.tool_calls:
42	output = functions_map[tool_call.name](**tool_call.parameters)
43	outputs = [output]
44	tool_results.append({"call": tool_call, "outputs": outputs})
45
46	if verbose:
47	print(
48	f"= running tool {tool_call.name}, with parameters: {tool_call.parameters}"
49	)
50	print(f"== tool results: {outputs}")
51
52	response = co.chat(
53	model=COHERE_MODEL,
54	message="",
55	chat_history=response.chat_history,
56	preamble=preamble,
57	tools=tools,
58	force_single_step=force_single_step,
59	tool_results=tool_results,
60	)
61
62	if verbose:
63	print(response.text)
64
65	counter += 1
66
67	return response.text
68
69
70	# test
71	output = cohere_agent("can you use python to answer 1 + 1", None, tools, verbose=True)

1	question_dict ={
2	'q1': ['what is the highest value of cost of goods and service?',169559000000],
3	'q2': ['what is the largest gross profit margin?',0.3836194330595236],
4	'q3': ['what is the minimum ratio of operating income loss divided by non operating income expense?',35.360599]
5	}

1	preamble = """
2	You are an expert who answers the user's question. You are working with a pandas dataframe in Python. The name of the dataframe is `income_statement.csv`.
3	Here is a preview of the dataframe:
4	{head_df}
5	""".format(head_df=income_statement.head(3).to_markdown())
6
7	print(preamble)

1	for qsn,val in question_dict.items():
2	print(f'question:{qsn}')
3	question = val[0]
4	answer = val[1]
5	output = cohere_agent(question, preamble, tools, verbose=True)
6	print(f'GT Answer:{val[1]}')
7	print('-'*50)

1	question_dict ={
2	'q1': ['what is the ratio of the largest stockholders equity to the smallest revenue'],
3	}

1	# get the largest stockholders equity
2	x = balance_sheet['StockholdersEquity'].astype(float).max()
3	print(f"The largest stockholders equity value is: {x}")
4
5	# get the smallest revenue
6	y = income_statement['RevenueFromContractWithCustomerExcludingAssessedTax'].astype(float).min()
7	print(f"The smallest revenue value is: {y}")
8
9	# compute the ratio
10	ratio = x/y
11	print(f"Their ratio is: {ratio}")

1	preamble = """
2	You are an expert who answers the user's question in complete sentences. You are working with two pandas dataframe in Python. Ensure your output is a string.
3
4	Here is a preview of the `income_statement.csv` dataframe:
5	{table_1}
6
7	Here is a preview of the `balance_sheet.csv` dataframe:
8	{table_2}
9	""".format(table_1=income_statement.head(3).to_markdown(),table_2=balance_sheet.head(3).to_markdown())
10
11
12	print(preamble)

1	preamble = """
2	You are an expert who answers the user's question. You are working with a pandas dataframe in Python. The name of the dataframe is `income_statement.csv`.
3	"""
4
5	question1 = "what is the highest value of cost of goods and service?"
6
7	output = cohere_agent(question1, preamble, tools, verbose=True)

1	preamble = """
2	You are an expert who answers the user's question. You are working with a pandas dataframe in Python. The name of the dataframe is `income_statement.csv`.
3	If you run into error, keep trying until you fix it. You may need to view the data to understand the error.
4	"""
5
6	question1 = "what is the highest value of cost of goods and service?"
7
8	output = cohere_agent(question1, preamble, tools, verbose=True)

1	def view_csv_data(path: str) -> dict:
2	"""
3	Function to view the head, tail and shape of a given csv file.
4	"""
5	df = pd.read_csv(path)
6
7	return {
8	"head": df.head().to_string(),
9	"tail": df.tail().to_string(),
10	"shape": str(df.shape),
11	}
12
13	functions_map = {
14	"run_python_code": run_python_code,
15	"view_csv_data": view_csv_data
16	}
17
18	tools = [
19	{
20	"name": "run_python_code",
21	"description": "given a python code, runs it",
22	"parameter_definitions": {
23	"code": {
24	"description": "executable python code",
25	"type": "str",
26	"required": True
27	}
28	}
29	},
30	{
31	"name": "view_csv_data",
32	"description": "give path to csv data and get head, tail and shape of the data",
33	"parameter_definitions": {
34	"path": {
35	"description": "path to csv",
36	"type": "str",
37	"required": True
38	}
39	}
40	},
41	]

1	preamble = """
2	You are an expert who answers the user's question. You are working with a pandas dataframe in Python. The name of the dataframe is `income_statement.csv`.
3	Always view the data first to write flawless code.
4	"""
5
6	question1 = "what is the highest value of cost of goods and service?"
7
8	output = cohere_agent(question1, preamble, tools, verbose=True)