Financial CSV Agent with Native Multi-Step Cohere API
Notebook Overview
Motivation
Tabular data reasoning continues to be an exciting application of Agents, particularly in the financial domain, where multiple reasoning steps may be needed to identify the right answer. For example, a financial analyst working with financial statements may be interested in computing various financial ratios with natural language queries.
Some examples may include:
- ROE (Return on Equity) = Net Income / Shareholder’s Equity
- Net Profit Margin = Net Income / Revenue
- Asset Turnover = Revenue / Average Total Assets
- Financial Leverage = Average Total Assets / Shareholder’s Equity
Having an Agent which is able to correctly compute these and other ratios would be a great help for any analyst in the field of Finance.
Objective
In this notebook we explore how to setup a Cohere Agent to answer questions over tables in Apple’s SEC10K 2020 form. Financial CSV Agent already showed how to use Langchain to ask questions about your data. This notebook will demonstrate how you can build the same agent using Cohere’s native API with Langchain Python tool. We will also explore how to make your agent more resilient to errors.
Setup
API Key
Data Loading
Unnamed: 0 | index | RevenueFromContractWithCustomerExcludingAssessedTax | CostOfGoodsAndServicesSold | GrossProfit | ResearchAndDevelopmentExpense | SellingGeneralAndAdministrativeExpense | OperatingExpenses | OperatingIncomeLoss | NonoperatingIncomeExpense | IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest | IncomeTaxExpenseBenefit | NetIncomeLoss | EarningsPerShareBasic | EarningsPerShareDiluted | WeightedAverageNumberOfSharesOutstandingBasic | WeightedAverageNumberOfDilutedSharesOutstanding | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2017-10-01-2018-09-29 | 265595000000 | 1.637560e+11 | 101839000000 | 1.423600e+10 | 1.670500e+10 | 3.094100e+10 | 7.089800e+10 | 2.005000e+09 | 7.290300e+10 | 1.337200e+10 | 59531000000 | 3.00 | 2.98 | 1.982151e+10 | 2.000044e+10 |
1 | 1 | 2018-09-30-2018-12-29 | 84310000000 | NaN | 32031000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 19965000000 | 1.05 | 1.05 | NaN | NaN |
Unnamed: 0 | index | CashAndCashEquivalentsAtCarryingValue | MarketableSecuritiesCurrent | AccountsReceivableNetCurrent | InventoryNet | NontradeReceivablesCurrent | OtherAssetsCurrent | AssetsCurrent | MarketableSecuritiesNoncurrent | … | LongTermDebtNoncurrent | OtherLiabilitiesNoncurrent | LiabilitiesNoncurrent | Liabilities | CommitmentsAndContingencies | CommonStocksIncludingAdditionalPaidInCapital | RetainedEarningsAccumulatedDeficit | AccumulatedOtherComprehensiveIncomeLossNetOfTax | StockholdersEquity | LiabilitiesAndStockholdersEquity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2017-09-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | … | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 134047000000 | NaN |
1 | 1 | 2018-09-29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | … | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 107147000000 | NaN |
2 rows × 30 columns
Define Python Tool
Here we define the python tool using langchain’s PythonREPL. We also define functions_map
that will later be used by the Cohere Agent to correctly map function name to the actual function. Lastly, we define the tools that will be passed in the Cohere API.
Create Cohere Agent
As Multi-Step Tool Use shows, you have a lot of flexiblity on how you can customize and interact with the cohere agent. Here I am creating a wrapper so that it automatically determines when to stop calling the tools and output final answer. It will run maximum of 15 steps.
QnA over Single Table
In the example below, we show how the Python tool can be used to load a dataframe and extract information from it. To do this successfully we need to:
- Pass the file name to the preamble so the model knows how to load the dataframe
- Pass a preview of the dataframe in the preamble so the model knows which columns/rows to query
We will ask the following questions given income statement data.
- What is the highest value of cost of goods and service?
- What is the largest gross profit margin?
- What is the minimum ratio of operating income loss divided by non operating income expense?
You are an expert who answers the user’s question. You are working with a pandas dataframe in Python. The name of the dataframe is income_statement.csv
.
Here is a preview of the dataframe:
QnA over Multiple Tables
We now make the task for the Agent more complicated by asking a question that can be only answered by retrieving relevant information from multiple tables:
- Q: What is the ratio of the largest stockholders equity to the smallest revenue?
As you will see below, this question can be obtained only by accessing both the balance sheet and the income statement.
Error Resilience
In the previous example over single table, the model successfully answered your questions. However, the model may not always have access to the preview of the data. You will see that when we remove the preview from the preamble, the model runs into an error and is not produce the answer. We will solve this problem with two different ways:
- Asking the model to keep trying until it fixes the issue.
- Giving the model another tool to view the data and telling it to preview the data before writing code.
You will see that the second method is able to come to the answer with fewer steps.
As you see above, the model failed to execute because it assumed certain column names but they turned out to be wrong. One simple fix is to tell the model to continue to solve the problem in the face of error.
Add Viewing Tool
What if we directly give the model the ability to view the data as a tool so that it can explicitly use it instead of indirectly figuring it out?
By being prescriptive, we were able to cut down a step and get to the answer faster.