Financial CSV Agent with Langchain
Notebook Overview
Motivation
Tabular data reasoning continues to be an exciting application of Agents, particularly in the financial domain, where multiple reasoning steps may be needed to identify the right answer. For example, a financial analyst working with financial statements may be interested in computing various financial ratios with natural language queries.
Some examples may include:
- ROE (Return on Equity) = Net Income / Shareholder’s Equity
- Net Profit Margin = Net Income / Revenue
- Asset Turnover = Revenue / Average Total Assets
- Financial Leverage = Average Total Assets / Shareholder’s Equity
Having an Agent which is able to correctly compute these and other ratios would be a great help for any analyst in the field of Finance.
Objective
In this notebook we explore how to setup a Cohere ReAct Agent to answer questions over tables in Apple’s SEC10K 2020 form. We show how this can be done with two variants of a Langchain Python tool, one that requires you to pass the full path of the dataframe and another that requires the dataframe objects to be loaded in memory. While there is no major difference between the two approaches, we present both to show how to manage files that are loaded in memory vs files in storage.
Setup
Introduction
The aim of this notebook is to showcase how Cohere Langchain Agents can be used to answer questions over tabular data. This notebook assumes the input is a set of csv files extracted from Apple’s SEC 10K filings.
Data Loading
We use the sec-api to download the income statement and balance sheet from the SEC 10K .htm file. Please note that the tables need to be parsed such that the index is numerical as the python code generation struggles to filter based on index. We have processed the tables and provided them for you. They can be found here.
QnA over Single Table
Agent with Python Tool
In the example below, we show how the python tool can be used to load a dataframe and extract information from it. To do this successfully we need to:
- Pass the file name to the preamble so the model knows how to load the dataframe.
- Pass a preview of the dataframe in the preamble so the model knows which columns/rows to query.
First, let’s implement the ReAct agent.
Then, we define the dictionary including the questions we want the Agent to answer, and their answer.
Let’s now see how the Agent answers each of the questions.
Nice! The agent uses the Python tool to write the code to access the data in the tables, and perform the required calculations.
Agent with Python Tool that takes tables as input
In the example above, the model needs to load the dataframe in the python call before carrying out operations. In this example, we show how to pass the dataframes to the python tool so it has the file already loaded.
Let’s loop again over the same dictionary of questions.
Also in this case, the Agent correctly answers all the questions.
QnA over Multiple Tables
We now make the task for the Agent more complicated, by asking it questions whose answer can be computed only by retrieving relevant information from multiple tables.
We now define a new question.
The answer to this question can be obtained only by accessing both the balance sheet and the income statement, as shown below:
Let’s now get the answer from the Agent.
The Agent defined a plan (“write and execute Python code to find the largest and smallest values in the relevant columns of the dataframes, then calculate the ratio”) and executed it. It made some mistakes when coding, that resulted in NameError errors, but it fixed them and finally got to the correct answer