PDF Extractor with Native Multi Step Tool Use

Jason Jung

Objective

Generally, users are limited to text inputs when using large language models (LLMs), but agents enable the model to do more than ingest or output text information. Using tools, LLMs can call other APIs, save data, and much more. In this notebook, we will explore how we can leverage agents to extract information from PDFs. We will mimic an application where the user uploads PDF files and the agent extracts useful information. This can be useful when the text information has varying formats and you need to extract various types of information.

The directory contains a simple_invoice.pdf file. Every time a user uploads a document, the agent extracts the key fields total_amount and invoice_number and saves them to a CSV file, which can then be used by another application. This demo extracts only two fields, but the example can be extended to extract many more.

Steps

  1. The extract_pdf() function extracts text from the PDF using the unstructured package. Other packages, such as PyPDF2, would also work.
  2. The extracted text is added to the prompt so the model can “see” the document.
  3. The agent summarizes the document and passes the summary to the convert_to_json() function, which makes another call to a Command model to convert the summary to JSON. This separation of tasks is useful when the document is long and complicated: we first distill the information, then ask a second model call to convert the distilled text into a JSON object, so each call focuses on its own task without suffering from a long context.
  4. The JSON object is then checked to make sure all keys are present and is saved as a CSV file. When the document is too long or the task too complex, the model may fail to extract all the information. These checks are useful because they return feedback to the model, so it can adjust the arguments it passes to the tool and retry.
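
The check in step 4 can be sketched on its own, without calling any model. The helper below, check_fields, is a hypothetical name used only for illustration and is not part of the notebook. It parses a candidate JSON string and returns either a success message or an error string naming the missing keys. Returning the error as text rather than raising is what lets the agent read the message and retry.

```python
import json

# Keys the notebook treats as mandatory.
MANDATORY_FIELDS = ["total_amount", "invoice_number"]


def check_fields(raw: str, required=MANDATORY_FIELDS) -> dict:
    """Validate a JSON string and describe any problem in plain text."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        return {"result": f"ERROR: Could not load the result as json: {e}"}

    if not isinstance(parsed, dict):
        return {"result": "ERROR: Expected a json object."}

    # Collect the missing keys so the feedback tells the model what to fix.
    missing = [key for key in required if key not in parsed]
    if missing:
        return {"result": f"ERROR: Keys are missing: {missing}"}
    return {"result": "SUCCESS"}


print(check_fields('{"total_amount": "$115.00", "invoice_number": "0852"}'))
print(check_fields('{"total_amount": "$115.00"}'))
```

The second call reports invoice_number as missing, which is the kind of message the agent would see as a tool result.
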
PYTHON
import json
import os

import cohere
import pandas as pd
from unstructured.partition.pdf import partition_pdf
PYTHON
# uncomment to install dependencies
# !pip install cohere unstructured
PYTHON
# versions
print('cohere version:', cohere.__version__)
Output
cohere version: 5.5.1

Setup

PYTHON
COHERE_API_KEY = os.environ.get("CO_API_KEY")
COHERE_MODEL = 'command-r-plus'
co = cohere.Client(api_key=COHERE_API_KEY)
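
Because os.environ.get returns None when the variable is unset, a missing key may only surface later, possibly as a less obvious error, depending on the SDK version. A minimal sketch of a fail-fast check follows; require_env is a hypothetical helper, not part of the notebook or the Cohere SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


# Possible usage in the setup cell (left commented out so the sketch runs anywhere):
# COHERE_API_KEY = require_env("CO_API_KEY")
```
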

Data

The sample invoice data is from https://unidoc.io/media/simple-invoices/simple_invoice.pdf.

Tool

Here we define the tool that converts the summary of the PDF into a JSON object. It then checks that all necessary keys are present and saves the result as a CSV file.

PYTHON
def convert_to_json(text: str) -> dict:
    """
    Given a text, convert it to a json object and save it as csv.

    Args:
        text (str): The text to extract information from.

    Returns:
        dict: A dictionary containing the result of the conversion process.
    """

    MANDATORY_FIELDS = [
        "total_amount",
        "invoice_number",
    ]

    # Doubled braces are literal braces after str.format(); {text} is filled in below.
    message = """# Instruction
    Given the text, convert to json object with the following keys:
    total_amount, invoice_number

    # Output format json:
    {{
        "total_amount": "<extracted invoice total amount>",
        "invoice_number": "<extracted invoice number>"
    }}

    Do not output code blocks.

    # Extracted PDF
    {text}
    """

    # Second model call: turn the agent's summary into a json string.
    result = co.chat(
        message=message.format(text=text), model=COHERE_MODEL, preamble=None
    ).text

    try:
        result = json.loads(result)
        # check if all keys are present
        if not all(key in result for key in MANDATORY_FIELDS):
            return {"result": f"ERROR: Keys are missing. Please check your result {result}"}

        # Save the single record as a one-row csv file.
        df = pd.DataFrame(result, index=[0])
        df.to_csv("output.csv", index=False)
        return {"result": "SUCCESS. All steps have been completed."}

    except Exception as e:
        # Return the error as text so the agent can read it and retry.
        return {"result": f"ERROR: Could not load the result as json. Please check the result: {result} and ERROR: {e}"}
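
Even with the "Do not output code blocks" instruction, a model may occasionally wrap its JSON in a Markdown code fence, which makes json.loads fail. The sketch below shows one way to tolerate that before parsing. parse_model_json is a hypothetical helper, not part of the notebook, and it only handles a single fence around the whole output.

```python
import json

# A Markdown code fence, built this way so the marker does not appear literally here.
FENCE = "`" * 3


def parse_model_json(raw: str) -> dict:
    """Parse JSON from model output, tolerating one surrounding code fence."""
    cleaned = raw.strip()
    if cleaned.startswith(FENCE):
        # Drop the first line: the opening fence plus an optional language tag.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
    if cleaned.rstrip().endswith(FENCE):
        # Drop the closing fence.
        cleaned = cleaned.rstrip()[: -len(FENCE)]
    return json.loads(cleaned)


fenced = FENCE + 'json\n{"total_amount": "$115.00", "invoice_number": "0852"}\n' + FENCE
print(parse_model_json(fenced))
```

If the remaining text is still not valid JSON, json.loads raises as usual, so the existing error path in convert_to_json would still report the problem to the agent.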

Cohere Agent

Below is a Cohere agent that uses the multi-step tool use API. It is equipped with the convert_to_json tool.

PYTHON
def cohere_agent(
    message: str,
    preamble: str,
    verbose: bool = False,
) -> str:
    """
    Function to handle the multi-step tool use API.

    Args:
        message (str): The message to send to the Cohere AI model.
        preamble (str): The preamble or context for the conversation.
        verbose (bool, optional): Whether to print verbose output. Defaults to False.

    Returns:
        str: The final response from the call.
    """

    # Maps tool names the model may request to the Python functions that run them.
    functions_map = {
        "convert_to_json": convert_to_json,
    }

    tools = [
        {
            "name": "convert_to_json",
            "description": "Given a text, convert it to json object.",
            "parameter_definitions": {
                "text": {
                    "description": "text to be converted into json",
                    "type": "str",
                    "required": True,
                },
            },
        }
    ]

    counter = 1

    # Step 0: the model reads the prompt and decides whether to call a tool.
    response = co.chat(
        model=COHERE_MODEL,
        message=message,
        preamble=preamble,
        tools=tools,
    )

    if verbose:
        print("\nrunning step 0")
        print(response.text)

    # Keep going for as long as the model requests tool calls.
    while response.tool_calls:
        tool_results = []

        if verbose:
            print(f"\nrunning step {counter}")
        for tool_call in response.tool_calls:
            if verbose:
                print("tool_call.parameters:", tool_call.parameters)
            if tool_call.parameters:
                output = functions_map[tool_call.name](**tool_call.parameters)
            else:
                output = functions_map[tool_call.name]()

            outputs = [output]
            tool_results.append({"call": tool_call, "outputs": outputs})

            if verbose:
                print(
                    f"= running tool {tool_call.name}, with parameters: {tool_call.parameters}"
                )
                print(f"== tool results: {outputs}")

        # Send the tool results back so the model can continue or finish.
        response = co.chat(
            model=COHERE_MODEL,
            message="",
            chat_history=response.chat_history,
            preamble=preamble,
            tools=tools,
            tool_results=tool_results,
        )

        if verbose:
            print(response.text)
        counter += 1

    return response.text
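
The dispatch step inside the loop can be illustrated without an API call. In the sketch below, FakeToolCall is a stand-in for the SDK's tool call object (only the name and parameters attributes are assumed), and run_tool_calls and echo_length are hypothetical names introduced for illustration. Unlike the loop above, this variant returns an error message for an unknown tool name instead of raising a KeyError, so the model would get a chance to correct itself.

```python
from dataclasses import dataclass, field


@dataclass
class FakeToolCall:
    """Stand-in for the SDK's tool call object (only name and parameters)."""
    name: str
    parameters: dict = field(default_factory=dict)


def run_tool_calls(tool_calls, functions_map):
    """Run each requested tool and collect results in the shape the loop sends back."""
    tool_results = []
    for call in tool_calls:
        func = functions_map.get(call.name)
        if func is None:
            # Report unknown tools as text rather than raising.
            output = {"result": f"ERROR: Unknown tool {call.name}"}
        else:
            output = func(**(call.parameters or {}))
        tool_results.append({"call": call, "outputs": [output]})
    return tool_results


def echo_length(text: str) -> dict:
    """Toy tool used only for this illustration."""
    return {"result": len(text)}


calls = [
    FakeToolCall("echo_length", {"text": "Invoice 0852"}),
    FakeToolCall("missing_tool"),
]
results = run_tool_calls(calls, {"echo_length": echo_length})
for item in results:
    print(item["call"].name, "->", item["outputs"])
```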

Main

PYTHON
def extract_pdf(path):
    """
    Function to extract text from a PDF file.
    """
    # partition_pdf returns a list of document elements; join their text.
    elements = partition_pdf(path)
    return "\n".join([str(el) for el in elements])


def pdf_extractor(pdf_path):
    """
    Main function that extracts pdf and calls the cohere agent.
    """
    pdf_text = extract_pdf(pdf_path)

    prompt = f"""
    # Instruction
    You are an expert at extracting invoices from PDF. The text of the PDF file is given below.

    You must follow the steps below:
    1. Summarize the text and extract only the most important information: total amount billed and invoice number.
    2. Using the summary above, call convert_to_json tool, which uses the summary from step 1.
    If you run into issues, identify the issue and retry.
    You are not done unless you see SUCCESS in the tool output.

    # File Name:
    {pdf_path}

    # Extracted Text:
    {pdf_text}
    """
    output = cohere_agent(prompt, None, verbose=True)
    print(f"Finished extracting: {pdf_path}")

    print('Please check the output below')
    print(pd.read_csv('output.csv'))


pdf_extractor('simple_invoice.pdf')
Output
running step 0
I will summarise the text and then use the convert_to_json tool to format the summary.
running step 1
tool_call.parameters: {'text': 'Total amount billed: $115.00\nInvoice number: 0852'}
= running tool convert_to_json, with parameters: {'text': 'Total amount billed: $115.00\nInvoice number: 0852'}
== tool results: [{'result': 'SUCCESS. All steps have been completed.'}]
SUCCESS.
Finished extracting: simple_invoice.pdf
Please check the output below
  total_amount  invoice_number
0      $115.00             852

As shown above, the model first summarized the extracted PDF as Total amount billed: $115.00\nInvoice number: 0852 and sent this to the convert_to_json() function. convert_to_json() then converted it to JSON and saved it to a CSV file.
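
Note that the printed table shows the invoice number as 852 although the tool call carried 0852. A likely explanation is that pd.read_csv infers an integer type for that column when it reads the file back, which drops the leading zero. The sketch below reproduces this with an in-memory CSV, assuming the model returned invoice_number as the string "0852", and shows that passing dtype=str keeps the value intact.

```python
import io

import pandas as pd

# Content assumed to match what convert_to_json writes for this invoice.
csv_text = "total_amount,invoice_number\n$115.00,0852\n"

# Default type inference treats the invoice number as an integer.
inferred = pd.read_csv(io.StringIO(csv_text))
# Reading every column as text preserves the leading zero.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["invoice_number"].iloc[0])
print(as_text["invoice_number"].iloc[0])
```

For identifiers such as invoice numbers, reading the column as text is usually the safer choice in any downstream application that consumes output.csv.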