PDF Extractor with Native Multi Step Tool Use
Objective
Generally, users are limited to text inputs when using large language models (LLMs), but agents enable the model to do more than ingest or output text information. Using tools, LLMs can call other APIs, save data, and much more. In this notebook, we will explore how we can leverage agents to extract information from PDFs. We will mimic an application where the user uploads PDF files and the agent extracts useful information. This can be useful when the text information has varying formats and you need to extract various types of information.
In the directory, we have a simple_invoice.pdf file. Every time a user uploads the document, the agent extracts two key fields, total_amount and invoice_number, and saves them to a CSV file that another application can then use. We only extract two pieces of information in this demo, but users can extend the example to extract many more.
Steps
- The extract_pdf() function extracts text data from the PDF using the unstructured package. You can use other packages like PyPDF2 as well.
- This extracted text is added to the prompt so the model can “see” the document.
- The agent summarizes the document and passes that summary to the convert_to_json() function. This function makes another call to a Command model to convert the summary to JSON output. Separating the tasks this way is useful when the document is long and complicated: we first distill the information, then ask another model to convert the distilled text into a JSON object, so each model or agent focuses on its own task without suffering from long context.
- The JSON object then goes through a check to make sure all keys are present and is saved as a CSV file. When the document is too long or the task is too complex, the model may fail to extract all the information. These checks are very useful because they give feedback to the model, so it can adjust the parameters it passes to the tool and retry.
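The check-and-save step at the end of the list can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `check_and_save`, the `REQUIRED_KEYS` tuple, and the wording of the feedback messages are assumptions made for this example.

```python
import csv
import os

# Keys the demo expects in every extracted record (taken from the text above).
REQUIRED_KEYS = ("total_amount", "invoice_number")


def check_and_save(record: dict, csv_path: str) -> dict:
    """Validate an extracted record and append it to a CSV file.

    Hypothetical helper: returns a small dict that can be handed back
    to the agent as the tool result.
    """
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        # Nothing is written; the message tells the model what to fix on retry.
        return {"result": f"Missing keys: {', '.join(missing)}. Please try again."}

    # Write the header only when the file is created for the first time.
    write_header = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=REQUIRED_KEYS)
        if write_header:
            writer.writeheader()
        writer.writerow({key: record[key] for key in REQUIRED_KEYS})
    return {"result": f"Saved record to {csv_path}."}
```

Returning a message instead of raising an exception keeps the failure inside the conversation, which is what lets the model see the problem and call the tool again.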
Setup
Data
The sample invoice data is from https://unidoc.io/media/simple-invoices/simple_invoice.pdf.
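Once the file is downloaded, the first two steps (extracting the text and placing it in the prompt) might look like the sketch below. It assumes the `unstructured` package is installed with PDF support and uses its `partition_pdf` function; the prompt wording and the `build_prompt` helper are illustrative assumptions, not the notebook's exact text.

```python
def extract_pdf(path: str) -> str:
    """Return the text of a PDF as one newline-joined string."""
    # Imported lazily so the rest of the sketch loads without the package.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=path)
    # Each element carries a chunk of text (title, paragraph, table cell, ...).
    return "\n".join(el.text for el in elements if el.text)


def build_prompt(pdf_text: str) -> str:
    """Embed the extracted text in the user message (hypothetical wording)."""
    return (
        "Extract the total amount and the invoice number from the document "
        "below, then call the convert_to_json tool with a short summary.\n\n"
        "## Document\n" + pdf_text
    )
```

A call such as `build_prompt(extract_pdf("simple_invoice.pdf"))` would then produce the message that is sent to the agent.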
Tool
Here we define the tool that converts the summary of the PDF into a JSON object. It then checks that all necessary keys are present and saves the result as a CSV file.
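As a rough sketch of what such a tool could look like: the description dict below follows the `parameter_definitions` layout used by Cohere's v1 chat API as far as I know it, and the model call is replaced by a caller-supplied `generate` function so the example runs without network access. Both the `generate` parameter and the prompt wording are assumptions for illustration.

```python
import json
from typing import Callable

# Tool description shown to the agent (layout assumed from the Cohere v1 chat API).
convert_to_json_tool = {
    "name": "convert_to_json",
    "description": "Convert a summary of an invoice into a JSON object "
    "and save it as a CSV file.",
    "parameter_definitions": {
        "text": {
            "description": "Summary containing the total amount and invoice number.",
            "type": "str",
            "required": True,
        }
    },
}


def convert_to_json(text: str, generate: Callable[[str], str]) -> dict:
    """Ask a model (via `generate`, a stand-in) to turn the summary into JSON."""
    prompt = (
        "Convert the following summary into a JSON object with the keys "
        "total_amount and invoice_number. Return only the JSON.\n\n" + text
    )
    raw = generate(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "Model output was not valid JSON. Please try again."}
    if not isinstance(parsed, dict):
        return {"error": "Model output was not a JSON object. Please try again."}
    return parsed
```

In a real run, `generate` would wrap a call to a Command model, and `functools.partial(convert_to_json, generate=...)` would give the agent a function that takes only `text`.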
Cohere Agent
Below is a Cohere agent that leverages the multi-step API. It is equipped with the convert_to_json tool.
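The multi-step loop can be sketched roughly as follows. The keyword arguments mirror the Cohere v1 Python SDK's `chat` method as I understand it (`tools`, `tool_results`, `chat_history`, `force_single_step`), but treat them as assumptions and check them against the SDK version you use. The `run_agent` name, the `max_steps` cap, and the default model name are choices made for this example.

```python
def run_agent(client, message, tools, functions_map,
              model="command-r-plus", max_steps=5):
    """Call the model, run any requested tools, and feed the results back.

    `client` is any object with a Cohere-style `chat(**kwargs)` method;
    `functions_map` maps tool names to Python callables.
    """
    response = client.chat(model=model, message=message, tools=tools)

    for _ in range(max_steps):
        if not response.tool_calls:
            break  # The model produced a final answer.

        tool_results = []
        for call in response.tool_calls:
            # Run the tool the model asked for with the parameters it chose.
            output = functions_map[call.name](**call.parameters)
            tool_results.append({"call": call, "outputs": [output]})

        # Send the tool outputs back so the model can continue or retry.
        response = client.chat(
            model=model,
            message="",
            chat_history=response.chat_history,
            tools=tools,
            tool_results=tool_results,
            force_single_step=False,
        )

    return response.text
```

The `max_steps` cap is a safety measure so that a model that keeps failing the key check cannot loop forever.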
main
As shown above, the model first summarized the extracted PDF as Total amount: $115.00\nInvoice number: 0852 and sent this summary to the convert_to_json() function. convert_to_json() then converts it to JSON format and saves it to a CSV file.