Generally, users are limited to text inputs when using large language models (LLMs), but agents enable the model to do more than ingest or output text information. Using tools, LLMs can call other APIs, save data, and much more. In this notebook, we will explore how we can leverage agents to extract information from PDFs. We will mimic an application where the user uploads PDF files and the agent extracts useful information. This can be useful when the text information has varying formats and you need to extract various types of information.
In the directory, we have a simple_invoice.pdf file. Everytime a user uploads the document, the agent will extract key information total_amount and invoice_number and then save it as CSV which then can be used in another application. We only extract two pieces of information in this demo, but users can extend the example and extract a lot more information.
The sample invoice data is from https://unidoc.io/media/simple-invoices/simple_invoice.pdf.
Here we define the tool which converts summary of the pdf into json object. Then, it checks to make sure all necessary keys are present and saves it as csv.
Below is a cohere agent that leverages multi-step API. It is equipped with convert_to_json tool.
As shown above, the model first summarized the extracted pdf as Total amount: $115.00\nInvoice number: 0852 and sent this to conver_to_json() function.
conver_to_json() then converts it to json format and saves it into a csv file.