Short-Term Memory Handling for Agents
Motivation
Users frequently engage in long conversations with LLMs, expecting the model to fully recall what was said in previous turns.
The way we make LLMs aware of previous information is simply by providing them with the full conversation history, i.e., the concatenation of previous input queries and generations.
As Agents become more widely used, we face the challenge of making them fully aware of the information from previous turns. The current approach is to pass the Agent the generations of the previous turns (see https://python.langchain.com/docs/modules/agents/how_to/custom_agent/). However, as we show below, while this works well for LLMs, it is not enough for Agents. Given the same input, an LLM only produces the final generation; an Agent, by contrast, first produces a reasoning chain (intermediate steps) and then the final outcome. Hence, if we only retain the final generation, we lose a crucial piece of information: the reasoning chain.
A straightforward solution would be to append both the reasoning chain and the generations to the conversation history. This is problematic because reasoning chains can be very long, especially when the model makes mistakes and corrects itself. Using the full reasoning chains would (i) introduce a lot of noise; (ii) quickly fill the whole input window of the model.
Objective
In this notebook we introduce a simple approach to address the issue described above. We propose to use augmented memory objects, which we define as compact and interpretable pieces of information based on the reasoning chain and the generation.
Below, we show that, with augmented memory objects, the Agent is more aware of the information that emerged in the conversation, and, in turn, this makes the Agent behaviour more robust and effective.
Step 1: Set up the Prompt and the Agent
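The actual notebook cells are not reproduced here, so the following is only a minimal sketch of what this step might look like, assuming a LangChain tool-calling agent with a hypothetical `run_python` tool and an OpenAI chat model; the tool, model name, and prompt wording are illustrative placeholders rather than the notebook's exact code.

```python
import io
import contextlib

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent


@tool
def run_python(code: str) -> str:
    """Execute a Python snippet and return whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code)  # illustrative only: a real notebook would sandbox this
    return buffer.getvalue()


llm = ChatOpenAI(model="gpt-4o", temperature=0)

# The prompt exposes a `chat_history` slot; the steps below differ only in how we fill it.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that writes and runs Python code."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

tools = [run_python]
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    return_intermediate_steps=True,  # we will need the reasoning chain in Step 5
    verbose=True,
)
```

The two details that matter for the rest of the walkthrough are the `chat_history` placeholder in the prompt and `return_intermediate_steps=True`, which exposes the reasoning chain we will exploit in Step 5.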
Step 2: Conversation without memory
Without memory, the model cannot answer follow-up questions because it lacks the necessary previous context.
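As a sketch (reusing the hypothetical setup from Step 1, with illustrative questions), two turns without memory might look like this:

```python
# First turn: the agent has no previous context to rely on, and none is needed.
first = agent_executor.invoke({
    "input": "Plot a sine wave between 0 and 2*pi.",
    "chat_history": [],
})

# Follow-up turn: "the plot" refers to the previous turn, but the chat history is
# still empty, so the agent has no way of knowing which plot we mean.
follow_up = agent_executor.invoke({
    "input": "Now add a title to the plot.",
    "chat_history": [],
})
```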
Step 3: Conversation with Memory using AI Messages
Here we will populate the chat history only with the generations from the model. This is the approach currently used, e.g., here: https://python.langchain.com/docs/modules/agents/how_to/custom_agent/
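A sketch of this variant, again assuming the hypothetical setup above and treating `first["output"]` as the final generation from the previous turn:

```python
from langchain_core.messages import AIMessage

# Keep only the agent's final generation from the previous turn.
chat_history = [AIMessage(content=first["output"])]

follow_up = agent_executor.invoke({
    "input": "Now add a title to the plot.",
    "chat_history": chat_history,
})
```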
In this case too, the model cannot handle the follow-up question. The reason is that the AI message carries only part of the necessary context: we need more information from the previous turns.
Step 4: Conversation with Memory using AI Messages and Human Messages
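A sketch of this variant, still assuming the hypothetical setup and questions used above:

```python
from langchain_core.messages import AIMessage, HumanMessage

# Store both sides of the exchange, as in a plain LLM chat.
chat_history = [
    HumanMessage(content="Plot a sine wave between 0 and 2*pi."),
    AIMessage(content=first["output"]),
]

follow_up = agent_executor.invoke({
    "input": "Now add a title to the plot.",
    "chat_history": chat_history,
})
```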
It works! Let's continue the conversation.
The model does what we asked, but it decides to introduce `marker="o"` in the plotting function. While in this case the modification does not affect the quality of the output, it is still undesired behaviour, since the model is introducing a change that was not requested.
To address this problem, we can further enrich the chat history by adding information from the reasoning chain.
Step 5: Conversation with Memory using AI Messages, Human Messages and the Reasoning Chain
Reasoning chains can be very long, especially when they contain errors and the agent needs several attempts to reach the final output. Hence, concatenating all the reasoning chains raises two issues: (i) noisy information; (ii) quickly hitting the maximum input length.
To avoid these issues, we need a way to extract the relevant information from the previous turns. Below, we propose a simple approach to this extraction, and we format the extracted information in a way that keeps it human-interpretable. We call the objects passed in the chat history augmented memory objects.
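The exact extraction and formatting code is not reproduced here; the sketch below shows one plausible way to build an augmented memory object with the hypothetical LangChain setup above. It keeps only the last intermediate step (the tool call that produced the final output) together with the final answer, and wraps them in a Human/AI message pair. The helper name `make_augmented_memory_object` is ours, not a library function.

```python
from langchain_core.messages import AIMessage, HumanMessage


def make_augmented_memory_object(user_input: str, result: dict) -> list:
    """Turn one agent turn into a compact, human-readable Human/AI message pair."""
    steps = result.get("intermediate_steps", [])
    if steps:
        # Each intermediate step is an (AgentAction, observation) tuple;
        # we keep only the last one, i.e. the code that produced the final output.
        last_action, last_observation = steps[-1]
        reasoning_summary = (
            f"In the previous turn I called the tool `{last_action.tool}` with this input:\n"
            f"{last_action.tool_input}\n"
            f"It returned:\n{last_observation}"
        )
    else:
        reasoning_summary = "In the previous turn I did not use any tool."

    augmented_answer = f"{reasoning_summary}\n\nFinal answer: {result['output']}"
    return [HumanMessage(content=user_input), AIMessage(content=augmented_answer)]


# Usage: accumulate augmented memory objects turn by turn.
chat_history = []
question = "Plot a sine wave between 0 and 2*pi."
result = agent_executor.invoke({"input": question, "chat_history": chat_history})
chat_history += make_augmented_memory_object(question, result)

result = agent_executor.invoke({
    "input": "Use dashed lines instead of solid ones.",
    "chat_history": chat_history,
})
```

Keeping only the last tool call discards the failed attempts, which is exactly the noise we wanted to avoid, while preserving the code the Agent can reuse in the next turn.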
Below is an example of the augmented memory object generated by the model. You can see that the agent now has full visibility into what it did in the previous step.
We can see that, now, the plot only includes the modification we asked for, and nothing else. This is possible because we now provide the Agent with the code it previously generated, and the Agent reuses that code, making only the necessary modifications. This is fundamentally different from what we observed before, when the Agent had to re-create the code from scratch.
In sum, by providing the Agent with information about its previous Reasoning Chain, we make it more robust and able to generate consistent outputs.
In a future post, we will explore how to handle really long historical context using vector databases.