Multi-step Tool Use (Agents)
Tool use is a technique which allows Cohere’s models to invoke external tools: search engines, APIs, functions, databases, and so on.
Multi-step tool use happens when the output of one tool-calling step is needed as the input to another. In other words, tool calls need to happen in a sequence.
For example, given the web-search tool, the model can start answering complex questions that require performing internet searches.
Also, note that multi-step is enabled in the Chat API by default.
Multi-step Tool Use With the Chat API
Step 1: Define the tools
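As a minimal sketch of what a tool definition can look like, the block below pairs a local Python function with a JSON-schema style description of its parameters. The `web_search` body is a hypothetical stand-in that returns canned data, and the exact schema shape (`type`, `function`, `parameters`) is an assumption for illustration; check the Chat API reference for the format it expects.

```python
from typing import Any


def web_search(query: str) -> list[dict[str, str]]:
    """Hypothetical stand-in for a real search backend; returns canned results."""
    return [
        {
            "title": f"Result for: {query}",
            "url": "https://example.com",
            "content": "placeholder content",
        }
    ]


# Assumed schema shape: a name, a description the model can read,
# and a JSON-schema description of the function's parameters.
web_search_tool: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Returns a list of relevant web pages for a search query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The query to search for.",
                },
            },
            "required": ["query"],
        },
    },
}

# Map tool names to the Python callables that implement them, so that
# a tool call produced by the model can be routed to the right function.
functions_map = {"web_search": web_search}
```

Keeping the name in the schema identical to the key in `functions_map` is what lets the workflow look up the right function when the model asks for a tool by name.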
Step 2: Run the tool use workflow
How Does Multi-step Tool Use Work?
Here’s an outline of the basic steps involved in multi-step tool use:
- Given a user request, the model comes up with a plan to solve the problem, answering questions such as “Which tools should be used?” and “In what order should they be used?”
- The model then carries out the plan by repeatedly executing actions (using whatever tools are appropriate), reasoning over the results, and re-evaluating the plan.
- After each Action -> Observation -> Reflection cycle, the model reflects on what to do next. This reflection involves analyzing what has been figured out so far and determining whether any changes need to be made to the plan. The model can take as many steps as it deems necessary.
- Once the model decides it knows how to answer the user question, it proceeds to generate the final response.
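The cycle above can be sketched as a small loop. This is an illustration under stated assumptions, not the Chat API itself: `scripted_model` is a hypothetical stand-in for the model, the two tools return made-up toy data, and the names `Step` and `run_agent` are invented for this example. The point to notice is that the second tool call takes the first tool's output as its input.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One model turn: either a tool call (action) or a final answer."""
    reflection: str
    tool_name: Optional[str] = None
    tool_args: dict = field(default_factory=dict)
    answer: Optional[str] = None


# Toy tools backed by made-up data (purely illustrative).
CAPITALS = {"Atlantis": "Poseidonia"}
POPULATIONS = {"Poseidonia": 1234}
TOOLS: dict[str, Callable] = {
    "capital_of": lambda country: CAPITALS[country],
    "population_of": lambda city: POPULATIONS[city],
}


def scripted_model(observations: list) -> Step:
    """Hypothetical stand-in for the model: picks the next step from past observations."""
    if len(observations) == 0:
        return Step("I need the capital first.", "capital_of", {"country": "Atlantis"})
    if len(observations) == 1:
        # The output of the first tool call becomes the input of the second.
        return Step("Now look up that city.", "population_of", {"city": observations[0]})
    return Step(
        "I have everything I need.",
        answer=f"{observations[0]} has {observations[1]} inhabitants",
    )


def run_agent(model: Callable, tools: dict, max_steps: int = 10):
    """Repeat Action -> Observation -> Reflection until the model returns an answer."""
    observations: list = []
    for _ in range(max_steps):
        step = model(observations)  # reflection: decide what to do next
        if step.answer is not None:
            return step.answer, observations
        result = tools[step.tool_name](**step.tool_args)  # action
        observations.append(result)  # observation
    raise RuntimeError("Step limit reached without a final answer")


answer, trace = run_agent(scripted_model, TOOLS)
print(answer)
```

The `max_steps` cap is a design choice of this sketch: the model may take as many steps as it deems necessary, so a caller-side limit guards against a loop that never terminates.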
What is the difference between tool use and Retrieval Augmented Generation (RAG)?
Tool use is a natural extension of retrieval augmented generation (RAG). RAG is about enabling the model to interact with an information retrieval system (like a vector database). Our models are trained to be excellent at RAG use cases.
Tool use pushes this further, allowing Cohere models to go far beyond information retrieval and interact with search engines, APIs, functions, databases, and many other tools.
A Further Example With Multiple Tools
This section provides another example of multi-step tool use, this time with multiple tools. The notebook for this example can be found here.
This example demonstrates an agent that performs analysis on a Spotify tracks dataset (via a Python interpreter tool) while also having access to a second tool: a web search tool.
Step 1: Define the tools
Here, we define the web search tool, which uses the Tavily Python client to perform web searches.
Here, we define the Python interpreter tool, which uses the exec function to execute Python code.
We’ll also need the spotify_data dataset, which contains information about Spotify tracks such as the track information, release information, popularity metrics, and musical characteristics. You can find the dataset here.
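A minimal sketch of an exec-based interpreter tool might look like the following. The function name `run_python_code` and the shape of the returned dictionary are assumptions made for this illustration; the original notebook's implementation may differ.

```python
import contextlib
import io
import traceback


def run_python_code(code: str) -> dict:
    """Execute a code string with exec and return its printed output or the error."""
    buffer = io.StringIO()
    namespace: dict = {}  # fresh globals so runs do not leak into this module
    try:
        # Capture anything the code prints so it can be handed back to the model.
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Return the error text instead of raising, so the model can react to it.
        return {"error": traceback.format_exc(limit=1)}
    return {"output": buffer.getvalue()}


print(run_python_code("print(1 + 1)"))
```

Returning errors as data lets the model read the traceback and retry with corrected code. Note that exec runs arbitrary code with the privileges of the host process, so model-generated code should only be executed in a sandboxed environment.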
Here is the task that the agent needs to perform:
Step 2: Run the tool use workflow
Next, we run the tool use workflow, which involves four steps:
- Get the user message
- Model generates tool calls, if any
- Execute tools based on the tool calls generated by the model
- Model either generates more tool calls or returns a response with citations
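The third step, executing the tools the model asked for, can be sketched as a small dispatcher. The tool-call and tool-message shapes used here (`id`, `name`, a JSON string in `arguments`, and a `role: "tool"` reply) are assumptions for illustration rather than the Chat API's exact format, and `add` is a toy function.

```python
import json
from typing import Callable


def execute_tool_calls(tool_calls: list[dict], functions_map: dict[str, Callable]) -> list[dict]:
    """Run each requested tool and package its result as a tool message."""
    tool_messages = []
    for call in tool_calls:
        func = functions_map[call["name"]]  # look up the implementation by name
        args = json.loads(call["arguments"])  # arguments assumed to arrive as a JSON string
        result = func(**args)
        tool_messages.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # lets the model match result to request
                "content": json.dumps(result),
            }
        )
    return tool_messages


# Toy usage: one tool call asking to add two numbers.
calls = [{"id": "call_0", "name": "add", "arguments": '{"a": 2, "b": 3}'}]
print(execute_tool_calls(calls, {"add": lambda a, b: a + b}))
```

The resulting tool messages are appended to the conversation and sent back to the model, which then either requests more tool calls or produces the final response.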
And here is an example output. In summary, the agent performs the task in a sequence of three steps:
- Inspect the dataset and get a list of its columns.
- Write and execute Python code to find the top 3 most streamed songs on Spotify in 2023 and their respective artists.
- Search for the age and citizenship of each artist on the internet.