Querying Structured Data (Tables)

Open in Colab

In the previous tutorials, we explored how to build agentic RAG applications over unstructured and semi-structured data. Now, we’ll see how to implement agentic RAG over structured data.

This tutorial focuses on querying tables, and the next tutorial will be about querying SQL databases.

Consider a scenario where you have a CSV file containing evaluation results for an LLM application.

A user might ask questions like “What’s the average score for a specific use case?” or “Which configuration has the lowest latency?”. These queries require not just retrieval, but also data analysis and interpretation.

In this tutorial, we’ll cover:

  • Creating a function to execute Python code
  • Setting up a tool to interact with tabular data
  • Building an agent for querying tabular data
  • Running the agent

Let’s get started by setting up our environment and defining the necessary tools for our agent.

Setup

To get started, we first need to install the cohere library and create a Cohere client.

PYTHON
# ! pip install cohere pandas langchain_experimental
import json
import os
import cohere

# Get your free API key: https://dashboard.cohere.com/api-keys
co = cohere.ClientV2(api_key=os.getenv("COHERE_API_KEY"))

And here’s the data we’ll be working with. evaluation_results.csv is a CSV file containing evaluation results for a set of LLM applications: name extraction, email drafting, and article summarization.

The file has the following columns:

  • usecase: The use case being evaluated (extract_names, draft_email, or summarize_article).
  • run: The run ID.
  • score: The evaluation score for a particular run.
  • temperature: The temperature setting of the model for a particular run.
  • tokens: The number of tokens generated by the model for a particular run.
  • latency: The latency of the model’s response for a particular run, in seconds.

Important: the data can be found here. Make sure to have the file in the same directory as this notebook so the code can load it correctly.

PYTHON
import pandas as pd

df = pd.read_csv('evaluation_results.csv')

df.head()
| | usecase | run | score | temperature | tokens | latency |
|---|---|---|---|---|---|---|
| 0 | extract_names | A | 0.5 | 0.3 | 103 | 1.12 |
| 1 | draft_email | A | 0.6 | 0.3 | 252 | 2.50 |
| 2 | summarize_article | A | 0.8 | 0.3 | 350 | 4.20 |
| 3 | extract_names | B | 0.2 | 0.3 | 101 | 2.85 |
| 4 | draft_email | B | 0.4 | 0.3 | 230 | 3.20 |

Creating a function to execute Python code

Here, we introduce a new tool that allows the agent to execute Python code and return the result. The agent will write pandas code to query the data and use this tool to execute it.

To create this tool, we’ll use the PythonREPL class from the langchain_experimental.utilities module. This class provides a lightweight environment for executing Python code and returning the result. Note that it is not a secure sandbox, so only run code from trusted sources.

First, we define a python_tool that uses the PythonREPL class to execute Python code and return the result.

Next, we define a ToolInput class to handle the input for the python_tool.

Finally, we create a function analyze_evaluation_results that takes a string of Python code as input, executes the code using the Python tool we created, and returns the result.

Important: the source code for tool definitions can be found here. Make sure to have the tool_def.py file in the same directory as this notebook for the imports to work correctly.
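If you’d rather not download the file, here is a minimal sketch of what tool_def.py could contain, based on the description above. This is an assumption, not the linked source, which may differ in details such as the exact description strings.

PYTHON
# tool_def.py — a minimal sketch based on the description above;
# the linked source file may differ in its details.
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL
from pydantic import BaseModel, Field

python_repl = PythonREPL()

# A tool that executes Python code and returns whatever it prints
python_tool = Tool(
    name="python_repl",
    description="Executes python code and returns the result.",
    func=python_repl.run,
)

class ToolInput(BaseModel):
    code: str = Field(description="Python code to execute.")

def analyze_evaluation_results(code: str) -> dict:
    """Run the agent-generated code and return its output."""
    input_code = ToolInput(code=code)
    return {"python_answer": python_tool.func(input_code.code)}

# tool_def.py also exports analyze_evaluation_results_tool, the
# tool schema shown in the next section.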

PYTHON
from tool_def import (
    analyze_evaluation_results,
    analyze_evaluation_results_tool,
)
PYTHON
functions_map = {
    "analyze_evaluation_results": analyze_evaluation_results
}

Setting up a tool to interact with tabular data

Next, we define the analyze_evaluation_results tool. There are many ways we can set up a tool to work with CSV data, and in this example, we are using the tool description to provide the agent with the necessary context for working with the CSV file, such as:

  • the name of the CSV file to load
  • the columns of the CSV file
  • additional instructions on what libraries to use (in this case, pandas)

The tool takes a single parameter, code: a string containing the Python code the agent writes to analyze the data.

PYTHON
analyze_evaluation_results_tool = {
    "type": "function",
    "function": {
        "name": "analyze_evaluation_results",
        "description": "Generate Python code using the pandas library to analyze evaluation results from a dataframe called `evaluation_results`. The dataframe has columns 'usecase','run','score','temperature','tokens', and 'latency'. You must start with `import pandas as pd` and read a CSV file called `evaluation_results.csv` into the `evaluation_results` dataframe.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Executable Python code"
                }
            },
            "required": ["code"]
        }
    }
}
PYTHON
tools = [analyze_evaluation_results_tool]
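Before handing the tool to the agent, you can optionally sanity-check it by calling the function directly with a hand-written snippet. This is a hypothetical example, not part of the original tutorial:

PYTHON
# Optional sanity check: call the tool function directly
result = analyze_evaluation_results(
    code=(
        "import pandas as pd\n"
        "evaluation_results = pd.read_csv('evaluation_results.csv')\n"
        "print(evaluation_results['score'].mean())"
    )
)
print(result)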

Building an agent for querying tabular data

Next, let’s create a run_agent function to run the agentic RAG workflow, the same as in Part 1.

The only change we are making here is to make the system message simpler and more specific since the agent now only has one tool.

PYTHON
system_message = """## Task and Context
You are an assistant who helps developers analyze LLM application evaluation results from a CSV file."""
PYTHON
model = "command-r-plus-08-2024"

def run_agent(query, messages=None):
    if messages is None:
        messages = []

    if "system" not in {m.get("role") for m in messages}:
        messages.append({"role": "system", "content": system_message})

    # Step 1: Get the user message
    print(f"Question:\n{query}")
    print("="*50)

    messages.append({"role": "user", "content": query})

    # Step 2: Generate tool calls (if any)
    response = co.chat(
        model=model,
        messages=messages,
        tools=tools,
        temperature=0.1
    )

    while response.message.tool_calls:

        print("TOOL PLAN:")
        print(response.message.tool_plan, "\n")
        print("TOOL CALLS:")
        for tc in response.message.tool_calls:
            if tc.function.name == "analyze_evaluation_results":
                print(f"Tool name: {tc.function.name}")
                # Pretty-print the code the agent generated
                code = json.loads(tc.function.arguments)["code"]
                print("\n".join(f"  {line}" for line in code.splitlines()))
            else:
                print(f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}")
        print("="*50)

        messages.append({"role": "assistant", "tool_calls": response.message.tool_calls, "tool_plan": response.message.tool_plan})

        # Step 3: Get tool results
        for tc in response.message.tool_calls:
            tool_result = functions_map[tc.function.name](**json.loads(tc.function.arguments))
            tool_content = [{"type": "document", "document": {"data": json.dumps(tool_result)}}]
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": tool_content})

        # Step 4: Generate response and citations
        response = co.chat(
            model=model,
            messages=messages,
            tools=tools,
            temperature=0.1
        )

    messages.append({"role": "assistant", "content": response.message.content[0].text})

    # Print final response
    print("RESPONSE:")
    print(response.message.content[0].text)
    print("="*50)

    # Print citations (if any)
    verbose_source = False  # Change to True to display the contents of a source
    if response.message.citations:
        print("CITATIONS:\n")
        for citation in response.message.citations:
            print(f"Start: {citation.start}| End:{citation.end}| Text:'{citation.text}' ")
            print("Sources:")
            for idx, source in enumerate(citation.sources):
                print(f"{idx+1}. {source.id}")
                if verbose_source:
                    print(f"{source.tool_output}")
            print("\n")

    return messages
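Because run_agent returns the accumulated messages list, you can pass it back in to ask follow-up questions within the same conversation. A hypothetical example:

PYTHON
messages = run_agent("What's the average evaluation score in run A")
# Follow-up question that relies on the earlier conversation history
messages = run_agent("And what about in run B?", messages)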

Running the agent

Let’s ask the agent a few questions, starting with this query about the average evaluation score in run A.

To answer this query, the agent needs to write Python code that uses the pandas library to calculate the average evaluation score in run A. And it gets the answer right.

PYTHON
messages = run_agent("What's the average evaluation score in run A")
# Answer: 0.63
Question:
What's the average evaluation score in run A
==================================================
TOOL PLAN:
I will use the `analyze_evaluation_results` tool to write Python code to find the average evaluation score in run A.
TOOL CALLS:
Tool name: analyze_evaluation_results
import pandas as pd
df = pd.read_csv("evaluation_results.csv")
# Filter the dataframe to only include rows where the 'run' column is 'A'
filtered_df = df[df['run'] == 'A']
# Calculate the average score for the filtered dataframe
average_score = filtered_df['score'].mean()
print(f"Average score in run A: {average_score}")
==================================================
RESPONSE:
The average evaluation score in run A is **0.63**.
==================================================
CITATIONS:
Start: 43| End:47| Text:'0.63'
Sources:
1. analyze_evaluation_results_3m8j8s850pf6:0

Next, we ask a slightly more complex question, this time about the latency of the highest-scoring run for one use case. This requires the agent to filter based on the use case, find the highest-scoring run, and return the latency value.

PYTHON
messages = run_agent("What's the latency of the highest-scoring run for the summarize_article use case?")
# Answer: 4.8
Question:
What's the latency of the highest-scoring run for the summarize_article use case?
==================================================
TOOL PLAN:
I will write Python code to find the latency of the highest-scoring run for the summarize_article use case.
TOOL CALLS:
Tool name: analyze_evaluation_results
import pandas as pd
df = pd.read_csv("evaluation_results.csv")
# Filter the dataframe to only include rows where usecase is 'summarize_article'
filtered_df = df[df['usecase'] == 'summarize_article']
# Find the highest-scoring run
highest_score_run = filtered_df.loc[filtered_df['score'].idxmax()]
# Print the latency of the highest-scoring run
print(f"The latency of the highest-scoring run for the summarize_article use case is {highest_score_run['latency']} seconds.")
==================================================
RESPONSE:
The latency of the highest-scoring run for the summarize_article use case is 4.8 seconds.
==================================================
CITATIONS:
Start: 77| End:89| Text:'4.8 seconds.'
Sources:
1. analyze_evaluation_results_n98b7a3hjdp3:0

Next, we ask the agent to compare the use cases in terms of token usage and to present the comparison in a markdown table.

PYTHON
messages = run_agent("Which use case uses the least amount of tokens on average? Show the comparison of all use cases in a markdown table.")
# Answer: extract_names (106.25), draft_email (245.75), summarize_article (355.75)
Question:
Which use case uses the least amount of tokens on average? Show the comparison of all use cases in a markdown table.
==================================================
TOOL PLAN:
I will use the `analyze_evaluation_results` tool to write Python code to find the use case that uses the least amount of tokens on average. I will also create a markdown table to compare all use cases.
TOOL CALLS:
Tool name: analyze_evaluation_results
import pandas as pd
df = pd.read_csv("evaluation_results.csv")
# Calculate the average number of tokens for each use case
average_tokens = df.groupby("usecase")["tokens"].mean()
# Find the use case with the least average tokens
least_tokens_usecase = average_tokens.idxmin()
least_tokens_usecase_value = average_tokens.min()
# Create a markdown table to compare all use cases
markdown_table = average_tokens.reset_index().to_markdown(index=False)
print(f"The use case with the least average tokens is {least_tokens_usecase} with {least_tokens_usecase_value} tokens.")
print(f"\nComparison of all use cases:\n{markdown_table}")
==================================================
RESPONSE:
The use case with the least average tokens is `extract_names` with 106.25 tokens.
| usecase | tokens |
|---|---|
| draft_email | 245.75 |
| extract_names | 106.25 |
| summarize_article | 355.75 |
==================================================
CITATIONS:
Start: 46| End:61| Text:'`extract_names`'
Sources:
1. analyze_evaluation_results_5dy7vfd0zcdk:0
Start: 67| End:73| Text:'106.25'
Sources:
1. analyze_evaluation_results_5dy7vfd0zcdk:0
Start: 116| End:127| Text:'draft_email'
Sources:
1. analyze_evaluation_results_5dy7vfd0zcdk:0
Start: 130| End:136| Text:'245.75'
Sources:
1. analyze_evaluation_results_5dy7vfd0zcdk:0
Start: 141| End:154| Text:'extract_names'
Sources:
1. analyze_evaluation_results_5dy7vfd0zcdk:0
Start: 157| End:163| Text:'106.25'
Sources:
1. analyze_evaluation_results_5dy7vfd0zcdk:0
Start: 168| End:185| Text:'summarize_article'
Sources:
1. analyze_evaluation_results_5dy7vfd0zcdk:0
Start: 188| End:194| Text:'355.75'
Sources:
1. analyze_evaluation_results_5dy7vfd0zcdk:0

Summary

In this tutorial, we learned about:

  • How to create a function to execute Python code
  • How to set up a tool to interact with tabular data
  • How to build an agent for querying tabular data
  • How to run the agent

By implementing these techniques, we’ve expanded our agentic RAG system to handle structured data in the form of tables.

While this tutorial demonstrated how to work with tabular data using pandas and Python, the agentic RAG approach can be applied to other forms of structured data as well. This means we can build agents that can translate natural language queries into various data analysis tasks.

In Part 6, we’ll learn how to perform RAG over structured data (SQL).