How to Evaluate your LLM Response (v1 API)

You can leverage our Command models to evaluate natural language responses that cannot be easily scored with manual rules.

Prompt

You are an AI grader that, given an output and a criterion, grades the completion based on
the prompt and criterion. Below is a prompt, a completion, and a criterion with which to grade
the completion. You need to respond according to the criterion instructions.

## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
Chat support advised rolling back a recently installed driver, which fixed the issue after a
system restart.

## Criterion
Rate the output text with a score between 0 and 1. 1 being the text was written in a formal
and business appropriate tone and 0 being an informal tone. Respond only with the score.

Output

0.8
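
Because the grader replies with free text, it can help to convert the reply into a number and check that it falls in the range the criterion asks for. The following is a minimal sketch of one way to do that; `parse_score` is a hypothetical helper (not part of the Cohere SDK), and the regular expression simply assumes the first number in the reply is the score.

```python
import re


def parse_score(reply: str) -> float:
    """Return the first number found in a grader reply, checked against [0, 1].

    Hypothetical helper for illustration; it assumes the first number in the
    reply is the score the criterion asked for.
    """
    # Match an optionally signed integer or decimal such as "1", "0.8" or ".8".
    match = re.search(r"-?\d*\.?\d+", reply)
    if match is None:
        raise ValueError(f"no numeric score found in reply: {reply!r}")
    score = float(match.group())
    # The criterion asks for a score between 0 and 1, so reject anything else.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the expected range [0, 1]")
    return score


print(parse_score("0.8"))  # prints 0.8
```

Raising an error on an unparseable or out-of-range reply, rather than silently defaulting to a value, keeps malformed grades from leaking into aggregate results.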

API Request

PYTHON

import cohere

# Create a v1 client; replace the placeholder with your own API key.
co = cohere.Client(api_key="Your API key")

# Send the grader prompt (instructions, output to grade, and criterion) as one message.
response = co.chat(
    model="command-a-03-2025",
    message="""
You are an AI grader that, given an output and a criterion, grades the completion based on
the prompt and criterion. Below is a prompt, a completion, and a criterion with which to grade
the completion. You need to respond according to the criterion instructions.

## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
Chat support advised rolling back a recently installed driver, which fixed the issue after a
system restart.

## Criterion
Rate the output text with a score between 0 and 1. 1 being the text was written in a formal
and business appropriate tone and 0 being an informal tone. Respond only with the score.
""",
)

# The reply text should contain only the score, for example 0.8.
print(response.text)
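
To grade more than one response, the prompt above can be turned into a template with the output and criterion filled in per call. The sketch below is one possible arrangement, not an official pattern: `GRADER_TEMPLATE`, `build_grader_prompt`, `grade`, and `fake_send` are hypothetical names, and the model call is passed in as a plain function so the example runs without an API key.

```python
from typing import Callable

# Same grader instructions as in the request above, with placeholders
# for the text to grade and the grading criterion.
GRADER_TEMPLATE = """
You are an AI grader that given an output and a criterion, grades the completion based on
the prompt and criterion. Below is a prompt, a completion, and a criterion with which to grade
the completion. You need to respond according to the criterion instructions.

## Output
{output}

## Criterion
{criterion}
"""


def build_grader_prompt(output: str, criterion: str) -> str:
    """Fill the template with one output and one criterion."""
    return GRADER_TEMPLATE.format(output=output.strip(), criterion=criterion.strip())


def grade(send: Callable[[str], str], output: str, criterion: str) -> str:
    """Build the prompt, pass it to `send`, and return the stripped reply text."""
    return send(build_grader_prompt(output, criterion)).strip()


def fake_send(prompt: str) -> str:
    """Stand-in for a real model call; always replies with the same score."""
    return "0.8\n"


reply = grade(
    fake_send,
    output="Chat support advised rolling back a recently installed driver.",
    criterion="Rate the output text with a score between 0 and 1. Respond only with the score.",
)
print(reply)  # prints 0.8 (the stub's fixed reply)
```

With a real client, `send` could be a small wrapper such as `lambda p: co.chat(model="command-a-03-2025", message=p).text`, which mirrors the request shown above.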
