How to Evaluate your LLM Response

You can leverage Command R to evaluate natural language responses that cannot be easily scored with manual rules.

Prompt

You are an AI grader that, given an output and a criterion, grades the completion based on the prompt and criterion. Below is a prompt, a completion, and a criterion with which to grade the completion. You need to respond according to the criterion instructions.
## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
Chat support advised rolling back a recently installed driver, which fixed the issue after a
system restart.
## Criterion
Rate the output text with a score between 0 and 1, where 1 means the text was written in a formal
and business-appropriate tone and 0 means an informal tone. Respond only with the score.

Output

0.8

API Request

PYTHON
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

response = co.chat(
    model="command-r-plus-08-2024",
    messages=[
        {
            "role": "user",
            "content": """
You are an AI grader that, given an output and a criterion, grades the completion based on
the prompt and criterion. Below is a prompt, a completion, and a criterion with which to grade
the completion. You need to respond according to the criterion instructions.

## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
Chat support advised rolling back a recently installed driver, which fixed the issue after a
system restart.

## Criterion
Rate the output text with a score between 0 and 1, where 1 means the text was written in a formal
and business-appropriate tone and 0 means an informal tone. Respond only with the score.
""",
        }
    ],
)

# The grader's score is returned as plain text in the first content block.
print(response.message.content[0].text)
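Because the criterion instructs the model to respond only with the score, the reply can be converted to a number for downstream use (for example, averaging scores across many responses). A minimal sketch of such a helper, assuming the model followed the instruction — `parse_score` is a hypothetical name, not part of the Cohere SDK:

```python
def parse_score(reply: str) -> float:
    """Parse the grader's reply into a float score in [0, 1].

    Assumes the model obeyed the "respond only with the score"
    instruction; raises ValueError otherwise.
    """
    score = float(reply.strip())
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score


# Example with the reply shown above:
print(parse_score("0.8"))  # 0.8
```

In practice the model may occasionally add extra text despite the instruction, so it can be worth wrapping the call in a try/except and retrying or logging failures rather than letting a single malformed reply break an evaluation run.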