How to Evaluate your LLM Response

You can leverage Command R to evaluate natural language responses that cannot be easily scored with manual rules.

Prompt

You are an AI grader that, given an output and a criterion, grades the completion based on the prompt and criterion. Below is a prompt, a completion, and a criterion with which to grade the completion. You need to respond according to the criterion instructions.
## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
Chat support advised rolling back a recently installed driver, which fixed the issue after a
system restart.
## Criterion
Rate the output text with a score between 0 and 1, where 1 means the text was written in a formal
and business-appropriate tone and 0 means an informal tone. Respond only with the score.

Output

0.8

API Request

PYTHON
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

response = co.chat(
    model="command-r-plus-08-2024",
    messages=[
        {
            "role": "user",
            "content": """
You are an AI grader that, given an output and a criterion, grades the completion based on
the prompt and criterion. Below is a prompt, a completion, and a criterion with which to grade
the completion. You need to respond according to the criterion instructions.

## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
Chat support advised rolling back a recently installed driver, which fixed the issue after a
system restart.

## Criterion
Rate the output text with a score between 0 and 1, where 1 means the text was written in a formal
and business-appropriate tone and 0 means an informal tone. Respond only with the score.
""",
        }
    ],
)

# The grader's score is returned as plain text in the first content block.
print(response.message.content[0].text)
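Because the criterion instructs the model to respond only with the score, the reply can be converted to a number for downstream use (for example, averaging scores across many responses). A minimal sketch of such a helper, assuming the model followed the instruction — `parse_score` is a hypothetical name, not part of the Cohere SDK:

```python
def parse_score(reply: str) -> float:
    """Parse the grader's reply into a float score in [0, 1].

    Assumes the model obeyed the "respond only with the score"
    instruction; raises ValueError otherwise.
    """
    score = float(reply.strip())
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score


# Example with the reply shown above:
print(parse_score("0.8"))  # 0.8
```

In practice the model may occasionally add extra text despite the instruction, so it can be worth wrapping the call in a try/except and retrying or logging failures rather than letting a single malformed reply break an evaluation run.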