How to Evaluate your LLM Response

You can use Command R to evaluate natural-language responses that are hard to score with hand-written rules.

Prompt

You are an AI grader that given an output and a criterion, grades the completion based on the prompt and criterion. Below is a prompt, a completion, and a criterion with which to
grade the completion. You need to respond according to the criterion instructions.
## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
Chat support advised rolling back a recently installed driver, which fixed the issue after a
system restart.
## Criterion
Rate the output text with a score between 0 and 1. 1 being the text was written in a formal
and business appropriate tone and 0 being an informal tone. Respond only with the score.

Output

0.8
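
Because the grader is asked to respond only with the score, the reply arrives as text and usually has to be converted to a number before it can be compared or aggregated. The following is a minimal sketch of that step, not part of the original example: `parse_score` is a hypothetical helper name, and it assumes the reply contains at most some stray text around a single number in the 0 to 1 range.

```python
import re
from typing import Optional


def parse_score(raw: str) -> Optional[float]:
    """Return the first number in the grader's reply if it lies in [0, 1].

    Hypothetical helper: returns None when no number is found or when the
    number falls outside the range requested in the criterion.
    """
    # Accept forms such as "0.8", ".8", "1", or "Score: 0.8".
    match = re.search(r"-?\d*\.?\d+", raw)
    if match is None:
        return None
    score = float(match.group())
    return score if 0.0 <= score <= 1.0 else None


print(parse_score("0.8"))
```

Returning `None` instead of raising keeps a single malformed reply from stopping a larger evaluation run; the caller can decide whether to retry or skip that item.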

API Request

PYTHON
import cohere

# Replace the placeholder with your own API key.
co = cohere.Client(api_key="Your API key")

# Send the grader prompt as a single chat message.
response = co.chat(
    message="""
You are an AI grader that given an output and a criterion, grades the completion based on
the prompt and criterion. Below is a prompt, a completion, and a criterion with which to grade
the completion. You need to respond according to the criterion instructions.

## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
Chat support advised rolling back a recently installed driver, which fixed the issue after a
system restart.

## Criterion
Rate the output text with a score between 0 and 1. 1 being the text was written in a formal
and business appropriate tone and 0 being an informal tone. Respond only with the score.
""",
)

# Print only the generated text (the score) rather than the full response object.
print(response.text)
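
To grade more than one text against the same criterion, the prompt can be assembled from a template instead of being written out by hand each time. The sketch below is one possible way to do that and is not part of the original example: `GRADER_TEMPLATE`, `build_grader_prompt`, `grade_outputs`, and the `ask_model` callable are all hypothetical names. The model call is passed in as a function so the code runs without an API key.

```python
from typing import Callable, List, Tuple

# Hypothetical template that mirrors the "## Output" / "## Criterion" layout.
GRADER_TEMPLATE = """You are an AI grader that, given an output and a criterion, grades the
output according to the criterion. Respond according to the criterion instructions.

## Output
{output}

## Criterion
{criterion}
"""


def build_grader_prompt(output: str, criterion: str) -> str:
    """Fill the template with one text to grade and the grading criterion."""
    return GRADER_TEMPLATE.format(output=output.strip(), criterion=criterion.strip())


def grade_outputs(
    outputs: List[str],
    criterion: str,
    ask_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (output, raw grader reply) pairs, one per input text."""
    return [
        (text, ask_model(build_grader_prompt(text, criterion)).strip())
        for text in outputs
    ]


# Stand-in for a real model call, used here only so the sketch runs offline.
def fake_model(prompt: str) -> str:
    return " 0.8\n"


for text, reply in grade_outputs(["Hello team.", "yo"], "Rate formality 0 to 1.", fake_model):
    print(text, "->", reply)
```

With the client from the request above, `ask_model` could be something like `lambda prompt: co.chat(message=prompt).text`, assuming the reply text is all that is needed.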