How to Evaluate your LLM Response
You can use Command R to evaluate natural language responses that cannot easily be scored with hand-written rules.
Prompt
You are an AI grader that, given an output and a criterion, scores the output against that criterion. Below is an output and a criterion with which to grade it. You need to respond according to the criterion instructions.

## Output
The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue. Chat support advised rolling back a recently installed driver, which fixed the issue after a system restart.

## Criterion
Rate the output text with a score between 0 and 1, where 1 means the text is written in a formal and business-appropriate tone and 0 means an informal tone. Respond only with the score.
Output
0.8
API Request
PYTHON
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

response = co.chat(
    model="command-r-plus-08-2024",
    messages=[
        {
            "role": "user",
            "content": """
            You are an AI grader that, given an output and a criterion, scores the output against
            that criterion. Below is an output and a criterion with which to grade it. You need to
            respond according to the criterion instructions.

            ## Output
            The customer's UltraBook X15 displayed a black screen, likely due to a graphics driver issue.
            Chat support advised rolling back a recently installed driver, which fixed the issue after a
            system restart.

            ## Criterion
            Rate the output text with a score between 0 and 1, where 1 means the text is written in a
            formal and business-appropriate tone and 0 means an informal tone. Respond only with the score.
            """,
        }
    ],
)

print(response.message.content[0].text)
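If you want to apply the same criterion to many responses, one option is to wrap the call in a small helper that fills a grading template and parses the returned text into a float. The sketch below is illustrative only: the grade function, the GRADER_PROMPT template, and the candidate strings are assumptions for this example, not part of the Cohere API, and it assumes the model follows the "respond only with the score" instruction so the reply parses as a number.
PYTHON
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

# Hypothetical reusable grading template; {output} and {criterion} are filled per call.
GRADER_PROMPT = """You are an AI grader that, given an output and a criterion, scores the output
against that criterion. You need to respond according to the criterion instructions.

## Output
{output}

## Criterion
{criterion}
"""

def grade(output: str, criterion: str) -> float:
    # Fill the grading template and ask the model for a score.
    response = co.chat(
        model="command-r-plus-08-2024",
        messages=[
            {"role": "user", "content": GRADER_PROMPT.format(output=output, criterion=criterion)}
        ],
    )
    # Assumes the model responds with the score only, so the reply parses as a float.
    return float(response.message.content[0].text.strip())

# Example: grade two candidate summaries for tone (illustrative inputs).
criterion = (
    "Rate the output text with a score between 0 and 1, where 1 means the text is written in a "
    "formal and business-appropriate tone and 0 means an informal tone. Respond only with the score."
)
candidates = [
    "The laptop was busted but we sorted it out, no worries!",
    "The device fault was resolved after a driver rollback and a system restart.",
]
for candidate in candidates:
    print(candidate, "->", grade(candidate, criterion))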