Advanced Prompt Engineering Techniques
The previous chapter discussed general rules and heuristics to follow for successfully prompting the Command family of models. Here, we will discuss specific advanced prompt engineering techniques that can in many cases vastly improve the quality of the model’s completions. These include how to give clear and unambiguous instructions, few-shot prompting, chain-of-thought (CoT) techniques, and prompt chaining.
As we develop these techniques, we will work through an example where our aim is to improve a prompt from the LegalBench “hearsay” task. The task asks an LLM to determine whether a particular piece of evidence qualifies as hearsay, i.e., an out-of-court statement introduced to prove the truth of the matter asserted. For example, a witness testifying that someone else told her the defendant ran the red light, offered to prove that the defendant ran the red light, is hearsay; the same testimony offered only to show that the statement was made is not.
Before we apply any specific prompting techniques, we can see that simply prompting the model with the direct question results in too much unwanted and ambiguous information. Using the Chat API, we could do the following:
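A minimal sketch of such a call is shown below, assuming the cohere Python SDK and a Command-family model name such as "command-r" (substitute whichever model and SDK version you are using). Because the original LegalBench sample is not reproduced here, the evidence text is an illustrative non-verbal-assertion scenario standing in for it.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # assumes the cohere Python SDK

# Illustrative stand-in for the LegalBench sample (not the actual dataset text):
evidence = (
    "On the issue of whether Albert stole the watch, the prosecution offers "
    "testimony that, when asked who took it, a bystander silently pointed at Albert."
)

# Direct question with no task definition or additional context
response = co.chat(model="command-r", message=f"{evidence} Is there hearsay?")
print(response.text)
```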
The answer returned with this method is unfortunately wrong; the correct answer is “Yes” (this is non-verbal hearsay). Without a definition of the task or other additional context, the model can make an incorrect assertion and then attempt to reconcile what has already been generated.
Defining the Task
Rather than simply asking a question directly, one should clearly define the task and provide concise, unambiguous instructions. Including relevant background knowledge, domain-specific terminology, and related examples generally helps the model construct a much more grounded response. Keeping the prompt just long enough to provide sufficient information, without overwhelming the model’s context window, can also improve performance.
The obvious thing missing in the prompt above is concise and unambiguous instructions. There is also no background knowledge or domain-specific terminology provided (the model seems to know what hearsay is, but a quick definition could still help). A good zero-shot prompt for the same question could then be:
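The sketch below, reusing the co client and evidence variable from the earlier snippet, illustrates one possible wording; it defines the task, gives a short definition of hearsay, and states the expected answer format.

```python
# Zero-shot prompt: define the task, briefly explain hearsay, and state the
# expected answer format. The wording here is illustrative.
zero_shot_prompt = f"""You are a legal expert. Your task is to determine whether a piece of \
evidence contains hearsay. Hearsay is an out-of-court statement introduced to prove the \
truth of the matter asserted. Note that conduct intended as an assertion can also be hearsay.

Evidence: {evidence}

Does this evidence contain hearsay? Answer "Yes" or "No"."""

response = co.chat(model="command-r", message=zero_shot_prompt)
print(response.text)
```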
This is correct. It could be that defining the task helped the model arrive at the correct answer, but it is also possible that we just got lucky. Some further explanation would be useful, and if we were applying this prompt template to a whole set of questions (for example, the entire LegalBench task), additional robustness would certainly pay off.
Few-shot Prompting
Unlike the zero-shot examples above, few-shot prompting provides the model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. Good examples condition the model to the expected response type and style.
In addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses. The ordering of the examples can also matter: if they contain patterns that are irrelevant to the correctness of the question (for instance, all positive examples appearing first), the model may latch onto those patterns instead of the semantics of the question itself.
To improve the above question, we can include several positive and negative examples in random order from the LegalBench training set as follows:
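A sketch of such a few-shot prompt follows. The examples here are generic illustrations of hearsay and non-hearsay written for this sketch, not actual LegalBench training samples, and the exact wording is an assumption.

```python
# Few-shot prompt: mix positive ("Yes") and negative ("No") examples in an
# arbitrary order before asking about the new piece of evidence.
few_shot_examples = """\
Evidence: A witness testifies that her neighbor told her the defendant's car ran the red light, offered to prove the car ran the red light.
Answer: Yes

Evidence: A witness testifies that she personally saw the defendant's car run the red light.
Answer: No

Evidence: To prove that the defendant made a threat, a witness testifies that she heard the defendant say "I'll get you."
Answer: No

Evidence: To prove the driver was speeding, the plaintiff offers a letter from a bystander stating that the car was going over 90 mph.
Answer: Yes
"""

prompt = f"""You are a legal expert. Determine whether the evidence contains hearsay. \
Hearsay is an out-of-court statement introduced to prove the truth of the matter asserted.

{few_shot_examples}
Evidence: {evidence}
Answer:"""

response = co.chat(model="command-r", message=prompt)
print(response.text)
```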
The model continues to answer correctly, and now it also backs up the answer with a clear explanation.
Chain of Thought Prompting
Chain of thought (sometimes abbreviated CoT) prompting encourages the LLM to provide a step-by-step explanation of its reasoning, which can improve transparency, allow for better error analysis, and help guide the model to the correct answer. Problems can arise when the model commits to an answer right away, becomes “stuck” with it, and then has to find a way to reconcile the answer it has already given.
With CoT prompting, one can also request intermediate outputs at each step, which can help identify and correct errors early in the process. This forced “thinking before you answer” helps emulate human thought processes and incorporate common-sense knowledge into the task.
There are several different ways to incorporate CoT prompting into a task. With “zero-shot CoT,” one can simply ask the model to “think step by step”:
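A minimal sketch, again reusing the earlier co client and evidence variable, might look like this:

```python
# Zero-shot chain-of-thought: append an instruction to reason step by step.
cot_prompt = f"""You are a legal expert. Determine whether the evidence contains hearsay. \
Hearsay is an out-of-court statement introduced to prove the truth of the matter asserted.

Evidence: {evidence}

Think step by step before giving your final answer."""

response = co.chat(model="command-r", message=cot_prompt)
print(response.text)
```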
This answer is quite satisfying, not only because we get the correct answer, but also because we can see how it was arrived at by applying the rules of the situation. In many cases this approach can turn a wrong answer into a correct one, and being able to follow the reasoning adds a level of trustworthiness to the answer. While we have the answer now, it is not easily extractable (we would prefer either “yes” or “no” separate from the reasoning). One approach is to incorporate CoT in the few-shot setup and simultaneously demonstrate the desired output format.
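One possible sketch of this is below; the two demonstrations are illustrative rather than LegalBench samples, and the response format (a leading "Yes" or "No" followed by the reasoning) is an assumption chosen so the answer is easy to pull out.

```python
# Few-shot CoT: each example demonstrates the desired format, a one-word
# answer up front followed by a short explanation of the reasoning.
few_shot_cot = """\
Evidence: A witness testifies that her neighbor told her the defendant's car ran the red light, offered to prove the car ran the red light.
Response: Yes. The neighbor's statement was made out of court and is offered to prove the truth of what it asserts, so it is hearsay.

Evidence: A witness testifies that she personally saw the defendant's car run the red light.
Response: No. The witness reports her own first-hand observation rather than an out-of-court statement, so there is no hearsay.
"""

prompt = f"""You are a legal expert. Determine whether the evidence contains hearsay. \
Hearsay is an out-of-court statement introduced to prove the truth of the matter asserted. \
Begin your response with "Yes" or "No", then explain your reasoning step by step.

{few_shot_cot}
Evidence: {evidence}
Response:"""

response = co.chat(model="command-r", message=prompt)
print(response.text)
```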
Good. The answer now begins simply with “Yes,” so theoretically it should be easy to extract. Another approach to simplify answer extraction is to ask the model to format the response in a structured way such as JSON. For example:
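A sketch of a JSON-formatted prompt is shown below; the field names "reasoning" and "answer" are illustrative choices, not a fixed schema.

```python
# Ask for structured output so the answer can be parsed programmatically.
json_prompt = f"""You are a legal expert. Determine whether the evidence contains hearsay. \
Hearsay is an out-of-court statement introduced to prove the truth of the matter asserted.

Evidence: {evidence}

Think step by step, then respond only with a JSON object of the form
{{"reasoning": "<your reasoning>", "answer": "Yes" or "No"}} and nothing else."""

response = co.chat(model="command-r", message=json_prompt)
print(response.text)
```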
Much better! Now that the outputs are structured, we can easily parse the completion and directly extract the answer.
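For example, assuming the completion follows the schema sketched above, the answer can be extracted with the standard json module:

```python
import json

# Parse the structured completion; this assumes the model returned bare JSON
# (in practice you may need to strip surrounding text or code fences first).
result = json.loads(response.text)
print(result["answer"])     # "Yes" or "No"
print(result["reasoning"])  # the step-by-step explanation
```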
Prompt Chaining
Finally, prompt chaining can explicitly force a model to slow down and break a task into its constituent parts. As explained in the previous chapter, task splitting can be an effective technique for improving the quality of completions, but an LLM will sometimes try to jump to the answer immediately. Chaining also lets us include more complex instructions with less risk of them being lost in information overload.
For example, instead of asking the model to “work through the problem step by step” before answering (which in certain cases LLMs can forget to do), we can first ask for an analysis of the situation, then ask for a simple “yes” or “no” answer.
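A sketch of the first prompt in the chain, reusing the earlier variables and with illustrative wording, could be:

```python
# Step 1 of the chain: ask only for an analysis, explicitly deferring the final answer.
analysis_prompt = f"""You are a legal expert. Hearsay is an out-of-court statement introduced \
to prove the truth of the matter asserted.

Evidence: {evidence}

Analyze whether this evidence contains hearsay. Walk through the relevant considerations, \
but do not give a final answer yet."""

analysis = co.chat(model="command-r", message=analysis_prompt).text
print(analysis)
```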
The issue was analyzed correctly in the above completion, but we are seeking a clear “Yes” or “No” answer that a downstream task can easily ingest. Therefore, we chain the completion of the first prompt with a second prompt:
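One way to sketch the second prompt, which feeds the first completion back in, is:

```python
# Step 2 of the chain: pass the analysis back in and ask for a single-word answer.
extraction_prompt = f"""Based on the following analysis, does the evidence contain hearsay?

Analysis: {analysis}

Answer with a single word: "Yes" or "No"."""

final_answer = co.chat(model="command-r", message=extraction_prompt).text
print(final_answer)  # the correct label for this sample is "Yes"
```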
Chaining prompts together allows us to use the first prompt to focus on the analysis, and the second to properly extract the information in a single-word response.