Parameters for Controlling Outputs

In this chapter, you’ll learn about the parameters that you can use to control the Chat endpoint's outputs.

We’ll use Cohere’s Python SDK for the code examples. Follow along in this notebook.

The Chat endpoint is a versatile tool that empowers developers with an extensive array of options and parameters.

As you’ll learn, the Command model has many variations to select from, where each has been carefully crafted to suit different needs. Additionally, you will see how to use parameters to control the creativity and conciseness of model responses.

Setup

To set up, we first import the Cohere module and create a client.

import cohere
co = cohere.Client("COHERE_API_KEY") # Your Cohere API key
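
If you don’t already have the SDK installed, it’s available on PyPI:

pip install cohere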

Model Type

With the Chat endpoint, you can choose from several variations of the Command model. Each variation has a different output profile, so it’s worth experimenting to find the one that best suits your use case.

See the documentation for the most up-to-date list of available Cohere models. At the time of writing, the models are as follows:

  • command: The default model for Chat endpoint calls, used whenever you don’t specify the model parameter.
  • command-light: A smaller, faster version of command. Almost as capable, but a lot faster.
  • command-r: Performs language tasks at a higher quality, more reliably, and with a longer context than previous models.
  • command-nightly and command-light-nightly: The latest, most experimental, and (possibly) unstable versions of their default counterparts. Not recommended for production use.

Use the model parameter to select a variation that suits your requirements. In the code cell, we select command-r.

response = co.chat(message="Hello",
                   model="command-r")
print(response.text)
# RESPONSE 
Hi! Hello there! How's it going? I hope you're having a fantastic day so far. Is there anything I can help you with?
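
Because the model is selected with a plain string, a quick way to compare the variations listed above is to loop over them. A minimal sketch (outputs will differ between models and runs):

for model in ["command", "command-light", "command-r"]:
    response = co.chat(message="Hello", model=model)
    print(f"{model}: {response.text}\n")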

Randomness

Often, you’ll need to control the level of randomness of the model. There are a number of factors to keep in mind when tuning model randomness, including:

  • Task type: You’ll likely want to decrease the randomness when performing structured tasks that have a correct answer, such as question answering, summarization, or generating technical text. In these cases, you want the model to yield a safe and predictable response. On the other hand, if you’re generating poetry or brainstorming ideas, you might want to increase the randomness to produce more diverse and creative responses.
  • Model behavior: You’ll likely need to increase the randomness if the model gets stuck in a loop and starts repeating itself, or if it produces overly generic phrases. Increasing the randomness expands the set of words the model can draw from when generating responses.
  • Controlling style and tone: If you’re generating text that needs a specific tone or style, such as a company blog post or customer support responses, a low level of randomness is usually desirable to keep the model from generating unusual words.

Modifying the temperature parameter changes the extent to which the model considers incorporating unlikely tokens in its response (a token can be a word, part of a word, or punctuation), which can make the output more random and creative.

To understand this, we’ll look at an example. The model would likely assign the token cookies a much higher likelihood than chair of appearing after the phrase I like to bake.

The model assigns a likelihood to each possible next token

Before these likelihoods can be used to select the next token, they first need to be converted to probabilities. The temperature parameter controls how this conversion is done.

  • At low temperature, low-likelihood tokens are assigned very low probabilities, and high-likelihood tokens are assigned very high probabilities.
  • At high temperature, the probabilities look roughly similar across tokens, with high-likelihood tokens assigned only slightly higher probabilities.

Adjusting the temperature setting

Building on the example above:

  • At low temperature, chair may still be selected, but its probability is significantly lower than that of cookies.
  • At high temperature, the probability of chair being selected is only slightly lower than that of cookies.
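
To make the conversion concrete, here is a minimal sketch of a temperature-scaled softmax, using made-up likelihood scores for the two tokens (the numbers are illustrative, not values returned by the API):

import math

# Hypothetical likelihood scores for two candidate next tokens
# after "I like to bake" -- illustrative values, not real model outputs.
likelihoods = {"cookies": 2.0, "chair": 0.5}

def to_probabilities(likelihoods, temperature):
    # Divide each score by the temperature, then apply softmax.
    scaled = {token: score / temperature for token, score in likelihoods.items()}
    total = sum(math.exp(s) for s in scaled.values())
    return {token: math.exp(s) / total for token, s in scaled.items()}

for temp in [0.1, 0.3, 1.0]:
    print(temp, to_probabilities(likelihoods, temp))
# At temperature 0.1, "chair" gets a near-zero probability;
# at temperature 1.0, the gap between the two tokens narrows considerably.

The exact conversion inside the Chat endpoint may differ, but the effect of the temperature parameter is the same: higher values flatten the probability distribution over next tokens.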

The temperature parameter is a value between 0 and 1. As you increase the temperature, the model gets more creative and random. Temperature can be tuned for different problems, and most people will find that the default temperature of 0.3 is a good starting point.

Let’s look at a code example where we ask the model to suggest a more exciting title for a blog post. Prompting the endpoint five times with the temperature set to 0 yields the same output each time.

message = """Suggest a more exciting title for a blog post titled: Intro to Retrieval-Augmented Generation. \
Respond in a single line."""

for _ in range(5):
    response = co.chat(message=message,
                       temperature=0)
    print(response.text)
# RESPONSE

"Unleashing the Power of Generation: A Guide to the Exciting World of Retrieval-Augmented Creation"
"Unleashing the Power of Generation: A Guide to the Exciting World of Retrieval-Augmented Creation"
"Unleashing the Power of Generation: A Guide to the Exciting World of Retrieval-Augmented Creation"
"Unleashing the Power of Generation: A Guide to the Exciting World of Retrieval-Augmented Creation"
"Unleashing the Power of Generation: A Guide to the Exciting World of Retrieval-Augmented Creation"

However, if we increase the temperature to the maximum value of 1, the model returns different suggestions across the five calls.

message = """Suggest a more exciting title for a blog post titled: Intro to Retrieval-Augmented Generation. \
Respond in a single line."""

for _ in range(5):
    response = co.chat(message=message,
                       temperature=1)
    print(response.text)
# RESPONSE

"Unleashing the Power of Generation: A Guide to the Future of Retrieval-Augmented Creation"
"Unleashing the Power of Generation: A Guide to the Exciting World of Retrieval-Augmented Creation"
"Unleashing the Power of RAG: A Guide to the Future of Generation"
"Unleashing the Power of Augmented Generation: A Guide to the Future of AI Text Generation"
"Unleashing the Power of Generation: A Guide to the Exciting World of Retrieval-Augmented Creation"

Conciseness

We have seen how to control the model’s randomness, but you might also want to control the conciseness of its outputs. For example, let’s consider what happens when we ask the model a simple question: “How many eggs are in one dozen?”

response = co.chat(message="How many eggs are in one dozen?")
print(response.text)
# RESPONSE

There are 12 eggs in one dozen. The term "dozen" is used to represent the number 12, and it's commonly used when referring to measurements or quantities, especially for eggs. So, when you buy or hear about a dozen eggs, it means you're dealing with 12 eggs.

The model answers the question, but it uses multiple sentences when the first sentence would have been sufficient.

We can get the model to shorten its response by setting the preamble parameter to an empty string. This overrides the default preamble, which otherwise encourages fuller, more conversational answers.

response = co.chat(message="How many eggs are in one dozen?", preamble="")
print(response.text)
# RESPONSE

There are 12 eggs in one dozen.
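
You can also steer conciseness in the other direction by supplying your own preamble instead of an empty one. A minimal sketch (the preamble wording here is our own, not a value prescribed by the API):

response = co.chat(message="How many eggs are in one dozen?",
                   preamble="You are a concise assistant. Answer in one short sentence.")
print(response.text)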

Conclusion

In this chapter, you learned how to call different variations of the Command model when using the Chat endpoint. You worked with a code example using the temperature parameter to control the Command model’s level of randomness. You also saw how to use the preamble parameter to reduce the chattiness of the model and make it more concise when generating responses.

What’s Next

Continue to the next chapter to learn the basics of prompt engineering and how to craft creative and effective prompts.