Cohere Text Generation Tutorial


Command is Cohere’s flagship LLM, able to generate a response based on a user message or prompt. It is trained to follow user commands and to be instantly useful in practical business applications, like summarization, copywriting, extraction, and question-answering.

Command R and Command R+ are the most recent models in the Command family. They strike the kind of balance between efficiency and high levels of accuracy that enable enterprises to move from proof of concept to production-grade AI applications.

This tutorial leans on the Chat endpoint to build an onboarding assistant for new hires at Co1t, a fictional company, and covers:

  • Basic text generation
  • Prompt engineering
  • Parameters for controlling output
  • Structured output generation
  • Streaming output

Setup

To get started, first we need to install the cohere library and create a Cohere client.

PYTHON
# pip install cohere

import cohere

co = cohere.Client("COHERE_API_KEY")  # Get your API key: https://dashboard.cohere.com/api-keys
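
If you prefer not to hard-code the key, a common alternative is to read it from an environment variable (a minimal sketch, assuming you’ve exported the key as COHERE_API_KEY in your shell):

PYTHON
import os

import cohere

# Read the API key from the environment instead of embedding it in source
co = cohere.Client(os.environ["COHERE_API_KEY"])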

Basic text generation

To get started, we just need to pass a single message parameter that represents (you guessed it) the user message, and then use the client we just created to call the Chat endpoint.

PYTHON
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates."

# Generate the response
response = co.chat(message=message)

print(response.text)

The response we get back contains several attributes, but for the sake of simplicity we’ll focus for the moment on the text attribute:

Sure! Here is a short introduction message:
"Hi everyone! My name is [Your Name] and I am excited to join the Co1t team today. I am passionate about [relevant experience or skills] and look forward to contributing my skills and ideas to the team. In my free time, I enjoy [hobbies or interests]. Feel free to reach out to me directly if you want to chat or collaborate. Let's work together to make Co1t a success!"


Prompt engineering

Prompting is at the heart of working with LLMs as it provides context for the text that we want the model to generate. Prompts can be anything from simple instructions to more complex pieces of text, and they are used to steer the model to producing a specific type of output.

This section examines a couple of prompting techniques, the first of which is adding more specific instructions to the prompt (the more instructions you provide, the closer you can get to the response you need).

How long a prompt can be depends on the maximum context length that a model supports (in the case of Command R and Command R+, it’s 128k tokens).
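
If you want to check how close a prompt is to that limit before sending it, you can count its tokens with the Tokenize endpoint (a minimal sketch, assuming the v1 Python SDK’s co.tokenize method):

PYTHON
# Count the tokens in a prompt before sending it
tokens = co.tokenize(
    text="I'm joining a new startup called Co1t today.",
    model="command-r",
)
print(len(tokens.tokens))  # number of tokens the prompt consumes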

Below, we’ll add one additional instruction to the earlier prompt: the length we need the response to be.

PYTHON
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a one-sentence introduction message to my teammates."

# Generate the response
response = co.chat(message=message)

print(response.text)

Here's a potential introduction message:
"Hi everyone, my name is [Your Name] and I'm thrilled to join Co1t today as part of the team, and I look forward to contributing my skills and ideas to drive innovation and success!"
This message expresses your excitement about joining the company and highlights your commitment to contributing to the team's success.

All our prompts so far use what is called zero-shot prompting, which means that we provide instructions without any examples. But in many cases, it is extremely helpful to provide examples to the model to guide its response. This is called few-shot prompting.

Few-shot prompting is especially useful when we want the model response to follow a particular style or format. Also, it is sometimes hard to explain what you want in an instruction, and easier to show examples.

Below, we want the response to follow a particular style and length convention, which we demonstrate through the examples in the prompt.

PYTHON
# Add the user message
user_input = "Why can't I access the server? Is it a permissions issue?"

# Create a prompt containing example outputs
message = f"""Write a ticket title for the following user request:

User request: Where are the usual storage places for project files?
Ticket title: Project File Storage Location

User request: Emails won't send. What could be the issue?
Ticket title: Email Sending Issues

User request: How can I set up a connection to the office printer?
Ticket title: Printer Connection Setup

User request: {user_input}
Ticket title:"""

# Generate the response
response = co.chat(message=message)

print(response.text)

Server Access Issues


Parameters for controlling output

The Chat endpoint provides developers with an array of options and parameters.

For example, you can choose from several variations of the Command model. Different models offer different trade-offs, such as output quality and latency.

PYTHON
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a one-sentence introduction message to my teammates."

# Generate the response by specifying a model
response = co.chat(message=message, model="command-r-08-2024")

print(response.text)

Hello, my name is [Your Name] and I'm thrilled to join the Co1t team today as the new kid in town!

Often, you’ll need to control the level of randomness of the output, which you can do with a few parameters.

The most commonly used is temperature, a number that tunes the degree of randomness. It accepts values between 0.0 and 1.0.

A lower temperature gives more predictable outputs, and a higher temperature gives more “creative” outputs.

Here’s an example of setting temperature to 0.

PYTHON
# Add the user message
message = "I like learning about the industrial revolution and how it shapes the modern world. How can I introduce myself in two words."

# Generate the response multiple times by specifying a low temperature value
for idx in range(3):
    response = co.chat(message=message, temperature=0)
    print(f"{idx+1}: {response.text}\n")

1: Curious Historian.
2: Curious Historian.
3: Curious Historian.

And here’s an example of setting temperature to 1.

PYTHON
# Add the user message
message = "I like learning about the industrial revolution and how it shapes the modern world. How can I introduce myself in two words."

# Generate the response multiple times by specifying a high temperature value
for idx in range(3):
    response = co.chat(message=message, temperature=1)
    print(f"{idx+1}: {response.text}\n")

1: Sure! Here are two words that can describe you:
1. Industry Enthusiast
2. Revolution Aficionado
These words combine your passion for learning about the Industrial Revolution with a modern twist, showcasing your enthusiasm and knowledge in a concise manner.
2: "Revolution Fan"
3: History Enthusiast!
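
Temperature isn’t the only sampling control. The Chat endpoint also accepts k (top-k) and p (top-p) parameters, which limit the pool of tokens the model can sample from. Here’s a minimal sketch combining them with temperature (the specific values are illustrative; see the API reference for the accepted ranges):

PYTHON
# Combine sampling parameters: k keeps only the k most likely tokens,
# and p keeps the smallest set of tokens whose probabilities sum to p
response = co.chat(
    message=message,
    temperature=0.5,
    k=50,
    p=0.9,
)
print(response.text)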


Structured output generation

By adding the response_format parameter, you can get the model to generate the output as a JSON object, which lets you structure and organize the model’s responses in a form that downstream applications can consume.

The response_format parameter allows you to specify the schema the JSON object must follow. In the example below, the chat call takes the following parameters:

  • message: The user message, including the instruction to generate a JSON object
  • response_format: The schema the JSON object must follow

PYTHON
import json

# Add the user message
user_input = "Why can't I access the server? Is it a permissions issue?"

# Generate the response by adding the JSON schema
response = co.chat(
    model="command-r-plus-08-2024",
    message=f"""Create an IT ticket for the following user request. Generate a JSON object.
    {user_input}""",
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "required": ["title", "category", "status"],
            "properties": {
                "title": {"type": "string"},
                "category": {"type": "string", "enum": ["access", "software"]},
                "status": {"type": "string", "enum": ["open", "closed"]},
            },
        },
    },
)

json_object = json.loads(response.text)

print(json_object)
{'title': 'User Unable to Access Server', 'category': 'access', 'status': 'open'}
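
Because the output is now a plain Python dictionary, downstream code can read individual fields directly. For example (a hypothetical routing snippet):

PYTHON
# Hypothetical downstream use: route the ticket based on its category
if json_object["category"] == "access":
    print(f"Routing access ticket: {json_object['title']}")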


Streaming responses

All the examples above generate responses in a non-streamed manner: the endpoint returns a response object only after the model has generated the text in full.

The Chat endpoint also provides streaming support. In a streamed response, the endpoint returns an object for each token as it is generated, which means you can display the text incrementally without having to wait for the full completion.

To activate it, use co.chat_stream() instead of co.chat().

In streaming mode, the endpoint will generate a series of objects. To get the actual text contents, we take objects whose event_type is text-generation.

PYTHON
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a one-sentence introduction message to my teammates."

# Generate the response by streaming it
response = co.chat_stream(message=message)

for event in response:
    if event.event_type == "text-generation":
        print(event.text, end="")

Here's a potential introduction message:
"Hi everyone, my name is [Your Name] and I'm thrilled to join Co1t today as the newest [Your Role], and I look forward to contributing my skills and expertise to the team and driving innovative solutions for our customers."


Conclusion

In this tutorial, you learned about:

  • How to get started with basic text generation
  • How to improve outputs with prompt engineering
  • How to control outputs using parameter changes
  • How to generate structured outputs
  • How to stream text generation outputs

However, everything so far has been single-turn text generation. As its name implies, the Chat endpoint can also support building chatbots, which require support for multi-turn conversations and maintaining conversation state.

In Part 3, you’ll learn how to build chatbots with the Chat endpoint.