Cohere Text Generation Tutorial
Command is Cohere’s flagship LLM, able to generate a response based on a user message or prompt. It is trained to follow user commands and to be instantly useful in practical business applications, like summarization, copywriting, extraction, and question-answering.
Command R and Command R+ are the most recent models in the Command family. They strike the kind of balance between efficiency and high levels of accuracy that enable enterprises to move from proof of concept to production-grade AI applications.
This tutorial uses the Chat endpoint to build an onboarding assistant for new hires at Co1t, a fictional company, and covers:
- Basic text generation
- Prompt engineering
- Parameters for controlling output
- Structured output generation
- Streaming output
Setup
To get started, first we need to install the `cohere` library and create a Cohere client.
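A minimal sketch of that setup, assuming the Python SDK and a placeholder API key, might look like this:

```python
# Install the SDK first (assumed): pip install cohere
import cohere

# Create a client with your API key (placeholder shown)
co = cohere.Client(api_key="YOUR_API_KEY")
```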
Basic text generation
To get started, we just need to pass a single `message` parameter that represents (you guessed it) the user message, after which we use the client we just created to call the Chat endpoint.
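For example (the prompt wording here is illustrative):

```python
# Assumes the `co` client created in the Setup section

# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates?"

# Generate the response by calling the Chat endpoint
response = co.chat(message=message)
```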
The response we get back contains several objects, but for the sake of simplicity we’ll focus for the moment on the `text` object:
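```python
# The generated text is available on the response's `text` attribute
print(response.text)
```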
Here are some additional resources if you’d like to read further:
- Chat endpoint API reference
- Documentation on Chat fine-tuning
- Documentation on Command R+
- LLM University module on text generation
Prompt engineering
Prompting is at the heart of working with LLMs as it provides context for the text that we want the model to generate. Prompts can be anything from simple instructions to more complex pieces of text, and they are used to steer the model toward producing a specific type of output.
This section examines a couple of prompting techniques, the first of which is adding more specific instructions to the prompt (the more instructions you provide in the prompt, the closer you can get to the response you need).
The limit of how long a prompt can be is dependent on the maximum context length that a model can support (in the case of Command R and Command R+, it’s 128k tokens).
Below, we’ll add one additional instruction to the earlier prompt: the length we need the response to be.
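A sketch of the revised prompt (the exact wording is illustrative):

```python
# Assumes the `co` client created in the Setup section

# Add an instruction about the desired length of the response
message = "I'm joining a new startup called Co1t today. Could you help me write a one-sentence introduction message to my teammates?"

response = co.chat(message=message)
print(response.text)
```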
All our prompts so far use what is called zero-shot prompting, which means that we provide instructions without any examples. But in many cases, it is extremely helpful to provide examples to the model to guide its response. This is called few-shot prompting.
Few-shot prompting is especially useful when we want the model response to follow a particular style or format. Also, it is sometimes hard to explain what you want in an instruction, and easier to show examples.
Below, we want the response to be similar in style and length to the convention established in the examples we provide.
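Here’s a sketch of a few-shot prompt; the example introductions are made up for illustration:

```python
# A few-shot prompt: two example request/message pairs, then the actual request
message = """Write an introduction message in the style and length of the examples.

Request: A new employee in the design team.
Message: Hi everyone, I'm thrilled to join Co1t as a designer and can't wait to create great things with you!

Request: A new employee in the sales team.
Message: Hello team, I'm excited to start at Co1t in sales and look forward to meeting all of you!

Request: A new employee in the engineering team.
Message:"""

response = co.chat(message=message)
print(response.text)
```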
Further reading:
Parameters for controlling output
The Chat endpoint provides developers with an array of options and parameters.
For example, you can choose from several variations of the Command model. Different models offer different trade-offs in areas such as output quality and latency.
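You select a model with the `model` parameter; the model name shown below is one current Command variant and is used here for illustration:

```python
# Choose a specific model variation for this request
response = co.chat(message="Hello", model="command-r-plus")
```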
Often, you’ll need to control the level of randomness of the output. You can control this using a few parameters.
The most commonly used parameter is `temperature`, which is a number used to tune the degree of randomness. You can enter values between 0.0 and 1.0.
A lower temperature gives more predictable outputs, and a higher temperature gives more “creative” outputs.
Here’s an example of setting `temperature` to 0.
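Running the same request a few times makes the behavior easy to see (the prompt is illustrative):

```python
# Temperature 0: repeated calls return very similar (often identical) outputs
message = "Suggest a name for an onboarding assistant bot at Co1t. Respond with the name only."

for _ in range(3):
    response = co.chat(message=message, temperature=0)
    print(response.text)
```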
And here’s an example of setting `temperature` to 1.
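```python
# Temperature 1: repeated calls return noticeably more varied outputs
message = "Suggest a name for an onboarding assistant bot at Co1t. Respond with the name only."

for _ in range(3):
    response = co.chat(message=message, temperature=1)
    print(response.text)
```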
Further reading:
- Available models for the Chat endpoint
- Documentation on predictable outputs
- Documentation on advanced generation parameters
Structured output generation
By adding the `response_format` parameter, you can get the model to generate the output as a JSON object. This lets you structure and organize the model’s responses in a way that can be used in downstream applications.
The `response_format` parameter allows you to specify the schema the JSON object must follow. The call below takes the following parameters:
- `message`: The user message
- `response_format`: The schema of the JSON object
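Here’s a sketch; the schema fields are made up for illustration:

```python
import json

# Ask for a JSON object that follows a simple schema
response = co.chat(
    message="Generate an onboarding task for a new hire at Co1t.",
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "required": ["title", "due_in_days"],
            "properties": {
                "title": {"type": "string"},
                "due_in_days": {"type": "integer"},
            },
        },
    },
)

# The response text is a JSON string that can be parsed downstream
task = json.loads(response.text)
print(task)
```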
Further reading:
Streaming responses
All the previous examples generate responses in a non-streamed manner. This means that the endpoint returns a response object only after the model has generated the text in full.
The Chat endpoint also provides streaming support. In a streamed response, the endpoint returns a response object for each token as it is being generated. This means you can display the text incrementally without having to wait for the full completion.
To activate it, use `co.chat_stream()` instead of `co.chat()`.
In streaming mode, the endpoint will generate a series of objects. To get the actual text contents, we take objects whose `event_type` is `text-generation`.
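Putting it together, a streamed version of the earlier example might look like this:

```python
# Assumes the `co` client created in the Setup section
message = "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates?"

# Stream the response and print the text chunks as they arrive
for event in co.chat_stream(message=message):
    if event.event_type == "text-generation":
        print(event.text, end="")
```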
Further reading:
Conclusion
In this tutorial, you learned about:
- How to get started with a basic text generation
- How to improve outputs with prompt engineering
- How to control outputs using parameter changes
- How to generate structured outputs
- How to stream text generation outputs
However, so far we have only used the endpoint for direct, single-turn text generation. As its name implies, the Chat endpoint can also support building chatbots, which requires handling multi-turn conversations and maintaining the conversation state.
In Part 3, you’ll learn how to build chatbots with the Chat endpoint.