Building a Chatbot with Cohere
As its name implies, the Chat endpoint enables developers to build chatbots that can handle conversations. At the core of a conversation is a multi-turn dialog between the user and the chatbot. This requires the chatbot to keep the state (or “memory”) of all the previous turns in order to maintain the context of the conversation.
In this tutorial, you’ll learn about:
- Sending messages to the model
- Crafting a system message
- Maintaining conversation state
You’ll learn these by building an onboarding assistant for new hires.
Setup
To get started, first we need to install the cohere library and create a Cohere client.
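Here is a minimal setup sketch, assuming the v2 Python SDK; the API key value below is a placeholder that you should replace with your own key:

```python
# Install the SDK first: pip install cohere
import cohere

# Create a client with your Cohere API key ("YOUR_API_KEY" is a placeholder)
co = cohere.ClientV2(api_key="YOUR_API_KEY")
```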
Sending messages to the model
We will use the Cohere Chat API to send messages and generate responses from the model. The required inputs to the Chat endpoint are the model (the model name) and messages (a list of messages in chronological order). In the example below, we send a single message to the model command-a-03-2025:
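A sketch of such a call is shown below, where co is the client created during setup; the user prompt is an illustrative onboarding question:

```python
response = co.chat(
    model="command-a-03-2025",
    messages=[
        {
            "role": "user",
            "content": "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates?",
        }
    ],
)

# Print the text of the model's reply
print(response.message.content[0].text)
```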
Notice that in addition to the message “content”, there is also a field titled “role”. Messages with the role “user” represent prompts from the user interacting with the chatbot. Responses from the model will always have a message with the role “assistant”. Below is the response message from the API:
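The exact text the model generates will vary from run to run, but the structure of the response message can be inspected as sketched here:

```python
# The assistant message returned by the API: its role is always "assistant",
# and its content holds the generated text (the exact wording will vary per run)
print(response.message.role)
print(response.message.content[0].text)
```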
Crafting a system message
When building a chatbot, it may be useful to constrain its behavior. For example, you may want to prevent the assistant from responding to certain prompts, or force it to respond in a desired tone. To achieve this, you can include a message with the role “system” in the messages array. Instructions in system messages take precedence over instructions in user messages, so as a developer, you have control over the chatbot’s behavior.
For example, if we want the chatbot to adopt a formal style, the system instruction can be used to encourage the generation of more business-like and professional responses. We can also instruct the chatbot to refuse requests that are unrelated to onboarding. When writing a system message, the recommended approach is to use two H2 Markdown headers, “Task and Context” and “Style Guide”, in that exact order.
In the example below, the system instruction provides context for the assistant’s task (task and context) and encourages the generation of rhymes as much as possible (style guide).
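Below is a sketch of such a call; the wording of the system message is illustrative, following the recommended “Task and Context” and “Style Guide” headers:

```python
# An illustrative system message using the two recommended headers, in this order
system_message = """## Task and Context
You are an assistant who helps new employees of Co1t with their onboarding. Co1t is a startup.

## Style Guide
Try to speak in rhymes as much as possible. Be professional."""

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "system", "content": system_message},
        {
            "role": "user",
            "content": "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates?",
        },
    ],
)

print(response.message.content[0].text)
```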
Maintaining conversation state
Conversations with your chatbot will often span more than one turn. In order to not lose context of previous turns, the entire chat history will need to be passed in the messages array when making calls with the Chat API.
In the example below, we keep adding “assistant” and “user” messages to the messages array to build up the chat history over multiple turns:
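Here is a sketch of that pattern, reusing the system message from the previous example; the follow-up user request is illustrative:

```python
# Start the conversation with the system message and the first user message
messages = [
    {"role": "system", "content": system_message},
    {
        "role": "user",
        "content": "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates?",
    },
]

response = co.chat(model="command-a-03-2025", messages=messages)

# Append the assistant's reply to the chat history, then add the next user turn
messages.append({"role": "assistant", "content": response.message.content[0].text})
messages.append({"role": "user", "content": "Make it more upbeat and conversational."})

# The model now sees the full history and can respond in context
response = co.chat(model="command-a-03-2025", messages=messages)
print(response.message.content[0].text)
```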
You will use the same method for running a multi-turn conversation when you learn about other use cases such as RAG (Part 6) and tool use (Part 7).
But to fully leverage these other capabilities, you will need another type of language model that generates text representations, or embeddings.
In Part 4, you will learn how text embeddings can power an important use case for RAG, which is semantic search.