Co.Chat (Beta)
In this guide, we show how to use the Chat endpoint to create a simple chatbot that, given an input message, responds while taking the previous conversation into account.
Set up
Install the SDK
pip install cohere
If you're running an older version of the Cohere SDK you may get errors; you can upgrade it with:
pip install --upgrade cohere
Import dependencies and set up the Cohere client.
import cohere
co = cohere.Client('Your API key')
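If you'd rather not hardcode your API key, you can read it from an environment variable instead. Here's a minimal sketch (the variable name COHERE_API_KEY is our convention, not something the SDK requires):

import os

import cohere

# Read the API key from an environment variable instead of hardcoding it.
co = cohere.Client(os.environ["COHERE_API_KEY"])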
Create Prompt
Store the message you want to send in a variable.
message = "Hello World!"
Define the Model Settings
The endpoint has a number of settings you can use to control the kind of output it generates. The full list is available in the API reference, but let’s look at a few:
model: The currently available models are command, command-light, command-nightly, and command-light-nightly (command is the default). Generally, the light models are faster but may produce lower-quality generated text, while the other models perform better.
temperature: Controls the randomness of the output. Higher values tend to generate more creative outputs, and less grounded replies when using retrieval augmented generation.
Generate the Response
Call the endpoint via the co.chat() method, specifying the message and the model settings.
response = co.chat(
    message,
    model="command",
    temperature=0.9
)
answer = response.text
Use the Previous Messages to Continue a Conversation
So far, we have generated a single reply to a message without using any previous messages. Now let's add context to the conversation by specifying the chat_history parameter, which provides the model with context it can use in its replies.
chat_history = [
    {"user_name": "User", "text": "Hey!"},
    {"user_name": "Chatbot", "text": "Hey! How can I help you today?"},
]

message = "Can you tell me about LLMs?"

response = co.chat(
    message=message,
    chat_history=chat_history
)
answer = response.text
Keeping Track of Responses
Instead of hardcoding the chat_history, we can build it dynamically as we have a conversation.
chat_history = []
max_turns = 10

for _ in range(max_turns):
    # get user input
    message = input("Send the model a message: ")

    # generate a response with the current chat history
    response = co.chat(
        message,
        temperature=0.8,
        chat_history=chat_history
    )
    answer = response.text
    print(answer)

    # add message and answer to the chat history
    user_message = {"user_name": "User", "text": message}
    bot_message = {"user_name": "Chatbot", "text": answer}
    chat_history.append(user_message)
    chat_history.append(bot_message)
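As a practical refinement (not part of the example above), you can let the user end the conversation early by typing a sentinel word; the word "quit" here is our own choice:

chat_history = []

while True:
    message = input("Send the model a message (or 'quit' to stop): ")
    if message.lower() == "quit":
        break

    response = co.chat(
        message,
        temperature=0.8,
        chat_history=chat_history
    )
    answer = response.text
    print(answer)

    # keep the running history in sync, as before
    chat_history.append({"user_name": "User", "text": message})
    chat_history.append({"user_name": "Chatbot", "text": answer})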
Using Document Mode
With the release of retrieval augmented generation (RAG), it's possible to feed the model context to ground its replies. Large language models are often quite good at generating sensible output on their own, but they're well-known to hallucinate factually incorrect, nonsensical, or incomplete information in their replies, which can be problematic for certain use cases.
RAG substantially reduces this problem by giving the model source material to work with. Rather than simply generating an output based on the input prompt, the model can pull information out of this material and incorporate it into its reply.
There are a few ways to do this (which you can read more about in the RAG-specific documentation linked above), but here, we'll confine our discussion to documents mode.
Document mode involves users providing the model with their own documents directly in the message, which it can use to ground its replies.
Here's an example of interacting with document mode via the Postman API service. We're asking the endpoint about penguins, passing in documents for it to use:
{
    "message": "Where do the tallest penguins live?",
    "documents": [
        {
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest."
        },
        {
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica."
        },
        {
            "title": "What are animals?",
            "snippet": "Animals are different from plants."
        }
    ],
    "prompt_truncation": "AUTO"
}
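If you're working in Python rather than Postman, the same request can be sketched with the SDK; this assumes co.chat() accepts documents and prompt_truncation keyword arguments mirroring the REST fields:

response = co.chat(
    message="Where do the tallest penguins live?",
    documents=[
        {"title": "Tall penguins", "snippet": "Emperor penguins are the tallest."},
        {"title": "Penguin habitats", "snippet": "Emperor penguins only live in Antarctica."},
        {"title": "What are animals?", "snippet": "Animals are different from plants."},
    ],
    prompt_truncation="AUTO",
)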
Here's an example reply:
{
    "response_id": "ea9eaeb0-073c-42f4-9251-9ecef5b189ef",
    "text": "The tallest penguins, Emperor penguins, live in Antarctica.",
    "generation_id": "1b5565da-733e-4c14-9ff5-88d18a26da96",
    "token_count": {
        "prompt_tokens": 445,
        "response_tokens": 13,
        "total_tokens": 458,
        "billed_tokens": 20
    },
    "meta": {
        "api_version": {
            "version": "2022-12-06"
        }
    },
    "citations": [
        {
            "start": 22,
            "end": 38,
            "text": "Emperor penguins",
            "document_ids": ["doc_0"]
        },
        {
            "start": 48,
            "end": 59,
            "text": "Antarctica.",
            "document_ids": ["doc_1"]
        }
    ],
    "documents": [
        {
            "id": "doc_0",
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest.",
            "url": ""
        },
        {
            "id": "doc_1",
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica.",
            "url": ""
        }
    ],
    "search_queries": []
}
Observe that the payload includes a list of documents, each with a “snippet” field containing the information we want the model to use. We recommend keeping each document’s snippet relatively short: 300 words or less. We also recommend using field names similar to the ones in this example (i.e., “title” and “snippet”), but RAG is quite flexible with respect to how you structure the documents. You can give the fields any names you want, and you can pass in other fields as well, such as a “date” field; all field names and values are passed to the model.
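For instance, a hypothetical documents list that adds a "date" field might look like this; the extra field is simply passed along to the model:

documents = [
    {
        "title": "Tall penguins",
        "snippet": "Emperor penguins are the tallest.",
        "date": "2023-09-01",  # hypothetical extra field, passed to the model like any other
    },
]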
We can also see clearly that the model has made use of the documents. Our first document says that Emperor penguins are the tallest penguin species, and our second says that Emperor penguins can only be found in Antarctica. The model’s reply successfully synthesizes both of these facts: "The tallest penguins, Emperor penguins, live in Antarctica."
Finally, note that the output contains a citations object that tells us not only which documents the model relied upon (via the “text” and “document_ids” fields), but also which part of the reply each document supports (via the “start” and “end” fields, character spans that locate the supported claim inside the reply). The citations object is included because the model was able to use the documents provided; if it hadn’t been able to do so, no citations object would be present.
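To work with the citations programmatically, you can walk the citations and documents lists together. A sketch, assuming the raw JSON reply shown above is available as a string named raw_response_body (a hypothetical variable name):

import json

reply = json.loads(raw_response_body)

# index the returned documents by id for quick lookup
docs_by_id = {doc["id"]: doc for doc in reply["documents"]}

for citation in reply["citations"]:
    titles = [docs_by_id[doc_id]["title"] for doc_id in citation["document_ids"]]
    span = f'{citation["start"]}-{citation["end"]}'
    print(f'"{citation["text"]}" (chars {span}) is supported by: {titles}')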
You can experiment with RAG in the chat playground.
Connector Mode
Finally, if you want the model to retrieve its own sources rather than supplying them yourself, you can do that through connector mode.

Here’s an example:
{
    "message": "What are the tallest living penguins?",
    "connectors": [{"id": "web-search"}],
    "prompt_truncation": "AUTO"
}
And here’s what the output looks like:
{
    "response_id": "a29d7080-11e5-43f6-bbb6-9bc3c187eed7",
    "text": "The tallest living penguin species is the emperor penguin, which can reach a height of 100 cm (39 in) and weigh between 22 and 45 kg (49 to 99 lb).",
    "generation_id": "1c60cb38-f92f-4054-b37d-566601de7e2e",
    "token_count": {
        "prompt_tokens": 1257,
        "response_tokens": 38,
        "total_tokens": 1295,
        "billed_tokens": 44
    },
    "meta": {
        "api_version": {
            "version": "2022-12-06"
        }
    },
    "citations": [
        {
            "start": 42,
            "end": 57,
            "text": "emperor penguin",
            "document_ids": ["web-search_1", "web-search_8"]
        },
        {
            "start": 87,
            "end": 101,
            "text": "100 cm (39 in)",
            "document_ids": ["web-search_1"]
        },
        {
            "start": 120,
            "end": 146,
            "text": "22 and 45 kg (49 to 99 lb)",
            "document_ids": ["web-search_1", "web-search_8"]
        }
    ],
    "documents": [
        {
            "id": "web-search_1",
            "title": "Emperor penguin - Wikipedia",
            "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb).",
            "url": "https://en.wikipedia.org/wiki/Emperor_penguin"
        },
        {
            "id": "web-search_8",
            "title": "The largest penguin that ever lived",
            "snippet": "They concluded that the largest flipper bones belong to a penguin that tipped the scales at an astounding 154 kg. In comparison, emperor penguins, the tallest and heaviest of all living penguins, typically weigh between 22 and 45 kg.",
            "url": "https://www.cam.ac.uk/stories/giant-penguin"
        },
        {
            "id": "web-search_1",
            "title": "Emperor penguin - Wikipedia",
            "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb).",
            "url": "https://en.wikipedia.org/wiki/Emperor_penguin"
        },
        {
            "id": "web-search_1",
            "title": "Emperor penguin - Wikipedia",
            "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb).",
            "url": "https://en.wikipedia.org/wiki/Emperor_penguin"
        },
        {
            "id": "web-search_8",
            "title": "The largest penguin that ever lived",
            "snippet": "They concluded that the largest flipper bones belong to a penguin that tipped the scales at an astounding 154 kg. In comparison, emperor penguins, the tallest and heaviest of all living penguins, typically weigh between 22 and 45 kg.",
            "url": "https://www.cam.ac.uk/stories/giant-penguin"
        }
    ],
    "search_results": [
        {
            "search_query": {
                "text": "tallest living penguins",
                "generation_id": "12eda337-f096-404f-9ba9-905076304934"
            },
            "document_ids": [
                "web-search_0",
                "web-search_1",
                "web-search_2",
                "web-search_3",
                "web-search_4",
                "web-search_5",
                "web-search_6",
                "web-search_7",
                "web-search_8",
                "web-search_9"
            ],
            "connector": {
                "id": "web-search"
            }
        }
    ],
    "search_queries": [
        {
            "text": "tallest living penguins",
            "generation_id": "12eda337-f096-404f-9ba9-905076304934"
        }
    ]
}
(NOTE: In this example, we’ve modified the query slightly to say “living” penguins, because “What are the tallest penguins?” returns a great deal of information about a long-extinct penguin species that was nearly seven feet tall.)
In connector mode, we tell the model to use the “web-search” connector to find out which penguin species is tallest, rather than passing in source material through the documents parameter. If you’re wondering how this works under the hood, we have more information in the next section.
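Via the Python SDK, a connector-mode request can be sketched like this, assuming co.chat() accepts a connectors keyword argument mirroring the REST field:

response = co.chat(
    message="What are the tallest living penguins?",
    connectors=[{"id": "web-search"}],
    prompt_truncation="AUTO",
)
print(response.text)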
You can experiment with this feature in the chat playground.
As with document mode, when the chat endpoint generates a response using a connector, it will include a citations object in its output. If the model is unable to find anything suitable, however, no such object will appear in the output.
A Note on Connectors
We’re working on a top-to-bottom breakdown of our connector system, but it’s worth making a few brief comments here.
Connectors allow Coral users to initiate a search of a third-party application containing textual data, such as the web or a document database. The application sends relevant information back to Coral, and Coral uses it to generate a grounded response. Right now, the only connector Cohere supports is the “web-search” connector, which runs searches through a browser in safe mode.
We’re working on adding additional connectors, and we’re also working on enabling users to either register their own data sources or spin up their own connectors. For the details of that process you’ll need to refer to the larger connectors document when it’s made available.
Next Steps
Check the co.Chat API reference and start building your own products! You can also read the retrieval augmented generation (RAG) documentation for more context.