📘

This Guide Uses the Chat Endpoint.

You can find the API reference for the endpoint here.

In this guide, we show how to use the Chat endpoint to create a simple Chatbot that, given an input query, responds to it considering the previous context.

Getting Set Up

First, let's install the SDK (the examples below are in Python, TypeScript, and Go):

pip install cohere
npm i -s cohere-ai
go get github.com/cohere-ai/cohere-go/v2

Import dependencies and set up the Cohere client.

# Python
import cohere
co = cohere.Client('Your API key')

// TypeScript
import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({
    token: "YOUR_API_KEY",
});

(async () => {
    const prediction = await cohere.generate({
        prompt: "hello",
        maxTokens: 10,
    });

    console.log("Received prediction", prediction);
})();

// Go
import cohereclient "github.com/cohere-ai/cohere-go/v2/client"

client := cohereclient.NewClient(cohereclient.WithToken("<YOUR_AUTH_TOKEN>"))

(All the rest of the examples on this page will be in Python, but you can find more detailed instructions for getting set up by checking out the GitHub repositories for Python, TypeScript, and Go.)

Create Prompt

Store the message you want to send in a variable:

message = "Hello World!"

Define the Model Settings

The endpoint has a number of settings you can use to control the kind of output it generates. The full list is available in the API reference, but let’s look at a few:

  • model: The currently available models are command, command-light, command-nightly, and command-light-nightly (command is the default). Generally, light models are faster but may produce lower-quality generated text, while the others perform better.
  • temperature: Controls the randomness of the output. Higher values tend to produce more creative output, and less grounded replies when using retrieval augmented generation.

Generate the Response

Call the endpoint via the co.chat() method, specifying the message and the model settings.

response = co.chat(
	message=message, 
	model="command", 
	temperature=0.9
)

answer = response.text

Various Ways of Using the Chat Endpoint

Now that we've covered the basics of getting set up, let's discuss some of the different ways you can use co.chat().

Interacting with Chat Directly

In the "Generate the Response" section directly above, we included this code snippet:

response = co.chat(
	message=message, 
	model="command", 
	temperature=0.9
)

answer = response.text

Here, we are simply pinging the underlying chat model and getting back whatever it generates. This is the simplest way of leveraging co.chat(), and it's distinct from storing messages as part of an ongoing conversation (covered in the next section) and from "grounding" model outputs in user-provided information (covered near the end).

The advantages of this approach are that the model will attempt to do what you ask it to do, without being constrained by any external data sources (more on this below). For this same reason, the model can also produce more creative replies when you're completing brainstorming or writing tasks.

The disadvantage is that the model's output could contain factually incorrect information and, without the kinds of citations produced by the model in document mode, it can be very hard to double-check.

Multi-Message Conversations

So far, we have generated a single reply to a message without using any previous messages.

If you want to use the chat_history or conversation_id parameters to enable multi-turn functionality, check out our dedicated documentation on multi-message conversations. A brief sketch is included below.
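To give a flavor of what this looks like, here is a minimal sketch of a multi-turn call. It assumes that chat_history is a list of dictionaries with role and message fields (using the USER and CHATBOT roles), so treat it as illustrative rather than a complete reference:

import cohere

co = cohere.Client("<YOUR API KEY>")

# Assumed format: previous turns passed as role/message dictionaries so
# the model can answer the new message with context.
chat_history = [
    {"role": "USER", "message": "Who discovered gravity?"},
    {"role": "CHATBOT", "message": "Isaac Newton is generally credited with describing gravity."},
]

response = co.chat(
    message="When did he live?",
    chat_history=chat_history,
    model="command",
)

print(response.text)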

Documents Mode

With the release of retrieval augmented generation (RAG), it's possible to feed the model context to ground its replies. Large language models are often quite good at generating sensible output on their own, but they're well-known to hallucinate factually incorrect, nonsensical, or incomplete information in their replies, which can be problematic for certain use cases.

RAG substantially reduces this problem by giving the model source material to work with. Rather than simply generating an output based on the input prompt, the model can pull information out of this material and incorporate it into its reply.

You can read more about how this works in "Documents and Citations."
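As a quick preview, here is a minimal (non-streaming) sketch of document mode. It reuses the kind of documents list shown in the streaming example later on this page; the titles and snippets are purely illustrative:

import cohere

co = cohere.Client("<YOUR API KEY>")

# Each document is a dictionary of short text fields; the model grounds
# its reply in these snippets and cites them in the response.
documents = [
    {"title": "Tall penguins", "snippet": "Emperor penguins are the tallest."},
    {"title": "Penguin habitats", "snippet": "Emperor penguins only live in Antarctica."},
]

response = co.chat(
    message="What are the tallest living penguins?",
    documents=documents,
    prompt_truncation="AUTO",
)

print(response.text)
print(response.citations)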

Connectors Mode

Finally, if you want to point the model at the sources it should use rather than specifying your own, you can do that through connector mode.

A diagram of RAG's connector mode.

Here’s an example:

{  
  "message": "What are the tallest living penguins?",  
  "connectors": [{"id": "web-search"}],  
  "prompt_truncation":"AUTO"  
}

And here’s what the output looks like:

{  
    "response_id": "a29d7080-11e5-43f6-bbb6-9bc3c187eed7",  
    "text": "The tallest living penguin species is the emperor penguin, which can reach a height of 100 cm (39 in) and weigh between 22 and 45 kg (49 to 99 lb).",  
    "generation_id": "1c60cb38-f92f-4054-b37d-566601de7e2e",  
    "token_count": {  
        "prompt_tokens": 1257,  
        "response_tokens": 38,  
        "total_tokens": 1295,  
        "billed_tokens": 44  
    },  
    "meta": {  
        "api_version": {  
            "version": "2022-12-06"  
        }  
    },  
    "citations": [  
        {  
            "start": 42,  
            "end": 57,  
            "text": "emperor penguin",  
            "document_ids": [  
                "web-search_1",  
                "web-search_8"  
            ]  
        },  
        {  
            "start": 87,  
            "end": 101,  
            "text": "100 cm (39 in)",  
            "document_ids": [  
                "web-search_1"  
            ]  
        },  
        {  
            "start": 120,  
            "end": 146,  
            "text": "22 and 45 kg (49 to 99 lb)",  
            "document_ids": [  
                "web-search_1",  
                "web-search_8"  
            ]  
        }  
    ],  
    "documents": [  
        {  
            "id": "web-search_1",  
            "title": "Emperor penguin - Wikipedia",  
            "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb).",  
            "url": "https://en.wikipedia.org/wiki/Emperor_penguin"  
        },  
        {  
            "id": "web-search_8",  
            "title": "The largest penguin that ever lived",  
            "snippet": "They concluded that the largest flipper bones belong to a penguin that tipped the scales at an astounding 154 kg. In comparison, emperor penguins, the tallest and heaviest of all living penguins, typically weigh between 22 and 45 kg.",  
            "url": "https://www.cam.ac.uk/stories/giant-penguin"  
        },  
        {  
            "id": "web-search_1",  
            "title": "Emperor penguin - Wikipedia",  
            "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb).",  
            "url": "https://en.wikipedia.org/wiki/Emperor_penguin"  
        },  
        {  
            "id": "web-search_1",  
            "title": "Emperor penguin - Wikipedia",  
            "snippet": "The emperor penguin (Aptenodytes forsteri) is the tallest and heaviest of all living penguin species and is endemic to Antarctica. The male and female are similar in plumage and size, reaching 100 cm (39 in) in length and weighing from 22 to 45 kg (49 to 99 lb).",  
            "url": "https://en.wikipedia.org/wiki/Emperor_penguin"  
        },  
        {  
            "id": "web-search_8",  
            "title": "The largest penguin that ever lived",  
            "snippet": "They concluded that the largest flipper bones belong to a penguin that tipped the scales at an astounding 154 kg. In comparison, emperor penguins, the tallest and heaviest of all living penguins, typically weigh between 22 and 45 kg.",  
            "url": "https://www.cam.ac.uk/stories/giant-penguin"  
        }  
    ],  
    "search_results": [  
        {  
            "search_query": {  
                "text": "tallest living penguins",  
                "generation_id": "12eda337-f096-404f-9ba9-905076304934"  
            },  
            "document_ids": [  
                "web-search_0",  
                "web-search_1",  
                "web-search_2",  
                "web-search_3",  
                "web-search_4",  
                "web-search_5",  
                "web-search_6",  
                "web-search_7",  
                "web-search_8",  
                "web-search_9"  
            ],  
            "connector": {  
                "id": "web-search"  
            }  
        }  
    ],  
    "search_queries": [  
        {  
            "text": "tallest living penguins",  
            "generation_id": "12eda337-f096-404f-9ba9-905076304934"  
        }  
    ]  
}

(NOTE: In this example, we’ve modified the query slightly to say “living” penguins, because “What are the tallest penguins?” returns a great deal of information about a long-extinct penguin species that was nearly seven feet tall.)

As you can see, we've told the model to use the “web-search” connector to find out which species of penguin is tallest, rather than passing in source material through the documents parameter. If you’re wondering how this works under the hood, we have more information in the next section.
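For reference, the equivalent call through the Python SDK would look roughly like the sketch below, mirroring the JSON request above:

import cohere

co = cohere.Client("<YOUR API KEY>")

# The "web-search" connector fetches source material for the model, and
# prompt_truncation="AUTO" trims it to fit the context window.
response = co.chat(
    message="What are the tallest living penguins?",
    connectors=[{"id": "web-search"}],
    prompt_truncation="AUTO",
)

print(response.text)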

You can experiment with this feature in the chat playground. Here is a screenshot of what that looks like:

A screenshot of the RAG UI.

As with document mode, when the chat endpoint generates a response using a connector, it will include a citations object in its output. If the model is unable to find anything suitable, however, no such citations object will appear in the output.
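If you want to inspect those citations programmatically, a minimal sketch might look like the following. It assumes the response came from a document- or connector-mode call (such as the sketch above) and that each citation is a dictionary with start, end, text, and document_ids fields, matching the sample output shown earlier:

# Map each cited span of the reply back to the documents that support it.
for citation in response.citations:
    cited_span = response.text[citation["start"]:citation["end"]]
    print(cited_span, "->", citation["document_ids"])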

A Note on Connectors

Connectors allow Coral users to initiate a search of a third-party application containing textual data, such as the internet, a document database, etc. The application sends relevant information back to Coral, and Coral uses it to generate a grounded response. Cohere supports the “web-search” connector, which runs searches against a browser in safe mode, and you can create and deploy your own custom connectors for services such as Google Drive, Confluence, etc.

Tool Use

Single-step and multi-step tool use are extensions of this idea. Both allow you to create dynamic, powerful workflows by giving underlying models access to databases, internet search, and much more. Check out the linked documents for additional information.

Streaming Mode

All the methods of interacting with Chat discussed above (talking to it directly, using document mode, and using connector mode) can also stream their responses. This is useful for user interfaces that render the contents of the response piece by piece as it gets generated.

All that's required to do this is to set stream equal to True (it's set to False by default). In the next few sections we'll include code snippets and examples of what it looks like to interact with co.chat() in streaming mode.

Streaming Responses from the Chat Endpoint.

Here, we're simply asking the model about penguins and streaming the reply. Note that we're importing StreamEvent, that stream=True, and that we're using if and elif clauses to respond to the different event types the stream can emit:

import cohere
from cohere.responses.chat import StreamEvent

co = cohere.Client("<YOUR API KEY>")

for event in co.chat("What are the tallest living penguins?", stream=True):
    if event.event_type == StreamEvent.TEXT_GENERATION:
      print(event.text)
    elif event.event_type == StreamEvent.STREAM_END:
      print(event.finish_reason)

Here's what the response looks like:

The
 tallest
 living
 penguins
 are
 emperor
 penguins
 (
A
pt
en
ody
tes
 for
ster
i
).
 On
 average
,
 adult
 emperor
 penguins
 stand
 at
 about
 115
 cm
 (
45
 inches
)
...
Would
 you
 like
 to
 know
 more
 about
 any
 of
 these
 penguin
 species
?
COMPLETE

The output has been truncated for readability, but you can see that the model streams one token after another until it hits COMPLETE.
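If you would rather collect the full reply as a single string than print each token as it arrives, you can accumulate the text-generation events. Here is a minimal sketch building on the loop above:

# Collect the streamed tokens into one string instead of printing them.
chunks = []
for event in co.chat("What are the tallest living penguins?", stream=True):
    if event.event_type == StreamEvent.TEXT_GENERATION:
        chunks.append(event.text)

full_reply = "".join(chunks)
print(full_reply)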

Streaming Responses in Document Mode.

In document mode, we pass in sources for the model to use in formulating its reply. Here's what that looks like:

import cohere

co = cohere.Client("<YOUR API KEY>")

documents = [
    {
      "title": "Tall penguins",
      "snippet": "Emperor penguins are the tallest."
    },
    {
      "title": "Penguin habitats",
      "snippet": "Emperor penguins only live in Antarctica."
    },
    {
      "title": "What are animals?",
      "snippet": "Animals are different from plants."
    }
  ]

for event in co.chat_stream(
    message="What are the tallest living penguins?",
    documents=documents,
    prompt_truncation="AUTO",
):
    if event.event_type == "text-generation":
      print(event.text)
    elif event.event_type == "citation-generation":
       print(event.citations)
    elif event.event_type == "stream-end":
      print(event.finish_reason)

Here's what the response looks like:

The
 tallest
 living
 penguins
 in
 the
 world
 are
 Emperor
 penguins
,
 which
 can
 reach
 heights
 of
 approximately
 115
 cm
 (
45
.
3
 inches
)
 tall
.
 Interestingly
,
 they
 are
 only
 found
 in
 Antarctica
.
[{'start': 45, 'end': 61, 'text': 'Emperor penguins', 'document_ids': ['doc_0']}]
[{'start': 104, 'end': 130, 'text': '115 cm (45.3 inches) tall.', 'document_ids': ['doc_0']}]
[{'start': 169, 'end': 180, 'text': 'Antarctica.', 'document_ids': ['doc_1']}]
COMPLETE

Note that the citation objects appear at the end, just before the stream completes. If you're not sure what these mean or how to read them, check out the "Document Mode" section above, which furnishes additional context.

Streaming Responses in Connector Mode.

Finally, we can also stream responses from the model when it's operating in connector mode. Here's what the code looks like:

import cohere
from cohere.responses.chat import StreamEvent

co = cohere.Client("<YOUR API KEY>")

for event in co.chat(
    "What are the tallest living penguins?",
    stream=True,
    connectors=[{"id": "web-search"}],
    prompt_truncation="AUTO",
):
    if event.event_type == StreamEvent.TEXT_GENERATION:
      print(event.text)
    elif event.event_type == StreamEvent.CITATION_GENERATION:
       print(event.citations)
    elif event.event_type == StreamEvent.STREAM_END:
      print(event.finish_reason)

And here's what the response looks like:

The
 tallest
 living
 penguins
 are
 the
 males
 of
 the
 Emperor
 penguin
 species
,
 who
 can
 stand
 up
 to
 1
.
3
 meters
 (
4
 feet
 3
 inches
)
 tall
 and
 weigh
 as
 much
 as
 45
 kilograms
 (
99
 pounds
).
 This
 species
 of
 penguin
 is
 native
 to
 Antarctica
.
 While
 they
 are
 the
 tallest
 living
 penguins
,
 they
 would
 be
 dwar
fed
 ...
[{'start': 36, 'end': 72, 'text': 'males of the Emperor penguin species', 'document_ids': ['web-search_6:0']}]
[{'start': 94, 'end': 127, 'text': '1.3 meters (4 feet 3 inches) tall', 'document_ids': ['web-search_6:0']}]
[{'start': 149, 'end': 173, 'text': '45 kilograms (99 pounds)', 'document_ids': ['web-search_0:0', 'web-search_6:0']}]
[{'start': 212, 'end': 223, 'text': 'Antarctica.', 'document_ids': ['web-search_0:0', 'web-search_6:0']}]
[{'start': 282, 'end': 310, 'text': 'dwarfed by the Mega Penguins', 'document_ids': ['web-search_9:0']}]
[{'start': 364, 'end': 388, 'text': 'Palaeeudyptes klekowskii', 'document_ids': ['web-search_9:4']}]
[{'start': 408, 'end': 424, 'text': 'Colossus Penguin', 'document_ids': ['web-search_9:4']}]
[{'start': 448, 'end': 461, 'text': '115 kilograms', 'document_ids': ['web-search_9:4']}]
[{'start': 475, 'end': 489, 'text': '2 meters tall.', 'document_ids': ['web-search_9:4']}]
COMPLETE

Next Steps

Check the Chat API reference and start building your own products! You can also read the retrieval augmented generation (RAG) documentation for more context.

Speaking of RAG, we've also released "Toolkit," a collection of pre-built front-end and back-end components enabling users to quickly build and deploy RAG applications. Check out the documentation for more details.