Overview

To enable response streaming in RAG, use the chat_stream endpoint instead of chat.

This allows your application to receive the response token by token as the model generates it.

Events stream

In RAG, the events streamed by the endpoint follow the structure of a basic chat stream but include additional events for tool calling and response generation, each with its associated contents. This section describes the stream of events and their contents.

Event types

message-start

Same as in a basic chat stream event.

content-start

Same as in a basic chat stream event.

content-delta

Same as in a basic chat stream event.

citation-start

Emitted for every citation generated in the response. This event contains the details of a citation: the start and end indices of the text that cites one or more sources, the corresponding text, and the list of sources.

citation-end

Emitted to indicate the end of a citation. If there are multiple citations generated, the events will come as a sequence of citation-start and citation-end pairs.

content-end

Same as in a basic chat stream event.

message-end

Same as in a basic chat stream event.
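The start and end indices in a citation-start event can be read as a slice of the accumulated response text. The following is a minimal sketch of that relationship, using values taken from the example stream in this page. The CitationSpan class and the cited_text helper are hypothetical stand-ins written for illustration and are not part of the Cohere SDK; the assumption that the end index is exclusive is inferred from the example values only.

```python
from dataclasses import dataclass


@dataclass
class CitationSpan:
    """Hypothetical stand-in for the citation carried by a citation-start event."""

    start: int
    end: int
    text: str


def cited_text(response_text: str, citation: CitationSpan) -> str:
    """Return the slice of the response text that the citation points at."""
    # Assumption: `end` is exclusive, matching Python slice semantics.
    return response_text[citation.start:citation.end]


response_text = (
    "The tallest penguins are the Emperor penguins. "
    "They only live in Antarctica."
)
spans = [
    CitationSpan(start=29, end=46, text="Emperor penguins."),
    CitationSpan(start=65, end=76, text="Antarctica."),
]

for span in spans:
    # Each slice should reproduce the text reported by the citation.
    print(repr(cited_text(response_text, span)))
```

Checking the slice against the reported text is a cheap way to detect a mismatch between the text you accumulated and the indices you received.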

Example stream

The following is an example stream with RAG.

"Where do the tallest penguins live?"

type='message-start' id='d93f187e-e9ac-44a9-a2d9-bdf2d65fee94' delta=ChatMessageStartEventDelta(message=ChatMessageStartEventDeltaMessage(role='assistant', content=[], tool_plan='', tool_calls=[], citations=[]))
--------------------------------------------------
type='content-start' index=0 delta=ChatContentStartEventDelta(message=ChatContentStartEventDeltaMessage(content=ChatContentStartEventDeltaMessageContent(text='', type='text')))
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text='The'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' tallest'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' penguins'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' are'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' the'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' Emperor'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' penguins'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text='.'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' They'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' only'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' live'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' in'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' Antarctica'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text='.'))) logprobs=None
--------------------------------------------------
type='citation-start' index=0 delta=CitationStartEventDelta(message=CitationStartEventDeltaMessage(citations=Citation(start=29, end=46, text='Emperor penguins.', sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})], type='TEXT_CONTENT')))
--------------------------------------------------
type='citation-end' index=0
--------------------------------------------------
type='citation-start' index=1 delta=CitationStartEventDelta(message=CitationStartEventDeltaMessage(citations=Citation(start=65, end=76, text='Antarctica.', sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})], type='TEXT_CONTENT')))
--------------------------------------------------
type='citation-end' index=1
--------------------------------------------------
type='content-end' index=0
--------------------------------------------------
type='message-end' id=None delta=ChatMessageEndEventDelta(finish_reason='COMPLETE', usage=Usage(billed_units=UsageBilledUnits(input_tokens=34.0, output_tokens=14.0, search_units=None, classifications=None), tokens=UsageTokens(input_tokens=721.0, output_tokens=59.0)))
--------------------------------------------------
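The ordering shown in the example stream, where each citation-start is followed by its own citation-end, can be checked with a small sketch like the one below. It works on a plain list of event type strings rather than on SDK objects. The count_citation_pairs function is a hypothetical helper, and treating a nested or unclosed citation as an error is an assumption based on this example, not a documented guarantee.

```python
def count_citation_pairs(event_types):
    """Count citation-start/citation-end pairs, raising if they are unbalanced."""
    open_citation = False
    pairs = 0
    for event_type in event_types:
        if event_type == "citation-start":
            if open_citation:
                raise ValueError("citation-start before the previous citation-end")
            open_citation = True
        elif event_type == "citation-end":
            if not open_citation:
                raise ValueError("citation-end without a citation-start")
            open_citation = False
            pairs += 1
    if open_citation:
        raise ValueError("stream ended inside a citation")
    return pairs


# Event types in the order they appear in the example stream above.
example_event_types = (
    ["message-start", "content-start"]
    + ["content-delta"] * 14
    + ["citation-start", "citation-end", "citation-start", "citation-end"]
    + ["content-end", "message-end"]
)

print(count_citation_pairs(example_event_types))
```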

Usage example

This section provides an example of handling the streamed events in the RAG response generation step.

Setup

First, import the Cohere library and create a client.

PYTHON
# ! pip install -U cohere
import cohere

co = cohere.ClientV2(
    "COHERE_API_KEY"
)  # Get your free API key here: https://dashboard.cohere.com/api-keys

Define documents

Next, define the documents to be passed to the endpoint.

PYTHON
documents = [
    {
        "data": {
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest.",
        }
    },
    {
        "data": {
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica.",
        }
    },
]

Streaming the response

We can now stream the response using the chat_stream endpoint.

The events are streamed as chunk objects. In the example below, we pick content-delta to display the text response and citation-start to display the citations.

PYTHON
messages = [
    {"role": "user", "content": "Where do the tallest penguins live?"}
]

# Pass the documents defined above along with the messages.
response = co.chat_stream(
    model="command-a-03-2025",
    messages=messages,
    documents=documents,
)

response_text = ""
citations = []
for chunk in response:
    if chunk:
        if chunk.type == "content-delta":
            # Accumulate the text and print it as it arrives.
            response_text += chunk.delta.message.content.text
            print(chunk.delta.message.content.text, end="")
        if chunk.type == "citation-start":
            # Collect each citation to display after the text.
            citations.append(chunk.delta.message.citations)

for citation in citations:
    print(citation, "\n")

Example response:

The tallest penguins are the Emperor penguins, which only live in Antarctica.

start=29 end=45 text='Emperor penguins' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type='TEXT_CONTENT'

start=66 end=77 text='Antarctica.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type='TEXT_CONTENT'
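Once the text and citations have been collected, one possible next step is to render the citations inline. The sketch below inserts a source marker after each cited span, using the text and indices from the example response. The annotate function and the (start, end, source_id) tuple format are assumptions made for illustration; in real code you would read these values from the collected citation objects.

```python
def annotate(text, citations):
    """Insert a [source_id] marker after each cited span.

    `citations` is a list of (start, end, source_id) tuples (hypothetical format).
    """
    annotated = text
    # Insert from the last span backwards so earlier indices stay valid.
    for _start, end, source_id in sorted(
        citations, key=lambda citation: citation[1], reverse=True
    ):
        annotated = annotated[:end] + f"[{source_id}]" + annotated[end:]
    return annotated


response_text = (
    "The tallest penguins are the Emperor penguins, "
    "which only live in Antarctica."
)
collected = [(29, 45, "doc:0"), (66, 77, "doc:1")]

print(annotate(response_text, collected))
# The tallest penguins are the Emperor penguins[doc:0], which only live in Antarctica.[doc:1]
```

Working backwards through the spans avoids recomputing offsets after each insertion.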