RAG Streaming

Overview

To enable response streaming in RAG, use the chat_stream endpoint instead of chat.

This allows your application to receive the response token by token as the model generates it, rather than waiting for the complete response.

Events stream

In RAG, the events streamed by the endpoint follow the structure of a basic chat stream but contain additional events for tool calling and response generation with the associated contents. This section describes the stream of events and their contents.

Event types

message-start

Same as in a basic chat stream event.

content-start

Same as in a basic chat stream event.

content-delta

Same as in a basic chat stream event.

citation-start

Emitted for every citation generated in the response. This event contains the details of a citation: the start and end indices of the text that cites one or more sources, the corresponding text, and the list of sources.

citation-end

Emitted to indicate the end of a citation. If there are multiple citations generated, the events will come as a sequence of citation-start and citation-end pairs.

content-end

Same as in a basic chat stream event.

message-end

Same as in a basic chat stream event.
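
The event types above can be handled with a small dispatcher. The following is a minimal sketch, not the SDK's own implementation: it uses `types.SimpleNamespace` stand-ins that mimic the attribute layout used in this page's usage example (`chunk.type`, `chunk.delta.message.content.text`, `chunk.delta.message.citations`) so that it runs without an API key. The helpers `fake_content_delta`, `fake_citation_start`, and `consume` are hypothetical names introduced for illustration.

```python
from types import SimpleNamespace as NS


def fake_content_delta(text):
    """Stand-in for a content-delta event (hypothetical, not part of the SDK)."""
    return NS(type="content-delta",
              delta=NS(message=NS(content=NS(text=text))))


def fake_citation_start(start, end, text):
    """Stand-in for a citation-start event (hypothetical, not part of the SDK)."""
    return NS(type="citation-start",
              delta=NS(message=NS(citations=NS(start=start, end=end, text=text))))


def consume(events):
    """Accumulate response text and citations from an iterable of stream events."""
    text_parts, citations, finished = [], [], False
    for event in events:
        if event.type == "content-delta":
            text_parts.append(event.delta.message.content.text)
        elif event.type == "citation-start":
            citations.append(event.delta.message.citations)
        elif event.type == "message-end":
            finished = True
        # message-start, content-start, citation-end, and content-end carry
        # no data that this sketch needs, so they are ignored.
    return "".join(text_parts), citations, finished


simulated_stream = [
    NS(type="message-start"),
    NS(type="content-start"),
    fake_content_delta("The tallest penguins are the"),
    fake_content_delta(" Emperor penguins."),
    fake_citation_start(29, 46, "Emperor penguins."),
    NS(type="citation-end"),
    NS(type="content-end"),
    NS(type="message-end"),
]

text, citations, finished = consume(simulated_stream)
print(text)
print(len(citations), finished)
```

With a real stream, the same `consume` function could be passed the iterator returned by `chat_stream`, assuming the events expose the attributes used here.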

Example stream

The following is an example stream with RAG.

"Where do the tallest penguins live?"

type='message-start' id='d93f187e-e9ac-44a9-a2d9-bdf2d65fee94' delta=ChatMessageStartEventDelta(message=ChatMessageStartEventDeltaMessage(role='assistant', content=[], tool_plan='', tool_calls=[], citations=[]))
--------------------------------------------------
type='content-start' index=0 delta=ChatContentStartEventDelta(message=ChatContentStartEventDeltaMessage(content=ChatContentStartEventDeltaMessageContent(text='', type='text')))
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text='The'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' tallest'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' penguins'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' are'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' the'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' Emperor'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' penguins'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text='.'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' They'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' only'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' live'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' in'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' Antarctica'))) logprobs=None
--------------------------------------------------
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text='.'))) logprobs=None
--------------------------------------------------
type='citation-start' index=0 delta=CitationStartEventDelta(message=CitationStartEventDeltaMessage(citations=Citation(start=29, end=46, text='Emperor penguins.', sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})], type='TEXT_CONTENT')))
--------------------------------------------------
type='citation-end' index=0
--------------------------------------------------
type='citation-start' index=1 delta=CitationStartEventDelta(message=CitationStartEventDeltaMessage(citations=Citation(start=65, end=76, text='Antarctica.', sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})], type='TEXT_CONTENT')))
--------------------------------------------------
type='citation-end' index=1
--------------------------------------------------
type='content-end' index=0
--------------------------------------------------
type='message-end' id=None delta=ChatMessageEndEventDelta(finish_reason='COMPLETE', usage=Usage(billed_units=UsageBilledUnits(input_tokens=34.0, output_tokens=14.0, search_units=None, classifications=None), tokens=UsageTokens(input_tokens=721.0, output_tokens=59.0)))
--------------------------------------------------
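
In the example stream, the `start` and `end` values of each citation appear to be character offsets into the full response text, with `end` exclusive. This reading is inferred from the example rather than stated on this page. A minimal check, with the text fragments and offsets copied from the events above:

```python
# Response text reassembled from the content-delta events in the example stream.
deltas = ["The", " tallest", " penguins", " are", " the", " Emperor",
          " penguins", ".", " They", " only", " live", " in", " Antarctica", "."]
response_text = "".join(deltas)

# (start, end, text) triples copied from the two citation-start events.
citations = [
    (29, 46, "Emperor penguins."),
    (65, 76, "Antarctica."),
]

for start, end, cited in citations:
    # Slicing with an exclusive end offset recovers the cited span.
    span = response_text[start:end]
    print(f"{start}-{end}: {span!r} matches={span == cited}")
```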

Usage example

This section provides an example of handling streamed objects in the RAG response generation step.

Setup

First, import the Cohere library and create a client.

PYTHON
# ! pip install -U cohere
import cohere

co = cohere.ClientV2(
    "COHERE_API_KEY"
)  # Get your free API key here: https://dashboard.cohere.com/api-keys

Define documents

Next, define the documents to be passed to the endpoint.

PYTHON
documents = [
    {
        "data": {
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest.",
        }
    },
    {
        "data": {
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica.",
        }
    },
]
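
When the source material lives in another structure, the list can be built programmatically. The sketch below assumes a hypothetical list of `(title, snippet)` tuples and a hypothetical helper named `to_documents`. Only the `{"data": {...}}` shape is taken from the example above.

```python
def to_documents(records):
    """Wrap (title, snippet) pairs in the {"data": {...}} shape used above.

    `to_documents` is a hypothetical helper, not part of the Cohere SDK.
    """
    return [
        {"data": {"title": title, "snippet": snippet}}
        for title, snippet in records
    ]


records = [
    ("Tall penguins", "Emperor penguins are the tallest."),
    ("Penguin habitats", "Emperor penguins only live in Antarctica."),
]
documents = to_documents(records)
print(documents[0]["data"]["title"])
```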

Streaming the response

We can now stream the response using the chat_stream endpoint.

The events are streamed as chunk objects. In the example below, we pick content-delta to display the text response and citation-start to display the citations.

PYTHON
messages = [
    {"role": "user", "content": "Where do the tallest penguins live?"}
]

response = co.chat_stream(
    model="command-a-plus-05-2026",
    messages=messages,
    documents=documents,
)

response_text = ""
citations = []
for chunk in response:
    if chunk:
        # Text arrives incrementally in content-delta events.
        if chunk.type == "content-delta":
            response_text += chunk.delta.message.content.text
            print(chunk.delta.message.content.text, end="")
        # Each citation-start event carries one complete citation.
        if chunk.type == "citation-start":
            citations.append(chunk.delta.message.citations)

for citation in citations:
    print(citation, "\n")

Example response:

The tallest penguins are the Emperor penguins, which only live in Antarctica.

start=29 end=45 text='Emperor penguins' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type='TEXT_CONTENT'

start=66 end=77 text='Antarctica.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type='TEXT_CONTENT'
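
One way to use the collected citations is to annotate the response text with source markers. The following is a minimal sketch, assuming `start` and `end` are character offsets into the response text, which is consistent with the example response above. Plain tuples stand in for the SDK's citation objects, and `annotate` is a hypothetical helper introduced for illustration.

```python
def annotate(text, citations):
    """Insert "[doc_id]" after each cited span.

    `citations` is a list of (start, end, doc_ids) tuples. Spans are processed
    from the end of the text backwards so earlier offsets stay valid.
    """
    for start, end, doc_ids in sorted(citations, reverse=True):
        marker = "[" + ", ".join(doc_ids) + "]"
        text = text[:end] + marker + text[end:]
    return text


response_text = (
    "The tallest penguins are the Emperor penguins, "
    "which only live in Antarctica."
)
# Offsets and document ids copied from the example response.
citations = [
    (29, 45, ["doc:0"]),
    (66, 77, ["doc:1"]),
]
print(annotate(response_text, citations))
```

Processing the spans in reverse order avoids recomputing offsets after each insertion. With real SDK objects, the tuples could be derived from each citation's `start`, `end`, and `sources` fields.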