A Guide to Streaming Responses

The Chat API is capable of streaming events (such as text generation) as they come. This means that partial results from the model can be displayed within moments, even if the full generation takes longer.

You’re likely already familiar with streaming. When you ask the model a question using the Cohere playground, the interface doesn’t output a single block of text. Instead, it streams the text out a few words at a time. In many user interfaces, enabling streaming improves the user experience by lowering the perceived latency.

Stream Events

When streaming is enabled, the API sends events one by one. Each event has a type, and your code should check that type and handle each kind of event accordingly.

The following is an example of printing the content-delta event type from a streamed response, which contains the text contents of an LLM’s response.

import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

# Request a streamed response instead of a single complete one.
res = co.chat_stream(
    model="command-a-plus-05-2026",
    messages=[{"role": "user", "content": "What is an LLM?"}],
)

# Print each chunk of generated text as soon as it arrives.
for event in res:
    if event:
        if event.type == "content-delta":
            print(event.delta.message.content.text, end="")
# Sample output (streamed)
A large language model (LLM) is a type of artificial neural network model that has been trained on massive amounts of text data ...

The following sections describe the different types of events that are emitted during a streaming session.

Basic Chat Stream Events

message-start

The first event in the stream. It contains metadata for the request, such as the id. Only one message-start event will be emitted.

content-start

The event that indicates the start of the content block of the message. Only one content-start event will be emitted.

content-delta

The event that is emitted whenever the next chunk of text comes back from the model. As the model continues generating text, multiple events of this type will be emitted. Each event carries one token of text in its delta.message.content.text field.

# Sample events
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text='A')))
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' large')))
type='content-delta' index=0 delta=ChatContentDeltaEventDelta(message=ChatContentDeltaEventDeltaMessage(content=ChatContentDeltaEventDeltaMessageContent(text=' language')))
...

content-end

The event that indicates the end of the content block of the message. Only one content-end event will be emitted.

message-end

The final event in the stream indicating the end of the streamed response. Only one message-end event will be emitted.
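
The lifecycle above can be handled with a small loop that reacts to each event type. The following is a minimal sketch rather than the SDK's own implementation: `collect_text` and `fake_delta` are hypothetical helpers, and the events are simulated with `SimpleNamespace` objects so the snippet runs without an API key. It assumes only the `type` attribute and the `delta.message.content.text` path used in the example earlier on this page.

```python
from types import SimpleNamespace


def fake_delta(text):
    """Build a stand-in for a content-delta event (hypothetical test helper)."""
    content = SimpleNamespace(text=text)
    message = SimpleNamespace(content=content)
    return SimpleNamespace(
        type="content-delta", delta=SimpleNamespace(message=message)
    )


def collect_text(events):
    """Concatenate the text of all content-delta events until message-end."""
    parts = []
    for event in events:
        if not event:
            continue
        if event.type == "content-delta":
            parts.append(event.delta.message.content.text)
        elif event.type == "message-end":
            # The stream is finished; nothing follows this event.
            break
    return "".join(parts)


# Simulated basic chat stream, in the order described above.
simulated_stream = [
    SimpleNamespace(type="message-start"),
    SimpleNamespace(type="content-start"),
    fake_delta("A"),
    fake_delta(" large"),
    fake_delta(" language"),
    SimpleNamespace(type="content-end"),
    SimpleNamespace(type="message-end"),
]

full_text = collect_text(simulated_stream)
print(full_text)
```

With a real stream, the same helper could presumably be given the iterator returned by `co.chat_stream(...)` in place of the simulated list.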

Retrieval Augmented Generation Stream Events

message-start

Same as in a basic chat stream event.

content-start

Same as in a basic chat stream event.

content-delta

Same as in a basic chat stream event.

citation-start

Emitted for every citation generated in the response.

# Sample event
type='citation-start' index=0 delta=CitationStartEventDelta(message=CitationStartEventDeltaMessage(citations=Citation(start=14, end=29, text='gym memberships', sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance.'})])))

citation-end

Emitted to indicate the end of a citation. If there are multiple citations generated, the events will come as a sequence of citation-start and citation-end pairs.

content-end

Same as in a basic chat stream event.

message-end

Same as in a basic chat stream event.
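
To show how citations might be gathered while a RAG response streams, here is a hedged sketch. `collect_citations` is a hypothetical helper, and the attribute path `delta.message.citations` (with `start`, `end`, `text`, and `sources`) is inferred from the printed sample citation-start event above; it is an assumption and may differ between SDK versions. The event is simulated with `SimpleNamespace` so the snippet runs offline.

```python
from types import SimpleNamespace


def collect_citations(events):
    """Return one dict per citation-start event, in arrival order."""
    citations = []
    for event in events:
        if event and event.type == "citation-start":
            c = event.delta.message.citations
            citations.append(
                {
                    "start": c.start,
                    "end": c.end,
                    "text": c.text,
                    "source_ids": [source.id for source in c.sources],
                }
            )
    return citations


# Simulated event mirroring the sample citation-start event above.
sample_event = SimpleNamespace(
    type="citation-start",
    delta=SimpleNamespace(
        message=SimpleNamespace(
            citations=SimpleNamespace(
                start=14,
                end=29,
                text="gym memberships",
                sources=[SimpleNamespace(type="document", id="doc:1")],
            )
        )
    ),
)
citation_end = SimpleNamespace(type="citation-end")

found = collect_citations([sample_event, citation_end])
print(found)
```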

Tool Use Stream Events (For Tool Calling)

message-start

Same as in a basic chat stream event.

tool-plan-delta

Emitted when the next token of the tool plan is generated.

# Sample events
type='tool-plan-delta' delta=ChatToolPlanDeltaEventDelta(tool_plan=None, message={'tool_plan': 'I'})
type='tool-plan-delta' delta=ChatToolPlanDeltaEventDelta(tool_plan=None, message={'tool_plan': ' will'})
type='tool-plan-delta' delta=ChatToolPlanDeltaEventDelta(tool_plan=None, message={'tool_plan': ' use'})
...

tool-call-start

Emitted when the model generates a tool call that needs to be acted on. The event's tool_calls field contains the tool name and the tool call ID.

# Sample event
type='tool-call-start' index=0 delta=ChatToolCallStartEventDelta(tool_call=None, message={'tool_calls': {'id': 'get_weather_nsz5zm3w56q3', 'type': 'function', 'function': {'name': 'get_weather', 'arguments': ''}}})

tool-call-delta

Emitted when the next token of the tool call is generated.

# Sample events
type='tool-call-delta' index=0 delta=ChatToolCallDeltaEventDelta(tool_call=None, message={'tool_calls': {'function': {'arguments': '{\n "'}}})
type='tool-call-delta' index=0 delta=ChatToolCallDeltaEventDelta(tool_call=None, message={'tool_calls': {'function': {'arguments': 'location'}}})
type='tool-call-delta' index=0 delta=ChatToolCallDeltaEventDelta(tool_call=None, message={'tool_calls': {'function': {'arguments': '":'}}})
...

tool-call-end

Emitted when the tool call is finished.

message-end

Same as in a basic chat stream event.
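
Because tool call arguments arrive as fragments, a client has to stitch them together before it can run the tool. The sketch below illustrates one way to do that. It is an illustration under stated assumptions: `assemble_tool_calls` is a hypothetical helper, and the events are plain dicts that mirror only the `message=` payloads of the sample events above. SDK event objects are typed, so the access paths would need adapting. The `"Toronto"` argument value is made up for the example.

```python
import json


def assemble_tool_calls(events):
    """Rebuild the tool plan and the complete tool calls from streamed events."""
    plan_parts = []
    calls = []
    current = None  # the tool call currently being streamed, if any
    for event in events:
        kind = event["type"]
        message = event.get("message", {})
        if kind == "tool-plan-delta":
            plan_parts.append(message["tool_plan"])
        elif kind == "tool-call-start":
            call = message["tool_calls"]
            current = {
                "id": call["id"],
                "name": call["function"]["name"],
                "arguments": call["function"]["arguments"],
            }
        elif kind == "tool-call-delta" and current is not None:
            current["arguments"] += message["tool_calls"]["function"]["arguments"]
        elif kind == "tool-call-end" and current is not None:
            # The argument string is complete only now, so parse it here.
            current["arguments"] = json.loads(current["arguments"] or "{}")
            calls.append(current)
            current = None
    return "".join(plan_parts), calls


def arg_delta(fragment):
    """Build a simulated tool-call-delta event (hypothetical test helper)."""
    return {
        "type": "tool-call-delta",
        "message": {"tool_calls": {"function": {"arguments": fragment}}},
    }


simulated_stream = [
    {"type": "message-start"},
    {"type": "tool-plan-delta", "message": {"tool_plan": "I"}},
    {"type": "tool-plan-delta", "message": {"tool_plan": " will"}},
    {"type": "tool-plan-delta", "message": {"tool_plan": " use"}},
    {
        "type": "tool-call-start",
        "message": {
            "tool_calls": {
                "id": "get_weather_nsz5zm3w56q3",
                "type": "function",
                "function": {"name": "get_weather", "arguments": ""},
            }
        },
    },
    arg_delta('{\n "'),
    arg_delta("location"),
    arg_delta('":'),
    arg_delta(' "Toronto"'),  # made-up value for illustration
    arg_delta("\n}"),
    {"type": "tool-call-end"},
    {"type": "message-end"},
]

plan, calls = assemble_tool_calls(simulated_stream)
print(plan)
print(calls)
```

Parsing the JSON only at tool-call-end avoids errors from partial argument strings, which are not valid JSON on their own.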

Tool Use Stream Events (For Response Generation)

message-start

Same as in a basic chat stream event.

content-start

Same as in a basic chat stream event.

content-delta

Same as in a basic chat stream event.

citation-start

Emitted for every citation generated in the response.

citation-end

Emitted to indicate the end of a citation. If there are multiple citations generated, the events will come as a sequence of citation-start and citation-end pairs.

content-end

Same as in a basic chat stream event.

message-end

Same as in a basic chat stream event.
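
Once the stream has ended, the citations can be used to annotate the generated text. The sketch below assumes that a citation's `start` and `end` are character offsets into the response text, with `end` exclusive. The sample citation-start event in the Retrieval Augmented Generation section is consistent with this (29 - 14 equals the length of 'gym memberships'), but treat it as an assumption. `annotate`, the dict layout, and the response text are all hypothetical and chosen for illustration.

```python
def annotate(text, citations):
    """Insert a ' [source ids]' marker after each cited span."""
    annotated = text
    # Work from the last citation to the first so that inserting a marker
    # never shifts the offsets of citations that are still to be processed.
    for citation in sorted(citations, key=lambda c: c["end"], reverse=True):
        marker = " [" + ", ".join(citation["source_ids"]) + "]"
        annotated = annotated[: citation["end"]] + marker + annotated[citation["end"]:]
    return annotated


# Made-up response text in which characters 14 to 29 are the cited span.
response_text = "Benefits are: gym memberships and yoga."
citations = [{"start": 14, "end": 29, "source_ids": ["doc:1"]}]

annotated = annotate(response_text, citations)
print(annotated)
```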