A Guide to Streaming Responses
The Chat API can stream events (such as generated text) as they are produced. This means that partial results from the model can be displayed within moments, even if the full generation takes longer.
You’re likely already familiar with streaming. When you ask the model a question using the Coral UI, the interface doesn’t output a single block of text; instead, it streams the text out a few words at a time. In many user interfaces, enabling streaming improves the user experience by lowering the perceived latency.
Stream Events
When streaming is enabled, the API sends events one at a time. Each event has a type, and because different types carry different data, your code needs to handle each type appropriately.
A typical starting point is printing the content-delta events from a streamed response, because these events contain the text of the LLM’s response.
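The loop below is a minimal sketch of printing content-delta text as it arrives. To keep it runnable without an API key, the stream is mocked with types.SimpleNamespace objects; the field path delta.message.content.text follows this guide, while the helper names (content_delta, print_stream) and the mock events are hypothetical.

```python
from types import SimpleNamespace as NS


def content_delta(text):
    # Hypothetical mock of a content-delta event; the field path
    # delta.message.content.text follows the description in this guide.
    return NS(type="content-delta",
              delta=NS(message=NS(content=NS(text=text))))


def print_stream(events):
    """Print text as it arrives and return the full response."""
    parts = []
    for event in events:
        # Only content-delta events carry generated text; skip the rest.
        if event.type == "content-delta":
            chunk = event.delta.message.content.text
            print(chunk, end="", flush=True)
            parts.append(chunk)
    print()  # end the line once the stream is exhausted
    return "".join(parts)


# A mocked stream standing in for the API's response.
mock_stream = [
    NS(type="message-start"),
    NS(type="content-start"),
    content_delta("Hello"),
    content_delta(", "),
    content_delta("world!"),
    NS(type="content-end"),
    NS(type="message-end"),
]

full_text = print_stream(mock_stream)  # prints: Hello, world!
```

With a real client, mock_stream would be replaced by the iterator returned by the SDK's streaming chat call (in Cohere's Python SDK, the v2 client's chat_stream method); check the SDK reference for its exact arguments and event classes.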
The following sections describe the different types of events that are emitted during a streaming session.
Basic Chat Stream Events
message-start
The first event in the stream, containing metadata for the request such as the id. Only one message-start event will be emitted.
content-start
The event that indicates the start of the content block of the message. Only one content-start event will be emitted.
content-delta
The event that is emitted whenever the next chunk of text comes back from the model. As the model continues generating text, multiple events of this type will be emitted. Each event carries one token of text in its delta.message.content.text field.
content-end
The event that indicates the end of the content block of the message. Only one content-end event will be emitted.
message-end
The final event in the stream, indicating the end of the streamed response. Only one message-end event will be emitted.
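Because each basic chat stream follows a fixed shape, a handler can check the order of events while it accumulates the response. The sketch below encodes the order described above as a transition table. It assumes the message-start event exposes its id as event.id, uses mocked events, and is deliberately strict; a production handler might instead skip event types it does not recognize.

```python
from types import SimpleNamespace as NS

# Allowed transitions for a basic chat stream, following the event list above:
# one message-start, one content-start, any number of content-delta events,
# one content-end, and one message-end.
ALLOWED_NEXT = {
    None: {"message-start"},
    "message-start": {"content-start"},
    "content-start": {"content-delta", "content-end"},
    "content-delta": {"content-delta", "content-end"},
    "content-end": {"message-end"},
    "message-end": set(),  # nothing may follow message-end
}


def collect_basic_stream(events):
    """Check event order and return (message id, full response text)."""
    prev = None
    message_id = None
    parts = []
    for event in events:
        if event.type not in ALLOWED_NEXT[prev]:
            raise ValueError(f"unexpected {event.type!r} after {prev!r}")
        if event.type == "message-start":
            # Assumption: the mock exposes the request id as event.id.
            message_id = event.id
        elif event.type == "content-delta":
            parts.append(event.delta.message.content.text)
        prev = event.type
    if prev != "message-end":
        raise ValueError("stream ended before message-end")
    return message_id, "".join(parts)


def delta(text):
    # Hypothetical mock of a content-delta event.
    return NS(type="content-delta", delta=NS(message=NS(content=NS(text=text))))


stream = [
    NS(type="message-start", id="msg-123"),
    NS(type="content-start"),
    delta("Hi"),
    delta(" there"),
    NS(type="content-end"),
    NS(type="message-end"),
]
msg_id, text = collect_basic_stream(stream)
print(msg_id, repr(text))  # msg-123 'Hi there'

# A stream that skips message-start is rejected.
try:
    collect_basic_stream([NS(type="content-start")])
    rejected = False
except ValueError:
    rejected = True
```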
Retrieval Augmented Generation Stream Events
message-start
Same as in a basic chat stream event.
content-start
Same as in a basic chat stream event.
content-delta
Same as in a basic chat stream event.
citation-start
Emitted for every citation generated in the response.
citation-end
Emitted to indicate the end of a citation. If multiple citations are generated, the events arrive as a sequence of citation-start and citation-end pairs.
content-end
Same as in a basic chat stream event.
message-end
Same as in a basic chat stream event.
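Since citations arrive as citation-start and citation-end pairs, a client can collect them by opening a citation on each start event and closing it on the matching end. In this sketch, the citation payload location (delta.message.citations) and its text, start, and end fields are assumptions made for the mock, not a documented schema.

```python
from types import SimpleNamespace as NS


def collect_citations(events):
    """Pair citation-start/citation-end events and return the citations.

    Assumption: each citation-start event carries its citation in
    event.delta.message.citations (a field path chosen for this mock).
    """
    citations = []
    open_citation = None
    for event in events:
        if event.type == "citation-start":
            if open_citation is not None:
                raise ValueError("citation-start before the previous citation-end")
            open_citation = event.delta.message.citations
        elif event.type == "citation-end":
            if open_citation is None:
                raise ValueError("citation-end without a matching citation-start")
            citations.append(open_citation)
            open_citation = None
    if open_citation is not None:
        raise ValueError("stream ended inside a citation")
    return citations


def cite(text, start, end):
    # Hypothetical mock of a citation-start event with character offsets.
    citation = NS(text=text, start=start, end=end)
    return NS(type="citation-start", delta=NS(message=NS(citations=citation)))


answer = "Paris is the capital of France."
stream = [
    NS(type="message-start"),
    NS(type="content-start"),
    NS(type="content-delta", delta=NS(message=NS(content=NS(text=answer)))),
    cite("Paris", 0, 5),
    NS(type="citation-end"),
    cite("France", 24, 30),
    NS(type="citation-end"),
    NS(type="content-end"),
    NS(type="message-end"),
]
found = collect_citations(stream)
print([c.text for c in found])  # ['Paris', 'France']
```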
Tool Use Stream Events (For Tool Calling)
message-start
Same as in a basic chat stream event.
tool-plan-delta
Emitted when the next token of the tool plan is generated.
tool-call-start
Emitted when the model generates tool calls that need to be acted on. The event contains a list of tool_calls, each with the tool name and tool call ID.
tool-call-delta
Emitted when the next token of the tool call is generated.
tool-call-end
Emitted when the tool call is finished.
message-end
Same as in a basic chat stream event.
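A tool-calling stream has to be reassembled before the tools can run: tool plan tokens are concatenated, and each tool call's argument tokens are joined once tool-call-end arrives. The sketch below assumes one tool call per tool-call-start/tool-call-end pair, that argument fragments concatenate to a JSON string, and mock field paths (delta.message.tool_plan, delta.message.tool_calls) that are illustrative only. The get_weather tool is hypothetical.

```python
import json
from types import SimpleNamespace as NS


def collect_tool_calls(events):
    """Rebuild the tool plan and complete tool calls from a mocked stream."""
    plan_parts = []
    calls = []
    current = None
    for event in events:
        # Events without a delta (e.g. message-start) yield msg = None.
        msg = getattr(getattr(event, "delta", None), "message", None)
        if event.type == "tool-plan-delta":
            plan_parts.append(msg.tool_plan)
        elif event.type == "tool-call-start":
            tc = msg.tool_calls
            # Record the id and name; argument tokens arrive in later deltas.
            current = {"id": tc.id, "name": tc.function.name, "args": []}
        elif event.type == "tool-call-delta":
            current["args"].append(msg.tool_calls.function.arguments)
        elif event.type == "tool-call-end":
            # Assumption: the joined fragments form a JSON object string.
            arguments = json.loads("".join(current["args"]) or "{}")
            calls.append({"id": current["id"], "name": current["name"],
                          "arguments": arguments})
            current = None
    return "".join(plan_parts), calls


def plan(text):
    return NS(type="tool-plan-delta", delta=NS(message=NS(tool_plan=text)))


def call_start(call_id, name):
    fn = NS(name=name, arguments="")
    return NS(type="tool-call-start",
              delta=NS(message=NS(tool_calls=NS(id=call_id, function=fn))))


def call_delta(fragment):
    fn = NS(arguments=fragment)
    return NS(type="tool-call-delta",
              delta=NS(message=NS(tool_calls=NS(function=fn))))


stream = [
    NS(type="message-start"),
    plan("I will look up "),
    plan("the weather."),
    call_start("call_1", "get_weather"),
    call_delta('{"location": '),
    call_delta('"Toronto"}'),
    NS(type="tool-call-end"),
    NS(type="message-end"),
]
tool_plan, tool_calls = collect_tool_calls(stream)
```

Waiting for tool-call-end before parsing matters because an individual argument fragment such as '{"location": ' is not valid JSON on its own.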
Tool Use Stream Events (For Response Generation)
message-start
Same as in a basic chat stream event.
content-start
Same as in a basic chat stream event.
content-delta
Same as in a basic chat stream event.
citation-start
Emitted for every citation generated in the response.
citation-end
Emitted to indicate the end of a citation. If multiple citations are generated, the events arrive as a sequence of citation-start and citation-end pairs.
content-end
Same as in a basic chat stream event.
message-end
Same as in a basic chat stream event.
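Comparing the four event lists shows that a client can tell a tool-calling stream from a text response by event type alone, but cannot tell a RAG stream from a tool-use response-generation stream, since they emit the same events. The hypothetical classify_stream helper below sketches that check. The labels it returns are illustrative names, and a response that happens to contain no citations will be classified as a basic chat stream.

```python
TOOL_EVENTS = {"tool-plan-delta", "tool-call-start", "tool-call-delta", "tool-call-end"}
CITATION_EVENTS = {"citation-start", "citation-end"}


def classify_stream(event_types):
    """Guess the kind of stream from the event types it emitted."""
    seen = set(event_types)
    if seen & TOOL_EVENTS:
        return "tool-calling"  # the client should run the requested tools
    if seen & CITATION_EVENTS:
        return "cited-response"  # RAG or tool-use response generation
    if "content-delta" in seen:
        return "basic-chat"  # also covers a response with no citations
    return "unknown"


print(classify_stream(["message-start", "tool-plan-delta", "tool-call-start",
                       "tool-call-delta", "tool-call-end", "message-end"]))  # tool-calling
print(classify_stream(["message-start", "content-start", "content-delta",
                       "citation-start", "citation-end", "content-end",
                       "message-end"]))  # cited-response
```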