Generates a text response to a user message and streams it down, token by token. To learn how to use the Chat API with streaming, follow our Text Generation guides.
Follow the Migration Guide for instructions on moving from API v1 to API v2.
The name of the project that is making the request.
The name of a compatible Cohere model (such as command-r or command-r-plus) or the ID of a fine-tuned model.
A list of chat messages in chronological order, representing a conversation between the user and the model.
Messages can be from User, Assistant, Tool and System roles. Learn more about messages and roles in the Chat API guide.
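As a rough illustration of a chronological message list, the sketch below builds one from plain Python dictionaries. The `role` and `content` keys, the lowercase role strings, and the `make_message` helper are assumptions made for this example; the Chat API guide remains the authority on the exact message schema.

```python
from typing import Dict

# Assumed spelling of the four roles named above (hypothetical, for illustration).
ALLOWED_ROLES = {"user", "assistant", "tool", "system"}


def make_message(role: str, content: str) -> Dict[str, str]:
    """Return one chat message after checking the role against the four listed roles."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"role": role, "content": content}


# Messages are listed oldest first, so the final entry is the latest user turn.
messages = [
    make_message("system", "You are a concise assistant."),
    make_message("user", "What is a token?"),
    make_message("assistant", "A token is a small unit of text."),
    make_message("user", "Give me an example."),
]
```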
A list of available tools (functions) that the model may suggest invoking before producing a text response.
When tools is passed (without tool_results), the text content in the response will be empty and the tool_calls field in the response will be populated with a list of tool calls that need to be made. If no calls need to be made, the tool_calls array will be empty.
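The branching described above might be handled along these lines. This is a sketch only: the response is modeled as a plain dictionary, and the `name` key on each tool call plus the `pending_tool_calls` and `describe` helpers are hypothetical names chosen for the example, not the SDK's response objects.

```python
from typing import Any, Dict, List


def pending_tool_calls(response_message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the tool calls the caller still has to execute (possibly none)."""
    # A missing or null tool_calls field is treated like an empty array.
    return list(response_message.get("tool_calls") or [])


def describe(response_message: Dict[str, Any]) -> str:
    """Summarize what the caller should do next with this response."""
    calls = pending_tool_calls(response_message)
    if calls:
        return "run tools: " + ", ".join(call["name"] for call in calls)
    return "no tool calls requested"
```

With a response such as `{"text": "", "tool_calls": [{"name": "get_weather"}]}`, `describe` reports that `get_weather` should be run; with an empty `tool_calls` array it reports that nothing needs to be called.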
A list of relevant documents that the model can cite to generate a more accurate reply. Each document is either a string or document object with content and metadata.
Options for controlling citation generation.
Configuration for forcing the model output to adhere to the specified format. Supported on Command R, Command R+ and newer models.
The model can be forced into outputting JSON objects by setting { "type": "json_object" }.
A JSON Schema can optionally be provided, to ensure a specific structure.
Note: When using { "type": "json_object" }, your message should always explicitly instruct the model to generate a JSON object (e.g. “Generate a JSON …”). Otherwise the model may get stuck generating an infinite stream of characters and eventually run out of context length.
Note: When json_schema is not specified, the generated object can have up to 5 layers of nesting.
Limitation: The parameter is not supported when used in combination with the documents or tools parameters.
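The JSON output rules above can be checked before a request is sent. The sketch below is one way to do that under stated assumptions: the `response_format` key name, the placement of `json_schema` inside it, and the `build_json_request` helper are all hypothetical, and the check that the message mentions JSON is a simple heuristic rather than anything the API enforces.

```python
from typing import Any, Dict, List, Optional


def build_json_request(
    message: str,
    json_schema: Optional[Dict[str, Any]] = None,
    documents: Optional[List[Any]] = None,
    tools: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build a request fragment that asks for JSON output, enforcing the notes above."""
    # Limitation from the text: no documents or tools alongside the format parameter.
    if documents or tools:
        raise ValueError("JSON output cannot be combined with documents or tools")
    # Heuristic for the first note: the message should explicitly ask for JSON.
    if "json" not in message.lower():
        raise ValueError("the message should explicitly ask the model for JSON")
    response_format: Dict[str, Any] = {"type": "json_object"}
    if json_schema is not None:
        # Assumed placement of the optional schema (hypothetical).
        response_format["json_schema"] = json_schema
    return {"message": message, "response_format": response_format}
```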
Used to select the safety instruction inserted into the prompt. Defaults to CONTEXTUAL.
When OFF is specified, the safety instruction will be omitted.
Safety modes are not yet configurable in combination with tools, tool_results and documents parameters.
Note: This parameter is only compatible with models Command R 08-2024, Command R+ 08-2024 and newer.
The maximum number of tokens the model will generate as part of the response.
Note: Setting a low value may result in incomplete generations.
A list of up to 5 strings that the model will use to stop generation. If the model generates a string that matches any of the strings in the list, it will stop generating tokens and return the generated text up to that point, not including the stop sequence.
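To make the stop-sequence behavior concrete, the sketch below applies the same rule to an already generated string on the client side. It only illustrates the described outcome (text is cut before the earliest matching stop sequence); `truncate_at_stop` is a hypothetical helper, and the real service stops during generation rather than trimming afterwards.

```python
from typing import List


def truncate_at_stop(text: str, stop_sequences: List[str]) -> str:
    """Return text up to, but not including, the earliest stop sequence found."""
    if len(stop_sequences) > 5:
        raise ValueError("at most 5 stop sequences are allowed")
    cut = len(text)
    for stop in stop_sequences:
        # Empty strings are skipped because they would match at position 0.
        index = text.find(stop) if stop else -1
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

For instance, `truncate_at_stop("Hello world. END more", ["END"])` keeps only `"Hello world. "`.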
Defaults to 0.3.
A non-negative float that tunes the degree of randomness in generation. Lower temperatures mean less random generations, and higher temperatures mean more random generations.
Randomness can be increased further by raising the value of the p parameter.
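The effect of temperature on randomness can be illustrated with the textbook temperature-scaled softmax. This is the standard formulation, shown here as an assumption; the source does not state how the service applies temperature internally, and treating a temperature of 0 as a greedy pick is likewise an assumption of this sketch.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities, dividing by the temperature before the softmax."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumed behavior for this sketch: all probability goes to the top logit.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Comparing `softmax_with_temperature([2.0, 1.0, 0.0], 0.3)` with the same logits at `1.5` shows the lower temperature concentrating probability on the top token, which is what "less random" means here.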
>=0
If specified, the backend will make a best effort to sample tokens deterministically, such that repeated requests with the same seed and parameters should return the same result. However, determinism cannot be totally guaranteed.
Defaults to 0.0, min value of 0.0, max value of 1.0.
Used to reduce repetitiveness of generated tokens. The higher the value, the stronger a penalty is applied to previously present tokens, proportional to how many times they have already appeared in the prompt or prior generation.
Defaults to 0.0, min value of 0.0, max value of 1.0.
Used to reduce repetitiveness of generated tokens. Similar to frequency_penalty, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies.
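The difference between the two penalties can be sketched as adjustments to token logits. The subtraction scheme below is a common formulation and an assumption on our part; the source describes only the qualitative behavior (proportional to count versus applied equally), not the exact arithmetic. The `presence_penalty` name and the `apply_penalties` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def apply_penalties(
    logits: Dict[str, float],
    previous_tokens: List[str],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> Dict[str, float]:
    """Return a copy of logits with repetition penalties applied to seen tokens."""
    counts = Counter(previous_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            # Frequency penalty grows with the number of prior occurrences.
            adjusted[token] -= frequency_penalty * count
            # Presence penalty is a flat amount for any token seen at least once.
            adjusted[token] -= presence_penalty
    return adjusted
```

With prior tokens `["a", "a", "a", "b"]`, a frequency penalty lowers `a` three times as much as `b`, while a presence penalty lowers both by the same amount and leaves unseen tokens untouched.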
Ensures that only the top k most likely tokens are considered for generation at each step. When k is set to 0, k-sampling is disabled.
Defaults to 0, min value of 0, max value of 500.
Ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. If both k and p are enabled, p acts after k.
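The ordering "p acts after k" can be sketched as two filtering passes over a token distribution. This is a simplified illustration, not the service's implementation: in particular, renormalizing the surviving tokens after the k cut before measuring the probability mass for p is an assumption, and `filter_top_k_top_p` is a hypothetical helper.

```python
from typing import Dict


def filter_top_k_top_p(probs: Dict[str, float], k: int = 0, p: float = 0.75) -> Dict[str, float]:
    """Keep the top-k tokens (if k > 0), then the smallest prefix reaching mass p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if k > 0:
        ranked = ranked[:k]  # k == 0 disables the top-k cut
    total = sum(prob for _, prob in ranked)
    if total <= 0:
        return {}
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, prob in ranked:
        share = prob / total  # probability after the k cut (assumed renormalization)
        kept[token] = share
        cumulative += share
        if cumulative >= p:
            break
    norm = sum(kept.values())
    return {token: share / norm for token, share in kept.items()}
```

For a distribution of `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}` with `k=3` and `p=0.75`, token `d` is removed by the k cut and `c` by the p cut, leaving `a` and `b` to be sampled from.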
Defaults to 0.75, min value of 0.01, max value of 0.99.
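The documented ranges for the sampling parameters lend themselves to a client-side check before sending a request. The sketch below collects the limits stated in this reference; the keyword names `temperature`, `presence_penalty` and `stop_sequences`, and the `validate_sampling` helper itself, are assumptions for illustration.

```python
from typing import List, Sequence


def validate_sampling(
    temperature: float = 0.3,
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
    k: int = 0,
    p: float = 0.75,
    stop_sequences: Sequence[str] = (),
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all values are in range."""
    errors: List[str] = []
    if temperature < 0:
        errors.append("temperature must be >= 0")
    if not 0.0 <= frequency_penalty <= 1.0:
        errors.append("frequency_penalty must be between 0.0 and 1.0")
    if not 0.0 <= presence_penalty <= 1.0:
        errors.append("presence_penalty must be between 0.0 and 1.0")
    if not 0 <= k <= 500:
        errors.append("k must be between 0 and 500")
    if not 0.01 <= p <= 0.99:
        errors.append("p must be between 0.01 and 0.99")
    if len(stop_sequences) > 5:
        errors.append("at most 5 stop sequences are allowed")
    return errors
```

Calling `validate_sampling()` with the documented defaults returns an empty list, while `validate_sampling(k=501)` reports that k is out of range.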
Defaults to false. When set to true, the log probabilities of the generated tokens will be included in the response.
A streamed event which signifies that a stream has started.
A streamed delta event which signifies that a new content block has started.
A streamed delta event which contains a delta of chat text content.
A streamed delta event which signifies that the content block has ended.
A streamed event which contains a delta of tool plan text.
A streamed delta event which signifies a tool call has started streaming.
A streamed delta event which contains a delta of tool call arguments.
A streamed delta event which signifies a tool call has finished streaming.
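Because tool call arguments arrive as deltas, a client typically buffers the fragments between the start and end events and parses them once the call has finished. The sketch below shows that pattern on simplified, hypothetical events: the type strings `tool-call-start`, `tool-call-delta` and `tool-call-end`, the flat `name` and `arguments` keys, and the assumption that the joined fragments form a JSON string are not confirmed by this reference.

```python
import json
from typing import Any, Dict, Iterable, List


def assemble_tool_call(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Collect argument fragments for one tool call and parse them when it ends."""
    name = None
    fragments: List[str] = []
    for event in events:
        kind = event["type"]
        if kind == "tool-call-start":
            name = event["name"]
            fragments = []  # begin a fresh buffer for this call
        elif kind == "tool-call-delta":
            fragments.append(event["arguments"])
        elif kind == "tool-call-end":
            # Only parse once the call is complete; partial JSON would fail.
            return {"name": name, "arguments": json.loads("".join(fragments))}
    raise ValueError("stream ended before the tool call finished")
```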
A streamed event which signifies a citation has been created.
A streamed event which signifies a citation has finished streaming.
A streamed event which signifies that the chat message has ended.
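Taken together, the events above suggest a simple consumer loop: accumulate text deltas until the message ends. The sketch below runs over a simulated list of events rather than a live stream. The event type strings (`message-start`, `content-delta`, `tool-plan-delta`, `message-end` and so on) and the flat `text` key are assumptions for illustration; real events may nest their payloads differently.

```python
from typing import Any, Dict, Iterable, List


def collect_stream(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Accumulate streamed text and tool plan deltas until the message ends."""
    text_parts: List[str] = []
    tool_plan_parts: List[str] = []
    finished = False
    for event in events:
        kind = event["type"]
        if kind == "content-delta":
            text_parts.append(event["text"])
        elif kind == "tool-plan-delta":
            tool_plan_parts.append(event["text"])
        elif kind == "message-end":
            finished = True
            break
        # Start, end and citation events carry no text in this simplified model.
    return {
        "text": "".join(text_parts),
        "tool_plan": "".join(tool_plan_parts),
        "finished": finished,
    }


# Simulated stream, in the order the event descriptions above suggest.
simulated = [
    {"type": "message-start"},
    {"type": "content-start"},
    {"type": "content-delta", "text": "Hello"},
    {"type": "content-delta", "text": ", world"},
    {"type": "content-end"},
    {"type": "message-end"},
]
result = collect_stream(simulated)
```

Checking the `finished` flag lets a caller distinguish a complete message from a stream that was cut off before the final event arrived.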