Chat
Generates a text response to a user message and streams it down, token by token. To learn how to use the Chat API with streaming, follow our Text Generation guides.
Follow the Migration Guide for instructions on moving from API v1 to API v2.
Headers
Bearer authentication of the form Bearer <token>, where token is your auth token.
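As a minimal sketch of the header described above, the helper below builds a header dictionary for a bearer token. The function name `build_auth_headers`, the placeholder token, and the `Content-Type` entry are illustrative assumptions, not part of this reference.

```python
def build_auth_headers(token: str) -> dict[str, str]:
    """Return HTTP headers carrying the auth token as a bearer credential."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        # Standard bearer scheme: the literal word "Bearer", a space, the token.
        "Authorization": f"Bearer {token}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }


# "YOUR_API_KEY" is a placeholder, not a real credential.
headers = build_auth_headers("YOUR_API_KEY")
```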
Request
Defaults to false.
When true, the response will be an SSE stream of events.
Streaming is beneficial for user interfaces that render the contents of the response piece by piece, as it gets generated.
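To illustrate how a client might consume such a stream piece by piece, here is a minimal sketch of a generic SSE line parser. The event payloads in `sample` (the `type` and `text` fields) are hypothetical and do not reflect the actual event schema of this API; only the `data:` line and blank-line framing come from the SSE format itself.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON payload per SSE event found in `lines`."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        if line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    if data_parts:
        # Lenient choice: flush a trailing event that lacks a closing blank line.
        yield json.loads("\n".join(data_parts))


# Hypothetical payloads, for illustration only.
sample = [
    "event: example-delta\n",
    'data: {"type": "example-delta", "text": "Hel"}\n',
    "\n",
    'data: {"type": "example-delta", "text": "lo"}\n',
    "\n",
]
text = "".join(event["text"] for event in iter_sse_events(sample))
```

A user interface would append each fragment to the display as it arrives instead of joining them at the end.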
The name of a compatible Cohere model or the ID of a fine-tuned model.
A list of chat messages in chronological order, representing a conversation between the user and the model.
Messages can be from User, Assistant, Tool and System roles. Learn more about messages and roles in the Chat API guide.
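A conversation of this kind could be assembled as shown in the sketch below. The dictionary keys `role` and `content` and the lowercase role strings are assumptions made for illustration; check the Chat API guide for the exact message schema.

```python
# Assumption: roles are spelled in lowercase in the request body.
ALLOWED_ROLES = {"user", "assistant", "tool", "system"}


def make_message(role: str, content: str) -> dict[str, str]:
    """Build one chat message, rejecting roles outside the four listed above."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {"role": role, "content": content}


# Messages are kept in chronological order, oldest first.
messages = [
    make_message("system", "You are a concise assistant."),
    make_message("user", "What is the capital of France?"),
    make_message("assistant", "Paris."),
    make_message("user", "And of Italy?"),
]
```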
A list of tools (functions) available to the model. The model response may contain ‘tool_calls’ to the specified tools.
Learn more in the Tool Use guide.
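The sketch below shows one plausible way to describe a tool and to run a tool call locally once the model requests it. The tool schema (a `function` entry with JSON Schema `parameters`), the `get_weather` function, and the assumption that arguments arrive as a JSON string are all hypothetical; the Tool Use guide is the authority on the real shapes.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local function that the model may ask to call."""
    return f"Sunny in {city}"


# Assumed tool description format: a name, a description, and a JSON Schema
# for the arguments. Not confirmed by this reference.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names to the local callables that implement them.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Look up the named tool and invoke it with JSON-decoded keyword arguments."""
    if name not in REGISTRY:
        raise KeyError(f"model requested an unknown tool: {name}")
    return REGISTRY[name](**json.loads(arguments_json))


result = run_tool_call("get_weather", '{"city": "Toronto"}')
```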
Defaults to 0.3.
A non-negative float that tunes the degree of randomness in generation. Lower temperatures mean less random generations, and higher temperatures mean more random generations.
Randomness can be increased further by raising the value of the p parameter.
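The effect of temperature can be illustrated with the textbook temperature-scaled softmax. This is a sketch of the general technique, not a description of how the service implements it; the function name and the treatment of a temperature of 0 as greedy selection are assumptions.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities, dividing by the temperature first."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumption: temperature 0 means always picking the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, 0.3)   # sharper: mass concentrates on the top token
high = softmax_with_temperature(logits, 1.5)  # flatter: more random generations
```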
Defaults to 0.0, min value of 0.0, max value of 1.0.
Used to reduce repetitiveness of generated tokens. The higher the value, the stronger a penalty is applied to previously present tokens, proportional to how many times they have already appeared in the prompt or prior generation.
Defaults to 0.0, min value of 0.0, max value of 1.0.
Used to reduce repetitiveness of generated tokens. Similar to frequency_penalty, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies.
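The difference between the two penalties can be sketched as follows: the frequency penalty scales with how often a token has appeared, while the presence penalty is a flat amount for any token that has appeared at all. Subtracting the penalties from the logits is a common formulation and is assumed here; the source does not state the exact formula used by the service, and the name `presence_penalty` is likewise an assumption.

```python
from collections import Counter


def apply_penalties(
    logits: dict[str, float],
    previous_tokens: list[str],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> dict[str, float]:
    """Lower the logits of tokens that already appeared in the prompt or output."""
    counts = Counter(previous_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        # Frequency term grows with the number of prior occurrences.
        frequency_term = frequency_penalty * count
        # Presence term is the same for one occurrence or many.
        presence_term = presence_penalty if count > 0 else 0.0
        adjusted[token] = logit - frequency_term - presence_term
    return adjusted


logits = {"a": 1.0, "b": 1.0, "c": 1.0}
history = ["a", "a", "b"]
by_frequency = apply_penalties(logits, history, frequency_penalty=0.5)
by_presence = apply_penalties(logits, history, presence_penalty=0.5)
```

Here "a" appeared twice and "b" once, so the frequency penalty separates them while the presence penalty treats them the same.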
Ensures that only the top k most likely tokens are considered for generation at each step. When k is set to 0, k-sampling is disabled.
Defaults to 0, min value of 0, max value of 500.
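A minimal sketch of top-k filtering over a toy distribution is shown below. The function name and the renormalization of the surviving probabilities are illustrative assumptions; the range check mirrors the limits stated above.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most likely tokens and renormalize; k == 0 disables filtering."""
    if not 0 <= k <= 500:
        raise ValueError("k must be between 0 and 500")
    if k == 0 or k >= len(probs):
        return dict(probs)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(prob for _, prob in kept)
    # Renormalize so the surviving probabilities sum to 1 again.
    return {token: prob / total for token, prob in kept}


probs = {"a": 0.5, "b": 0.3, "c": 0.2}
top_two = top_k_filter(probs, 2)
unfiltered = top_k_filter(probs, 0)
```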
Ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. If both k and p are enabled, p acts after k.
Defaults to 0.75, min value of 0.01, max value of 0.99.
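The ordering "p acts after k" can be sketched as a two-stage filter: first truncate to the k most likely tokens, then keep the smallest prefix of that list whose cumulative mass reaches p. Renormalizing between the two stages is an assumption of this sketch, as is the function name; the source only specifies the order.

```python
def filter_k_then_p(probs: dict[str, float], p: float, k: int = 0) -> dict[str, float]:
    """Apply top-k truncation (if k > 0), then nucleus (top-p) truncation."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if k > 0:
        ranked = ranked[:k]
    # Assumption: probabilities are renormalized after the k stage.
    total = sum(prob for _, prob in ranked)
    ranked = [(token, prob / total) for token, prob in ranked]

    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # smallest prefix whose mass reaches p

    mass = sum(prob for _, prob in kept)
    return {token: prob / mass for token, prob in kept}


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
nucleus_only = filter_k_then_p(probs, p=0.75)
k_then_p = filter_k_then_p(probs, p=0.9, k=3)
```

With p of 0.75 alone, "a" and "b" already cover 0.8 of the mass, so the rest is dropped. With k of 3 and p of 0.9, "d" is removed by the k stage before p is evaluated.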
Defaults to false. When set to true, the log probabilities of the generated tokens will be included in the response.
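As a client-side convenience, the numeric ranges listed in this section could be checked before a request is sent. The sketch below encodes those ranges; the parameter names `temperature` and `presence_penalty` are assumptions, since only `frequency_penalty`, `k`, and `p` are named in the text above.

```python
# (min, max) per parameter; None means no documented upper bound.
RANGES = {
    "temperature": (0.0, None),        # "a non-negative float"
    "frequency_penalty": (0.0, 1.0),
    "presence_penalty": (0.0, 1.0),
    "k": (0, 500),
    "p": (0.01, 0.99),
}


def validate_sampling_params(params: dict) -> list[str]:
    """Return a list of human-readable range violations (empty if all are valid)."""
    errors = []
    for name, value in params.items():
        if name not in RANGES:
            continue  # parameters without a documented range are not checked
        low, high = RANGES[name]
        if value < low or (high is not None and value > high):
            upper = "no upper bound" if high is None else str(high)
            errors.append(f"{name}={value} is outside [{low}, {upper}]")
    return errors


ok = validate_sampling_params({"temperature": 0.3, "k": 0, "p": 0.75})
bad = validate_sampling_params({"k": 501, "p": 1.0, "temperature": -0.1})
```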