Using the Cohere Chat API for Text Generation

The Chat API endpoint is used to generate text with Cohere LLMs. This endpoint facilitates a conversational interface, allowing users to send messages to the model and receive text responses.

Every message has a content field and a role, which indicates who sent that message. The role can be user, assistant, system, or tool.

```python
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

res = co.chat(
    model="command-a-plus-05-2026",
    messages=[
        {
            "role": "user",
            "content": "Write a title for a blog post about API design. Only output the title text.",
        }
    ],
)

print(res.message.content[0].text)
# "The Ultimate Guide to API Design: Best Practices for Building Robust and Scalable APIs"
```
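Because every message must carry one of the four roles listed above, some applications check the messages list before sending it. The sketch below is a hypothetical client-side helper (check_roles and ALLOWED_ROLES are illustrative names, not part of the Cohere SDK); the API performs its own validation, and this check only catches misspelled or unsupported roles early.

```python
# Hypothetical client-side check, not part of the Cohere SDK: confirms that
# every message in a Chat request uses one of the four documented roles.
ALLOWED_ROLES = {"user", "assistant", "system", "tool"}


def check_roles(messages: list) -> None:
    """Raise ValueError if any message is missing a role or uses an unknown one."""
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {index} has unsupported role {role!r}")


messages = [
    {"role": "system", "content": "You respond concisely."},
    {"role": "user", "content": "Write a title for a blog post about API design."},
]
check_roles(messages)  # passes silently; an unknown role would raise ValueError
```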

Response Structure

Below is a sample response from the Chat API. The returned message has the role assistant.

```json
{
  "id": "5a50480a-cf52-46f0-af01-53d18539bd31",
  "message": {
    "role": "assistant",
    "content": [
      {
        "type": "text",
        "text": "The Art of API Design: Crafting Elegant and Powerful Interfaces"
      }
    ]
  },
  "finish_reason": "COMPLETE",
  "meta": {
    "api_version": {"version": "2", "is_experimental": true},
    "warnings": [
      "You are using an experimental version, for more information please refer to https://docs.cohere.com/versioning-reference"
    ],
    "billed_units": {"input_tokens": 17, "output_tokens": 12},
    "tokens": {"input_tokens": 215, "output_tokens": 12}
  }
}
```

Every response contains the following fields:

  • message: the generated message from the model.
  • id: the ID of this response.
  • finish_reason: the reason generation stopped. Values include:
    • COMPLETE: the model finished generating the message.
    • MAX_TOKENS: the model's context limit was reached before the generation could be completed.
  • meta: metadata about the request, such as the API version, any warnings, billed units, and token counts.
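The sketch below reads these fields from a raw JSON response, for example one received when calling the HTTP endpoint directly rather than through the Python SDK (the SDK returns objects with attribute access, as in res.message.content[0].text). summarize_response is a hypothetical helper that assumes the response shape shown in the sample above; it joins the text parts of message.content, flags a MAX_TOKENS finish reason, and reports the billed token counts.

```python
import json

# The sample response shown above, trimmed to the fields used here.
raw_response = """
{
  "id": "5a50480a-cf52-46f0-af01-53d18539bd31",
  "message": {
    "role": "assistant",
    "content": [
      {"type": "text", "text": "The Art of API Design: Crafting Elegant and Powerful Interfaces"}
    ]
  },
  "finish_reason": "COMPLETE",
  "meta": {
    "billed_units": {"input_tokens": 17, "output_tokens": 12},
    "tokens": {"input_tokens": 215, "output_tokens": 12}
  }
}
"""


def summarize_response(response: dict) -> dict:
    """Collect the reply text, truncation flag, and billed token counts (hypothetical helper)."""
    # message.content is a list of parts; join the text parts into one string.
    text = "".join(
        part["text"]
        for part in response["message"]["content"]
        if part.get("type") == "text"
    )
    billed = response["meta"]["billed_units"]
    return {
        "id": response["id"],
        "text": text,
        # MAX_TOKENS means generation stopped before the message was complete.
        "truncated": response["finish_reason"] == "MAX_TOKENS",
        "billed_input_tokens": billed["input_tokens"],
        "billed_output_tokens": billed["output_tokens"],
    }


summary = summarize_response(json.loads(raw_response))
print(summary["text"], summary["truncated"])
```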

System Message

Developers can adjust the LLM's behavior by including a system message, that is, a message whose role is set to system, in the messages list.

The system message contains instructions that the model prioritizes over instructions in messages from other roles. Developers often use it to control the style in which the model communicates and to set guidelines for how to handle various topics.

It is recommended to send the system message as the first element in the messages list.

```python
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

system_message = "You respond concisely, in about 5 words or less"

res = co.chat(
    model="command-a-plus-05-2026",
    messages=[
        {"role": "system", "content": system_message},
        {
            "role": "user",
            "content": "Write a title for a blog post about API design. Only output the title text.",
        },
    ],
)

print(res.message.content[0].text)
# "Designing Perfect APIs"
```
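When the system message is added to a messages list that already exists, a small helper can keep it in first position, as recommended above. with_system_message is a hypothetical helper, not an SDK function; as an assumed design choice, it drops any system messages already in the list so that exactly one remains, and it returns a new list rather than modifying the input.

```python
def with_system_message(messages: list, system_text: str) -> list:
    """Return a new list whose first element is the given system message.

    Hypothetical helper: existing system messages are removed so only one remains.
    """
    rest = [m for m in messages if m.get("role") != "system"]
    return [{"role": "system", "content": system_text}] + rest


history = [
    {
        "role": "user",
        "content": "Write a title for a blog post about API design. Only output the title text.",
    }
]
messages = with_system_message(history, "You respond concisely, in about 5 words or less")
print([m["role"] for m in messages])  # ['system', 'user']
```

The resulting list can be passed as the messages argument to co.chat in the same way as the example above.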

Multi-Turn Conversations

A single Chat request can contain multiple turns of a conversation, with each message in the messages list appearing in the order it was sent. Including earlier turns gives the model context for generating its next response.

```python
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

system_message = "You respond concisely, in about 5 words or less"

res = co.chat(
    model="command-a-plus-05-2026",
    messages=[
        {"role": "system", "content": system_message},
        {
            "role": "user",
            "content": "Write a title for a blog post about API design. Only output the title text.",
        },
        {"role": "assistant", "content": "Designing Perfect APIs"},
        {
            "role": "user",
            "content": "Another one about generative AI.",
        },
    ],
)

# "AI: The Generative Age"
print(res.message.content[0].text)
```
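To carry a conversation across several requests, one approach is to keep the messages list in your application and append each user turn and assistant reply before the next call. The Conversation class below is a hypothetical sketch, not part of the Cohere SDK. It takes any send function that accepts the full messages list and returns the reply text, so it can be exercised offline with a stand-in (fake_send here is purely illustrative).

```python
from typing import Callable, Dict, List, Optional


class Conversation:
    """Hypothetical helper that keeps the running messages list for a multi-turn chat."""

    def __init__(
        self,
        send: Callable[[List[Dict]], str],
        system_text: Optional[str] = None,
    ):
        self.send = send
        self.messages: List[Dict] = []
        if system_text is not None:
            # Keep the system message as the first element of the list.
            self.messages.append({"role": "system", "content": system_text})

    def ask(self, user_text: str) -> str:
        pending = self.messages + [{"role": "user", "content": user_text}]
        reply = self.send(pending)
        # Commit the turn only after a reply arrives, so a failed request
        # leaves the stored history unchanged.
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply


def fake_send(messages: List[Dict]) -> str:
    # Stand-in for a real API call: reports how many messages it received.
    return f"reply to {len(messages)} messages"


chat = Conversation(fake_send, system_text="You respond concisely, in about 5 words or less")
chat.ask("Write a title for a blog post about API design. Only output the title text.")
chat.ask("Another one about generative AI.")
print([m["role"] for m in chat.messages])
# ['system', 'user', 'assistant', 'user', 'assistant']
```

To call the API instead of the stand-in, a sender such as lambda msgs: co.chat(model="command-a-plus-05-2026", messages=msgs).message.content[0].text reuses the same calls as the examples above. Each request resends the whole history, so input token counts grow as the conversation gets longer.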