Retrieval augmented generation (RAG) - quickstart

Retrieval Augmented Generation (RAG) enables an LLM to ground its responses in external documents, which improves the accuracy of its responses and reduces hallucinations.

The Chat endpoint comes with built-in RAG capabilities such as document grounding and citation generation.

This quickstart guide shows you how to perform RAG with the Chat endpoint.

1. Setup

First, install the Cohere Python SDK with the following command.

$ pip install -U cohere

Next, import the library and create a client.

PYTHON
import cohere

co = cohere.ClientV2(
    "COHERE_API_KEY"
)  # Get your free API key here: https://dashboard.cohere.com/api-keys
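
Hard-coding the key works for a quick test, but a key kept in source code is easy to leak. The following is a minimal sketch of one alternative: reading the key from an environment variable. The helper name load_api_key and the variable name COHERE_API_KEY are assumptions made for this example, not part of the SDK.

```python
import os


def load_api_key(env_var: str = "COHERE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Both the helper and the default variable name are hypothetical;
    pick whatever name your deployment already uses.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (needs the cohere package and a real key in the environment):
# import cohere
# co = cohere.ClientV2(api_key=load_api_key())
```

Failing early with a clear error is usually easier to debug than an authentication failure on the first API call.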

2. Documents

First, define the documents that will be passed as the context for RAG. These documents are typically retrieved from sources such as vector databases via semantic search, or from any system that can retrieve unstructured data given a user query.

Each document is a data object that can take any number of fields, e.g. title, url, text.

PYTHON
documents = [
    {
        "data": {
            "text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."
        }
    },
    {
        "data": {
            "text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."
        }
    },
    {
        "data": {
            "text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."
        }
    },
]
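
In practice the documents list is usually produced by a retrieval step rather than written by hand. The sketch below illustrates the idea with a deliberately naive keyword-overlap ranking, which stands in for real semantic search. The function names tokenize and retrieve are hypothetical, and the texts are shortened versions of the documents above.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, texts: list[str], top_k: int = 2) -> list[dict]:
    """Rank texts by words shared with the query; return them as documents.

    This is a toy stand-in for semantic search, not a Cohere API.
    """
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(t)), t) for t in texts]
    # Keep only texts sharing at least one word, highest overlap first.
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Wrap each text in the {"data": {...}} shape used by the documents list.
    return [{"data": {"text": t}} for _, t in ranked[:top_k]]


knowledge_base = [
    "Reimbursing Travel Expenses: Submit travel expenses through our finance tool.",
    "Working from Abroad: Coordinate with your manager and stay available during core hours.",
    "Health and Wellness Benefits: We offer gym memberships, yoga classes, and health insurance.",
]

documents = retrieve("Are there health benefits?", knowledge_base, top_k=1)
print(documents[0]["data"]["text"])
```

Only the third text shares words with the query ("health" and "benefits"), so it is the single document returned. A production system would replace the overlap score with embedding similarity or a reranker.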

3. Response Generation

Next, call the Chat API, passing the documents in the documents parameter. This tells the model to run in RAG mode and use these documents as context for its response.

PYTHON
# Add the user query
query = "Are there health benefits?"

# Generate the response
response = co.chat(
    model="command-a-plus-05-2026",
    messages=[{"role": "user", "content": query}],
    documents=documents,
)

# Display the response
print(response.message.content[0].text)

Yes, there are health benefits. We offer gym memberships, on-site yoga classes, and comprehensive health insurance.
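
If the same call is made in several places, it can help to wrap it in a small function. The sketch below assumes only the call shape and response shape shown above. The helper ask and the FakeClient class are hypothetical: FakeClient is an offline stand-in so the example runs without network access, and it is not part of the Cohere SDK.

```python
from types import SimpleNamespace


def ask(client, query: str, documents: list[dict], model: str) -> str:
    """Send one user query with grounding documents; return the reply text."""
    response = client.chat(
        model=model,
        messages=[{"role": "user", "content": query}],
        documents=documents,
    )
    return response.message.content[0].text


class FakeClient:
    """Hypothetical offline stand-in that mimics only the response shape."""

    def chat(self, model, messages, documents):
        question = messages[-1]["content"]
        reply = f"{len(documents)} document(s) received for: {question}"
        content = [SimpleNamespace(text=reply)]
        return SimpleNamespace(message=SimpleNamespace(content=content))


docs = [{"data": {"text": "Health and Wellness Benefits: We offer gym memberships."}}]
print(ask(FakeClient(), "Are there health benefits?", docs, model="any-model-name"))
```

With a real client, pass the co object and the model name from the example above in place of FakeClient() and "any-model-name".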

4. Citation Generation

The response object's message contains a citations field, which lists the specific text spans from the documents on which the response is grounded.

PYTHON
if response.message.citations:
    for citation in response.message.citations:
        print(citation, "\n")

start=14 end=88 text='gym memberships, on-site yoga classes, and comprehensive health insurance.' document_ids=['doc_1']
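
Since each citation carries start and end values, one possible use is to mark the cited spans inline when displaying the response. The sketch below assumes the offsets index into the text being annotated and that spans do not overlap. The Span class and the mark_citations function are hypothetical stand-ins written for this example, and the sample text and offsets are made up.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for a citation: character offsets into a text."""

    start: int
    end: int


def mark_citations(text: str, spans: list[Span]) -> str:
    """Wrap each cited span in brackets and append a numeric marker."""
    numbered = list(enumerate(spans, start=1))
    # Insert from the last span to the first so earlier offsets stay valid.
    for number, span in sorted(numbered, key=lambda item: item[1].start, reverse=True):
        cited = text[span.start:span.end]
        text = f"{text[:span.start]}[{cited}][{number}]{text[span.end:]}"
    return text


answer = "Yes. We offer gym memberships and health insurance."
first = answer.index("gym memberships")
second = answer.index("health insurance")
spans = [
    Span(first, first + len("gym memberships")),
    Span(second, second + len("health insurance")),
]
print(mark_citations(answer, spans))
```

The function only reads the start and end attributes, so any objects that expose those two attributes, such as the citations printed above, could be passed in place of Span.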

Further Resources

  • Chat endpoint API reference
  • Documentation on RAG
  • LLM University module on RAG