Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) enables an LLM to ground its responses in external documents, improving response accuracy and reducing hallucinations.
The Chat endpoint comes with built-in RAG capabilities such as document grounding and citation generation.
This quickstart guide shows you how to perform RAG with the Chat endpoint.
Setup
First, install the Cohere Python SDK with the following command.
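The install command referenced above, assuming a pip-based environment:

```shell
pip install -U cohere
```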
Next, import the library and create a client.
The client can be created for any supported environment: the Cohere Platform, a private deployment, Bedrock, SageMaker, or Azure AI.
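A minimal sketch of client creation for the Cohere Platform; the API key string is a placeholder, and the other environments (private deployments, Bedrock, SageMaker, Azure AI) use their own client configuration:

```python
import cohere

# Create a client for the Cohere Platform.
# "YOUR_API_KEY" is a placeholder; use your own key in practice.
co = cohere.ClientV2(api_key="YOUR_API_KEY")
```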
Documents
First, define the documents that will be passed as the context for RAG. These documents are typically retrieved via semantic search from sources such as vector databases, or from any system that can retrieve unstructured data given a user query.
Each document can contain any number of fields, e.g. title, url, text, etc.
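As a sketch, the documents can be represented as a list of plain dictionaries. The field names here (title, url, text) follow the examples above; the titles, URLs, and snippet contents are placeholders, since in practice these would come from a retrieval system:

```python
# Placeholder documents; in a real application these would be
# retrieved from a source such as a vector database.
documents = [
    {
        "title": "Tall penguins",
        "url": "https://example.com/tall-penguins",
        "text": "Emperor penguins are the tallest penguin species.",
    },
    {
        "title": "Penguin habitats",
        "url": "https://example.com/penguin-habitats",
        "text": "Emperor penguins only live in Antarctica.",
    },
]
```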