Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is a method for generating text using additional information fetched from an external data source, which can greatly increase the accuracy of the response. When used in conjunction with Command family of models, the Chat API makes it easy to generate text that is grounded on supplementary documents, thus minimizing hallucinations.

A quick example

To call the Chat API with RAG, pass the following parameters as a minimum:

model for the model ID
messages for the user’s query.
documents for defining the documents to be used as the context for the response.

The code snippet below, for example, will produce a grounded answer to "Where do the tallest penguins live?", along with inline citations based on the provided documents.

Cohere platform

Private deployment

PYTHON

1 # ! pip install -U cohere
2 import cohere
3 
4 co = cohere.ClientV2(
5     "COHERE_API_KEY"
6 )  # Get your free API key here: https://dashboard.cohere.com/api-keys

PYTHON

1 # Retrieve the documents
2 documents = [
3     {
4         "data": {
5             "title": "Tall penguins",
6             "snippet": "Emperor penguins are the tallest.",
7         }
8     },
9     {
10         "data": {
11             "title": "Penguin habitats",
12             "snippet": "Emperor penguins only live in Antarctica.",
13         }
14     },
15     {
16         "data": {
17             "title": "What are animals?",
18             "snippet": "Animals are different from plants.",
19         }
20     },
21 ]
22 
23 # Add the user message
24 message = "Where do the tallest penguins live?"
25 
26 messages = [{"role": "user", "content": message}]
27 
28 response = co.chat(
29     model="command-a-03-2025",
30     messages=messages,
31     documents=documents,
32 )
33 
34 print(response.message.content[0].text)
35 
36 print(response.message.citations)

The resulting generation is"The tallest penguins are emperor penguins, which live in Antarctica.". The model was able to combine partial information from multiple sources and ignore irrelevant documents to arrive at the full answer.

Nice 🐧❄️!

Example response:

1 # response.message.content[0].text
2 Emperor penguins are the tallest penguins. They only live in Antarctica.
3 
4 # response.message.citations
5 [Citation(start=0,
6           end=16, 
7           text='Emperor penguins', 
8           sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})]), 
9 Citation(start=25, 
10           end=42, 
11           text='tallest penguins.', 
12           sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})]), 
13 Citation(start=61, 
14           end=72, 
15           text='Antarctica.',
16           sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})])]

The response also includes inline citations that reference the first two documents, since they hold the answers.

Read more about using and customizing RAG citations here

Three steps of RAG

The RAG workflow generally consists of 3 steps:

Generating search queries for finding relevant documents. What does the model recommend looking up before answering this question?
Fetching relevant documents from an external data source using the generated search queries. Performing a search to find some relevant information.
Generating a response with inline citations using the fetched documents. Generating a response using the fetched documents. This response will contain inline citations, which you can decide to leverage or ignore.

Example: Using RAG to identify the definitive 90s boy band

In this section, we will use the three step RAG workflow to finally settle the score between the notorious boy bands Backstreet Boys and NSYNC. We ask the model to provide an informed answer to the question "Who is more popular: Nsync or Backstreet Boys?"

Step 1: Generating search queries

First, the model needs to generate an optimal set of search queries to use for retrieval.

There are different possible approaches to do this. In this example, we’ll take a tool use approach.

Here, we build a tool that takes a user query and returns a list of relevant document snippets for that query. The tool can generate zero, one or multiple search queries depending on the user query.

PYTHON

1 message = "Who is more popular: Nsync or Backstreet Boys?"
2 
3 # Define the query generation tool
4 query_gen_tool = [
5     {
6         "type": "function",
7         "function": {
8             "name": "internet_search",
9             "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
10             "parameters": {
11                 "type": "object",
12                 "properties": {
13                     "queries": {
14                         "type": "array",
15                         "items": {"type": "string"},
16                         "description": "a list of queries to search the internet with.",
17                     }
18                 },
19                 "required": ["queries"],
20             },
21         },
22     }
23 ]
24 
25 # Define a system message to optimize search query generation
26 instructions = "Write a search query that will find helpful information for answering the user's question accurately. If you need more than one search query, write a list of search queries. If you decide that a search is very unlikely to find information that would be useful in constructing a response to the user, you should instead directly answer."
27 
28 # Generate search queries (if any)
29 import json
30 
31 search_queries = []
32 
33 res = co.chat(
34     model="command-a-03-2025",
35     messages=[
36         {"role": "system", "content": instructions},
37         {"role": "user", "content": message},
38     ],
39     tools=query_gen_tool,
40 )
41 
42 if res.message.tool_calls:
43     for tc in res.message.tool_calls:
44         queries = json.loads(tc.function.arguments)["queries"]
45         search_queries.extend(queries)
46 
47 print(search_queries)

# Sample response
['popularity of NSync', 'popularity of Backstreet Boys']

Indeed, to generate a factually accurate answer to the question “Who is more popular: Nsync or Backstreet Boys?”, looking up popularity of NSync and popularity of Backstreet Boys first would be helpful.

Customizing the generation of search queries

You can then update the system message and/or the tool definition to generate queries that are more relevant to your use case.

For example, you can update the system message to encourage a longer list of search queries to be generated.

PYTHON

1 instructions = "Write a search query that will find helpful information for answering the user's question accurately. If you need more than one search query, write a list of search queries. If you decide that a search is very unlikely to find information that would be useful in constructing a response to the user, you should instead directly answer."

Example response:

1 ['NSync popularity', 'Backstreet Boys popularity', 'NSync vs Backstreet Boys popularity comparison', 'Which boy band is more popular NSync or Backstreet Boys', 'NSync and Backstreet Boys fan base size comparison', 'Who has sold more albums NSync or Backstreet Boys', 'NSync and Backstreet Boys chart performance comparison']

Step 2: Fetching relevant documents

The next step is to fetch documents from the relevant data source using the generated search queries. For example, to answer the question about the two pop sensations NSYNC and Backstreet Boys, one might want to use an API from a web search engine, and fetch the contents of the websites listed at the top of the search results.

We won’t go into details of fetching data in this guide, since it’s very specific to the search API you’re querying. However we should mention that breaking up documents into small chunks of ±400 words will help you get the best performance from Cohere models.

Context length limit

When trying to stay within the context length limit, you might need to omit some of the documents from the request. To make sure that only the least relevant documents are omitted, we recommend using the Rerank endpoint endpoint which will sort the documents by relevancy to the query. The lowest ranked documents are the ones you should consider dropping first.

Formatting documents

The Chat endpoint supports a few different options for structuring documents in the documents parameter:

List of objects with data object: Each document is passed as a data object (with an optional id field to be used in citations). The data object is a string-any dictionary containing the document’s contents. For example, a web search document can contain a title, text, and url for the document’s title, text, and URL.
List of objects with data string: Each document is passed as a data string (with an optional id field to be used in citations).
List of strings: Each document is passed as a string.

The following examples demonstrate the options mentioned above.

'data' object

'data' string

string

PYTHON

1 documents = [
2     {
3         "data": {
4             "title": "Tall penguins",
5             "snippet": "Emperor penguins are the tallest.",
6         }
7     }
8 ]

The id field will be used in citation generation as the reference document IDs. If no id field is passed in an API call, the API will automatically generate the IDs based on the documents position in the list. For more information, see the guide on using custom IDs.

Step 3: Generating a response with citations

In the final step, we will be calling the Chat API again, but this time passing along the documents you acquired in Step 2. We recommend using a few descriptive keys such as "title", "snippet", or "last updated" and only including semantically relevant data. The keys and the values will be formatted into the prompt and passed to the model.

PYTHON

1 documents = [
2     {
3         "data": {
4             "title": "CSPC: Backstreet Boys Popularity Analysis - ChartMasters",
5             "snippet": "↓ Skip to Main Content\n\nMusic industry – One step closer to being accurate\n\nCSPC: Backstreet Boys Popularity Analysis\n\nHernán Lopez Posted on February 9, 2017 Posted in CSPC 72 Comments Tagged with Backstreet Boys, Boy band\n\nAt one point, Backstreet Boys defined success: massive albums sales across the globe, great singles sales, plenty of chart topping releases, hugely hyped tours and tremendous media coverage.\n\nIt is true that they benefited from extraordinarily good market conditions in all markets. After all, the all-time record year for the music business, as far as revenues in billion dollars are concerned, was actually 1999. That is, back when this five men group was at its peak.",
6         }
7     },
8     {
9         "data": {
10             "title": "CSPC: NSYNC Popularity Analysis - ChartMasters",
11             "snippet": "↓ Skip to Main Content\n\nMusic industry – One step closer to being accurate\n\nCSPC: NSYNC Popularity Analysis\n\nMJD Posted on February 9, 2018 Posted in CSPC 27 Comments Tagged with Boy band, N'Sync\n\nAt the turn of the millennium three teen acts were huge in the US, the Backstreet Boys, Britney Spears and NSYNC. The latter is the only one we haven’t study so far. It took 15 years and Adele to break their record of 2,4 million units sold of No Strings Attached in its first week alone.\n\nIt wasn’t a fluke, as the second fastest selling album of the Soundscan era prior 2015, was also theirs since Celebrity debuted with 1,88 million units sold.",
12         }
13     },
14     {
15         "data": {
16             "title": "CSPC: Backstreet Boys Popularity Analysis - ChartMasters",
17             "snippet": " 1997, 1998, 2000 and 2001 also rank amongst some of the very best years.\n\nYet the way many music consumers – especially teenagers and young women’s – embraced their output deserves its own chapter. If Jonas Brothers and more recently One Direction reached a great level of popularity during the past decade, the type of success achieved by Backstreet Boys is in a completely different level as they really dominated the business for a few years all over the world, including in some countries that were traditionally hard to penetrate for Western artists.\n\nWe will try to analyze the extent of that hegemony with this new article with final results which will more than surprise many readers.",
18         }
19     },
20     {
21         "data": {
22             "title": "CSPC: NSYNC Popularity Analysis - ChartMasters",
23             "snippet": " Was the teen group led by Justin Timberlake really that big? Was it only in the US where they found success? Or were they a global phenomenon?\n\nAs usual, I’ll be using the Commensurate Sales to Popularity Concept in order to relevantly gauge their results. This concept will not only bring you sales information for all NSYNC‘s albums, physical and download singles, as well as audio and video streaming, but it will also determine their true popularity. If you are not yet familiar with the CSPC method, the next page explains it with a short video. I fully recommend watching the video before getting into the sales figures.",
24         }
25     },
26 ]
27 
28 # Add the user message
29 message = "Who is more popular: Nsync or Backstreet Boys?"
30 messages = [{"role": "user", "content": message}]
31 
32 response = co.chat(
33     model="command-a-03-2025",
34     messages=messages,
35     documents=documents,
36 )
37 
38 print(response.message.content[0].text)

Example response:

1 Both NSYNC and Backstreet Boys were huge in the US at the turn of the millennium. However, Backstreet Boys achieved a greater level of success than NSYNC. They dominated the music business for a few years all over the world, including in some countries that were traditionally hard to penetrate for Western artists. Their success included massive album sales across the globe, great singles sales, plenty of chart-topping releases, hugely hyped tours and tremendous media coverage.

In this RAG setting, Cohere models are trained to generate fine-grained citations, out-of-the-box, alongside their text output. Here, we see a sample list of citations, one for each specific span in its response, where it uses the document(s) to answer the question.

For a deeper dive into the citations feature, see the RAG citations guide.

PYTHON

1 print(response.message.citations)

Example response:

1 # (truncated for brevity)
2 [Citation(start=36, 
3           end=81, 
4           text='huge in the US at the turn of the millennium.', 
5           sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': "↓ Skip to Main Content\n\nMusic industry – One step closer ...", 'title': 'CSPC: NSYNC Popularity Analysis - ChartMasters'})]),
6 Citation(start=107, 
7           end=154, 
8           text='achieved a greater level of success than NSYNC.', 
9           sources=[DocumentSource(type='document', id='doc:2', document={'id': 'doc:2', 'snippet': ' 1997, 1998, 2000 and 2001 also rank amongst some of the very best ...', 'title': 'CSPC: Backstreet Boys Popularity Analysis - ChartMasters'})]),
10 Citation(start=160, 
11         end=223,
12         ...
13 ...]

Not only will we discover that the Backstreet Boys were the more popular band, but the model can also Tell Me Why, by providing details supported by citations.

For a more in-depth RAG example that leverages the Embed and Rerank endpoints for retrieval, see End-to-end example of RAG with Chat, Embed, and Rerank.

Caveats

It’s worth underscoring that RAG does not guarantee accuracy. It involves giving a model context which informs its replies, but if the provided documents are themselves out-of-date, inaccurate, or biased, whatever the model generates might be as well. What’s more, RAG doesn’t guarantee that a model won’t hallucinate. It greatly reduces the risk, but doesn’t necessarily eliminate it altogether. This is why we put an emphasis on including inline citations, which allow users to verify the information.