Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is a method for generating text using additional information fetched from an external data source, which can greatly increase the accuracy of the response. When used in conjunction with a Command A or Command R7B, the Chat API makes it easy to generate text that is grounded on supplementary information.

The code snippet below, for example, will produce a grounded answer to "Where do the tallest penguins live?", along with inline citations based on the provided documents.

Request

PYTHON

1 import cohere
2 
3 co = cohere.Client(api_key="<YOUR API KEY>")
4 
5 co.chat(
6     model="command-a-03-2025",
7     message="Where do the tallest penguins live?",
8     documents=[
9         {
10             "title": "Tall penguins",
11             "snippet": "Emperor penguins are the tallest.",
12         },
13         {
14             "title": "Penguin habitats",
15             "snippet": "Emperor penguins only live in Antarctica.",
16         },
17         {
18             "title": "What are animals?",
19             "snippet": "Animals are different from plants.",
20         },
21     ],
22 )

Response

JSON

1 {
2   "text": "The tallest penguins, Emperor penguins, live in Antarctica.",  
3   "citations": [  
4     {"start": 22, "end": 38, "text": "Emperor penguins", "document_ids": ["doc_0"]},  
5     {"start": 48, "end": 59, "text": "Antarctica.", "document_ids": ["doc_1"]}  
6   ],  
7   "documents": [  
8     {"id": "doc_0", "title": "Tall penguins", "snippet": "Emperor penguins are the tallest.", "url": ""},  
9     {"id": "doc_1", "title": "Penguin habitats", "snippet": "Emperor penguins only live in Antarctica.", "url": ""}  
10   ] 
11 }

The resulting generation is"The tallest penguins, Emperor penguins, live in Antarctica". The model was able to combine partial information from multiple sources and ignore irrelevant documents to arrive at the full answer.

Nice 🐧❄️!

The response also includes inline citations that reference the first two documents, since they hold the answers.

As you will learn in the following section, the Chat API will also assist you with generating search queries to fetch documents from a data source.

You can find more code and context in this colab notebook.

Three steps of RAG

The RAG workflow generally consists of 3 steps:

Generating search queries for finding relevant documents. What does the model recommend looking up before answering this question?
Fetching relevant documents from an external data source using the generated search queries. Performing a search to find some relevant information.
Generating a response with inline citations using the fetched documents. Using the acquired knowledge to produce an educated answer.

Example: Using RAG to identify the definitive 90s boy band

In this section, we will use the three step RAG workflow to finally settle the score between the notorious boy bands Backstreet Boys and NSYNC. We ask the model to provide an informed answer to the question "Who is more popular: Nsync or Backstreet Boys?"

Step 1: Generating search queries

Option 1: Using the `search_queries_only` parameter

Calling the Chat API with the search_queries_only parameter set to True will return a list of search queries. In the example below, we ask the model to suggest some search queries that would be useful when answering the question.

Request

PYTHON

1 import cohere
2 
3 co = cohere.Client(api_key="<YOUR API KEY>")
4 
5 co.chat(
6     model="command-a-03-2025",
7     message="Who is more popular: Nsync or Backstreet Boys?",
8     search_queries_only=True,
9 )

Response

JSON

1 {
2   "is_search_required": true,
3   "search_queries": [
4     {"text": "Nsync popularity", "generation_id": "b560dd68-743e-4c32-98a2-a9b7e3e96861"},
5     {"text": "Backstreet Boys popularity", "generation_id": "b560dd68-743e-4c32-98a2-a9b7e3e96861"}
6   ]
7 }

Indeed, to generate a factually accurate answer to the question "Who is more popular: Nsync or Backstreet Boys?", looking up Nsync popularity and Backstreet Boys popularity first would be helpful.

Option 2: Using a tool

If you are looking for greater control over how search queries are generated, you can use Cohere’s Tools capabilities to generate search queries

Here, we build a tool that takes a user query and returns a list of relevant document snippets for that query. The tool can generate zero, one or multiple search queries depending on the user query.

PYTHON

1 query_gen_tool = [
2     {
3         "name": "internet_search",
4         "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
5         "parameter_definitions": {
6             "queries": {
7                 "description": "a list of queries to search the internet with.",
8                 "type": "List[str]",
9                 "required": True,
10             }
11         },
12     }
13 ]
14 
15 instructions = "Write a search query that will find helpful information for answering the user's question accurately. If you need more than one search query, write a list of search queries. If you decide that a search is very unlikely to find information that would be useful in constructing a response to the user, you should instead directly answer."
16 
17 response = co.chat(
18     preamble=instructions,
19     model="command-a-03-2025",
20     message="Who is more popular: Nsync or Backstreet Boys?",
21     force_single_step=True,
22     tools=query_gen_tool,
23 )
24 
25 search_queries = []
26 
27 if response.tool_calls:
28     search_queries = response.tool_calls[0].parameters["queries"]
29 
30 print(search_queries)

# Sample response
['popularity of NSync', 'popularity of Backstreet Boys']

You can then customize the preamble and/or the tool definition to generate queries that are more relevant to your use case.

For example, you can customize the preamble to encourage a longer list of search queries to be generated.

PYTHON

1 instructions_verbose = "Write many search queries that will find helpful information for answering the user's question accurately. Always write a very long list of at least 7 search queries. If you decide that a search is very unlikely to find information that would be useful in constructing a response to the user, you should instead directly answer."

# Sample response
['NSync popularity', 'Backstreet Boys popularity', 'NSync vs Backstreet Boys popularity comparison', 'Which boy band is more popular NSync or Backstreet Boys', 'NSync and Backstreet Boys fan base size comparison', 'Who has sold more albums NSync or Backstreet Boys', 'NSync and Backstreet Boys chart performance comparison']

Step 2: Fetching relevant documents

The next step is to fetch documents from the relevant data source using the generated search queries. For example, to answer the question about the two pop sensations NSYNC and Backstreet Boys, one might want to use an API from a web search engine, and fetch the contents of the websites listed at the top of the search results.

We won’t go into details of fetching data in this guide, since it’s very specific to the search API you’re querying. However we should mention that breaking up long documents into smaller ones first (1-2 paragraphs) will help you not go over the context limit. When trying to stay within the context length limit, you might need to omit some of the documents from the request. To make sure that only the least relevant documents are omitted, we recommend using the Rerank endpoint endpoint which will sort the documents by relevancy to the query. The lowest ranked documents are the ones you should consider dropping first.

Step 3: Generating a response

In the final step, we will be calling the Chat API again, but this time passing along the documents you acquired in Step 2. A document object is a dictionary containing the content and the metadata of the text. We recommend using a few descriptive keys such as "title", "snippet", or "last updated" and only including semantically relevant data. The keys and the values will be formatted into the prompt and passed to the model.

Request

import cohere
co = cohere.Client(api_key="<YOUR API KEY>")
co.chat(
  model="command-a-03-2025",
  message="Who is more popular: Nsync or Backstreet Boys?",
  documents=[
    {
      "title": "CSPC: Backstreet Boys Popularity Analysis - ChartMasters",
      "snippet": "↓ Skip to Main Content\n\nMusic industry – One step closer to being accurate\n\nCSPC: Backstreet Boys Popularity Analysis\n\nHernán Lopez Posted on February 9, 2017 Posted in CSPC 72 Comments Tagged with Backstreet Boys, Boy band\n\nAt one point, Backstreet Boys defined success: massive albums sales across the globe, great singles sales, plenty of chart topping releases, hugely hyped tours and tremendous media coverage.\n\nIt is true that they benefited from extraordinarily good market conditions in all markets. After all, the all-time record year for the music business, as far as revenues in billion dollars are concerned, was actually 1999. That is, back when this five men group was at its peak."
    },
    {
      "title": "CSPC: NSYNC Popularity Analysis - ChartMasters",
      "snippet": "↓ Skip to Main Content\n\nMusic industry – One step closer to being accurate\n\nCSPC: NSYNC Popularity Analysis\n\nMJD Posted on February 9, 2018 Posted in CSPC 27 Comments Tagged with Boy band, N'Sync\n\nAt the turn of the millennium three teen acts were huge in the US, the Backstreet Boys, Britney Spears and NSYNC. The latter is the only one we haven’t study so far. It took 15 years and Adele to break their record of 2,4 million units sold of No Strings Attached in its first week alone.\n\nIt wasn’t a fluke, as the second fastest selling album of the Soundscan era prior 2015, was also theirs since Celebrity debuted with 1,88 million units sold."
    },
    {
      "title": "CSPC: Backstreet Boys Popularity Analysis - ChartMasters",
      "snippet": " 1997, 1998, 2000 and 2001 also rank amongst some of the very best years.\n\nYet the way many music consumers – especially teenagers and young women’s – embraced their output deserves its own chapter. If Jonas Brothers and more recently One Direction reached a great level of popularity during the past decade, the type of success achieved by Backstreet Boys is in a completely different level as they really dominated the business for a few years all over the world, including in some countries that were traditionally hard to penetrate for Western artists.\n\nWe will try to analyze the extent of that hegemony with this new article with final results which will more than surprise many readers."
    },
    {
      "title": "CSPC: NSYNC Popularity Analysis - ChartMasters",
      "snippet": " Was the teen group led by Justin Timberlake really that big? Was it only in the US where they found success? Or were they a global phenomenon?\n\nAs usual, I’ll be using the Commensurate Sales to Popularity Concept in order to relevantly gauge their results. This concept will not only bring you sales information for all NSYNC‘s albums, physical and download singles, as well as audio and video streaming, but it will also determine their true popularity. If you are not yet familiar with the CSPC method, the next page explains it with a short video. I fully recommend watching the video before getting into the sales figures."
    }
  ])

Response

JSON

1 {
2   "text": "Both the Backstreet Boys and *NSYNC enjoyed immense popularity during the late 1990s and early 2000s. \n\nThe Backstreet Boys, with massive album sales, chart-topping releases and highly successful tours, dominated the music industry for several years worldwide. They sold millions of copies of their albums No Strings Attached and Celebrity, breaking even the sales records of Adele and ranking as the second-fastest-selling album of the Soundscan era before 2015. \n\n*NSYNC, led by Justin Timberlake, also achieved tremendous success, selling millions of copies of their albums and finding popularity not just in the US but globally. \n\nHowever, when comparing the two groups, the extent of the Backstreet Boys' success puts them at a significantly higher level. They dominated the industry for several years, achieving unparalleled success in traditionally non-Western markets. Therefore, I can conclude that the Backstreet Boys are the more popular group.",
3   "citations": [
4     { "start": 44, "end": 101, "text": "immense popularity during the late 1990s and early 2000s.", "document_ids": ["doc_0", "doc_1", "doc_2"]},
5     { "start": 130, "end": 201, "text": "massive album sales, chart-topping releases and highly successful tours", "document_ids": ["doc_0"]},
6     // ...
7   ],
8   "documents": [
9     { "id": "doc_0", "snippet": "↓ Skip to Main Content\n\nMusic industry – One step closer to being accurate\n\nCSPC: Backstreet Boys Popularity Analysis\n\nHernán Lopez Posted on February 9, 2017 Posted in CSPC 72 Comments Tagged with Backstreet Boys, Boy band\n\nAt one point, Backstreet Boys defined success: massive albums sales across the globe, great singles sales, plenty of chart topping releases, hugely hyped tours and tremendous media coverage.\n\nIt is true that they benefited from extraordinarily good market conditions in all markets. After all, the all-time record year for the music business, as far as revenues in billion dollars are concerned, was actually 1999. That is, back when this five men group was at its peak.", "title": "CSPC: Backstreet Boys Popularity Analysis - ChartMasters"},
10     { "id": "doc_1", "snippet": "↓ Skip to Main Content\n\nMusic industry – One step closer to being accurate\n\nCSPC: NSYNC Popularity Analysis\n\nMJD Posted on February 9, 2018 Posted in CSPC 27 Comments Tagged with Boy band, N'Sync\n\nAt the turn of the millennium three teen acts were huge in the US, the Backstreet Boys, Britney Spears and NSYNC. The latter is the only one we haven’t study so far. It took 15 years and Adele to break their record of 2,4 million units sold of No Strings Attached in its first week alone.\n\nIt wasn’t a fluke, as the second fastest selling album of the Soundscan era prior 2015, was also theirs since Celebrity debuted with 1,88 million units sold.", "title": "CSPC: NSYNC Popularity Analysis - ChartMasters"},
11     //...
12   ]
13 }

Not only will we discover that the Backstreet Boys were the more popular band, but the model can also Tell Me Why, by providing details supported by citations.

Connectors

As an alternative to manually implementing the 3 step workflow, the Chat API offers a 1-line solution for RAG using Connectors. In the example below, specifying the web-search connector will generate search queries, use them to conduct an internet search and use the results to inform the model and produce an answer.

Request

PYTHON

1 import cohere
2 
3 co = cohere.Client(api_key="<YOUR API KEY>")
4 
5 co.chat(
6     model="command-a-03-2025",
7     message="Who is more popular: Nsync or Backstreet Boys?",
8     connectors=[{"id": "web-search"}],
9 )

Response

JSON

1 {
2   "text": "The Backstreet Boys have sold over 100 million records worldwide, making them the best-selling boy band of all time, and one of the world's best-selling music artists. They are the only boy band to have their first ten albums reach the top 10 on the Billboard 200. \n\n'NSYNC sold over 2.4 million copies in the United States during its first week of release, setting the record as the first album to have sold more than two million copies in a single week since the chart adopted Nielsen SoundScan data in May 1991. Their best-selling album, No Strings Attached sold over 11 million copies in the US alone. \n\nIt's clear that both bands have enjoyed enormous commercial success, however, based on the available sales data, The Backstreet Boys take the crown as the better-selling band of the two.",
3   "search_queries": [
4     {"text": "Nsync popularity", "generation_id": "d9c91674-a27e-493e-954c-7393dc896c5d"},
5     {"text": "Backstreet Boys popularity", "generation_id": "d9c91674-a27e-493e-954c-7393dc896c5d"}
6   ],
7   "search_results": [
8     {"search_query": {"text": "Nsync popularity", "generation_id": "d9c91674-a27e-493e-954c-7393dc896c5d"}, "document_ids": ["web-search_9:25", "web-search_9:26", "web-search_9:27"], "connector": {"id": "web-search"}},
9     {"search_query": {"text": "Backstreet Boys popularity", "generation_id": "d9c91674-a27e-493e-954c-7393dc896c5d"}, "document_ids": ["web-search_10:0", "web-search_11:0", "web-search_13:3", "web-search_18:2"], "connector": {"id": "web-search"}}
10   ],
11   "citations": [
12     {"start": 4, "end": 19, "text": "Backstreet Boys", "document_ids": ["web-search_13:3", "web-search_11:0", "web-search_18:2"]},
13     {"start": 35, "end": 64, "text": "100 million records worldwide", "document_ids": ["web-search_10:0", "web-search_13:3", "web-search_11:0", "web-search_18:2"]},
14     //...
15   ],
16   "documents": [
17     {"id": "web-search_13:3", "snippet": " The Backstreet Boys have sold over 150 million records worldwide, making them the best-selling boy band of all time, and one of the world's best-selling music artists. They are the first group since Led Zeppelin to have their first ten albums reach the top 10 on the Billboard 200, and the only boy band to do so. The albums Backstreet Boys and Millennium were both certified diamond by the Recording Industry Association of America (RIAA), making them one of the few bands to have multiple diamond albums. The group received a star on the Hollywood Walk of Fame on April 22, 2013. They also released their first documentary movie, titled Backstreet Boys: Show 'Em What You're Made Of in January 2015. In March 2017, the group began a residency in Las Vegas that lasted two years, titled Backstreet Boys: Larger Than Life and ended up being the fastest selling residency in Vegas history.", "title": "Backstreet Boys Biography, Discography, Chart History @ Top40-Charts.com - New Songs & Videos from 49 Top 20 & Top 40 Music Charts from 30 Countries", "url": "https://top40-charts.com/artist.php?aid=137"},
18     {"id": "web-search_11:0", "snippet": "The discography of American pop vocal group Backstreet Boys consists of ten studio albums, 31 singles, one live album, three compilation albums and 33 music videos. As of 2019, they have sold more than 130 million records worldwide, becoming the best-selling boy band of all time. Formed in Orlando, Florida in 1993, the group consists of Nick Carter, Brian Littrell, Kevin Richardson, A.J. McLean and Howie Dorough. Richardson left the group in 2006 to pursue other interests, but rejoined in 2012. The Backstreet Boys released their debut single \"We've Got It Goin' On\" in 1995, which peaked at number sixty-nine on the Billboard Hot 100. The single, however, entered the top ten in many European countries.", "title": "Backstreet Boys discography - Wikipedia", "url": "https://en.wikipedia.org/wiki/Backstreet_Boys_discography"},
19     //...
20   ]
21 }

In addition to the Cohere provided web-search Connector, you can register your own custom Connectors such as Google Drive, Confluence etc.

Prompt Truncation

LLMs come with limitations; specifically, they can only handle so much text as input. This means that you will often need to figure out which document sections and chat history elements to keep, and which ones to omit.

For more information, check out our dedicated doc on prompt truncation.

Citation modes

When using Retrieval Augmented Generation (RAG) in streaming mode, it’s possible to configure how citations are generated and presented. You can choose between fast citations or accurate citations, depending on your latency and precision needs:

Accurate citations: The model produces its answer first, and then, after the entire response is generated, it provides citations that map to specific segments of the response text. This approach may incur slightly higher latency, but it ensures the citation indices are more precisely aligned with the final text segments of the model’s answer. This is the default option, though you can explicitly specify it by adding the citation_quality="accurate" argument in the API call.
Fast citations: The model generates citations inline, as the response is being produced. In streaming mode, you will see citations injected at the exact moment the model uses a particular piece of external context. This approach provides immediate traceability at the expense of slightly less precision in citation relevance. You can specify it by adding the citation_quality="fast" argument in the API call.

Below are example code snippets demonstrating both approaches.

Accurate citations

PYTHON

1 documents = [
2     {
3         "text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."
4     },
5     {
6         "text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."
7     },
8 ]
9 
10 message = "Are there fitness-related benefits?"
11 
12 response = co.chat_stream(
13     model="command-a-03-2025",
14     message=message,
15     documents=documents,
16     citation_quality="accurate",
17 )
18 
19 for chunk in response:
20     if chunk.event_type == "text-generation":
21         print(chunk.text, end="")
22     if chunk.event_type == "citation-generation":
23         for citation in chunk.citations:
24             print("", citation.document_ids, end="")

Example response:

1 Yes, we offer gym memberships, on-site yoga classes, and comprehensive health insurance. ['doc_1']

Fast citations

PYTHON

1 documents = [
2     {
3         "text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."
4     },
5     {
6         "text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."
7     },
8 ]
9 
10 message = "Are there fitness-related benefits?"
11 
12 response = co.chat_stream(
13     model="command-a-03-2025",
14     message=message,
15     documents=documents,
16     citation_quality="fast",
17 )
18 
19 for chunk in response:
20     if chunk.event_type == "text-generation":
21         print(chunk.text, end="")
22     if chunk.event_type == "citation-generation":
23         for citation in chunk.citations:
24             print("", citation.document_ids, end="")

Example response:

1 Yes, we offer gym memberships, ['doc_1'] on-site yoga classes, ['doc_1'] and comprehensive health insurance. ['doc_1']

Caveats

It’s worth underscoring that RAG does not guarantee accuracy. It involves giving a model context which informs its replies, but if the provided documents are themselves out-of-date, inaccurate, or biased, whatever the model generates might be as well. What’s more, RAG doesn’t guarantee that a model won’t hallucinate. It greatly reduces the risk, but doesn’t necessarily eliminate it altogether. This is why we put an emphasis on including inline citations, which allow users to verify the information.

Three steps of RAG

Example: Using RAG to identify the definitive 90s boy band

Step 1: Generating search queries

Option 1: Using the search_queries_only parameter

Option 2: Using a tool

Step 2: Fetching relevant documents

Step 3: Generating a response

Connectors

Prompt Truncation

Citation modes

Accurate citations

Fast citations

Caveats

Option 1: Using the `search_queries_only` parameter