Accessing citations

The Chat endpoint generates fine-grained citations for its RAG response. This capability is included out-of-the-box with the Command family of models.

The following sections describe how to access the citations in both the non-streaming and streaming modes.

Non-streaming

First, define the documents to be passed as the context of the model’s response.

PYTHON
# ! pip install -U cohere
import cohere

co = cohere.ClientV2(
    "COHERE_API_KEY"
)  # Get your free API key here: https://dashboard.cohere.com/api-keys
PYTHON
documents = [
    {
        "data": {
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest.",
        }
    },
    {
        "data": {
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica.",
        }
    },
]

In the non-streaming mode (using chat to generate the model response), the citations are provided in the message.citations field of the response object.

Each citation object contains:

  • start and end: the start and end indices of the span of text that cites one or more sources
  • text: the cited span of text itself
  • sources: the source or sources it references
PYTHON
messages = [
    {"role": "user", "content": "Where do the tallest penguins live?"}
]

response = co.chat(
    model="command-a-03-2025",
    messages=messages,
    documents=documents,
)

print(response.message.content[0].text)

for citation in response.message.citations:
    print(citation, "\n")

Example response:

The tallest penguins are the Emperor penguins. They only live in Antarctica.

start=29 end=46 text='Emperor penguins.' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type='TEXT_CONTENT'

start=65 end=76 text='Antarctica.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type='TEXT_CONTENT'
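
Because start and end are character offsets into the response text, you can use them to splice source markers directly into the final answer. The sketch below is illustrative rather than part of the SDK: it works on plain dicts standing in for the citation objects, with the source IDs pre-extracted as strings.

```python
def annotate(text, citations):
    # Insert "[source ids]" after each cited span. Working backwards
    # from the last span keeps earlier offsets valid as the string grows.
    for c in sorted(citations, key=lambda c: c["end"], reverse=True):
        marker = "[" + ", ".join(c["sources"]) + "]"
        text = text[: c["end"]] + marker + text[c["end"] :]
    return text


# Values taken from the example response above.
response_text = "The tallest penguins are the Emperor penguins. They only live in Antarctica."
citations = [
    {"start": 29, "end": 46, "sources": ["doc:0"]},
    {"start": 65, "end": 76, "sources": ["doc:1"]},
]

print(annotate(response_text, citations))
# The tallest penguins are the Emperor penguins.[doc:0] They only live in Antarctica.[doc:1]
```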

Streaming

In a streaming scenario (using chat_stream to generate the model response), the citations are provided in the citation-start events.

Each citation object contains the same fields as the non-streaming scenario.

PYTHON
messages = [
    {"role": "user", "content": "Where do the tallest penguins live?"}
]

response = co.chat_stream(
    model="command-a-03-2025",
    messages=messages,
    documents=documents,
)

response_text = ""
citations = []
for chunk in response:
    if chunk:
        if chunk.type == "content-delta":
            response_text += chunk.delta.message.content.text
            print(chunk.delta.message.content.text, end="")
        if chunk.type == "citation-start":
            citations.append(chunk.delta.message.citations)

for citation in citations:
    print(citation, "\n")

Example response:

The tallest penguins are the Emperor penguins, which only live in Antarctica.

start=29 end=45 text='Emperor penguins' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type='TEXT_CONTENT'

start=66 end=77 text='Antarctica.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type='TEXT_CONTENT'

Document ID

When passing the documents as context, you can optionally add custom IDs to the id field in the document object. These IDs will be used by the endpoint as the citation reference.

If you don’t provide the id field, the ID will be auto-generated in the format doc:<auto_generated_id>. Example: doc:0.

Here is an example of using custom IDs, where we add the IDs 100 and 101 to the two documents passed as context.

PYTHON
# ! pip install -U cohere
import cohere

co = cohere.ClientV2(
    "COHERE_API_KEY"
)  # Get your free API key here: https://dashboard.cohere.com/api-keys

documents = [
    {
        "data": {
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest.",
        },
        "id": "100",
    },
    {
        "data": {
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica.",
        },
        "id": "101",
    },
]

When document IDs are provided, the citation will refer to the documents using these IDs.

PYTHON
messages = [
    {"role": "user", "content": "Where do the tallest penguins live?"}
]

response = co.chat(
    model="command-a-03-2025",
    messages=messages,
    documents=documents,
)

print(response.message.content[0].text)

Note the id fields in the citations, which refer to the IDs in the document object.

Example response:

The tallest penguins are the Emperor penguins, which only live in Antarctica.

start=29 end=45 text='Emperor penguins' sources=[DocumentSource(type='document', id='100', document={'id': '100', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type='TEXT_CONTENT'

start=66 end=77 text='Antarctica.' sources=[DocumentSource(type='document', id='101', document={'id': '101', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type='TEXT_CONTENT'

In contrast, here’s an example citation when the IDs are not provided.

Example response:

The tallest penguins are the Emperor penguins, which only live in Antarctica.

start=29 end=45 text='Emperor penguins' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type='TEXT_CONTENT'

start=66 end=77 text='Antarctica.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type='TEXT_CONTENT'
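
Whichever form the IDs take, the id on each citation source can be mapped back to the original document, for example to display the cited title alongside the answer. A minimal sketch using plain dicts; the by_id lookup is our own helper, not an SDK feature, and it falls back to the auto-generated doc:<index> form when no custom id is supplied.

```python
documents = [
    {
        "data": {
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest.",
        },
        "id": "100",
    },
    {
        "data": {
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica.",
        },
        # No "id" field: the endpoint would assign doc:1 here.
    },
]

# Map citation source IDs back to document data, mirroring the
# endpoint's doc:<index> default for documents without an id.
by_id = {
    doc.get("id", f"doc:{i}"): doc["data"]
    for i, doc in enumerate(documents)
}

print(by_id["100"]["title"])  # Tall penguins
print(by_id["doc:1"]["title"])  # Penguin habitats
```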

Citation modes

When running RAG in streaming mode, it’s possible to configure how citations are generated and presented. You can choose between fast citations or accurate citations, depending on your latency and precision needs.

Accurate citations

The model produces its answer first, and then, after the entire response is generated, it provides citations that map to specific segments of the response text. This approach may incur slightly higher latency, but it ensures the citation indices are more precisely aligned with the final text segments of the model’s answer.

This is the default option, or you can explicitly specify it by adding the citation_options={"mode": "accurate"} argument in the API call.

Here is an example using the same list of pre-defined messages as above.

With the citation_options mode set to accurate, we get the citations after the entire response is generated.

PYTHON
documents = [
    {
        "data": {
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest.",
        },
        "id": "100",
    },
    {
        "data": {
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica.",
        },
        "id": "101",
    },
]

messages = [
    {"role": "user", "content": "Where do the tallest penguins live?"}
]

response = co.chat_stream(
    model="command-a-03-2025",
    messages=messages,
    documents=documents,
    citation_options={"mode": "accurate"},
)

response_text = ""
citations = []
for chunk in response:
    if chunk:
        if chunk.type == "content-delta":
            response_text += chunk.delta.message.content.text
            print(chunk.delta.message.content.text, end="")
        if chunk.type == "citation-start":
            citations.append(chunk.delta.message.citations)

print("\n")
for citation in citations:
    print(citation, "\n")

Example response:

The tallest penguins are the Emperor penguins. They live in Antarctica.

start=29 end=46 text='Emperor penguins.' sources=[DocumentSource(type='document', id='100', document={'id': '100', 'snippet': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type='TEXT_CONTENT'

start=60 end=71 text='Antarctica.' sources=[DocumentSource(type='document', id='101', document={'id': '101', 'snippet': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type='TEXT_CONTENT'

Fast citations

The model generates citations inline, as the response is being produced. In streaming mode, you will see citations injected at the exact moment the model uses a particular piece of external context. This approach provides immediate traceability at the expense of slightly less precision in citation relevance.

You can specify it by adding the citation_options={"mode": "fast"} argument in the API call.

With the citation_options mode set to fast, we get the citations inline as the model generates the response.

PYTHON
documents = [
    {
        "data": {
            "title": "Tall penguins",
            "snippet": "Emperor penguins are the tallest.",
        },
        "id": "100",
    },
    {
        "data": {
            "title": "Penguin habitats",
            "snippet": "Emperor penguins only live in Antarctica.",
        },
        "id": "101",
    },
]

messages = [
    {"role": "user", "content": "Where do the tallest penguins live?"}
]

response = co.chat_stream(
    model="command-a-03-2025",
    messages=messages,
    documents=documents,
    citation_options={"mode": "fast"},
)

response_text = ""
for chunk in response:
    if chunk:
        if chunk.type == "content-delta":
            response_text += chunk.delta.message.content.text
            print(chunk.delta.message.content.text, end="")
        if chunk.type == "citation-start":
            print(
                f" [{chunk.delta.message.citations.sources[0].id}]",
                end="",
            )

Example response:

The tallest penguins [100] are the Emperor penguins [100] which only live in Antarctica. [101]