Documents and Citations

With retrieval augmented generation (RAG), it’s possible to feed the model context to ground its replies. Large language models are often quite good at generating sensible output on their own, but they’re well-known to hallucinate factually incorrect, nonsensical, or incomplete information in their replies, which can be problematic for certain use cases.

RAG substantially reduces this problem by giving the model source material to work with. Rather than simply generating an output based on the input prompt, the model can pull information out of this material and incorporate it into its reply.

Here’s an example of using RAG with the Chat endpoint. We’re asking the co.chat() about penguins, and uploading documents for it to use:

PYTHON
1import cohere
2co = cohere.ClientV2(api_key="<YOUR API KEY>")
3
4# Retrieve the documents
5documents = [
6 {
7 "data": {
8 "title": "Tall penguins",
9 "snippet": "Emperor penguins are the tallest."
10 }
11 },
12 {
13 "data": {
14 "title": "Penguin habitats",
15 "snippet": "Emperor penguins only live in Antarctica."
16 }
17 },
18 {
19 "data": {
20 "title": "What are animals?",
21 "snippet": "Animals are different from plants."
22 }
23 }
24]
25
26messages = [{'role': 'user', 'content': "Where do the tallest penguins live?"}]
27
28response = co.chat(
29 model="command-r-plus-08-2024",
30 documents=documents,
31 messages=messages)

Here’s an example reply:

# response.message.content
[AssistantMessageResponseContentItem_Text(text='The tallest penguins are the Emperor penguins. They only live in Antarctica.', type='text')]
# response.message.citations
[Citation(start=29,
end=46,
text='Emperor penguins.',
sources=[Source_Document(id='doc:0:0',
document={'id': 'doc:0:0',
'snippet': 'Emperor penguins are the tallest.',
'title': 'Tall penguins'},
type='document')]),
Citation(start=65,
end=76,
text='Antarctica.',
sources=[Source_Document(id='doc:0:1',
document={'id': 'doc:0:1',
'snippet': 'Emperor penguins only live in Antarctica.',
'title': 'Penguin habitats'},
type='document')])]

Observe that the payload includes a list of documents with a “snippet” field containing the information we want the model to use. The recommended length for the snippet of each document is relatively short, 300 words or less. We recommend using field names similar to the ones we’ve included in this example (i.e. “title” and “snippet” ), but RAG is quite flexible with respect to how you structure the documents. You can give the fields any names you want, and can pass in other fields as well, such as a “date” field. All field names and field values are passed to the model.

Also, we can clearly see that it has utilized the document. Our first document says that Emperor penguins are the tallest penguin species, and our second says that Emperor penguins can only be found in Antarctica. The model’s reply, response.message.content[0].text,successfully synthesizes both of these facts: “The tallest penguins, Emperor penguins, live in Antarctica.”

Finally, note that the output contains a citations object, response.message.citations, that tells us not only which documents the model relied upon (from the sources fields), but also the particular part of the claim supported by a particular document (with the start and end fields, which are spans that tell us the location of the supported claim inside the reply). This citation object is included because the model was able to use the documents provided, but if it hadn’t been able to do so, no citation object would be present.

You can experiment with RAG in the chat playground.