Citations for tool use (function calling)
Accessing citations
The Chat endpoint generates fine-grained citations for its tool use responses. This capability is included out-of-the-box with the Command family of models.
The following sections describe how to access the citations in both the non-streaming and streaming modes.
Non-streaming
First, define the tool and its associated schema.
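A minimal sketch of a tool definition and its schema; the `get_weather` function body and its mock return value are illustrative stand-ins for a real data source.

```python
# A stand-in tool implementation; in practice this would call a weather service.
def get_weather(location: str) -> list[dict]:
    return [{"temperature": "20C", "location": location}]

# Maps tool names to their implementations for the execution step.
functions_map = {"get_weather": get_weather}

# The tool schema passed to the Chat endpoint (JSON-schema parameter format).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the weather of a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the weather for",
                    }
                },
                "required": ["location"],
            },
        },
    }
]
```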
Next, run the tool calling and execution steps.
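These steps can be sketched as follows. The chat call itself is shown commented out since it needs a client and API key; the simulated tool call below mirrors the shape of `response.message.tool_calls`, and the `functions_map` helper is an assumption from the previous step.

```python
import json

# The chat call that produces tool calls would look roughly like:
#   co = cohere.ClientV2(api_key="...")
#   response = co.chat(model="...", messages=messages, tools=tools)
#   tool_calls = response.message.tool_calls
# Here we simulate one tool call with the same shape.
tool_call = {
    "id": "get_weather_1byjy32y4hvq",
    "function": {"name": "get_weather", "arguments": '{"location": "Toronto"}'},
}

functions_map = {
    "get_weather": lambda location: [{"temperature": "20C", "location": location}]
}

def execute_tool_call(tool_call: dict) -> dict:
    """Run the requested function and wrap each result as a document."""
    args = json.loads(tool_call["function"]["arguments"])
    results = functions_map[tool_call["function"]["name"]](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": [
            {"type": "document", "document": {"data": json.dumps(r)}}
            for r in results
        ],
    }

# The resulting tool message is appended to messages before the final chat call.
tool_message = execute_tool_call(tool_call)
```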
In the non-streaming mode (using `chat` to generate the model response), the citations are provided in the `message.citations` field of the response object.
Each citation object contains:
- `start` and `end`: the start and end indices of the text that cites the source(s)
- `text`: the corresponding span of text
- `sources`: the source(s) that it references
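A sketch of iterating over these fields; the response object here is simulated with the same field names, and the citation values are illustrative rather than real model output.

```python
from types import SimpleNamespace

# A simulated non-streaming response, shaped like the SDK's response object.
response = SimpleNamespace(
    message=SimpleNamespace(
        citations=[
            SimpleNamespace(
                start=14,
                end=17,
                text="20C",
                sources=[SimpleNamespace(id="get_weather_1byjy32y4hvq:0")],
            )
        ]
    )
)

# Walk the citations and the sources each one references.
for citation in response.message.citations:
    print(citation.start, citation.end, citation.text)
    for source in citation.sources:
        print("  cited source:", source.id)
```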
Example response:
Streaming
In a streaming scenario (using `chat_stream` to generate the model response), the citations are provided in the `citation-start` events.
Each citation object contains the same fields as the non-streaming scenario.
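Collecting these events might be sketched as below; the stream is simulated, with the `citation-start` event type and the `delta.message.citations` path assumed to mirror the SDK's streaming events.

```python
from types import SimpleNamespace

# A simulated event stream standing in for co.chat_stream(...).
def fake_stream():
    yield SimpleNamespace(type="content-delta")
    yield SimpleNamespace(
        type="citation-start",
        delta=SimpleNamespace(
            message=SimpleNamespace(
                citations=SimpleNamespace(
                    start=14,
                    end=17,
                    text="20C",
                    sources=[SimpleNamespace(id="get_weather_1byjy32y4hvq:0")],
                )
            )
        ),
    )

citations = []
# In practice: for chunk in co.chat_stream(model=..., messages=..., tools=...):
for chunk in fake_stream():
    if chunk.type == "citation-start":
        citations.append(chunk.delta.message.citations)
```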
Example response:
Document ID
When passing the tool results from the tool execution step, you can optionally add custom IDs to the `id` field in the `document` object. These IDs will be used by the endpoint as the citation reference.

If you don't provide the `id` field, an ID will be auto-generated in the format `<tool_call_id>:<auto_generated_id>`. Example: `get_weather_1byjy32y4hvq:0`.
Here is an example of using custom IDs. To keep it concise, let's start with a pre-defined list of `messages` in which the user query, tool calls, and tool results are already available.
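A sketch of the tool message with custom document IDs; the `weather_0`-style IDs and the weather data are illustrative assumptions.

```python
import json

# Tool results from the execution step (illustrative data).
tool_results = [{"temperature": "20C", "location": "Toronto"}]

# Each document carries a custom id, which the citations will then reference.
tool_message = {
    "role": "tool",
    "tool_call_id": "get_weather_1byjy32y4hvq",
    "content": [
        {
            "type": "document",
            "document": {
                "id": f"weather_{i}",  # custom ID used as the citation reference
                "data": json.dumps(result),
            },
        }
        for i, result in enumerate(tool_results)
    ],
}
```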
When document IDs are provided, the citation will refer to the documents using these IDs.
Note the `id` fields in the citations, which refer to the IDs in the `document` object.
Example response:
In contrast, here’s an example citation when the IDs are not provided.
Example response:
Citation modes
When running tool use in streaming mode, it’s possible to configure how citations are generated and presented. You can choose between fast citations or accurate citations, depending on your latency and precision needs.
Accurate citations
The model produces its answer first, and then, after the entire response is generated, it provides citations that map to specific segments of the response text. This approach may incur slightly higher latency, but it ensures the citation indices are more precisely aligned with the final text segments of the model’s answer.
This is the default option, or you can explicitly specify it by adding the `citation_options={"mode": "accurate"}` argument in the API call.
Here is an example using the same list of pre-defined `messages` as above.
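The accurate-mode call might be sketched as follows, assuming `co`, `messages`, and `tools` are set up as in the earlier steps; the call itself is left commented since it needs a client and API key.

```python
# Request citations that are computed after the full response is generated.
citation_options = {"mode": "accurate"}  # also the default when omitted

# for chunk in co.chat_stream(
#     model="...",
#     messages=messages,
#     tools=tools,
#     citation_options=citation_options,
# ):
#     if chunk and chunk.type == "citation-start":
#         # In accurate mode, these events arrive after the full answer text.
#         print(chunk.delta.message.citations)
```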
With the `citation_options` mode set to `accurate`, we get the citations after the entire response is generated.
Example response:
Fast citations
The model generates citations inline, as the response is being produced. In streaming mode, you will see citations injected at the exact moment the model uses a particular piece of external context. This approach provides immediate traceability at the expense of slightly less precision in citation relevance.
You can specify it by adding the `citation_options={"mode": "fast"}` argument in the API call.
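A sketch of consuming fast citations inline; the stream is simulated here, and the interleaving of `content-delta` and `citation-start` events, along with the field paths, is an assumption about the streaming event shapes.

```python
from types import SimpleNamespace

citation_options = {"mode": "fast"}

# A simulated fast-mode stream: citation events arrive mid-stream, right after
# the text they cite, rather than at the end.
def fake_stream():
    yield SimpleNamespace(
        type="content-delta",
        delta=SimpleNamespace(
            message=SimpleNamespace(content=SimpleNamespace(text="It is "))
        ),
    )
    yield SimpleNamespace(
        type="content-delta",
        delta=SimpleNamespace(
            message=SimpleNamespace(content=SimpleNamespace(text="20C"))
        ),
    )
    yield SimpleNamespace(
        type="citation-start",
        delta=SimpleNamespace(
            message=SimpleNamespace(
                citations=SimpleNamespace(start=6, end=9, text="20C")
            )
        ),
    )

events = []
# In practice: for chunk in co.chat_stream(..., citation_options=citation_options):
for chunk in fake_stream():
    if chunk.type == "content-delta":
        events.append(("text", chunk.delta.message.content.text))
    elif chunk.type == "citation-start":
        events.append(("citation", chunk.delta.message.citations.text))
```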
With the `citation_options` mode set to `fast`, we get the citations inline as the model generates the response.
Example response: