Summarize API


This guide uses the Summarize endpoint.

You can find the API reference for the endpoint here.

This endpoint generates a succinct version of the original text that relays the most important information.

Ideal use cases include, but are not limited to, news articles, blogs, chat transcripts, scientific articles, meeting notes, and any other text you would like to see summarized.

The endpoint can:

  • Summarize a single document
  • Control output length


Experimental Features

These features are highly experimental. Using them could substantially decrease the overall performance of the model. They are included based on user feedback, and our team is actively working on delivering a better solution. Because they are critical for some applications, we have exposed experimental versions. If you do try them out, we welcome your feedback.

  • Format the output in a chosen style
  • Handle long documents
  • Provide additional instructions to focus the summary

We recommend the playground for quick experiments, but for any repeated use we strongly recommend the API. An example is provided below.

In this example, we want to summarize a passage from a news article into its main point.

1. Set up

First, let's install the SDK (the examples below are in Python, TypeScript, and Go):

pip install cohere
npm i -s cohere-ai
go get

Import dependencies and set up the Cohere client.

# Python
import cohere

co = cohere.Client('Your API key')

// TypeScript
import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({
    token: "YOUR_API_KEY",
});

(async () => {
    const prediction = await cohere.generate({
        prompt: "hello",
        maxTokens: 10,
    });

    console.log("Received prediction", prediction);
})();

// Go
import cohereclient ""

client := cohereclient.NewClient(cohereclient.WithToken("<YOUR_AUTH_TOKEN>"))

(The rest of the examples on this page are in Python, but you can find more detailed setup instructions in the GitHub repositories for Python, TypeScript, and Go.)
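
Hard-coding the key, as in the Python snippet above, is fine for a quick test. For anything you share, you may prefer to read it from the environment. A minimal sketch, assuming a hypothetical environment variable named `COHERE_API_KEY` (the name and the `load_api_key` helper are assumptions made for this example, not part of the SDK):

```python
import os


def load_api_key(env_var="COHERE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (requires the cohere package and a valid key):
# import cohere
# co = cohere.Client(load_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the first request.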

2. Create prompt

Store the document you want to summarize in a variable.

text = """It's an exciting day for the development community. Cohere's state-of-the-art language AI is now available through Amazon SageMaker. This makes it easier for developers to deploy Cohere's pre-trained generation language model to Amazon SageMaker, an end-to-end machine learning (ML) service. Developers, data scientists, and business analysts use Amazon SageMaker to build, train, and deploy ML models quickly and easily using its fully managed infrastructure, tools, and workflows.
At Cohere, the focus is on language. The company's mission is to enable developers and businesses to add language AI to their technology stack and build game-changing applications with it. Cohere helps developers and businesses automate a wide range of tasks, such as copywriting, named entity recognition, paraphrasing, text summarization, and classification. The company builds and continually improves its general-purpose large language models (LLMs), making them accessible via a simple-to-use platform. Companies can use the models out of the box or tailor them to their particular needs using their own custom data.
Developers using SageMaker will have access to Cohere's Medium generation language model. The Medium generation model excels at tasks that require fast responses, such as question answering, copywriting, or paraphrasing. The Medium model is deployed in containers that enable low-latency inference on a diverse set of hardware accelerators available on AWS, providing different cost and performance advantages for SageMaker customers."""

3. Define model settings

The endpoint has a number of settings you can use to control the kind of output it generates. The full list is available in the API reference, but let’s look at a few:

  • model - command or command-lite. Generally, lite models are faster while larger models will perform better.
  • temperature - This parameter ranges from 0 to 5 and controls the randomness of the output. Higher values tend to generate more creative outcomes and give you the opportunity to generate various summaries for the same input text. They might also produce more hallucinations and make the model less likely to ground its output in the context you've provided when using retrieval-augmented generation. Use a higher value if, for example, you plan to select among several summaries afterwards.
  • length - You can choose between short, medium, and long. Short summaries are roughly up to two sentences long, medium summaries between three and five, and long summaries might have six or more sentences.
  • format - You can choose between paragraph and bullets. Paragraph generates a coherent sequence of sentences, while bullets outputs the summary in bullet points.
  • extractiveness - This parameter can be set to low, medium, or high. It influences how much of the summary is taken verbatim from the original document.
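
Because most of these settings accept only a few values, it can help to check them locally before sending a request. The sketch below is one way to do that. `build_settings` and `ALLOWED` are hypothetical names introduced for this example, and the accepted values are simply the ones listed above; the API reference remains the authority on what the endpoint accepts.

```python
# Allowed values copied from the list above (not fetched from the API).
ALLOWED = {
    "model": {"command", "command-lite"},
    "length": {"short", "medium", "long"},
    "format": {"paragraph", "bullets"},
    "extractiveness": {"low", "medium", "high"},
}


def build_settings(**settings):
    """Validate the given settings and return them as a plain dict."""
    for name, value in settings.items():
        if name == "temperature":
            # Upper bound taken from the list above; negative values are rejected.
            if not 0 <= value <= 5:
                raise ValueError(f"temperature out of range: {value}")
        elif name in ALLOWED:
            if value not in ALLOWED[name]:
                raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}")
        else:
            raise ValueError(f"unknown setting: {name}")
    return dict(settings)


settings = build_settings(model="command", length="medium", format="paragraph")
```

The returned dict can then be unpacked into the request, for example `co.summarize(text=text, **settings)`.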

4. Generate the summary

Call the endpoint via the co.summarize() method, specifying the prompt and the rest of the model settings.

response = co.summarize(
    text=text, model='command', length='medium', extractiveness='medium'
)

summary = response.summary
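
The temperature note in step 3 mentions generating several summaries and selecting among them afterwards. One way this could look is sketched below. `summarize_candidates` and `pick_shortest` are hypothetical helpers, and keeping the shortest candidate is only an illustrative selection rule. The code assumes nothing more than a client object with a `summarize` method whose result has a `summary` attribute, as in the call above.

```python
def summarize_candidates(client, text, n=3, **settings):
    """Call client.summarize n times and return the summary strings."""
    return [client.summarize(text=text, **settings).summary for _ in range(n)]


def pick_shortest(candidates):
    """Illustrative selection rule: keep the shortest candidate."""
    return min(candidates, key=len)


# Usage (requires a configured client, for example co from step 1):
# candidates = summarize_candidates(co, text, n=3, temperature=1.0, length='short')
# summary = pick_shortest(candidates)
```

Each candidate is a separate request, so `n` directly multiplies the number of API calls.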

5. Limitations

As with any work building atop statistical large language models, there is the risk that the output contains facts not present in the original document. These hallucinations might be innocuous, in the sense that they enrich the summary with additional facts, but they can also contain inaccuracies.
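
A rough way to catch one narrow kind of invented fact is to look for numbers that appear in the summary but not in the source. The sketch below is a crude heuristic and not a hallucination detector: `unsupported_numbers` is a hypothetical helper, and it will miss invented names, dates written in words, and anything else that is not a digit sequence.

```python
import re

# Digit runs, optionally with internal '.' or ',' (for example 3.5 or 1,000).
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(source, summary):
    """Return numbers that occur in the summary but nowhere in the source."""
    source_numbers = set(_NUMBER.findall(source))
    return [n for n in _NUMBER.findall(summary) if n not in source_numbers]


flagged = unsupported_numbers(
    "Revenue grew 12% in 2022.",
    "Revenue grew 15% in 2022.",
)
# flagged == ['15']
```

An empty result does not mean the summary is accurate; it only means this particular check found nothing.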

The control parameters length and extractiveness have an impact on the final output, but are not absolute. For instance, a low-extractiveness summary can still contain a sentence taken verbatim from the original document, and a long summary can still be fewer than six sentences long.
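
If your application depends on these properties, you can measure them on the returned summary instead of assuming them. The sketch below counts sentences and checks how many of them appear verbatim in the source. `split_sentences` and `summary_stats` are hypothetical helpers, and the sentence splitter is deliberately naive (it breaks after '.', '!' or '?' followed by whitespace), so treat the counts as approximate.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summary_stats(source, summary):
    """Count summary sentences and how many are copied verbatim from source."""
    sentences = split_sentences(summary)
    verbatim = [s for s in sentences if s in source]
    return {"sentences": len(sentences), "verbatim": len(verbatim)}


stats = summary_stats(
    "Cats sleep a lot. Dogs bark. Birds sing.",
    "Dogs bark. Pets are noisy!",
)
# stats == {'sentences': 2, 'verbatim': 1}
```

You could compare `stats["sentences"]` with the range described for the requested length and retry or post-process when it falls outside.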