Release Notes


Command models get an August refresh

Today we’re announcing updates to our flagship generative AI model series: Command R and Command R+. These models demonstrate improved performance on a variety of tasks.

The latest model versions are designated with timestamps, as follows:

  • The updated Command R is command-r-08-2024 on the API.
  • The updated Command R+ is command-r-plus-08-2024 on the API.

In the rest of these release notes, we’ll provide more details about technical enhancements, new features, and new pricing.

Technical Details

command-r-08-2024 shows improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, command-r-08-2024 is better at math, code, and reasoning, and is competitive with the previous version of the larger Command R+ model.

command-r-08-2024 delivers around 50% higher throughput and 20% lower latencies as compared to the previous Command R version, while cutting the hardware footprint required to serve the model by half. Similarly, command-r-plus-08-2024 delivers roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same.

Both models include the following feature improvements:

  • For tool use, command-r-08-2024 and command-r-plus-08-2024 have demonstrated improved decision-making around which tool to use in which context, and whether or not to use a tool.
  • Improved instruction following in the preamble.
  • Improved multilingual RAG: searches are performed in the user’s language, and response quality has improved.
  • Better analysis and manipulation of structured data.
  • Better structured data creation from unstructured natural language instructions.
  • Improved robustness to non-semantic prompt changes like white space or new lines.
  • The models will decline unanswerable questions.
  • The models have improved citation quality, and users can now turn off citations for RAG workflows.
  • For command-r-08-2024, there are meaningful improvements in length and formatting control.

New Feature: Safety Modes

The primary new feature available in both command-r-08-2024 and command-r-plus-08-2024 is Safety Modes (in beta). For enterprise customers building with our models, what counts as safe depends on the use case and the context in which the model is deployed. To support diverse enterprise applications, we have developed Safety Modes, acknowledging that safety and appropriateness are context-dependent, and that predictability and control are critical for building confidence in Cohere models.

Safety guardrails have traditionally been reactive and binary, and we’ve observed that users often have difficulty defining what safe usage means to them for their use case. Safety Modes introduce a nuanced approach that is context sensitive.

(Note: Command R/R+ have built-in protections against core harms, such as content that endangers child safety. These types of harm are always blocked and cannot be adjusted.)

Safety modes are activated through a safety_mode parameter, which can (currently) be in one of two modes:

  • "STRICT": Encourages avoidance of all sensitive topics. Strict content guardrails provide an extra safe experience by prohibiting inappropriate responses or recommendations. Ideal for general and enterprise use.
  • "CONTEXTUAL" (enabled by default): For wide-ranging interactions with fewer constraints on output while maintaining core protections. The model responds as instructed while still rejecting harmful or illegal suggestions. Well-suited for entertainment, creative, and educational use.

You can also opt out of the safety modes beta by setting safety_mode="NONE". For more information, check out our dedicated guide to Safety Modes.
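As an illustration, a request body carrying the safety_mode parameter might be assembled like this. The helper function and payload shape are ours, not the SDK's; only the parameter name and its three values come from the notes above.

```python
# Sketch: selecting a Safety Mode for a Chat API call.
# The payload shape is illustrative, not the full API schema.

def build_chat_request(message: str, model: str, safety_mode: str = "CONTEXTUAL") -> dict:
    """Build a Chat request body with an explicit safety mode."""
    allowed = {"STRICT", "CONTEXTUAL", "NONE"}
    if safety_mode not in allowed:
        raise ValueError(f"safety_mode must be one of {sorted(allowed)}")
    return {
        "message": message,
        "model": model,
        "safety_mode": safety_mode,
    }

# STRICT for a general enterprise workload; CONTEXTUAL is the default.
request = build_chat_request(
    "Summarize this incident report.",
    model="command-r-08-2024",
    safety_mode="STRICT",
)
```

Passing safety_mode="NONE" in the same way opts out of the beta entirely.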

Pricing

Here’s a breakdown of the pricing structure for the new models:

  • For command-r-plus-08-2024, input tokens are priced at $2.50/M and output tokens at $10.00/M.
  • For command-r-08-2024, input tokens are priced at $0.15/M and output tokens at $0.60/M.
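As a quick worked example of these rates, the cost of a single request can be estimated from its token counts. The helper below is ours; only the per-million-token prices come from the list above.

```python
# Worked example: estimating request cost from the listed prices
# (USD per 1M tokens).
PRICES = {
    "command-r-plus-08-2024": {"input": 2.50, "output": 10.00},
    "command-r-08-2024": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. 10k input tokens + 2k output tokens on command-r-08-2024:
cost = estimate_cost("command-r-08-2024", 10_000, 2_000)  # 0.0027 USD
```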

Force JSON object response format

Users can now force command-nightly to generate outputs in JSON objects by setting the response_format parameter in the Chat API. Users can also specify a JSON schema for the output.

This feature is available across all of Cohere’s SDKs (Python, Typescript, Java, Go).

Example request for forcing JSON response format:

cURL

POST https://api.cohere.ai/v1/chat
{
  "message": "Generate a JSON that represents a person, with name and age",
  "model": "command-nightly",
  "response_format": {
    "type": "json_object"
  }
}

Example request for forcing JSON response format in user defined schema:

cURL

POST https://api.cohere.ai/v1/chat
{
  "message": "Generate a JSON that represents a person, with name and age",
  "model": "command-nightly",
  "response_format": {
    "type": "json_object",
    "schema": {
      "type": "object",
      "required": ["name", "age"],
      "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" }
      }
    }
  }
}

This feature is currently only compatible with the command-nightly model.
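Even with JSON mode enabled, it can be useful to validate the model's output client-side against the schema from the example above. A minimal sketch (the validator is ours, not part of the API):

```python
import json

# Sketch: checking that a model's JSON-mode output matches the simple
# schema above (required "name": string, "age": integer). This mirrors
# the schema client-side; it is not the API's own validator.

def matches_person_schema(raw: str) -> bool:
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(obj, dict)
        and isinstance(obj.get("name"), str)
        and isinstance(obj.get("age"), int)
    )

assert matches_person_schema('{"name": "Ada", "age": 36}')
assert not matches_person_schema('{"name": "Ada"}')  # missing required "age"
```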


Release Notes for June 10th 2024: Updates to Tool Use, SDKs, Billing

Multi-step tool use now default in Chat API

Tool use is a technique which allows developers to connect Cohere’s Command family of models to external tools like search engines, APIs, functions, databases, etc. It comes in two variants, single-step and multi-step, both of which are available through Cohere’s Chat API.

As of today, tool use is multi-step by default. The resources below will help you get started.

We’ve published additional docs!

Cohere’s models and functionality are always improving, and we’ve recently dropped the following guides to help you make full use of our offering:

  • Predictable outputs - Information about the seed parameter has been added, giving you more control over the predictability of the text generated by Cohere models.
  • Using Cohere SDKs with private cloud models - To maximize convenience in building on and switching between Cohere-supported environments, our SDKs have been developed to allow seamless support of whichever cloud backend you choose. This guide walks you through when you can use Python, Typescript, Go, and Java on Amazon Bedrock, Amazon SageMaker, Azure, and OCI, what features and parameters are supported, etc.

Changes to Billing

Going forward, Cohere is implementing the following two billing policies:

  • When a user accrues $150 of outstanding debts, a warning email will be sent alerting them of upcoming charges.
  • When a self-serve customer (i.e. a non-contracted organization with a credit card on file) accumulates $250 in outstanding debts, a charge will be forced via Stripe.

Advanced Retrieval Launch

We’re pleased to announce the release of Rerank 3, our newest and most performant foundational model for ranking. Rerank 3 boasts a context length of 4096 tokens and SOTA performance on code retrieval, long-document, and semi-structured data tasks. In addition to quality improvements, we’ve improved inference speed by a factor of 2x for short documents (fewer than 512 tokens) and 3x for long documents (~4096 tokens).


Python SDK v5.2.0 release

We’ve released an additional update for our Python SDK! Here are the highlights.

  • The tokenize and detokenize functions in the Python SDK now default to using a local tokenizer.
  • When using the local tokenizer, the response will not include token_strings, but users can revert to using the hosted tokenizer by specifying offline=False.
  • Also, model will now be a required field.
  • For more information, see the guide for tokens and tokenizers.


Command R: Retrieval-Augmented Generation at Production Scale

Today, we are introducing Command R, a new LLM aimed at large-scale production workloads. Command R targets the emerging “scalable” category of models that balance high efficiency with strong accuracy, enabling companies to move beyond proof of concept, and into production.

Command R is a generative model optimized for long context tasks such as retrieval-augmented generation (RAG) and using external APIs and tools. It is designed to work in concert with our industry-leading Embed and Rerank models to provide best-in-class integration for RAG applications and excel at enterprise use cases. As a model built for companies to implement at scale, Command R boasts:

  • Strong accuracy on RAG and Tool Use
  • Low latency, and high throughput
  • Longer 128k context and lower pricing
  • Strong capabilities across 10 key languages
  • Model weights available on HuggingFace for research and evaluation

For more information, check out the official blog post or the Command R documentation.



Python SDK v5.0.0

With the release of our latest Python SDK, there are a number of functions that are no longer supported, including create_custom_models.

For more granular instructions on upgrading to the new SDK, and what that will mean for your Cohere integrations, see the comprehensive migration guide.


Release Notes January 22, 2024

Apply Cohere’s AI with Connectors!

One of the most exciting applications of generative AI is known as “retrieval augmented generation” (RAG). This refers to the practice of grounding the outputs of a large language model (LLM) by offering it resources — like your internal technical documentation, chat logs, etc. — from which to draw as it formulates its replies.

Cohere has made it much easier to utilize RAG in bespoke applications via Connectors. As the name implies, Connectors allow you to connect Cohere’s generative AI platform up to whatever resources you’d like it to ground on, facilitating the creation of a wide variety of applications — customer service chatbots, internal tutors, or whatever else you want to build.

Our docs cover how to create and deploy connectors, how to manage your connectors, how to handle authentication, and more!

Expanded Fine-tuning Functionality

Cohere’s ready-to-use LLMs, such as Command, are very good at producing responses to natural language prompts. However, there are many cases in which getting the best model performance requires performing an additional round of training on custom user data. This process is known as fine-tuning, and we’ve dramatically revamped our fine-tuning documentation.

The new docs are organized according to the major endpoints, and we support fine-tuning for Generate, Classify, Rerank, and Chat.

But wait, there’s more: many developers work with generative AI through popular cloud-compute platforms like Amazon Web Services (AWS), and we support fine-tuning on Amazon Bedrock. We also support fine-tuning with SageMaker, and the relevant documentation will be published in the coming weeks.

A new Embed Jobs API Endpoint Has Been Released

The Embed Jobs API was designed for users who want to leverage the power of retrieval over large corpora of information. Encoding a large volume of documents with an API can be tedious and difficult, but the Embed Jobs API makes it a breeze to handle encoding workflows involving 100,000 documents or more!

The API works in conjunction with co.embed(). For more information, consult the docs.

Our SDK now Supports More Languages

Throughout our documentation you’ll find code-snippets for performing common tasks with Python. Recently, we made the decision to expand these code snippets to include Typescript and Go, and are working to include several other popular languages as well.


Release Notes September 29th 2023

We’re Releasing co.chat() and the Chat + RAG Playground

We’re pleased to announce that we’ve released our co.chat() beta! Of particular importance is the fact that the co.chat() API is able to utilize retrieval augmented generation (RAG), meaning developers can provide sources of context that inform and ground the model’s output.

This represents a leap forward in the accuracy, verifiability, and timeliness of our generative AI offering. For our public beta, developers can connect co.chat() to web search or plain text documents.

Access to the co.chat() public beta is available through an API key included with a Cohere account.

Our Command Model has Been Updated

We’ve updated both the command and command-light models. Expect improved question answering, generation quality, rewriting and conversational capabilities.

New Rate Limits

For all trial keys and all endpoints, there is now a rate limit of 5000 calls per month.


Release Notes August 8th 2023

Command Model Updated
The Command model has been updated. Expect improvements in reasoning and conversational capabilities.

Finetuning SDK
Programmatic finetuning via the Cohere SDK has been released. It fully supports the existing finetuning capabilities and adds new ones, such as configuring hyperparameters. Learn more here.

Okta OIDC Support
We’ve introduced support for Okta SSO leveraging the OIDC protocol. If you are interested in support for your account, please reach out to support@cohere.com directly.


Release Notes June 28th 2023

Command Model Updated
The Command model has been updated. Expect improved code and conversational capabilities, as well as reasoning skills on various tasks.

Co.rerank()
We’ve released co.rerank(), a new API that sorts a list of text inputs by semantic relevance to a query. Learn more here.
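Conceptually, the endpoint returns the documents ordered by relevance to the query, each with a score. The toy sketch below illustrates that shape with a stand-in scorer; the real relevance model is nothing like this word-overlap stub, and none of these names come from the SDK.

```python
# Toy sketch of rerank-style output: documents sorted by relevance to a
# query, with the original index and a score attached to each result.

def rerank(query, documents, score):
    """Return documents sorted by descending relevance score."""
    scored = [(score(query, d), i, d) for i, d in enumerate(documents)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [{"index": i, "document": d, "relevance_score": s} for s, i, d in scored]

# Stand-in scorer: count of shared lowercase words.
overlap = lambda q, d: len(set(q.lower().split()) & set(d.lower().split()))

results = rerank(
    "best pizza in town",
    ["Tax forms due April", "The best pizza spots in town"],
    overlap,
)
# results[0]["index"] == 1 (the pizza document ranks first)
```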

Streaming Now Part of Generate API
Token streaming is now supported via the co.generate() API. Learn more here.

Usage and Billing Table Improvements
The usage and billing table in the Cohere dashboard now has filtering and sorting capabilities. See here.


New Maximum Number of Input Documents for Rerank

We have updated how the maximum number of documents is calculated for co.rerank. The endpoint will return an error if len(documents) * max_chunks_per_doc > 10,000, where max_chunks_per_doc defaults to 10.
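A client-side pre-flight check mirroring this limit might look like the following sketch. The helper is ours; only the 10,000-chunk formula comes from the note above.

```python
# Pre-flight check mirroring the documented limit: the endpoint errors
# when len(documents) * max_chunks_per_doc exceeds 10,000.
MAX_TOTAL_CHUNKS = 10_000

def check_rerank_limit(documents, max_chunks_per_doc=10):
    """Raise before sending a request the endpoint would reject."""
    total = len(documents) * max_chunks_per_doc
    if total > MAX_TOTAL_CHUNKS:
        raise ValueError(
            f"{len(documents)} documents x {max_chunks_per_doc} chunks = "
            f"{total} > {MAX_TOTAL_CHUNKS}; reduce the batch size"
        )

check_rerank_limit(["doc"] * 1000)  # exactly 10,000 chunks: allowed
```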


Model Names Are Changing!

We are updating the names of our models to bring consistency and simplicity to our product offerings. As of today, you will be able to call Cohere’s models via our API and SDK with the new model names, and all of our documentation has been updated to reflect the new naming convention.

These changes are backwards compatible, so you can also continue to call our models with the Previous Names found in the table below until further notice.

Model Type  | Previous Name          | New Name                 | Endpoint
------------|------------------------|--------------------------|------------
Generative  | command-xlarge         | command                  | co.generate
Generative  | command-medium         | command-light            | co.generate
Generative  | command-xlarge-nightly | command-nightly          | co.generate
Generative  | command-medium-nightly | command-light-nightly    | co.generate
Generative  | xlarge                 | base                     | co.generate
Generative  | medium                 | base-light               | co.generate
Embeddings  | large                  | embed-english-v2.0       | co.embed
Embeddings  | small                  | embed-english-light-v2.0 | co.embed
Embeddings  | multilingual-22-12     | embed-multilingual-v2.0  | co.embed

See the latest information about our models here: /docs/models

If you experience any issues in accessing our models, please reach out to support@cohere.com.


Multilingual Support for Co.classify

The co.classify endpoint now supports the use of Cohere’s multilingual embedding model. The multilingual-22-12 model is now a valid model input in the co.classify call.


Command Model Nightly Available!

Nightly versions of our Command models are now available. This means that every week, you can expect the performance of command-nightly to improve as we continually retrain these models.

Command-nightly will be available in two sizes: medium and xlarge. The xlarge model demonstrates better performance, while medium is a great option for developers who require fast responses, such as those building chatbots. You can find more information here.

If you were previously using the command-xlarge-20221108 model, you will now be redirected to the command-xlarge-nightly model. Please note that access to the command-xlarge-20221108 model will be discontinued after January 30, 2023. The command-xlarge-nightly model has shown enhancements in all generative tasks, and we anticipate you will notice an improvement.


Command R+ is a scalable LLM for business

We’re pleased to announce the release of Command R+, our newest and most performant large language model. Command R+ is optimized for conversational interaction and long-context tasks, and it is the recommended model for use cases requiring high performance and accuracy.

Command R+ has been trained on a massive corpus of diverse texts in multiple languages, and can perform a wide array of text-generation tasks. You’ll find it especially strong for complex RAG functionality, as well as workflows that lean on multi-step tool use to build agents.

Multi-step Tool Use

Speaking of multi-step tool use, this functionality is now available for Command R+ models.

Multi-step tool use allows the model to call any number of tools in any sequence of steps, using the results from one tool call in a subsequent step until it has found a solution to a user’s problem. This process allows the model to reason, perform dynamic actions, and quickly adapt on the basis of information coming from external sources.
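The loop described above can be sketched as follows. The "model" here is a stub so the example runs standalone; none of these function names or message shapes come from the Cohere SDK.

```python
# Minimal sketch of a multi-step tool-use loop: the model either requests
# a tool call or returns a final answer, and tool results are fed back
# into the conversation each step until the problem is solved.

def run_tool_loop(model_step, tools, user_message, max_steps=5):
    """Drive tool calls until the model produces a final answer."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        action = model_step(history)
        if action["type"] == "answer":
            return action["content"]
        # Execute the requested tool and append its result to the history.
        result = tools[action["tool"]](**action["args"])
        history.append({"role": "tool", "tool": action["tool"], "result": result})
    raise RuntimeError("no final answer within max_steps")

# Stub model: first request a price lookup, then answer using the result.
def fake_model(history):
    tool_results = [m for m in history if m["role"] == "tool"]
    if not tool_results:
        return {"type": "tool_call", "tool": "get_price", "args": {"item": "widget"}}
    return {"type": "answer", "content": f"The widget costs ${tool_results[0]['result']}."}

answer = run_tool_loop(fake_model, {"get_price": lambda item: 4.99}, "How much is a widget?")
# answer == "The widget costs $4.99."
```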


Multilingual Text Understanding Model + Language Detection!

Cohere’s multilingual text understanding model is now available! The multilingual-22-12 model can be used to semantically search within a single language, as well as across languages. Compared to keyword search, where you often need separate tokenizers and indices to handle different languages, the deployment of the multilingual model for search is trivial: no language-specific handling is needed — everything can be done by a single model within a single index.

In addition to our new model, you can now detect the language of a data source using the co.detect_language() endpoint.

For more information, see our multilingual docs.


Model Sizing Update + Improvements

Effective December 2, 2022, we will be consolidating our generative models and only serving our Medium (focused on speed) and X-Large (focused on quality) models. We will also be discontinuing support for our Medium embedding model.

This means that as of this date, our Small and Large generative models and Medium embedding model will be deprecated.

If you are currently using a Small or Large generative model, then we recommend that you proactively change to a Medium or X-Large model before December 2, 2022. Additionally, if you are currently using a Medium embed model, we recommend that you change to a Small embed model.

Generate calls to large-20220926 and xlarge-20220609 will route to the new and improved X-Large model (xlarge-20221108). Generate calls to small-20220926 will route to the new and improved Medium model (medium-20221108).

If you have any questions or concerns about this change, please don’t hesitate to contact us at: team@cohere.com.


Improvements to Current Models + New Beta Model (Command)!

New & Improved Medium & Extremely Large

The new and improved medium and x-large outperform our existing generation models on most downstream tasks, including summarization, paraphrasing, classification, and extraction, as measured by our internal task-based benchmarks.
At this time, all baseline calls to x-large and medium will still route to previous versions of the models (namely, xlarge-20220609 and medium-20220926). To access the new and improved versions, you’ll need to specify the release date in the Playground or your API call: xlarge-20221108 and medium-20221108.

Older versions of the models (xlarge-20220609 & medium-20220926) will be deprecated on December 2, 2022.

NEW Command Model (Beta)

We’re also introducing a Beta of our new Command model, a generative model that’s conditioned to respond well to single-statement commands. Learn more about how to prompt command-xlarge-20221108. You can expect to see command-xlarge-20221108 evolve dramatically in performance over the coming weeks.


New Look For Docs!

We’ve updated our docs to better suit our new developer journey! You’ll have a sleeker, more streamlined documentation experience.

New Features

  • Interactive quickstart tutorials!
  • Improved information architecture!
  • UI refresh!

Try out the new experience, and let us know what you think.


Co.classify powered by our Representational model embeddings

The Co.classify endpoint now serves few-shot classification tasks using embeddings from our Representational model for the small, medium, and large default models.


New Logit Bias experimental parameter

Our Generative models now have the option to use the new logit_bias parameter to prevent the model from generating unwanted tokens or to incentivize it to include desired tokens. Logit bias is supported in all our default Generative models.
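As an illustration, a request carrying logit_bias might be assembled like this. The helper and the token IDs are hypothetical; only the parameter name and its purpose come from the note above, and real token IDs come from the model's tokenizer.

```python
# Illustration: biasing generation away from or toward specific tokens.
# Token IDs below are made up; real IDs come from the model's tokenizer.

def build_generate_request(prompt, model, logit_bias=None):
    """Assemble a generate request body with an optional logit_bias map."""
    body = {"prompt": prompt, "model": model}
    if logit_bias:
        # Positive values encourage a token; negative values discourage it.
        body["logit_bias"] = logit_bias
    return body

request = build_generate_request(
    "Write a product description.",
    model="command",
    logit_bias={11: -10.0, 42: 5.0},  # hypothetical token IDs
)
```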


Pricing Update and New Dashboard UI

  • Free, rate limited Trial Keys for experimentation, testing, and playground usage
  • Production keys with no rate limit for serving Cohere in production applications
  • Flat rate pricing for Generate and Embed endpoints
  • Reduced pricing for Classify endpoint
  • New UI for dashboard including sign up and onboarding - everything except playground
  • New use-case specific Quickstart Guides to learn about using Cohere API
  • Replacing “Finetune” nomenclature with “Custom Model”
  • Inviting team members is now more intuitive. Teams enable users to share custom models with each other
  • Generative custom models now show accuracy and loss metrics alongside logs
  • Embed and Classify custom models now show logs alongside accuracy, loss, precision, f1, recall
  • Custom model details now show number of each label in dataset

Introducing Moderate (Beta)!

Use Moderate (Beta) to classify harmful text across the following categories: profane, hate speech, violence, self-harm, sexual, sexual (non-consensual), harassment, spam, and information hazard (e.g., PII). Moderate returns an array containing each category and its associated confidence score. Over the coming weeks, expect performance to improve significantly as we optimize the underlying model.


Model parameter now optional

Our APIs no longer require a model to be specified. Each endpoint comes with great defaults. For more control, a model can still be specified by adding a model param in the request.


New & Improved Generation and Representation Models

We’ve retrained our small, medium, and large generation and representation models. Updated representation models now support contexts up to 4096 tokens (previously 1024 tokens). We recommend keeping text lengths below 512 tokens for optimal performance; for any text longer than 512 tokens, the text is spliced and the resulting embeddings of each component are then averaged and returned.
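The splice-and-average behavior described above can be sketched as follows. The chunking helper and the toy embedding function are ours and purely illustrative; the real model's window and vectors differ.

```python
# Sketch of the splice-and-average behavior described above: text longer
# than the 512-token window is split into chunks, each chunk is embedded,
# and the chunk embeddings are averaged elementwise. `embed_chunk` stands
# in for the real embedding model.

def embed_long_text(tokens, embed_chunk, window=512):
    """Embed token chunks of size `window` and average the results."""
    chunks = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    vectors = [embed_chunk(c) for c in chunks]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Toy stand-in embedding: [chunk length, first token id].
toy_embed = lambda chunk: [float(len(chunk)), float(chunk[0])]
vec = embed_long_text(list(range(1024)), toy_embed)
# Two 512-token chunks with first tokens 0 and 512 -> averaged to [512.0, 256.0]
```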


New Extremely Large Model!

Our new and improved xlarge has better generation quality and a 4x faster prediction speed. This model now supports a maximum token length of 2048 tokens and frequency and presence penalties.


Updated Small, Medium, and Large Generation Models

Updated small, medium, and large models are more stable and resilient against abnormal inputs due to a FP16 quantization fix. We also fixed a bug in generation presence & frequency penalty, which will result in more effective penalties.


Classification Endpoint

Classification is now available via our classification endpoint. This endpoint is currently powered by our generation models (small and medium) and supports few-shot classification. We will be deprecating support for Choose Best by May 18th. To learn more about classification at Cohere check out the docs here.


Finetuning Available + Policy Updates

Finetuning is Generally Available

You no longer need to wait for Full Access approval to build your own custom finetuned generation or representation model. Upload your dataset and start seeing even better performance for your specific task.

Policy Updates

The Cohere team continues to be focused on improving our products and features to enable our customers to build powerful NLP solutions. To help reflect some of the changes in our product development and research process, we have updated our Terms of Use, Privacy Policy, and click-through SaaS Agreement. Please carefully read and review these updates. By continuing to use Cohere’s services, you acknowledge that you have read, understood, and consent to all of the changes. If you have any questions or concerns about these updates, please contact us at support@cohere.ai.


New & Improved Generation Models

We’ve shipped updated small, medium, and large generation models. You’ll find significant improvements in performance that come from our newly assembled high quality dataset.


Extremely Large (Beta) Release

Our biggest and most performant generation model is now available. Extremely Large (Beta) outperforms our previous large model on a variety of downstream tasks including sentiment analysis, named entity recognition (NER) and common sense reasoning, as measured by our internal benchmarks. You can access Extremely Large (Beta) as xlarge-20220301. While in Beta, note that this model will have a maximum token length of 1024 tokens and maximum num_generations of 1.


Larger Representation Models

Representation Models are now available in the sizes of medium-20220217 and large-20220217 as well as an updated version of small-20220217. Our previous small model will be available as small-20211115. In addition, the maximum tokens length per text has increased from 512 to 1024. We recommend keeping text lengths below 128 tokens for optimal performance; for any text longer than 128 tokens, the text is spliced and the resulting embeddings of each component are then averaged and returned.