Release Notes


Announcing Rerank-v3.5

We’re pleased to announce the release of Rerank 3.5 our newest and most performant foundational model for ranking. Rerank 3.5 has a context length of 4096, SOTA performance on Multilingual Retrieval tasks and Reasoning Capabilities. In addition, Rerank 3.5 has SOTA performance on BEIR and domains such as Finance, E-commerce, Hospitality, Project Management, and Email/Messaging Retrieval tasks.

In the rest of these release notes, we’ll provide more details about changes to the api.

Technical Details

API Changes:

Along with the model, we are releasing V2 of the Rerank API. It includes the following major changes:

  • model is now a required parameter
  • max_chunks_per_doc has been replaced by max_tokens_per_doc; max_tokens_per_doc will determine the maximum amount of tokens a document can have before truncation. The default value for max_tokens_per_doc is 4096.
  • support for passing a list of objects for the documents parameter has been removed - if your documents contain structured data, for best performance we recommend formatting them as YAML strings.

Example request

cURL
1POST https://api.cohere.ai/v2/rerank
2{
3 "model": "rerank-v3.5",
4 "query": "What is the capital of the United States?",
5 "top_n": 3,
6 "documents": ["Carson City is the capital city of the American state of Nevada.",
7 "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
8 "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
9 "Capitalization or capitalisation in English grammar is the use of a capital letter at the start of a word. English usage varies from capitalization in other languages.",
10 "Capital punishment has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."]
11}

Structured Outputs support for tool use

Today, we’re pleased to announce that we have added Structured Outputs support for tool use in the Chat API.

In addition to supporting Structured Outputs with JSON generation via the response_format parameter, Structured Outputs will be available with Tools as well via the strict_tools parameter.

Setting strict_tools to true ensures that tool calls will follow the provided tool schema exactly. This means the tool calls are guaranteed to adhere to the tool names, parameter names, parameter data types, and required parameters, without the risk of hallucinations.

See the Structured Outputs documentation to learn more.


Embed v3.0 Models are now Multimodal

Today we’re announcing updates to our embed-v3.0 family of models. These models now have the ability to process images into embeddings. There is no change to existing text capabilities which means there is no need to re-embed texts you have already processed with our embed-v3.0 models.

In the rest of these release notes, we’ll provide more details about technical enhancements, new features, and new pricing.

Technical Details

API Changes:

The Embed API has two major changes:

  • Introduced a new input_type called image
  • Introduced a new parameter called images

Example request on how to process

cURL
1POST https://api.cohere.ai/v1/embed
2{
3 "model": "embed-multilingual-v3.0",
4 "input_type": "image",
5 "embedding_types": ["float"],
6 "images": [enc_img]
7}

Restrictions:

  • The API only accepts images in the base format of the following: png, jpeg,Webp, and gif
  • Image embeddings currently does not support batching so the max images sent per request is 1
  • The maximum image sizez is 5mb
  • The images parameter only accepts a base64 encoded image formatted as a Data Url


New Embed, Rerank, Chat, and Classify APIs

We’re excited to introduce improvements to our Chat, Classify, Embed, and Rerank APIs in a major version upgrade, making it easier and faster to build with Cohere. We are also releasing new versions of our Python, TypeScript, Java, and Go SDKs which feature cohere.ClientV2 for access to the new API.

New at a glance

Other updates

We are simplifying the Chat API by removing support for the following parameters available in V1:

  • search_queries_only, which generates only a search query given a user’s message input. search_queries_only is not supported in the V2 Chat API today, but will be supported at a later date.
  • connectors, which enables users to register a data source with Cohere for RAG queries. To use the Chat V2 API with web search, see our migration guide for instructios to implement a web search tool.
  • conversation_id, used to manage chat history on behalf of the developer. This will not be supported in the V2 Chat API.
  • prompt_truncation, used to automatically rerank and remove documents if the query did not fit in the model’s context limit. This will not be supported in the V2 Chat API.
  • force_single_step, which forced the model to finish tool calling in one set of turns. This will not be supported in the V2 Chat API.
  • preamble, used for giving the model task, context, and style instructions. Use a system turn at the beginning of your messages array in V2.
  • citation_quality, for users to select between fast citations, accurate citations (slightly higher latency than fast), or citations off. In V2 Chat, we are introducing a top level citation_options parameter for all citation settings. citation_quality will be replaced by a mode parameter within citation_options.

See our Chat API migration guide for detailed instructions to update your implementation.

These APIs are in Beta and are subject to updates. We welcome feedback in our Discord channel.



Command models get an August refresh

Today we’re announcing updates to our flagship generative AI model series: Command R and Command R+. These models demonstrate improved performance on a variety of tasks.

The latest model versions are designated with timestamps, as follows:

  • The updated Command R is command-r-08-2024 on the API.
  • The updated Command R+ is command-r-plus-08-2024 on the API.

In the rest of these release notes, we’ll provide more details about technical enhancements, new features, and new pricing.

Technical Details

command-r-08-2024 shows improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, command-r-08-2024 is better at math, code and reasoning and is competitive with the previous version of the larger Command R+ model.

command-r-08-2024 delivers around 50% higher throughput and 20% lower latencies as compared to the previous Command R version, while cutting the hardware footprint required to serve the model by half. Similarly, command-r-plus-08-2024 delivers roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same.

Both models include the following feature improvements:

  • For tool use, command-r-08-2024 and command-r-plus-08-2024 have demonstrated improved decision-making around which tool to use in which context, and whether or not to use a tool.
  • Improved instruction following in the preamble.
  • Improved multilingual RAG searches in the language of the user with improved responses.
  • Better structured data analysis for structured data manipulation.
  • Better structured data creation from unstructured natural language instructions.
  • Improved robustness to non-semantic prompt changes like white space or new lines.
  • The models will decline unanswerable questions.
  • The models have improved citation quality and users can now turn off citations for RAG workflows.
  • For command-r-08-2024 there are meaningful improvements on length and formatting control.

New Feature: Safety Modes

The primary new feature available in both command-r-08-2024 and command-r-plus-08-2024 is Safety Modes (in beta). For our enterprise customers building with our models, what is considered safe depends on their use case and the context the model is deployed in. To support diverse enterprise applications, we have developed safety modes, acknowledging that safety and appropriateness are context-dependent, and that predictability and control are critical in building confidence in Cohere models.

Safety guardrails have traditionally been reactive and binary, and we’ve observed that users often have difficulty defining what safe usage means to them for their use case. Safety Modes introduce a nuanced approach that is context sensitive.

(Note: Command R/R+ have built-in protections against core harms, such as content that endangers child safety. These types of harm are always blocked and cannot be adjusted.)

Safety modes are activated through a safety_mode parameter, which can (currently) be in one of two modes:

  • "STRICT": Encourages avoidance of all sensitive topics. Strict content guardrails provide an extra safe experience by prohibiting inappropriate responses or recommendations. Ideal for general and enterprise use.
  • "CONTEXTUAL": (enabled by default): For wide-ranging interactions with fewer constraints on output while maintaining core protections. The model responds as instructed while still rejecting harmful or illegal suggestions. Well-suited for entertainment, creative, educational use.

You can also opt out of the safety modes beta by setting safety_mode="NONE". For more information, check out our dedicated guide to Safety Modes.

Pricing

Here’s a breakdown the pricing structure for the new models:

  • For command-r-plus-08-2024, input tokens are priced at $2.50/M and output tokens at $10.00/M.
  • For command-r-08-2024, input tokens are priced at $0.15/M and output tokens at $0.60/M.

Force JSON object response format

Users can now force command-nightlyto generate outputs in JSON objects by setting the response_format parameter in the Chat API. Users can also specify a JSON schema for the output.

This feature is available across all of Cohere’s SDKs (Python, Typescript, Java, Go).

Example request for forcing JSON response format:

cURL
1POST https://api.cohere.ai/v1/chat
2{
3 "message": "Generate a JSON that represents a person, with name and age",
4 "model": "command-nightly",
5 "response_format": {
6 "type": "json_object"
7 }
8}

Example request for forcing JSON response format in user defined schema:

cURL
1POST https://api.cohere.ai/v1/chat
2{
3 "message": "Generate a JSON that represents a person, with name and age",
4 "model": "command-nightly",
5 "response_format": {
6 "type": "json_object",
7 "schema": {
8 "type": "object",
9 "required": ["name", "age"],
10 "properties": {
11 "name": { "type": "string" },
12 "age": { "type": "integer" }
13 }
14 }
15 }
16}

Currently only compatible with `command-nightly model.


Release Notes for June 10th 2024: Updates to Tool Use, SDKs, Billing

Multi-step tool use now default in Chat API

Tool use is a technique which allows developers to connect Cohere’s Command family of models to external tools like search engines, APIs, functions, databases, etc. It comes in two variants, single-step and multi-step, both of which are available through Cohere’s Chat API.

As of today, tool use will now be multi-step by default. Here are some resources to help you get started:

We’ve published additional docs!

Cohere’s models and functionality are always improving, and we’ve recently dropped the following guides to help you make full use of our offering:

  • Predictable outputs - Information about the seed parameter has been added, giving you more control over the predictability of the text generated by Cohere models.
  • Using Cohere SDKs with private cloud models - To maximize convenience in building on and switching between Cohere-supported environments, our SDKs have been developed to allow seamless support of whichever cloud backend you choose. This guide walks you through when you can use Python, Typescript, Go, and Java on Amazon Bedrock, Amazon SageMaker, Azure, and OCI, what features and parameters are supported, etc.

Changes to Billing

Going forward, Cohere is implementing the following two billing policies:

  • When a user accrues $150 of outstanding debts, a warning email will be sent alerting them of upcoming charges.
  • When a self-serve customer (i.e. a non-contracted organization with a credit card on file) accumulates $250 in outstanding debts, a charge will be forced via Stripe.

Advanced Retrieval Launch

We’re pleased to announce the release of Rerank 3 our newest and most performant foundational model for ranking. Rerank 3 boast a context length of 4096, SOTA performance on Code Retrieval, Long Document, and Semi-Structured Data. In addition to quality improvements, we’ve improved inference speed by a factor of 2x for short documents (doc length < 512 tokens) and 3x for long documents (doc length ~4096 tokens).