Release Notes


Cohere Releases Arabic-Optimized Command Model!

Cohere is thrilled to announce the release of Command R7B Arabic (c4ai-command-r7b-12-2024). This is an open weights release of an advanced, 8-billion parameter custom model optimized for the Arabic language (MSA dialect), in addition to English. As with Cohere’s other Command models, this one comes with a context length of 128,000 tokens; it excels at a number of critical enterprise tasks — instruction following, length control, retrieval-augmented generation (RAG), and minimizing code-switching — and it demonstrates excellent general-purpose knowledge and understanding of the Arabic language and culture.

Try Command R7B Arabic

If you want to try Command R7B Arabic, it’s very easy: you can use it through the Cohere playground or in our dedicated Hugging Face Space.

Alternatively, you can use the model in your own code. To do that, first install the transformers library from its source repository:

$ pip install 'git+https://github.com/huggingface/transformers.git'

Then, use this Python snippet to run a simple text-generation task with the model:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r7b-12-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format message with the c4ai-command-r7b-12-2024 chat template
messages = [{"role": "user", "content": "مرحبا، كيف حالك؟"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)

Chat Capabilities

Command R7B Arabic can be operated in two modes, “conversational” and “instruct”:

  • Conversational mode conditions the model on interactive behaviour, meaning it is expected to reply in a conversational fashion, provide introductory statements and follow-up questions, and use Markdown as well as LaTeX where appropriate. This mode is optimized for interactive experiences, such as chatbots, where the model engages in dialogue.
  • Instruct mode conditions the model to provide concise yet comprehensive responses, and to not use Markdown or LaTeX by default. This mode is designed for non-interactive, task-focused use cases such as extracting information, summarizing text, translation, and categorization.

Multilingual RAG Capabilities

Command R7B Arabic has been trained specifically for Arabic and English tasks, such as the generation step of Retrieval Augmented Generation (RAG).

Command R7B Arabic’s RAG functionality is supported through chat templates in Transformers. Using our RAG chat template, the model takes a conversation (with an optional user-supplied system preamble) and a list of document snippets as input. The resulting output contains a response with in-line citations. Here’s what that looks like:

# Define conversation input
conversation = [
    {
        "role": "user",
        "content": "اقترح طبقًا يمزج نكهات من عدة دول عربية",
    }
]

# Define documents for retrieval-based generation
documents = [
    {
        "heading": "المطبخ العربي: أطباقنا التقليدية",
        "body": "يشتهر المطبخ العربي بأطباقه الغنية والنكهات الفريدة. في هذا المقال، سنستكشف ...",
    },
    {
        "heading": "وصفة اليوم: مقلوبة",
        "body": "المقلوبة هي طبق فلسطيني تقليدي، يُحضر من الأرز واللحم أو الدجاج والخضروات. في وصفتنا اليوم ...",
    },
]

# Get the RAG prompt as a string (return_tensors is ignored when tokenize=False)
input_prompt = tokenizer.apply_chat_template(
    conversation=conversation,
    documents=documents,
    tokenize=False,
    add_generation_prompt=True,
)
# Tokenize the prompt
input_ids = tokenizer.encode_plus(input_prompt, return_tensors="pt")

You can then generate text from this input as normal.

Notes on Usage

We recommend that document snippets be short chunks (around 100-400 words each) rather than long documents. They should also be formatted as key-value pairs, where the keys are short descriptive strings and the values are text or semi-structured data.

You may find that simply including relevant documents directly in a user message works as well as or better than using the documents parameter to render the special RAG template (though the template is a strong default for those wanting citations). We encourage users to experiment with both approaches, and to evaluate which mode works best for their specific use case.
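For illustration, here is a minimal sketch of the second approach, with the document snippets pasted straight into the user turn. The `<document>` wrapper below is just one illustrative convention, not a required format:

```python
# Paste document chunks directly into the user message instead of
# passing them via the documents parameter of the chat template.
snippets = [
    "المطبخ العربي: يشتهر المطبخ العربي بأطباقه الغنية والنكهات الفريدة.",
    "وصفة اليوم: المقلوبة هي طبق فلسطيني تقليدي.",
]
context = "\n\n".join(f"<document>\n{s}\n</document>" for s in snippets)

conversation = [
    {
        "role": "user",
        "content": f"{context}\n\nاقترح طبقًا يمزج نكهات من عدة دول عربية",
    }
]
# conversation can now be passed to tokenizer.apply_chat_template
# without the documents= argument.
```

When evaluating the two approaches, keep the rest of the pipeline fixed so you can compare answer quality and citation behavior directly.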


Use Cohere models via the OpenAI SDK with the Compatibility API

Today, we are releasing our Compatibility API, enabling developers to seamlessly use Cohere’s models via OpenAI’s SDK.

This API enables you to switch your existing OpenAI-based applications to use Cohere’s models without major refactoring.

It includes comprehensive support for chat completions, such as function calling and structured outputs, as well as support for text embeddings generation.

Check out our documentation on how to get started with the Compatibility API, with examples in Python, TypeScript, and cURL.
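As a sketch of what that switch can look like, the snippet below points the OpenAI Python SDK at Cohere's Compatibility API. The base URL and model name are taken to be those described in our documentation; verify both there before use:

```python
import os

# Base URL of the Compatibility API (check the documentation for the
# current value) and an example request payload.
BASE_URL = "https://api.cohere.ai/compatibility/v1"

payload = {
    "model": "command-r7b-12-2024",
    "messages": [{"role": "user", "content": "Hello from the OpenAI SDK!"}],
}

# Only runs when an API key is configured.
if os.environ.get("CO_API_KEY"):
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url=BASE_URL, api_key=os.environ["CO_API_KEY"])
    completion = client.chat.completions.create(**payload)
    print(completion.choices[0].message.content)
```

Note that only the client construction changes; the `chat.completions.create` call itself is untouched.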


Cohere's Rerank v3.5 Model is on Azure AI Foundry!

In December 2024, Cohere released the Rerank v3.5 model. It demonstrates SOTA performance on multilingual retrieval, reasoning, and tasks in domains as varied as finance, eCommerce, hospitality, project management, and email/messaging retrieval.

This model has been available through the Cohere API, but today we’re pleased to announce that it can also be utilized through Microsoft Azure’s AI Foundry!

You can find more information about using Cohere’s models on AI Foundry here.


Deprecation of the Classify endpoint via default Embed models

Effective January 31st, 2025, we are deprecating the use of default Embed models with the Classify endpoint.

This deprecation does not affect usage of the Classify endpoint with fine-tuned Embed models. Fine-tuned models continue to be fully supported and are recommended for achieving optimal classification performance.

For guidance on implementing Classify with fine-tuned models, please refer to our Classify fine-tuning documentation.
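As a hedged sketch using the Cohere Python SDK, a Classify call with a fine-tuned model looks like the following. The model ID below is a placeholder for your own fine-tuned Embed model:

```python
import os

inputs = ["Confirm your account details", "Team lunch at noon on Friday"]

# Only runs when an API key is configured; "my-classifier-ft" is a
# placeholder for a real fine-tuned model ID.
if os.environ.get("CO_API_KEY"):
    import cohere  # pip install cohere

    co = cohere.Client(api_key=os.environ["CO_API_KEY"])
    response = co.classify(model="my-classifier-ft", inputs=inputs)
    for c in response.classifications:
        print(c.input, "->", c.prediction)
```

Because the fine-tuned model already carries the label set from training, no examples need to be supplied at request time.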



Aya Expanse is Available on WhatsApp!

Aya Expanse is a multilingual large language model that is designed to expand the number of languages covered by generative AI. It is optimized to perform well in 23 languages, including Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Russian, Spanish, and more.

Now, you can talk to Aya Expanse directly in the popular messaging service WhatsApp! All of Aya’s functionality is available through the app, and you can find more details here.


Announcing Command R7B

We’re thrilled to announce the release of Command R7B, the smallest, fastest, and final model in our R family of enterprise-focused large language models (LLMs). With a context window of 128K, Command R7B offers state-of-the-art performance across a variety of real-world tasks, and is designed for use cases in which speed, cost, and compute are important. Specifically, Command R7B is excellent for retrieval-augmented generation, tool use, and agentic applications where complex reasoning, multiple actions, and information-seeking are important for success.

Command R7B is available today on the Cohere Platform and on Hugging Face, or you can access it via the SDK as command-r7b-12-2024. For more information, check out our dedicated blog post.
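For example, a minimal chat call with the Python SDK might look like the sketch below (assumes a CO_API_KEY environment variable):

```python
import os

messages = [
    {"role": "user", "content": "In two sentences, why do small LLMs matter for enterprises?"}
]

# Only runs when an API key is configured.
if os.environ.get("CO_API_KEY"):
    import cohere  # pip install cohere

    co = cohere.ClientV2(api_key=os.environ["CO_API_KEY"])
    response = co.chat(model="command-r7b-12-2024", messages=messages)
    print(response.message.content[0].text)
```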


Announcing Rerank-v3.5

We’re pleased to announce the release of Rerank 3.5, our newest and most performant foundational model for ranking. Rerank 3.5 has a context length of 4096 tokens, along with SOTA performance on multilingual retrieval and reasoning tasks. In addition, Rerank 3.5 achieves SOTA performance on BEIR and in domains such as finance, e-commerce, hospitality, project management, and email/messaging retrieval.

In the rest of these release notes, we’ll provide more details about changes to the API.

Technical Details

API Changes:

Along with the model, we are releasing V2 of the Rerank API. It includes the following major changes:

  • model is now a required parameter
  • max_chunks_per_doc has been replaced by max_tokens_per_doc; max_tokens_per_doc determines the maximum number of tokens a document can have before truncation. The default value for max_tokens_per_doc is 4096.
  • Support for passing a list of objects via the documents parameter has been removed; if your documents contain structured data, we recommend formatting them as YAML strings for best performance.
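For example, a minimal way to turn a structured document into a YAML string is sketched below. This helper only handles flat, string-valued records; a YAML library such as PyYAML is a better choice when values need escaping or nesting:

```python
def to_yaml_string(doc: dict) -> str:
    """Render a flat dict of strings as a simple YAML document."""
    return "\n".join(f"{key}: {value}" for key, value in doc.items())

doc = {
    "title": "Renewing your membership",
    "body": "Memberships can be renewed online up to 30 days before expiry.",
}
yaml_doc = to_yaml_string(doc)
# yaml_doc can now be included in the documents list of a v2/rerank request.
```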

Example request

cURL
POST https://api.cohere.ai/v2/rerank
{
  "model": "rerank-v3.5",
  "query": "What is the capital of the United States?",
  "top_n": 3,
  "documents": [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
    "Capitalization or capitalisation in English grammar is the use of a capital letter at the start of a word. English usage varies from capitalization in other languages.",
    "Capital punishment has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
  ]
}
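The same request can be made from the Python SDK; here is a sketch (assumes a CO_API_KEY environment variable):

```python
import os

query = "What is the capital of the United States?"
documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "Washington, D.C. is the capital of the United States. It is a federal district.",
    "Capital punishment has existed in the United States since before the United States was a country.",
]

# Only runs when an API key is configured.
if os.environ.get("CO_API_KEY"):
    import cohere  # pip install cohere

    co = cohere.ClientV2(api_key=os.environ["CO_API_KEY"])
    results = co.rerank(model="rerank-v3.5", query=query, documents=documents, top_n=2)
    for r in results.results:
        print(r.index, round(r.relevance_score, 3))
```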

Structured Outputs support for tool use

Today, we’re pleased to announce that we have added Structured Outputs support for tool use in the Chat API.

In addition to supporting Structured Outputs with JSON generation via the response_format parameter, Structured Outputs is now available for tools via the strict_tools parameter.

Setting strict_tools to true ensures that tool calls will follow the provided tool schema exactly. This means the tool calls are guaranteed to adhere to the tool names, parameter names, parameter data types, and required parameters, without the risk of hallucinations.
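For illustration, here is a hedged sketch with the Python SDK; the get_weather tool schema below is hypothetical:

```python
import os

# Hypothetical tool schema used only to illustrate strict_tools.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Only runs when an API key is configured.
if os.environ.get("CO_API_KEY"):
    import cohere  # pip install cohere

    co = cohere.ClientV2(api_key=os.environ["CO_API_KEY"])
    response = co.chat(
        model="command-r7b-12-2024",
        messages=[{"role": "user", "content": "What's the weather in Toronto?"}],
        tools=tools,
        strict_tools=True,  # tool calls must match the schema exactly
    )
    print(response.message.tool_calls)
```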

See the Structured Outputs documentation to learn more.
