Cohere Releases Arabic-Optimized Command Model!

Cohere is thrilled to announce the release of Command R7B Arabic (c4ai-command-r7b-12-2024). This is an open-weights release of an advanced, 7-billion-parameter custom model optimized for Modern Standard Arabic (MSA), in addition to English. As with Cohere’s other Command models, this one comes with a context length of 128,000 tokens. It excels at a number of critical enterprise tasks, including instruction following, length control, retrieval-augmented generation (RAG), and minimizing code-switching, and it demonstrates excellent general-purpose knowledge and understanding of the Arabic language and culture.

Try Command R7B Arabic

Trying Command R7B Arabic is easy: you can use it through the Cohere playground or in our dedicated Hugging Face Space.

Alternatively, you can use the model in your own code. To do that, first install the transformers library from its source repository:

pip install 'git+https://github.com/huggingface/transformers.git'

Then, use this Python snippet to run a simple text-generation task with the model:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r7b-12-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the message with the c4ai-command-r7b-12-2024 chat template
messages = [{"role": "user", "content": "مرحبا، كيف حالك؟"}]  # "Hello, how are you?"
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)

Chat Capabilities

Command R7B Arabic can be operated in two modes, “conversational” and “instruct”:

  • Conversational mode conditions the model on interactive behaviour, meaning it is expected to reply in a conversational fashion, provide introductory statements and follow-up questions, and use Markdown as well as LaTeX where appropriate. This mode is optimized for interactive experiences, such as chatbots, where the model engages in dialogue.
  • Instruct mode conditions the model to provide concise yet comprehensive responses, and to not use Markdown or LaTeX by default. This mode is designed for non-interactive, task-focused use cases such as extracting information, summarizing text, translation, and categorization.
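The announcement doesn’t show how to select a mode in code; one common approach, assuming the model honours a system-role preamble in its chat template (an assumption, not an official API), is to steer behaviour with the first message:

```python
# Hypothetical mode-steering preambles; the exact wording is illustrative,
# not a prescribed prompt for this model.
CHAT_PREAMBLE = (
    "You are a conversational assistant. Reply in a dialogue style, with "
    "introductory statements and follow-up questions, using Markdown where helpful."
)
INSTRUCT_PREAMBLE = (
    "You are a task-focused assistant. Reply concisely and comprehensively, "
    "and do not use Markdown or LaTeX."
)

def build_messages(user_text, mode="conversational"):
    """Prepend a mode-steering system message to a single user turn."""
    preamble = CHAT_PREAMBLE if mode == "conversational" else INSTRUCT_PREAMBLE
    return [
        {"role": "system", "content": preamble},
        {"role": "user", "content": user_text},
    ]

messages = build_messages("لخص هذا النص", mode="instruct")  # "Summarize this text"
```

The resulting `messages` list drops straight into `tokenizer.apply_chat_template` as in the snippet above.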

Multilingual RAG Capabilities

Command R7B Arabic has been trained specifically for Arabic and English tasks, such as the generation step of retrieval-augmented generation (RAG).

Command R7B Arabic’s RAG functionality is supported through chat templates in Transformers. Using our RAG chat template, the model takes a conversation (with an optional user-supplied system preamble) and a list of document snippets as input. The resulting output contains a response with in-line citations. Here’s what that looks like:

# Define the conversation input
conversation = [
    {
        "role": "user",
        # "Suggest a dish that blends flavours from several Arab countries"
        "content": "اقترح طبقًا يمزج نكهات من عدة دول عربية",
    }
]

# Define documents for retrieval-based generation
documents = [
    {
        "heading": "المطبخ العربي: أطباقنا التقليدية",  # "Arab cuisine: our traditional dishes"
        "body": "يشتهر المطبخ العربي بأطباقه الغنية والنكهات الفريدة. في هذا المقال، سنستكشف ...",
    },
    {
        "heading": "وصفة اليوم: مقلوبة",  # "Today's recipe: maqluba"
        "body": "المقلوبة هي طبق فلسطيني تقليدي، يُحضر من الأرز واللحم أو الدجاج والخضروات. في وصفتنا اليوم ...",
    },
]

# Render the RAG prompt as a string
input_prompt = tokenizer.apply_chat_template(
    conversation=conversation,
    documents=documents,
    tokenize=False,
    add_generation_prompt=True,
)

# Tokenize the prompt
input_ids = tokenizer(input_prompt, return_tensors="pt")

You can then generate text from this input as normal.

Notes on Usage

We recommend supplying document snippets as short chunks (around 100-400 words each) rather than long documents. They should be formatted as key-value pairs, where the keys are short descriptive strings and the values are plain text or semi-structured content.
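As a minimal sketch of that chunking step (the 300-word limit and the `heading` naming scheme are illustrative choices, not requirements):

```python
def chunk_words(text, max_words=300):
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

long_text = "word " * 1000  # stand-in for a real source document

# Format each chunk as a key-value snippet for the documents parameter
documents = [
    {"heading": f"Chunk {i + 1}", "body": chunk}
    for i, chunk in enumerate(chunk_words(long_text))
]
```

Each resulting snippet stays within the recommended size and carries a short descriptive key, matching the format the RAG template expects.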

You may find that simply including relevant documents directly in a user message works as well as or better than using the documents parameter to render the special RAG template (though the template is a strong default for those wanting citations). We encourage users to experiment with both approaches, and to evaluate which mode works best for their specific use case.
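As a sketch of the first approach, the snippets can simply be rendered into the user message itself; the `##` heading format below is an arbitrary choice for readability, not a prescribed layout:

```python
documents = [
    {"heading": "Doc A", "body": "First snippet of background text."},
    {"heading": "Doc B", "body": "Second snippet of background text."},
]

# Render the snippets inline instead of passing the documents parameter
context = "\n\n".join(f"## {d['heading']}\n{d['body']}" for d in documents)
messages = [
    {
        "role": "user",
        "content": f"{context}\n\nUsing the documents above, answer: ...",
    }
]
```

Note that this inline approach forgoes the in-line citations produced by the RAG chat template, so it suits use cases where citations are not needed.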