Cohere’s Command A+ Model
Cohere’s Command A+ Model
For both trial keys and production keys, Command A+ is free until rate limits are reached. Learn more about rate limits for different models and key types here.
Command A+ can be used in production through Cohere's Model Vault.
Description
Command A+ is Cohere’s first Mixture of Experts model, bringing together the power of a strong agentic model, supporting vision and text on input, and expanding the language support to 48 languages, including all of the official European Union languages. With the MoE architecture, the model brings balance with providing a highly accurate user experience, while balancing out the need for enterprises to have a high throughput, low latency solution that minimizes the amount of GPUs required per instance, offering 1xB200 / 2xH100 support.
What Can Command A+ Be Used For?
Command A+ is excellent for:
- Complex, Multimodal Agentic Tasks: With mixed modality (vision / text) inputs, the model can autonomously take actions and interact with its environment to solve complex tasks.
- Multilingual Tasks: With support for 48 languages, combined with the model’s reasoning, agentic problem solving & vision processing capabilities, Command A+ expands not only the number of languages supported, but what enterprises across the globe can do with Cohere’s models.
- Balancing Accuracy & Efficiency: Providing support for deploying the model with as few as a single B200 or two H100 chips, and with vast improvements in reasoning, image processing, and agentic tasks over the rest of the Command A family, Command A+ is the fastest and most performant model in the Command A family by far.
There’s more to be said about token budgets, enabling and disabling the thinking operation, etc., which can be found in our dedicated Reasoning guide.
General
- Name of the model provider: Cohere Inc.
- Release Date: May 20, 2026
- Model dependencies: N/A
- Contact: support@cohere.com
Model Properties
- Model architecture: Command A+ is a sparse mixture-of-experts model.
- Input modalities: Text, image
- Output modalities: Text
- Model Size: 218B total, 25B active
Distribution and Technical Integration
- Cohere Deployment: Deployment options are subject to terms of use applicable to the deployment type or platform. Usage is subject to limitations set out in commercial agreements with customers, which includes Cohere’s Usage Policy.
- Open Source Deployment: Command A+ is available under an Apache 2.0 License on Hugging Face.
- Technical Integration and Required Software (if any): For Cohere Deployments, see our Installation Guide on getting started with Command A+. For Open Source Deployments, instructions are available on the HuggingFace page.
- Required Hardware: 1× B200 at W4A4 or 2× H100s at W4A4
Model Data and Training
- Pre-training: The model was trained on a large-scale collection of unlabelled data using self-supervised learning to learn general patterns, syntax, and semantics across multiple languages.
- Post-training: During post training, labelled datasets were used with supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to maximize the model’s performance over a wide spectrum of domains and capabilities.
- Data: Across both pre-training and post-training, various data sources and modalities are used, which include text, image, and audio content sourced from publicly available information, proprietary datasets developed or generated by Cohere with human annotation or through automated means, including synthetic data, and datasets sourced from specialized data vendors. In most cases, Cohere customers use Cohere models in their own environments, meaning Cohere has no access to inputs submitted to its models and does not use them for model training. De-identified data from the use of Cohere models on Cohere-hosted environments (e.g. user inputs) may be used in limited circumstances where permitted by user controls and Cohere’s relevant terms of service.
- Training Data Processing: Cohere employs various techniques to ensure data quality and suitability for training, including deduplication, filtering for toxic or harmful content, and quality filtering.