
Cohere's Command A Vision Model

Command A Vision model details and specifications

Command A Vision is Cohere's first multimodal model, capable of understanding and interpreting visual data alongside text. With a 128K context length and support for up to 20 images per request, Command A Vision excels at enterprise use cases including document analysis, chart interpretation, optical character recognition (OCR), and processing images featuring multiple languages. The model maintains the same API interface as other Command models, making it easy to integrate vision capabilities into existing applications.

Model Details

| Model Name | Description | Modality | Context Length | Maximum Output Tokens | Endpoints |
| --- | --- | --- | --- | --- | --- |
| `command-a-vision-07-2025` | Command A Vision is our first model capable of processing images, excelling in enterprise use cases such as analyzing charts, graphs, and diagrams, table understanding, OCR, document Q&A, and scene analysis. It officially supports English, Portuguese, Italian, French, German, and Spanish. | Text, Images | 128K | 8K | Chat |
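Because the model shares the Chat API interface with other Command models, a request simply attaches images to a user message. The sketch below builds such a message with an inline base64 data-URL image; the `build_vision_message` helper and the exact content-part field names are assumptions here, so verify them against the current Cohere API reference before use.

```python
import base64

def build_vision_message(prompt: str, image_bytes: bytes,
                         mime_type: str = "image/png") -> dict:
    # Hypothetical helper: pair a text prompt with one inline image,
    # encoded as a base64 data URL, in a single user chat message.
    data_url = "data:{};base64,{}".format(
        mime_type, base64.b64encode(image_bytes).decode("ascii"))
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Sending the request (requires the `cohere` SDK and an API key):
#
#   import cohere
#   co = cohere.ClientV2(api_key="<YOUR_API_KEY>")
#   response = co.chat(
#       model="command-a-vision-07-2025",
#       messages=[build_vision_message("Summarize this chart.", png_bytes)],
#   )
```

Up to 20 images can be included per request by appending additional `image_url` content parts to the same message.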

What Can Command A Vision be Used For?

Command A Vision excels at enterprise use cases such as:

  • Analysis of charts, graphs, and diagrams;
  • Extracting and understanding in-image tables;
  • Document optical character recognition (OCR) and question answering;
  • Natural-language image processing.

Limitations

Tool use is not supported with this model.

Also, note that Command A Vision accepts images as input but does not generate them.

For more detailed breakdowns of these and other applications, check out our cookbooks. To learn how token counts work with images, the maximum number of images per request, and related details, see our dedicated Image Inputs document.