Announcing Cohere's Command A Vision Model

We’re excited to announce the release of Command A Vision, Cohere’s first commercial model capable of understanding and interpreting visual data alongside text. This addition to our Command family brings enterprise-grade vision capabilities to your applications with the same familiar Command API interface.

Key Features

Multimodal Capabilities

Text + Image Processing: Combine text prompts with image inputs
Enterprise-Focused Use Cases: Optimized for business applications like document analysis, chart interpretation, and OCR
Multiple Languages: Officially supports English, Portuguese, Italian, French, German, and Spanish

Technical Specifications

Model Name: command-a-vision-07-2025
Context Length: 128K tokens
Maximum Output: 8K tokens
Image Support: Up to 20 images per request (or 20MB total)
API Endpoint: Chat API
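
As a rough illustration of the image limits listed above, a client-side check before sending a request might look like the sketch below. The helper name `check_image_limits` is hypothetical and not part of the Cohere SDK. The sketch assumes that "20MB" means 20 * 1024 * 1024 bytes and that the limit applies to the raw image sizes; the Image Inputs documentation is the authority on how the API actually measures this.

```python
from typing import Sequence

# Limits taken from the specifications above.
MAX_IMAGES_PER_REQUEST = 20
# Assumption: "20MB" is interpreted as 20 MiB of raw image bytes.
MAX_TOTAL_IMAGE_BYTES = 20 * 1024 * 1024


def check_image_limits(image_sizes: Sequence[int]) -> None:
    """Raise ValueError if the images would exceed either per-request limit.

    `image_sizes` holds the size in bytes of each image in the request.
    This is a hypothetical local pre-check, not a Cohere SDK function.
    """
    if len(image_sizes) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"{len(image_sizes)} images exceeds the limit of "
            f"{MAX_IMAGES_PER_REQUEST} per request"
        )
    total = sum(image_sizes)
    if total > MAX_TOTAL_IMAGE_BYTES:
        raise ValueError(
            f"{total} bytes of image data exceeds the limit of "
            f"{MAX_TOTAL_IMAGE_BYTES} bytes per request"
        )
```

Calling `check_image_limits([len(data) for data in images])` before building the request lets an application fail fast instead of waiting for the API to reject an oversized payload.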

What You Can Do

Command A Vision excels in enterprise use cases including:

📊 Chart & Graph Analysis: Extract insights from complex visualizations
📋 Table Understanding: Parse and interpret data tables within images
📄 Document OCR: Optical character recognition with natural language processing
🌐 Multilingual Image Processing: Handle text that appears in images across multiple languages
🔍 Scene Analysis: Identify and describe objects within images

💻 Getting Started

The API structure is identical to that of our existing Command models, making integration straightforward:

import cohere

# The messages-based chat interface is exposed by the V2 client.
co = cohere.ClientV2("your-api-key")

response = co.chat(
    model="command-a-vision-07-2025",
    messages=[
        {
            "role": "user",
            # A single user turn can mix text parts and image parts.
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart and extract the key data points",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "your-image-url"},
                },
            ],
        }
    ],
)
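
To combine one prompt with several images, the message structure shown above can be assembled programmatically. The following is a minimal sketch, not an official helper: `build_vision_message` and `image_bytes_to_data_url` are hypothetical names. It also assumes that the `url` field accepts a base64 data URL for local images; please confirm this against the Image Inputs documentation before relying on it.

```python
import base64
from typing import Any, Dict, List


def image_bytes_to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL.

    Assumption: the API accepts data URLs in the "url" field.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_message(prompt: str, image_urls: List[str]) -> Dict[str, Any]:
    """Build one user message with a text part followed by one part per image.

    Hypothetical helper that mirrors the message shape used in the example above.
    """
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

The returned dictionary can be passed as an element of the `messages` list in the `co.chat(...)` call, for example `messages=[build_vision_message("Compare these charts", urls)]`.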

For more on working with images, including limitations and best practices, see our dedicated Command A Vision and Image Inputs documents.