We’re excited to announce the release of Command A Vision, Cohere’s first commercial model capable of understanding and interpreting visual data alongside text. This addition to our Command family brings enterprise-grade vision capabilities to your applications with the same familiar Command API interface.
Key Features
Multimodal Capabilities
Text + Image Processing: Combine text prompts with image inputs
Enterprise-Focused Use Cases: Optimized for business applications like document analysis, chart interpretation, and OCR
Image Support: Up to 20 images per request (or 20MB total)
API Endpoint: Chat API
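The image limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, assuming both limits apply at once (at most 20 images and at most 20 MB combined) and that 20 MB means 20 * 1024 * 1024 bytes. The helper `check_image_limits` and both constants are hypothetical names and are not part of the Cohere SDK.

```python
from typing import Sequence

# Assumed limits, taken from the feature list above. How the server counts
# size (MB vs MiB, raw vs encoded bytes) is an assumption in this sketch.
MAX_IMAGES = 20
MAX_TOTAL_BYTES = 20 * 1024 * 1024


def check_image_limits(image_sizes: Sequence[int]) -> None:
    """Raise ValueError if a batch of images exceeds the assumed limits.

    image_sizes holds the size of each image in bytes.
    """
    if len(image_sizes) > MAX_IMAGES:
        raise ValueError(
            f"{len(image_sizes)} images exceeds the limit of {MAX_IMAGES}"
        )
    total = sum(image_sizes)
    if total > MAX_TOTAL_BYTES:
        raise ValueError(
            f"{total} bytes exceeds the limit of {MAX_TOTAL_BYTES}"
        )
```

Calling `check_image_limits([len(data) for data in images])` before building the request lets an oversized batch fail early with a clear message instead of a rejected API call.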
What You Can Do
Command A Vision excels in enterprise use cases including:
📊 Chart & Graph Analysis: Extract insights from complex visualizations
📋 Table Understanding: Parse and interpret data tables within images
📄 Document OCR: Optical character recognition with natural language processing
🌐 Image Processing for Multiple Languages: Handle text in images across multiple languages
🔍 Scene Analysis: Identify and describe objects within images
💻 Getting Started
The API structure is identical to our existing Command models, making integration straightforward:
import cohere

# ClientV2 provides the messages-based Chat API used below
co = cohere.ClientV2("your-api-key")

response = co.chat(
    model="command-a-vision-07-2025",
    messages=[
        {
            "role": "user",
            "content": [
                # Text part: the instruction for the model
                {
                    "type": "text",
                    "text": "Analyze this chart and extract the key data points",
                },
                # Image part: the image the instruction refers to
                {
                    "type": "image_url",
                    "image_url": {"url": "your-image-url"},
                },
            ],
        }
    ],
)
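To send a local file instead of a hosted URL, one common approach is to embed the image as a base64 data URL in the `image_url` field. The sketch below assumes the Chat API accepts data URLs of the form `data:<mime-type>;base64,<data>`; check the Image Inputs document before relying on it. The helpers `image_to_data_url` and `build_user_message` are hypothetical names, not SDK functions.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    # The MIME type is guessed from the file extension only.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_user_message(prompt: str, image_urls: list[str]) -> dict:
    """Build a user message with one text part followed by image parts."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

The resulting dictionary has the same shape as the message in the example above, so it can be passed as an element of `messages` in the `co.chat` call.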
For more on working with images, including limitations and best practices, see our dedicated Command A Vision and Image Inputs documents.