Announcing Cohere's Command A Vision Model

We're excited to announce the release of Command A Vision, Cohere's first commercial model capable of understanding and interpreting visual data alongside text. This addition to our Command family brings enterprise-grade vision capabilities to your applications with the same familiar Command API interface.

Key Features

Multimodal Capabilities

  • Text + Image Processing: Combine text prompts with image inputs
  • Enterprise-Focused Use Cases: Optimized for business applications like document analysis, chart interpretation, and OCR
  • Multiple Languages: Officially supports English, Portuguese, Italian, French, German, and Spanish

Technical Specifications

  • Model Name: command-a-vision-07-2025
  • Context Length: 128K tokens
  • Maximum Output: 8K tokens
  • Image Support: Up to 20 images per request (or 20MB total)
  • API Endpoint: Chat API
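The image limits above lend themselves to a client-side pre-flight check. The following is a minimal sketch, not part of the Cohere SDK: the helper name `check_image_limits` is hypothetical, it assumes the image-count and total-size limits both apply to a single request, and it assumes "20MB" means 20 * 1024 * 1024 bytes.

```python
from typing import Iterable, List

# Limits taken from the specifications above.
MAX_IMAGES = 20
# Assumption: "20MB" is read as 20 MiB; adjust if the API counts 20 * 10**6 bytes.
MAX_TOTAL_BYTES = 20 * 1024 * 1024


def check_image_limits(sizes_in_bytes: Iterable[int]) -> List[str]:
    """Return a list of problems for one request; an empty list means it passes.

    Hypothetical helper for illustration only, not a Cohere SDK function.
    """
    sizes = list(sizes_in_bytes)
    problems = []
    if len(sizes) > MAX_IMAGES:
        problems.append(f"{len(sizes)} images exceeds the limit of {MAX_IMAGES}")
    total = sum(sizes)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"{total} bytes exceeds the limit of {MAX_TOTAL_BYTES}")
    return problems
```

In practice you might pass `os.path.getsize(path)` for each local image and skip the request if the returned list is non-empty.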

What You Can Do

Command A Vision excels in enterprise use cases including:

  • 📊 Chart & Graph Analysis: Extract insights from complex visualizations
  • 📋 Table Understanding: Parse and interpret data tables within images
  • 📄 Document OCR: Optical character recognition with natural language processing
  • 🌍 Image Processing for Multiple Languages: Handle text in images across multiple languages
  • 🔍 Scene Analysis: Identify and describe objects within images

💻 Getting Started

The API structure is identical to our existing Command models, making integration straightforward:

import cohere

# The `messages` list format belongs to the v2 Chat API, so use ClientV2.
co = cohere.ClientV2("your-api-key")

response = co.chat(
    model="command-a-vision-07-2025",
    messages=[
        {
            "role": "user",
            # A single user turn can mix text parts and image parts.
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart and extract the key data points",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "your-image-url"},
                },
            ],
        }
    ],
)

# Print the text of the model's reply.
print(response.message.content[0].text)
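If your image is a local file rather than a hosted URL, one common approach is to embed it as a base64 data URL in the same `image_url` field. The sketch below uses only the standard library. The helper names `image_to_data_url` and `build_user_message` are hypothetical, and the assumption that the API accepts `data:` URLs in this field should be checked against the Image Inputs documentation.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_user_message(prompt: str, image_path: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: the image_url field accepts a data URL as well as an http(s) URL.
            {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
        ],
    }
```

The returned dictionary has the same shape as the message in the example above, so it could be passed as an element of `messages`.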

For more on working with images, including limitations and best practices, see our dedicated Command A Vision and Image Inputs documents.