Aya Vision


Aya Vision is a state-of-the-art multimodal and massively multilingual large language model excelling at critical benchmarks for language, text, and image capabilities. A natural extension of the Aya Expanse model family, Aya Vision provides deep capability in 23 languages, helping eliminate technological and communication divides between people and geographies.

Built as a foundation for multilingual and multimodal communication, Aya Vision supports tasks such as image captioning, visual question answering, text generation, and translation of both text and images into coherent text.

Model Details

| Model Name | Description | Modality | Context Length | Maximum Output Tokens | Endpoints |
|---|---|---|---|---|---|
| `c4ai-aya-vision-32b` | Aya Vision is a state-of-the-art multimodal model excelling at a variety of critical benchmarks for language, text, and image capabilities. Serves 23 languages. This 32-billion-parameter variant is focused on state-of-the-art multilingual performance. | Text, Images | 16k | 4k | Chat |

Multimodal Capabilities

Aya Vision’s multimodal capabilities let it understand content across different media types, accepting both text and images as input. Purpose-built to unify cultures, geographies, and people, Aya Vision is optimized for high performance in 23 languages. Its image captioning capability lets it generate descriptive captions for images, and it can interpret images to answer a wide range of questions about them. Aya Vision also supports question answering and translation across these materials, whether written or image-based, laying a foundation for bridging communication and collaboration divides.
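Captioning, visual question answering, and image-text translation all share one request shape: a user message whose content pairs a text prompt with an image. The sketch below assumes the Chat API v2 message format used in the SDK example on this page. `build_messages`, `TASK_PROMPTS`, and the prompt wording are hypothetical illustrations, not official Cohere templates.

```python
# Hypothetical prompt templates; the wording is illustrative, not an official Cohere template.
TASK_PROMPTS = {
    "caption": "Write a short, descriptive caption for this image.",
    "vqa": "{question}",
    "translate": "Translate all text visible in this image into {language}.",
}


def build_messages(image_url: str, task: str, **kwargs: str) -> list[dict]:
    """Build a Chat API v2 message list that pairs one text prompt with one image."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    # Fill in task-specific placeholders such as {question} or {language}
    prompt = TASK_PROMPTS[task].format(**kwargs)
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]
```

For example, `build_messages(url, "translate", language="French")` returns a list you could pass as `messages=` to `co.chat(model="c4ai-aya-vision-32b", ...)`. Only the text prompt changes between tasks; the image part stays the same.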

Like Aya Expanse, Aya Vision is highly proficient in 23 languages, making it a valuable tool for researchers, academics, and developers working on multilingual projects.

How Can I Get Access to the Aya Models?

If you want to test Aya, you have three options. First (and simplest), you can use the Cohere playground or a Hugging Face Space to try the models and see what they’re capable of.

Second, you can use the Cohere Chat API to work with Aya programmatically. Here’s a very lightweight example of using the Cohere SDK to get Aya Vision to describe the contents of an image; if you haven’t installed the Cohere SDK, you can do that with `pip install cohere`.

```python
import base64
import os

import cohere


def generate_text(image_path, message):
    model = "c4ai-aya-vision-32b"

    # Read the API key from the CO_API_KEY environment variable instead of hard-coding it
    co = cohere.ClientV2(os.environ["CO_API_KEY"])

    # Encode the image as a base64 data URL (this assumes a JPEG file)
    with open(image_path, "rb") as img_file:
        base64_image_url = f"data:image/jpeg;base64,{base64.b64encode(img_file.read()).decode('utf-8')}"

    # Send the text prompt and the image together in a single user message
    response = co.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": message},
                    {
                        "type": "image_url",
                        "image_url": {"url": base64_image_url},
                    },
                ],
            }
        ],
        temperature=0.3,
    )

    # The reply text is in the first content block of the response message
    print(response.message.content[0].text)
```
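The example above hard-codes `image/jpeg` in the data URL. If you want to send PNGs or other formats, a small helper can infer the MIME type from the file name with Python’s standard `mimetypes` module. This is a minimal sketch: `image_to_data_url` is a hypothetical name, not part of the Cohere SDK, and it does not check which image formats the API actually accepts.

```python
import base64
import mimetypes


def image_to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    # Refuse anything that does not look like an image file
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Inside `generate_text`, you could replace the `base64_image_url` line with `image_to_data_url(img_file.read(), image_path)` so the data URL carries the file’s real type.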

Here’s an image we might feed to Aya Vision:

[Image: a guitar-focused room]

And here’s an example output we might get when we run `generate_text(image_path, "What items are in the wall of this room?")` (remember: these models are stochastic, so what you see might look quite different):

The wall in this room showcases a collection of musical instruments and related items, creating a unique and personalized atmosphere. Here's a breakdown of the items featured:
1. **Guitar Wall Mount**: The centerpiece of the wall is a collection of guitars mounted on a wall. There are three main guitars visible:
   - A blue electric guitar with a distinctive design.
   - An acoustic guitar with a turquoise color and a unique shape.
   - A red electric guitar with a sleek design.
2. **Ukulele Display**: Above the guitars, there is a display featuring a ukulele and its case. The ukulele has a traditional wooden body and a colorful design.
3. **Artwork and Posters**:
   - A framed poster or artwork depicting a scene from *The Matrix*, featuring the iconic green pill and red pill.
   - A framed picture or album artwork of *Fleetwood Mac McDonald*, including *Rumours*, *Tusk*, and *Dreams*.
   - A framed image of the *Dark Side of the Moon* album cover by Pink Floyd.
   - A framed poster or artwork of *Star Wars* featuring *R2-D2* (Robotic Man).
4. **Album Collection**: Along the floor, there is a collection of vinyl records or album artwork displayed on a carpeted area. Some notable albums include:
   - *Dark Side of the Moon* by Pink Floyd.
   - *The Beatles* (White Album).
   - *Abbey Road* by The Beatles.
   - *Nevermind* by Nirvana.
5. **Lighting and Accessories**:
   - A blue lamp with a distinctive design, possibly serving as a floor lamp.
   - A small table lamp with a warm-toned shade.

Finally, because Cohere Labs has released Aya Vision as open-weight models on Hugging Face, you can download the raw models directly for research purposes. We have also released a new evaluation set, the Aya Vision Benchmark, to measure progress on multilingual models.

Find More

We hope you’re as excited about the possibilities of Aya Vision as we are! If you want to see more substantial projects, you can check out these notebooks:

  • Walkthrough and Use Cases