Aya Multimodal
Aya Vision is a state-of-the-art multimodal and massively multilingual large language model that excels at critical benchmarks for language, text, and image capabilities. A natural extension of the Aya Expanse model family, Aya Vision provides deep capability in 23 languages, helping eliminate technological and communication divides between people and geographies.
Built as a foundation for multilingual and multimodal communication, Aya Vision supports tasks such as image captioning, visual question answering, text generation, and translation of both text and images into coherent text.
Model Details
Multimodal Capabilities
Aya Vision’s multimodal capabilities enable it to understand content across different media types, taking both text and images as input. Purpose-built to unify cultures, geographies, and people, Aya Vision is optimized for elite performance in 23 languages. It can generate descriptive captions for images and interpret them dynamically to answer a wide range of questions about them. It also supports question answering and translation across both written and image-based materials, laying a foundation to bridge communication and collaboration divides.
Like Aya Expanse, Aya Vision is highly proficient in 23 languages, making it a valuable tool for researchers, academics, and developers working on multilingual projects.
How Can I Get Access to the Aya Models?
If you want to test Aya, you have three options. First (and simplest), you can use the Cohere playground or Hugging Face Space to play around with them and see what they’re capable of.
Second, you can use the Cohere Chat API to work with Aya programmatically. Here’s a lightweight example of using the Cohere SDK to have Aya Vision describe the contents of an image; if you haven’t installed the Cohere SDK, you can do that with `pip install cohere`.
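Below is a minimal sketch of a `generate_text` helper built on the Cohere Chat API. The model ID, base64 image handling, and response access shown here are assumptions based on the current Python SDK; check Cohere’s API reference for the exact model names and request format.

```python
import base64
import cohere

co = cohere.ClientV2(api_key="<YOUR_API_KEY>")

def generate_text(image_path, message):
    # Read the image from disk and encode it as a base64 data URL,
    # which the Chat API accepts as image input.
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    data_url = f"data:image/jpeg;base64,{encoded}"

    # Send the text prompt and the image together in a single user turn.
    response = co.chat(
        model="c4ai-aya-vision-32b",  # assumed model ID; verify against Cohere's model list
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": message},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
    )
    print(response.message.content[0].text)

# Example usage (image path is a placeholder):
# generate_text("room.jpg", "What items are on the wall of this room?")
```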
Here’s an image we might feed to Aya Vision:
And here’s an example output we might get when we run `generate_text(image_path, "What items are on the wall of this room?")` (remember: these models are stochastic, and the output you see might look quite different).
Finally, because Cohere For AI has released Aya Vision as open-weight models on Hugging Face, you can download the models directly for research purposes. We have also released a new evaluation set, the Aya Vision Benchmark, to measure progress on multilingual vision models.
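As an illustration, here is a hedged sketch of loading the open weights with the Hugging Face transformers library. The checkpoint ID, image URL, and generation settings are assumptions drawn from typical image-text-to-text usage; consult the model card on Hugging Face for the exact, up-to-date snippet.

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereForAI/aya-vision-8b"  # assumed checkpoint name; verify on the model card

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

# A chat-style message that pairs an image (by URL) with a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/room.jpg"},  # placeholder image URL
            {"type": "text", "text": "What items are on the wall of this room?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a response and decode only the newly generated tokens.
gen_tokens = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.3)
output = processor.tokenizer.decode(
    gen_tokens[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(output)
```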
Aya Expanse Integration with WhatsApp
On our Aya Expanse page, you can find more information about the Aya models in general, including a detailed FAQ section that walks through using Aya Vision with WhatsApp.
Find More
We hope you’re as excited about the possibilities of Aya Vision as we are! If you want to see more substantial projects, you can check out these notebooks: