Tiny Aya
Tiny Aya is a compact 3.35-billion-parameter multilingual model that achieves state-of-the-art translation quality, strong multilingual understanding, and high-quality target-language generation across 70 languages. Despite its small size, Tiny Aya is designed to run locally on any device, including phones, without depending on the cloud.
Tiny Aya represents an alternative scaling path for multilingual AI: one centered on efficiency, balanced performance across languages, and practical deployment rather than simply increasing parameters.
Supported Languages
Tiny Aya supports 70 languages: English, Dutch, French, Italian, Portuguese, Romanian, Spanish, Czech, Polish, Ukrainian, Russian, Greek, German, Danish, Swedish, Norwegian, Catalan, Galician, Welsh, Irish, Basque, Croatian, Latvian, Lithuanian, Slovak, Slovenian, Estonian, Finnish, Hungarian, Serbian, Bulgarian, Arabic, Persian, Urdu, Turkish, Maltese, Hebrew, Hindi, Marathi, Bengali, Gujarati, Punjabi, Tamil, Telugu, Nepali, Tagalog, Malay, Indonesian, Vietnamese, Javanese, Khmer, Thai, Lao, Chinese, Burmese, Japanese, Korean, Amharic, Hausa, Igbo, Malagasy, Shona, Swahili, Wolof, Xhosa, Yoruba, and Zulu.
Model Variants
Tiny Aya is released as a family of five models: a pretrained foundation model, a globally balanced instruction-tuned variant (tiny-aya-global), and three region-specialized variants (tiny-aya-earth, tiny-aya-fire, tiny-aya-water).
Key Capabilities
Tiny Aya excels at the following tasks:
- Translation: State-of-the-art multilingual translation quality across 70 languages.
- Multilingual understanding: Strong performance on multilingual comprehension benchmarks.
- Target-language generation: High-quality text generation in the target language.
- On-device deployment: Compact enough to run locally on phones and edge devices.
How Can I Access Tiny Aya?
The instruction-tuned Tiny Aya variants (tiny-aya-global, tiny-aya-earth, tiny-aya-fire, tiny-aya-water) are available on the Cohere API via the Chat endpoint. You can use them with the Cohere SDK just like any other model:
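A minimal sketch with the Cohere Python SDK's v2 Chat API is shown below. The model id `tiny-aya-global` comes from the variant list above, but verify the exact id against the API's model listing; the prompt and environment-variable name are illustrative.

```python
import os

# Example chat message for a translation request (illustrative prompt).
messages = [
    {"role": "user", "content": "Translate to French: The weather is lovely today."}
]

# Guarded so the sketch is safe to run without credentials; set CO_API_KEY
# to your Cohere API key to actually call the endpoint.
api_key = os.environ.get("CO_API_KEY")
if api_key:
    import cohere

    co = cohere.ClientV2(api_key=api_key)
    response = co.chat(model="tiny-aya-global", messages=messages)
    print(response.message.content[0].text)
```

The region-specialized variants are used the same way; only the `model` argument changes.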
Tiny Aya is also available as open-weight models through Hugging Face. You can download and run the models locally:
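A sketch of local inference with the Hugging Face `transformers` library follows. The repository id below is a placeholder assumption; check the Tiny Aya model cards on Hugging Face for the exact ids. The run guard exists only so the snippet can be executed without triggering a multi-gigabyte weight download.

```python
import os

# Placeholder repo id — replace with the actual id from the model card.
MODEL_ID = os.environ.get("TINY_AYA_MODEL_ID", "CohereLabs/tiny-aya-global")

# Guard: set TINY_AYA_RUN=1 to download the weights and generate.
if os.environ.get("TINY_AYA_RUN"):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    messages = [{"role": "user", "content": "Translate to Spanish: Good morning!"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(input_ids, max_new_tokens=100)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```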
GGUF quantized versions are also available for efficient local inference:
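One way to run a GGUF build locally is with `llama-cpp-python`, sketched below. The filename is a placeholder for whichever quantization you downloaded (e.g. a Q4_K_M file from the model's Hugging Face page); the snippet only runs if that file is present on disk.

```python
import os

# Placeholder path to a downloaded GGUF quantization of Tiny Aya.
GGUF_PATH = os.environ.get("TINY_AYA_GGUF", "tiny-aya-global-Q4_K_M.gguf")

# Guard: only load the model if the file actually exists locally.
if os.path.exists(GGUF_PATH):
    from llama_cpp import Llama

    llm = Llama(model_path=GGUF_PATH, n_ctx=4096)
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Translate to German: Hello, world!"}],
        max_tokens=100,
    )
    print(result["choices"][0]["message"]["content"])
```

Quantized GGUF files trade a small amount of quality for a much lower memory footprint, which is what makes phone and edge deployment practical.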