Cohere’s first vision model Aya Vision is here with broad, multilingual understanding and open weights — but there’s a catch
Summary
Cohere For AI, the non-profit research division of Canadian AI startup Cohere, has launched Aya Vision, its first vision model.
The open-weight multimodal AI model integrates language and vision capabilities and supports 23 languages, making it appealing to a global audience.
Aya Vision is designed to enhance AI’s ability to interpret images, generate text, and describe visual content in natural language across multiple languages.
The model is available on Cohere’s website and on the AI developer platforms Hugging Face and Kaggle under a Creative Commons non-commercial license, allowing researchers and developers to use, modify, and share it for non-commercial purposes (a minimal loading sketch follows this summary).
Aya Vision is also available through WhatsApp, where users can interact with the model directly.
The model’s efficiency and performance relative to its size are standout features: it outperforms larger multimodal models on several key benchmarks.
Cohere For AI credits this to innovations such as synthetic data generation for training, multilingual data scaling, and model-merging techniques (one common form of merging is sketched below).
Although Cohere primarily caters to enterprises, the restrictive non-commercial licensing terms limit Aya Vision’s business use.
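
For readers who want to try the open weights, the following is a minimal sketch of loading the model from Hugging Face with the transformers library. The checkpoint name CohereForAI/aya-vision-8b, the placeholder image URL, and the generation settings are assumptions not confirmed by this article; check the model card for the exact identifiers and the required transformers version.

```python
# Minimal sketch: querying Aya Vision via Hugging Face transformers.
# The model ID and exact usage are assumptions; consult the model card.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereForAI/aya-vision-8b"  # assumed checkpoint name

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

# Chat-style input mixing an image with a multilingual text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
            {"type": "text", "text": "Décris cette image en une phrase."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.tokenizer.decode(
    output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
))
```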
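
The article does not explain what "model merging" means here. One common form in the literature is linear weight interpolation between separately trained checkpoints of the same architecture ("model soup" style averaging); the generic PyTorch sketch below illustrates that idea only and is not Cohere's published recipe.

```python
# Generic illustration of model merging by linear weight interpolation.
# This shows a common technique, not Cohere For AI's actual procedure.
import torch

def merge_state_dicts(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Return alpha * A + (1 - alpha) * B for two compatible state dicts."""
    merged = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]  # assumes identical architectures and keys
        merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return merged

# Usage: load two fine-tuned checkpoints of the same model and blend them, e.g.
# model.load_state_dict(merge_state_dicts(torch.load("a.pt"), torch.load("b.pt"), 0.5))
```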