New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks
1 min read
Summary
Canadian AI start-up Cohere has launched Command A Vision, a new model designed to help enterprises extract insights through optical character recognition (OCR) and image analysis.
Built on top of Cohere’s existing Command A model, Command A Vision is a vision model that requires just two GPUs to run and supports at least 23 languages.
In tests, the model outperformed competitors such as OpenAI’s GPT-4.1, Meta’s Llama 4 Maverick, and Mistral’s Pixtral Large and Medium 3 on benchmarks including ChartQA, OCRBench, AI2D and TextVQA.
This performance makes the model well suited to analysing the graphs, charts, diagrams and PDFs commonly used in enterprises.
Cohere has released the model with open weights in the hope that its tools will be adopted by enterprises that wish to move away from closed, proprietary models.