Summary

  • OpenCUA is an open-source computer-use agent (CUA) framework launched by researchers at the University of Hong Kong.
  • A CUA is designed to complete tasks on a computer autonomously, from navigating websites to operating complex software, and can help enterprises automate workflows.
  • Existing systems are usually proprietary, meaning their training methods, architectures and data are kept private, limiting transparency and raising safety concerns, according to the researchers.
  • OpenCUA aims to address these challenges by scaling both data collection and the models themselves. It includes the AgentNet tool, which records human demonstrations of computer tasks across different operating systems, as well as the AgentNet dataset.
  • The framework introduces a pipeline for processing data and training computer-use agents, turning raw human demonstrations into suitable training data for vision-language models, which are then trained on state-action pairs.
  • The code, dataset and model weights have been released.
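
To make the idea of turning a raw demonstration into state-action pairs more concrete, here is a minimal Python sketch. It is not OpenCUA's code: the event schema, the `RawEvent` and `StateActionPair` types, and the `to_state_action_pairs` function are all hypothetical, and the rule for merging keystrokes is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawEvent:
    """One low-level event from a recorded demonstration (hypothetical schema)."""
    timestamp: float
    kind: str      # "screenshot", "click", or "key"
    payload: str   # image path, "x,y" coordinates, or a single typed character


@dataclass
class StateActionPair:
    """A training example: the screen the user saw and the action taken on it."""
    screenshot: str
    action: str


def to_state_action_pairs(events: List[RawEvent]) -> List[StateActionPair]:
    """Pair each action with the most recent screenshot, merging keystroke runs."""
    pairs: List[StateActionPair] = []
    screenshot: Optional[str] = None
    typed: List[str] = []  # buffer for consecutive keystrokes

    def flush_typed() -> None:
        # Emit one "type" action for the buffered keystrokes, if any.
        if typed and screenshot is not None:
            pairs.append(StateActionPair(screenshot, 'type("' + "".join(typed) + '")'))
        typed.clear()

    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "key":
            typed.append(event.payload)
            continue
        # Any non-key event ends the current run of typing.
        flush_typed()
        if event.kind == "screenshot":
            screenshot = event.payload
        elif event.kind == "click" and screenshot is not None:
            pairs.append(StateActionPair(screenshot, f"click({event.payload})"))

    flush_typed()
    return pairs


demo = [
    RawEvent(0.0, "screenshot", "s0.png"),
    RawEvent(1.0, "click", "10,20"),
    RawEvent(2.0, "screenshot", "s1.png"),
    RawEvent(3.0, "key", "h"),
    RawEvent(4.0, "key", "i"),
]
pairs = to_state_action_pairs(demo)
```

In this sketch the five recorded events reduce to two training examples: a click paired with the first screenshot and a single typing action paired with the second. A real pipeline would also need to handle scrolling, dragging and other input types.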

By Ben Dickson
