Summary

  • Researchers from Salesforce and the University of Southern California have developed an AI agent that can execute code and navigate graphical user interfaces.
  • The system, called CoAct-1 (Computer-using Agent with Coding as Actions), consists of three specialised agents that work together to complete tasks, whether those are easiest done with point-and-click actions or require coding.
  • The system breaks each task down into subtasks and delegates each subtask to the agent best suited to it.
  • In the researchers’ tests, it achieved a success rate of 60.76% on the comprehensive OSWorld benchmark and solved tasks in an average of 10.15 steps, around four fewer than existing models.
  • The researchers believe the system could be used to automate complex, multi-tool processes in areas such as customer service, sales and marketing.
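
The split-and-delegate idea described in the points above can be illustrated with a toy sketch. This is not the researchers' implementation: the names `Subtask`, `plan`, `gui_agent`, `code_agent` and `run` are hypothetical, and the keyword rule stands in for the model-driven planning a real system would presumably use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    description: str
    kind: str  # "gui" or "code"; hypothetical labels, not taken from the paper


def plan(task: str) -> List[Subtask]:
    """Toy planner: split the task on ';' and label each part by keyword.

    Assumption: a real orchestrator would decide this with a model,
    not with a hard-coded rule.
    """
    subtasks = []
    for part in task.split(";"):
        part = part.strip()
        if not part:
            continue
        is_gui = part.lower().startswith(("click", "open"))
        subtasks.append(Subtask(part, "gui" if is_gui else "code"))
    return subtasks


def gui_agent(sub: Subtask) -> str:
    # Placeholder for an agent that would drive the graphical interface.
    return f"[gui] {sub.description}"


def code_agent(sub: Subtask) -> str:
    # Placeholder for an agent that would write and run a script.
    return f"[code] {sub.description}"


# Routing table from subtask kind to the agent that handles it.
AGENTS: Dict[str, Callable[[Subtask], str]] = {
    "gui": gui_agent,
    "code": code_agent,
}


def run(task: str) -> List[str]:
    """Plan the task, then hand each subtask to the matching agent."""
    return [AGENTS[sub.kind](sub) for sub in plan(task)]


if __name__ == "__main__":
    for line in run("open the spreadsheet; sum column B; click Save"):
        print(line)
```

In this sketch the routing table is the only place that knows which agent handles which kind of subtask, so adding another specialised agent would mean adding one entry and one label.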

By Ben Dickson

Original Article