Simular AI has built an AI agent called S2 that combines frontier models with models designed specifically for computer use, giving it state-of-the-art performance on tasks such as using apps and manipulating files.
It uses a powerful general-purpose AI model to reason about how best to complete a task, while smaller open-source models handle narrower subtasks, such as interpreting web pages.
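The division of labour described above can be sketched roughly as follows. This is a toy illustration only, not Simular's implementation: `generalist_plan`, `page_reader`, `action_executor` and the `SPECIALISTS` table are hypothetical stand-ins for the large planning model and the smaller specialist models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    kind: str     # which specialist should handle this step
    payload: str  # what the specialist is asked to do


def generalist_plan(task: str) -> List[Step]:
    """Hypothetical stand-in for the general-purpose model that plans the task."""
    return [Step("read_page", task), Step("act", f"complete: {task}")]


def page_reader(payload: str) -> str:
    """Hypothetical stand-in for a small model that interprets a web page."""
    return f"[page summary for '{payload}']"


def action_executor(payload: str) -> str:
    """Hypothetical stand-in for a small model that carries out a UI action."""
    return f"[executed '{payload}']"


# Assumed routing table: step kind -> specialist model.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "read_page": page_reader,
    "act": action_executor,
}


def run_task(task: str) -> List[str]:
    """Plan with the generalist, then route each step to a specialist."""
    results = []
    for step in generalist_plan(task):
        handler = SPECIALISTS[step.kind]
        results.append(handler(step.payload))
    return results


print(run_task("rename file"))
```

The point of the split is that the expensive model is only consulted for planning, while cheaper models do the repetitive perception and execution work.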
It also learns from experience, recording user feedback and past actions so that it can improve its future behaviour.
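One simple way to picture learning from recorded feedback is a memory that scores past actions per task and prefers the ones that worked. The sketch below is an assumption about how such a mechanism could look, not a description of S2's internals; `ExperienceMemory` and its methods are hypothetical names.

```python
from collections import defaultdict


class ExperienceMemory:
    """Toy memory: remembers which actions succeeded for which tasks."""

    def __init__(self) -> None:
        # task -> action -> running score (successes minus failures)
        self._scores = defaultdict(lambda: defaultdict(int))

    def record(self, task: str, action: str, success: bool) -> None:
        """Store one piece of feedback about an action taken for a task."""
        self._scores[task][action] += 1 if success else -1

    def best_action(self, task: str, default: str) -> str:
        """Return the best-scoring known action, or the default if none is recorded."""
        actions = self._scores.get(task)
        if not actions:
            return default
        return max(actions, key=actions.get)


memory = ExperienceMemory()
memory.record("save file", "use menu", success=False)
memory.record("save file", "use shortcut", success=True)
print(memory.best_action("save file", default="use menu"))
```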
However, even the most advanced AI agents are still troubled by edge cases and can sometimes exhibit odd behaviour.
One possible solution is to add human intelligence to the mix, with systems like CowPilot, a Chrome plugin that allows a human to intervene if the AI gets stuck on a task.
A human working together with an AI agent can complete more tasks than either party alone: in tests, the human-agent pairing completed 95% of given tasks, with the human performing only about 15% of the steps.
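The intervene-when-stuck pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CowPilot's actual code: `run_with_human_fallback`, `human_share` and the example agent and human functions are all hypothetical, and an agent step returning `None` is assumed to mean the agent is stuck.

```python
from typing import Callable, List, Optional, Tuple

LogEntry = Tuple[str, str, str]  # (step, actor, result)


def run_with_human_fallback(
    steps: List[str],
    agent_step: Callable[[str], Optional[str]],
    human_step: Callable[[str], str],
) -> List[LogEntry]:
    """Let the agent try each step; hand over to the human when it is stuck."""
    log: List[LogEntry] = []
    for step in steps:
        result = agent_step(step)
        actor = "agent"
        if result is None:  # assumed signal that the agent could not proceed
            result = human_step(step)
            actor = "human"
        log.append((step, actor, result))
    return log


def human_share(log: List[LogEntry]) -> float:
    """Fraction of steps that the human had to perform."""
    if not log:
        return 0.0
    return sum(1 for _, actor, _ in log if actor == "human") / len(log)


def example_agent(step: str) -> Optional[str]:
    # Hypothetical agent that cannot handle a CAPTCHA.
    return None if step == "captcha" else f"agent did {step}"


def example_human(step: str) -> str:
    return f"human did {step}"


log = run_with_human_fallback(
    ["open", "captcha", "submit", "confirm"], example_agent, example_human
)
print(human_share(log))
```

In this toy run the human handles one of the four steps, which shows how a small amount of human input can unblock a task that the agent would otherwise fail.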