Google’s new robot AI can fold delicate origami, close zipper bags without damage
Summary
Google DeepMind, the London-based AI research lab, has created two new AI systems, Gemini Robotics and Gemini Robotics-ER, to enable robots to better understand and interact with the physical world.
The models build on Gemini 2.0, the company’s multimodal foundation model, and are designed to control a range of robot types, including humanoid robot assistants.
While robot hardware is advancing, DeepMind says the harder problem is building an AI system that can pilot these robots through novel scenarios with safety and precision.
Of the two models, Gemini Robotics is a “vision-language-action” (VLA) system, which uses camera footage of a scene to let a robot recognise objects and carry out natural-language commands, while Gemini Robotics-ER adds an “embodied reasoning” capability with enhanced spatial understanding.
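To make the VLA idea concrete, the sketch below shows the kind of closed observe-infer-act control loop such a system implies. This is a minimal, hypothetical illustration, not DeepMind’s actual interface; vla_policy, get_frame, and send_action are placeholder names invented for the example.

```python
# Illustrative vision-language-action (VLA) control loop.
# The model call is stubbed so only the control-flow idea is shown.

import numpy as np

def vla_policy(image: np.ndarray, instruction: str) -> np.ndarray:
    """Placeholder for a VLA model: maps a camera frame plus a
    natural-language command to a low-level action (e.g. end-effector
    deltas). A real system would run a large multimodal model here."""
    # Stub: return a zero motion vector (dx, dy, dz, gripper).
    return np.zeros(4)

def control_loop(get_frame, send_action, instruction: str, steps: int = 100):
    """Closed loop: observe the scene, infer an action, act, repeat."""
    for _ in range(steps):
        frame = get_frame()                       # camera observation
        action = vla_policy(frame, instruction)   # model inference
        send_action(action)                       # command the robot

# Example run with dummy camera/actuator hooks:
control_loop(
    get_frame=lambda: np.zeros((224, 224, 3), dtype=np.uint8),
    send_action=lambda a: None,
    instruction="fold the paper along the crease",
)
```

The key design point the loop captures is that perception, language understanding, and motor output are handled by a single model pass per timestep, rather than by separate hand-engineered stages.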
DeepMind has also released the ASIMOV dataset to help researchers assess the safety implications of robotic actions in real-world scenarios.