Google DeepMind aims for helpful AI robots


Google DeepMind has introduced Gemini Robotics, new AI models designed to bring advanced reasoning and physical capabilities to robots.

Built on the foundation of Gemini 2.0, the new models represent a leap towards robots that can understand and interact with the physical world, extending capabilities that were previously confined to the digital realm.

The new models, Gemini Robotics and Gemini Robotics-ER (Embodied Reasoning), aim to enable robots to perform a wider range of real-world tasks by combining advanced vision, language, and action capabilities.

Gemini Robotics aims to bridge the digital-physical gap 

Until now, AI models like Gemini have excelled in multimodal reasoning across text, images, audio, and video. However, their abilities have largely been limited to digital applications.

To make AI models truly useful in everyday life, they must possess “embodied reasoning”: the ability to comprehend and react to the physical world, much like humans do.

Gemini Robotics addresses this challenge by introducing physical actions as a new output modality, allowing the model to directly control robots. Meanwhile, Gemini Robotics-ER enhances spatial understanding—enabling roboticists to integrate the model’s reasoning capabilities into their own systems.  
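
Google DeepMind has not published a programmatic interface for these models, but conceptually a vision-language-action model maps camera images and a text instruction to low-level robot commands. The sketch below illustrates that shape only; every type and function name is hypothetical:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    rgb_image: bytes   # latest camera frame
    instruction: str   # natural-language task, e.g. "pack the snack into the bag"

@dataclass
class Action:
    joint_deltas: List[float]  # target joint-angle changes for the arm
    gripper: float             # 0.0 = fully open, 1.0 = fully closed

def vla_policy(obs: Observation) -> Action:
    """Hypothetical stand-in for a vision-language-action model:
    images and text in, physical actions out as a new output modality."""
    # A real model would run inference here; this stub returns a no-op.
    return Action(joint_deltas=[0.0] * 7, gripper=0.0)
```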

These models represent a foundational step towards a new generation of helpful robots. By combining advanced AI with physical action, Google DeepMind is unlocking the potential for robots to assist in a variety of real-world settings, from homes to workplaces.

Key features of Gemini Robotics  

Gemini Robotics is designed with three core qualities in mind: generality, interactivity, and dexterity. These attributes ensure that the model can adapt to diverse situations, respond to dynamic environments, and perform complex tasks with precision.

Generality

Gemini Robotics leverages the world-understanding capabilities of Gemini 2.0 to generalise across novel situations. This means the model can tackle tasks it has never encountered before, adapt to new objects, and operate in unfamiliar environments. According to Google DeepMind, Gemini Robotics more than doubles the performance of state-of-the-art vision-language-action models on generalisation benchmarks.

Interactivity

To function effectively in the real world, robots must seamlessly interact with people and their surroundings. Gemini Robotics excels in this area, thanks to its advanced language understanding capabilities. The model can interpret and respond to natural language instructions, monitor its environment for changes, and adjust its actions accordingly.  

For example, if an object slips from a robot’s grasp or is moved by a person, Gemini Robotics can quickly replan and continue the task. This level of adaptability is crucial for real-world applications, where unpredictability is the norm.
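
DeepMind has not detailed its control stack, but a common way to achieve this behaviour is a closed-loop pattern in which the policy is re-queried at every step, so any change in the scene shows up in the next observation. The `robot` and `policy` interfaces below are hypothetical stand-ins:

```python
import time

def run_task(policy, robot, instruction: str, max_steps: int = 500) -> bool:
    """Re-query the policy at every control step so scene changes
    (a dropped object, an item moved by a person) are reflected
    in the very next action."""
    for _ in range(max_steps):
        frame = robot.camera()               # fresh observation each step
        action = policy(frame, instruction)  # the model effectively replans here
        robot.apply(action)
        if robot.task_complete():
            return True
        time.sleep(0.05)                     # ~20 Hz outer loop
    return False
```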

Dexterity

Many everyday tasks require fine motor skills that have traditionally been challenging for robots. Gemini Robotics, however, demonstrates remarkable dexterity, enabling it to perform complex, multi-step tasks such as folding origami or packing a snack into a Ziploc bag.

Multiple embodiments for diverse applications 

One of the standout features of Gemini Robotics is its ability to adapt to different types of robots. While the model was primarily trained using data from the bi-arm robotic platform ALOHA 2, it has also been successfully tested on other platforms, including the Franka arms used in academic labs.  

Google DeepMind is also collaborating with Apptronik to integrate Gemini Robotics into their humanoid robot, Apollo. This partnership aims to develop robots capable of completing real-world tasks with unprecedented efficiency and safety.  

Gemini Robotics-ER is a model designed specifically to enhance spatial reasoning. It allows roboticists to connect Gemini’s advanced reasoning abilities to their existing low-level controllers, enabling tasks such as object detection, 3D perception, and precise manipulation.

For instance, when shown a coffee mug, Gemini Robotics-ER can determine an appropriate two-finger grasp for picking it up by the handle and plan a safe trajectory for approaching it. Google DeepMind reports the model achieves a 2x-3x higher success rate than Gemini 2.0 on end-to-end tasks, making it a powerful tool for roboticists.
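
To illustrate how a spatial-reasoning model might sit in front of an existing low-level controller, here is a hedged sketch; the `query_spatial_model` endpoint, the `controller` interface, and the stubbed grasp values are all assumptions, not DeepMind’s published API:

```python
from dataclasses import dataclass

@dataclass
class Grasp:
    x: float     # grasp point in the robot's base frame (metres)
    y: float
    z: float
    yaw: float   # gripper rotation about the vertical axis (radians)

def query_spatial_model(image: bytes, prompt: str) -> Grasp:
    """Hypothetical call to an embodied-reasoning model that returns a
    grasp pose for the object described in the prompt."""
    return Grasp(x=0.42, y=-0.10, z=0.31, yaw=1.57)  # stubbed output

def pick_mug(controller, image: bytes) -> None:
    grasp = query_spatial_model(image, "two-finger grasp on the mug handle")
    controller.move_to(grasp.x, grasp.y, grasp.z + 0.10, grasp.yaw)  # approach from above
    controller.move_to(grasp.x, grasp.y, grasp.z, grasp.yaw)         # descend to the handle
    controller.close_gripper()
```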

Prioritising safety and responsibility

Google DeepMind says safety is a top priority and has implemented a layered approach to protect both robots and the people around them. This combines classic safety measures – such as collision avoidance and force limitation – with Gemini’s advanced reasoning capabilities.
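
A minimal sketch of how such layering could work, with hypothetical force and workspace limits (the actual mechanisms and thresholds have not been published):

```python
from typing import List, Optional

MAX_FORCE_N = 20.0          # hypothetical contact-force cap (newtons)
WORKSPACE_M = (-0.5, 0.5)   # hypothetical allowed range per axis (metres)

def safety_filter(target_pose: List[float], sensed_force: float) -> Optional[List[float]]:
    """Classical safety layer applied after the model proposes a motion:
    veto on excessive contact force, clamp targets to the permitted workspace."""
    if sensed_force > MAX_FORCE_N:
        return None  # force limitation: abort the motion entirely
    return [max(WORKSPACE_M[0], min(WORKSPACE_M[1], v)) for v in target_pose]
```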

To further advance safety research, Google DeepMind is releasing the ASIMOV dataset, a new resource for evaluating and improving semantic safety in embodied AI and robotics. The dataset is inspired by Isaac Asimov’s Three Laws of Robotics and aims to help researchers develop robots that are safer and more aligned with human values.

Google DeepMind is working with a select group of testers – including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools – to explore the capabilities of Gemini Robotics-ER. Google says these collaborations will help refine the models and guide their development towards real-world applications.

By combining advanced reasoning with physical action, Google DeepMind is paving the way for a future where robots can assist humans in a wide range of tasks—from household chores to industrial applications.  

See also: ‘Golf bag’ of robots will tackle hazardous environments

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.
