Google's DeepMind Uses Gemini to Train Robots for Complex Tasks and Navigation

TapTechNews, July 13 — The technology outlet The Verge reported yesterday that Google's DeepMind team is using Gemini to train its robots, enabling them to complete more complex tasks and navigate freely through complex environments.


The DeepMind team has published a new research paper that uses the long context window of Gemini 1.5 Pro (up to 2 million tokens) to let users interact with the RT-2 robot more easily through natural-language instructions.

TapTechNews note: the context window is the span of preceding tokens (text fragments) that a language model takes into account when making predictions or generating text.

It works as follows: researchers shoot a video tour of a designated area (such as a home or office space) and use Gemini 1.5 Pro to have the robot "watch" the video and learn the environment; the robot can then carry out commands based on what it has observed, responding with language and/or image output.

For example, when a user shows the robot a mobile phone and asks, "Where can I charge this?", the robot guides the user to an indoor power outlet.

DeepMind said that in a 9,000-square-foot (TapTechNews note: about 836 square meters) operating area, the Gemini-upgraded robot was tested on more than 50 user instructions and achieved a success rate of up to 90%.

Researchers also found preliminary evidence that Gemini 1.5 Pro enables the robot to plan how to carry out instructions that go beyond navigation.

For example, when a user with many cola cans on their desk asks the robot whether their favorite drink is available, Gemini knows the robot should navigate to the refrigerator, check whether there is cola, and then return to the user to report the result. DeepMind said it plans to study these results further.
