Google DeepMind introduced a vision language action (VLA) model that runs locally on robotic devices, without accessing a data network.

The new Gemini Robotics On-Device robotics foundation model features general-purpose dexterity and fast task adaptation, the company said in a Tuesday (June 24) blog post.

“Since the model operates independent of a data network, it’s helpful for latency sensitive applications and ensures robustness in environments with intermittent or zero connectivity,” Google DeepMind Senior Director and Head of Robotics Carolina Parada said in the post.

Building on the task generalization and dexterity capabilities of Gemini Robotics, which was introduced in March, Gemini Robotics On-Device is meant for bi-arm robots and is designed to enable rapid experimentation with dexterous manipulation and adaptability to new tasks through fine-tuning, according to the post.

The model follows natural language instructions and is dexterous enough to perform tasks like unzipping bags, folding clothes, zipping a lunchbox, drawing a card, pouring salad dressing and assembling products, per the post.

It is also Google DeepMind’s first VLA model that is available for fine-tuning, per the post.

“While many tasks will work out of the box, developers can also choose to adapt the model to achieve better performance for their applications,” Parada said in the post. “Our model quickly adapts to new tasks, with as few as 50 to 100 demonstrations — indicating how well this on-device model can generalize its foundational knowledge to new tasks.”

Google DeepMind’s Gemini Robotics is one of several efforts by companies to develop humanoid robots that can do general tasks, PYMNTS reported in March.

Robotics is in fashion in Silicon Valley as large language models give robots the capability to understand natural language commands and do complex tasks.

The company’s advancements in Gemini Robotics show that the decision to make Gemini multimodal — taking and generating text, images and audio — is the path toward better reasoning. Gemini’s multimodality can spawn a whole new genre of consumer products for Google, PYMNTS reported in April.

Several other companies are also developing AI-powered robots demonstrating advancements in general tasks, making for a crowded market, PYMNTS reported in February.

