TMR-VLA: Vision-Language-Action Model for Magnetic Motion Control of Tri-leg Silicone-based Soft Robot

📅 2026-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of mapping natural language instructions to low-level voltage control for a magnetically actuated tri-leg silicone soft robot (TMR) intended for in vivo environments. The authors propose TMR-VLA, an end-to-end multimodal system that applies vision-language-action modeling to this robot. The system fuses sequential endoscopic frames with natural language commands and, building on the embodied endoluminal localization ability of EndoVLA, directly generates low-level voltage outputs for language-conditioned motion types such as squatting, rotation, leg lifting, and walking. The reported results show an average success rate of 74% and indicate that the model can predict how applied voltages change the robot's dynamics.

📝 Abstract

In in vivo environments, magnetically actuated soft robots offer advantages such as wireless operation and precise control, showing promising potential for painless detection and therapeutic procedures. We developed a tri-leg magnetically driven soft robot (TMR) whose multi-legged design enables more flexible gaits and diverse motion patterns. For reconfigurable soft robots made of silicone, navigation can be decomposed into sequential motions such as squatting, rotation, leg lifting, and walking. The robot's motion and behavior depend on its bending shape. To bridge motion type descriptions and specific low-level voltage control, we introduce TMR-VLA, an end-to-end multimodal system for a tri-leg magnetic soft robot capable of performing hybrid motion types, a promising step toward navigation in which the robot adapts its shape to language-constrained motion types. TMR-VLA builds on the embodied endoluminal localization ability of EndoVLA and takes sequential frames and natural language commands as input. It generates low-level voltage output from the current observation state and the specified motion type description. The results show that TMR-VLA can predict how the voltage applied to the TMR changes the dynamics of the silicone-made soft robot. TMR-VLA reached a 74% average success rate.
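
The input/output contract described in the abstract (sequential frames plus a language command in, low-level voltages out) can be pictured with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Observation` container, the `run_step` helper, the three-channel voltage layout, the ±5 V safety limits, and the `dummy_policy` placeholder that stands in for the learned model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical voltage sequence: one list of per-channel voltages per time step.
VoltageSeq = List[List[float]]


@dataclass
class Observation:
    """Hypothetical model input: recent frames plus a motion command."""
    frames: List[bytes]  # sequential endoscopic frames, oldest first (encoding assumed)
    command: str         # natural-language motion type description, e.g. "squat"


# Hypothetical policy signature standing in for the learned model.
Policy = Callable[[Observation], VoltageSeq]


def clamp_voltages(seq: VoltageSeq, v_min: float, v_max: float) -> VoltageSeq:
    """Limit every voltage to [v_min, v_max] before it reaches the hardware."""
    return [[min(max(v, v_min), v_max) for v in step] for step in seq]


def run_step(policy: Policy, obs: Observation,
             v_min: float = -5.0, v_max: float = 5.0) -> VoltageSeq:
    """Query the policy once and return a clamped voltage sequence.

    The default limits are illustrative assumptions, not values from the paper.
    """
    if not obs.frames:
        raise ValueError("at least one frame is required")
    return clamp_voltages(policy(obs), v_min, v_max)


def dummy_policy(obs: Observation) -> VoltageSeq:
    """Placeholder policy: zero volts on three assumed channels per frame."""
    return [[0.0, 0.0, 0.0] for _ in obs.frames]
```

Clamping the output outside the policy is one possible design choice: it keeps a hardware safety limit independent of whatever the model predicts.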

Problem

Research questions and friction points this paper is trying to address.

soft robot

magnetic actuation

vision-language-action

motion control

endoluminal navigation

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language-action model

magnetic soft robot

end-to-end control