Tabero: Learning Gentle Manipulation with Closed-Loop Force Feedback from Vision, Touch, and Language

📅 2026-05-26
🤖 AI Summary
Existing vision-language-action models struggle to perform delicate, gentle manipulation because aligned vision-tactile-language data is scarce and closed-loop force feedback mechanisms are lacking. This work proposes the Tabero benchmark and the Tabero-VTLA architecture. The benchmark uses a data-efficient pipeline to generate diverse multimodal tasks; the architecture introduces a decoupled force-position command interface whose commands are executed by a fixed hybrid controller to enable high-bandwidth, force-aware manipulation. The work establishes the first evaluation framework for gentle robotic interaction that coherently integrates vision, touch, and language, complemented by a multidimensional assessment protocol. Under gentle language instructions, the model reduces average grasping force by over 70% while maintaining high task success rates, giving the robot finer control over interaction forces.
📝 Abstract
Tactile sensing is essential for robots to achieve human-like gentle manipulation. However, existing Vision-Language-Action (VLA) models struggle to exploit tactile feedback for gentle manipulation due to scarce aligned vision-tactile-language data and the lack of effective closed-loop force feedback mechanisms. To address these challenges, we introduce Tabero, a benchmark and model suite for gentle, language-conditioned robotic manipulation that demands fine-grained contact force perception. First, the Tabero benchmark addresses the scarcity of tactile data by presenting a data-efficient pipeline that repurposes open-source robot manipulation trajectories to generate diverse vision-tactile-language tasks, and establishes a multidimensional evaluation protocol that measures task success alongside physical interaction quality. Second, we propose Tabero-VTLA, an architecture with a decoupled force-position command interface; the resulting force-position commands are executed by a fixed hybrid controller to enable real-time, force-aware manipulation. Evaluated on Tabero, our model maintains high task success while reducing average grip force by over 70% under gentle instructions, demonstrating its ability to modulate interaction forces based on multimodal experience. Our code is publicly available at https://github.com/NathanWu7/Tabero.
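To make the idea of a decoupled force-position command executed by a fixed hybrid controller more concrete, the following is a minimal illustrative sketch and not the Tabero-VTLA implementation. Every name in it (`Command`, `hybrid_step`, `simulate`), the gains, the contact threshold, and the linear-spring object model are assumptions chosen for illustration; the abstract does not specify the controller's internals.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical decoupled command: a position target plus a force target."""
    target_width: float  # desired gripper opening in metres
    target_force: float  # desired grip force in newtons


def hybrid_step(cmd, width, force, kp_pos=0.5, kp_force=0.001,
                contact_eps=0.05, max_step=0.001):
    """Return the gripper width change for one control tick (assumed gains).

    Free space (measured force below contact_eps): track the position target.
    In contact: ignore the position target and regulate the measured force.
    """
    if force < contact_eps:
        delta = kp_pos * (cmd.target_width - width)
    else:
        # A positive force error means "squeeze harder", i.e. reduce the width.
        delta = -kp_force * (cmd.target_force - force)
    # Clamp the step so the approach stays slow and contact forces stay small.
    return max(-max_step, min(max_step, delta))


def simulate(cmd, object_width=0.04, stiffness=500.0,
             start_width=0.08, steps=400):
    """Toy 1-D gripper closing on a linear-spring object (assumed model).

    Returns the final (width, force) after the given number of ticks.
    """
    width = start_width
    force = 0.0
    for _ in range(steps):
        width += hybrid_step(cmd, width, force)
        # Simulated tactile reading: spring force once the fingers compress the object.
        force = max(0.0, stiffness * (object_width - width))
    return width, force


# Same position target, different force targets, as a policy might emit
# for a "gentle" versus a "firm" instruction (values are made up).
gentle_width, gentle_force = simulate(Command(target_width=0.0, target_force=1.0))
firm_width, firm_force = simulate(Command(target_width=0.0, target_force=5.0))
```

In this toy setup the position target alone would crush the object, so the steady-state grip force is set entirely by the force channel. That is the point of decoupling: the policy can ask for the same motion with a different contact force, and the fixed low-level loop closes on the tactile reading at a higher rate than the policy runs.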
Problem

Research questions and friction points this paper is trying to address.

gentle manipulation
tactile sensing
closed-loop force feedback
Vision-Language-Action models
force perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

gentle manipulation
closed-loop force feedback
vision-tactile-language alignment
force-position decoupling
multimodal robotic control