Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction

📅 2026-05-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses abrupt hand-configuration changes ("gesture jumps") that occur when a human takes over from a vision–language–action (VLA) policy during high-degree-of-freedom dexterous manipulation. The jumps stem from a mismatch between the human teleoperation command and the policy's command at the takeover moment, and they often lead to task failure. The authors propose Hand-in-the-Loop (HandITL), a human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution within an interactive imitation learning setup. Implemented on a bimanual robotic system, HandITL is designed to keep action transitions smooth at takeover. Compared with direct teleoperation takeover, it reduces takeover jitter by 99.8%, grasp failures by 87.5%, and mean task completion time by 19.1%. Policies refined with intervention data collected through HandITL outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous manipulation tasks.
📝 Abstract
Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human takeover data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the takeover moment, which causes abrupt robot-hand configuration changes, or "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with direct teleoperation takeover, HandITL reduces takeover jitter by 99.8% and preserves robust post-takeover manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect intervention data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.
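The abstract does not spell out how HandITL blends the two command streams, so the following is only an illustrative sketch of the command-mismatch problem, not the authors' method. It assumes a simple linear ramp from the policy's last joint command to the human's command; the function names, the three-joint example values, and the ten-tick ramp are all hypothetical. It compares the largest single-tick joint change under a direct takeover with the same quantity under the ramp.

```python
from typing import List, Sequence


def ramp_weights(n_steps: int) -> List[float]:
    """Blend weights rising linearly from 1/n_steps up to 1.0."""
    if n_steps < 1:
        raise ValueError("n_steps must be >= 1")
    return [(k + 1) / n_steps for k in range(n_steps)]


def blended_takeover(
    policy_cmd: Sequence[float], human_cmd: Sequence[float], n_steps: int
) -> List[List[float]]:
    """Joint commands that move from the policy command to the human command.

    Assumption: a plain linear interpolation, used here only to illustrate
    the idea of blending. The paper's actual mechanism may differ.
    """
    if len(policy_cmd) != len(human_cmd):
        raise ValueError("command vectors must have the same length")
    return [
        [(1.0 - w) * p + w * h for p, h in zip(policy_cmd, human_cmd)]
        for w in ramp_weights(n_steps)
    ]


def max_joint_jump(start: Sequence[float], commands: Sequence[Sequence[float]]) -> float:
    """Largest change of any single joint between consecutive commands."""
    prev = list(start)
    worst = 0.0
    for cmd in commands:
        worst = max(worst, max(abs(c - q) for c, q in zip(cmd, prev)))
        prev = list(cmd)
    return worst


# Hypothetical joint targets (radians) at the takeover moment.
policy_cmd = [0.10, 0.80, 0.40]
human_cmd = [0.90, 0.20, 0.40]

direct = [human_cmd]  # the human command is applied in a single tick
blended = blended_takeover(policy_cmd, human_cmd, n_steps=10)

print("direct takeover jump: ", max_joint_jump(policy_cmd, direct))
print("blended takeover jump:", max_joint_jump(policy_cmd, blended))
```

With a ramp of N ticks, the per-tick jump shrinks to roughly 1/N of the direct-takeover jump, at the cost of delaying the human's full authority by N ticks. The reported 99.8% jitter reduction comes from the authors' own system and is not reproduced by this toy example.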
Problem

Research questions and friction points this paper is trying to address.

dexterous manipulation
Vision-Language-Action (VLA)
gesture jumps
Interactive Imitation Learning
high-DoF robotic hands
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hand-in-the-Loop
Dexterous Manipulation
Interactive Imitation Learning
Gesture Jump Mitigation
Bimanual Coordination
Authors

Zhuohang Li
Vanderbilt University
Liqun Huang
ByteDance Seed
Wei Xu
ByteDance Seed
Zhengming Zhu
ByteDance Seed
Nie Lin
ByteDance Seed; The University of Tokyo
Xiao Ma
ByteDance Seed
Robot Learning, Reinforcement Learning, Robotics
Xinjun Sheng
Shanghai Jiao Tong University
Biomechatronics, Biorobotics, Microelectronics Packaging
Ruoshi Wen
ByteDance Seed