DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation

📅 2026-03-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing vision-language-action (VLA) models are not yet reliable or adaptable enough for dexterous manipulation, and post-training them efficiently remains difficult. This work proposes DexHiL, the first human-in-the-loop framework tailored for coordinated arm-hand dexterous manipulation. DexHiL lets a human intervene in real time during execution through a lightweight teleoperation interface, and it introduces an intervention-aware data sampling strategy that prioritizes corrective segments during post-training. Notably, it unifies human-in-the-loop intervention for both the robotic arm and the dexterous hand within a single system. Real-robot experiments show that DexHiL improves average task success rates by 25% over purely offline fine-tuning baselines across multiple dexterous manipulation tasks.

📝 Abstract

While Vision-Language-Action (VLA) models have demonstrated promising generalization capabilities in robotic manipulation, deploying them on specific and complex downstream tasks still demands effective post-training. In parallel, Human-in-the-Loop (HiL) learning has proven to be a powerful mechanism for refining robot policies. However, extending this paradigm to dexterous manipulation remains challenging: multi-finger control is high-dimensional, contact-intensive, and exhibits execution distributions that differ markedly from standard arm motions, leaving existing dexterous VLA systems limited in reliability and adaptability. We present DexHiL, the first integrated arm-hand human-in-the-loop framework for dexterous VLA models, enabling coordinated interventions over the arm and the dexterous hand within a single system. DexHiL introduces an intervention-aware data sampling strategy that prioritizes corrective segments for post-training, alongside a lightweight teleoperation interface that supports instantaneous human corrections during execution. Real-robot experiments demonstrate that DexHiL serves as an effective post-training framework, yielding a substantial performance leap, outperforming standard offline-only fine-tuning baselines by an average of 25% in success rates across distinct tasks. Project page: https://chenzhongxi-sjtu.github.io/dexhil/
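
The abstract does not spell out how the intervention-aware sampling strategy is implemented, so the following is only a minimal sketch of one plausible reading: each recorded step carries a flag saying whether a human was correcting the robot at that moment, and flagged steps are drawn more often when building a post-training batch. The `Step` record, the function names, and the `intervention_weight` value of 4.0 are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Step:
    """One recorded timestep (hypothetical record layout)."""
    episode: int
    t: int
    is_intervention: bool  # True if a human was teleoperating at this step


def sampling_weights(steps, intervention_weight=4.0):
    """Give corrective steps a larger sampling weight than autonomous ones."""
    return [intervention_weight if s.is_intervention else 1.0 for s in steps]


def sample_batch(steps, batch_size, intervention_weight=4.0, rng=None):
    """Draw a batch with replacement, biased toward corrective steps."""
    rng = rng or random.Random()
    weights = sampling_weights(steps, intervention_weight)
    return rng.choices(steps, weights=weights, k=batch_size)


def expected_intervention_fraction(steps, intervention_weight=4.0):
    """Expected share of a batch that comes from corrective steps."""
    weights = sampling_weights(steps, intervention_weight)
    corrective = sum(w for s, w in zip(steps, weights) if s.is_intervention)
    return corrective / sum(weights)


if __name__ == "__main__":
    # Toy dataset: 90 autonomous steps followed by 10 human-corrected steps.
    data = [Step(episode=0, t=i, is_intervention=(i >= 90)) for i in range(100)]
    batch = sample_batch(data, batch_size=32, rng=random.Random(0))
    print(len(batch), expected_intervention_fraction(data))
```

In this toy dataset corrective steps make up 10% of the data, and a weight of 4.0 raises their expected share of each batch to 40/130, or about 31%. Whether DexHiL uses a fixed weight, a schedule, or segment-level rather than step-level prioritization is not stated in the summary above.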

Problem

Research questions and friction points this paper is trying to address.

dexterous manipulation

Vision-Language-Action models

Human-in-the-Loop

post-training

robotic manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-in-the-Loop

Dexterous Manipulation

Vision-Language-Action Model

Intervention-Aware Sampling

Teleoperation
