Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion

📅 2025-03-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses visuomotor policy learning for in-hand manipulation with a single multifingered dexterous hand (e.g., unscrewing a bottle lid with one hand). We propose a framework built on visuomotor diffusion policies. Expert demonstrations are collected through an AR-based teleoperation setup that tracks the operator's hand movements and drives the robot hand via inverse kinematics and motion retargeting, with real-time visualization in the headset and gesture controls to streamline operation. A demonstration outlier-removal step based on HDBSCAN clustering and the GLOSH score filters out low-quality demonstrations that could otherwise degrade policy performance. The approach is evaluated in real-world settings on a physical four-fingered Allegro Hand, and experimental videos are available on the project website.

📝 Abstract

We present a framework for learning dexterous in-hand manipulation with multifingered hands using visuomotor diffusion policies. Our system enables complex in-hand manipulation tasks, such as unscrewing a bottle lid with one hand, by leveraging a fast and responsive teleoperation setup for the four-fingered Allegro Hand. We collect high-quality expert demonstrations using an augmented reality (AR) interface that tracks hand movements and applies inverse kinematics and motion retargeting for precise control. The AR headset provides real-time visualization, while gesture controls streamline teleoperation. To enhance policy learning, we introduce a novel demonstration outlier removal approach based on HDBSCAN clustering and the Global-Local Outlier Score from Hierarchies (GLOSH) algorithm, effectively filtering out low-quality demonstrations that could degrade performance. We evaluate our approach extensively in real-world settings and provide all experimental videos on the project website: https://dex-manip.github.io/
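To make the term "visuomotor diffusion policy" concrete, the following is a minimal sketch of DDPM-style ancestral sampling of an action chunk. It is not the authors' implementation: the noise schedule, the horizon of 8 steps, and the function names are assumptions chosen for illustration, and the learned image-conditioned denoising network is replaced by a stand-in oracle so the loop can be run and checked. The action dimension of 16 corresponds to the 16 joints of the Allegro Hand.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed; the paper's schedule is not given here)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def sample_actions(eps_model, obs, shape, schedule, rng):
    """Start from Gaussian noise and denoise it step by step into an action chunk."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, obs)  # predicted noise, conditioned on the observation
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


def make_oracle(target, alpha_bars):
    """Hypothetical stand-in for the learned network: it already knows the clean
    action chunk and returns the exact noise implied by the current sample."""
    def eps_model(x, t, obs):
        return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])
    return eps_model


rng = np.random.default_rng(0)
schedule = make_schedule()
# Horizon of 8 steps (assumed) by 16 joint targets for the Allegro Hand.
target = rng.uniform(-0.5, 0.5, size=(8, 16))
actions = sample_actions(make_oracle(target, schedule[2]), obs=None,
                         shape=target.shape, schedule=schedule, rng=rng)
```

In a real policy, `eps_model` would be a trained network that takes camera images and hand state as `obs`; the oracle is used here only so that the sampler's output can be compared against a known action chunk.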

Problem

Research questions and friction points this paper is trying to address.

Learning dexterous in-hand manipulation with multifingered hands.

Enabling complex tasks, such as unscrewing a bottle lid with one hand, using demonstrations collected via teleoperation.

Filtering low-quality demonstrations to enhance policy learning.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visuomotor diffusion policies for dexterous manipulation

AR interface with inverse kinematics for precise control

HDBSCAN and GLOSH for demonstration outlier removal
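The outlier-removal idea can be sketched as follows, assuming each demonstration has already been summarized as a fixed-length feature vector. The feature choice, the `min_cluster_size` value, and the quantile threshold are assumptions for illustration and are not taken from the paper. The sketch relies on the `hdbscan` Python package, whose fitted clusterer exposes GLOSH scores through its `outlier_scores_` attribute.

```python
import numpy as np


def glosh_scores(features, min_cluster_size=5):
    """Fit HDBSCAN on per-demonstration features and return GLOSH outlier scores.

    `features` is an (n_demos, n_features) array; how a trajectory is turned
    into such a vector is left open here (an assumption of this sketch).
    """
    # Imported lazily so the thresholding helper below works without the package.
    import hdbscan

    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit(features)
    return clusterer.outlier_scores_


def keep_mask(scores, quantile=0.9):
    """Keep demonstrations whose outlier score is at or below the given quantile."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.quantile(scores, quantile)
    return scores <= threshold


# Usage with hand-written scores (higher means more outlying).
example_scores = [0.10, 0.20, 0.05, 0.95, 0.15]
mask = keep_mask(example_scores, quantile=0.8)
```

With real data, `keep_mask(glosh_scores(features))` would give a boolean mask over demonstrations, and only the retained ones would be passed on to policy training. A quantile cutoff is one simple choice; a fixed score threshold would work equally well in this sketch.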

🔎 Similar Papers

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

2024-04-24 | arXiv.org | Citations: 4