🤖 AI Summary
This work addresses the challenge of learning dexterous manipulation for humanoid robots, a challenge arising from morphological discrepancies between human and robot hands and from noise in motion-capture (MoCap) demonstrations. We propose a single-stage end-to-end framework that treats MoCap demonstrations as soft guidance—rather than hard constraints—and jointly optimizes action mapping and trajectory tracking. A key innovation is an adaptive exploration mechanism confined within reference bounds derived from MoCap data, effectively converting noisy demonstrations into spatial constraints while avoiding the error accumulation and suboptimal data utilization inherent in multi-stage approaches. Our method integrates vision-conditioned skill generation, high-dimensional latent skill representations, and policy distillation to enable robot-specific policy learning. Trained on large-scale hand-object MoCap data, it achieves significantly improved robustness to demonstration noise and higher task success rates. Real-robot deployment demonstrates strong generalization across unseen objects and proficiency in diverse dexterous manipulation tasks.
📝 Abstract
Hand-object motion-capture (MoCap) repositories offer large-scale, contact-rich demonstrations and hold promise for scaling dexterous robotic manipulation. Yet demonstration inaccuracies and embodiment gaps between human and robot hands limit the straightforward use of these data. Existing methods adopt a three-stage workflow of retargeting, tracking, and residual correction, which often leaves demonstrations underused and compounds errors across stages. We introduce Dexplore, a unified single-loop optimization that jointly performs retargeting and tracking to learn robot control policies directly from MoCap at scale. Rather than treating demonstrations as ground truth, we use them as soft guidance. From raw trajectories, we derive adaptive spatial scopes and train with reinforcement learning to keep the policy in-scope while minimizing control effort and accomplishing the task. This unified formulation preserves demonstration intent, enables robot-specific strategies to emerge, improves robustness to noise, and scales to large demonstration corpora. We distill the scaled tracking policy into a vision-based, skill-conditioned generative controller that encodes diverse manipulation skills in a rich latent representation, supporting generalization across objects and real-world deployment. Taken together, these contributions position Dexplore as a principled bridge that transforms imperfect demonstrations into effective training signals for dexterous manipulation.
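The idea of "soft guidance" via adaptive spatial scopes can be sketched in a few lines: instead of rewarding exact tracking of a noisy reference pose, widen a tolerance region where the demonstration jitters and reward the policy simply for staying inside it. The snippet below is an illustrative sketch only; the function names, the jitter-based radius rule, and the reward weights are assumptions for exposition, not the paper's published formulation.

```python
import numpy as np

def spatial_scope(ref_traj, base_radius=0.02, noise_scale=1.0):
    """Derive a per-timestep scope (center, radius) from a noisy reference
    trajectory of 3D points, shape (T, 3). The radius widens where local
    jitter suggests demonstration noise, so noisy segments constrain the
    policy less. Hypothetical rule for illustration."""
    centers = ref_traj
    # Local jitter: deviation of each interior point from the midpoint
    # of its two neighbors (a crude second-difference noise estimate).
    mids = 0.5 * (ref_traj[:-2] + ref_traj[2:])
    jitter = np.linalg.norm(ref_traj[1:-1] - mids, axis=-1)
    jitter = np.pad(jitter, (1, 1), mode="edge")  # extend to endpoints
    radii = base_radius + noise_scale * jitter
    return centers, radii

def in_scope_reward(pos, center, radius, effort, w_effort=0.01):
    """Full reward inside the scope, smooth decay outside, minus an
    effort penalty (e.g. squared joint torques)."""
    dist = np.linalg.norm(pos - center)
    scope_term = 1.0 if dist <= radius else float(np.exp(-(dist - radius) / radius))
    return scope_term - w_effort * effort
```

Under this kind of shaping, a policy is free to deviate from the raw demonstration wherever the scope is wide, which is what allows robot-specific strategies to emerge while the demonstration's spatial intent is preserved.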