AI Summary
Vision-action models for dexterous grasping suffer from poor generalization and instability when adapting to objects with diverse geometries. Method: This paper proposes a reinforcement learning-driven simulation data generation framework. It features a modular architecture that combines parameterized reference trajectories with a PPO-based residual policy, balancing demonstration fidelity against cross-object geometric adaptability; it integrates contact-aware trajectory modeling and vision-conditioned policy networks to automatically synthesize high-quality, contact-rich, geometry-diverse grasping trajectories in simulation; and it leverages real-to-sim-to-real transfer alongside simulation data augmentation. Contribution/Results: Experiments demonstrate significant improvements in zero-shot grasping success rates on unseen objects, effectively mitigating the training bottleneck caused by the scarcity of real-world grasping data.
Abstract
This work presents reinforcement learning (RL)-driven data augmentation to improve the generalization of vision-action (VA) models for dexterous grasping. While real-to-sim-to-real frameworks, where a few real demonstrations seed large-scale simulated data, have proven effective for VA models, applying them to dexterous settings remains challenging: obtaining stable multi-finger contacts is nontrivial across diverse object shapes. To address this, we leverage RL to generate contact-rich grasping data across varied geometries. In line with the real-to-sim-to-real paradigm, the grasp skill is formulated as a parameterized and tunable reference trajectory refined by a residual policy learned via RL. This modular design enables trajectory-level control that is both consistent with real demonstrations and adaptable to diverse object geometries. A vision-conditioned policy trained on simulation-augmented data demonstrates strong generalization to unseen objects, highlighting the potential of our approach to alleviate the data bottleneck in training VA models.
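The abstract's core mechanism, a parameterized reference trajectory refined by a learned residual policy, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trajectory parameterization, the residual scale, and all function names here are hypothetical, and the residual vector stands in for the output of a PPO-trained network.

```python
import numpy as np

def reference_trajectory(t, open_pose, close_pose, duration):
    # Hypothetical parameterized grasp reference: linear interpolation
    # from an open-hand joint pose to a target closed pose over `duration`.
    alpha = min(t / duration, 1.0)
    return (1.0 - alpha) * open_pose + alpha * close_pose

def apply_residual(ref_action, residual, scale=0.1):
    # The residual policy contributes a small, bounded correction on top
    # of the reference action; clipping keeps the result close to the
    # demonstration-consistent trajectory while allowing per-object
    # geometric adaptation.
    return ref_action + scale * np.clip(residual, -1.0, 1.0)

# Example with a 4-DoF hand:
open_pose = np.zeros(4)
close_pose = np.ones(4)
ref = reference_trajectory(0.5, open_pose, close_pose, duration=1.0)
residual = np.array([0.2, -0.3, 0.0, 0.5])  # stand-in for a PPO policy output
action = apply_residual(ref, residual)
```

The modularity the abstract describes falls out of this decomposition: the reference term anchors the motion to real demonstrations, while the bounded residual is the only part RL needs to learn per object geometry.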