🤖 AI Summary
This work addresses the challenge of jointly reasoning about geometric constraints, kinematic limits, and nonsmooth contact dynamics in dexterous manipulation by proposing a hierarchical RL-MPC framework. A high-level reinforcement learning (RL) policy predicts object-centric “contact intentions”, each specifying a desired contact location on the object surface and a post-contact subgoal pose for the object. A low-level contact-implicit model predictive controller (MPC) then optimizes local contact modes and replans with contact dynamics to generate robust robot actions. On non-prehensile tasks such as geometry-generalized pushing and 3D object reorientation, the approach achieves near-100% success, highly robust performance, and zero-shot sim-to-real transfer, while requiring roughly 10× less training data than end-to-end baselines.
📝 Abstract
A key challenge in contact-rich dexterous manipulation is the need to jointly reason over geometry, kinematic constraints, and intricate, nonsmooth contact dynamics. End-to-end visuomotor policies bypass this structure, but they often require large amounts of data, transfer poorly from simulation to reality, and generalize weakly across tasks and embodiments. We address these limitations by leveraging a simple insight: dexterous manipulation is inherently hierarchical. At a high level, a robot decides where to touch the object (geometry) and where to move it (kinematics); at a low level, it determines how to realize that plan through contact dynamics. Building on this insight, we propose a hierarchical RL-MPC framework in which a high-level reinforcement learning (RL) policy predicts a contact intention, a novel object-centric interface that specifies (i) a contact location on the object surface and (ii) a post-contact object-level subgoal pose. Conditioned on this contact intention, a low-level contact-implicit model predictive controller (MPC) optimizes local contact modes and replans with contact dynamics to generate robot actions that robustly drive the object toward each subgoal. We evaluate the framework on non-prehensile tasks, including geometry-generalized pushing and 3D object reorientation. It achieves near-100% success with substantially less data (10× less than end-to-end baselines), highly robust performance, and zero-shot sim-to-real transfer.
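To make the contact-intention interface concrete, here is a minimal Python sketch of the two-level loop. Everything in it is an illustrative assumption rather than the authors' implementation. It assumes a planar, disk-shaped object with pose (x, y, θ). The names `ContactIntention`, `high_level_policy`, `low_level_controller`, and `run_episode` are hypothetical. A hand-written pushing heuristic stands in for the learned RL policy, and a simple proportional pose servo stands in for the contact-implicit MPC. That servo does not model contact dynamics at all.

```python
import math
from dataclasses import dataclass

OBJECT_RADIUS = 0.05     # assumed disk-shaped object (metres)
MAX_SUBGOAL_STEP = 0.1   # assumed cap on how far one intention may move the object


@dataclass(frozen=True)
class Pose2D:
    x: float
    y: float
    theta: float  # yaw in radians


@dataclass(frozen=True)
class ContactIntention:
    """Hypothetical object-centric interface: where to touch, and where the object should end up."""
    contact_point: tuple[float, float]  # point on the object surface, expressed in the object frame
    subgoal: Pose2D                     # desired object pose after the contact interaction


def high_level_policy(obj: Pose2D, goal: Pose2D) -> ContactIntention:
    """Stand-in for the learned RL policy: a hand-written pushing heuristic (assumption)."""
    dx, dy = goal.x - obj.x, goal.y - obj.y
    dist = math.hypot(dx, dy)
    ux, uy = (dx / dist, dy / dist) if dist > 0 else (1.0, 0.0)
    step = min(dist, MAX_SUBGOAL_STEP)
    subgoal = Pose2D(obj.x + step * ux, obj.y + step * uy, obj.theta)
    # Rotate the world-frame push direction into the object frame (R^T u) and
    # touch the side of the disk opposite to the desired motion.
    c, s = math.cos(obj.theta), math.sin(obj.theta)
    bx, by = c * ux + s * uy, -s * ux + c * uy
    return ContactIntention((-OBJECT_RADIUS * bx, -OBJECT_RADIUS * by), subgoal)


def low_level_controller(obj: Pose2D, intention: ContactIntention,
                         gain: float = 0.5, tol: float = 1e-4, max_steps: int = 50) -> Pose2D:
    """Stand-in for contact-implicit MPC: a proportional servo on the object pose.

    A real controller would place the robot at intention.contact_point and optimize
    contact modes and actions through a contact-dynamics model; here the object simply
    moves a fraction `gain` of the remaining pose error per step.
    """
    g = intention.subgoal
    for _ in range(max_steps):
        ex, ey = g.x - obj.x, g.y - obj.y
        et = math.atan2(math.sin(g.theta - obj.theta), math.cos(g.theta - obj.theta))  # wrapped yaw error
        if math.hypot(ex, ey) < tol and abs(et) < tol:
            break
        obj = Pose2D(obj.x + gain * ex, obj.y + gain * ey, obj.theta + gain * et)
    return obj


def run_episode(start: Pose2D, goal: Pose2D, tol: float = 1e-3, max_intentions: int = 20):
    """Alternate high-level intention prediction with low-level execution until the goal is reached."""
    obj, n = start, 0
    while math.hypot(goal.x - obj.x, goal.y - obj.y) > tol and n < max_intentions:
        intention = high_level_policy(obj, goal)    # decide where to touch and where to move
        obj = low_level_controller(obj, intention)  # realize the subgoal (toy dynamics)
        n += 1
    return obj, n


if __name__ == "__main__":
    final, n = run_episode(Pose2D(0.0, 0.0, 0.0), Pose2D(0.3, 0.0, 0.0))
    print(f"reached ({final.x:.4f}, {final.y:.4f}) after {n} contact intentions")
```

The design point the sketch tries to capture is the split of responsibilities. The high level reasons only about where to touch and where the object should go. Everything about how to realize that motion is delegated to the low level, which in the described framework replans through contact dynamics rather than servoing the pose directly.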