🤖 AI Summary
In-hand reorientation of symmetric or textureless objects with a dexterous hand in real-world environments suffers from heavy reliance on manual hyperparameter tuning, inaccurate pose estimation, and fragile sim-to-real transfer. Method: We propose a skill-driven hierarchical reinforcement learning framework. A high-level policy dynamically composes pre-trained low-level rotation skills based on proprioceptive feedback and control errors. A recursive pose estimator takes proprioceptive information, low-level skill predictions, and control errors as inputs, which enables continuous pose tracking for symmetric and textureless objects. Contribution/Results: Our approach requires no modification to reward functions, task specifications, or system configurations, which reduces human intervention during sim-to-real transfer. Experiments demonstrate zero-shot object reorientation across diverse complex objects, with better robustness to out-of-distribution disturbances than end-to-end methods.
📝 Abstract
Learning policies in simulation and transferring them to the real world has become a promising approach in dexterous manipulation. However, bridging the sim-to-real gap for each new task requires substantial human effort, such as careful reward engineering, hyperparameter tuning, and system identification. In this work, we present a system that leverages low-level skills to address these challenges for more complex tasks. Specifically, we introduce a hierarchical policy for in-hand object reorientation based on previously acquired rotation skills. This hierarchical policy learns to select which low-level skill to execute based on feedback from both the environment and the low-level skill policies themselves. Compared to learning from scratch, the hierarchical policy is more robust to out-of-distribution changes and transfers easily from simulation to real-world environments. Additionally, we propose a generalizable object pose estimator that uses proprioceptive information, low-level skill predictions, and control errors as inputs to estimate the object pose over time. We demonstrate that our system can reorient objects, including symmetrical and textureless ones, to a desired pose.
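To make the control loop described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes six hypothetical rotation skills (one per signed world axis), a fixed per-call rotation `STEP`, and a scalar `progress` in [0, 1] standing in for whatever the system derives from control errors. The learned high-level policy is replaced by a hand-written greedy rule, and the learned pose estimator by a simple recursive composition of the previous estimate with the selected skill's predicted increment.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def axis_angle_quat(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about a unit `axis`."""
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

def angular_distance(q, goal):
    """Geodesic angle in radians between two unit quaternions."""
    dot = abs(sum(a * b for a, b in zip(q, goal)))
    return 2.0 * math.acos(min(1.0, dot))

STEP = math.radians(10.0)  # assumption: rotation produced by one skill call

# Assumption: one hypothetical low-level skill per signed world axis.
SKILL_AXES = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}

def predicted_pose(estimate, skill, progress=1.0):
    """Recursive update: compose the previous estimate with the skill's
    predicted world-frame increment, scaled by `progress` (hypothetical
    stand-in for a control-error signal)."""
    delta = axis_angle_quat(SKILL_AXES[skill], STEP * progress)
    return quat_mul(delta, estimate)

def select_skill(estimate, goal):
    """Greedy stand-in for the learned high-level policy: pick the skill
    whose predicted outcome is closest to the goal orientation."""
    return min(
        SKILL_AXES,
        key=lambda s: angular_distance(predicted_pose(estimate, s), goal),
    )

def reorient(goal, max_steps=20, tol=math.radians(1.0)):
    """Alternate skill selection and pose-estimate updates until the
    estimated orientation is within `tol` of the goal."""
    estimate = (1.0, 0.0, 0.0, 0.0)  # start from the identity orientation
    history = []
    for _ in range(max_steps):
        if angular_distance(estimate, goal) < tol:
            break
        skill = select_skill(estimate, goal)
        estimate = predicted_pose(estimate, skill)
        history.append(skill)
    return estimate, history
```

Calling `reorient(axis_angle_quat((0, 0, 1), math.radians(30.0)))` selects the `"+z"` skill repeatedly until the estimate reaches the goal. Because the estimate here depends only on which skill ran and how far it is assumed to have progressed, it never relies on visual features, which is the property that matters for symmetric and textureless objects.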