Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning

📅 2026-03-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses non-prehensile manipulation in cluttered environments, where a robot must selectively exploit contact among multiple objects whose dynamics are coupled. Existing methods do not explicitly model these complex interactions. To this end, we propose the Dynamics-Aware Policy Learning (DAPL) framework, which learns an explicit world model to represent contact-induced object dynamics and uses this representation to condition reinforcement learning. This allows extrinsic dexterity to emerge without hand-crafted contact heuristics or complex reward shaping. By explicitly modeling contact dynamics, DAPL outperforms prehensile manipulation, human teleoperation, and prior representation-based policies by over 25% in success rate on unseen simulated cluttered scenes. In real-world experiments across ten cluttered scenes, it reaches an average success rate of approximately 50%, and a deployment in a real supermarket further demonstrates sim-to-real transfer.

πŸ“ Abstract
Extrinsic dexterity leverages environmental contact to overcome the limitations of prehensile manipulation. However, achieving such dexterity in cluttered scenes remains challenging and underexplored, as it requires selectively exploiting contact among multiple interacting objects with inherently coupled dynamics. Existing approaches lack explicit modeling of such complex dynamics and therefore fall short in non-prehensile manipulation in cluttered environments, which in turn limits their practical applicability in real-world environments. In this paper, we introduce a Dynamics-Aware Policy Learning (DAPL) framework that can facilitate policy learning with a learned representation of contact-induced object dynamics in cluttered environments. This representation is learned through explicit world modeling and used to condition reinforcement learning, enabling extrinsic dexterity to emerge without hand-crafted contact heuristics or complex reward shaping. We evaluate our approach in both simulation and the real world. Our method outperforms prehensile manipulation, human teleoperation, and prior representation-based policies by over 25% in success rate on unseen simulated cluttered scenes with varying densities. The real-world success rate reaches around 50% across 10 cluttered scenes, while a practical grocery deployment further demonstrates robust sim-to-real transfer and applicability.
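The abstract describes two components: a world model that learns a representation of contact-induced object dynamics, and a reinforcement-learning policy conditioned on that representation. The following sketch illustrates only that conditioning pattern, and it relies on heavy simplifying assumptions. It is not the authors' DAPL architecture. The names `DynamicsWorldModel`, `ConditionedPolicy`, and `toy_step` are hypothetical. The encoder is a fixed random-feature map with a least-squares decoder rather than a trained network. The latent is computed from the most recently observed transition. No reinforcement-learning update is shown.

```python
import numpy as np


class DynamicsWorldModel:
    """Hypothetical stand-in for a learned world model of contact-induced dynamics.

    A fixed random tanh encoder maps (object state, action) to a latent z, and a
    linear decoder fitted by least squares predicts the change in object state.
    """

    def __init__(self, state_dim, action_dim, latent_dim, seed=0):
        r = np.random.default_rng(seed)
        in_dim = state_dim + action_dim
        # Random-feature encoder (assumption: a real encoder would be trained).
        self.W_enc = r.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, latent_dim))
        self.W_dec = np.zeros((latent_dim, state_dim))

    def encode(self, states, actions):
        # Latent dynamics representation z for each (state, action) pair.
        return np.tanh(np.concatenate([states, actions], axis=-1) @ self.W_enc)

    def predict_next(self, states, actions):
        # Residual prediction: next state = current state + decoded state change.
        return states + self.encode(states, actions) @ self.W_dec

    def fit(self, states, actions, next_states):
        # Least-squares fit of the decoder; returns the training mean squared error.
        z = self.encode(states, actions)
        self.W_dec, *_ = np.linalg.lstsq(z, next_states - states, rcond=None)
        err = self.predict_next(states, actions) - next_states
        return float(np.mean(err ** 2))


class ConditionedPolicy:
    """Hypothetical linear Gaussian policy whose input is [observation, z]."""

    def __init__(self, obs_dim, latent_dim, action_dim, seed=0):
        r = np.random.default_rng(seed)
        self.W = r.normal(scale=0.1, size=(obs_dim + latent_dim, action_dim))
        self.log_std = np.full(action_dim, -1.0)

    def act(self, obs, z, rng):
        # Conditioning: the dynamics latent z is concatenated to the observation.
        mean = np.concatenate([obs, z], axis=-1) @ self.W
        return mean + np.exp(self.log_std) * rng.standard_normal(mean.shape)


# Toy dynamics in which the matrix A couples the state components (illustration only).
state_dim, action_dim, latent_dim = 6, 3, 64
rng = np.random.default_rng(0)
A = 0.5 * rng.normal(size=(state_dim, state_dim))
B = rng.normal(size=(action_dim, state_dim))


def toy_step(s, a):
    return s + 0.1 * np.tanh(s @ A + a @ B)


S = rng.normal(size=(2000, state_dim))
U = rng.normal(size=(2000, action_dim))
S_next = toy_step(S, U)

wm = DynamicsWorldModel(state_dim, action_dim, latent_dim)
mse = wm.fit(S, U, S_next)
baseline = float(np.mean((S_next - S) ** 2))  # error of predicting "no change"
print(f"world-model MSE {mse:.5f} vs no-change baseline {baseline:.5f}")

# The latent from the most recently observed transition conditions the next action.
policy = ConditionedPolicy(state_dim, latent_dim, action_dim)
z = wm.encode(S[:1], U[:1])
a = policy.act(S_next[:1], z, rng)
print(a.shape)  # (1, 3)
```

In a full system, the policy weights would be optimized with a reinforcement-learning algorithm and the world model would be trained on interaction data. Here both steps are reduced to the minimum needed to show how the dynamics latent flows into the policy input.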
Problem

Research questions and friction points this paper is trying to address.

extrinsic dexterity
cluttered scenes
non-prehensile manipulation
contact dynamics
object interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extrinsic dexterity
Dynamics-aware policy learning
Contact-induced dynamics
Cluttered manipulation
Sim-to-real transfer
Yixin Zheng
Institute of Automation, Chinese Academy of Sciences; Beijing Academy of Artificial Intelligence; Galbot
Jiangran Lyu
Galbot; Peking University
Yifan Zhang
Galbot
Jiayi Chen
Peking University
Robotics, 3D Vision
Mi Yan
Galbot; Peking University
Yuntian Deng
Assistant Professor, University of Waterloo
Natural Language Processing, Machine Learning
Xuesong Shi
Galbot
robotic vision, heterogeneous computing, graph signal processing, SLAM
Xiaoguang Zhao
Tsinghua University
MEMS, Microsystems, THz, Metamaterial, Wireless communication
Yizhou Wang
Peking University
Zhizheng Zhang
Beijing Academy of Artificial Intelligence; Galbot
He Wang
Assistant Professor of Computer Science, Peking University
Embodied AI, Computer Vision, Robotics