🤖 AI Summary
To address the challenges of underactuation, contact uncertainty, and dynamic obstacle avoidance in non-prehensile manipulation within cluttered environments, this paper proposes a robust pushing-based control method that leverages location-based attention. The approach is the first to introduce location-based attention into non-prehensile manipulation, explicitly encoding target orientation and obstacle avoidance without requiring predefined global trajectories or explicit contact modeling. It builds on a reinforcement learning framework with multimodal categorical exploration that fuses visual–pose encodings with spatial attention, enabling efficient sim-to-real transfer onto a real-world KUKA iiwa robotic arm. Experiments in complex cluttered scenes, containing both static and dynamic obstacles, demonstrate significant improvements: enhanced target pose accuracy, a 62% reduction in collision rate, and superior generalization performance over existing baseline methods.
📝 Abstract
In complex scenarios where typical pick-and-place techniques are insufficient, non-prehensile manipulation can often ensure that a robot is able to fulfill its task. However, non-prehensile manipulation is challenging because it is underactuated and has hybrid dynamics: the robot needs to reason about an object's long-term behavior and contact switching while being robust to contact uncertainty. The presence of clutter in the workspace further complicates this task, requiring more advanced spatial analysis to avoid unwanted collisions. Building upon prior work on reinforcement learning with multimodal categorical exploration for planar pushing, we propose to incorporate location-based attention to enable robust manipulation in cluttered scenes. Unlike previous approaches to this obstacle-avoiding pushing task, our framework requires no predefined global paths and considers the desired target orientation of the manipulated object. Experimental results in simulation as well as with a real KUKA iiwa robot arm demonstrate that our learned policy successfully manipulates objects through complex obstacle configurations, including dynamic obstacles, reaching the desired target pose while avoiding collisions.