🤖 AI Summary
To address the challenge of robotic grasping under severe occlusion in cluttered scenes, this paper proposes a target-oriented joint push-grasp policy learning framework. Methodologically, we introduce the first SE(2)-equivariant push-grasp network architecture, explicitly encoding planar motion symmetry; we further design a lightweight joint training strategy optimized via grasp confidence scoring, decoupling the co-learning of pushing and grasping actions. Our approach integrates SE(2)-equivariant convolutional layers, grasp score modeling, and end-to-end vision-to-action policy learning. Experiments demonstrate that our method improves grasping success rates by 49% in simulation and 35% on real hardware over strong baselines, validating the effectiveness of geometric structure modeling and active scene rearrangement strategies.
📝 Abstract
Goal-conditioned robotic grasping in cluttered environments remains a challenging problem due to occlusions caused by surrounding objects, which prevent direct access to the target object. A promising solution to mitigate this issue is combining pushing and grasping policies, enabling active rearrangement of the scene to facilitate target retrieval. However, existing methods often overlook the rich geometric structures inherent in such tasks, thus limiting their effectiveness in complex, heavily cluttered scenarios. To address this, we propose the Equivariant Push-Grasp Network, a novel framework for joint pushing and grasping policy learning. Our contributions are twofold: (1) leveraging SE(2)-equivariance to improve both pushing and grasping performance and (2) a grasp score optimization-based training strategy that simplifies the joint learning process. Experimental results show that our method improves grasp success rates by 49% in simulation and by 35% in real-world scenarios compared to strong baselines, representing a significant advancement in push-grasp policy learning.