🤖 AI Summary
Dual-arm grasping of large or complex objects faces three challenges: insufficient grasp stability, high collision risk, and poor generalization. To address these, we propose the first end-to-end SE(3)×SE(3) diffusion model designed specifically for dual-arm grasping. It steers the diffusion process with classifier-based guidance and eliminates reliance on region priors or heuristic rules. The method generates the two grasp poses jointly as a pair, combining geometry-, stability- (force-closure), and collision-aware guidance terms during denoising. Experiments in simulation and on real-world point clouds show a +12.7% force-closure rate and a +18.3% collision-free rate. The approach enables a heterogeneous dual-arm system to reliably grasp and lift unseen objects, with physically plausible grasps, strong generalization, and safe operation.
📝 Abstract
Reliable dual-arm grasping is essential for manipulating large and complex objects but remains a challenging problem due to stability, collision, and generalization requirements. Prior methods typically decompose the task into two independent grasp proposals, relying on region priors or heuristics that limit generalization and provide no principled guarantee of stability. We propose DAGDiff, an end-to-end framework that directly denoises grasp pairs in the SE(3)×SE(3) space. Our key insight is that stability and collision avoidance can be enforced more effectively by guiding the diffusion process with classifier signals than by relying on explicit region detection or object priors. To this end, DAGDiff integrates geometry-, stability-, and collision-aware guidance terms that steer the generative process toward grasps that are physically valid and force-closure compliant. We comprehensively evaluate DAGDiff through analytical force-closure checks, collision analysis, and large-scale physics-based simulations, showing consistent improvements over previous work on these metrics. Finally, we demonstrate that our framework generates dual-arm grasps directly on real-world point clouds of previously unseen objects; these grasps are executed on a heterogeneous dual-arm setup in which two manipulators reliably grasp and lift the objects.
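To make the guidance idea concrete, the following is a minimal sketch of a guided, annealed Langevin-style update on a grasp pair. It is not DAGDiff's implementation: the abstract does not specify the network or the exact guidance terms, so everything here is an assumption made for illustration. The grasp pair is reduced to two 3D contact positions rather than full SE(3) poses, the learned denoiser is replaced by the exact score of a standard Gaussian, and the cost function (stay on a sphere of radius `radius`, place the two contacts opposite each other) is a hypothetical stand-in for the geometry-, collision-, and stability-aware terms.

```python
import numpy as np


def cost(x, radius=1.0):
    """Hypothetical guidance cost for a grasp pair x of shape (2, 3).

    surface term : each contact should lie on a sphere of the given radius
                   (stand-in for geometry / collision awareness)
    opposing term: the two contacts should be antipodal, p1 + p2 = 0
                   (stand-in for a stability criterion)
    """
    norms = np.linalg.norm(x, axis=1)
    return float(np.sum((norms - radius) ** 2) + np.sum((x[0] + x[1]) ** 2))


def cost_grad(x, radius=1.0):
    """Analytic gradient of `cost` with respect to x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    surface = 2.0 * (norms - radius) * x / np.maximum(norms, 1e-8)
    opposing = 2.0 * (x[0] + x[1])  # same gradient for both contacts
    return surface + opposing       # (2, 3) + (3,) broadcasts row-wise


def prior_score(x):
    """Stand-in for a learned denoiser: score of a standard Gaussian prior."""
    return -x


def guided_refine(x0, steps=500, step=0.01, scale=10.0, seed=0):
    """Guided Langevin-style updates with noise annealed linearly to zero.

    Each step follows the prior score plus the negative cost gradient,
    weighted by the guidance scale, then adds a shrinking amount of noise.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for t in range(steps):
        temperature = 1.0 - t / steps
        drift = prior_score(x) - scale * cost_grad(x)
        noise = np.sqrt(2.0 * step) * temperature * rng.normal(size=x.shape)
        x = x + step * drift + noise
    return x


if __name__ == "__main__":
    x0 = np.random.default_rng(0).normal(size=(2, 3))
    x = guided_refine(x0)
    print("cost before:", cost(x0))
    print("cost after: ", cost(x))
```

The guidance scale trades off the prior against the constraints: a larger `scale` pulls samples harder toward low-cost pairs at the expense of diversity. Because both contacts are updated in one state vector, the opposing term couples them at every step, which is the toy analogue of denoising the pair jointly instead of proposing two grasps independently.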