MISCGrasp: Leveraging Multiple Integrated Scales and Contrastive Learning for Enhanced Volumetric Grasping

📅 2025-07-03
🤖 AI Summary
To address the limited adaptability of robotic grasping to objects with diverse shapes and sizes, this paper proposes a voxel-based multi-scale contrastive learning framework for grasp planning. The method employs a dual-Transformer architecture—comprising an Insight Transformer and an Empower Transformer—that enables query-driven interaction between high-level semantic and low-level geometric features, facilitating cross-scale feature fusion. It further integrates multi-scale voxel convolutions with a contrastive learning objective to jointly optimize fine-grained geometric detail perception and holistic structural modeling. This design significantly enhances feature discriminability and cross-scale consistency. Evaluated on both simulated and real-world tabletop clutter clearing tasks, the approach achieves substantially higher grasp success rates and robustness compared to state-of-the-art baselines and ablation variants, demonstrating its effectiveness and strong generalization capability.
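The summary's "query-driven interaction" between high-level semantic and low-level geometric features is, at its core, cross-attention: queries from one scale attend to features from another. A minimal numpy sketch of that mechanism follows; the array sizes, the function name, and the single-head formulation are illustrative assumptions, not the paper's actual Insight/Empower Transformer implementation.

```python
import numpy as np

def cross_attention(queries, keys_values, dim):
    # Scaled dot-product cross-attention (single head, no projections):
    # each query row attends over all key/value rows from another scale.
    scores = queries @ keys_values.T / np.sqrt(dim)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ keys_values                   # attention-weighted fusion

rng = np.random.default_rng(0)
dim = 16
high = rng.normal(size=(8, dim))   # high-level semantic features (queries)
low = rng.normal(size=(64, dim))   # low-level geometric features (keys/values)
fused = cross_attention(high, low, dim)
print(fused.shape)  # (8, 16): one fused feature per query
```

In a full model the queries, keys, and values would each pass through learned linear projections and multiple heads; the sketch keeps only the core attention step to show how cross-scale fusion works.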

📝 Abstract
Robotic grasping faces challenges in adapting to objects with varying shapes and sizes. In this paper, we introduce MISCGrasp, a volumetric grasping method that integrates multi-scale feature extraction with contrastive feature enhancement for self-adaptive grasping. We propose a query-based interaction between high-level and low-level features through the Insight Transformer, while the Empower Transformer selectively attends to the highest-level features, which synergistically strikes a balance between focusing on fine geometric details and overall geometric structures. Furthermore, MISCGrasp utilizes multi-scale contrastive learning to exploit similarities among positive grasp samples, ensuring consistency across multi-scale features. Extensive experiments in both simulated and real-world environments demonstrate that MISCGrasp outperforms baseline and variant methods in tabletop decluttering tasks. More details are available at https://miscgrasp.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Adapting robotic grasping to varying object shapes and sizes
Balancing fine geometric details and overall structures in grasping
Ensuring multi-scale feature consistency via contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale feature extraction for grasping
Contrastive learning for feature enhancement
Query-based interaction with Insight Transformer
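The contrastive objective described above exploits similarity among positive grasp samples across scales. An InfoNCE-style loss is the standard way to do this; the sketch below is a hedged illustration under that assumption, where the "positive" stands in for the same grasp's feature at another scale, and `info_nce`, the temperature, and the feature sizes are all hypothetical choices rather than the paper's exact formulation.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    # InfoNCE for one anchor: maximize similarity to the positive
    # (same grasp, another scale) relative to a set of negatives.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits /= temperature
    logits -= logits.max()                          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                        # positive sits at index 0

rng = np.random.default_rng(1)
f = rng.normal(size=16)
pos = f + 0.05 * rng.normal(size=16)        # near-duplicate: cross-scale positive
negs = [rng.normal(size=16) for _ in range(8)]  # unrelated grasp features
loss = info_nce(f, pos, negs)
```

Minimizing this loss over all scale pairs pulls multi-scale features of the same positive grasp together, which is one way to realize the cross-scale consistency the method targets.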
Qingyu Fan
Institute of Automation, Chinese Academy of Sciences
Computer Vision · Computer Graphics · Embodied AI
Yinghao Cai
Institute of Automation, Chinese Academy of Sciences
Chao Li
Qiyuan Lab.
Chunting Jiao
Qiyuan Lab.
Xudong Zheng
Qiyuan Lab.
Tao Lu
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
Bin Liang
Qiyuan Lab.
Shuo Wang
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences