🤖 AI Summary
To address the limited adaptability of robotic grasping to objects with diverse shapes and sizes, this paper proposes a voxel-based multi-scale contrastive learning framework for grasp planning. The method employs a dual-Transformer architecture, comprising an Insight Transformer and an Empower Transformer, that enables query-driven interaction between high-level semantic and low-level geometric features, facilitating cross-scale feature fusion. It further integrates multi-scale voxel convolutions with a contrastive learning objective to jointly optimize fine-grained geometric detail perception and holistic structural modeling. This design significantly enhances feature discriminability and cross-scale consistency. Evaluated on both simulated and real-world tabletop clutter clearing tasks, the approach achieves substantially higher grasp success rates and greater robustness than state-of-the-art baselines and ablation variants, demonstrating its effectiveness and strong generalization capability.
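The multi-scale contrastive objective described above can be illustrated with a minimal InfoNCE-style sketch. This is an illustrative assumption, not the paper's implementation: the function names, the cosine-similarity/temperature form of the loss, and the per-scale averaging are all our own choices. The idea it shows is the one in the summary: at each voxel scale, features of positive grasp samples are pulled together while negatives are pushed away, which encourages consistency of positive features across scales.

```python
import numpy as np

def info_nce(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style term (illustrative): raise the anchor's similarity
    to positive samples relative to its similarity to negatives."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, x) for x in list(positives) + list(negatives)])
    logits /= temperature
    log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax over all pairs
    return -log_probs[: len(positives)].mean()         # -log p(positive)

def multi_scale_contrastive_loss(features_by_scale, labels, temperature=0.1):
    """Apply the contrastive term independently at every voxel scale, so
    positive grasp features (label 1) stay mutually consistent at each scale."""
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    total = 0.0
    for feats in features_by_scale:          # one (N, D) feature array per scale
        for i in pos_idx:                    # each positive sample acts as anchor
            others = [feats[j] for j in pos_idx if j != i]
            negs = [feats[j] for j in neg_idx]
            total += info_nce(feats[i], others, negs, temperature)
    return total / (len(features_by_scale) * len(pos_idx))
```

When positive features are mutually similar and far from negatives, the loss is small; when positives are scattered among the negatives, it grows, which is the behavior the contrastive objective is meant to optimize.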
📝 Abstract
Robotic grasping faces challenges in adapting to objects with varying shapes and sizes. In this paper, we introduce MISCGrasp, a volumetric grasping method that integrates multi-scale feature extraction with contrastive feature enhancement for self-adaptive grasping. We propose a query-based interaction between high-level and low-level features through the Insight Transformer, while the Empower Transformer selectively attends to the highest-level features; together, the two strike a balance between attention to fine geometric details and overall geometric structure. Furthermore, MISCGrasp utilizes multi-scale contrastive learning to exploit similarities among positive grasp samples, ensuring consistency across multi-scale features. Extensive experiments in both simulated and real-world environments demonstrate that MISCGrasp outperforms baseline and variant methods in tabletop decluttering tasks. More details are available at https://miscgrasp.github.io/.
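The query-based interaction between feature levels can be sketched as generic scaled dot-product cross-attention, where high-level features act as queries over a set of low-level geometric features. This is a simplified stand-in, not the authors' Insight Transformer: it omits the learned query/key/value projections and multi-head structure a real Transformer layer would have, and the function names are our own.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Scaled dot-product cross-attention (no learned projections):
    each high-level query attends over the low-level feature set and
    returns a fused representation as a convex combination of them."""
    d_k = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d_k)  # (Nq, Nkv) similarity scores
    weights = softmax(scores, axis=-1)               # each row sums to 1
    return weights @ keys_values                     # (Nq, D) fused features
```

Because the attention weights form a convex combination, each fused feature lies within the span of the low-level features it attends to; in the full model, learned projections would let the network reshape this fusion rather than merely average.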