🤖 AI Summary
Existing visual grasping methods are typically limited to a single gripper configuration and struggle to generalize across diverse end-effectors. To address this, we propose XGrasp, a vision-based framework for real-time, gripper-aware grasp detection with zero-shot generalization to unseen grippers. Methodologically, we design a hierarchical two-stage architecture: a Grasp Point Predictor (GPP) jointly encodes scene-wide features and gripper-specific parameters to generate candidate grasp points, and an Angle-Width Predictor (AWP) refines the grasp angle and width from local patch features. We introduce cross-gripper contrastive learning and systematically augment existing datasets with multi-gripper annotations to mitigate annotation scarcity. The modular framework integrates with vision foundation models and provides a pathway toward vision-language interfaces for semantic grasp specification. Experiments demonstrate competitive grasp success rates across diverse gripper types, substantially faster inference than existing gripper-aware methods, and strong zero-shot generalization to unseen grippers, validating both efficiency and broad applicability.
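The two-stage design can be pictured with the minimal PyTorch-style sketch below. It is illustrative only: the module names, layer choices, tensor shapes, and the gripper parameterization are assumptions for exposition, not the released XGrasp implementation.

```python
import torch
import torch.nn as nn

class GraspPointPredictor(nn.Module):
    """Stage 1 (illustrative): fuse global scene features with a gripper-parameter
    embedding to score candidate grasp points. Layers and shapes are assumptions."""
    def __init__(self, feat_dim=256, gripper_dim=8):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=3, padding=1)   # stand-in for a real image encoder
        self.gripper_mlp = nn.Sequential(nn.Linear(gripper_dim, feat_dim), nn.ReLU())
        self.head = nn.Conv2d(feat_dim, 1, kernel_size=1)                   # per-pixel grasp-point score

    def forward(self, image, gripper_params):
        feats = self.backbone(image)                                        # (B, C, H, W) scene features
        g = self.gripper_mlp(gripper_params)[:, :, None, None]              # (B, C, 1, 1) gripper embedding
        return self.head(feats + g)                                         # (B, 1, H, W) grasp-point heatmap

class AngleWidthPredictor(nn.Module):
    """Stage 2 (illustrative): refine grasp angle and width from a local patch
    around each candidate point, conditioned on the same gripper embedding."""
    def __init__(self, feat_dim=256, gripper_dim=8, patch=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * patch * patch, feat_dim), nn.ReLU())
        self.gripper_mlp = nn.Sequential(nn.Linear(gripper_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, 2)                                   # predicts (angle, width)

    def forward(self, patch, gripper_params):
        z = self.encoder(patch) + self.gripper_mlp(gripper_params)           # fused local + gripper features
        return self.head(z)
```

Under this reading, a grasp would be assembled by taking high-scoring heatmap locations from the first stage and attaching the angle and width predicted for the active gripper in the second stage.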
📝 Abstract
Most robotic grasping methods are designed for a single gripper type, which limits their applicability in real-world scenarios that require diverse end-effectors. We propose XGrasp, a real-time gripper-aware grasp detection framework that efficiently handles multiple gripper configurations. The proposed method addresses data scarcity by systematically augmenting existing datasets with multi-gripper annotations. XGrasp employs a hierarchical two-stage architecture. In the first stage, a Grasp Point Predictor (GPP) identifies optimal grasp locations using global scene information and gripper specifications. In the second stage, an Angle-Width Predictor (AWP) refines the grasp angle and width using local features. Contrastive learning in the AWP module enables zero-shot generalization to unseen grippers by learning fundamental grasping characteristics. The modular framework integrates seamlessly with vision foundation models, providing pathways for future vision-language capabilities. Experimental results demonstrate competitive grasp success rates across various gripper types, while achieving substantial improvements in inference speed over existing gripper-aware methods. Project page: https://sites.google.com/view/xgrasp
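As a rough illustration of the cross-gripper contrastive idea in the AWP module, the sketch below uses an InfoNCE-style objective that pulls together patch embeddings of the same grasp location seen under different grippers and pushes apart the remaining pairs in the batch. The loss form, batch construction, and temperature are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_gripper_contrastive_loss(anchor_emb, positive_emb, temperature=0.07):
    """Illustrative InfoNCE-style loss (assumed form, not the paper's exact objective).
    anchor_emb:   (B, D) patch embeddings for grasps under gripper A
    positive_emb: (B, D) embeddings of the same grasp locations under gripper B
    Non-matching items within the batch act as negatives."""
    a = F.normalize(anchor_emb, dim=1)
    p = F.normalize(positive_emb, dim=1)
    logits = a @ p.t() / temperature                          # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)        # diagonal entries are the matching pairs
    return F.cross_entropy(logits, targets)
```

The intent of such an objective is that the learned patch representation captures gripper-agnostic grasping characteristics, which is what would allow the angle-width head to transfer zero-shot to grippers unseen during training.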