XGrasp: Gripper-Aware Grasp Detection with Multi-Gripper Data Generation

📅 2025-10-13
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Existing visual grasping methods are typically limited to a single gripper configuration and struggle to generalize across diverse end-effectors. To address this, we propose XGrasp, a vision-based grasping framework that supports real-time multi-gripper grasp detection and zero-shot generalization. Methodologically, we design a hierarchical two-stage architecture: a Grasp Point Predictor (GPP) jointly encodes scene-wide features and gripper-specific parameters to generate candidate grasp points, and an Angle-Width Predictor (AWP) refines the grasp angle and width from local patch features. We introduce cross-gripper contrastive learning and systematically augment existing datasets with multi-gripper annotations to mitigate annotation scarcity. The modular framework integrates with vision foundation models, providing a pathway toward vision-language grasp specification. Experiments demonstrate competitive grasp success rates across unseen grippers, substantially faster inference than existing gripper-aware methods, and strong zero-shot generalization, validating both efficiency and broad applicability.

📝 Abstract
Most robotic grasping methods are designed for a single gripper type, which limits their applicability in real-world scenarios requiring diverse end-effectors. We propose XGrasp, a real-time gripper-aware grasp detection framework that efficiently handles multiple gripper configurations. The proposed method addresses data scarcity by systematically augmenting existing datasets with multi-gripper annotations. XGrasp employs a hierarchical two-stage architecture. In the first stage, a Grasp Point Predictor (GPP) identifies optimal grasp locations using global scene information and gripper specifications. In the second stage, an Angle-Width Predictor (AWP) refines the grasp angle and width using local features. Contrastive learning in the AWP module enables zero-shot generalization to unseen grippers by learning fundamental grasping characteristics. The modular framework integrates seamlessly with vision foundation models, providing pathways for future vision-language capabilities. Experimental results demonstrate competitive grasp success rates across various gripper types, along with substantial improvements in inference speed over existing gripper-aware methods. Project page: https://sites.google.com/view/xgrasp
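The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual networks: the feature shapes, the dot-product scoring rule, and the toy angle/width regressors are all placeholder assumptions standing in for the learned GPP and AWP modules.

```python
import numpy as np

def grasp_point_predictor(scene_feat, gripper_params, top_k=3):
    """Stage 1 (hypothetical GPP): score every location from scene-wide
    features conditioned on the gripper spec, return top-k candidates."""
    H, W, C = scene_feat.shape
    # Placeholder conditioning: project gripper parameters to a C-dim
    # embedding and score locations by dot product with scene features.
    gripper_embed = np.tanh(gripper_params @ np.ones((gripper_params.shape[0], C)))
    heatmap = scene_feat @ gripper_embed                    # (H, W) quality scores
    flat = np.argsort(heatmap, axis=None)[::-1][:top_k]
    return np.stack(np.unravel_index(flat, (H, W)), axis=1)  # (top_k, 2) points

def angle_width_predictor(scene_feat, points, patch=5):
    """Stage 2 (hypothetical AWP): crop a local patch around each candidate
    point and regress a grasp angle and opening width from it."""
    results = []
    for y, x in points:
        y0, x0 = max(0, y - patch // 2), max(0, x - patch // 2)
        local = scene_feat[y0:y0 + patch, x0:x0 + patch]
        # Toy regressors in place of the learned AWP head.
        angle = float(np.mean(local)) % np.pi               # angle in [0, pi)
        width = float(np.abs(local).sum()) % 0.08           # width in metres
        results.append((int(y), int(x), angle, width))
    return results

rng = np.random.default_rng(0)
scene = rng.standard_normal((32, 32, 16))   # stand-in for encoder features
gripper = rng.standard_normal(4)            # e.g. max width, finger length, ...
candidates = grasp_point_predictor(scene, gripper)
grasps = angle_width_predictor(scene, candidates)
```

The split mirrors the paper's design choice: the global pass only has to rank locations, so it can stay cheap and gripper-conditioned, while the per-candidate pass does the finer angle/width regression on small patches.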
Problem

Research questions and friction points this paper is trying to address.

Existing grasp detectors are tied to a single gripper type and cannot detect grasps for multiple grippers in real time
Multi-gripper grasp annotations are scarce in existing datasets
Trained models fail to generalize zero-shot to unseen gripper configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates multi-gripper data to address annotation scarcity
Uses a two-stage hierarchical architecture for grasp detection
Employs contrastive learning for zero-shot generalization to unseen grippers
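The cross-gripper contrastive idea can be illustrated with a standard InfoNCE-style loss; this is a generic sketch under the assumption that row i of the two batches embeds the same grasp executed with two different grippers, not the paper's actual objective.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Hypothetical cross-gripper contrastive loss: row i of `anchors` and
    row i of `positives` embed the same grasp under two different grippers;
    all other rows in the batch act as negatives (InfoNCE)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # pull matched pairs together

rng = np.random.default_rng(1)
shared = rng.standard_normal((8, 32))  # gripper-agnostic grasp features
# Aligned pairs (same grasp, slightly perturbed view) give a low loss;
# unrelated embeddings give a high one.
loss_aligned = info_nce(shared + 0.01 * rng.standard_normal((8, 32)), shared)
loss_random = info_nce(rng.standard_normal((8, 32)), shared)
```

Training with such a loss pushes embeddings of the same grasp toward each other regardless of which gripper produced them, which is one plausible reading of how "fundamental grasping characteristics" transfer zero-shot to unseen grippers.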
Yeonseo Lee
Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Jungwook Mun
Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Hyosup Shin
Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Guebin Hwang
Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Junhee Nam
Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Taeyeop Lee
KAIST (Computer Vision, Robotics)
Sungho Jo
Korea Advanced Institute of Science and Technology, Daejeon, South Korea