HGDiffuser: Efficient Task-Oriented Grasp Generation via Human-Guided Grasp Diffusion Models

📅 2025-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing task-oriented grasping (TOG) methods rely on an inefficient two-stage paradigm: exhaustive task-agnostic sampling of 6-DoF gripper poses followed by constraint-based filtering using human demonstrations, which yields low sampling efficiency and unstable grasps. This work proposes HGDiffuser, a single-stage diffusion framework for TOG that incorporates human grasp demonstrations directly into the diffusion sampling process, generating physically stable, task-constrained 6-DoF parallel-jaw grasps without a separate filtering stage. A Diffusion Transformer (DiT) backbone strengthens pose representation learning and modeling fidelity relative to MLP-based designs. Experiments demonstrate significant improvements in grasp quality and generation efficiency across multi-object, multi-task scenarios, along with more faithful transfer of human grasping strategies. HGDiffuser thus offers an efficient, single-stage paradigm for TOG, bridging high-level task constraints with low-level physical feasibility in a unified generative framework.

📝 Abstract
Task-oriented grasping (TOG) is essential for robots to perform manipulation tasks, requiring grasps that are both stable and compliant with task-specific constraints. Humans naturally grasp objects in a task-oriented manner to facilitate subsequent manipulation tasks. By leveraging human grasp demonstrations, current methods can generate high-quality robotic parallel-jaw task-oriented grasps for diverse objects and tasks. However, they still encounter challenges in maintaining grasp stability and sampling efficiency. These methods typically rely on a two-stage process: first performing exhaustive task-agnostic grasp sampling in the 6-DoF space, then applying demonstration-induced constraints (e.g., contact regions and wrist orientations) to filter candidates. This leads to inefficiency and potential failure due to the vast sampling space. To address this, we propose the Human-guided Grasp Diffuser (HGDiffuser), a diffusion-based framework that integrates these constraints into a guided sampling process. Through this approach, HGDiffuser directly generates 6-DoF task-oriented grasps in a single stage, eliminating exhaustive task-agnostic sampling. Furthermore, by incorporating Diffusion Transformer (DiT) blocks as the feature backbone, HGDiffuser improves grasp generation quality compared to MLP-based methods. Experimental results demonstrate that our approach significantly improves the efficiency of task-oriented grasp generation, enabling more effective transfer of human grasping strategies to robotic systems. To access the source code and supplementary videos, visit https://sites.google.com/view/hgdiffuser.
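A minimal sketch of the guided sampling idea described in the abstract: a toy reverse-diffusion loop in which the demonstration-induced constraints (contact region, wrist orientation) act as a gradient that steers each denoising step. Every name here (denoiser, constraint_cost, the demo keys) and the simplified noise schedule are illustrative assumptions, not the paper's actual implementation.

import torch

def constraint_cost(grasp, demo):
    # Hypothetical differentiable surrogate for the demonstration
    # constraints: distance from the demonstrated contact region plus
    # misalignment with the demonstrated wrist orientation.
    pos_err = (grasp[..., :3] - demo["contact_center"]).pow(2).sum(-1)
    rot_err = (grasp[..., 3:] - demo["wrist_axis_angle"]).pow(2).sum(-1)
    return pos_err + 0.5 * rot_err

def guided_sample(denoiser, obj_feat, demo, steps=50, scale=1.0):
    # Grasp pose as a 6-D vector: 3-D translation + 3-D axis-angle rotation.
    g = torch.randn(1, 6)
    for t in reversed(range(steps)):
        with torch.no_grad():
            eps = denoiser(g, torch.tensor([t]), obj_feat)  # predicted noise
        g0 = g - eps  # crude clean-pose estimate; real samplers rescale by the noise schedule
        g0 = g0.detach().requires_grad_(True)
        grad, = torch.autograd.grad(constraint_cost(g0, demo).sum(), g0)
        g = (g0 - scale * grad).detach()  # pull the sample toward the demonstration
        if t > 0:
            g = g + 0.1 * torch.randn_like(g)  # toy re-noising between steps
    return g

Because the guidance is applied inside the sampler, every generated grasp already respects the task constraints, which is what removes the second filtering stage of prior two-stage pipelines.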
Problem

Research questions and friction points this paper is trying to address.

Two-stage TOG pipelines first perform exhaustive task-agnostic grasp sampling in the vast 6-DoF space, which is inefficient.
Demonstration-induced constraint filtering (contact regions, wrist orientations) discards most candidates and can fail outright.
How can stable 6-DoF task-oriented grasps be generated directly, in a single stage, without exhaustive sampling?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-guided Grasp Diffuser (HGDiffuser) framework
Single-stage 6-DoF task-oriented grasp generation
Diffusion Transformer (DiT) blocks as the feature backbone (see the sketch after this list)
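The DiT blocks listed above presumably follow the standard Diffusion Transformer design, in which adaptive LayerNorm parameters (shift, scale, gate) are regressed from the conditioning signal, here the diffusion timestep plus object and demonstration features. A minimal PyTorch sketch under that assumption, not the authors' exact architecture:

import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    # Self-attention + MLP, each modulated by adaptive LayerNorm
    # parameters regressed from the conditioning vector.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.ada = nn.Linear(dim, 6 * dim)  # shift/scale/gate for both sub-layers

    def forward(self, x, cond):  # x: (B, N, dim) tokens, cond: (B, dim)
        s1, b1, g1, s2, b2, g2 = self.ada(cond).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1.unsqueeze(1)) + b1.unsqueeze(1)
        x = x + g1.unsqueeze(1) * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + s2.unsqueeze(1)) + b2.unsqueeze(1)
        x = x + g2.unsqueeze(1) * self.mlp(h)
        return x

Per the abstract, swapping an MLP feature backbone for DiT blocks of this kind is what improves HGDiffuser's grasp generation quality over MLP-based methods.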
👥 Authors
Dehao Huang
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
Wenlong Dong
Southern University of Science and Technology
Robotics, Perception
Chao Tang
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
Hong Zhang
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China