DexTOG: Learning Task-Oriented Dexterous Grasp With Language Condition

📅 2025-02-01
🏛️ IEEE Robotics and Automation Letters
📈 Citations: 0
Influential: 0
🤖 AI Summary
Task-oriented grasping (TOG) with high-DOF dexterous hands is challenging because a grasp must satisfy task constraints within a high-dimensional, multimodal solution space while also following natural-language instructions.
Method: We propose DexDiffu, the first end-to-end differentiable, language-conditioned diffusion model for dexterous grasp generation, which incorporates language embeddings into grasp synthesis. It leverages DexTOG-80K, a large-scale TOG dataset covering 80 objects from five categories that integrates language-action aligned representations, Shadow-hand-based simulation and real-world data, and task-driven grasp quality evaluation.
Contribution/Results: DexDiffu achieves high success rates in both simulated and physical experiments, improves cross-task generalization, and establishes a new benchmark for dexterous TOG, demonstrating robust language grounding, task compliance, and real-world deployability.

📝 Abstract
This study introduces DexTOG, a novel language-guided, diffusion-based learning framework for advancing task-oriented grasping (TOG) with dexterous hands. Unlike existing methods, which mainly focus on two-finger grippers, this research addresses the complexities of dexterous manipulation: the system must identify non-unique optimal grasp poses under specific task constraints, accommodate multiple valid grasps, and search a high-degree-of-freedom configuration space during grasp planning. DexTOG comprises a diffusion-based grasp pose generation model, DexDiffu, and a data engine that supports DexDiffu. Using DexTOG, we also propose a new dataset, DexTOG-80K, developed with a Shadow robot hand performing various tasks on 80 objects from five categories, showcasing the dexterity and multi-tasking capabilities of the robotic hand. This research presents a significant advance in dexterous TOG and provides a comprehensive dataset and simulation validation, setting a new benchmark for robotic manipulation research.
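The abstract does not describe DexDiffu's architecture, so the following is only a minimal sketch of what "diffusion-based grasp pose generation with a language condition" means in general. It runs standard DDPM ancestral sampling (Ho et al., 2020) over a toy 3-D grasp vector. Two substitutions are assumptions: the learned denoising network is replaced by an analytic "oracle" (`oracle_eps`) that knows p(grasp | command) is a mixture of two point masses, and language encoding is replaced by a dictionary lookup. `TOY_DATASET`, `oracle_eps`, `sample_grasp`, the commands, and all numbers are hypothetical and not taken from the paper.

```python
import math
import random

# Hypothetical toy data: each grasp is a 3-D vector standing in for a full
# dexterous-hand configuration. Each command maps to two equally valid grasps.
TOY_DATASET = {
    "use the mug": [[0.2, 1.0, 0.5], [0.8, 1.0, 0.5]],
    "hand over the mug": [[-1.0, 0.3, 0.0], [-1.0, -0.3, 0.0]],
}


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.1):
    """Linear noise schedule; beta_end is raised so 100 steps reach near-pure noise."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def oracle_eps(x_t, alpha_bar, modes):
    """Exact noise prediction when p(grasp | command) is a uniform mixture of
    point masses at `modes`. Hypothetical stand-in for a trained eps_theta."""
    var = 1.0 - alpha_bar
    scale = math.sqrt(alpha_bar)
    # Log posterior weight of each mode given the noisy sample x_t.
    log_w = [-sum((x - scale * m) ** 2 for x, m in zip(x_t, mode)) / (2.0 * var)
             for mode in modes]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    # Posterior mean E[x_0 | x_t, command].
    x0_hat = [sum(wk * mode[d] for wk, mode in zip(w, modes)) / total
              for d in range(len(x_t))]
    return [(x - scale * x0) / math.sqrt(var) for x, x0 in zip(x_t, x0_hat)]


def sample_grasp(command, num_steps=100, seed=None):
    """DDPM ancestral sampling of one grasp conditioned on a command."""
    rng = random.Random(seed)
    modes = TOY_DATASET[command]  # hypothetical stand-in for a text encoder
    betas = linear_betas(num_steps)
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in modes[0]]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = oracle_eps(x, alpha_bars[t], modes)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Reverse-process mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        x = [(xi - coef * ei) / math.sqrt(1.0 - betas[t]) for xi, ei in zip(x, eps)]
        if t > 0:  # add noise on every step except the last
            x = [xi + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for xi in x]
    return x
```

Calling `sample_grasp("use the mug", seed=s)` with different seeds lands on one or the other of the two "use the mug" grasps, while `"hand over the mug"` yields grasps from a different region. This illustrates both the language conditioning and the "multiple valid grasps" property the abstract emphasizes. A real model would replace `oracle_eps` with a network that takes the noisy grasp, the timestep, and a text embedding.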
Problem

Develops language-guided dexterous grasp learning for complex tasks
Addresses high-DOF grasp planning with multiple valid solutions
Creates dataset for multi-task dexterous manipulation benchmarking
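The "multiple valid solutions" point can be made concrete with a toy calculation. Suppose the training data for one object-task pair contains two equally valid grasps. A deterministic regressor trained with mean-squared error then converges to their element-wise mean, which can be a configuration far from both. The joint values below are hypothetical and not from the paper.

```python
# Two hypothetical, equally valid joint configurations (radians) for the
# same object and task, e.g. wrapping the hand from the left or the right.
grasp_a = [0.0, 1.2, 0.4]
grasp_b = [1.2, 0.0, 0.4]


def mse_optimal_prediction(targets):
    """Return the constant prediction minimizing mean-squared error over
    `targets`, which is their element-wise mean."""
    return [sum(column) / len(targets) for column in zip(*targets)]


def distance(p, q):
    """Euclidean distance between two configurations."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


mean_grasp = mse_optimal_prediction([grasp_a, grasp_b])
print(mean_grasp)  # [0.6, 0.6, 0.4]
print(round(distance(mean_grasp, grasp_a), 3))  # 0.849
```

The averaged configuration is about 0.85 rad from each valid grasp. Generative models, including diffusion models, are a common remedy: they sample one grasp from the learned distribution instead of predicting its mean.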
Innovation

Language-guided diffusion-based learning framework
Diffusion-based grasp pose generation model
Comprehensive dataset with multi-tasking validation
Jieyi Zhang
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Wenqiang Xu
Shanghai Jiao Tong University
Computer Vision, Robotics
Zhenjun Yu
Shanghai Jiao Tong University, MVIG
Robotics, Tactile Sensing, 3D Vision, 3D Reconstruction
Pengfei Xie
Ruijin Hospital, Shanghai Jiao Tong University School of Medicine
General Surgery
Tutian Tang
Shanghai Jiao Tong University
Robotics
Cewu Lu
Qing Yuan Research Institute and MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China