DECO: Decoupled Multimodal Diffusion Transformer for Bimanual Dexterous Manipulation with a Plugin Tactile Adapter

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work targets two difficulties in bimanual dexterous manipulation: tightly coupled multimodal perception and control, and ineffective integration of tactile feedback. It proposes DECO, a Diffusion Transformer (DiT) policy that gives each modality its own fusion pathway. Image and action tokens interact through joint self-attention, proprioceptive states are injected through adaptive layer normalization, and tactile signals enter through cross-attention, with a lightweight LoRA-based adapter used to fine-tune the pretrained policy. To support this research, the authors introduce DECO-50, a bimanual manipulation dataset with tactile sensing comprising 4 scenarios, 28 sub-tasks, more than 50 hours of data (approximately 5 million frames), and 8,000 successful trajectories. The authors report that the method improves both task performance and generalization in bimanual robotic manipulation.
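
To give a feel for the dataset figures quoted above, the following rough calculation derives the implied frame rate and average trajectory length. It is only a back-of-the-envelope sketch: it assumes the rounded figures (50 hours, 5 million frames, 8,000 trajectories) are exact and that all recorded time belongs to the successful trajectories, neither of which the summary confirms. The function name `dataset_rates` is hypothetical.

```python
# Rounded figures quoted in the summary; treated as exact here (an assumption).
HOURS = 50
FRAMES = 5_000_000
TRAJECTORIES = 8_000


def dataset_rates(hours, frames, trajectories):
    """Derive rough per-second and per-trajectory rates from dataset totals."""
    seconds = hours * 3600
    return {
        "fps": frames / seconds,                    # implied frames per second
        "sec_per_traj": seconds / trajectories,     # mean trajectory duration
        "frames_per_traj": frames / trajectories,   # mean frames per trajectory
    }


stats = dataset_rates(HOURS, FRAMES, TRAJECTORIES)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the totals imply roughly 28 frames per second and trajectories of about 22 seconds each. Because the source says "over 50 hours" and "approximately 5 million frames", these are rough estimates rather than reported statistics.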

📝 Abstract
Overview of the Proposed DECO Framework. DECO is a DiT-based policy that decouples multimodal conditioning. Image and action tokens interact via joint self-attention, while proprioceptive states and optional conditions are injected through adaptive layer normalization. Tactile signals are injected via cross-attention, and a lightweight LoRA-based adapter is used to efficiently fine-tune the pretrained policy. DECO is accompanied by DECO-50, a bimanual dexterous manipulation dataset with tactile sensing, consisting of 4 scenarios and 28 sub-tasks, and covering more than 50 hours of data, approximately 5 million frames, and 8,000 successful trajectories.
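
The decoupled conditioning described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the token width `D`, the single-head attention, the parameter layout, the names `block`, `init_params`, and `lora_weight`, and the choice to apply LoRA to an output projection are all assumptions made for illustration. It only shows how the three pathways (joint self-attention, adaptive layer normalization, cross-attention) can be kept separate, and how a zero-initialized LoRA update leaves a frozen weight unchanged at the start of fine-tuning.

```python
import numpy as np

D = 16  # hypothetical token width


def layer_norm(x, eps=1e-6):
    """Normalize each token over its feature axis (no learned affine)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention(q_in, kv_in, Wq, Wk, Wv):
    """Single-head scaled dot-product attention (queries from q_in)."""
    q, k, v = q_in @ Wq, kv_in @ Wk, kv_in @ Wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def lora_weight(W, A, B, alpha=8.0):
    """Effective weight W + (alpha / r) * A @ B; W itself stays frozen."""
    r = A.shape[1]
    return W + (alpha / r) * (A @ B)


def init_params(proprio_dim, rank=4, seed=0):
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(0.0, 0.02, size=shape)
    return {
        "self": (w(D, D), w(D, D), w(D, D)),    # joint self-attention
        "cross": (w(D, D), w(D, D), w(D, D)),   # tactile cross-attention
        "ada": w(proprio_dim, 2 * D),           # proprio -> (scale, shift)
        "Wo": w(D, D),                          # frozen output projection
        "lora_A": w(D, rank),
        "lora_B": np.zeros((rank, D)),          # zero init: no change at start
    }


def block(img_tok, act_tok, proprio, p, tactile_tok=None):
    """One illustrative block; returns the updated action tokens."""
    # Pathway 1: image and action tokens share one self-attention sequence.
    x = np.concatenate([img_tok, act_tok], axis=0)
    # Pathway 2: proprioception modulates the normalized tokens (adaLN).
    scale, shift = np.split(proprio @ p["ada"], 2)
    h = layer_norm(x) * (1.0 + scale) + shift
    # Assumption: LoRA is applied to the attention output projection.
    Wo = lora_weight(p["Wo"], p["lora_A"], p["lora_B"])
    x = x + attention(h, h, *p["self"]) @ Wo
    # Pathway 3: tactile tokens are optional and enter via cross-attention.
    if tactile_tok is not None:
        x = x + attention(layer_norm(x), tactile_tok, *p["cross"])
    return x[img_tok.shape[0]:]


rng = np.random.default_rng(1)
params = init_params(proprio_dim=6)
img, act = rng.normal(size=(5, D)), rng.normal(size=(3, D))
proprio, tactile = rng.normal(size=6), rng.normal(size=(4, D))
out_plain = block(img, act, proprio, params)
out_touch = block(img, act, proprio, params, tactile_tok=tactile)
print(out_plain.shape, out_touch.shape)
```

Because the tactile pathway is a separate residual branch, the same block runs with or without tactile tokens, which is one plausible reading of the "plug-in" property. A real DiT policy would additionally use multi-head attention, feed-forward layers, and diffusion timestep conditioning, none of which are modeled here.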
Problem

Research questions and friction points this paper is trying to address.

bimanual dexterous manipulation
multimodal fusion
tactile sensing
policy learning
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled Multimodal Diffusion
Tactile Adapter
Bimanual Dexterous Manipulation
LoRA-based Fine-tuning
Diffusion Transformer
🔎 Similar Papers
2021-08-01 · IEEE/RSJ International Conference on Intelligent Robots and Systems · Citations: 48
Xukun Li
Kansas State University
computer vision, machine learning, deep learning, statistical modeling
Yu Sun
Beijing Academy of Artificial Intelligence, Beijing, China
Lei Zhang
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Agentic Coding, Reinforcement Learning, Large Language Model
Bosheng Huang
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Yibo Peng
Carnegie Mellon University
Code Generation, Multimodal NLP, AI Agents
Yuan Meng
Beijing Academy of Artificial Intelligence, Beijing, China
Haojun Jiang
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Shaoxuan Xie
Beijing Academy of Artificial Intelligence, Beijing, China
Guocai Yao
Beijing Academy of Artificial Intelligence, Beijing, China
A. Knoll
School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
Zhenshan Bing
Nanjing University / Technical University of Munich
Robotics
Xinlong Wang
Beijing Academy of Artificial Intelligence
Computer Vision, Foundation Models
Zhenguo Sun
Beijing Academy of Artificial Intelligence, Beijing, China