ATG-MoE: Autoregressive trajectory generation with mixture-of-experts for assembly skill learning

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited flexibility of traditional robot programming and the shortcomings of existing learning-based assembly methods in positional generalization, multi-stage task design, and multi-skill integration. The authors propose an end-to-end autoregressive trajectory generation framework that maps multimodal inputs (RGB-D images, natural language instructions, and proprioceptive signals) directly onto manipulation trajectories, abandoning the conventional paradigm of decoupled perception and control. The approach fuses multimodal features to interpret task semantics, leverages autoregressive sequence modeling to produce temporally coherent trajectories, and integrates a Mixture-of-Experts (MoE) architecture for efficient multi-skill learning within a single model. Evaluated on eight distinct skills from a pressure-reducing valve assembly task, the method achieves a 96.3% average grasping success rate and a 91.8% overall success rate in simulation, demonstrating strong generalization and practical applicability to real-world scenarios.

📝 Abstract
Flexible manufacturing requires robot systems that can adapt to constantly changing tasks, objects, and environments. However, traditional robot programming is labor-intensive and inflexible, while existing learning-based assembly methods often suffer from weak positional generalization, complex multi-stage designs, and limited multi-skill integration capability. To address these issues, this paper proposes ATG-MoE, an end-to-end autoregressive trajectory generation method with a mixture-of-experts architecture for learning assembly skills from demonstration. The proposed method establishes a closed-loop mapping from multi-modal inputs (RGB-D observations, natural language instructions, and robot proprioception) to manipulation trajectories. It integrates multi-modal feature fusion for scene and task understanding, autoregressive sequence modeling for temporally coherent trajectory generation, and a mixture-of-experts architecture for unified multi-skill learning. In contrast to conventional methods that separate visual perception from control or train each skill independently, ATG-MoE incorporates visual information directly into trajectory generation and supports efficient multi-skill integration within a single model. We train and evaluate the method on eight representative assembly skills from a pressure-reducing valve assembly task. Experimental results show that ATG-MoE achieves strong overall performance in simulation, with an average grasp success rate of 96.3% and an average overall success rate of 91.8%, while also demonstrating strong generalization and effective multi-skill integration. Real-world experiments further verify its practicality for multi-skill industrial assembly. The project page can be found at https://hwh23.github.io/ATG-MoE
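The abstract combines two mechanisms: a gating network that softly routes each input through several expert heads, and an autoregressive loop that emits one waypoint at a time, feeding each back into the context. The toy NumPy sketch below illustrates that combination only; all dimensions, the linear experts, and the feedback recurrence are illustrative assumptions, not the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the paper does not specify these.
OBS_DIM, ACT_DIM, N_EXPERTS = 16, 7, 4

# Each "expert" is a small linear head; a gate picks a soft mixture of them.
W_gate = rng.normal(0, 0.1, (OBS_DIM, N_EXPERTS))
W_experts = rng.normal(0, 0.1, (N_EXPERTS, OBS_DIM, ACT_DIM))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_step(context):
    """One decoding step: gate over experts, then mix their outputs."""
    gates = softmax(context @ W_gate)                   # (N_EXPERTS,)
    outs = np.einsum("d,eda->ea", context, W_experts)  # (N_EXPERTS, ACT_DIM)
    return gates @ outs                                # (ACT_DIM,)

def generate_trajectory(obs, horizon=8):
    """Autoregressive rollout: each waypoint is folded back into the context."""
    context = obs.copy()
    traj = []
    for _ in range(horizon):
        wp = moe_step(context)
        traj.append(wp)
        # Toy recurrence standing in for the paper's sequence model.
        context = np.tanh(context + np.pad(wp, (0, OBS_DIM - ACT_DIM)))
    return np.stack(traj)                              # (horizon, ACT_DIM)

traj = generate_trajectory(rng.normal(size=OBS_DIM))
print(traj.shape)  # (8, 7)
```

In the paper the gate additionally supports multi-skill learning: different skills can activate different experts, so one model covers all eight assembly skills rather than training them independently.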
Problem

Research questions and friction points this paper is trying to address.

assembly skill learning
positional generalization
multi-skill integration
flexible manufacturing
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

autoregressive trajectory generation
mixture-of-experts
multi-modal fusion
end-to-end skill learning
robotic assembly
Weihang Huang
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Chaoran Zhang
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Xiaoxin Deng
Faculty of Engineering, Imperial College London, London SW7 2AZ, United Kingdom
Hao Zhou
University of Science and Technology of China
Zhaobo Xu
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Shubo Cui
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Long Zeng
Tsinghua University
Intelligent Manufacturing · Embodied AI Robotics · Sketch Modeling