MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving

📅 2025-07-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing end-to-end autonomous driving models based on generalist Mixture-of-Experts (MoE) architectures suffer from heavy data dependency, complex training, and limited generalization and interpretability. To address these challenges, this paper proposes MoSE, a skill-oriented MoE architecture inspired by the staged acquisition of driving skills in human drivers. MoSE constructs a hierarchical skill dataset, designs a skill-level routing mechanism to decompose driving tasks into learnable atomic skill units, and enables collaborative decision-making via multi-step reasoning alignment within a single forward pass. Integrating vision-language modeling, sparse-activation MoE, skill-annotated routing, and hierarchical pretraining, MoSE achieves high computational efficiency while significantly improving generalization and interpretability. On the CODA AD corner-case reasoning benchmark, MoSE attains state-of-the-art performance among methods built on open-source models and data using fewer than 3B activated parameters, reducing activated parameters by at least 62.5% compared to 8B+ baselines.
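The skill-level routing described above can be sketched as a top-k sparse gate over "skill" experts. The snippet below is a minimal, illustrative implementation only; all dimensions, weight names, and the choice of k are hypothetical and not taken from the paper (the actual model additionally pretrains the router with skill annotations, which is not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): token embedding width,
# number of skill experts, experts activated per token.
D, N_EXPERTS, TOP_K = 16, 8, 2

# Router maps a token embedding to one logit per skill expert.
W_router = rng.normal(size=(D, N_EXPERTS))
# Each "expert" is stubbed as a simple linear map D -> D.
W_experts = rng.normal(size=(N_EXPERTS, D, D)) / np.sqrt(D)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def skill_moe_forward(x):
    """Sparse skill routing: select TOP_K experts, mix their outputs."""
    logits = x @ W_router                      # (N_EXPERTS,) skill scores
    top = np.argsort(logits)[-TOP_K:]          # indices of selected skills
    weights = softmax(logits[top])             # renormalized gate weights
    out = sum(w * (W_experts[i] @ x) for w, i in zip(weights, top))
    return out, top

x = rng.normal(size=D)
y, active = skill_moe_forward(x)
print(y.shape, sorted(active.tolist()))
```

Only TOP_K of the N_EXPERTS weight matrices are touched per token, which is what keeps the activated parameter count (under 3B in the paper) far below the total parameter count.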

📝 Abstract
Recent studies show that large language models (LLMs) and vision-language models (VLMs) trained on web-scale data can empower end-to-end autonomous driving systems with better generalization and interpretability. Specifically, by dynamically routing inputs to specialized subsets of parameters, the Mixture-of-Experts (MoE) technique enables general LLMs or VLMs to achieve substantial performance improvements while maintaining computational efficiency. However, general MoE models usually demand extensive training data and complex optimization. In this work, inspired by how human drivers learn, we propose a skill-oriented MoE, called MoSE, which mimics human drivers' learning and reasoning processes, skill-by-skill and step-by-step. We propose a skill-oriented routing mechanism that begins with defining and annotating specific skills, enabling experts to identify the driving competencies required by various scenarios and reasoning tasks, thereby facilitating skill-by-skill learning. To further align the driving process with the multi-step planning of human reasoning and end-to-end driving models, we build a hierarchical skill dataset and pretrain the router to encourage the model to think step-by-step. Unlike multi-round dialogs, MoSE integrates valuable auxiliary tasks (e.g., description, reasoning, planning) in a single forward pass without introducing any extra computational cost. With fewer than 3B sparsely activated parameters, our model outperforms several 8B+ parameter models on the CODA AD corner-case reasoning task. Compared to existing methods based on open-source models and data, our approach achieves state-of-the-art performance with a significantly reduced activated model size (by at least 62.5%) using a single-turn conversation.
Problem

Research questions and friction points this paper is trying to address.

Enhance autonomous driving via skill-by-skill expert learning
Reduce computational cost while improving reasoning performance
Optimize Mixture-of-Experts for scenario-specific driving skills
Innovation

Methods, ideas, or system contributions that make the work stand out.

Skill-oriented MoE for autonomous driving
Hierarchical skill dataset for step reasoning
Single forward process with auxiliary tasks
Lu Xu
Postdoc, RIKEN AIP
Deep Learning, Machine Learning, Computer Vision
Jiaqian Yu
Samsung R&D Institute China - Beijing
Machine Learning, Computer Vision
Xiongfeng Peng
Advanced Research Lab, Samsung Research China-Beijing
Yiwei Chen
Yunnan University, Zhejiang University
Signal Processing, Deep Learning, Computational Imaging, Quantum Machine Learning
Weiming Li
Principal Engineer, Samsung Electronics
Computer Vision, Augmented Reality, Computational Imaging and Display
Jaewook Yoo
Manufacturing/Material Handling AI Lab (DS AI Center)
Sunghyun Chunag
Manufacturing/Material Handling AI Lab (DS AI Center)
Dongwook Lee
Manufacturing/Material Handling AI Lab (DS AI Center)
Daehyun Ji
Manufacturing/Material Handling AI Lab (DS AI Center)
Chao Zhang
Advanced Research Lab, Samsung Research China-Beijing