Addressing Corner Cases in Autonomous Driving: A World Model-based Approach with Mixture of Experts and LLMs

📅 2025-10-23

🤖 AI Summary

Autonomous driving systems predict poorly in rare yet safety-critical corner cases, primarily because training data over-represent common scenes and models generalize poorly. To address this, we propose WM-MoE, a framework integrating a world model, a Mixture-of-Experts (MoE), and a large language model (LLM). A *scenario routing mechanism* decomposes complex corner cases and assigns them to specialized experts that infer agent intent and perform counterfactual rollouts. A lightweight temporal tokenizer maps agent trajectories and contextual cues into the LLM's feature space without additional training, so the LLM can enhance the world model's long-horizon reasoning. To standardize evaluation, we release *nuScenes-corner*, a new benchmark comprising four real-world corner-case scenarios. Experiments on four datasets (nuScenes, NGSIM, HighD, and MoCAD) show state-of-the-art performance and robustness under both corner-case and data-missing conditions.

📝 Abstract

Accurate and reliable motion forecasting is essential for the safe deployment of autonomous vehicles (AVs), particularly in rare but safety-critical scenarios known as corner cases. Existing models often underperform in these situations due to an over-representation of common scenes in training data and limited generalization capabilities. To address this limitation, we present WM-MoE, the first world model-based motion forecasting framework that unifies perception, temporal memory, and decision making for high-risk corner-case scenarios. The model constructs a compact scene representation that explains current observations, anticipates future dynamics, and evaluates the outcomes of potential actions. To enhance long-horizon reasoning, we leverage large language models (LLMs) and introduce a lightweight temporal tokenizer that maps agent trajectories and contextual cues into the LLM's feature space without additional training, enriching temporal context and commonsense priors. Furthermore, a mixture-of-experts (MoE) decomposes complex corner cases into subproblems and allocates capacity across scenario types: a router assigns scenes to specialized experts that infer agent intent and perform counterfactual rollouts. In addition, we introduce nuScenes-corner, a new benchmark that comprises four real-world corner-case scenarios for rigorous evaluation. Extensive experiments on four benchmark datasets (nuScenes, NGSIM, HighD, and MoCAD) show that WM-MoE consistently outperforms state-of-the-art (SOTA) baselines and remains robust under corner-case and data-missing conditions, indicating the promise of world model-based architectures for robust and generalizable motion forecasting in fully autonomous vehicles.
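
The routing step described in the abstract can be illustrated with a generic top-k softmax gate. This is a minimal sketch under stated assumptions, not WM-MoE's actual router: the function names (`route`, `moe_forecast`), the linear gate, the toy experts, and all numbers are hypothetical and chosen only to show how a scene vector might be assigned to a few experts whose outputs are then blended.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(scene: Vector, gate_weights: Sequence[Vector], top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the top_k experts.

    Assumption: the gate is a single linear layer (one weight row per expert).
    Weights of the selected experts are renormalised to sum to 1.
    """
    logits = [sum(w * x for w, x in zip(row, scene)) for row in gate_weights]
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forecast(
    scene: Vector,
    gate_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], List[float]]],
    top_k: int = 2,
) -> List[float]:
    """Blend the selected experts' predictions by their routing weights."""
    routed = route(scene, gate_weights, top_k)
    outputs = [(w, experts[i](scene)) for i, w in routed]
    dim = len(outputs[0][1])
    return [sum(w * out[d] for w, out in outputs) for d in range(dim)]


# Hypothetical toy experts, each standing in for one scenario type.
experts = [
    lambda s: [s[0] + 1.0, s[1]],  # stand-in for a "keep lane" expert
    lambda s: [s[0], s[1] + 1.0],  # stand-in for a "lane change" expert
    lambda s: [0.0, 0.0],          # stand-in for a "hard brake" expert
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # illustrative values only
scene = [2.0, 0.0]

routed = route(scene, gate_weights, top_k=2)
prediction = moe_forecast(scene, gate_weights, experts, top_k=2)
```

Selecting only the top-k experts keeps per-scene compute bounded while still letting different scenario types use different parameters, which is the usual motivation for sparse MoE routing.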

Problem

Research questions and friction points this paper is trying to address.

Improves motion forecasting for autonomous vehicles in rare safety-critical scenarios

Addresses limited generalization of existing models in corner cases

Enhances long-horizon reasoning using LLMs and mixture-of-experts architecture

Innovation

Methods, ideas, or system contributions that make the work stand out.

World model framework unifies perception, temporal memory, and decision making

Lightweight temporal tokenizer maps trajectories and context into the LLM's feature space, enhancing long-horizon reasoning

Mixture-of-experts decomposes corner cases into specialized subproblems
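
As a rough illustration of the temporal tokenizer idea listed above, the sketch below turns a trajectory into a short sequence of fixed-size embedding vectors without any training. It is an assumption-laden example, not the paper's tokenizer: the windowing scheme, the displacement features, the fixed random projection, and the names `make_projection` and `tokenize` are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def make_projection(in_dim: int, embed_dim: int, seed: int = 0) -> List[List[float]]:
    """Fixed (untrained) random projection matrix of shape embed_dim x in_dim.

    Assumption: a frozen random linear map stands in for whatever
    training-free mapping the real tokenizer uses.
    """
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(embed_dim)]


def tokenize(
    trajectory: Sequence[Point], window: int, projection: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Split a trajectory into non-overlapping windows and embed each one.

    Each window of `window` displacement steps is flattened to 2 * window
    values (dx, dy per step) and projected to one token vector.
    Leftover steps that do not fill a whole window are dropped.
    """
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])]
    tokens = []
    for start in range(0, len(steps) - window + 1, window):
        patch = [v for step in steps[start:start + window] for v in step]
        tokens.append([sum(w * v for w, v in zip(row, patch)) for row in projection])
    return tokens


# Constant-velocity toy trajectory: 9 points give 8 displacement steps.
trajectory = [(float(t), 0.5 * t) for t in range(9)]
projection = make_projection(in_dim=2 * 4, embed_dim=16)
tokens = tokenize(trajectory, window=4, projection=projection)
```

With a window of 4 steps, the 8 displacements yield two 16-dimensional tokens. Because the toy agent moves at constant velocity, both tokens are identical, which shows that this particular encoding captures motion rather than absolute position.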
