Map-World: Masked Action planning and Path-Integral World Model for Autonomous Driving

📅 2025-11-25
🤖 AI Summary
Multi-modal motion planning for autonomous driving must model multiple plausible futures while staying computationally efficient. Existing approaches rely on handcrafted anchors or reinforcement learning to select a single mode, which discards information about alternative futures and complicates optimization. This paper proposes MAP-World, an anchor-free framework built on Masked Action Planning (MAP). It casts driving planning as masked sequence completion and expands a compact latent planning state, with injected noise, into diverse trajectory queries. A path-weighted world model rolls out future bird's-eye-view (BEV) semantics for each candidate trajectory and uses the trajectory probabilities as discrete path weights, so the semantic loss is optimized end to end over all modes. Evaluated on NAVSIM, MAP-World achieves state-of-the-art performance among world-model-based methods, matches anchor-based approaches, runs at real-time inference latency, and does not require reinforcement learning.

📝 Abstract
Motion planning for autonomous driving must handle multiple plausible futures while remaining computationally efficient. Recent end-to-end systems and world-model-based planners predict rich multi-modal trajectories, but typically rely on handcrafted anchors or reinforcement learning to select a single best mode for training and control. This selection discards information about alternative futures and complicates optimization. We propose MAP-World, a prior-free multi-modal planning framework that couples masked action planning with a path-weighted world model. The Masked Action Planning (MAP) module treats future ego motion as masked sequence completion: past waypoints are encoded as visible tokens, future waypoints are represented as mask tokens, and a driving-intent path provides a coarse scaffold. A compact latent planning state is expanded into multiple trajectory queries with injected noise, yielding diverse, temporally consistent modes without anchor libraries or teacher policies. A lightweight world model then rolls out future BEV semantics conditioned on each candidate trajectory. During training, semantic losses are computed as an expectation over modes, using trajectory probabilities as discrete path weights, so the planner learns from the full distribution of plausible futures instead of a single selected path. On NAVSIM, our method matches anchor-based approaches and achieves state-of-the-art performance among world-model-based methods, while avoiding reinforcement learning and maintaining real-time inference latency.
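The path-weighted training objective described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the tensor shapes, the use of a softmax over per-mode scores, and the choice of cross-entropy as the semantic loss are all assumptions made for illustration. The only idea taken from the abstract is that the loss is an expectation over modes, with trajectory probabilities acting as discrete path weights.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array of mode scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def per_mode_semantic_loss(pred_logits, target):
    """Mean cross-entropy of each mode's BEV rollout against the target map.

    pred_logits: (K, C, H, W) class logits, one BEV rollout per mode (assumed layout).
    target:      (H, W) integer class labels.
    Returns an array of shape (K,).
    """
    shifted = pred_logits - pred_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    num_modes = pred_logits.shape[0]
    index = np.repeat(target[None, None, :, :], num_modes, axis=0)  # (K, 1, H, W)
    picked = np.take_along_axis(log_probs, index, axis=1)           # (K, 1, H, W)
    return -picked.mean(axis=(1, 2, 3))

def path_weighted_loss(mode_scores, per_mode_loss):
    """Expected loss over modes, weighted by trajectory probabilities."""
    weights = softmax(np.asarray(mode_scores, dtype=float))
    expected = float(np.dot(weights, per_mode_loss))
    return expected, weights

# Hypothetical example: 3 modes, 4 semantic classes, an 8x8 BEV grid.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 8, 8))
labels = rng.integers(0, 4, size=(8, 8))
losses = per_mode_semantic_loss(logits, labels)
expected_loss, weights = path_weighted_loss([0.2, 1.5, -0.3], losses)
```

In contrast to picking the single best mode, every mode contributes to `expected_loss` in proportion to its weight, so all candidate rollouts take part in the training signal.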
Problem

Research questions and friction points this paper is trying to address.

Handling multiple plausible futures in autonomous-driving motion planning while staying computationally efficient
Removing the reliance on handcrafted anchors or reinforcement learning to select a single mode
Learning from the full distribution of plausible futures instead of a single selected path
Innovation

Methods, ideas, or system contributions that make the work stand out.

Masked action planning treats future ego motion as masked sequence completion
Compact latent state expands into diverse trajectory queries with noise
Lightweight world model rolls out BEV semantics for candidate trajectories
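The first two ideas above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's architecture: the function names, the zero-valued placeholder for mask tokens (a real model would more likely use a learned embedding), the Gaussian noise, and all dimensions are assumptions.

```python
import numpy as np

def build_masked_sequence(past_waypoints, num_future, mask_value=0.0):
    """Concatenate visible past waypoints with placeholder future slots.

    Returns the token array of shape (T_past + num_future, 2) and a boolean
    array marking which positions are masked (to be completed by the planner).
    """
    past = np.asarray(past_waypoints, dtype=float)              # (T_past, 2)
    future = np.full((num_future, past.shape[1]), mask_value)   # placeholder mask tokens
    tokens = np.concatenate([past, future], axis=0)
    is_masked = np.concatenate([
        np.zeros(len(past), dtype=bool),
        np.ones(num_future, dtype=bool),
    ])
    return tokens, is_masked

def expand_queries(latent_state, num_modes, noise_std, rng):
    """Expand one compact latent state into num_modes noisy trajectory queries."""
    latent = np.asarray(latent_state, dtype=float)              # (D,)
    noise = rng.normal(0.0, noise_std, size=(num_modes, latent.shape[0]))
    return latent[None, :] + noise                              # (num_modes, D)

# Hypothetical sizes: 3 past waypoints, 4 future slots, 6 modes, 8-dim latent.
rng = np.random.default_rng(0)
tokens, is_masked = build_masked_sequence([[0.0, 0.0], [1.0, 0.1], [2.0, 0.3]], num_future=4)
queries = expand_queries(np.ones(8), num_modes=6, noise_std=0.1, rng=rng)
```

Each row of `queries` would then be decoded into one candidate trajectory that fills the masked positions; the noise is what makes the modes differ from one another.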
Authors
Bin Hu, University of Macau
Zijian Lu, National University of Singapore
Haicheng Liao, University of Macau
Chengran Yuan, National University of Singapore
Bin Rao, University of Macau
Yongkang Li, Purdue University
Guofa Li, Chongqing University, China (Artificial Intelligence, Driver Assistance, Autonomous Vehicles, Intelligent Transportation Systems)
Zhiyong Cui, Professor, Beihang University (Foundation Models, Autonomous Driving, Urban Computing, Traffic Prediction, Traffic Control)
Cheng-zhong Xu, University of Macau
Zhenning Li, University of Macau