WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Autoregressive decoding for autonomous-driving trajectory planning is sequential, which makes inference inefficient and limits scalability. Method: This paper proposes a parallel coarse-to-fine planning paradigm that casts ego-trajectory generation as discrete flow matching over a structured token space. It introduces a metric-aligned numerical tokenizer trained with triplet-margin learning, a geometry-aware flow objective, and a simulator-guided GRPO alignment stage. A multi-stage adaptation converts a pretrained autoregressive backbone (Janus-1.5B) from causal decoding to a non-causal flow model, and continued multimodal pretraining strengthens its road-scene competence. Results: On NAVSIM v1, the method achieves 89.1 PDMS with 1-step inference and 90.3 PDMS with 5-step inference, outperforming autoregressive and diffusion-based VLA baselines.

📝 Abstract
We introduce WAM-Flow, a vision-language-action (VLA) model that casts ego-trajectory planning as discrete flow matching over a structured token space. In contrast to autoregressive decoders, WAM-Flow performs fully parallel, bidirectional denoising, enabling coarse-to-fine refinement with a tunable compute-accuracy trade-off. Specifically, the approach combines a metric-aligned numerical tokenizer that preserves scalar geometry via triplet-margin learning, a geometry-aware flow objective, and a simulator-guided GRPO alignment that integrates safety, ego-progress, and comfort rewards while retaining parallel generation. A multi-stage adaptation converts a pretrained autoregressive backbone (Janus-1.5B) from causal decoding to a non-causal flow model and strengthens road-scene competence through continued multimodal pretraining. Thanks to consistency model training and parallel decoding at inference time, WAM-Flow achieves superior closed-loop performance against autoregressive and diffusion-based VLA baselines, with 1-step inference attaining 89.1 PDMS and 5-step inference reaching 90.3 PDMS on the NAVSIM v1 benchmark. These results establish discrete flow matching as a promising new paradigm for end-to-end autonomous driving. The code will be publicly available soon.
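
To make the "tunable compute-accuracy trade-off" concrete, here is a minimal sketch of parallel coarse-to-fine decoding. It is not WAM-Flow's sampler, which the abstract does not specify. It swaps in confidence-based unmasking, a common sampler for masked discrete generative models, and a random `toy_denoiser` stands in for the non-causal network. The names `MASK`, `toy_denoiser`, and `parallel_decode` are hypothetical.

```python
import numpy as np

MASK = -1  # hypothetical id marking a token that has not been decided yet


def toy_denoiser(tokens, vocab_size, rng):
    """Stand-in for a non-causal model: random logits for every position at once."""
    return rng.normal(size=(tokens.shape[0], vocab_size))


def parallel_decode(seq_len, vocab_size, num_steps, seed=0):
    """Fill all trajectory tokens in `num_steps` parallel refinement passes."""
    rng = np.random.default_rng(seed)
    tokens = np.full(seq_len, MASK, dtype=np.int64)
    for step in range(1, num_steps + 1):
        # One forward pass scores every position simultaneously (no causal order).
        logits = toy_denoiser(tokens, vocab_size, rng)
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        pred = probs.argmax(axis=1)
        conf = probs.max(axis=1)

        # Linear schedule (an assumption): how many tokens are fixed after this step.
        target = int(np.ceil(seq_len * step / num_steps))
        masked = np.flatnonzero(tokens == MASK)
        n_new = target - (seq_len - masked.size)

        # Commit the most confident undecided positions; the rest stay masked.
        order = masked[np.argsort(-conf[masked])]
        chosen = order[:n_new]
        tokens[chosen] = pred[chosen]
    return tokens


one_step = parallel_decode(seq_len=8, vocab_size=16, num_steps=1)
five_step = parallel_decode(seq_len=8, vocab_size=16, num_steps=5)
```

With `num_steps=1` every token is committed in a single pass. Larger values spend more forward passes and let later passes condition on tokens fixed earlier, which is the knob the abstract's 1-step and 5-step results vary.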

Problem

Research questions and friction points this paper is trying to address.

Parallel motion planning for autonomous driving via flow matching
Coarse-to-fine trajectory refinement with tunable compute-accuracy trade-off
Integrating safety, progress, and comfort rewards in parallel generation
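
As a rough illustration of how safety, progress, and comfort rewards can drive GRPO-style alignment, the sketch below combines three per-trajectory scores and converts them into group-relative advantages, the normalization GRPO applies within a group of sampled outputs. The weighted-sum reward and its weights are assumptions. The source does not say how the simulator scores are combined, and `combined_reward` and `group_relative_advantages` are hypothetical names.

```python
import numpy as np


def combined_reward(safety, progress, comfort, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three reward terms (weights are illustrative only)."""
    w_s, w_p, w_c = weights
    return w_s * safety + w_p * progress + w_c * comfort


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled trajectories."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Three candidate trajectories sampled for the same scene (made-up scores).
group = [
    combined_reward(safety=1.0, progress=0.6, comfort=0.9),
    combined_reward(safety=0.0, progress=0.9, comfort=0.8),
    combined_reward(safety=1.0, progress=0.8, comfort=0.7),
]
advantages = group_relative_advantages(group)
```

Trajectories that score above the group mean get positive advantages and are reinforced; those below are discouraged. Because the advantage is computed per complete trajectory, it does not depend on the order in which tokens were generated.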

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fully parallel, bidirectional denoising for coarse-to-fine motion planning
Metric-aligned numerical tokenizer that preserves scalar geometry via triplet-margin learning
Simulator-guided GRPO alignment integrating safety, ego-progress, and comfort rewards
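
The idea behind a metric-aligned tokenizer can be sketched with a standard triplet-margin loss: embeddings of numerically close values are pulled together and embeddings of distant values are pushed apart. This is a minimal sketch of the generic loss, assuming Euclidean distance and hand-picked 2-D embeddings. The paper's actual tokenizer, distance, and margin are not given in the source, and the function names are hypothetical.

```python
import numpy as np


def order_by_scalar_distance(anchor_value, value_a, value_b):
    """Return (positive, negative): the value numerically closer to the anchor first."""
    if abs(anchor_value - value_a) <= abs(anchor_value - value_b):
        return value_a, value_b
    return value_b, value_a


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin), Euclidean d."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)


# Toy embeddings for three numeric tokens (values and vectors are made up).
embeddings = {
    1.0: np.array([0.0, 0.0]),
    1.1: np.array([0.1, 0.0]),
    3.0: np.array([3.0, 0.0]),
}
pos_value, neg_value = order_by_scalar_distance(1.0, 3.0, 1.1)
loss = triplet_margin_loss(
    embeddings[1.0], embeddings[pos_value], embeddings[neg_value]
)
```

Here the embedding layout already mirrors the number line, so the loss is zero. If the embeddings for 1.1 and 3.0 were swapped, the loss would become positive and its gradient would push the layout back toward the scalar ordering.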