WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving

📅 2025-12-05

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Problem: Autoregressive decoding makes trajectory planning for autonomous vehicles inefficient and hard to scale. Method: This paper proposes a parallel coarse-to-fine paradigm that casts trajectory planning as discrete flow matching over a structured token space. It introduces a metric-aligned numerical tokenizer trained with triplet-margin learning, a geometry-aware flow objective, and a simulator-guided GRPO alignment mechanism. A multi-stage adaptation converts a pretrained autoregressive backbone into a non-causal flow model, together with continued multimodal pretraining and consistency regularization. Results: On NAVSIM v1, the method achieves 89.1 PDMS with single-step inference and 90.3 PDMS with five-step inference, outperforming autoregressive and diffusion-based baselines, with reported gains in computational efficiency, trajectory accuracy, and driving safety.


📝 Abstract

We introduce WAM-Flow, a vision-language-action (VLA) model that casts ego-trajectory planning as discrete flow matching over a structured token space. In contrast to autoregressive decoders, WAM-Flow performs fully parallel, bidirectional denoising, enabling coarse-to-fine refinement with a tunable compute-accuracy trade-off. Specifically, the approach combines a metric-aligned numerical tokenizer that preserves scalar geometry via triplet-margin learning, a geometry-aware flow objective, and a simulator-guided GRPO alignment that integrates safety, ego progress, and comfort rewards while retaining parallel generation. A multi-stage adaptation converts a pretrained autoregressive backbone (Janus-1.5B) from causal decoding to a non-causal flow model and strengthens road-scene competence through continued multimodal pretraining. Thanks to consistency model training and parallel decoding at inference, WAM-Flow achieves superior closed-loop performance against autoregressive and diffusion-based VLA baselines, with 1-step inference attaining 89.1 PDMS and 5-step inference reaching 90.3 PDMS on the NAVSIM v1 benchmark. These results establish discrete flow matching as a promising new paradigm for end-to-end autonomous driving. The code will be publicly available soon.
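The triplet-margin idea behind the metric-aligned tokenizer can be illustrated with a small sketch. This is not the paper's implementation: the function names, the one-dimensional toy embeddings, and the margin of 1.0 are assumptions made for illustration. The loss is zero when the embedding of a numerically closer value is also closer in embedding space by at least the margin.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet-margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def order_triplet(values, i, j, k):
    """Return (anchor, positive, negative) indices so that the positive
    is the value numerically closer to values[i] (hypothetical helper)."""
    if abs(values[i] - values[j]) <= abs(values[i] - values[k]):
        return i, j, k
    return i, k, j

# Toy scalar vocabulary: three numeric tokens.
values = [0.0, 0.5, 3.0]
a, p, n = order_triplet(values, 0, 2, 1)  # anchor 0.0, positive 0.5, negative 3.0

# Embedding that mirrors the scalar geometry: no penalty.
aligned = [[0.0], [0.5], [3.0]]
loss_aligned = triplet_margin_loss(aligned[a], aligned[p], aligned[n])

# Embedding that swaps the two neighbours: penalised.
misaligned = [[0.0], [3.0], [0.5]]
loss_misaligned = triplet_margin_loss(misaligned[a], misaligned[p], misaligned[n])
```

In this toy case the aligned embedding incurs no loss, while the misaligned one is penalised because the numerically nearer value sits farther away in embedding space.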

Problem

Research questions and friction points this paper is trying to address.

Parallel motion planning for autonomous driving via flow matching

Coarse-to-fine trajectory refinement with tunable compute-accuracy trade-off

Integrating safety, progress, and comfort rewards in parallel generation
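The compute-accuracy trade-off of parallel coarse-to-fine decoding can be sketched with a toy refinement loop. This is only an illustration of the decoding pattern, not the paper's sampler: `parallel_refine`, the linear commit schedule, and the stand-in denoiser are all assumptions. The point is that the number of model calls equals the number of refinement steps, regardless of trajectory length.

```python
import random

def parallel_refine(denoise, length, vocab_size, num_steps, seed=0):
    """Start from random tokens and refine every position in parallel.

    `denoise` is a hypothetical model call that proposes a token for
    every position at once. At step s of num_steps, each position adopts
    its proposal with probability s / num_steps, so the last step
    commits all positions.
    """
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size) for _ in range(length)]  # pure noise
    for step in range(1, num_steps + 1):
        proposal = denoise(tokens)           # one call covers all positions
        commit_prob = step / num_steps       # assumed linear schedule
        tokens = [p if rng.random() < commit_prob else t
                  for p, t in zip(proposal, tokens)]
    return tokens

# Stand-in denoiser that always proposes a fixed trajectory and counts calls.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]
calls = {"n": 0}

def toy_denoiser(tokens):
    calls["n"] += 1
    return list(TARGET)

one_step = parallel_refine(toy_denoiser, len(TARGET), 16, num_steps=1)
five_step = parallel_refine(toy_denoiser, len(TARGET), 16, num_steps=5)
```

Because the stand-in denoiser ignores its input, extra steps do not change the result here. With a learned model, later steps would condition on partially refined tokens, which is where additional steps could improve quality.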

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel bidirectional denoising for coarse-to-fine motion planning

Metric-aligned numerical tokenizer that preserves scalar geometry via triplet-margin learning

Simulator-guided GRPO alignment integrating safety, ego progress, and comfort rewards
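The group-relative step of GRPO can be sketched as follows. This is a minimal illustration under stated assumptions: the weighted-sum reward, its weights, and the use of the population standard deviation are hypothetical choices, and the paper's actual reward composition is not given in this summary. Each sampled trajectory is scored, and its advantage is its reward normalised against the other samples in the same group.

```python
from statistics import mean, pstdev

def driving_reward(safety, progress, comfort, weights=(0.5, 0.3, 0.2)):
    """Combine per-trajectory scores into one scalar reward.
    The weighted sum and the weights are assumptions for illustration."""
    w_safety, w_progress, w_comfort = weights
    return w_safety * safety + w_progress * progress + w_comfort * comfort

def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical trajectories sampled for the same scene,
# each scored by a simulator as (safety, progress, comfort).
scores = [(1.0, 0.9, 0.8), (1.0, 0.4, 0.9), (0.0, 1.0, 0.5), (1.0, 0.7, 0.7)]
rewards = [driving_reward(*s) for s in scores]
advantages = group_relative_advantages(rewards)
```

Trajectories that score above the group average receive positive advantages and the rest receive negative ones, so no separate value network is needed to form a baseline.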

🔎 Similar Papers

Real-time Motion Planning for autonomous vehicles in dynamic environments