🤖 AI Summary
Problem: Imitation learning for autonomous driving is hindered by the difficulty of modeling complex interactive behaviors and the scarcity of high-quality interaction data. Method: We propose Flow Planner, a novel planning framework comprising (i) a segment-based trajectory tokenization scheme for fine-grained motion representation; (ii) a spatiotemporal fusion Transformer architecture that explicitly models multi-agent dynamic interactions; and (iii) a flow matching generative framework with classifier-free guidance that enables efficient, stable, and multimodal interaction-aware planning. Flow Planner dynamically modulates inter-agent interaction weights to enhance planning consistency and scene adaptability. Results: Flow Planner achieves state-of-the-art performance among learning-based methods on nuPlan and the high-interaction-density interPlan benchmark, demonstrating significant improvements in high-value interactive scenarios, particularly lane-change games and unmarked intersection navigation.
📝 Abstract
Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning. Learning-based approaches attempt to address this challenge with advanced generative models, removing the dependency on over-engineered architectures for representation fusion. However, brute-force implementations that simply stack Transformer blocks lack a dedicated mechanism for modeling the interactive behaviors that are common in real driving scenarios. The scarcity of interactive driving data further exacerbates this problem, leaving conventional imitation learning methods ill-equipped to capture high-value interactive behaviors. We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme. Specifically, we first introduce fine-grained trajectory tokenization, which decomposes the trajectory into overlapping segments to reduce the complexity of modeling the whole trajectory. With a carefully designed architecture, we achieve efficient temporal and spatial fusion of planning and scene information to better capture interactive behaviors. In addition, the framework incorporates flow matching with classifier-free guidance for multi-modal behavior generation, which dynamically reweights agent interactions during inference to maintain coherent response strategies, providing a critical boost for interactive scenario understanding. Experimental results on the large-scale nuPlan dataset and the challenging, interaction-heavy interPlan dataset demonstrate that Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.
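To make two of the ideas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a trajectory is a list of 2D points, that segments have a fixed length with a fixed overlap, and that classifier-free guidance combines a conditional and an unconditional velocity prediction as `v_uncond + w * (v_cond - v_uncond)` before an Euler integration step. All names (`tokenize_overlapping`, `guided_velocity`, `sample_with_cfg`, `toy_velocity`) and parameter values are hypothetical, and the abstract does not state which conditioning signal is dropped for the unconditional branch.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Vector = List[float]
# Hypothetical signature: (state, time, use_conditioning) -> velocity
VelocityFn = Callable[[Vector, float, bool], Vector]


def tokenize_overlapping(
    trajectory: Sequence[Point], segment_len: int, overlap: int
) -> List[List[Point]]:
    """Split a trajectory into fixed-length segments sharing `overlap` points."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_len")
    stride = segment_len - overlap
    last_start = len(trajectory) - segment_len
    return [
        list(trajectory[start:start + segment_len])
        for start in range(0, last_start + 1, stride)
    ]


def guided_velocity(v_cond: Vector, v_uncond: Vector, scale: float) -> Vector:
    """Classifier-free guidance: extrapolate from unconditional to conditional."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def sample_with_cfg(
    velocity: VelocityFn, x0: Vector, scale: float, num_steps: int
) -> Vector:
    """Integrate the guided velocity field from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = guided_velocity(velocity(x, t, True), velocity(x, t, False), scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, conditioned: bool) -> Vector:
    """Stand-in for a learned model: constant pull along x when conditioned."""
    return [1.0, 0.0] if conditioned else [0.0, 0.0]


# Eight waypoints, segments of four points that share two points with a neighbor.
waypoints = [(float(i), 0.0) for i in range(8)]
segments = tokenize_overlapping(waypoints, segment_len=4, overlap=2)

# A guidance scale above 1 amplifies the effect of the conditioning signal.
endpoint = sample_with_cfg(toy_velocity, [0.0, 0.0], scale=2.0, num_steps=4)
```

In this toy setup, a guidance scale of 1 recovers the purely conditional velocity and a scale of 0 recovers the unconditional one. In a real planner the velocity function would be a trained network, and the scale would control how strongly the conditioning shapes the generated plan.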