🤖 AI Summary
Existing approaches to autonomous driving environment modeling often overlook the feedback effect of ego-vehicle motion on scene observation, limiting scene understanding and planning capability. This work proposes FlowAD, a novel framework that, for the first time, integrates ego-motion feedback into feature learning through ego-guided scene partitioning, spatiotemporal scene flow prediction, and task-aware enhancement, achieving unified dynamic scene modeling. Leveraging log-replay data to construct scene flow representations, the method also introduces a new evaluation metric, Frames before Correct Planning (FCP). On the nuScenes benchmark, FlowAD reduces the collision rate by 19% and improves FCP by 1.39 frames (60%) compared to SparseDrive; it also achieves a driving score of 51.77 on Bench2Drive, demonstrating its generality and effectiveness.
📝 Abstract
Effective environment modeling is the foundation of autonomous driving, underpinning tasks from perception to planning. However, current paradigms often inadequately consider the feedback of ego motion on observation, which leads to an incomplete understanding of the driving process and consequently limits planning capability. To address this issue, we introduce a novel ego-scene interactive modeling paradigm. Inspired by human recognition, the paradigm represents ego-scene interaction as scene flow relative to the ego vehicle. This conceptualization allows ego-motion feedback to be modeled within a feature-learning pattern, advantageously utilizing existing log-replay datasets rather than relying on scenario simulations. We specifically propose FlowAD, a general flow-based framework for autonomous driving. Within it, an ego-guided scene partition first constructs basic flow units to quantify scene flow; the ego vehicle's forward direction and steering velocity directly shape the partition, so it reflects ego motion. Then, based on the flow units, spatial and temporal flow predictions model the dynamics of scene flow, encompassing both spatial displacement and temporal variation. A final task-aware enhancement exploits the learned spatio-temporal flow dynamics to benefit diverse tasks through object- and region-level strategies. We also propose a novel Frames before Correct Planning (FCP) metric to assess scene understanding capability. Experiments in both open- and closed-loop evaluations demonstrate FlowAD's generality and effectiveness across perception, end-to-end planning, and VLM analysis. Notably, FlowAD reduces the collision rate by 19% over SparseDrive with an FCP improvement of 1.39 frames (60%) on nuScenes, and achieves an impressive driving score of 51.77 on Bench2Drive, proving its superiority. Code, model, and configurations will be released here.
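As a rough illustration of how an FCP-style ("Frames before Correct Planning") metric could be computed, the sketch below counts the consecutive frames, immediately preceding a critical event, in which the planner's output is already correct — so a higher value means the planner "got it right" earlier. The function name, inputs, and exact definition here are assumptions for illustration, not the paper's actual implementation.

```python
def frames_before_correct_planning(correct_per_frame, event_frame):
    """Hypothetical FCP reading: length of the streak of consecutive
    frames, immediately before `event_frame`, whose plans are correct.

    correct_per_frame: list of bools, one per frame, True if the plan
        produced at that frame is judged correct (e.g. collision-free).
    event_frame: index of the critical event (e.g. the collision frame).
    """
    fcp = 0
    # Walk backwards from the frame just before the event and count
    # how long the planner has continuously been correct.
    for t in range(event_frame - 1, -1, -1):
        if correct_per_frame[t]:
            fcp += 1
        else:
            break
    return fcp

# Example: the plan becomes (and stays) correct 3 frames before the
# event at frame 5, so FCP = 3.
print(frames_before_correct_planning(
    [False, False, True, True, True, True], 5))  # → 3
```

Under this reading, the reported +1.39-frame improvement would mean FlowAD's plans become correct, on average, 1.39 frames earlier than SparseDrive's before safety-critical moments.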