🤖 AI Summary
This work addresses the limitations of existing motion prediction methods, which rely on the closed-world assumption and struggle with imperfect perception and dynamically evolving object categories in real-world scenarios. We propose the first open-world motion forecasting setting and introduce an end-to-end class-incremental learning framework capable of directly predicting future trajectories for both known and novel object classes from raw images. Key innovations include leveraging a vision-language model to filter pseudo-labels and enhance annotation quality, and a replay sampling strategy based on query feature variance to mitigate catastrophic forgetting. Experiments on nuScenes and Argoverse 2 demonstrate that our approach effectively adapts to new categories while preserving performance on previously learned ones, and further supports end-to-end planning and zero-shot transfer to real driving scenarios.
📝 Abstract
Motion forecasting aims to predict the future trajectories of dynamic agents in a scene, enabling autonomous vehicles to effectively reason about scene evolution. Existing approaches operate under a closed-world regime and assume a fixed object taxonomy as well as access to high-quality perception. They therefore struggle in real-world settings where perception is imperfect and the object taxonomy evolves over time. In this work, we bridge this fundamental gap by introducing open-world motion forecasting, a novel setting in which new object classes are introduced sequentially over time and future object trajectories are estimated directly from camera images. We tackle this setting by proposing the first end-to-end class-incremental motion forecasting framework, which mitigates catastrophic forgetting while simultaneously learning to forecast newly introduced classes. When a new class is introduced, our framework first employs a pseudo-labeling strategy to generate motion forecasting pseudo-labels for all known classes, which are then processed by a vision-language model to filter out inconsistent and over-confident predictions. In parallel, our approach further mitigates catastrophic forgetting through a novel replay sampling strategy that leverages query feature variance to select previous sequences with informative motion patterns. Extensive evaluation on the nuScenes and Argoverse 2 datasets demonstrates that our approach successfully resists catastrophic forgetting, maintaining performance on previously learned classes while improving adaptation to novel ones. Further, we demonstrate that our approach supports zero-shot transfer to real-world driving and naturally extends to end-to-end class-incremental planning, enabling continual adaptation of the full autonomous driving system. We provide the code at https://omen.cs.uni-freiburg.de.
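To make the variance-based replay idea concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: it assumes each past sequence is summarized by a matrix of query features and scores sequences by the variance of those features, keeping the highest-variance ones (presumed to carry the most diverse motion patterns) for the replay buffer. The function name and scoring rule are our own assumptions for illustration.

```python
import numpy as np

def select_replay_sequences(query_feats, k):
    """Rank sequences by the variance of their query features and
    return the indices of the top-k as replay candidates.

    query_feats: list of (num_queries, feat_dim) arrays, one per sequence.
    Higher feature variance is taken as a proxy for more informative,
    diverse motion patterns worth replaying.
    """
    scores = [float(np.var(f)) for f in query_feats]  # one scalar score per sequence
    order = np.argsort(scores)[::-1]                  # highest variance first
    return [int(i) for i in order[:k]]

# Toy usage: three synthetic sequences with increasing feature spread;
# the two widest-spread sequences should be selected.
rng = np.random.default_rng(0)
seqs = [rng.normal(0.0, s, size=(6, 4)) for s in (0.1, 1.0, 3.0)]
print(select_replay_sequences(seqs, 2))
```

In practice such a score would be computed on the decoder's per-agent query embeddings rather than on raw arrays, but the ranking-and-truncation structure stays the same.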