SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

📅 2024-10-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor generalization and cross-dataset transferability in autonomous driving motion forecasting, which stem largely from data scarcity, this paper proposes a model-agnostic and dataset-agnostic self-supervised pretraining framework. The method decouples model architecture from data sources and jointly optimizes contrastive and reconstructive objectives, combining the strengths of discriminative and generative paradigms. It further introduces a dataset-agnostic scenario sampling strategy that pools multiple datasets, enlarging data volume and diversity for robust representation learning. Evaluated on multiple benchmarks, the approach consistently improves state-of-the-art prediction models; for example, it reduces the MissRate of Forecast-MAE by 10.6% while also enhancing cross-dataset generalization and robustness. This work establishes a scalable pretraining paradigm for open-world driving prediction.

📝 Abstract
Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. However, the scarcity of large-scale driving datasets has hindered the development of robust and generalizable motion prediction models, limiting their ability to capture complex interactions and road geometries. Inspired by recent advances in natural language processing (NLP) and computer vision (CV), self-supervised learning (SSL) has gained significant attention in the motion prediction community for learning rich and transferable scene representations. Nonetheless, existing pre-training methods for motion prediction have largely focused on specific model architectures and a single dataset, limiting their scalability and generalizability. To address these challenges, we propose SmartPretrain, a general and scalable SSL framework for motion prediction that is both model-agnostic and dataset-agnostic. Our approach integrates contrastive and reconstructive SSL, leveraging the strengths of both generative and discriminative paradigms to effectively represent spatiotemporal evolution and interactions without imposing architectural constraints. Additionally, SmartPretrain employs a dataset-agnostic scenario sampling strategy that integrates multiple datasets, enhancing data volume, diversity, and robustness. Extensive experiments on multiple datasets demonstrate that SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits, and main metrics. For instance, SmartPretrain significantly reduces the MissRate of Forecast-MAE by 10.6%. These results highlight SmartPretrain's effectiveness as a unified, scalable solution for motion prediction, breaking free from the limitations of the small-data regime. Code is available at https://github.com/youngzhou1999/SmartPretrain
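
The abstract describes the core pretraining objective as a combination of contrastive (discriminative) and reconstructive (generative) self-supervision over scene representations. The minimal PyTorch-style sketch below illustrates what such a joint objective could look like; the encoder/decoder interfaces, loss weights, and temperature are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a joint contrastive + reconstructive pretraining step.
# `encoder`, `decoder`, and the loss weights are hypothetical placeholders.
import torch
import torch.nn.functional as F


def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrastive (InfoNCE) loss between two augmented views of the same scenes."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)


def pretrain_step(encoder, decoder, view_a, view_b, target_traj, w_con=1.0, w_rec=1.0):
    """One pretraining step combining discriminative and generative objectives."""
    z_a, z_b = encoder(view_a), encoder(view_b)       # scene embeddings, (B, D)
    loss_con = info_nce(z_a, z_b)                     # discriminative: match views of the same scene
    recon = decoder(z_a)                              # generative: reconstruct agent trajectories
    loss_rec = F.mse_loss(recon, target_traj)
    return w_con * loss_con + w_rec * loss_rec
```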
Problem

Research questions and friction points this paper is trying to address.

Enhances motion prediction for autonomous vehicles
Overcomes data scarcity in driving datasets
Improves model scalability and generalizability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-agnostic SSL framework
Dataset-agnostic scenario sampling (see the sketch after this list)
Integrates contrastive and reconstructive SSL
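
As a rough illustration of the dataset-agnostic scenario sampling idea listed above, the sketch below pools scenarios from several dataset-specific loaders into one shared record format and draws pretraining batches from the combined pool. The `Scenario` fields and loader interface are assumptions made for illustration, not SmartPretrain's actual data pipeline.

```python
# Hedged sketch of dataset-agnostic scenario sampling: map each dataset's
# scenarios to one common record, pool them, and sample batches uniformly.
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    agent_histories: list   # per-agent past positions, in a shared coordinate frame
    map_polylines: list     # lane / road geometry as polylines
    source: str             # originating dataset, kept only for bookkeeping


def build_scenario_pool(loaders: Dict[str, Callable[[], List[Scenario]]]) -> List[Scenario]:
    """Pool scenarios from every dataset-specific loader into one list."""
    pool: List[Scenario] = []
    for name, load in loaders.items():
        for scenario in load():
            scenario.source = name
            pool.append(scenario)
    return pool


def sample_batch(pool: List[Scenario], batch_size: int) -> List[Scenario]:
    """Sample a pretraining batch uniformly over the combined pool."""
    return random.sample(pool, k=min(batch_size, len(pool)))
```

The design intent mirrored here is that batches are drawn independently of any single dataset's schema, which is how the abstract motivates the gains in data volume, diversity, and robustness.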
Authors

Yang Zhou (SenseTime Research)
Hao Shao (CUHK, MMLab)
Letian Wang (University of Toronto)
Steven L. Waslander (University of Toronto)
Hongsheng Li (CUHK MMLab, CPII under InnoHK, Shanghai Artificial Intelligence Laboratory)
Yu Liu (SenseTime Research, Shanghai Artificial Intelligence Laboratory)

Topics: Large Language Models · Generative Models · Autonomous Driving