🤖 AI Summary
Motion forecasting models for autonomous driving generalize poorly because they depend on manually annotated trajectory datasets that are costly, hard to scale, and difficult to reproduce. This paper proposes PPT (Pretraining with Pseudo-labeled Trajectories), a self-supervised pretraining paradigm that instead uses trajectories generated automatically by off-the-shelf 3D detectors and multi-object trackers. Unlike prior pipelines that treat such detection-and-tracking outputs as noisy artifacts to be filtered out, the core idea is to deliberately harness these diverse, uncurated pseudo-trajectories as supervisory signal, removing the dependence on clean, single-label ground truth. Pretrained models can then be fine-tuned efficiently on a small amount of labeled data. On standard benchmarks, PPT delivers strong performance, with the largest gains in low-label regimes, cross-domain transfer, and end-to-end and multi-class motion forecasting, while remaining robust and simple to apply in practice.
📝 Abstract
Accurately predicting how agents move in dynamic scenes is essential for safe autonomous driving. State-of-the-art motion forecasting models rely on large curated datasets with manually annotated or heavily post-processed trajectories. However, building these datasets is costly, largely manual, hard to scale, and difficult to reproduce, and the resulting annotations introduce domain gaps that limit generalization across environments. We introduce PPT (Pretraining with Pseudo-labeled Trajectories), a simple and scalable alternative that uses unprocessed, diverse trajectories automatically generated by off-the-shelf 3D detectors and trackers. Unlike traditional pipelines that aim for clean, single-label annotations, PPT embraces noise and diversity as useful signals for learning robust representations. With optional finetuning on a small amount of labeled data, models pretrained with PPT achieve strong performance across standard benchmarks, particularly in low-data regimes and in cross-domain, end-to-end, and multi-class settings. PPT is easy to implement and improves generalization in motion forecasting. Code and data will be released upon acceptance.
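To make the pseudo-labeling idea concrete, the sketch below shows how per-frame detections might be linked into trajectories and kept *without* the cleaning steps a curated dataset would apply. This is a minimal illustration under assumptions of ours, not the paper's pipeline: the greedy nearest-neighbor association, the `Track` container, and all function names are hypothetical stand-ins for whatever off-the-shelf detector and tracker PPT actually uses.

```python
# Hypothetical sketch: turn raw per-frame detections into pseudo-labeled
# trajectories, keeping noisy/fragmented tracks rather than filtering them.
# The greedy nearest-neighbor tracker here is an illustrative assumption.
from dataclasses import dataclass, field

@dataclass
class Track:
    track_id: int
    positions: list = field(default_factory=list)  # (frame, x, y) samples

def associate(tracks, detections, frame, max_dist=2.0):
    """Greedily link each (x, y) detection to the nearest live track."""
    next_id = max((t.track_id for t in tracks), default=-1) + 1
    for x, y in detections:
        best, best_d = None, max_dist
        for t in tracks:
            _, tx, ty = t.positions[-1]
            d = ((x - tx) ** 2 + (y - ty) ** 2) ** 0.5
            if d < best_d:
                best, best_d = t, d
        if best is None:  # no match: start a new (possibly spurious) track
            best = Track(next_id)
            next_id += 1
            tracks.append(best)
        best.positions.append((frame, x, y))
    return tracks

def pseudo_trajectories(per_frame_detections, min_len=2):
    """Run association over all frames; keep every track of usable length.

    Note: no smoothing, deduplication, or confidence filtering is applied,
    mirroring PPT's stance that uncurated diversity is useful signal."""
    tracks = []
    for frame, dets in enumerate(per_frame_detections):
        associate(tracks, dets, frame)
    return [t for t in tracks if len(t.positions) >= min_len]

# Toy detections: one agent moving right, one stationary, one false positive.
frames = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 0.1), (5.1, 5.0), (9.0, 9.0)],  # (9.0, 9.0) appears only once
    [(2.0, 0.0), (5.0, 5.1)],
]
trajs = pseudo_trajectories(frames)
# -> two trajectories of length 3; the single-frame false positive is too
#    short to form a trajectory at min_len=2
```

The resulting trajectory set would then serve as the pretraining target; a real pipeline would operate on 3D boxes with headings and class labels rather than 2D points.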