🤖 AI Summary
To address key challenges in autonomous driving trajectory prediction—including poor interpretability, heavy reliance on large-scale annotated data, and insufficient generalization to long-tail scenarios—this survey systematically reviews how large language models (LLMs) and multimodal large language models (MLLMs) are being applied to the task. It organizes the literature around three core methodologies: trajectory-language mapping mechanisms that enable bidirectional alignment between motion semantics and natural language; multimodal fusion modules that jointly encode visual, trajectory, and linguistic features; and physics- and traffic-rule-based constraint reasoning that supports causal, context-aware inference. Drawing on results reported across the nuScenes and ETH/UCY benchmarks, the survey finds that these approaches improve prediction accuracy and robustness in long-tail scenarios while making predictions more interpretable. Collectively, this line of work points toward a new paradigm for trajectory prediction grounded in cross-modal semantic understanding and constraint-guided reasoning.
📝 Abstract
Trajectory prediction is a critical function in autonomous driving: anticipating the future motion paths of traffic participants such as vehicles and pedestrians is essential for driving safety. Although conventional deep learning methods have improved accuracy, they remain hindered by inherent limitations, including limited interpretability, heavy reliance on large-scale annotated data, and weak generalization in long-tail scenarios. The rise of Large Foundation Models (LFMs) is transforming the research paradigm of trajectory prediction. This survey offers a systematic review of recent advances in LFMs, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), for trajectory prediction. By integrating linguistic and scene semantics, LFMs facilitate interpretable contextual reasoning, significantly enhancing prediction safety and generalization in complex environments. The article highlights three core methodologies: trajectory-language mapping, multimodal fusion, and constraint-based reasoning. It covers prediction tasks for both vehicles and pedestrians, evaluation metrics, and dataset analyses. Key challenges such as computational latency, data scarcity, and real-world robustness are discussed, along with future research directions including low-latency inference, causality-aware modeling, and motion foundation models.
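To make the trajectory-language mapping idea concrete, the sketch below serializes an observed 2D trajectory into a natural-language prompt that an LLM could consume, and parses a language-formatted answer back into coordinates. This is a minimal illustrative sketch only: the function names, prompt template, and coordinate format are assumptions for this example, not an API from any surveyed method.

```python
import re

def trajectory_to_text(track, agent="vehicle", dt=0.5):
    """Serialize (x, y) waypoints into a natural-language prompt.

    `agent`, `dt`, and the sentence template are illustrative assumptions.
    """
    points = "; ".join(f"({x:.1f}, {y:.1f})" for x, y in track)
    return (f"A {agent} was observed at {dt}s intervals at positions: "
            f"{points}. Predict its next positions in the same format.")

def text_to_trajectory(text):
    """Parse '(x, y)' pairs out of a model's language response."""
    pairs = re.findall(r"\(\s*(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)\s*\)", text)
    return [(float(x), float(y)) for x, y in pairs]

# Round-trip a short observed history through the text representation.
history = [(0.0, 0.0), (1.2, 0.1), (2.5, 0.3)]
prompt = trajectory_to_text(history)
recovered = text_to_trajectory(prompt)
assert recovered == history
```

In the surveyed approaches, the text side is typically handled by a pretrained LLM rather than a regex parser, but the essential contract is the same: motion states become tokens the model can reason over, and language outputs are mapped back into metric space.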