Large Foundation Models for Trajectory Prediction in Autonomous Driving: A Comprehensive Survey

📅 2025-09-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This survey reviews how large foundation models (LFMs), particularly large language models (LLMs) and multimodal large language models (MLLMs), are being applied to trajectory prediction in autonomous driving. It is motivated by three limitations of conventional deep learning methods: limited interpretability, heavy reliance on large-scale annotated data, and weak generalization to long-tail scenarios. The survey organizes existing work around three core methodologies: trajectory-language mapping, multimodal fusion, and constraint-based reasoning. It also covers vehicle and pedestrian prediction tasks, evaluation metrics, and dataset analyses, and it discusses open challenges (computational latency, data scarcity, real-world robustness) together with future directions such as low-latency inference, causality-aware modeling, and motion foundation models.

📝 Abstract
Trajectory prediction serves as a critical functionality in autonomous driving, enabling the anticipation of future motion paths for traffic participants such as vehicles and pedestrians, which is essential for driving safety. Although conventional deep learning methods have improved accuracy, they remain hindered by inherent limitations, including lack of interpretability, heavy reliance on large-scale annotated data, and weak generalization in long-tail scenarios. The rise of Large Foundation Models (LFMs) is transforming the research paradigm of trajectory prediction. This survey offers a systematic review of recent advances in LFMs, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) for trajectory prediction. By integrating linguistic and scene semantics, LFMs facilitate interpretable contextual reasoning, significantly enhancing prediction safety and generalization in complex environments. The article highlights three core methodologies: trajectory-language mapping, multimodal fusion, and constraint-based reasoning. It covers prediction tasks for both vehicles and pedestrians, evaluation metrics, and dataset analyses. Key challenges such as computational latency, data scarcity, and real-world robustness are discussed, along with future research directions including low-latency inference, causality-aware modeling, and motion foundation models.
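
As a rough illustration of what trajectory-language mapping can look like in practice, the sketch below serializes observed (x, y) positions into a text prompt fragment and parses coordinate pairs back out of a text reply. This is a minimal sketch under stated assumptions, not the method of the survey or of any paper it reviews: the function names, the prompt wording, the 2 Hz sampling rate, and the one-decimal rounding are all hypothetical choices made for illustration.

```python
import re
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def trajectory_to_text(points: Sequence[Point], agent: str = "vehicle",
                       hz: float = 2.0, decimals: int = 1) -> str:
    """Serialize observed (x, y) positions into a plain-text prompt fragment.

    Hypothetical format: the wording and rounding are illustrative assumptions.
    """
    coords = ", ".join(f"({x:.{decimals}f}, {y:.{decimals}f})" for x, y in points)
    return f"The {agent} was observed at {hz:g} Hz at positions (in meters): {coords}."


def text_to_trajectory(text: str) -> List[Point]:
    """Parse every "(x, y)" numeric pair found in a text reply, in order."""
    pattern = r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"
    return [(float(x), float(y)) for x, y in re.findall(pattern, text)]


if __name__ == "__main__":
    observed = [(0.0, 0.0), (1.5, -0.4), (3.0, -1.0)]
    prompt = trajectory_to_text(observed)
    print(prompt)
    print(text_to_trajectory(prompt))
```

Rounding coordinates before serialization keeps the text short, at the cost of some spatial precision; a real system would have to choose that trade-off, along with the coordinate frame and units, to suit its model and dataset.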
Problem

Research questions and friction points this paper is trying to address.

Enhancing trajectory prediction accuracy for autonomous vehicles
Overcoming interpretability and generalization limitations in deep learning
Integrating linguistic and multimodal reasoning for safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging Large Foundation Models for prediction
Integrating linguistic and scene semantics
Employing constraint-based reasoning methods
Authors

Wei Dai
Department of Mathematical Sciences, School of Physical Sciences, University of Liverpool, Liverpool L69 3BX, U.K.; Department of Communications and Networking, School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou 215000, China
Shengen Wu
Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511400, China
Wei Wu
Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511400, China
Zhenhao Wang
Deep Interdisciplinary Intelligence Lab, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511400, China (research intern); School of Mathematics and Statistics, Shandong University, Weihai 264209, China
Sisuo Lyu
Thrust of Data Science and Analytics, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511400, China
Haicheng Liao
Limin Yu
Xi'an Jiaotong-Liverpool University
sonar detection, rational wavelets, medical image analysis, AGV system design
Weiping Ding
School of Artificial Intelligence and Computer Science, Nantong University, Nantong 226019, China
Runwei Guan
Hong Kong University of Science and Technology (Guangzhou) / Founder of FertiTech AI
Multi-Modal Learning, Unmanned Surface Vessel, Radar Perception, AI Medicine
Yutao Yue
Thrust of Artificial Intelligence, Thrust of Intelligent Transportation, and Deep Interdisciplinary Intelligence Lab, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511400, China; Institute of Deep Perception Technology, Jiangsu Industrial Technology Research Institute, Wuxi 214028, China