🤖 AI Summary
To address key challenges in autonomous driving trajectory prediction—including poor interpretability, heavy reliance on large-scale annotated data, and insufficient generalization to long-tail scenarios—this survey systematically reviews how large language models (LLMs) and multimodal large language models (MLLMs) are being applied to the task. It organizes the literature around three core methodologies: trajectory-language mapping mechanisms that enable bidirectional alignment between motion semantics and natural language; multimodal fusion modules that jointly encode visual, trajectory, and linguistic features; and physics- and traffic-rule-based constraint reasoning that supports causal, context-aware inference. Drawing on results reported across the nuScenes and ETH/UCY benchmarks, the survey finds that these approaches improve prediction accuracy and robustness in long-tail scenarios while making predictions more interpretable. Collectively, this line of work points toward a new paradigm for trajectory prediction grounded in cross-modal semantic understanding and constraint-guided reasoning.
📝 Abstract
Trajectory prediction is a critical function in autonomous driving: anticipating the future motion paths of traffic participants such as vehicles and pedestrians is essential for driving safety. Although conventional deep learning methods have improved accuracy, they remain hindered by inherent limitations, including limited interpretability, heavy reliance on large-scale annotated data, and weak generalization in long-tail scenarios. The rise of Large Foundation Models (LFMs) is transforming the research paradigm of trajectory prediction. This survey offers a systematic review of recent advances in LFMs, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), for trajectory prediction. By integrating linguistic and scene semantics, LFMs facilitate interpretable contextual reasoning, significantly enhancing prediction safety and generalization in complex environments. The article highlights three core methodologies: trajectory-language mapping, multimodal fusion, and constraint-based reasoning. It covers prediction tasks for both vehicles and pedestrians, evaluation metrics, and dataset analyses. Key challenges such as computational latency, data scarcity, and real-world robustness are discussed, along with future research directions including low-latency inference, causality-aware modeling, and motion foundation models.
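To make the trajectory-language mapping idea concrete, the sketch below serializes an observed 2D trajectory into a natural-language prompt that an LLM could consume, and parses a language-formatted answer back into coordinates. This is a minimal illustrative sketch only: the function names, prompt template, and coordinate format are assumptions for this example, not an API from any surveyed method.

```python
import re

def trajectory_to_text(track, agent="vehicle", dt=0.5):
    """Serialize (x, y) waypoints into a natural-language prompt.

    `agent`, `dt`, and the sentence template are illustrative assumptions.
    """
    points = "; ".join(f"({x:.1f}, {y:.1f})" for x, y in track)
    return (f"A {agent} was observed at {dt}s intervals at positions: "
            f"{points}. Predict its next positions in the same format.")

def text_to_trajectory(text):
    """Parse '(x, y)' pairs out of a model's language response."""
    pairs = re.findall(r"\(\s*(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)\s*\)", text)
    return [(float(x), float(y)) for x, y in pairs]

# Round-trip a short observed history through the text representation.
history = [(0.0, 0.0), (1.2, 0.1), (2.5, 0.3)]
prompt = trajectory_to_text(history)
recovered = text_to_trajectory(prompt)
assert recovered == history
```

In the surveyed approaches, the text side is typically handled by a pretrained LLM rather than a regex parser, but the essential contract is the same: motion states become tokens the model can reason over, and language outputs are mapped back into metric space.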