LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction

📅 2025-07-06
🤖 AI Summary
Multi-agent trajectory prediction struggles to model spatiotemporal dependencies, and existing methods often represent fine-grained temporal dynamics and high-order motion states inadequately. To address this, we propose LTMSformer, a lightweight Transformer-based framework featuring a novel local trend-aware attention mechanism that captures fine-grained temporal evolution, and an explicit motion state encoder that integrates acceleration, jerk, and heading to enhance spatial interaction modeling. The architecture further incorporates convolutional attention, hierarchical local temporal windows, and a lightweight MLP-based trajectory refinement module to balance computational efficiency and prediction accuracy. On Argoverse 1, LTMSformer improves significantly over the HiVT-64 baseline: minADE decreases by 4.35%, minFDE by 8.74%, and miss rate (MR) by 20%. With only 32% of the parameters of HiVT-128, it delivers a superior accuracy–efficiency trade-off.

📝 Abstract
Modeling the complex temporal-spatial dependencies among agents remains challenging for trajectory prediction. Since each state of an agent is closely related to the states of adjacent time steps, capturing the local temporal dependency is beneficial for prediction, yet most studies overlook it. Besides, learning high-order motion state attributes is expected to enhance spatial interaction modeling, but it is rarely explored in previous works. To address these issues, we propose a lightweight framework, LTMSformer, to extract temporal-spatial interaction features for multi-modal trajectory prediction. Specifically, we introduce a Local Trend-Aware Attention mechanism that captures the local temporal dependency by leveraging a convolutional attention mechanism with hierarchical local time boxes. Next, to model the spatial interaction dependency, we build a Motion State Encoder that incorporates high-order motion state attributes such as acceleration, jerk, and heading. To further refine the trajectory prediction, we propose a Lightweight Proposal Refinement Module that leverages Multi-Layer Perceptrons for trajectory embedding and generates refined trajectories with fewer model parameters. Experimental results on the Argoverse 1 dataset demonstrate that our method outperforms the baseline HiVT-64, reducing the minADE by approximately 4.35%, the minFDE by 8.74%, and the MR by 20%. We also achieve higher accuracy than HiVT-128 with a 68% reduction in model size.
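The high-order motion state attributes the abstract mentions (acceleration, jerk, heading) can be derived from a raw position sequence by finite differences. The sketch below is only an illustrative computation under assumed conventions (the paper's Motion State Encoder embeds such attributes; the exact feature set, sampling interval, and function name here are assumptions):

```python
import numpy as np

def motion_state_attributes(xy, dt=0.1):
    """Derive motion state attributes from a 2-D position sequence
    via finite differences.

    xy : (T, 2) array of positions; dt : sampling interval in seconds.
    Returns velocity, acceleration, jerk (each (T, 2)) and heading (T,).
    """
    vel = np.gradient(xy, dt, axis=0)           # first derivative: velocity
    acc = np.gradient(vel, dt, axis=0)          # second derivative: acceleration
    jerk = np.gradient(acc, dt, axis=0)         # third derivative: jerk
    heading = np.arctan2(vel[:, 1], vel[:, 0])  # heading angle of the velocity vector
    return vel, acc, jerk, heading

# Example: an agent moving along the x-axis at constant speed
xy = np.stack([np.linspace(0.0, 1.0, 11), np.zeros(11)], axis=1)
vel, acc, jerk, heading = motion_state_attributes(xy)
```

For this constant-velocity example, acceleration and jerk come out (numerically) zero and the heading is 0 rad, which is a quick sanity check that the differencing conventions are consistent.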
Problem

Research questions and friction points this paper is trying to address.

Model complex temporal-spatial dependencies for trajectory prediction
Capture local temporal dependency in agent states
Enhance spatial interaction with high-order motion attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local Trend-Aware Attention captures temporal dependency
Motion State Encoder models spatial interaction attributes
Lightweight Proposal Refinement Module enhances trajectory prediction
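The core idea behind local temporal attention, restricting each time step to attend only to its neighbors, can be sketched as plain windowed self-attention. This is a simplification for intuition only: the paper's Local Trend-Aware Attention uses convolutional attention with hierarchical local time boxes, and the function and parameter names below are assumptions:

```python
import numpy as np

def local_window_attention(x, window=3):
    """Self-attention over a temporal sequence where each time step
    attends only to steps within +/- `window` of itself.

    x : (T, D) per-timestep features. Returns attended features (T, D).
    """
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)                          # (T, T) dot-product scores
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window   # local temporal window
    scores = np.where(mask, scores, -np.inf)               # block distant time steps
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # row-wise softmax
    return weights @ x
```

With `window=0` each step attends only to itself and the output reduces to the input, which makes the masking easy to verify; larger windows trade locality for broader temporal context.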
Authors

Yixin Yan (College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China)
Yang Li (College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China)
Yuanfan Wang (College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China)
Xiaozhou Zhou (College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China)
Beihao Xia (Huazhong University of Science and Technology)
Manjiang Hu (College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China)
Hongmao Qin (College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China)