GContextFormer: A global context-aware hybrid multi-head attention approach with scaled additive aggregation for multimodal trajectory prediction

📅 2025-11-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Multimodal trajectory prediction faces two key challenges: (1) heavy reliance on high-definition (HD) maps incurs high deployment costs and degrades robustness; (2) map-free approaches lack global scene context, causing pairwise attention mechanisms to overfit straight-line motion patterns and impair modeling of transitional behaviors and motion intent alignment. This paper proposes a map-free, global-context-aware hybrid attention model. Its core contributions are: (1) scene-level intent prior modeling coupled with hierarchical interaction reasoning; (2) scaled additive aggregation and dual-path cross-attention to decouple mode suppression from enhanced representation of transitional dynamics; and (3) an encoder-decoder architecture integrating mode embeddings, neighbor-context enhancement, and gated fusion. Evaluated on the TOD-VT highway ramp dataset, the method significantly improves prediction accuracy in high-curvature and transitional regions, while demonstrating strong robustness and modular extensibility.

📝 Abstract
Multimodal trajectory prediction generates multiple plausible future trajectories to address vehicle motion uncertainty arising from intention ambiguity and execution variability. However, HD map-dependent models suffer from costly data acquisition, delayed updates, and vulnerability to corrupted inputs, causing prediction failures. Map-free approaches lack global context: pairwise attention over-amplifies straight patterns while suppressing transitional patterns, resulting in motion-intention misalignment. This paper proposes GContextFormer, a plug-and-play encoder-decoder architecture with global context-aware hybrid attention and scaled additive aggregation that achieves intention-aligned multimodal prediction without relying on maps. The Motion-Aware Encoder builds a scene-level intention prior via bounded scaled additive aggregation over mode-embedded trajectory tokens and refines per-mode representations under a shared global context, mitigating inter-mode suppression and promoting intention alignment. The Hierarchical Interaction Decoder decomposes social reasoning into dual-pathway cross-attention: a standard pathway ensures uniform geometric coverage over agent-mode pairs, while a neighbor-context-enhanced pathway emphasizes salient interactions, with a gating module mediating their contributions to maintain a balance between coverage and focus. Experiments on eight highway-ramp scenarios from the TOD-VT dataset show that GContextFormer outperforms state-of-the-art baselines. Compared to existing transformer models, GContextFormer achieves greater robustness, with improvements that, as the spatial distributions show, concentrate in high-curvature and transition zones. Interpretability is achieved through motion mode distinctions and neighbor context modulation, which expose reasoning attribution. The modular architecture supports extensibility toward cross-domain multimodal reasoning tasks. Source: https://fenghy-chen.github.io/sources/.
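To make the encoder's pooling step more concrete, here is a minimal sketch of one way a bounded, scaled additive aggregation over mode tokens could look. The abstract does not give the formula, so everything here is an assumption for illustration: the function name `scaled_additive_aggregate`, the Bahdanau-style score `v · tanh(W x_k)`, the `1/sqrt(d)` scaling, and the softmax over modes are not taken from the paper.

```python
import math
import random


def scaled_additive_aggregate(tokens, w, v):
    """Pool K mode tokens (each a list of d floats) into one context vector.

    Assumed scoring (not from the paper): score_k = v . tanh(W x_k) / sqrt(d).
    tanh keeps every hidden unit in [-1, 1], so |score_k| <= sum(|v|) / sqrt(d).
    """
    d = len(tokens[0])
    scores = []
    for x in tokens:
        hidden = [math.tanh(sum(row[j] * x[j] for j in range(d))) for row in w]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)) / math.sqrt(d))
    # Softmax over modes, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The shared context is the weighted sum of the mode tokens.
    context = [
        sum(weights[k] * tokens[k][j] for k in range(len(tokens)))
        for j in range(d)
    ]
    return context, weights


# Hypothetical sizes: K modes, token width d, hidden width h.
rng = random.Random(0)
K, d, h = 6, 4, 8
tokens = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(K)]
w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(h)]
v = [rng.uniform(-1, 1) for _ in range(h)]
context, weights = scaled_additive_aggregate(tokens, w, v)
```

Because the scores are bounded, no single mode can receive an arbitrarily large logit, which is one plausible reading of how such an aggregation could limit inter-mode suppression.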
Problem

Research questions and friction points this paper is trying to address.

Multimodal trajectory prediction suffers from costly HD map dependencies
Map-free approaches lack global context causing motion-intention misalignment
Existing methods over-amplify straight patterns while suppressing transitional patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Global context-aware hybrid attention for multimodal prediction
Scaled additive aggregation to mitigate inter-mode suppression
Dual-pathway cross-attention decoder balancing coverage and focus
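The dual-pathway idea can be sketched as follows. This is an illustration under stated assumptions rather than the authors' decoder: the neighbor-context enhancement is modeled here as a hypothetical additive bias on the attention logits, and the gate is a hypothetical scalar sigmoid over the two concatenated pathway outputs. The names `cross_attend`, `gated_dual_path`, `neighbor_bias`, and `gate_w` are invented for this example.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, keys, values, bias=None):
    """Scaled dot-product attention of one query over N keys.

    `bias` is an optional per-key offset added to the logits (assumed form
    of the neighbor-context enhancement).
    """
    scale = math.sqrt(len(query))
    logits = [dot(query, k) / scale for k in keys]
    if bias is not None:
        logits = [l + b for l, b in zip(logits, bias)]
    weights = softmax(logits)
    width = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]


def gated_dual_path(query, keys, values, neighbor_bias, gate_w, gate_b=0.0):
    """Fuse a standard pathway and a bias-enhanced pathway with a scalar gate."""
    standard = cross_attend(query, keys, values)
    enhanced = cross_attend(query, keys, values, bias=neighbor_bias)
    # Gate in (0, 1) from the concatenated pathway outputs (assumed form).
    g = 1.0 / (1.0 + math.exp(-(dot(gate_w, standard + enhanced) + gate_b)))
    fused = [g * e + (1.0 - g) * s for s, e in zip(standard, enhanced)]
    return fused, g


# Toy inputs: one agent-mode query attending over three neighbors.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused, gate = gated_dual_path(query, keys, values, [0.0, 5.0, 0.0], [0.1] * 4)
```

With this form, a gate near 0 falls back to the uniform-coverage pathway and a gate near 1 favors the pathway that emphasizes salient neighbors; the fused output always lies between the two.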

Authors

Yuzhi Chen
Intelligent Transportation System Research Center, Southeast University, 2 Southeast University Road, Jiangning District, Nanjing, 211189, P.R. China
Yuanchang Xie
University of Massachusetts Lowell
ITS, Transportation Safety, Traffic Control and Simulation, Sensing and Data Analytics
Lei Zhao
Intelligent Transportation System Research Center, Southeast University, 2 Southeast University Road, Jiangning District, Nanjing, 211189, P.R. China
Pan Liu
Jiangsu Key Laboratory of Urban ITS, School of Transportation, Southeast University, Nanjing 211189, P.R. China
Yajie Zou
Tongji University
Freeway operation, Transportation data mining, Traffic safety, Applications of statistical analysis in transportation
Chen Wang
Intelligent Transportation System Research Center, Southeast University, 2 Southeast University Road, Jiangning District, Nanjing, 211189, P.R. China