🤖 AI Summary
Multimodal trajectory prediction in autonomous driving suffers from overconfident model outputs, inadequate uncertainty modeling, and the high computational overhead of ensemble methods. Method: This paper proposes a lightweight hierarchical Transformer ensemble framework. It introduces a novel hierarchical uncertainty-aware loss function and a grouped fully connected architecture to jointly optimize multimodal distribution modeling and computational efficiency, while leveraging deep ensembling to improve uncertainty calibration. Contribution/Results: The framework achieves state-of-the-art performance on major benchmarks including Argoverse and nuScenes: it improves prediction diversity by 12.7%, reduces Expected Calibration Error (ECE) by 38.5%, and significantly enhances the reliability and robustness of collision warning systems.
📝 Abstract
Accurate trajectory forecasting is crucial for the performance of various systems, such as advanced driver-assistance systems and self-driving vehicles. These forecasts allow us to anticipate events that lead to collisions and, therefore, to mitigate them. Deep Neural Networks have excelled in motion forecasting, but overconfidence and weak uncertainty quantification persist. Deep Ensembles address these concerns, yet applying them to multimodal distributions remains challenging. In this paper, we propose a novel approach named Hierarchical Light Transformer Ensembles (HLT-Ens) aimed at efficiently training an ensemble of Transformer architectures using a novel hierarchical loss function. HLT-Ens leverages grouped fully connected layers, inspired by grouped convolution techniques, to capture multimodal distributions effectively. We demonstrate that HLT-Ens achieves state-of-the-art performance levels through extensive experimentation, offering a promising avenue for improving trajectory forecasting techniques.
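To make the grouped fully connected idea concrete: analogous to grouped convolutions, a grouped linear layer splits the feature dimension into independent groups, each with its own weight matrix, so every ensemble member gets private parameters at a fraction of the cost of separate full-width layers. The sketch below is an illustrative NumPy implementation under assumed tensor shapes; it is not the paper's actual code, and the function name `grouped_linear` is hypothetical.

```python
import numpy as np

def grouped_linear(x, weights, biases):
    """Illustrative grouped fully connected layer (hypothetical helper).

    x:       (batch, groups, in_per_group)   -- one feature slice per group
    weights: (groups, in_per_group, out_per_group)
    biases:  (groups, out_per_group)
    Returns: (batch, groups, out_per_group)

    Equivalent to multiplying by a block-diagonal weight matrix: each
    group (ensemble member) is transformed independently, so parameter
    count and FLOPs are roughly 1/groups of a dense layer spanning the
    same total width.
    """
    # For each group g: y[b, g, :] = x[b, g, :] @ weights[g] + biases[g]
    return np.einsum("bgi,gio->bgo", x, weights) + biases

# Example: 3 ensemble members, 4 input and 5 output features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4))
W = rng.normal(size=(3, 4, 5))
b = rng.normal(size=(3, 5))
y = grouped_linear(x, W, b)
```

In a Transformer ensemble, stacking such layers lets all members share one forward pass while keeping their predictions (and thus the multimodal distribution) independent.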