Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction

📅 2023-12-13
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing ego-trajectory prediction methods rely heavily on modeling external traffic flow while neglecting the driver's attention and intent, limiting prediction accuracy and driving safety. To address this, the paper proposes RouteFormer, a multimodal Transformer architecture that fuses driver gaze (eye-tracking) data with scene-level visual information, and introduces the Path Complexity Index (PCI) to enable difficulty-aware evaluation. It further constructs GEM, the first publicly available driving dataset featuring synchronized first-person video, high-fidelity eye-tracking data, and centimeter-accurate GPS trajectories. Extensive experiments show significant improvements over state-of-the-art approaches on both the GEM and DR(eye)VE benchmarks, reducing average displacement error by 23.6% in high-PCI complex scenarios. All code and datasets are open-sourced.
📝 Abstract
Understanding drivers' decision-making is crucial for road safety. Although predicting the ego-vehicle's path is valuable for driver-assistance systems, existing methods mainly focus on external factors like other vehicles' motions, often neglecting the driver's attention and intent. To address this gap, we infer the ego-trajectory by integrating the driver's gaze and the surrounding scene. We introduce RouteFormer, a novel multimodal ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view, comprising first-person video and gaze fixations. We also present the Path Complexity Index (PCI), a new metric for trajectory complexity that enables a more nuanced evaluation of challenging scenarios. To tackle data scarcity and enhance diversity, we introduce GEM, a comprehensive dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data. Extensive evaluations on GEM and DR(eye)VE demonstrate that RouteFormer significantly outperforms state-of-the-art methods, achieving notable improvements in prediction accuracy across diverse conditions. Ablation studies reveal that incorporating driver field-of-view data yields significantly better average displacement error, especially in challenging scenarios with high PCI scores, underscoring the importance of modeling driver attention. All data and code are available at https://meakbiyik.github.io/routeformer.
Problem

Research questions and friction points this paper is trying to address.

Predict ego-vehicle path using driver gaze and scene context
Address data scarcity with enriched driver field-of-view dataset
Improve trajectory prediction accuracy in complex driving scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates driver gaze and scene for trajectory prediction
Uses GPS, environment, and field-of-view data
Introduces Path Complexity Index for evaluation
Authors

M. E. Akbiyik — ETH Zürich
N. Savov — INSAIT, Sofia University
D. Paudel — ETH Zürich; INSAIT, Sofia University
Nikola Popovic — Research Scientist, INSAIT, Sofia University (Computer Vision, Machine Learning, Artificial Intelligence)
Christian Vater — ETH Zürich
Otmar Hilliges — Professor of Computer Science, ETH Zürich (Computer Vision, Augmented Reality, Robotics, Computational Interaction)
L. V. Gool — ETH Zürich; INSAIT, Sofia University
Xi Wang — ETH Zürich