EgoTraj-Bench: Towards Robust Trajectory Prediction Under Ego-view Noisy Observations

📅 2025-09-30
🤖 AI Summary
Problem: Existing ego-centric trajectory prediction methods lack robustness to the realistic perception noise that arises from first-person-view (FPV) imagery, including visual occlusion, ID switches, and tracking drift.
Method: We introduce EgoTraj-Bench, the first real-world benchmark that explicitly pairs noisy first-person observation histories with clean future bird's-eye-view (BEV) trajectories. We propose BiFlow, a dual-stream flow matching model that jointly denoises the observation history and forecasts future trajectories through a shared latent space. An EgoAnchor mechanism conditions intent modeling on distilled historical features via feature modulation. The approach combines flow matching, shared latent representation learning, and feature modulation in a single end-to-end trainable framework.
Results: On EgoTraj-Bench, BiFlow reduces minADE and minFDE (the minimum average and final displacement errors over sampled predictions) by 10–15% on average, demonstrating improved robustness to perception noise and supporting practical deployment.
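
The reported gains are measured with minADE and minFDE. The sketch below shows one common way to compute them for a single agent, assuming K sampled future trajectories in BEV coordinates and taking the minimum of ADE and of FDE independently over the samples. The function name min_ade_fde is hypothetical, and the benchmark's exact evaluation protocol (K, horizon, per-agent averaging) is not specified here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _l2(a: Point, b: Point) -> float:
    """Euclidean distance between two BEV points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def min_ade_fde(samples: Sequence[Sequence[Point]],
                ground_truth: Sequence[Point]) -> Tuple[float, float]:
    """Return (minADE, minFDE) over K sampled future trajectories.

    Hypothetical helper. samples holds K predicted trajectories of T points
    each; ground_truth holds the T ground-truth future points.
    """
    if not samples:
        raise ValueError("need at least one sampled trajectory")
    ades, fdes = [], []
    for traj in samples:
        if len(traj) != len(ground_truth):
            raise ValueError("each sample must match the ground-truth horizon")
        errors = [_l2(p, g) for p, g in zip(traj, ground_truth)]
        ades.append(sum(errors) / len(errors))  # mean displacement over the horizon
        fdes.append(errors[-1])                  # displacement at the final step
    # Assumed convention: each metric takes its own best sample.
    return min(ades), min(fdes)
```

With ground truth [(1, 0), (2, 0), (3, 0)] and two samples whose per-step errors are [1, 1, 1] and [0, 0, 2], this convention gives minADE = 2/3 (from the second sample) and minFDE = 1 (from the first), which illustrates why the two metrics can select different samples.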

📝 Abstract
Reliable trajectory prediction from an ego-centric perspective is crucial for robotic navigation in human-centric environments. However, existing methods typically assume idealized observation histories, failing to account for the perceptual artifacts inherent in first-person vision, such as occlusions, ID switches, and tracking drift. This discrepancy between training assumptions and deployment reality severely limits model robustness. To bridge this gap, we introduce EgoTraj-Bench, the first real-world benchmark that grounds noisy, first-person visual histories in clean, bird's-eye-view future trajectories, enabling robust learning under realistic perceptual constraints. Building on this benchmark, we propose BiFlow, a dual-stream flow matching model that concurrently denoises historical observations and forecasts future motion by leveraging a shared latent representation. To better model agent intent, BiFlow incorporates our EgoAnchor mechanism, which conditions the prediction decoder on distilled historical features via feature modulation. Extensive experiments show that BiFlow achieves state-of-the-art performance, reducing minADE and minFDE by 10-15% on average and demonstrating superior robustness. We anticipate that our benchmark and model will provide a critical foundation for developing trajectory forecasting systems truly resilient to the challenges of real-world, ego-centric perception.
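
BiFlow is described as a flow matching model. As a rough illustration of that technique only, the sketch below assumes the common linear probability path from Gaussian noise x0 to a clean sample x1 (for example, a flattened future trajectory), regresses the velocity x1 - x0, and generates samples by fixed-step Euler integration. The abstract does not give BiFlow's actual path, network, or dual-stream coupling; the learned velocity network is replaced here by a caller-supplied function, and all names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Assumed linear path: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0: Vector, x1: Vector) -> Vector:
    """Regression target along the linear path: x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def training_pair(x1: Vector, rng: random.Random) -> Tuple[Vector, float, Vector]:
    """Build one (x_t, t, target) triple for a flow-matching regression loss.

    x1 is a clean sample; x0 is standard Gaussian noise of the same size.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()  # t drawn uniformly from [0, 1)
    return interpolate(x0, x1, t), t, velocity_target(x0, x1)


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int = 10) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so fields with a (1 - t) term are safe
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

As a sanity check, the conditional velocity field v(x, t) = (x1 - x) / (1 - t) for a single known target x1 transports any starting point exactly to x1 under this Euler scheme; a trained model would instead predict the velocity from x_t, t, and its conditioning (here, the observation history).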

Problem

Addressing trajectory prediction robustness under noisy ego-view observations
Bridging the gap between idealized training and real-world perceptual artifacts
Developing a benchmark and a model for reliable navigation in human-centric environments

Innovation

Dual-stream flow matching model for trajectory prediction
Shared latent representation for denoising and forecasting
EgoAnchor mechanism that modulates decoder features with distilled historical features for intent modeling
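
The EgoAnchor mechanism is described only as conditioning the prediction decoder on distilled historical features via feature modulation. The sketch below assumes a FiLM-style affine modulation, h' = (1 + gamma(c)) * h + beta(c), where gamma and beta are linear maps of a history feature c. The (1 + gamma) form, the single linear layers, and all names are illustrative assumptions, not the paper's confirmed design.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Dense layer y = W x + b using plain Python lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def film_modulate(decoder_feat: Vector, history_feat: Vector,
                  w_gamma: Matrix, b_gamma: Vector,
                  w_beta: Matrix, b_beta: Vector) -> Vector:
    """Hypothetical FiLM-style modulation: h' = (1 + gamma(c)) * h + beta(c).

    decoder_feat (h): a hidden feature of the prediction decoder.
    history_feat (c): a distilled feature of the observation history.
    """
    gamma = linear(w_gamma, b_gamma, history_feat)  # per-channel scale offset
    beta = linear(w_beta, b_beta, history_feat)     # per-channel shift
    return [(1.0 + g) * h + bt for h, g, bt in zip(decoder_feat, gamma, beta)]
```

Writing the scale as 1 + gamma means that zero-initialized modulation weights leave the decoder feature unchanged, so the history conditioning can be learned gradually on top of an unconditioned decoder.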
Jiayi Liu
The Hong Kong University of Science and Technology (Guangzhou)
Jiaming Zhou
The Hong Kong University of Science and Technology (Guangzhou)
Ke Ye
The Hong Kong University of Science and Technology (Guangzhou)
Kun-Yu Lin
The University of Hong Kong
Computer Vision, Machine Learning
Allan Wang
Researcher, Miraikan
Social Navigation, Visual Navigation, Human-Robot Interaction
Junwei Liang
Assistant Professor, HKUST (Guangzhou) | CSE, HKUST | Ph.D. @CMU
Computer Vision, Robotics, Embodied AI, Trajectory Prediction