EgoTraj-Bench: Towards Robust Trajectory Prediction Under Ego-view Noisy Observations

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing ego-centric trajectory prediction methods are not robust to the perception noise that first-person-view (FPV) imagery introduces, such as visual occlusion, ID switches, and tracking drift. Method: We introduce EgoTraj-Bench, the first real-world benchmark that pairs noisy FPV observation histories with clean bird's-eye-view (BEV) future trajectories. We propose BiFlow, a dual-stream flow matching model that jointly denoises the observed history and forecasts future motion through a shared latent representation. Its EgoAnchor mechanism conditions the prediction decoder on distilled historical features via feature modulation to better model agent intent. Results: On EgoTraj-Bench, BiFlow reduces minADE and minFDE by 10-15% on average, indicating improved robustness to perception noise.

📝 Abstract

Reliable trajectory prediction from an ego-centric perspective is crucial for robotic navigation in human-centric environments. However, existing methods typically assume idealized observation histories, failing to account for the perceptual artifacts inherent in first-person vision, such as occlusions, ID switches, and tracking drift. This discrepancy between training assumptions and deployment reality severely limits model robustness. To bridge this gap, we introduce EgoTraj-Bench, the first real-world benchmark that grounds noisy, first-person visual histories in clean, bird's-eye-view future trajectories, enabling robust learning under realistic perceptual constraints. Building on this benchmark, we propose BiFlow, a dual-stream flow matching model that concurrently denoises historical observations and forecasts future motion by leveraging a shared latent representation. To better model agent intent, BiFlow incorporates our EgoAnchor mechanism, which conditions the prediction decoder on distilled historical features via feature modulation. Extensive experiments show that BiFlow achieves state-of-the-art performance, reducing minADE and minFDE by 10-15% on average and demonstrating superior robustness. We anticipate that our benchmark and model will provide a critical foundation for developing trajectory forecasting systems truly resilient to the challenges of real-world, ego-centric perception.
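The reported gains are stated in terms of minADE and minFDE. As a minimal sketch of how these best-of-K metrics are commonly computed (the function names and toy data below are illustrative and not taken from the paper, and the two minima are taken independently over the K samples, which is one common convention), consider:

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory of (x, y) points."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and equally long")
    # Euclidean distance between prediction and ground truth at each time step.
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    # ADE averages over all steps; FDE looks only at the final step.
    return sum(dists) / len(dists), dists[-1]

def min_ade_fde(samples, truth):
    """Best-of-K errors: minimum ADE and minimum FDE over K sampled futures."""
    errors = [displacement_errors(s, truth) for s in samples]
    return min(e[0] for e in errors), min(e[1] for e in errors)

# Toy example (hypothetical data): one ground-truth future, two sampled futures.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts away from the truth
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # only the last step is off
]
min_ade, min_fde = min_ade_fde(samples, truth)
```

Because a generative predictor outputs several candidate futures, scoring only the closest candidate rewards covering the true outcome rather than averaging over all modes.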

Problem

Research questions and friction points this paper is trying to address.

Addressing trajectory prediction robustness under noisy ego-view observations

Bridging the gap between idealized training and real-world perceptual artifacts

Developing a benchmark and a model for reliable navigation in human-centric environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-stream flow matching model for trajectory prediction

Shared latent representation for denoising and forecasting

EgoAnchor mechanism modulates features for intent modeling
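The two building blocks named above, flow matching and feature modulation, can be sketched in a few lines. This is an illustration under stated assumptions, not BiFlow itself: it assumes a straight-line probability path between a noise sample and a data sample, and a FiLM-style scale-and-shift for the modulation. All function names are hypothetical, and the page does not specify the paper's exact formulation.

```python
def film(features, gamma, beta):
    """Feature-wise modulation (FiLM-style, assumed): scale and shift each channel."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]

def flow_matching_pair(x0, x1, t):
    """Training pair for a straight path from noise x0 to data x1 at time t in [0, 1].

    Returns the interpolated point x_t and the velocity a network would regress to.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # constant along a straight path
    return x_t, velocity

def euler_sample(velocity_fn, x0, steps=10):
    """Generate a sample by integrating dx/dt = v(x, t) from t=0 to t=1."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy check (hypothetical values): with the exact velocity field, sampling
# transports the starting point x0 onto the data point x1.
x0, x1 = [0.0, 0.0], [2.0, -1.0]
x_quarter, target_v = flow_matching_pair(x0, x1, 0.25)
x_end = euler_sample(lambda x, t: target_v, x0, steps=10)

# Modulating two feature channels with per-channel scale and shift.
modulated = film([1.0, 2.0], gamma=[2.0, 0.5], beta=[0.0, 1.0])
```

In a real model the lambda would be a learned network, and `gamma` and `beta` would be predicted from the conditioning signal, which for EgoAnchor would be the distilled historical features.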
