🤖 AI Summary
Existing ego-centric trajectory prediction methods are not robust to the realistic perception noise of first-person-view (FPV) imagery, such as visual occlusion, ID switching, and tracking drift.
Method: We introduce EgoTraj-Bench, the first real-world benchmark that pairs noisy first-person visual histories with clean bird's-eye-view (BEV) future trajectories. We propose BiFlow, a dual-stream flow matching model that jointly denoises the observed history and forecasts future motion through a shared latent representation. Its EgoAnchor mechanism conditions the prediction decoder on distilled historical features via feature modulation, to better model agent intent. The whole framework is trainable end to end.
Results: On EgoTraj-Bench, BiFlow achieves state-of-the-art performance, reducing minADE and minFDE by 10–15% on average and showing improved robustness to perception noise.
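For readers unfamiliar with the reported metrics, the sketch below shows the usual definition of minADE and minFDE for a single agent: given K sampled future trajectories, take the lowest average displacement error and the lowest final-step displacement error. This is a minimal illustration of the common convention, not the benchmark's evaluation code; the function names and the 2D point representation are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def displacement_errors(pred: Trajectory, truth: Trajectory) -> List[float]:
    """Per-timestep Euclidean distance between a predicted and a true path."""
    if len(pred) != len(truth) or len(truth) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def min_ade_fde(samples: Sequence[Trajectory], truth: Trajectory) -> Tuple[float, float]:
    """Return (minADE, minFDE) over K sampled futures for one agent."""
    if not samples:
        raise ValueError("need at least one sampled trajectory")
    ades, fdes = [], []
    for pred in samples:
        errs = displacement_errors(pred, truth)
        ades.append(sum(errs) / len(errs))  # average error over the horizon
        fdes.append(errs[-1])               # error at the final timestep
    # Each minimum is taken independently, so the two values
    # may come from different samples.
    return min(ades), min(fdes)
```

Because each minimum is taken over the sample set separately, a sample that tracks the path well on average can set minADE while a different sample that happens to end at the right place sets minFDE.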
📝 Abstract
Reliable trajectory prediction from an ego-centric perspective is crucial for robotic navigation in human-centric environments. However, existing methods typically assume idealized observation histories, failing to account for the perceptual artifacts inherent in first-person vision, such as occlusions, ID switches, and tracking drift. This discrepancy between training assumptions and deployment reality severely limits model robustness. To bridge this gap, we introduce EgoTraj-Bench, the first real-world benchmark that grounds noisy, first-person visual histories in clean, bird's-eye-view future trajectories, enabling robust learning under realistic perceptual constraints. Building on this benchmark, we propose BiFlow, a dual-stream flow matching model that concurrently denoises historical observations and forecasts future motion by leveraging a shared latent representation. To better model agent intent, BiFlow incorporates our EgoAnchor mechanism, which conditions the prediction decoder on distilled historical features via feature modulation. Extensive experiments show that BiFlow achieves state-of-the-art performance, reducing minADE and minFDE by 10-15% on average and demonstrating superior robustness. We anticipate that our benchmark and model will provide a critical foundation for developing trajectory forecasting systems truly resilient to the challenges of real-world, ego-centric perception.
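As background on the flow matching idea that BiFlow builds on, the sketch below shows one common formulation: a straight-line path from a noise sample to a data sample, whose constant velocity serves as the regression target for a velocity network, and Euler integration of that velocity at sampling time. The abstract does not specify the path, the network, or the sampler BiFlow uses, so the linear path, the Gaussian noise source, and every name here are assumptions for illustration only. In a trajectory setting, `x1` would be a flattened future trajectory.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Velocity of the straight-line path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_pair(x1: Vector, rng: random.Random) -> Tuple[Vector, float, Vector]:
    """Draw one training example: (x_t, t, velocity regression target)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # assumed standard Gaussian noise
    t = rng.random()
    return interpolate(x0, x1, t), t, target_velocity(x0, x1)


def euler_sample(
    x0: Vector,
    velocity_fn: Callable[[Vector, float], Vector],
    steps: int = 10,
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, a network would be fit to predict the third element of `flow_matching_pair` from the first two (plus any conditioning, such as encoded history); at inference, that network would take the place of `velocity_fn` in `euler_sample`.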