AI Summary
This work addresses the challenge of precise "final-meters" navigation to points of interest (POIs) in vision-language navigation by proposing the POINav Brain-Action Framework, in which a Brain module reasons about the target POI and guides an Action module that predicts continuous waypoints. To support realistic evaluation, the authors introduce POINav-Bench, described as the first closed-loop benchmark for real-world POI-goal navigation. It is built on 3D Gaussian Splatting reconstructions of 11 commercial areas covering 126,398 square meters and 163 POIs, with traversability-aware annotations and reference trajectories. The authors also curate the POINav-Dataset, containing 70K real-world signage-entrance pairs. Experiments suggest that the framework offers a viable path toward more precise POI-goal navigation.
Abstract
Real-world navigation is fundamentally driven by Points of Interest (POIs), yet reaching a precise POI remains a critical "final-meters" challenge. Existing Vision-Language Navigation (VLN) benchmarks for POI-goal navigation often suffer from coarse granularity or significant sim-to-real gaps caused by generated scenes. To bridge this gap, we present POINav-Bench, the first benchmark designed for closed-loop evaluation of real-world POI-goal navigation. It comprises 11 commercial areas reconstructed from real-world captures using 3D Gaussian Splatting (3DGS), covering 126,398 $\mathrm{m}^{2}$ in total and spanning 163 distinct POIs. With traversability-aware annotations and reference trajectories, POINav-Bench enables high-fidelity evaluation of navigation agents in realistic, POI-rich environments. Building on this benchmark, we propose the POINav Brain-Action Framework, in which a Brain module performs POI-grounded reasoning to guide an Action module that predicts continuous waypoints for real-world execution. We further curate the POINav-Dataset, containing 70K real-world signage-entrance pairs. Experiments show that our framework provides a viable path toward refining real-world POI-goal navigation.