LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing end-to-end navigation methods for unstructured environments rely on separate localization modules, require precise sensor calibration, and generalize poorly. To address these limitations, this paper proposes an end-to-end trajectory planning framework that jointly integrates implicit localization with metric-aware visual geometric modeling. The key contributions are: (1) the first "localization-anchored" end-to-end paradigm; (2) a long-horizon vision–geometry backbone enabling implicit state estimation at absolute metric scale; and (3) dense scene-geometry reconstruction from historical observations to support robust obstacle avoidance. Leveraging implicit geometric memory, multi-task auxiliary supervision (localization + reconstruction), and joint fine-tuning, the method achieves a more than 27.3% improvement in navigation accuracy over oracle-localization baselines in both simulation and real-world settings, significantly reducing cumulative pose error. It demonstrates strong cross-platform and cross-environment generalization without explicit mapping or external localization.

📝 Abstract
Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to a more than 27.3% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models are publicly available on the project page (https://steinate.github.io/logoplanner.github.io/).
Problem

Research questions and friction points this paper is trying to address.

Traditional modular navigation pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules.
Most end-to-end navigation approaches rely on separate localization modules that require accurate sensor calibration, limiting generalization across embodiments and environments.
There is a need for an end-to-end navigation framework that provides implicit state estimation and metric-aware geometry for consistent planning and obstacle avoidance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end navigation framework with implicit state estimation
Metric-aware visual geometry for localization and planning
Reconstructed scene geometry for obstacle avoidance
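The innovations above describe a pipeline in which a visual-geometry backbone summarizes a history of observations into a metric-grounded latent, and a policy head conditions waypoint prediction on that geometric memory plus the goal. The paper does not publish pseudocode in this summary, so the following is a minimal, hypothetical sketch of that data flow only — the encoder, policy head, shapes, and all function names here are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_history(frames):
    """Stand-in for the long-horizon visual-geometry backbone: pools each
    frame into a latent assumed to carry implicit pose/scale cues."""
    return np.stack([f.mean(axis=(0, 1)) for f in frames])  # (T, C)

def plan_trajectory(latents, goal, horizon=8):
    """Stand-in policy head: conditions waypoints on the geometric memory
    and the goal. Here the implicit 2-D state estimate is read from the
    latent summary and waypoints interpolate toward the goal."""
    state = latents.mean(axis=0)[:2]            # implicit state estimate
    steps = np.linspace(0.0, 1.0, horizon)[:, None]
    return state + steps * (goal - state)       # (horizon, 2) waypoints

# Toy rollout: 6 RGB-D-like frames -> geometric memory -> waypoints.
frames = [rng.random((32, 32, 4)) for _ in range(6)]
latents = encode_history(frames)
traj = plan_trajectory(latents, goal=np.array([2.0, 1.0]))
print(traj.shape)  # (8, 2)
```

In the actual system the localization and reconstruction losses supervise the latent during training; this sketch only illustrates the inference-time conditioning structure.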
Authors

Jiaqi Peng — Department of Electronic Engineering, Tsinghua University
Wenzhe Cai — Shanghai AI Laboratory
Yuqiang Yang — Shanghai AI Laboratory
Tai Wang — Shanghai AI Laboratory
Yuan Shen — Department of Electronic Engineering, Tsinghua University and Shanghai AI Laboratory
Jiangmiao Pang — Shanghai AI Laboratory