🤖 AI Summary
Zero-shot outdoor long-range navigation faces significant challenges, including extremely small target size, severe occlusion, and intermittent visibility. This paper proposes a lightweight closed-loop navigation framework that constructs a hierarchical multi-scale image patch structure, integrating target semantics with visual saliency to enable robust directional estimation and visibility awareness for sub-pixel-sized targets at distances exceeding 100 meters. A hierarchical saliency fusion mechanism is introduced, combining keyframe memory with saliency-weighted integration of historical headings to support active target search and heading maintenance under occlusion, without requiring full-image downscaling. Evaluated in both simulation and real-world outdoor environments, the system stably detects semantic targets beyond 150 meters, achieves 82.6% heading accuracy under dynamic visibility conditions, and improves task success rate by 17.5% over state-of-the-art methods.
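To make the aggregation step concrete, here is a minimal sketch, not the paper's implementation, of how per-patch semantic saliency scores could be pooled bottom-up through an aligned multi-scale patch hierarchy into a coarse regional map that yields a heading offset and a visibility flag. The 2x2 max-pooling rule, grid size, horizontal field of view, and visibility threshold below are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of bottom-up saliency aggregation over an
# aligned multi-scale patch hierarchy. Assumed input: a fine-level grid of per-patch
# target-semantic similarity scores (e.g. from an open-vocabulary matcher); pooling
# rule, level count, and thresholds are illustrative.
import numpy as np

def aggregate_saliency(fine_scores: np.ndarray, levels: int = 3) -> list[np.ndarray]:
    """Pool a fine HxW saliency grid up the hierarchy via 2x2 max-pooling per level."""
    pyramid = [fine_scores]
    for _ in range(levels - 1):
        s = pyramid[-1]
        h, w = s.shape[0] // 2, s.shape[1] // 2
        s = s[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))  # keep strongest child
        pyramid.append(s)
    return pyramid  # pyramid[-1] is the coarse regional saliency map

def target_direction(coarse: np.ndarray, hfov_deg: float = 90.0,
                     vis_thresh: float = 0.5) -> tuple[bool, float]:
    """Return (visible, heading offset in degrees) from the coarse saliency layer."""
    j = int(np.argmax(coarse.max(axis=0)))      # most salient column
    visible = coarse.max() > vis_thresh         # visibility gate (assumed threshold)
    offset = (j + 0.5) / coarse.shape[1] - 0.5  # normalized horizontal position
    return visible, offset * hfov_deg

# Toy usage: a weak peak to the right of center at the fine level.
fine = np.zeros((16, 16)); fine[7, 11] = 0.8
vis, heading = target_direction(aggregate_saliency(fine)[-1])
print(vis, round(heading, 1))  # True, positive offset -> steer right
```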
📝 Abstract
Zero-shot object navigation (ZSON) in large-scale outdoor environments faces many challenges; we specifically address a coupled pair: long-range targets that reduce to tiny image projections, and intermittent visibility due to partial or complete occlusion. We present a unified, lightweight closed-loop system built on an aligned multi-scale image tile hierarchy. Through hierarchical target-saliency fusion, it summarizes localized semantic contrast into a stable coarse-layer regional saliency that provides the target direction and indicates target visibility. This regional saliency supports visibility-aware heading maintenance through keyframe memory, saliency-weighted fusion of historical headings, and active search during temporary invisibility. The system avoids whole-image rescaling, enables deterministic bottom-up aggregation, supports zero-shot navigation, and runs efficiently on a mobile robot. Across simulation and real-world outdoor trials, the system detects semantic targets beyond 150 m, maintains a correct heading through visibility changes with 82.6% probability, and improves overall task success by 17.5% compared with state-of-the-art methods, demonstrating robust ZSON toward distant and intermittently observable targets.
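As a companion sketch, and again an assumption rather than the paper's code, the visibility-aware heading maintenance can be pictured as a small keyframe memory that is followed directly while the target is visible, fused by saliency and recency during occlusion, and replaced by a slow rotational sweep when no memory exists yet. The window size, decay factor, and search rate are illustrative parameters.

```python
# Minimal sketch of visibility-aware heading maintenance with keyframe memory and
# saliency-weighted fusion of historical headings (assumed design, not the paper's).
from collections import deque

class HeadingMemory:
    def __init__(self, window: int = 10, decay: float = 0.9, search_rate: float = 5.0):
        self.keyframes = deque(maxlen=window)  # (heading_deg, saliency) pairs
        self.decay = decay                     # older keyframes count less
        self.search_rate = search_rate         # deg per step during active search

    def command(self, visible: bool, target_heading_deg: float, saliency: float,
                robot_heading_deg: float) -> float:
        """Return a goal heading (deg, world frame) to steer toward this step."""
        if visible:
            self.keyframes.append((target_heading_deg, saliency))
            return target_heading_deg                # follow the current observation
        if self.keyframes:                           # occluded: fuse remembered headings
            num = den = 0.0
            for age, (h, s) in enumerate(reversed(self.keyframes)):
                w = s * (self.decay ** age)          # saliency x recency weighting
                num += w * h
                den += w
            return num / den
        return robot_heading_deg + self.search_rate  # no memory yet: sweep to search

# Toy usage: two sightings, then occlusion.
mem = HeadingMemory()
mem.command(True, 12.0, 0.9, robot_heading_deg=0.0)
mem.command(True, 15.0, 0.6, robot_heading_deg=5.0)
print(round(mem.command(False, 0.0, 0.0, robot_heading_deg=10.0), 1))  # blend of 12 and 15
```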