PanoNav: Mapless Zero-Shot Object Navigation with Panoramic Scene Parsing and Dynamic Memory

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
For zero-shot object navigation (ZSON) in unknown environments, existing approaches rely on depth sensors and pre-built maps, and frequently become trapped in local minima due to short-sighted decision-making. This paper proposes a map-free, RGB-only navigation framework. Methodologically, it introduces the first integration of panoramic visual input with a dynamic bounded memory queue: a panoramic scene parsing module enables fine-grained spatial understanding, while a memory-guided decision mechanism models long-horizon historical context to enhance spatial reasoning. Key contributions include: (i) complete elimination of depth sensors and prior maps; (ii) leveraging multimodal large models to improve generalization and robustness under map-free conditions; and (iii) a dynamic memory that effectively mitigates myopic policies and local optima. On mainstream navigation benchmarks, the approach achieves significant improvements in Success Rate (SR) and Success-weighted by Path Length (SPL) over state-of-the-art baselines, validating its effectiveness.

📝 Abstract
Zero-shot object navigation (ZSON) in unseen environments remains a challenging problem for household robots, requiring strong perceptual understanding and decision-making capabilities. While recent methods leverage metric maps and Large Language Models (LLMs), they often depend on depth sensors or prebuilt maps, limiting the spatial reasoning ability of Multimodal Large Language Models (MLLMs). Mapless ZSON approaches have emerged to address this, but they typically make short-sighted decisions, leading to local deadlocks due to a lack of historical context. We propose PanoNav, a fully RGB-only, mapless ZSON framework that integrates a Panoramic Scene Parsing module to unlock the spatial parsing potential of MLLMs from panoramic RGB inputs, and a Memory-guided Decision-Making mechanism enhanced by a Dynamic Bounded Memory Queue to incorporate exploration history and avoid local deadlocks. Experiments on the public navigation benchmark show that PanoNav significantly outperforms representative baselines in both SR and SPL metrics.
Problem

Research questions and friction points this paper is trying to address.

Enabling robots to navigate to unseen objects without prebuilt maps or depth sensors
Addressing short-sighted decisions and local deadlocks in mapless navigation systems
Enhancing spatial reasoning of multimodal language models using only panoramic RGB inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Panoramic Scene Parsing module for MLLM spatial parsing
Dynamic Bounded Memory Queue for exploration history
Fully RGB-only mapless navigation framework
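The paper does not publish implementation details of the Dynamic Bounded Memory Queue, but its stated role (a bounded record of exploration history that is fed back into the MLLM's decision prompt to avoid local deadlocks) can be sketched minimally. All class, method, and string names below are hypothetical illustrations, not the authors' code:

```python
from collections import deque


class DynamicBoundedMemoryQueue:
    """Hypothetical sketch: bounded FIFO of past exploration steps.

    At most `capacity` recent entries are kept; older entries are
    evicted automatically, so the serialized history stays small
    enough to fit in an MLLM prompt.
    """

    def __init__(self, capacity: int = 8):
        self.steps = deque(maxlen=capacity)

    def push(self, scene_summary: str, action: str) -> None:
        # Record what the agent observed and which action it chose.
        self.steps.append((scene_summary, action))

    def as_prompt_context(self) -> str:
        # Serialize the retained history for inclusion in a decision
        # prompt, giving the model long-horizon context so it can
        # avoid revisiting dead ends.
        return "\n".join(
            f"Step {i}: saw [{scene}] -> chose [{act}]"
            for i, (scene, act) in enumerate(self.steps, 1)
        )


memory = DynamicBoundedMemoryQueue(capacity=3)
memory.push("hallway with open door on left", "move forward")
memory.push("kitchen counter, no target visible", "turn left")
memory.push("living room, sofa ahead", "move forward")
memory.push("dead end near window", "turn around")  # evicts the oldest entry
print(memory.as_prompt_context())
```

The bounded capacity is the key design point: it caps prompt length while still letting the decision step condition on recent history rather than only the current panorama.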
Qunchao Jin
PEAK-Lab, The Hong Kong University of Science and Technology (Guangzhou)
Yilin Wu
Robotics PhD at CMU
Reinforcement Learning · Robotics
Changhao Chen
HKUST-GZ
Embodied AI · Robotics · Inertial Navigation · SLAM