🤖 AI Summary
This work addresses efficient zero-shot object navigation for resource-constrained drones in unknown environments by proposing a navigation framework based on an incremental spatio-semantic scene graph. The approach constructs a spatial connectivity graph via polyhedral expansion and dynamically partitions it into semantic regions through graph clustering, yielding a hierarchical environment representation. It combines open-vocabulary object semantic anchoring, large language model–guided global planning, and an information-gain–driven local exploration strategy. On a resource-constrained onboard platform, the system updates in real time at 15 Hz and outperforms state-of-the-art methods in computational efficiency, while ablation studies show substantial improvements in Success weighted by Path Length (SPL).
📝 Abstract
Zero-Shot Object Navigation in unknown environments poses significant challenges for Unmanned Aerial Vehicles (UAVs) due to the conflict between high-level semantic reasoning requirements and limited onboard computational resources. To address this, we present USS-Nav, a lightweight framework that incrementally constructs a Unified Spatio-Semantic scene graph and enables efficient Large Language Model (LLM)-augmented Zero-Shot Object Navigation. Specifically, we introduce an incremental Spatial Connectivity Graph generation method that uses polyhedral expansion to capture global geometric topology; the graph is dynamically partitioned into semantic regions via graph clustering. Concurrently, open-vocabulary object semantics are instantiated and anchored to this topology to form a hierarchical environmental representation. Leveraging this hierarchical structure, we present a coarse-to-fine exploration strategy: an LLM grounded in the scene graph's semantics determines global target regions, while a local planner optimizes frontier coverage based on information gain. Experimental results demonstrate that our framework outperforms state-of-the-art methods in computational efficiency and real-time update frequency (15 Hz) on a resource-constrained platform. Furthermore, ablation studies confirm the effectiveness of our framework, showing substantial improvements in Success weighted by Path Length (SPL). The source code will be made publicly available to foster further research.
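For readers unfamiliar with the SPL metric named above, the following is a minimal sketch of how it is commonly computed in the embodied-navigation literature: the mean over episodes of the success indicator times the ratio of the shortest-path length to the larger of the shortest-path length and the path actually travelled. The function name `spl` and its argument names are illustrative assumptions, not taken from the USS-Nav code, and the abstract does not state which evaluation protocol the authors used.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Hypothetical helper for illustration only (not from the paper's code).
    successes[i]        -- whether episode i reached the target
    shortest_lengths[i] -- shortest-path length from start to target
    path_lengths[i]     -- length of the path the agent actually took
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Assumption: an episode with zero-length paths contributes 0
        # rather than dividing by zero.
        if success and denom > 0:
            total += shortest / denom
    return total / n
```

Under this definition a successful episode that takes twice the shortest path scores 0.5, and a failed episode scores 0 regardless of how short its path was, so the metric rewards both reaching the target and doing so efficiently.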