🤖 AI Summary
This work addresses object search in dynamic indoor environments, where objects change position over time. This invalidates historical scene knowledge and makes search substantially harder. To tackle this, the authors propose a two-layer semantic map that integrates uncertainty-aware 3D scene graph priors with online semantic observations from a vision-language model (VLM). They further introduce the IGV-RRT planner, which, for the first time, jointly exploits scene graph priors and VLM semantic scores, guiding path planning by combining information gain with semantic evidence. By pairing a variant of RRT* with gradient-based motion feasibility analysis, the method achieves higher search success rates and efficiency than existing baselines in both simulated and real-world environments.
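To make the idea of biasing the search with both cues concrete, the following Python sketch shows one way a sampler could draw candidate points for tree expansion in proportion to a blended score. Everything here is an assumption for illustration: the grid layout, the weighted-sum blend with parameter `alpha`, the uniform-exploration rate `epsilon`, and the helper names `combined_score` and `biased_sample` are hypothetical and are not taken from the paper, whose combination rule is not given in this summary. The sketch covers only the sampling step, not RRT* rewiring or the gradient-based feasibility check.

```python
import random


def combined_score(ig, vlm, alpha=0.5):
    """Hypothetical blend: weighted sum of max-normalized information gain and VLM score.

    Assumes both grids are non-negative and have the same shape.
    """
    ig_max = max(max(row) for row in ig) or 1.0  # avoid division by zero on an all-zero grid
    vlm_max = max(max(row) for row in vlm) or 1.0
    return [
        [alpha * g / ig_max + (1.0 - alpha) * v / vlm_max for g, v in zip(ig_row, vlm_row)]
        for ig_row, vlm_row in zip(ig, vlm)
    ]


def biased_sample(score, epsilon=0.1, rng=random):
    """Hypothetical sampler for tree expansion that returns a (row, col) cell.

    With probability epsilon (or if every score is zero) the cell is drawn uniformly;
    otherwise it is drawn with probability proportional to its blended score.
    """
    cells = [(r, c) for r in range(len(score)) for c in range(len(score[0]))]
    weights = [score[r][c] for r, c in cells]
    if rng.random() < epsilon or sum(weights) <= 0.0:
        return rng.choice(cells)
    return rng.choices(cells, weights=weights, k=1)[0]


# Usage: prior and online evidence both concentrated in the bottom-right cell.
ig = [[0.0, 0.2], [0.1, 1.0]]
vlm = [[0.0, 0.0], [0.3, 0.9]]
score = combined_score(ig, vlm, alpha=0.5)
rng = random.Random(0)
samples = [biased_sample(score, epsilon=0.1, rng=rng) for _ in range(1000)]
print(samples.count((1, 1)) / len(samples))  # most samples fall in the high-score cell
```

Keeping a small `epsilon` of uniform samples is a common way to let biased RRT variants still reach regions the priors rate poorly, which matters when objects have been moved away from where the scene graph expects them.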
📝 Abstract
Object Goal Navigation (ObjectNav) in temporally changing indoor environments is challenging because object relocation can invalidate historical scene knowledge. To address this issue, we propose a probabilistic planning framework that combines uncertainty-aware scene priors with online target relevance estimates derived from a Vision-Language Model (VLM). The framework consists of a dual-layer semantic mapping module and a real-time planner. The mapping module includes an Information Gain Map (IGM), built from a 3D scene graph (3DSG) during prior exploration, that models object co-occurrence relations and provides global guidance on likely target regions. It also maintains a VLM score map (VLM-SM) that fuses confidence-weighted semantic observations into the map for local validation of the current scene. Based on these two cues, we develop a planner, IGV-RRT, that jointly exploits information gain and semantic evidence for online decision making. IGV-RRT biases tree expansion toward semantically salient regions that have both high prior likelihood and strong online relevance, while preserving kinematic feasibility through gradient-based analysis. Simulation and real-world experiments demonstrate that the proposed method effectively mitigates the impact of object rearrangement, achieving higher search efficiency and success rates than representative baselines in complex indoor environments.
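The VLM-SM is described as fusing confidence-weighted semantic observations into the map. A minimal sketch, assuming a confidence-weighted running average per grid cell (a common choice for such maps), could look like the following. The class name `VLMScoreMap`, the per-cell observation confidences, and the confidence update rule are illustrative assumptions, not the authors' formulation.

```python
class VLMScoreMap:
    """Hypothetical 2D grid holding a fused VLM relevance score and a confidence per cell."""

    def __init__(self, rows, cols):
        self.score = [[0.0] * cols for _ in range(rows)]
        self.conf = [[0.0] * cols for _ in range(rows)]

    def fuse(self, observations, vlm_score):
        """Fuse one image-level VLM score into the cells it observed.

        observations: iterable of ((row, col), confidence) pairs, e.g. with higher
        confidence near the camera's optical axis (an assumption, not the paper's rule).
        """
        for (r, c), obs_conf in observations:
            old_conf = self.conf[r][c]
            total = old_conf + obs_conf
            if total <= 0.0:
                continue  # no informative observation for this cell yet
            # Confidence-weighted running average of the relevance score.
            self.score[r][c] = (old_conf * self.score[r][c] + obs_conf * vlm_score) / total
            # Assumed rule: confidence becomes a self-weighted average of old and new
            # confidences, so low-confidence views cannot inflate it much.
            self.conf[r][c] = (old_conf ** 2 + obs_conf ** 2) / total


# Usage: one cell seen first with high confidence, then with low confidence.
m = VLMScoreMap(2, 2)
m.fuse([((0, 0), 0.9), ((0, 1), 0.3)], vlm_score=0.8)
m.fuse([((0, 0), 0.1)], vlm_score=0.2)
print(m.score[0][0])  # about 0.74, dominated by the high-confidence first view
```

Weighting by confidence lets a single oblique or distant glimpse nudge a cell's score without overwriting a clearer earlier view, which fits the abstract's description of the VLM-SM as a local check on the current scene.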