🤖 AI Summary
This work addresses the challenge of efficiently exploring an environment under a limited action budget to construct high-quality semantic scene graphs (SSGs). The authors propose a reinforcement learning–based navigation strategy that combines a fine-grained discrete action space with a factorized multi-head policy network to improve both exploration efficiency and decision quality. Replacing the policy-optimization algorithm alone, with the reward function left unchanged, yields a 21% relative improvement in SSG completeness. Combining the modern optimizer with the factorized action representation gives the best completeness–efficiency trade-off, and a systematic evaluation shows that curriculum learning and depth-aware collision supervision improve training stability and execution safety.
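The summary does not name the replacement optimization algorithm. As a hedged illustration only, a common modern choice for discrete-action policy optimization is a PPO-style clipped surrogate objective; the sketch below (function and parameter names are ours, not from the paper) shows the per-sample form of that objective.

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s), the importance ratio
    advantage : estimated advantage of the sampled action
    eps       : clip range; 0.2 is the commonly used default

    Clipping the ratio to [1 - eps, 1 + eps] and taking the minimum
    removes the incentive to move the policy far from the old one in
    a single update, which stabilizes training.
    """
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For positive advantages the objective is capped once the ratio exceeds `1 + eps`; for negative advantages the penalty is not reduced below the `1 - eps` level, so the update is conservative in both directions.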
📝 Abstract
Semantic world models enable embodied agents to reason about objects, relations, and spatial context beyond purely geometric representations. In Organic Computing, such models are a key enabler for objective-driven self-adaptation under uncertainty and resource constraints. The core challenge is to acquire observations maximising model quality and downstream usefulness within a limited action budget.
Semantic scene graphs (SSGs) provide a structured and compact representation for this purpose. However, constructing them within a finite action horizon requires exploration strategies that trade off information gain against navigation cost and decide when additional actions yield diminishing returns.
This work presents a modular navigation component for Embodied Semantic Scene Graph Generation and modernises its decision-making by replacing the policy-optimisation method and revisiting the discrete action formulation. We study both a compact and a finer-grained, larger discrete motion set, and compare a single-head policy over atomic actions with a factorised multi-head policy over action components. We further evaluate curriculum learning and optional depth-based collision supervision, assessing SSG completeness, execution safety, and navigation behaviour.
Results show that replacing the optimisation algorithm alone improves SSG completeness by 21% relative to the baseline under identical reward shaping. Depth supervision mainly affects execution safety (collision-free motion), while completeness remains largely unchanged. Combining modern optimisation with a finer-grained, factorised action representation yields the strongest overall completeness–efficiency trade-off.
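The factorised multi-head policy can be sketched as follows. The component sets and their sizes below are illustrative assumptions, not taken from the paper; the point is that each action component gets its own categorical head, and the joint log-probability factorises as a sum, so a fine-grained motion set needs far fewer output logits than a single head over all atomic combinations.

```python
import math
import random

# Hypothetical discretisation of two action components (not from the paper):
TURN_ANGLES = [-45, -30, -15, 0, 15, 30, 45]   # degrees
STEP_LENGTHS = [0.0, 0.1, 0.25, 0.5, 1.0]      # metres

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_factorised(turn_logits, move_logits, rng=random):
    """Sample one action per component head.

    Each head is an independent categorical distribution, so the joint
    log-probability of the composite action is the sum of the
    per-component log-probabilities.
    """
    p_turn = softmax(turn_logits)
    p_move = softmax(move_logits)
    turn = rng.choices(range(len(p_turn)), weights=p_turn)[0]
    move = rng.choices(range(len(p_move)), weights=p_move)[0]
    logp = math.log(p_turn[turn]) + math.log(p_move[move])
    return (TURN_ANGLES[turn], STEP_LENGTHS[move]), logp

# A single-head policy over atomic actions would need
# len(TURN_ANGLES) * len(STEP_LENGTHS) = 35 logits, one per
# (turn, step) pair; the factorised form needs only 7 + 5 = 12.
action, logp = sample_factorised([0.0] * 7, [0.0] * 5)
```

With uniform logits every composite action has probability 1/(7·5), which is exactly what the summed per-head log-probabilities recover; the saving in output size grows multiplicatively as more components (e.g. camera tilt) are added.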