🤖 AI Summary
This work addresses the autonomous exploration problem for mobile robots operating in unknown environments, where geometric and semantic mapping must be performed simultaneously. We propose the first semantic-guided next-best-view (NBV) selection framework. Our method formalizes “semantic exploration” as a novel task, introduces a semantic visibility scoring mechanism that enables active perception jointly optimized for structural and semantic map construction, and integrates semantic segmentation networks with 3D reconstruction and multi-view sampling optimization. Evaluated both in simulation and on real robotic platforms, the approach significantly improves semantic map accuracy and environmental understanding. Moreover, it enhances performance on downstream tasks—including object localization and scene-based question answering—by leveraging semantically informed viewpoint selection. The framework bridges the gap between traditional geometry-driven NBV strategies and high-level semantic reasoning, enabling more intelligent and task-aware robotic exploration.
📝 Abstract
The rise of embodied AI applications has enabled robots to perform complex tasks that require a sophisticated understanding of their environment. For robots to operate successfully in such settings, maps must include semantic information in addition to geometric information. In this paper, we address the novel problem of semantic exploration, in which a mobile robot must autonomously explore an environment to fully map both its structure and the semantic appearance of its features. We develop a method based on next-best-view exploration, in which candidate poses are scored by the semantic features visible from each pose. We explore two alternative methods for sampling candidate views and demonstrate the effectiveness of our framework in both simulation and physical experiments. The automatic creation of high-quality semantic maps can help robots better understand and interact with their environments, making future embodied AI applications easier to deploy.
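To make the next-best-view idea concrete, the sketch below shows one plausible form of semantic visibility scoring in 2D: each candidate pose is scored by the not-yet-observed semantic features that fall within its sensor range and field of view, and the highest-scoring pose is selected. This is an illustrative simplification, not the paper's implementation; the `SemanticFeature` type, the distance-weighted scoring rule, and the `max_range`/`fov_deg` parameters are all assumptions made for this example.

```python
import math
from dataclasses import dataclass

# NOTE: illustrative sketch only. The class, scoring rule, and parameters
# below are assumptions, not the method described in the paper.

@dataclass
class SemanticFeature:
    x: float
    y: float
    observed: bool = False  # already well-covered in the semantic map?

def semantic_visibility_score(pose, features, max_range=5.0, fov_deg=90.0):
    """Score a candidate pose (x, y, yaw) by the unobserved semantic
    features visible within the sensor's range and field of view."""
    px, py, yaw = pose
    half_fov = math.radians(fov_deg) / 2.0
    score = 0.0
    for f in features:
        if f.observed:
            continue
        dx, dy = f.x - px, f.y - py
        dist = math.hypot(dx, dy)
        if dist > max_range:
            continue
        bearing = math.atan2(dy, dx) - yaw
        # wrap the bearing to [-pi, pi] before the field-of-view check
        bearing = math.atan2(math.sin(bearing), math.cos(bearing))
        if abs(bearing) <= half_fov:
            score += 1.0 / (1.0 + dist)  # nearer features contribute more
    return score

def next_best_view(candidate_poses, features):
    """Return the candidate pose with the highest semantic visibility score."""
    return max(candidate_poses, key=lambda p: semantic_visibility_score(p, features))
```

In a full system, candidate poses would come from a sampling strategy (the paper compares two), visibility would account for occlusion in the 3D map, and the score would typically be combined with a geometric information-gain term; the sketch isolates only the semantic scoring step.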