🤖 AI Summary
This work investigates how prior information from a fixed external RGB camera can improve the initial completeness of 3D scene graphs and the efficiency of robotic active exploration. The method treats external camera observations as a Common Prior Map (CPM), building semantic and geometric priors before exploration begins. A hardware-agnostic, RGB-only multi-view fusion framework integrates onboard and external viewpoints for incremental 3D scene graph generation. The key contribution is using a fixed external camera as a prior source that requires no hardware modifications, coupled with an active exploration strategy that uses the partial scene graph to steer the robot toward regions of high semantic uncertainty. Experiments show that a single external camera can improve initial object recall by up to 79%, improving both exploration efficiency and scene graph completeness.
📝 Abstract
Commonly available prior information, such as BIM models, floor plans, and remote sensing images, can provide valuable geometric and semantic context for autonomous robotic systems. In this paper, we treat observations from fixed external RGB cameras as Common Prior Maps (CPMs): wide-field views of the environment that initialize a semantic and geometric scene prior before any robot motion begins. We present an RGB-only framework for active, incremental 3D scene graph (3DSG) generation that fuses observations from both onboard robot cameras and fixed external cameras within a single hardware-agnostic pipeline. By relying solely on RGB observations processed by a feed-forward 3D reconstruction model, the system treats all cameras, onboard or external, identically and requires no hardware modifications. A graph-based active semantic exploration framework then uses the partial scene graph directly to guide the robot toward regions of high semantic uncertainty, progressively completing and refining the prior. Experiments demonstrate that bootstrapping the scene graph with even a single external camera increases initial object recall by up to 79%, and that the richer context of the prior significantly improves the efficiency of subsequent active exploration.
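To make the idea of uncertainty-guided exploration concrete, the following is a minimal sketch of one way a partial scene graph could be used to choose the next viewpoint target. It is not the paper's implementation: the `ObjectNode` structure, the use of Shannon entropy as the uncertainty measure, the linear travel-cost penalty, and the `distance_weight` parameter are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ObjectNode:
    """Hypothetical node of a partial scene graph (illustrative only)."""
    name: str
    position: Tuple[float, float, float]
    class_probs: Dict[str, float]  # label -> probability, assumed to sum to 1


def semantic_entropy(node: ObjectNode) -> float:
    """Shannon entropy (in nats) of the node's class distribution."""
    return -sum(p * math.log(p) for p in node.class_probs.values() if p > 0.0)


def next_target(nodes: List[ObjectNode],
                robot_xyz: Tuple[float, float, float],
                distance_weight: float = 0.1) -> ObjectNode:
    """Return the node with the best trade-off between uncertainty and travel cost.

    Assumption: score = entropy - distance_weight * Euclidean distance.
    """
    def score(node: ObjectNode) -> float:
        return semantic_entropy(node) - distance_weight * math.dist(node.position, robot_xyz)

    return max(nodes, key=score)


# Two nodes as they might appear in a prior built from an external camera:
# one confidently labeled and nearby, one ambiguous and farther away.
nodes = [
    ObjectNode("obj_a", (1.0, 0.0, 0.0), {"chair": 0.95, "stool": 0.05}),
    ObjectNode("obj_b", (3.0, 0.0, 0.0), {"monitor": 0.5, "tv": 0.5}),
]
target = next_target(nodes, robot_xyz=(0.0, 0.0, 0.0))
print(target.name)  # obj_b: its higher entropy outweighs the extra distance
```

In this sketch, a richer prior would mean more nodes are available to score before the robot moves, which is one plausible reading of why an initial scene graph could make exploration more efficient.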