AI Summary
To enable efficient human-robot collaboration for autonomous robots in high-risk scenarios, this paper proposes a lightweight, real-time semantic-geometric fusion method that bridges the 2D BIM-based environment representations preferred by human operators and robot-centric 3D geometric perception. Using synchronized image and LiDAR data, the method runs entirely on CPU: it applies bottom-up pixel-wise parsing and multi-layer graph-structured modeling to jointly denoise 2D maps and segment 3D point clouds, producing a unified, structured scene graph. It is the first approach to achieve real-time, cross-scale (object-level to building-level) semantic-geometric consistency on CPU alone. Evaluated on the NASA JPL NeBula-Spot robot, the system performs robustly in complex garage and office environments, enabling real-time semantic mapping and collaborative exploration.
Abstract
Autonomous robots are increasingly playing key roles as support platforms for human operators in high-risk, dangerous applications. To accomplish challenging tasks, efficient human-robot cooperation and understanding are required. While robotic planning typically leverages 3D geometric information, human operators are accustomed to a high-level, compact representation of the environment, such as top-down 2D maps representing the Building Information Model (BIM). 3D scene graphs have emerged as a powerful tool to bridge the gap between human-readable 2D BIM and the robot's 3D maps. In this work, we introduce Pixels-to-Graph (Pix2G), a novel lightweight method to generate structured scene graphs from image pixels and LiDAR maps in real time for the autonomous exploration of unknown environments on resource-constrained robot platforms. To satisfy onboard compute constraints, the framework is designed to perform all operations on CPU only. The method's outputs are a de-noised 2D top-down environment map and a structure-segmented 3D point cloud, which are seamlessly connected by a multi-layer graph that abstracts information from the object level up to the building level. The proposed method is quantitatively and qualitatively evaluated in real-world experiments using the NASA JPL NeBula-Spot legged robot to autonomously explore and map cluttered garage and urban office-like environments in real time.
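As a rough illustration of the multi-layer graph idea described above, the following minimal Python sketch shows one way a layered scene graph could link object-level nodes, each holding indices into a 3D point cloud, up to a building-level node. It is not the Pix2G implementation: the layer names, the `Node` and `SceneGraph` classes, and all method names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed layer names, ordered coarse to fine (hypothetical, not from the paper).
LAYERS = ("building", "room", "object")


@dataclass
class Node:
    node_id: str
    layer: str
    centroid: tuple[float, float, float]
    # Indices into a 3D point cloud that belong to this node (may be empty).
    point_indices: list[int] = field(default_factory=list)
    children: list[str] = field(default_factory=list)
    parent: str | None = None


class SceneGraph:
    """Tree-shaped layered graph: each node's parent sits exactly one layer above."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def add_node(self, node_id, layer, centroid, point_indices=(), parent=None):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        if parent is not None:
            parent_node = self.nodes[parent]
            # Enforce the layer hierarchy: building -> room -> object.
            if LAYERS.index(parent_node.layer) + 1 != LAYERS.index(layer):
                raise ValueError("parent must be exactly one layer above child")
            parent_node.children.append(node_id)
        node = Node(node_id, layer, tuple(centroid), list(point_indices), parent=parent)
        self.nodes[node_id] = node
        return node

    def points_under(self, node_id):
        """Collect the point indices of a node and all of its descendants."""
        node = self.nodes[node_id]
        indices = list(node.point_indices)
        for child_id in node.children:
            indices.extend(self.points_under(child_id))
        return sorted(indices)

    def ancestors(self, node_id):
        """Return parent ids from the nearest parent up to the root."""
        chain = []
        current = self.nodes[node_id].parent
        while current is not None:
            chain.append(current)
            current = self.nodes[current].parent
        return chain


# Usage: a building with one room containing two segmented objects.
graph = SceneGraph()
graph.add_node("bldg", "building", (0.0, 0.0, 0.0))
graph.add_node("garage", "room", (5.0, 2.0, 0.0), parent="bldg")
graph.add_node("car_1", "object", (4.0, 1.5, 0.5), point_indices=[0, 1, 2], parent="garage")
graph.add_node("pillar_1", "object", (6.0, 3.0, 1.0), point_indices=[3, 4], parent="garage")
```

Querying `graph.points_under("garage")` gathers the points of every object in that room, while `graph.ancestors("car_1")` walks from an object up to the building. This is the kind of cross-scale lookup a layered graph makes cheap, here under the simplifying assumption of a strict tree.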