Pixels-to-Graph: Real-time Integration of Building Information Models and Scene Graphs for Semantic-Geometric Human-Robot Understanding

📅 2025-06-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenge of achieving efficient human-robot collaboration for autonomous robots in high-risk scenarios, this paper proposes a lightweight, real-time semantic-geometric fusion method that bridges human-preferred 2D BIM-based environment representations with robot-centric 3D geometric perception. Leveraging synchronized image and LiDAR data, the method operates entirely on CPU, employing bottom-up pixel-wise parsing and multi-layer graph-structured modeling to jointly denoise 2D maps and segment 3D point clouds, thereby generating a unified, structured scene graph. The authors present it as the first approach to achieve real-time, cross-scale (object- to building-level) semantic-geometric consistency on CPU alone. Evaluated on the NASA JPL NeBula-Spot robot, the system demonstrates robust performance in complex garage and office environments, enabling real-time semantic mapping and collaborative exploration.
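The object-to-building multi-layer abstraction described in the summary can be pictured as a small layered hierarchy. The following is a minimal illustrative sketch of such a structure; the node names, fields, and traversal helper are our own assumptions, not Pix2G's actual data structures:

```python
# Hypothetical multi-layer scene graph: building -> room -> object.
# Illustrative only; not the paper's implementation.
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    layer: str                      # "object" | "room" | "building"
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)


def nodes_at_layer(root, layer):
    """Collect all nodes of a given layer by walking the hierarchy."""
    found = [root] if root.layer == layer else []
    for child in root.children:
        found.extend(nodes_at_layer(child, layer))
    return found


# Build a toy graph: one building, two rooms, a few objects.
building = Node("garage", "building")
room_a, room_b = Node("bay_1", "room"), Node("bay_2", "room")
building.add_child(room_a)
building.add_child(room_b)
room_a.add_child(Node("forklift", "object"))
room_b.add_child(Node("pallet", "object"))

print([n.node_id for n in nodes_at_layer(building, "object")])
# prints ['forklift', 'pallet']
```

Queries at any layer (all objects, all rooms) then reduce to a walk of this hierarchy, which is the kind of cross-scale lookup a unified scene graph enables.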

๐Ÿ“ Abstract
Autonomous robots are increasingly playing key roles as support platforms for human operators in high-risk, dangerous applications. To accomplish challenging tasks, efficient human-robot cooperation and understanding are required. While robotic planning typically leverages 3D geometric information, human operators are accustomed to a high-level, compact representation of the environment, such as top-down 2D maps representing the Building Information Model (BIM). 3D scene graphs have emerged as a powerful tool to bridge the gap between human-readable 2D BIM and the robot's 3D maps. In this work, we introduce Pixels-to-Graph (Pix2G), a novel lightweight method to generate structured scene graphs from image pixels and LiDAR maps in real-time for the autonomous exploration of unknown environments on resource-constrained robot platforms. To satisfy onboard compute constraints, the framework is designed to perform all operations on CPU only. The method outputs are a de-noised 2D top-down environment map and a structure-segmented 3D point cloud, which are seamlessly connected using a multi-layer graph abstracting information from the object level up to the building level. The proposed method is quantitatively and qualitatively evaluated during real-world experiments performed using the NASA JPL NeBula-Spot legged robot to autonomously explore and map cluttered garage and urban office-like environments in real-time.
Problem

Research questions and friction points this paper is trying to address.

Bridging human-readable 2D BIM and robot 3D maps using scene graphs
Real-time generation of structured scene graphs from pixels and LiDAR
Enabling autonomous exploration in unknown environments with CPU-only processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time scene graph generation from pixels
Lightweight CPU-only processing framework
Multi-layer graph connecting 2D and 3D data
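One CPU-cheap operation in the spirit of the "de-noised 2D top-down environment map" output is removing small spurious blobs from a binary occupancy grid. The sketch below is a generic connected-component filter under that assumption; it is not Pix2G's actual parsing algorithm, and `denoise_grid` and its parameters are hypothetical names:

```python
# Illustrative only: drop tiny 4-connected occupied blobs from a binary
# occupancy grid, a common CPU-cheap denoising step. NOT the paper's method.
import numpy as np


def denoise_grid(grid, min_size=3):
    """Remove 4-connected occupied components smaller than min_size cells."""
    grid = grid.astype(bool)
    visited = np.zeros_like(grid)
    out = np.zeros_like(grid)
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] and not visited[r, c]:
                # Flood-fill one connected component.
                stack, comp = [(r, c)], []
                visited[r, c] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                # Keep the component only if it is large enough.
                if len(comp) >= min_size:
                    for y, x in comp:
                        out[y, x] = True
    return out.astype(np.uint8)


grid = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
], dtype=np.uint8)

clean = denoise_grid(grid, min_size=3)
print(int(clean.sum()))  # prints 4: only the 4-cell blob survives
```

Because it touches each cell a constant number of times, a filter like this stays linear in grid size, which is the kind of cost profile a CPU-only real-time pipeline needs.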
Antonello Longo
NASA Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA.
Chanyoung Chung
Field AI, Mission Viejo CA, USA. (Work conducted at NASA Jet Propulsion Laboratory)
Matteo Palieri
NASA Jet Propulsion Laboratory
Multi-Robot Autonomy
Sung-Kyun Kim
Field AI, Mission Viejo CA, USA. (Work conducted at NASA Jet Propulsion Laboratory)
Ali Agha
Field AI; formerly: NASA-JPL; Caltech; MIT
Robotics · Autonomous Systems · Artificial Intelligence
Cataldo Guaragnella
DEI - Dept. of Electrical and Information Engineering - Politecnico di Bari
Signal Processing · Signal, Image and Video Coding · Pattern Recognition · Multidimensional Signal Processing
Shehryar Khattak
NASA Jet Propulsion Lab
Robotics · Perception · Computer Vision · SLAM · Sensor Fusion