Pixels-to-Graph: Real-time Integration of Building Information Models and Scene Graphs for Semantic-Geometric Human-Robot Understanding

📅 2025-06-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenge of achieving efficient human-robot collaboration for autonomous robots in high-risk scenarios, this paper proposes a lightweight, real-time semantic-geometric fusion method that bridges human-preferred 2D BIM-based environment representations with robot-centric 3D geometric perception. Leveraging synchronized image and LiDAR data, the method operates entirely on CPU, employing bottom-up pixel-wise parsing and multi-layer graph-structured modeling to jointly denoise 2D maps and segment 3D point clouds, thereby generating a unified, structured scene graph. The authors present it as the first approach to achieve real-time, cross-scale (object- to building-level) semantic-geometric consistency on CPU alone. Evaluated on the NASA JPL NeBula-Spot robot, the system demonstrates robust performance in complex garage and office environments, enabling real-time semantic mapping and collaborative exploration.
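The object-to-building multi-layer abstraction described in the summary can be pictured as a small layered hierarchy. The following is a minimal illustrative sketch of such a structure; the node names, fields, and traversal helper are our own assumptions, not Pix2G's actual data structures:

```python
# Hypothetical multi-layer scene graph: building -> room -> object.
# Illustrative only; not the paper's implementation.
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    layer: str                      # "object" | "room" | "building"
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)


def nodes_at_layer(root, layer):
    """Collect all nodes of a given layer by walking the hierarchy."""
    found = [root] if root.layer == layer else []
    for child in root.children:
        found.extend(nodes_at_layer(child, layer))
    return found


# Build a toy graph: one building, two rooms, a few objects.
building = Node("garage", "building")
room_a, room_b = Node("bay_1", "room"), Node("bay_2", "room")
building.add_child(room_a)
building.add_child(room_b)
room_a.add_child(Node("forklift", "object"))
room_b.add_child(Node("pallet", "object"))

print([n.node_id for n in nodes_at_layer(building, "object")])
# prints ['forklift', 'pallet']
```

Queries at any layer (all objects, all rooms) then reduce to a walk of this hierarchy, which is the kind of cross-scale lookup a unified scene graph enables.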

๐Ÿ“ Abstract
Autonomous robots are increasingly playing key roles as support platforms for human operators in high-risk, dangerous applications. To accomplish challenging tasks, efficient human-robot cooperation and understanding are required. While robotic planning typically leverages 3D geometric information, human operators are accustomed to a high-level, compact representation of the environment, such as top-down 2D maps representing the Building Information Model (BIM). 3D scene graphs have emerged as a powerful tool to bridge the gap between human-readable 2D BIM and the robot's 3D maps. In this work, we introduce Pixels-to-Graph (Pix2G), a novel lightweight method to generate structured scene graphs from image pixels and LiDAR maps in real-time for the autonomous exploration of unknown environments on resource-constrained robot platforms. To satisfy onboard compute constraints, the framework is designed to perform all operations on CPU only. The method outputs are a de-noised 2D top-down environment map and a structure-segmented 3D point cloud, which are seamlessly connected using a multi-layer graph abstracting information from the object level up to the building level. The proposed method is quantitatively and qualitatively evaluated during real-world experiments performed using the NASA JPL NeBula-Spot legged robot to autonomously explore and map cluttered garage and urban office-like environments in real-time.
Problem

Research questions and friction points this paper is trying to address.

Bridging human-readable 2D BIM and robot 3D maps using scene graphs
Real-time generation of structured scene graphs from pixels and LiDAR
Enabling autonomous exploration in unknown environments with CPU-only processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time scene graph generation from pixels
Lightweight CPU-only processing framework
Multi-layer graph connecting 2D and 3D data
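One CPU-cheap operation in the spirit of the "de-noised 2D top-down environment map" output is removing small spurious blobs from a binary occupancy grid. The sketch below is a generic connected-component filter under that assumption; it is not Pix2G's actual parsing algorithm, and `denoise_grid` and its parameters are hypothetical names:

```python
# Illustrative only: drop tiny 4-connected occupied blobs from a binary
# occupancy grid, a common CPU-cheap denoising step. NOT the paper's method.
import numpy as np


def denoise_grid(grid, min_size=3):
    """Remove 4-connected occupied components smaller than min_size cells."""
    grid = grid.astype(bool)
    visited = np.zeros_like(grid)
    out = np.zeros_like(grid)
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] and not visited[r, c]:
                # Flood-fill one connected component.
                stack, comp = [(r, c)], []
                visited[r, c] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                # Keep the component only if it is large enough.
                if len(comp) >= min_size:
                    for y, x in comp:
                        out[y, x] = True
    return out.astype(np.uint8)


grid = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
], dtype=np.uint8)

clean = denoise_grid(grid, min_size=3)
print(int(clean.sum()))  # prints 4: only the 4-cell blob survives
```

Because it touches each cell a constant number of times, a filter like this stays linear in grid size, which is the kind of cost profile a CPU-only real-time pipeline needs.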
Antonello Longo
NASA Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA.
Chanyoung Chung
Field AI, Mission Viejo CA, USA. (Work conducted at NASA Jet Propulsion Laboratory)
Matteo Palieri
NASA Jet Propulsion Laboratory
Multi-Robot Autonomy
Sung-Kyun Kim
Field AI, Mission Viejo CA, USA. (Work conducted at NASA Jet Propulsion Laboratory)
Ali Agha
Field AI; formerly: NASA-JPL; Caltech; MIT
Robotics · Autonomous Systems · Artificial Intelligence
Cataldo Guaragnella
DEI - Dept. of Electrical and Information Engineering - Politecnico di Bari
Signal Processing · Signal, Image and Video Coding · Pattern Recognition · Multidimensional Signal Processing
Shehryar Khattak
NASA Jet Propulsion Lab
Robotics · Perception · Computer Vision · SLAM · Sensor Fusion