Estimating Commonsense Scene Composition on Belief Scene Graphs

📅 2025-05-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of estimating the spatial distribution of unseen objects in scene understanding. The authors extend Belief Scene Graphs to model commonsense spatial composition relationships among semantic object categories. Methodologically, they formally define “commonsense scene composition” and introduce the CECI model, which integrates a graph convolutional network (GCN) with a spatial ontology derived from a large language model (LLM) to enable interpretable probabilistic spatial reasoning; the possible locations of an object class are modeled as a joint probability distribution. Evaluated in both simulated and real-world indoor multi-room scenes, the approach improves the plausibility and semantic consistency of unseen-object localization while enabling spatial scene interpretation across room types. Key contributions include: (1) a formalization of commonsense scene composition for spatial reasoning; (2) a neuro-symbolic architecture unifying learned graph representations and linguistic knowledge; and (3) empirical validation on simulated and real-world indoor data.

📝 Abstract
This work establishes the concept of commonsense scene composition, with a focus on extending Belief Scene Graphs by estimating the spatial distribution of unseen objects. Specifically, the commonsense scene composition capability refers to an understanding of the spatial relationships among related objects in the scene, which in this article is modeled as a joint probability distribution over all possible locations of a semantic object class. The proposed framework includes two variants of a Correlation Information (CECI) model for learning probability distributions: (i) a baseline approach based on a Graph Convolutional Network, and (ii) a neuro-symbolic extension that integrates a spatial ontology based on Large Language Models (LLMs). Furthermore, this article provides a detailed description of the dataset generation process for such tasks. Finally, the framework has been validated through multiple runs on simulated data, as well as in a real-world indoor environment, demonstrating its ability to spatially interpret scenes across different room types.
Problem

Research questions and friction points this paper is trying to address.

Estimating the spatial distribution of unseen objects in scenes
Modeling spatial relationships as joint probability distributions
Validating the framework on simulated and real-world indoor data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends Belief Scene Graphs with spatial-distribution estimation for unseen objects
Uses a Graph Convolutional Network to learn probability distributions
Integrates a neuro-symbolic, LLM-based spatial ontology
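The GCN-based variant can be illustrated with a minimal sketch. The room graph, co-occurrence features, and random weights below are illustrative assumptions standing in for the paper's trained model, not the authors' implementation:

```python
import numpy as np

# Adjacency of a toy 3-room graph (kitchen, living room, bedroom),
# with self-loops added. Purely illustrative.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)

# Symmetric normalization D^{-1/2} A D^{-1/2}, the standard GCN propagation rule.
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# Hypothetical node features: a co-occurrence prior between the query
# object class and objects already observed in each room.
X = np.array([[0.9], [0.4], [0.1]])

# Random weights stand in for trained layer parameters.
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 1))

# One GCN layer followed by a softmax over rooms yields a probability
# distribution over possible locations of the unseen object class.
logits = (A_hat @ X @ W).ravel()
probs = np.exp(logits) / np.exp(logits).sum()
```

In the paper's neuro-symbolic variant, an LLM-derived spatial ontology would additionally constrain or inform this distribution; here only the graph-convolution step is sketched.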
Mario A. V. Saucedo
Robotics & AI Team, Department of Computer, Electrical and Space Engineering, Luleå University of Technology, Luleå SE-97187, Sweden
V. Viswanathan
Robotics & AI Team, Department of Computer, Electrical and Space Engineering, Luleå University of Technology, Luleå SE-97187, Sweden
Christoforos Kanellakis
PhD, Luleå University of Technology
Robotics · Computer Vision · Control Theory
G. Nikolakopoulos
Robotics & AI Team, Department of Computer, Electrical and Space Engineering, Luleå University of Technology, Luleå SE-97187, Sweden