Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

📅 2025-03-24
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the lack of functional 3D understanding in real indoor scenes by introducing the open-vocabulary Functional 3D Scene Graph (F3SG), which explicitly models objects, interactive elements, and their functional relationships, going beyond conventional spatial relation modeling. Because functional annotations are scarce, the method takes posed RGB-D images as input and uses vision-language models (VLMs) and large language models (LLMs) to supply functional knowledge. Evaluated on an extended SceneFun3D dataset and the newly introduced FunGraph3D benchmark, the approach significantly outperforms adapted baselines, including Open3DSG and ConceptGraph. The generated functional scene graphs support downstream tasks such as 3D question answering and robotic manipulation.

📝 Abstract
We introduce the task of predicting functional 3D scene graphs for real-world indoor environments from posed RGB-D images. Unlike traditional 3D scene graphs that focus on spatial relationships of objects, functional 3D scene graphs capture objects, interactive elements, and their functional relationships. Due to the lack of training data, we leverage foundation models, including visual language models (VLMs) and large language models (LLMs), to encode functional knowledge. We evaluate our approach on an extended SceneFun3D dataset and a newly collected dataset, FunGraph3D, both annotated with functional 3D scene graphs. Our method significantly outperforms adapted baselines, including Open3DSG and ConceptGraph, demonstrating its effectiveness in modeling complex scene functionalities. We also demonstrate downstream applications such as 3D question answering and robotic manipulation using functional 3D scene graphs. See our project page at https://openfungraph.github.io
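
The graph described in the abstract can be pictured as a small data structure. The following is a minimal sketch only, not the authors' implementation: the class names (`Node`, `FunctionalEdge`, `FunctionalSceneGraph`), the `kind` strings, the `controls` helper, and the light-switch example are all hypothetical and chosen purely for illustration. It assumes nodes are either objects or interactive elements with open-vocabulary labels, and that a functional relationship is a directed, free-text edge between two nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    node_id: int
    label: str                            # open-vocabulary label, e.g. "light switch"
    kind: str                             # assumed values: "object" or "interactive_element"
    centroid: Tuple[float, float, float]  # 3D position in scene coordinates (illustrative)


@dataclass(frozen=True)
class FunctionalEdge:
    source: int    # node_id of the element that is operated
    target: int    # node_id of the node that is affected
    relation: str  # free-text functional relationship, e.g. "turns on"


@dataclass
class FunctionalSceneGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: List[FunctionalEdge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: int, target: int, relation: str) -> None:
        # Reject edges whose endpoints are not yet part of the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(FunctionalEdge(source, target, relation))

    def controls(self, target: int) -> List[Tuple[str, str]]:
        # Return (label, relation) for every node that functionally acts on `target`.
        return [
            (self.nodes[e.source].label, e.relation)
            for e in self.edges
            if e.target == target
        ]


# Hypothetical scene: a wall switch that operates a ceiling light.
graph = FunctionalSceneGraph()
graph.add_node(Node(0, "light switch", "interactive_element", (0.1, 1.2, 1.1)))
graph.add_node(Node(1, "ceiling light", "object", (2.0, 1.5, 2.6)))
graph.add_edge(0, 1, "turns on")

print(graph.controls(1))  # [('light switch', 'turns on')]
```

A query such as `controls` hints at how a downstream task like 3D question answering ("what turns on the ceiling light?") could read answers directly off the functional edges, which a purely spatial scene graph would not encode.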
Problem

Research questions and friction points this paper is trying to address.

Predicting functional 3D scene graphs from RGB-D images
Capturing objects and functional relationships in indoor spaces
Leveraging foundation models for functional knowledge encoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

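Introducing the task of predicting functional 3D scene graphs from posed RGB-D images
Leveraging foundation models (VLMs and LLMs) to encode functional knowledge in place of scarce training data
Outperforming adapted Open3DSG and ConceptGraph baselines on an extended SceneFun3D dataset and the new FunGraph3D dataset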