RAG-3DSG: Enhancing 3D Scene Graphs with Re-Shot Guided Retrieval-Augmented Generation

📅 2026-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses open-vocabulary 3D scene graph generation, which suffers from low object recognition accuracy and slow mapping caused by constrained viewpoints, occlusions, and redundant surface density. The authors propose a re-shot guided uncertainty estimation mechanism that suppresses noise during cross-view feature aggregation. Objects with low uncertainty then serve as reliable context for object-level retrieval-augmented generation (RAG), which improves semantic accuracy. A dynamic downsample-mapping strategy with adaptive granularity accelerates cross-image object aggregation. On the Replica dataset, the approach significantly improves node captioning accuracy while reducing mapping time by two-thirds compared to the vanilla version.

📝 Abstract
Open-vocabulary 3D Scene Graph (3DSG) generation can enhance various downstream tasks in robotics, such as manipulation and navigation, by leveraging structured semantic representations. A 3DSG is constructed from multiple images of a scene, with objects represented as nodes and relationships as edges. However, existing methods for open-vocabulary 3DSG generation suffer from both low object-level recognition accuracy and slow speed, mainly due to constrained viewpoints, occlusions, and redundant surface density. To address these challenges, we propose RAG-3DSG, which mitigates aggregation noise through re-shot guided uncertainty estimation and supports object-level Retrieval-Augmented Generation (RAG) via reliable low-uncertainty objects. Furthermore, we propose a dynamic downsample-mapping strategy to accelerate cross-image object aggregation with adaptive granularity. Experiments on the Replica dataset demonstrate that RAG-3DSG significantly improves node captioning accuracy in 3DSG generation while reducing the mapping time by two-thirds compared to the vanilla version.
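
The abstract does not spell out how the dynamic downsample-mapping strategy chooses its granularity. As a rough illustration of the general idea only, the sketch below picks a voxel size per object from its bounding-box extent, so that large objects are stored coarsely and small ones finely, and then merges points that fall into the same voxel. The function names, the `target_cells` parameter, and the size limits are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import floor


def adaptive_voxel_size(points, target_cells=32, min_size=0.005, max_size=0.05):
    """Choose a voxel edge length (metres) from the object's largest extent.

    Assumption for illustration: aim for roughly `target_cells` voxels along
    the longest bounding-box axis, clamped to [min_size, max_size].
    """
    if not points:
        raise ValueError("points must be non-empty")
    extents = [
        max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)
    ]
    size = max(extents) / target_cells
    return min(max(size, min_size), max_size)


def voxel_downsample(points, voxel_size):
    """Replace all points that share a voxel with their centroid."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [
        tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for pts in buckets.values()
    ]


def dynamic_downsample(points, **kwargs):
    """Downsample one object's points with a size-dependent voxel grid."""
    size = adaptive_voxel_size(points, **kwargs)
    return voxel_downsample(points, size), size


# Usage: two nearly coincident points collapse into one voxel centroid.
cloud = [(0.0, 0.0, 0.0), (0.001, 0.001, 0.001), (1.0, 1.0, 1.0)]
reduced, used_size = dynamic_downsample(cloud)
print(len(cloud), "->", len(reduced), "points at voxel size", used_size)
```

Fewer points per object means fewer pairwise comparisons when merging detections across images, which is one plausible source of the speed-up the abstract reports. The actual granularity rule in RAG-3DSG may differ.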
Problem

Research questions and friction points this paper is trying to address.

3D Scene Graph
open-vocabulary
object recognition
viewpoint constraints
occlusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
3D Scene Graph
Uncertainty Estimation
Dynamic Downsampling
Open-vocabulary Recognition