🤖 AI Summary
This work addresses the challenges of open-vocabulary 3D scene graph generation, which suffers from low object recognition accuracy and poor computational efficiency due to view occlusions and surface redundancy. The authors propose a re-shot-guided uncertainty estimation mechanism that suppresses noise during cross-view feature aggregation. Retrieval-augmented generation (RAG), conditioned on reliable low-uncertainty objects, further improves semantic accuracy. Additionally, a dynamic downsample-mapping strategy accelerates cross-image object aggregation. Evaluated on the Replica dataset, the approach significantly improves node description accuracy while reducing mapping time by roughly two-thirds, yielding more precise and efficient 3D scene graph construction.
📝 Abstract
Open-vocabulary 3D Scene Graph (3DSG) generation can enhance various downstream tasks in robotics, such as manipulation and navigation, by leveraging structured semantic representations. A 3DSG is constructed from multiple images of a scene, where objects are represented as nodes and relationships as edges. However, existing methods for open-vocabulary 3DSG generation suffer from both low object-level recognition accuracy and slow generation speed, mainly due to constrained viewpoints, occlusions, and redundant surface density. To address these challenges, we propose RAG-3DSG, which mitigates aggregation noise through re-shot-guided uncertainty estimation and supports object-level Retrieval-Augmented Generation (RAG) conditioned on reliable low-uncertainty objects. Furthermore, we propose a dynamic downsample-mapping strategy to accelerate cross-image object aggregation with adaptive granularity. Experiments on the Replica dataset demonstrate that RAG-3DSG significantly improves node captioning accuracy in 3DSG generation while reducing mapping time by two-thirds compared to the vanilla version.