🤖 AI Summary
Existing NeRF methods focus on object-level representations and lack explicit modeling of inter-object semantic relationships. Method: This work introduces the first approach to directly extract open-vocabulary 3D semantic relations from NeRFs, implicitly encoding relations as ray pairs and designing a differentiable relation query network. It combines knowledge distillation from multimodal large language models (MLLMs) with open-vocabulary vision-language features to establish the first relation-prior-guided framework for radiance fields. Contribution/Results: The method achieves state-of-the-art performance on both open-vocabulary 3D scene graph generation and relation-guided instance segmentation. It significantly improves cross-object semantic relationship understanding, marking a critical step toward structured, reasoning-capable 3D scene understanding with NeRF.
📝 Abstract
Neural radiance fields are an emerging 3D scene representation and have recently been extended to learn features for scene understanding by distilling open-vocabulary features from vision-language models. However, current methods primarily focus on object-centric representations, supporting object segmentation or detection, while understanding semantic relationships between objects remains largely unexplored. To address this gap, we propose RelationField, the first method to extract inter-object relationships directly from neural radiance fields. RelationField represents relationships between objects as pairs of rays within a neural radiance field, effectively extending its formulation to include implicit relationship queries. To teach RelationField complex, open-vocabulary relationships, relationship knowledge is distilled from multi-modal LLMs. To evaluate RelationField, we solve open-vocabulary 3D scene graph generation tasks and relationship-guided instance segmentation, achieving state-of-the-art performance in both tasks. See the project website at https://relationfield.github.io.
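The core idea of querying a relationship via a pair of rays can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: it assumes each ray has already been mapped to a relationship feature vector by the field, pools the pair by concatenation, and scores the pair against a text-query embedding with cosine similarity. The function name `relation_score` and the concatenation pooling are illustrative assumptions.

```python
import math

def relation_score(ray_feat_a, ray_feat_b, query_emb):
    """Score a ray pair against an open-vocabulary relationship query.

    Hypothetical sketch: concatenate the two rays' relationship features
    and compare the pair feature to the query's text embedding via
    cosine similarity. Feature extraction and the learned relation
    network are omitted.
    """
    pair_feat = list(ray_feat_a) + list(ray_feat_b)  # order encodes subject -> object
    dot = sum(p * q for p, q in zip(pair_feat, query_emb))
    norm_p = math.sqrt(sum(p * p for p in pair_feat))
    norm_q = math.sqrt(sum(q * q for q in query_emb))
    return dot / (norm_p * norm_q)

# Example: a pair whose concatenated feature aligns with the query scores 1.0.
score = relation_score([1.0, 0.0], [0.0, 1.0], [1.0, 0.0, 0.0, 1.0])
```

In the actual method the pair feature would come from a learned, differentiable relation query network supervised by MLLM-distilled relationship labels; the cosine comparison here merely mirrors how open-vocabulary queries are typically matched to distilled vision-language features.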