🤖 AI Summary
Existing 2D scene graph research is hindered by sparse, coarse-grained (binary, discrete) relationship annotations in real-world data and by insufficient modeling of latent relationships. To address this, we propose CoPa-SG, the first high-fidelity synthetic scene graph dataset that covers all object pairs, incorporates fine-grained parametric relations (e.g., relative angle, distance), and introduces proto-relations that encode hypothetical associations triggered by newly placed objects. Methodologically, we integrate geometric priors with differentiable rendering to generate dense, pixel-accurate annotations, and design a structured, differentiable relational representation paradigm. This framework goes beyond conventional discrete relation modeling, substantially enhancing expressive power and prospective reasoning capability. Evaluations across multiple vision-language models demonstrate that our novel relation types improve mean Recall@100 for relationship prediction in vision-language navigation and robotic planning tasks by 12.3%. Moreover, CoPa-SG enables downstream integration with causal reasoning and task planning frameworks.
📝 Abstract
2D scene graphs provide a structured and explainable framework for scene understanding. However, current work still struggles with a lack of accurate scene graph data. To overcome this data bottleneck, we present CoPa-SG, a synthetic scene graph dataset with highly precise ground truth and exhaustive relation annotations between all objects. Moreover, we introduce parametric relations and proto-relations, two new fundamental concepts for scene graphs. The former provide a much more fine-grained representation than traditional relations by enriching them with additional parameters such as angles or distances. The latter encode hypothetical relations in a scene graph and describe how relations would form if new objects were placed in the scene. Using CoPa-SG, we compare the performance of various scene graph generation models. We demonstrate how our new relation types can be integrated into downstream applications to enhance planning and reasoning capabilities.
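
To make the two relation types more concrete, the following is a minimal Python sketch of how they might be represented. It is purely illustrative and is not the CoPa-SG data format: the class names (`SceneObject`, `ParametricRelation`, `ProtoRelation`), the `near` predicate, the 2D point positions, and the distance-threshold rule are all assumptions introduced for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical object: a name and a 2D position (an assumption for this sketch).
    name: str
    x: float
    y: float


@dataclass(frozen=True)
class ParametricRelation:
    # A relation enriched with continuous parameters instead of a bare label.
    subject: str
    predicate: str
    target: str
    distance: float   # Euclidean distance between the two objects
    angle_deg: float  # direction from subject to target, in degrees


@dataclass(frozen=True)
class ProtoRelation:
    # A hypothetical relation: states where a *new* object would have to be
    # placed for `predicate` to hold with respect to an existing anchor object.
    predicate: str
    anchor: SceneObject
    max_distance: float

    def would_hold(self, candidate: SceneObject) -> bool:
        d = math.hypot(candidate.x - self.anchor.x, candidate.y - self.anchor.y)
        return d <= self.max_distance


def parametric_relation(a: SceneObject, b: SceneObject,
                        predicate: str = "near") -> ParametricRelation:
    dx, dy = b.x - a.x, b.y - a.y
    return ParametricRelation(
        subject=a.name,
        predicate=predicate,
        target=b.name,
        distance=math.hypot(dx, dy),
        angle_deg=math.degrees(math.atan2(dy, dx)),
    )


def exhaustive_relations(objects: list[SceneObject]) -> list[ParametricRelation]:
    # Annotate every ordered pair of distinct objects, mirroring the idea of
    # exhaustive annotation between all objects.
    return [parametric_relation(a, b)
            for a in objects for b in objects if a is not b]


table = SceneObject("table", 0.0, 0.0)
cup = SceneObject("cup", 3.0, 4.0)
lamp = SceneObject("lamp", -1.0, 2.0)

relations = exhaustive_relations([table, cup, lamp])
near_table = ProtoRelation(predicate="near", anchor=table, max_distance=1.0)
```

In this sketch, a traditional scene graph would store only the triple (`table`, `near`, `cup`), whereas the parametric relation also records how far away and in which direction the cup lies. The proto-relation answers a hypothetical query, such as whether a new object placed at a given position would count as `near` the table, without that object being present in the scene.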