ToLL: Topological Layout Learning with Structural Multi-view Augmentation for 3D Scene Graph Pretraining

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for 3D scene graph generation are hindered by scarce annotated data and susceptibility to object priors, making it challenging to design effective self-supervised pretraining tasks. This work proposes a topological layout learning framework that, for the first time, formulates predicate-aware topological layout reconstruction as a self-supervised objective. By modeling spatial priors conditioned on anchor points and leveraging graph neural networks for topological-geometric reasoning, the approach recovers the global structure of subgraphs. To preserve semantic fidelity, the method incorporates structure-aware multi-view augmentation and enhances relational representations through self-distillation. Evaluated on the 3DSSG dataset, the proposed framework significantly outperforms current state-of-the-art baselines, demonstrating its effectiveness and robustness.
📝 Abstract
3D Scene Graph (3DSG) generation plays a pivotal role in spatial understanding and semantic-affordance perception. However, its generalizability is often constrained by data scarcity. Current solutions primarily focus on cross-modal assisted representation learning and object-centric generation pretraining. The former relies heavily on predicate annotations, while the latter's predicate learning may be bypassed due to strong object priors. Consequently, they often fail to provide a label-free and robust self-supervised proxy task for 3DSG fine-tuning. To bridge this gap, we propose Topological Layout Learning (ToLL), a framework for 3DSG pretraining. Specifically, we design Anchor-Conditioned Topological Geometry Reasoning, in which a GNN recovers the global layout of zero-centered subgraphs from the spatial priors of sparse anchors. This process is strictly modulated by predicate features, thereby enforcing predicate relation learning. Furthermore, we construct a Structural Multi-view Augmentation that avoids semantic corruption and enhances representations via self-distillation. Extensive experiments on the 3DSSG dataset demonstrate that ToLL improves representation quality, outperforming state-of-the-art baselines.
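The layout-reconstruction proxy task described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the network, the gating function, or the loss, so the sigmoid gate, the single message-passing step, the feature sizes, the mean-squared-error loss, and all function names below are assumptions chosen only to show the general idea (zero-center a subgraph, reveal a few anchor positions, and predict the hidden positions with messages modulated by predicate features).

```python
import numpy as np

def zero_center(centers):
    """Shift object centers so the subgraph centroid sits at the origin."""
    return centers - centers.mean(axis=0, keepdims=True)

def mask_non_anchors(centers, anchor_idx):
    """Keep anchor positions, zero out the rest, and return a visibility flag."""
    visible = np.zeros(len(centers), dtype=bool)
    visible[anchor_idx] = True
    masked = np.where(visible[:, None], centers, 0.0)
    return masked, visible

def predicate_gated_step(node_in, edges, edge_feat, w_msg, w_gate):
    """One message-passing step; each message is gated by its edge (predicate) feature.

    The sigmoid gate is an assumed stand-in for "modulated by predicate features".
    """
    out = np.zeros((node_in.shape[0], w_msg.shape[1]))
    for (src, dst), feat in zip(edges, edge_feat):
        gate = 1.0 / (1.0 + np.exp(-(feat @ w_gate)))  # values in (0, 1), shape (3,)
        out[dst] += gate * (node_in[src] @ w_msg)       # gated message from src to dst
    return out

def layout_loss(pred, target, visible):
    """Mean squared error over the hidden (non-anchor) nodes only (assumed loss)."""
    hidden = ~visible
    return float(np.mean((pred[hidden] - target[hidden]) ** 2))

# Toy subgraph: 5 objects, 2 of them used as anchors (all values are synthetic).
rng = np.random.default_rng(0)
centers = zero_center(rng.normal(size=(5, 3)))
masked, visible = mask_non_anchors(centers, anchor_idx=[0, 3])

# Node input = masked position (3 values) + visibility flag (1 value).
node_in = np.concatenate([masked, visible[:, None].astype(float)], axis=1)

edges = [(0, 1), (3, 2), (0, 4), (3, 1)]            # directed (src, dst) pairs
edge_feat = rng.normal(size=(len(edges), 8))        # hypothetical predicate features
w_msg = rng.normal(size=(4, 3)) * 0.1               # untrained weights, illustration only
w_gate = rng.normal(size=(8, 3)) * 0.1

pred = predicate_gated_step(node_in, edges, edge_feat, w_msg, w_gate)
loss = layout_loss(pred, centers, visible)
```

In this toy setup the hidden nodes can only be placed using information that flows from the anchors through predicate-gated edges, which is the intuition behind making the reconstruction depend on relation features rather than on object identity alone.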
Problem

Research questions and friction points this paper is trying to address.

3D Scene Graph
self-supervised learning
predicate learning
data scarcity
pretraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Topological Layout Learning
Structural Multi-view Augmentation
3D Scene Graph Pretraining
Anchor-Conditioned Reasoning
Self-supervised Representation Learning
Yucheng Huang
School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
Luping Ji
School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
Xiangwei Jiang
School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
Wen Li
Data Intelligence Group, UESTC
Machine Learning, Computer Vision, Domain Adaptation, Transfer Learning, Web Data
Mao Ye
School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China