Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Scene graph generation (SGG) suffers significant performance degradation on corrupted images (e.g., noisy or occluded inputs). To address this, the paper proposes a robust SGG method built on layout invariance. The core contribution is a layout-guided normalization-and-restitution mechanism, realized as a plug-and-play Layout-Embedded Encoder (LEE). It employs Instance Normalization to filter out domain-specific features while restoring structured layout representations, thereby mitigating the domain shift between clean and corrupted images and improving generalization. Evaluated on the VG-C and GQA-C benchmarks, the method sets a new state of the art, with relative mR@50 improvements of 5.6%, 8.0%, and 6.5% under the PredCls, SGCls, and SGDet settings, respectively. These gains empirically validate the critical role of layout priors in robust SGG.

📝 Abstract
In this paper, we introduce a novel method named Robo-SGG, i.e., Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation. Compared to the standard SGG setting, robust scene graph generation aims to perform inference on a diverse range of corrupted images, the core challenge being the domain shift between clean and corrupted images. Existing SGG methods suffer degraded performance due to compromised visual features, e.g., corruption interference or occlusions. To obtain robust visual features, we exploit layout information, which is domain-invariant, to enhance the efficacy of existing SGG methods on corrupted images. Specifically, we employ Instance Normalization (IN) to filter out domain-specific features, and recover the invariant structural features, i.e., the positional and semantic relationships among objects, via the proposed Layout-Oriented Restitution. Additionally, we propose a Layout-Embedded Encoder (LEE) that augments the existing object and predicate encoders within the SGG framework, enriching the robust positional and semantic features of objects and predicates. Note that our proposed Robo-SGG module is designed as a plug-and-play component that can be easily integrated into any baseline SGG model. Extensive experiments demonstrate that by integrating the state-of-the-art method into our proposed Robo-SGG, we achieve relative improvements of 5.6%, 8.0%, and 6.5% in mR@50 for the PredCls, SGCls, and SGDet tasks on the VG-C dataset, respectively, and set new state-of-the-art performance on the corrupted scene graph generation benchmarks (VG-C and GQA-C). We will release our source code and models.
Problem

Research questions and friction points this paper is trying to address.

Enhance scene graph generation on corrupted images
Address domain shift between clean and corrupted images
Improve robustness using layout-oriented normalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layout-Oriented Normalization filters domain-specific features
Layout-Oriented Restitution recovers structural features
Layout-Embedded Encoder enriches robust positional features
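The normalization-and-restitution idea listed above can be sketched in a few lines of plain Python. This is a minimal illustration of the general mechanism, not the authors' implementation; the function names `instance_norm` and `restitute`, the toy feature values, and the additive mixing weight `alpha` are all assumptions made for the example.

```python
import math

def instance_norm(features, eps=1e-5):
    """Instance Normalization over one feature map: removes per-instance
    (domain-specific) statistics by subtracting each channel's mean and
    dividing by its standard deviation."""
    normalized = []
    for channel in features:  # features: list of channels, each a list of values
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(var + eps)
        normalized.append([(v - mean) / std for v in channel])
    return normalized

def restitute(normalized, layout_features, alpha=0.5):
    """Layout-oriented restitution (illustrative): add back a layout-derived
    residual so that invariant structural information discarded by
    normalization is restored. `layout_features` stands in for encoded
    positional/semantic cues about the objects."""
    return [
        [n + alpha * l for n, l in zip(n_ch, l_ch)]
        for n_ch, l_ch in zip(normalized, layout_features)
    ]

# Toy feature map: 2 channels x 4 positions; channel 2 carries a large
# corruption-induced offset (+10) that IN should filter out.
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 12.0, 12.0]]
layout = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.5, 0.5]]

norm = instance_norm(features)       # each channel now has ~zero mean, unit std
out = restitute(norm, layout)        # layout cues re-injected after normalization
```

The key property the sketch demonstrates: after `instance_norm`, the per-channel mean is (near) zero, so the constant domain shift in channel 2 is gone, while `restitute` re-introduces layout-dependent structure on top of the normalized features.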
Changsheng Lv
Beijing University of Posts and Telecommunications
Scene Graph Generation · Autonomous Driving
Mengshi Qi
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Zijian Fu
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Huadong Ma
BUPT
Internet of Things · Multimedia