🤖 AI Summary
Scene graph generation (SGG) suffers significant performance degradation on corrupted images (e.g., noisy or occluded inputs). To address this, this paper proposes a robust SGG method built on layout invariance. The core contribution is a layout-oriented normalization and restitution mechanism, realized via a plug-and-play Layout-Embedded Encoder (LEE). LEE employs Instance Normalization to filter out domain-specific features and restores the domain-invariant structural layout, thereby mitigating the domain shift between clean and corrupted images and improving generalization. Evaluated on the VG-C and GQA-C benchmarks, the method achieves new state-of-the-art results, with relative mR@50 improvements of 5.6%, 8.0%, and 6.5% under the PredCls, SGCls, and SGDet settings, respectively. These gains empirically validate the role of layout priors in robust SGG.
📝 Abstract
In this paper, we introduce a novel method named Robo-SGG: Layout-Oriented Normalization and Restitution for Robust Scene Graph Generation. Compared to the standard SGG setting, robust scene graph generation aims to perform inference on a diverse range of corrupted images, with the core challenge being the domain shift between clean and corrupted images. Existing SGG methods suffer from degraded performance due to compromised visual features, e.g., under corruption interference or occlusions. To obtain robust visual features, we exploit layout information, which is domain-invariant, to enhance the efficacy of existing SGG methods on corrupted images. Specifically, we employ Instance Normalization (IN) to filter out domain-specific features, and recover the invariant structural features, i.e., the positional and semantic relationships among objects, via the proposed Layout-Oriented Restitution. Additionally, we propose a Layout-Embedded Encoder (LEE) that augments the existing object and predicate encoders within the SGG framework, enriching the robust positional and semantic features of objects and predicates. Our proposed Robo-SGG module is designed as a plug-and-play component that can be easily integrated into any baseline SGG model. Extensive experiments demonstrate that integrating a state-of-the-art method into our proposed Robo-SGG yields relative improvements of 5.6%, 8.0%, and 6.5% in mR@50 for the PredCls, SGCls, and SGDet tasks on the VG-C dataset, respectively, achieving new state-of-the-art performance on corrupted scene graph generation benchmarks (VG-C and GQA-C). We will release our source code and models.
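To make the normalization-and-restitution idea concrete, here is a minimal NumPy sketch: Instance Normalization removes per-instance feature statistics (which carry corruption-specific style), and a restitution step adds back part of the removed residual. This is an illustrative assumption, not the paper's implementation; in Robo-SGG the gate would be a learned, layout-guided module, whereas here it is a fixed placeholder scalar.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x: feature map of shape (N, C, H, W).
    # Normalize each channel of each instance over its spatial dimensions,
    # stripping instance-specific statistics (mean/variance), which tend to
    # carry domain- and corruption-specific "style" information.
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def restitution(x, x_norm, gate):
    # IN discards both corruption-specific and some task-relevant content.
    # The residual r = x - x_norm is exactly what IN removed; a gate decides
    # how much of it to restore. Here `gate` is a placeholder scalar standing
    # in for the learned layout-guided gating described in the paper.
    r = x - x_norm
    return x_norm + gate * r
```

With `gate = 1.0` the restitution step recovers the original features exactly; with `gate = 0.0` only the normalized (style-free) features remain, so a learned gate can interpolate between the two per channel.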