DDS: Decoupled Dynamic Scene-Graph Generation Network

📅 2023-01-18
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Existing scene graph generation (SGG) methods suffer from poor generalization to unseen object–predicate–object triplets due to their reliance on joint object–relation feature learning. To address this, we propose a novel dual-branch decoupled architecture—the first to fully disentangle object and relation feature representations. Our approach comprises: (i) independent object and relation branches; (ii) a bi-directional dynamic feature decoupling network; (iii) a contrastive-driven feature decorrelation loss; and (iv) cross-dataset joint training for enhanced compositional generalization and zero-shot transfer. Evaluated on three major benchmarks—including VisualGenome—our method achieves a 12.7% absolute gain in unseen triplet detection accuracy over state-of-the-art methods. It demonstrates substantially improved generalization capability and robustness, particularly under distribution shifts and zero-shot settings.
📝 Abstract
—Scene-graph generation involves creating a struc- tural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. However, existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning. To address this issue we propose DDS– a decoupled dynamic scene-graph generation network– that consists of two independent branches that can disentangle extracted features. The key innovation of the current paper is the decoupling of the features representing the relationships from those of the objects, which enables the detection of novel object-relationship combinations. The DDS model is evaluated on three datasets and outperforms previous methods by a significant margin, especially in detecting previously unseen triplets.
Problem

Research questions and friction points this paper is trying to address.

Scene Graph Generation
Unseen Object Relationships
Joint Feature Learning Limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

DDS Method
Separate Feature Processing
Unseen Relationship Recognition
🔎 Similar Papers
No similar papers found.
A S M Iftekhar
A S M Iftekhar
Research Scientist, Microsoft
Computer VisionImage/Video ProcessingMachine LearningDeep Learning
R
Raphael Ruschel
University of California Santa Barbara
S
Satish Kumar
University of California Santa Barbara
Suya You
Suya You
USC
Computer VisionMachine LearningComputer GraphicsHuman Computer InteractionData Visualization
B
B. S. Manjunath
University of California Santa Barbara