BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation

📅 2024-07-26
🏛️ Information Fusion
📈 Citations: 3
Influential: 0
🤖 AI Summary
Scene Graph Generation (SGG) suffers from coarse subject–predicate–object modeling and from neglect of the bidirectional dependencies between entities and predicates. To address this, we propose the Bidirectional Conditioning Transformer (BCTR), presented as the first architecture to enable mutual guidance between visual feature processing and semantic decoding: visual features dynamically modulate semantic decoding, while semantic structures in turn constrain visual attention, which facilitates joint optimization of objects and relations. Our method comprises a dual-stream conditional encoder, a learnable relational prior module, and a contrastive relational rescoring mechanism. Evaluated on Visual Genome, our approach achieves a +3.2% improvement in Recall@100, with substantial gains in long-tail relation recognition and in modeling multiple co-occurring relations.

Problem

Research questions and friction points this paper is trying to address.

Addresses the limitation of unidirectional conditioning in SGG
Proposes bidirectional conditioning for entity-predicate interaction
Enhances generalization to unseen semantic relationships

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional conditioning factorization in semantic space
Multi-stage interactive feature augmentation module
Random feature alignment with multi-modal knowledge
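
The idea of bidirectional conditioning listed above can be illustrated with a small toy sketch. This is not the paper's implementation: the function `bidirectional_condition`, the affinity matrix `compat`, the class labels, and the additive update rule are all hypothetical, chosen only to show entity predictions refining predicate predictions and predicate predictions refining entity predictions in turn.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_condition(entity_logits, predicate_logits, compat, steps=2):
    """Toy mutual refinement of entity and predicate scores.

    compat[e][p] is a hypothetical affinity between entity class e and
    predicate class p (an assumption for illustration, not from the paper).
    """
    ent = list(entity_logits)
    pred = list(predicate_logits)
    for _ in range(steps):
        # Entity -> predicate: current entity beliefs shift predicate logits.
        ent_prob = softmax(ent)
        pred = [
            pred[p] + sum(ent_prob[e] * compat[e][p] for e in range(len(ent)))
            for p in range(len(pred))
        ]
        # Predicate -> entity: updated predicate beliefs shift entity logits.
        pred_prob = softmax(pred)
        ent = [
            ent[e] + sum(pred_prob[p] * compat[e][p] for p in range(len(pred)))
            for e in range(len(ent))
        ]
    return softmax(ent), softmax(pred)


# Hypothetical classes: entities ["person", "dog"], predicates ["riding", "barking"].
compat = [[2.0, -2.0], [-2.0, 2.0]]
ent_p, pred_p = bidirectional_condition([0.0, 0.0], [1.0, 0.0], compat)
```

In this toy setup the entity starts out fully ambiguous, and a mild preference for "riding" pulls the entity estimate toward "person", which then reinforces "riding" on the next pass. A unidirectional scheme would apply only one of the two update directions.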