Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing

📅 2025-03-01
🤖 AI Summary
Video Scene Graph Generation (VidSGG) suffers from inaccurate relation prediction caused by visual and semantic biases. To address this, the authors propose VISA, a visual-semantic dual debiasing framework for VidSGG. VISA uses memory-augmented temporal modeling to capture how entities evolve across frames, and introduces a triplet-driven iterative semantic fusion mechanism that refines object features with semantic information drawn from subject-predicate-object triplets. By mitigating visual and semantic biases together, VISA makes relation recognition more robust. On the SGCLS task under the Semi Constraint setting, VISA reports improvements of +13.1% in mR@20 and mR@50 over prior unbiased VidSGG methods. These results support the effectiveness of the dual debiasing approach for VidSGG.

📝 Abstract
Video Scene Graph Generation (VidSGG) aims to capture dynamic relationships among entities by sequentially analyzing video frames and integrating visual and semantic information. However, VidSGG is challenged by significant biases that skew predictions. To mitigate these biases, we propose a VIsual and Semantic Awareness (VISA) framework for unbiased VidSGG. VISA addresses visual bias through memory-enhanced temporal integration that enhances object representations and concurrently reduces semantic bias by iteratively integrating object features with comprehensive semantic information derived from triplet relationships. This visual-semantics dual debiasing approach results in more unbiased representations of complex scene dynamics. Extensive experiments demonstrate the effectiveness of our method, where VISA outperforms existing unbiased VidSGG approaches by a substantial margin (e.g., +13.1% improvement in mR@20 and mR@50 for the SGCLS task under Semi Constraint).
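The headline numbers above are reported as mR@K (mean Recall@K), which averages recall per predicate class so that rare relations count as much as frequent ones. The snippet below is a minimal sketch of one common way to compute it, not the paper's evaluation code: the function name `mean_recall_at_k`, the triplet tuples, and the toy `frames` data are all illustrative assumptions, and hit counts are pooled across frames, which is a simplification of some benchmark protocols.

```python
from collections import defaultdict


def mean_recall_at_k(frames, k):
    """Mean Recall@K over predicate classes (simplified, pooled across frames).

    frames: iterable of (ranked_predictions, ground_truth) pairs. Both are
    lists of (subject, predicate, object) triplets; predictions are assumed
    to be sorted by descending confidence.
    """
    hits = defaultdict(int)    # per-predicate ground-truth triplets recovered
    totals = defaultdict(int)  # per-predicate ground-truth triplets overall
    for ranked, ground_truth in frames:
        top_k = set(ranked[:k])
        for triplet in ground_truth:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
    if not totals:
        return 0.0
    # Average the per-predicate recalls, so each class has equal weight.
    recalls = [hits[p] / totals[p] for p in totals]
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): two frames with ranked predictions and ground truth.
frames = [
    (
        [("person", "holding", "cup"),
         ("person", "near", "table"),
         ("person", "drinking_from", "cup")],
        [("person", "holding", "cup"),
         ("person", "drinking_from", "cup")],
    ),
    (
        [("person", "near", "table"),
         ("person", "holding", "cup")],
        [("person", "holding", "cup"),
         ("person", "sitting_on", "chair")],
    ),
]

print(mean_recall_at_k(frames, k=2))
print(mean_recall_at_k(frames, k=3))
```

In this toy data, plain Recall@2 would be 2 of 4 ground-truth triplets, but the frequent predicate `holding` accounts for both hits. Averaging per predicate exposes the two missed rare predicates, which is why mR@K is the usual metric for judging debiasing methods.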

Problem

Research questions and friction points this paper is trying to address.

Mitigate biases in Video Scene Graph Generation
Enhance object representations with visual-semantic integration
Improve accuracy in dynamic relationship prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory-enhanced temporal integration reduces visual bias.
Iterative semantic integration minimizes semantic bias.
Visual-semantics dual debiasing enhances scene dynamics representation.
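To make the idea of memory-enhanced temporal integration concrete, here is a deliberately simple sketch. It is not VISA's mechanism, whose details are not given in this summary: it swaps in an exponential moving average as a stand-in memory, and the function name `integrate_with_memory`, the `momentum` parameter, and the dictionary-based feature layout are all assumptions made for illustration.

```python
def integrate_with_memory(frame_features, momentum=0.5):
    """Blend each entity's per-frame feature with a running memory.

    frame_features: list with one dict per frame, mapping an entity id to a
    feature vector (list of floats). Returns a list of dicts with the same
    layout, holding the memory-enhanced features.

    Assumption: an exponential moving average stands in for the memory
    module; the actual paper may use a very different design.
    """
    memory = {}   # entity id -> accumulated feature from earlier frames
    outputs = []
    for features in frame_features:
        enhanced = {}
        for entity_id, feat in features.items():
            if entity_id in memory:
                prev = memory[entity_id]
                blended = [
                    momentum * m + (1.0 - momentum) * f
                    for m, f in zip(prev, feat)
                ]
            else:
                # First appearance: nothing to blend with yet.
                blended = list(feat)
            memory[entity_id] = blended
            enhanced[entity_id] = blended
        outputs.append(enhanced)
    return outputs


# Hypothetical example: a cup that is clearly visible in frame 0 and
# occluded (all-zero feature) in frame 1.
video = [{"cup": [1.0, 0.0]}, {"cup": [0.0, 0.0]}]
result = integrate_with_memory(video, momentum=0.5)
print(result[1]["cup"])
```

The point of the sketch is only that a degraded observation in one frame can be partially compensated by what was seen in earlier frames, which is the kind of visual bias the summary says temporal memory is meant to reduce.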
Yanjun Li
University of Science and Technology of China
Zhaoyang Li
Ph.D. student, University of Science and Technology of China
Computer Vision
Honghui Chen
Professor of Finance, University of Central Florida
Finance, Investments
Lizhi Xu
University of Science and Technology of China