Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation

📅 2024-11-20
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the model bias induced by long-tailed relationship distributions in video scene graph generation (VidSGG) and scene graph anticipation (SGA), this paper proposes ImparTail, a training framework for unbiased and robust spatio-temporal scene graphs (STSGs). ImparTail combines curriculum learning with loss masking to gradually suppress the dominance of head relationship classes and shift the training focus toward tail classes. It also introduces two new tasks, Robust Spatio-Temporal Scene Graph Generation and Robust Scene Graph Anticipation, which evaluate how well STSG models hold up under distribution shifts. In experiments on the Action Genome dataset, ImparTail significantly improves the unbiased performance and robustness of STSG models compared to existing methods.

📝 Abstract
Spatio-Temporal Scene Graphs (STSGs) provide a concise and expressive representation of dynamic scenes by modelling objects and their evolving relationships over time. However, real-world visual relationships often exhibit a long-tailed distribution, causing existing methods for tasks like Video Scene Graph Generation (VidSGG) and Scene Graph Anticipation (SGA) to produce biased scene graphs. To this end, we propose ImparTail, a novel training framework that leverages curriculum learning and loss masking to mitigate bias in the generation and anticipation of spatio-temporal scene graphs. Our approach gradually decreases the dominance of the head relationship classes during training and focuses more on tail classes, leading to more balanced training. Furthermore, we introduce two new tasks, Robust Spatio-Temporal Scene Graph Generation and Robust Scene Graph Anticipation, designed to evaluate the robustness of STSG models against distribution shifts. Extensive experiments on the Action Genome dataset demonstrate that our framework significantly enhances the unbiased performance and robustness of STSG models compared to existing methods.
Problem

Research questions and friction points this paper is trying to address.

Mitigates bias in spatio-temporal scene graph generation.
Addresses long-tailed distribution in visual relationship data.
Enhances robustness against distribution shifts in STSG tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages loss masking and curriculum learning.
An impartial training objective reduces the dominance of head classes.
A curriculum-driven mask adapts the strength of bias mitigation over the course of training.
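
The summary above does not spell out the masking rule or the curriculum schedule, so the following is only a minimal sketch of how a curriculum-driven loss mask could work, not the paper's implementation. The function names, the linear `progress` schedule, and the median-based target ratio are all assumptions made for illustration. Early in training every loss term is kept; as `progress` approaches 1, loss terms from over-represented (head) classes are dropped more often, so tail classes carry relatively more weight.

```python
import math
import random
from collections import Counter


def keep_probabilities(labels, progress):
    """Per-class probability of keeping a sample's loss term.

    progress is in [0, 1]: 0 keeps everything, 1 applies the full mask.
    The target ratio (median class count / class count) is an
    illustrative assumption, not a rule taken from the paper.
    """
    counts = Counter(labels)
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    probs = {}
    for cls, n in counts.items():
        target = min(1.0, median / n)  # head classes get target < 1
        probs[cls] = 1.0 - progress * (1.0 - target)
    return probs


def masked_cross_entropy(predictions, labels, keep_prob, rng):
    """Mean cross-entropy over the samples that survive the mask.

    predictions is a list of dicts mapping class name -> predicted
    probability; labels holds the ground-truth class of each sample.
    """
    total, kept = 0.0, 0
    for pred, label in zip(predictions, labels):
        if rng.random() < keep_prob[label]:  # sample the mask per loss term
            total += -math.log(pred[label])
            kept += 1
    return total / kept if kept else 0.0


if __name__ == "__main__":
    # Hypothetical long-tailed label set: one head, one mid, one tail class.
    labels = ["head"] * 8 + ["mid"] * 3 + ["tail"] * 1
    predictions = [{"head": 0.6, "mid": 0.3, "tail": 0.1} for _ in labels]
    rng = random.Random(0)
    num_epochs = 3
    for epoch in range(num_epochs):
        progress = epoch / (num_epochs - 1)  # linear curriculum (assumed)
        probs = keep_probabilities(labels, progress)
        loss = masked_cross_entropy(predictions, labels, probs, rng)
        print(epoch, probs, round(loss, 4))
```

Sampling the mask anew at each step, rather than discarding head-class samples once up front, means every sample can still contribute at some point in training. Whether ImparTail masks stochastically or deterministically is not stated in this summary.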