🤖 AI Summary
To address the model bias induced by the long-tailed distribution of visual relationships in Video Scene Graph Generation (VidSGG) and Scene Graph Anticipation (SGA), this paper proposes ImparTail, an unbiased and robust training framework for Spatio-Temporal Scene Graphs (STSGs). ImparTail combines curriculum learning with a loss masking mechanism to progressively suppress the dominance of head relationship classes and strengthen the modelling of tail classes. It also introduces two new tasks, Robust Spatio-Temporal Scene Graph Generation and Robust Scene Graph Anticipation, to evaluate STSG models under distribution shifts. On the Action Genome dataset, ImparTail achieves significant gains on tail relationships, outperforming state-of-the-art methods in both unbiased performance and robustness to distribution shifts.
📝 Abstract
Spatio-Temporal Scene Graphs (STSGs) provide a concise and expressive representation of dynamic scenes by modelling objects and their evolving relationships over time. However, real-world visual relationships often exhibit a long-tailed distribution, causing existing methods for tasks like Video Scene Graph Generation (VidSGG) and Scene Graph Anticipation (SGA) to produce biased scene graphs. To address this, we propose ImparTail, a novel training framework that leverages curriculum learning and loss masking to mitigate bias in the generation and anticipation of spatio-temporal scene graphs. Our approach gradually decreases the dominance of the head relationship classes during training and focuses more on tail classes, leading to more balanced training. Furthermore, we introduce two new tasks, Robust Spatio-Temporal Scene Graph Generation and Robust Scene Graph Anticipation, designed to evaluate the robustness of STSG models against distribution shifts. Extensive experiments on the Action Genome dataset demonstrate that our framework significantly enhances the unbiased performance and robustness of STSG models compared to existing methods.
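To make the core idea concrete, here is a minimal sketch of curriculum-driven loss masking for long-tailed relationship classes. Everything in it is illustrative, not the paper's actual formulation: the function name, the linear schedule, and the "mask up to half of the classes, head first" rule are assumptions chosen only to show how head-class losses could be progressively suppressed over training.

```python
# Hypothetical sketch: progressively mask head relationship classes so the
# loss increasingly focuses on tail classes. Names and the schedule are
# illustrative assumptions, not ImparTail's exact mechanism.

def curriculum_mask(class_freq, epoch, total_epochs):
    """Return per-class loss weights for the given epoch.

    Head classes (highest training frequency) are down-weighted to zero
    as the curriculum progresses; tail classes always keep weight 1.0.
    """
    # Rank classes from head (most frequent) to tail (least frequent).
    ranked = sorted(class_freq, key=class_freq.get, reverse=True)
    # Curriculum progress in [0, 1] over the training run.
    progress = epoch / max(total_epochs - 1, 1)
    # Illustrative rule: mask at most the head half of the classes.
    n_masked = int(progress * (len(ranked) // 2))
    masked = set(ranked[:n_masked])
    return {c: (0.0 if c in masked else 1.0) for c in class_freq}

# Toy relationship-class frequencies (head: "looking_at", tail: "twisting").
freq = {"looking_at": 5000, "holding": 3000, "touching": 800, "twisting": 50}

early = curriculum_mask(freq, epoch=0, total_epochs=10)  # no masking yet
late = curriculum_mask(freq, epoch=9, total_epochs=10)   # head classes masked
```

In a training loop, these weights would multiply the per-class relationship losses, so early epochs train on the full distribution while later epochs emphasise tail relationships.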