Frequency-guided Multi-level Reasoning for Scene Graph Generation in Video

📅 2026-04-19

🤖 AI Summary
This work addresses the challenge of modeling long-tailed semantic relationships in video scene graph generation by proposing the FReMuRe model. The approach employs a multi-level reasoning mechanism that incorporates relation-specific branches to alleviate gradient conflicts and introduces a frequency-aware dual-branch predicate embedding network. Furthermore, it integrates interchangeable Bayesian and Gaussian mixture model heads, combined with gated fusion and uncertainty estimation, to enhance discriminative capability for rare relationships. Experimental results on the Action Genome dataset demonstrate that FReMuRe significantly improves recall for long-tailed relations and overall inference robustness.

📝 Abstract
Video Scene Graph Generation aims to obtain structured semantic representations of objects and their relationships in videos for high-level understanding. However, existing methods remain limited in handling long-tail distributions. This paper proposes the Frequency-guided Relational Multi-level Reasoning (FReMuRe) model, which improves the modeling of long-tail relationships at the mechanism level. We introduce relation-specific branches to mitigate gradient conflicts, yielding more balanced and tail-aware learning. We also design a frequency-aware dual-branch predicate embedding network that models high-frequency and low-frequency relationships separately and improves the recall of tail classes through gated fusion. In addition, we propose two interchangeable relation classification heads: a Bayesian Head for uncertainty estimation and a new Gaussian Mixture Model Head to enhance intra-class diversity. Experimental results on the Action Genome dataset show that FReMuRe significantly improves the recall of long-tail relationships and overall reasoning robustness.
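
The abstract does not spell out how the gated fusion of the two predicate branches is computed. As a rough illustration only, the sketch below blends a high-frequency branch embedding and a low-frequency branch embedding with a per-dimension sigmoid gate. The function name `gated_fusion`, the parameters `w_gate` and `b_gate`, and the concatenation-based gate input are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(head_emb, tail_emb, w_gate, b_gate):
    """Blend two branch embeddings with a per-dimension gate (hypothetical form).

    head_emb, tail_emb: (batch, dim) embeddings from the high- and
        low-frequency branches.
    w_gate: (2 * dim, dim) gate weights; b_gate: (dim,) gate bias.
    Returns the fused embedding and the gate values, both (batch, dim).
    """
    # The gate sees both branches, so it can decide per sample and per
    # dimension which branch to trust more.
    gate_in = np.concatenate([head_emb, tail_emb], axis=-1)
    gate = sigmoid(gate_in @ w_gate + b_gate)
    # Convex combination: gate near 1 favors the high-frequency branch,
    # gate near 0 favors the low-frequency branch.
    fused = gate * head_emb + (1.0 - gate) * tail_emb
    return fused, gate
```

With zero gate weights and bias, every gate value is 0.5 and the fused embedding reduces to the plain average of the two branches, which is a convenient sanity check before training.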

Problem

Research questions and friction points this paper is trying to address.

long-tail distribution
scene graph generation
video understanding
relationship modeling
tail classes

Innovation

Methods, ideas, or system contributions that make the work stand out.

long-tail learning
frequency-aware modeling
relation-specific branches
gated fusion
Gaussian Mixture Model Head
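
To make the idea behind a Gaussian Mixture Model head concrete, here is a minimal sketch of a generic classifier in which each predicate class is represented by several diagonal-covariance Gaussian components, so that one class can cover several visually distinct modes. This is a textbook mixture formulation under assumed shapes; the function name `gmm_class_log_likelihood` and its parameterization are hypothetical and are not taken from the paper.

```python
import numpy as np


def gmm_class_log_likelihood(x, means, log_vars, log_weights):
    """Per-class log-likelihood under a diagonal Gaussian mixture.

    x: (batch, dim) relation features.
    means, log_vars: (classes, components, dim) component parameters.
    log_weights: (classes, components) log mixing weights, assumed to be
        normalized within each class.
    Returns (batch, classes) log-likelihoods usable as class scores.
    """
    dim = x.shape[-1]
    diff = x[:, None, None, :] - means[None]            # (B, C, K, D)
    inv_var = np.exp(-log_vars)[None]                    # (1, C, K, D)
    # Log normalizer of each diagonal Gaussian component.
    log_norm = -0.5 * (dim * np.log(2.0 * np.pi) + log_vars.sum(axis=-1))
    log_comp = log_norm[None] - 0.5 * (diff ** 2 * inv_var).sum(axis=-1)
    joint = log_comp + log_weights[None]                 # (B, C, K)
    # Numerically stable log-sum-exp over the components of each class.
    m = joint.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(joint - m).sum(axis=-1, keepdims=True))).squeeze(-1)
```

Taking the argmax over the class axis of the returned array gives a predicted predicate. In a trained model the component parameters would be learned, and the scores would typically be combined with class priors or passed through a softmax.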