Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Action recognition models often suffer from “static bias,” wherein they over-rely on static scene cues (e.g., background, objects), leading to poor generalization—especially in zero-shot settings. To address this, we propose a dual-stream disentanglement framework that explicitly separates static (biased) and dynamic (unbiased) representations. We enforce statistical independence between the two streams via an independence loss and further constrain the static stream to encode only scene information using a scene prediction loss, thereby suppressing its interference with action classification. The method requires no additional annotations and is plug-and-play compatible with mainstream architectures. Experiments across multiple benchmarks demonstrate substantial mitigation of static bias: on zero-shot action recognition, our approach achieves an average accuracy improvement of 8.2%. Moreover, it enhances robustness in real-world scenarios and improves model interpretability.

Technology Category

Application Category

📝 Abstract
Action recognition models rely excessively on static cues rather than dynamic human motion, which is known as static bias. This bias leads to poor performance in real-world applications and zero-shot action recognition. In this paper, we propose a method to reduce static bias by separating temporal dynamic information from static scene information. Our approach uses a statistical independence loss between biased and unbiased streams, combined with a scene prediction loss. Our experiments demonstrate that this method effectively reduces static bias and confirm the importance of scene prediction loss.
Problem

Research questions and friction points this paper is trying to address.

Reducing static bias in action recognition models
Separating temporal dynamic from static scene information
Improving performance in real-world and zero-shot scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Separates temporal dynamic from static scene information
Uses statistical independence loss between biased streams
Combines with scene prediction loss to reduce bias
🔎 Similar Papers
No similar papers found.
M
Masato Kobayashi
Nagoya Institute of Technology
N
Ning Ding
Nagoya Institute of Technology
Toru Tamaki
Toru Tamaki
Nagoya Institute of Technology
Computer visionPattern recognitionDeep learning