Can Masking Background and Object Reduce Static Bias for Zero-Shot Action Recognition?

📅 2025-01-22
🏛️ Conference on Multimedia Modeling
📈 Citations: 0
Influential: 0
🤖 AI Summary
In zero-shot action recognition, models are prone to static biases, such as background and object cues, that distort action semantic modeling. To address this, we propose a dual-mask intervention mechanism. For the first time, learnable foreground/background mask modules explicitly decouple scene-redundant information from action semantics within a CLIP-based transfer framework, enabling unbiased vision–language alignment. We also introduce contrastive action semantic distillation to make action representations more discriminative. On UCF101 and HMDB51, our method improves zero-shot accuracy by 5.2% and 4.8%, respectively, while significantly reducing background confusion. This work establishes an interpretable and scalable paradigm for mitigating static biases in action recognition, improving both the robustness and the generalizability of zero-shot models.
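
To make the two ideas in the summary concrete, here is a minimal sketch of masking static regions of a frame and then classifying by similarity to class text embeddings. It is not the paper's implementation. The function names (`apply_mask`, `cosine`, `zero_shot_predict`), the fixed binary mask, the toy pixel values, and the two-dimensional embeddings are all assumptions for illustration. In the described method the masks are learnable and the embeddings would come from CLIP-style encoders.

```python
import math

def apply_mask(frame, mask, fill=0.0):
    """Replace pixels where mask is 1 with `fill`; keep all other pixels."""
    return [[fill if m else p for p, m in zip(prow, mrow)]
            for prow, mrow in zip(frame, mask)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def zero_shot_predict(video_emb, class_embs):
    """Return the class whose text embedding is most similar to the video embedding."""
    return max(class_embs, key=lambda name: cosine(video_emb, class_embs[name]))

# Toy 2x3 frame; a 1 in the mask marks a background pixel (hypothetical values).
frame = [[0.9, 0.2, 0.8],
         [0.7, 0.5, 0.6]]
background_mask = [[1, 0, 1],
                   [1, 0, 1]]
masked = apply_mask(frame, background_mask)

# Hypothetical embeddings standing in for video and text encoder outputs.
class_embs = {"swimming": [1.0, 0.0], "running": [0.0, 1.0]}
video_emb = [0.2, 0.9]
prediction = zero_shot_predict(video_emb, class_embs)
print(masked, prediction)
```

The same `apply_mask` call with an object mask instead of a background mask would hide object cues. Which regions to hide, and how the masks are learned, is left to the paper itself.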

Problem

Research questions and friction points this paper is trying to address.

Zero-shot Action Recognition
Static Bias
Accuracy Issues

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot Action Recognition
Bias Mitigation
Background and Object Occlusion