Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition

📅 2025-12-17
🤖 AI Summary
Action recognition models often exhibit background bias: they over-rely on contextual cues instead of human movement and pose, which compromises generalization and robustness. This work presents a systematic analysis of background bias across three model families: standard classification models, contrastive text-image pretrained models (e.g., CLIP), and video large language models (VLLMs). It finds that all three tend to default to background reasoning. Two mitigation strategies are studied: (1) for classification models, incorporating segmented human input to reduce the influence of the background, and (2) for VLLMs, manual and automated prompt tuning to steer predictions toward human-focused reasoning. Segmented human input decreases background bias by 3.78% for classification models, and prompt design shifts VLLM predictions toward human-focused reasoning by 9.85%.

📝 Abstract
Human action recognition models often rely on background cues rather than human movement and pose to make predictions, a behavior known as background bias. We present a systematic analysis of background bias across classification models, contrastive text-image pretrained models, and Video Large Language Models (VLLMs), and find that all exhibit a strong tendency to default to background reasoning. Next, we propose mitigation strategies for classification models and show that incorporating segmented human input effectively decreases background bias by 3.78%. Finally, we explore manual and automated prompt tuning for VLLMs, demonstrating that prompt design can steer predictions towards human-focused reasoning by 9.85%.
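The abstract does not say how the segmented human input is produced or fed to the classifier. As a rough illustration only, the sketch below assumes the simplest variant: a per-frame binary person mask (from any segmentation model) is used to blank out background pixels before classification. The names `mask_background` and `mask_video` are hypothetical, and plain Python lists stand in for image arrays to keep the sketch dependency-free.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # one RGB pixel
Frame = List[List[Pixel]]         # rows of pixels
Mask = List[List[bool]]           # True where a person was segmented


def mask_background(frame: Frame, person_mask: Mask,
                    fill: Pixel = (0, 0, 0)) -> Frame:
    """Keep pixels where person_mask is True; replace the rest with `fill`."""
    same_height = len(frame) == len(person_mask)
    same_width = all(len(r) == len(m) for r, m in zip(frame, person_mask))
    if not (same_height and same_width):
        raise ValueError("frame and mask must have the same height and width")
    return [
        [pixel if keep else fill for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, person_mask)
    ]


def mask_video(frames: List[Frame], masks: List[Mask],
               fill: Pixel = (0, 0, 0)) -> List[Frame]:
    """Apply mask_background to every frame of a clip (one mask per frame)."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    return [mask_background(f, m, fill) for f, m in zip(frames, masks)]


if __name__ == "__main__":
    frame = [[(10, 20, 30), (40, 50, 60)],
             [(70, 80, 90), (100, 110, 120)]]
    mask = [[True, False],
            [False, True]]
    print(mask_background(frame, mask))
```

Whether the masked clip replaces the original input or is supplied alongside it is a design choice this summary does not specify; the sketch only covers the masking step.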
Problem

Research questions and friction points this paper is trying to address.

Analyzes background bias in action recognition models
Proposes mitigation strategies to reduce bias
Explores prompt tuning for human-focused reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporating segmented human input reduces bias
Manual prompt tuning steers models to human reasoning
Automated prompt design improves action recognition focus
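This page does not define how background bias is measured. One plausible way to quantify it, offered here purely as an assumption and not as the paper's metric, is to take clips whose human action and background suggest different labels and count how often the model's prediction follows the background. The `ConflictClip` type and `background_bias_rate` function below are hypothetical names for that idea.

```python
from typing import Iterable, NamedTuple


class ConflictClip(NamedTuple):
    """A clip whose human action and background point to different labels."""
    predicted: str          # label output by the model
    human_label: str        # label implied by the person's movement and pose
    background_label: str   # label implied by the scene alone


def background_bias_rate(clips: Iterable[ConflictClip]) -> float:
    """Fraction of conflicting-cue clips where the prediction follows the background."""
    clips = list(clips)
    if not clips:
        raise ValueError("need at least one clip")
    for clip in clips:
        if clip.human_label == clip.background_label:
            raise ValueError("cues must conflict for the rate to be meaningful")
    follows_background = sum(c.predicted == c.background_label for c in clips)
    return follows_background / len(clips)


if __name__ == "__main__":
    clips = [
        ConflictClip("swimming", "dancing", "swimming"),
        ConflictClip("dancing", "dancing", "swimming"),
        ConflictClip("golf", "running", "golf"),
        ConflictClip("running", "running", "golf"),
    ]
    print(background_bias_rate(clips))
```

Under this assumed definition, a mitigation such as segmented input or a revised prompt would be judged by how much it lowers the rate on the same set of clips.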
Ellie Zhou
Westmont High School
Jihoon Chung
Princeton University
Olga Russakovsky
Associate Professor, Princeton University
Computer vision