OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions

📅 2024-11-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Commonly used action recognition datasets contain few occluded samples, which limits model robustness under real-world occlusion. To address this, the authors introduce OccludeNet, a large-scale occluded video dataset that combines real-world and synthetic occlusions and covers dynamic tracking, static scene, and multi-view interactive occlusion. They also propose Causal Action Recognition (CAR), a framework built on a structural causal model of occluded scenes. CAR applies backdoor adjustment and counterfactual reasoning to enhance key actor information and improve robustness to occlusion. The analysis shows that occlusion affects action classes unevenly: actions with low scene relevance and partial body visibility suffer the largest accuracy degradation. The work positions OccludeNet as a benchmark for occlusion-robust action recognition and encourages further study of causal relations in occluded scenes.

📝 Abstract
The lack of occlusion data in commonly used action recognition video datasets limits model robustness and impedes sustained performance improvements. We construct OccludeNet, a large-scale occluded video dataset that includes both real-world and synthetic occlusion scene videos under various natural environments. OccludeNet features dynamic tracking occlusion, static scene occlusion, and multi-view interactive occlusion, addressing existing gaps in data. Our analysis reveals that occlusion impacts action classes differently, with actions involving low scene relevance and partial body visibility experiencing greater accuracy degradation. To overcome the limitations of current occlusion-focused approaches, we propose a structural causal model for occluded scenes and introduce the Causal Action Recognition (CAR) framework, which employs backdoor adjustment and counterfactual reasoning. This framework enhances key actor information, improving model robustness to occlusion. We anticipate that the challenges posed by OccludeNet will stimulate further exploration of causal relations in occlusion scenarios and encourage a reevaluation of class correlations, ultimately promoting sustainable performance improvements. The code and full dataset will be released soon.
Problem

Research questions and friction points this paper is trying to address.

Lack of occlusion data limits action recognition robustness
Existing datasets lack dynamic and multi-view occlusions
Current methods fail to address causal links in occlusions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale occluded video dataset with real and synthetic scenes
Structural causal model of occluded scenes, underpinning the Causal Action Recognition (CAR) framework
Backdoor adjustment and counterfactual reasoning enhance robustness
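
The backdoor adjustment named above can be illustrated with a toy example. This is a minimal sketch of the general technique, P(Y | do(X)) = Σ_z P(Y | X, z) P(z), and not the CAR implementation. The variable roles (Z as a scene-type confounder, X as whether the actor is occluded, Y as correct recognition) and all probabilities are hypothetical values chosen only for illustration.

```python
# Toy backdoor adjustment on a binary confounder Z.
# All names and numbers are illustrative assumptions, not from the paper.

def backdoor_adjust(p_z, p_y_given_xz, x):
    """Return P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())


def naive_conditional(p_z, p_x1_given_z, p_y_given_xz, x):
    """Return the observational P(Y=1 | X=x), which is biased by Z."""
    def p_x_given_z(z):
        # P(X=x | Z=z) for binary X, given P(X=1 | Z=z)
        return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]

    joint = sum(p_y_given_xz[(x, z)] * p_x_given_z(z) * pz
                for z, pz in p_z.items())
    marginal = sum(p_x_given_z(z) * pz for z, pz in p_z.items())
    return joint / marginal


# Hypothetical distributions: Z = scene type, X = occluded (1) or not (0).
p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
p_y_given_xz = {(0, 0): 0.9, (0, 1): 0.7,   # P(Y=1 | X=x, Z=z)
                (1, 0): 0.6, (1, 1): 0.4}

for x in (0, 1):
    adjusted = backdoor_adjust(p_z, p_y_given_xz, x)
    observed = naive_conditional(p_z, p_x1_given_z, p_y_given_xz, x)
    print(f"x={x}: do-adjusted={adjusted:.2f}, observational={observed:.2f}")
```

In this toy setting the observational estimate overstates the accuracy gap between occluded and unoccluded clips, because the scene type influences both how often occlusion occurs and how easy recognition is. Averaging over P(Z) rather than P(Z | X) removes that confounding path.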
Authors

Guanyu Zhou
Wuhan University of Technology
Artificial Intelligence, Machine Learning, Deep Learning

Wenxuan Liu
Peking University, Wuhan University of Technology

Wenxin Huang
Hubei University

Xuemei Jia
Wuhan University
Trustworthy AI, Adversarial Attack, Video and image representation

Xian Zhong
Wuhan University of Technology

Chia-Wen Lin
National Tsing Hua University