Dark Transformer: A Video Transformer for Action Recognition in the Dark

📅 2024-06-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of end-to-end spatiotemporal representation learning for action recognition in low-light videos, this paper introduces the first video Transformer architecture explicitly designed for dark-scene understanding, jointly modeling low-light enhancement and spatiotemporal action representation learning. Our key contributions are: (1) a cross-domain spatiotemporal self-attention mechanism that simultaneously optimizes low-illumination video reconstruction and action discrimination; and (2) an end-to-end joint training paradigm that eliminates error propagation inherent in multi-stage pipelines. Extensive experiments on three benchmark low-light action datasets—InFAR, XD145, and ARID—demonstrate consistent state-of-the-art performance, significantly improving robustness and accuracy for real-world applications such as nighttime surveillance and autonomous driving.
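The summary describes an end-to-end paradigm that jointly optimizes low-illumination reconstruction and action discrimination instead of chaining separate enhancement and recognition stages. The paper does not release code; the sketch below is only an illustration of such a single-stage objective, where the function name, the MSE reconstruction term, and the weighting factor `lam` are all assumptions, not the authors' actual formulation.

```python
import numpy as np

def joint_loss(enhanced, clean_ref, logits, label, lam=0.5):
    """Illustrative single-stage objective: enhancement + recognition.

    enhanced  : model's reconstructed (enhanced) frame, flattened
    clean_ref : well-lit reference frame, flattened
    logits    : action-class scores from the recognition head
    label     : ground-truth action index
    lam       : assumed trade-off weight between the two terms
    """
    # Reconstruction term: pixel-wise MSE against the well-lit reference.
    recon = np.mean((enhanced - clean_ref) ** 2)
    # Action term: cross-entropy over the class logits (stable softmax).
    p = np.exp(logits - logits.max())
    p /= p.sum()
    ce = -np.log(p[label])
    # Both terms share one backward pass, so enhancement errors cannot
    # propagate unchecked into a frozen downstream classifier.
    return recon + lam * ce

# Perfect reconstruction and uniform logits over 3 classes:
# loss reduces to lam * log(3).
loss = joint_loss(np.zeros(4), np.zeros(4), np.zeros(3), 0, lam=1.0)
print(round(loss, 4))  # → 1.0986
```

Because the two losses are summed before backpropagation, gradients from the action head also shape the enhancement features, which is what removes the stage-to-stage error propagation of multi-stage pipelines.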

📝 Abstract
Recognizing human actions in adverse lighting conditions presents significant challenges in computer vision, with wide-ranging applications in visual surveillance and nighttime driving. Existing methods tackle action recognition and dark enhancement separately, limiting the potential for end-to-end learning of spatiotemporal representations for video action classification. This paper introduces Dark Transformer, a novel video transformer-based approach for action recognition in low-light environments. Dark Transformer leverages spatiotemporal self-attention mechanisms in cross-domain settings to enhance cross-domain action recognition. By extending video transformers to learn cross-domain knowledge, Dark Transformer achieves state-of-the-art performance on benchmark action recognition datasets, including InFAR, XD145, and ARID. The proposed approach demonstrates significant promise in addressing the challenges of action recognition in adverse lighting conditions, offering practical implications for real-world applications.
Problem

Research questions and friction points this paper is trying to address.

Recognizing human actions in low-light conditions using video transformers
Enhancing cross-domain action recognition through spatiotemporal self-attention mechanisms
Developing end-to-end learning for video action classification in dark environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Video transformer for action recognition in the dark
Cross-domain spatiotemporal self-attention mechanisms
End-to-end learning for low-light video classification
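The core mechanism listed above is joint space-time self-attention, in which every patch token attends to all tokens across both frames and spatial positions. The paper's exact architecture is not public, so the following is a minimal numpy sketch under assumed dimensions; the variable names, the single-head formulation, and the flattening of space-time tokens are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatiotemporal_attention(tokens, Wq, Wk, Wv):
    """Joint space-time self-attention (single head, illustrative).

    tokens : (T*P, D) array of patch embeddings flattened over
             T frames and P spatial patches, so attention spans
             both time and space in one operation.
    """
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    d = q.shape[-1]
    # Each of the T*P tokens attends to every other space-time token.
    attn = softmax(q @ k.T / np.sqrt(d))
    return attn @ v

# Tiny example: 2 frames x 4 patches, embedding dim 8 (assumed sizes).
rng = np.random.default_rng(0)
T, P, D = 2, 4, 8
tokens = rng.normal(size=(T * P, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
out = spatiotemporal_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # → (8, 8)
```

In a cross-domain setting, the same attention block would process tokens from both dark and enhanced streams so that representations are shared across domains; how the paper fuses the two streams is not specified here, so that step is omitted.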
Anwaar Ulhaq
Central Queensland University, School of Engineering and Technology, Sydney campus, Australia