MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection

📅 2025-11-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Temporal action detection (TAD) of long-duration actions in untrimmed videos faces three challenges: decay of temporal context, self-element conflict during global context modeling, and insufficient global perception. To address these, the paper proposes MambaTAD, an end-to-end, one-stage TAD framework with three components: (1) a Diagonal-Masked Bidirectional State-Space (DMBSS) module that mitigates temporal decay and self-element conflict in recurrent modeling; (2) a State-Space Temporal Adapter (SSTA) that strengthens long-range temporal dependency modeling; and (3) a multi-granularity global feature fusion detection head that improves global contextual awareness and boundary localization accuracy. Experiments show state-of-the-art performance on standard benchmarks, including THUMOS14 and ActivityNet v1.3, while reducing computational cost and model parameters.

📝 Abstract
Temporal Action Detection (TAD) aims to identify and localize actions by determining their starting and ending frames within untrimmed videos. Recent structured state-space models such as Mamba have demonstrated potential in TAD due to their long-range modeling capability and linear computational complexity. However, structured state-space models face two key challenges in TAD: decay of temporal context due to recursive processing, and self-element conflict during global visual context modeling. Both become more severe when handling long-span action instances. Additionally, traditional TAD methods struggle to detect long-span action instances because they lack global awareness and rely on inefficient detection heads. This paper presents MambaTAD, a new state-space TAD model that introduces long-range modeling and global feature detection capabilities for accurate temporal action detection. MambaTAD comprises two novel, complementary designs. First, it introduces a Diagonal-Masked Bidirectional State-Space (DMBSS) module that facilitates global feature fusion and temporal action detection. Second, it introduces a global feature fusion head that refines the detection progressively with multi-granularity features and global awareness. In addition, MambaTAD tackles TAD in an end-to-end, one-stage manner using a new state-space temporal adapter (SSTA), which reduces network parameters and computation cost with linear complexity. Extensive experiments show that MambaTAD achieves superior TAD performance consistently across multiple public benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Detecting long-span actions in videos with global awareness
Overcoming temporal context decay in state-space models
Improving efficiency of temporal action detection heads
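
The temporal context decay listed above can be illustrated with a toy recurrence. This is a minimal sketch, assuming a scalar, time-invariant state update h_t = a * h_{t-1} + b * x_t with a hypothetical decay factor `a = 0.9`. The function name `scan` and all values are illustrative and are not taken from the paper; real Mamba layers use input-dependent, multi-channel parameters.

```python
def scan(inputs, a=0.9, b=1.0):
    """Toy scalar recurrence h_t = a * h_{t-1} + b * x_t, starting from h = 0.

    Assumption: a and b are fixed constants chosen for illustration only.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = a * h + b * x
        states.append(h)
    return states


# Feed an impulse at t = 0 followed by zeros. The state at step t then
# equals a**t, so the first frame's contribution shrinks geometrically
# as the sequence gets longer.
length = 50
impulse = [1.0] + [0.0] * (length - 1)
states = scan(impulse, a=0.9)

for t in (0, 10, 49):
    print(f"t={t:2d}  remaining contribution of frame 0: {states[t]:.5f}")
```

With any decay factor below 1, information from early frames fades before the end of a long action is reached, which is one way to read why the problem is described as worse for long-span instances.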
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diagonal-Masked Bidirectional State-Space module for global fusion
Global feature fusion head with multi-granularity refinement
End-to-end state-space temporal adapter with linear complexity
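
The summary does not spell out how the diagonal mask works, so the following is only one plausible reading, sketched under stated assumptions. It assumes a forward and a backward scan with the same fixed scalar decay `a`, written as explicit token-mixing matrices. When the two are summed, each element's own contribution (the matrix diagonal) is counted twice; masking the diagonal of one direction removes the duplicate. The function names and the decay value are hypothetical, and this is not the paper's DMBSS implementation.

```python
def forward_matrix(n, a):
    # Causal scan as a lower-triangular matrix: M[t][s] = a**(t - s) for s <= t.
    return [[a ** (t - s) if s <= t else 0.0 for s in range(n)] for t in range(n)]


def backward_matrix(n, a):
    # Anti-causal scan as an upper-triangular matrix: M[t][s] = a**(s - t) for s >= t.
    return [[a ** (s - t) if s >= t else 0.0 for s in range(n)] for t in range(n)]


def bidirectional_matrix(n, a, mask_diagonal):
    fwd = forward_matrix(n, a)
    bwd = backward_matrix(n, a)
    mixed = [[fwd[t][s] + bwd[t][s] for s in range(n)] for t in range(n)]
    if mask_diagonal:
        # Assumed masking rule: drop the backward pass's self term so each
        # element contributes to its own output exactly once.
        for t in range(n):
            mixed[t][t] -= bwd[t][t]
    return mixed


def mix(matrix, x):
    # Apply the token-mixing matrix to a 1-D feature sequence.
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


x = [1.0, 2.0, 3.0, 4.0]
naive = bidirectional_matrix(len(x), 0.5, mask_diagonal=False)
masked = bidirectional_matrix(len(x), 0.5, mask_diagonal=True)
y_naive = mix(naive, x)
y_masked = mix(masked, x)

print("diagonal without mask:", [naive[t][t] for t in range(len(x))])
print("diagonal with mask:   ", [masked[t][t] for t in range(len(x))])
```

In this toy setting the masked matrix has entries a**|t - s|, so every position sees the whole sequence in both directions without its own feature being over-weighted. A practical model would compute the scans in linear time rather than building the full matrix.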
Authors

Hui Lu
Department of Computer Science and Engineering (CSE), the University of Texas at Arlington (UTA)
Cloud Computing, Virtualization, File and Storage Systems, Computer Networks, Computer Systems
Yi Yu
Rapid-Rich Object Search Lab, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore
Shijian Lu
College of Computing and Data Science, NTU
Image and video analytics, computer vision, machine learning
Deepu Rajan
Nanyang Technological University
Image Processing, Computer Vision
Boon Poh Ng
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Alex C. Kot
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Xudong Jiang
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore