Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence

📅 2024-12-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the increased intra-class variance caused by long video sub-sequences and the high computational cost of mainstream Transformer-based approaches in few-shot action recognition (FSAR), this paper proposes Manta, a framework built around Matryoshka Mamba, a dual-module state-space architecture comprising inner and outer modules. The inner modules enhance fine-grained feature representation via multi-granularity local modeling, while the outer module enables implicit temporal alignment. In addition, a hybrid supervised-unsupervised contrastive learning paradigm (SupCon + SimCLR) explicitly suppresses intra-class variance accumulation. Evaluated on SSv2, Kinetics, UCF101, and HMDB51, Manta achieves new state-of-the-art performance in FSAR. Notably, it delivers significant improvements in accuracy and robustness under long sub-sequence settings, marking the first successful adaptation of the Mamba architecture to efficient, high-accuracy few-shot action understanding.
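As a rough illustration of what such a hybrid contrastive objective could look like, here is a minimal PyTorch sketch that mixes a supervised (SupCon-style) term over same-class embeddings with an unsupervised (SimCLR-style) term over two augmented views. The function names, temperature `tau`, and balance weight `alpha` are hypothetical placeholders; the paper's exact formulation is not given on this page.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive term: same-class embeddings are positives."""
    z = F.normalize(z, dim=1)                        # (N, D) unit vectors
    sim = z @ z.t() / tau                            # pairwise cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)  # avoid -inf * 0 = nan
    # mean log-probability of each anchor's positives
    return -((log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)).mean()

def simclr_loss(z1, z2, tau=0.1):
    """Unsupervised term: two augmented views of each clip are positives."""
    n = len(z1)
    z = F.normalize(torch.cat([z1, z2]), dim=1)      # (2N, D)
    sim = z @ z.t() / tau
    sim = sim.masked_fill(
        torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    # view i's positive sits at index i + n (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def hybrid_loss(z1, z2, labels, alpha=0.5):
    """Weighted mix of both terms; alpha is a hypothetical hyperparameter."""
    return (alpha * supcon_loss(torch.cat([z1, z2]), labels.repeat(2))
            + (1 - alpha) * simclr_loss(z1, z2))
```

The supervised term pulls together all sub-sequences sharing a class label, directly countering intra-class variance accumulation, while the unsupervised term only requires agreement between augmented views of the same clip.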

📝 Abstract
In few-shot action recognition (FSAR), long sub-sequences of video naturally express entire actions more effectively. However, the high computational complexity of mainstream Transformer-based methods limits their application. The recent Mamba architecture demonstrates efficiency in modeling long sequences, but directly applying Mamba to FSAR overlooks the importance of local feature modeling and alignment. Moreover, long sub-sequences within the same class accumulate intra-class variance, which adversely impacts FSAR performance. To address these challenges, we propose a Matryoshka MAmba and CoNtrasTive LeArning framework (Manta). First, the Matryoshka Mamba introduces multiple Inner Modules to enhance local feature representation, rather than directly modeling global features, and an Outer Module captures temporal dependencies among these local features for implicit alignment. Second, a hybrid contrastive learning paradigm, combining supervised and unsupervised methods, is designed to mitigate the negative effects of intra-class variance accumulation. The Matryoshka Mamba and the hybrid contrastive learning paradigm operate in two parallel branches within Manta, enhancing Mamba for FSAR of long sub-sequences. Manta achieves new state-of-the-art performance on prominent benchmarks, including SSv2, Kinetics, UCF101, and HMDB51. Extensive empirical studies show that Manta significantly improves FSAR of long sub-sequences from multiple perspectives.
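The abstract does not spell out the module internals, but the nesting idea can be sketched. Below is a hypothetical PyTorch illustration in which a generic recurrent block stands in for a selective state-space (Mamba) layer: a shared inner block models short local windows of the long sub-sequence, and an outer block then models the timeline across window summaries. All names here (`SSMBlock`, `MatryoshkaSketch`, `window`) are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Stand-in for a selective state-space (Mamba) layer; a GRU keeps
    this sketch self-contained and dependency-free."""
    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (B, T, D)
        out, _ = self.rnn(x)
        return self.norm(x + out)              # residual connection + norm

class MatryoshkaSketch(nn.Module):
    """A shared inner block models short local windows; an outer block
    then models dependencies along the timeline between window summaries."""
    def __init__(self, dim, window=4):
        super().__init__()
        self.window = window
        self.inner = SSMBlock(dim)             # fine-grained local modeling
        self.outer = SSMBlock(dim)             # implicit temporal alignment

    def forward(self, x):                      # x: (B, T, D), T divisible by window
        b, t, d = x.shape
        w = self.window
        local = x.reshape(b * (t // w), w, d)  # split into local windows
        local = self.inner(local)
        tokens = local.mean(dim=1)             # summarize each window
        tokens = tokens.reshape(b, t // w, d)
        return self.outer(tokens)              # dependencies across the timeline
```

For example, `MatryoshkaSketch(256)(torch.randn(2, 16, 256))` returns a (2, 4, 256) tensor of temporally ordered window tokens: local structure is modeled first, then global order, rather than modeling the full sequence globally in one pass.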
Problem

Research questions and friction points this paper is trying to address.

Enhance few-shot action recognition
Reduce intra-class variance impact
Improve local feature modeling efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Matryoshka Mamba enhances local features
Hybrid contrastive learning reduces variance
Parallel branches optimize FSAR performance (episode-level evaluation sketched below)
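To make "FSAR performance" concrete, here is a generic sketch of the episode-level protocol commonly used to evaluate few-shot action recognition: query clip embeddings are classified by cosine similarity to class prototypes averaged from a few labeled support clips. This is the standard metric-based protocol, not necessarily the paper's specific matching head.

```python
import torch
import torch.nn.functional as F

def episode_accuracy(support, support_labels, query, query_labels):
    """N-way few-shot classification: assign each query embedding to the
    class prototype (mean support embedding) with highest cosine similarity."""
    classes = support_labels.unique()
    protos = torch.stack([support[support_labels == c].mean(0) for c in classes])
    sims = F.normalize(query, dim=1) @ F.normalize(protos, dim=1).t()
    preds = classes[sims.argmax(dim=1)]
    return (preds == query_labels).float().mean().item()

# Toy 5-way 1-shot episode with 128-d clip embeddings (random placeholders).
support = torch.randn(5, 128)                        # 1 clip per class
query = torch.randn(15, 128)                         # 3 queries per class
acc = episode_accuracy(support, torch.arange(5), query,
                       torch.arange(5).repeat_interleave(3))
print(f"episode accuracy: {acc:.2f}")
```

Under this protocol, lower intra-class variance directly tightens prototypes, which is why the hybrid contrastive branch is expected to help on long sub-sequences.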
👥 Authors

Wenbo Huang
Southeast University | Institute of Science Tokyo
Video Analysis, Multimedia, Ubiquitous Computing

Jinghui Zhang
Southeast University, Nanjing 211189, Jiangsu, China

Guang Li
Assistant Professor, Hokkaido University
Dataset Distillation, Self-Supervised Learning, Data-Centric AI, Medical Image Analysis

Lei Zhang
Nanjing Normal University, Nanjing 210023, Jiangsu, China

Shuoyuan Wang
Southern University of Science and Technology, Shenzhen 518055, Guangdong, China

Fang Dong
Southeast University
Edge Computing, Cloud, AIoT

Jiahui Jin
Southeast University
Cloud Computing, Big Data, Graph Database, Task Scheduling

Takahiro Ogawa
Hokkaido University
Multimedia Processing, AI, IoT, Big Data Analysis

M. Haseyama
Hokkaido University, Sapporo 060-0808, Hokkaido, Japan