CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge

📅 2025-05-26
🤖 AI Summary
Problem: Existing 3D video action recognition models carry excessive computational overhead, which hinders deployment on resource-constrained edge devices. Method: We propose an efficient edge-oriented architecture: a lightweight 3D network that integrates depthwise separable 3D convolutions with linear-complexity attention to capture long-range temporal dependencies, complemented by channel–spatiotemporal joint pruning to reduce parameters and FLOPs. We also design a customized quantization scheme that ensures training stability and inference efficiency, supporting low-bit weight/activation co-optimization and hardware-friendly integer-only inference. Results: On the Kinetics-400 and Something-Something V2 benchmarks, our method achieves comparable or superior accuracy at less than 30% of the computational cost of mainstream 3D models, enabling real-time applications such as smart-home surveillance and remote healthcare behavior analysis.
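
To give a sense of why depthwise separable 3D convolutions reduce cost, the following sketch compares weight counts for a single layer. It is a generic back-of-the-envelope calculation, not the paper's architecture: the layer sizes (64 input channels, 128 output channels, a 3 x 3 x 3 kernel) and the function names are assumptions chosen for illustration, and biases are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (no bias)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise k x k x k convolution plus a 1 x 1 x 1 pointwise one."""
    depthwise = c_in * k ** 3   # one k^3 filter per input channel
    pointwise = c_in * c_out    # 1 x 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3
standard = conv3d_params(c_in, c_out, k)
separable = separable_conv3d_params(c_in, c_out, k)
ratio = separable / standard    # equals 1 / c_out + 1 / k**3

print(standard, separable, round(ratio, 3))  # 221184 9920 0.045
```

For equal stride and padding, multiply-accumulate counts scale by the same ratio, since both variants produce the same output volume. This per-layer figure is separate from the whole-model cost reported in the summary, which also depends on the attention blocks, pruning, and quantization.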

📝 Abstract
In this paper, we introduce a deep learning solution for video activity recognition that leverages an innovative combination of convolutional layers with a linear-complexity attention mechanism. Moreover, we introduce a novel quantization mechanism to further improve the efficiency of our model during both training and inference. Our model maintains a reduced computational cost, while preserving robust learning and generalization capabilities. Our approach addresses the issues related to the high computing requirements of current models, with the goal of achieving competitive accuracy on consumer and edge devices, enabling smart home and smart healthcare applications where efficiency and privacy issues are of concern. We experimentally validate our model on different established and publicly available video activity recognition benchmarks, improving accuracy over alternative models at a competitive computing cost.
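
The abstract does not say which linear-complexity attention the model uses. As one common example, the sketch below implements non-causal kernelized attention with an elu(x) + 1 feature map, which may differ from the paper's mechanism. The function names and the toy inputs are assumptions for illustration. The point is that summing key-value products once avoids building the T x T weight matrix, so the cost grows linearly with the number of tokens T.

```python
import math


def feature_map(x):
    """elu(x) + 1: a strictly positive feature map for kernelized attention."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Non-causal kernelized attention in O(T * d * d_v) time.

    queries, keys: T vectors of length d; values: T vectors of length d_v.
    """
    d, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d)]  # sum over tokens of phi(k) v^T
    k_sum = [0.0] * d                     # sum over tokens of phi(k)
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for a in range(d):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]

    outputs = []
    for q in queries:
        fq = feature_map(q)
        norm = sum(fq[a] * k_sum[a] for a in range(d))
        outputs.append(
            [sum(fq[a] * kv[a][b] for a in range(d)) / norm for b in range(d_v)]
        )
    return outputs


def quadratic_attention(queries, keys, values):
    """Reference with the same kernel, using explicit T x T weights."""
    d_v = len(values[0])
    outputs = []
    for q in queries:
        fq = feature_map(q)
        weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in keys]
        total = sum(weights)
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total
             for b in range(d_v)]
        )
    return outputs


# Toy inputs (hypothetical): T = 3 tokens, d = 2, d_v = 3.
q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
k = [[0.4, 0.0], [-0.3, 0.9], [1.2, -0.5]]
v = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 3.0, 3.0]]
fast = linear_attention(q, k, v)
slow = quadratic_attention(q, k, v)
```

Both functions compute the same result up to floating-point rounding. Only the order of the summations differs, and that reordering is what removes the quadratic dependence on sequence length.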

Problem

Research questions and friction points this paper is trying to address.

Efficient video activity recognition on edge devices
Reducing computational cost while maintaining accuracy
Addressing privacy and efficiency in smart applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines convolutional layers with linear-complexity attention
Introduces novel quantization for training and inference efficiency
Maintains reduced computational cost with robust learning
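
The paper's quantization scheme is not described on this page. As background for the quantization item above, here is a minimal sketch of generic symmetric per-tensor 8-bit quantization with integer accumulation. It is not the authors' method. The function names and sample numbers are hypothetical, and the final rescaling uses floating point, whereas integer-only deployments typically replace it with a fixed-point multiplier.

```python
def quantize(values, num_bits=8):
    """Symmetric per-tensor quantization: return integer codes and the scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def quantized_dot(w_codes, w_scale, x_codes, x_scale):
    """Accumulate in integers, then apply the combined scale once."""
    acc = sum(w * x for w, x in zip(w_codes, x_codes))  # integer accumulator
    return acc * w_scale * x_scale


# Hypothetical weights and activations, chosen only for illustration.
weights = [0.42, -1.30, 0.07, 0.88]
activations = [1.5, 0.25, -0.75, 2.0]

w_codes, w_scale = quantize(weights)
x_codes, x_scale = quantize(activations)

exact = sum(w * x for w, x in zip(weights, activations))
approx = quantized_dot(w_codes, w_scale, x_codes, x_scale)
print(exact, approx)
```

Each reconstructed value differs from the original by at most half a quantization step, so the integer dot product stays close to the floating-point one while storing only 8-bit codes and one scale per tensor.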