SMART-vision: survey of modern action recognition techniques in vision

📅 2024-12-21

🏛️ Multimedia tools and applications

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address key challenges in human activity recognition (HAR), including opaque hybrid methodologies, poorly understood deep learning mechanisms, and weak generalization to unseen activities, we systematically review more than 120 state-of-the-art works published between 2018 and 2024. We propose the SMART taxonomy (Semantic, Multi-scale, Adaptive, Robust, and Temporal) as the first unified classification framework for standardizing evaluation and tracing how HAR techniques have evolved. Focusing on emerging paradigms such as Transformers, spatiotemporal graph neural networks, multimodal fusion, self-supervised contrastive learning, and prompt-based fine-tuning, we analyze their impact on classification accuracy, model interpretability, and lightweight deployment. We identify three persistent bottlenecks: data bias, limited temporal modeling, and insufficient cross-domain generalization. Finally, we set out design principles for scalable, benchmark-driven evaluation, offering both theoretical foundations and practical guidelines for advancing HAR research and deployment.