🤖 AI Summary
This work addresses environmental sound classification under resource-constrained conditions, investigating how dimensionality reduction and pooling strategies affect the accuracy and efficiency of lightweight CNNs. It evaluates Sparse Salient Region Pooling (SSRP) and its two variants, SSRP-Basic (SSRP-B) and SSRP-Top-K (SSRP-T), under various hyperparameter settings and compares them with PCA-based dimensionality reduction. On ESC-50, SSRP-T reaches up to 80.69% accuracy, outperforming the baseline CNN (66.75%) by 13.94 percentage points and far exceeding the PCA-reduced model (37.60%). The results indicate that a well-tuned sparse pooling strategy is an efficient and practical option for environmental sound recognition where accuracy must be balanced against computational cost, for example on edge devices.
📝 Abstract
This paper explores the impact of dimensionality reduction and pooling methods on Environmental Sound Classification (ESC) using lightweight CNNs. We evaluate Sparse Salient Region Pooling (SSRP) and its variants, SSRP-Basic (SSRP-B) and SSRP-Top-K (SSRP-T), under various hyperparameter settings and compare them with Principal Component Analysis (PCA). Experiments on the ESC-50 dataset demonstrate that SSRP-T achieves up to 80.69% accuracy, significantly outperforming both the baseline CNN (66.75%) and the PCA-reduced model (37.60%). Our findings confirm that a well-tuned sparse pooling strategy provides a robust, efficient, and high-performing solution for ESC tasks, particularly in resource-constrained scenarios where balancing accuracy and computational cost is crucial.
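The abstract does not spell out how SSRP-B and SSRP-T pool a feature map, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a single-channel feature map laid out as frequency rows by time columns, that a "salient region" is a fixed-length sliding window along time scored by its mean activation, that SSRP-T averages the `k` highest-scoring windows per frequency row, and that SSRP-B is the special case `k = 1`. The function names `ssrp_t` and `ssrp_b` and the parameters `window` and `k` are hypothetical.

```python
from statistics import mean


def ssrp_t(feature_map, window, k):
    """Pool a (frequency x time) feature map to one scalar.

    Assumed behaviour (not taken from the paper): for each frequency row,
    average every sliding window of length `window` along time, keep the
    `k` largest window means, average those, then average over rows.
    """
    if window < 1 or k < 1:
        raise ValueError("window and k must be positive")
    pooled_rows = []
    for row in feature_map:
        n_windows = len(row) - window + 1
        if n_windows < 1:
            raise ValueError("window is longer than the time axis")
        # Mean activation of each sliding window along the time axis.
        window_means = [mean(row[t:t + window]) for t in range(n_windows)]
        # Keep only the k most salient windows; if k exceeds the number
        # of windows, every window is kept.
        top_k = sorted(window_means, reverse=True)[:k]
        pooled_rows.append(mean(top_k))
    # Collapse the frequency axis with a plain average.
    return mean(pooled_rows)


def ssrp_b(feature_map, window):
    """Assumed basic variant: keep only the single most salient window."""
    return ssrp_t(feature_map, window, k=1)


# Toy map: row 0 has a short burst of energy, row 1 is flat.
toy_map = [
    [0.0, 0.0, 4.0, 4.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
]
basic_value = ssrp_b(toy_map, window=2)
top_k_value = ssrp_t(toy_map, window=2, k=2)
```

In this sketch `window` and `k` are the kind of hyperparameters the abstract refers to when it mentions "various hyperparameter settings": a small `k` focuses on the strongest burst of activity, while a large `k` approaches ordinary average pooling over all windows.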