🤖 AI Summary
Systematic comparative evaluation of STFT spectrograms versus wavelet scalograms as CNN inputs for acoustic recognition remains lacking. Method: This work conducts the first fair, comprehensive performance attribution analysis under a unified CNN architecture, rigorously assessing noise robustness, time-frequency resolution, and task-specific adaptability. Experiments employ a multi-SNR acoustic dataset with standardized preprocessing (Mel filtering) and feature extraction (STFT and continuous wavelet transform, CWT). Contribution/Results: Spectrograms achieve 2.1% higher accuracy on stationary speech recognition, whereas scalograms yield a 5.7% improvement in F1-score for transient/non-stationary sound classification (e.g., knocks, alarms). The study elucidates fundamental representational differences between these time-frequency representations and provides an empirically grounded, scenario-aware decision-making framework for selecting optimal features in acoustic recognition tasks.
📝 Abstract
Acoustic recognition has emerged as a prominent task in deep learning research, frequently relying on spectral feature extraction techniques such as the spectrogram from the Short-Time Fourier Transform (STFT) and the scalogram from the Wavelet Transform. However, few studies comprehensively compare the advantages, drawbacks, and performance of these two representations. This paper evaluates the characteristics of both transforms as input data for acoustic recognition with Convolutional Neural Networks, documenting the performance of models trained on each for comparison. Through this analysis, the paper elucidates the advantages and limitations of each method, offers insight into their respective application scenarios, and identifies directions for further research.
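To make the two representations concrete, here is a minimal NumPy-only sketch (not the paper's exact pipeline) of an STFT magnitude spectrogram and a continuous-wavelet scalogram. The Hann window, FFT length, hop size, Ricker (Mexican hat) wavelet, and scale range are all illustrative assumptions; the paper itself uses Mel filtering and its own CWT settings.

```python
import numpy as np

def stft_spectrogram(x, n_fft=256, hop=128):
    """Magnitude spectrogram: Hann-windowed frames -> |rFFT| per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Rows = frequency bins, columns = time frames.
    return np.abs(np.fft.rfft(frames, axis=1)).T

def ricker(points, a):
    """Ricker (Mexican hat) wavelet of width `a`, sampled at `points` samples."""
    t = np.arange(points) - (points - 1) / 2
    amp = 2 / (np.sqrt(3 * a) * np.pi ** 0.25)
    return amp * (1 - (t / a) ** 2) * np.exp(-(t ** 2) / (2 * a ** 2))

def cwt_scalogram(x, scales):
    """Scalogram: |CWT| of x via direct convolution, one row per scale."""
    return np.abs(np.stack([
        np.convolve(x, ricker(min(10 * s, len(x)), s), mode="same")
        for s in scales
    ]))

# Toy signal matching the paper's scenarios: a stationary tone
# (favoring the spectrogram) plus a brief transient "knock"
# (the non-stationary case where the scalogram excels).
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
x[4000:4040] += 2.0  # short broadband transient

spec = stft_spectrogram(x)               # shape: (129, 61)
scal = cwt_scalogram(x, range(1, 31))    # shape: (30, 8000)
```

Either array can then be treated as a single-channel image and fed to a CNN; the spectrogram has fixed time-frequency resolution set by `n_fft`, while the scalogram trades time for frequency resolution across scales, which is the representational difference the paper attributes performance to.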