Audio Signal Processing Using Time Domain Mel-Frequency Wavelet Coefficient

📅 2025-10-28

🤖 AI Summary

Conventional MFCCs lack time-frequency localization capability, while traditional wavelet transforms suffer from mismatched frequency resolution relative to human auditory perception and high computational complexity. To address these limitations, this paper proposes a time-domain Mel-frequency wavelet coefficient (TMFWC) feature extraction method. TMFWC directly embeds the auditory-perceptual characteristics of the Mel filterbank into time-domain wavelet design, eliminating the need for frequency-domain transformations and redundant filtering, thereby enabling end-to-end, efficient time-frequency localization. Furthermore, integration with reservoir computing significantly reduces feature extraction overhead. Experimental results demonstrate that TMFWC preserves auditory consistency while enhancing time-frequency resolution for non-stationary speech signals. Compared to existing Mel-scale wavelet approaches, TMFWC reduces computational complexity by approximately 30%–50%, establishing a lightweight, high-fidelity paradigm for real-time audio processing.
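
The Mel filterbank mentioned above places its filters uniformly on the Mel scale. A widely used (HTK-style) form of the mapping is shown below; whether TMFWC uses this exact variant is an assumption, since the summary does not say. With $K$ filters between $f_{\min}$ and $f_{\max}$, the band edges and centres are found by stepping uniformly in Mel and mapping back to hertz, so $f_0 = f_{\min}$ and $f_{K+1} = f_{\max}$ are the outer edges and $f_1, \dots, f_K$ are the filter centres:

```latex
m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right),
\qquad
f(m) = 700\left(10^{m/2595} - 1\right)

f_k = f\!\left( m(f_{\min}) + \frac{k}{K+1}\,\bigl[m(f_{\max}) - m(f_{\min})\bigr] \right),
\qquad k = 0, 1, \dots, K+1
```

Because $f(m)$ grows exponentially in $m$, equal Mel steps are narrow at low frequencies and wide at high ones. This is the finer low-frequency resolution that the abstract says a typical wavelet transform lacks.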

📝 Abstract

Feature extraction is the most critical step in speech signal processing. Mel Frequency Cepstral Coefficients (MFCC) are the most widely used features in speaker and speech recognition applications, because the filtering they apply resembles the filtering that takes place in the human ear. Their main drawback, however, is that they describe only the frequency content of the signal, not when each frequency occurs. The wavelet transform, with its flexible time-frequency window, provides both time and frequency information and is an appropriate tool for analysing non-stationary signals such as speech. On the other hand, because of its uniform frequency scaling, a typical wavelet transform may be less effective in analysing speech: it can have poorer frequency resolution at low frequencies and be less in line with human auditory perception. Hence, a feature that combines the merits of MFCC and the wavelet transform is needed, and many studies have attempted such combinations. Existing wavelet-transform-based Mel-scaled feature extraction methods, however, require more computation, because applying a wavelet transform on top of Mel-scale filtering adds extra processing steps. Here we propose a method that extracts Mel-scale features in the time domain using the concept of the wavelet transform, reducing both the computational burden of time-frequency conversion and the complexity of wavelet extraction. Combining the proposed Time domain Mel frequency Wavelet Coefficient (TMFWC) technique with the reservoir computing methodology has significantly improved the efficiency of audio signal processing.
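
The abstract does not describe the reservoir configuration. As a hedged illustration of the general technique only, the sketch below is a minimal echo state network, one common form of reservoir computing: a fixed random recurrent layer is driven by per-frame feature vectors (for example, TMFWC frames), and only a linear readout on its states would be trained. The sizes, leak rate, sparsity, and the function names `make_reservoir`, `run_reservoir` and `pooled_state` are hypothetical choices, not the paper's.

```python
# Hypothetical echo state network sketch; not the paper's reservoir setup.
import math
import random


def make_reservoir(n_inputs, n_units, input_scale=0.5, target_norm=0.9,
                   density=0.2, seed=0):
    """Build fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-input_scale, input_scale) for _ in range(n_inputs)]
            for _ in range(n_units)]
    # Sparse random recurrent matrix.
    w = [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
          for _ in range(n_units)] for _ in range(n_units)]
    # Rescale so the largest absolute row sum (infinity norm) is target_norm < 1.
    # This bounds the spectral radius below 1 without computing eigenvalues.
    norm = max(sum(abs(v) for v in row) for row in w)
    if norm > 0:
        w = [[v * target_norm / norm for v in row] for row in w]
    return w_in, w


def run_reservoir(frames, w_in, w, leak=0.3):
    """Drive the reservoir with a sequence of feature frames; return every state."""
    n_units = len(w)
    x = [0.0] * n_units
    states = []
    for u in frames:
        pre = [sum(a * b for a, b in zip(w_in[i], u)) +
               sum(a * b for a, b in zip(w[i], x))
               for i in range(n_units)]
        # Leaky-integrator update: blend the old state with the new activation.
        x = [(1 - leak) * x[i] + leak * math.tanh(pre[i]) for i in range(n_units)]
        states.append(x)
    return states


def pooled_state(states):
    """Average states over time: one fixed-length vector per utterance."""
    n = len(states)
    return [sum(s[i] for s in states) / n for i in range(len(states[0]))]


if __name__ == "__main__":
    # 50 synthetic 13-dimensional frames standing in for real features.
    frames = [[math.sin(0.1 * t + k) for k in range(13)] for t in range(50)]
    w_in, w = make_reservoir(n_inputs=13, n_units=40)
    states = run_reservoir(frames, w_in, w)
    print(len(states), len(pooled_state(states)))  # prints 50 40
```

Scaling the recurrent matrix by its infinity norm is a conservative stand-in for the usual spectral-radius scaling: it avoids an eigenvalue computation and still makes the state update contractive, at the cost of a less rich reservoir. The readout (for example, ridge regression on the pooled states) is left out.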

Problem

Extracting time-frequency features for speech recognition

Combining wavelet transform with Mel-scale filtering efficiently

Reducing computational complexity in audio signal processing
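
To make the "time-frequency conversion" cost concrete, the sketch below is the conventional frequency-domain MFCC path for a single frame: window, spectrum, triangular Mel filterbank, logarithm, DCT. It is the standard baseline the paper contrasts with, not the paper's method. The direct DFT stands in for the FFT used in practice, and the Hamming window, 20 filters and 13 coefficients are illustrative defaults chosen here.

```python
# Conventional MFCC baseline for one frame (illustrative defaults).
import cmath
import math


def hz_to_mel(f):
    """HTK-style Mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """One-sided power spectrum via a direct DFT (real systems use an FFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters whose edges are spaced uniformly on the Mel scale."""
    if f_max is None:
        f_max = sample_rate / 2.0
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [mel_to_hz(m_lo + i * (m_hi - m_lo) / (n_filters + 1))
             for i in range(n_filters + 2)]
    freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, c, hi = edges[j - 1], edges[j], edges[j + 1]
        # Rising slope up to the centre, falling slope after it, zero outside.
        bank.append([max(0.0, min((f - lo) / (c - lo), (hi - f) / (hi - c)))
                     for f in freqs])
    return bank


def dct_ii(x, n_out):
    """Unnormalised DCT-II of x, keeping the first n_out coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Window -> spectrum -> Mel filterbank -> log -> DCT."""
    n = len(frame)
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    # The frequency-domain conversion that a time-domain method would avoid.
    spec = power_spectrum([x * w for x, w in zip(frame, hamming)])
    bank = mel_filterbank(n_filters, n, sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    return dct_ii([math.log(e + 1e-10) for e in energies], n_coeffs)
```

Note that the spectrum carries no timing information within the frame; time resolution comes only from the choice of frame length and hop, which is the limitation the problem statement targets.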

Innovation

Time domain Mel frequency wavelet coefficient extraction

Combining wavelet transform with Mel-scale filtering

Reducing computational complexity in feature extraction
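
The summary does not give the TMFWC wavelet or how its outputs are pooled, so the following is only a hypothetical sketch of the idea it describes: Mel-spaced analysis performed directly in the time domain. Complex Gabor (Morlet-like) wavelets are centred on Mel-spaced frequencies, each wavelet's bandwidth follows the local Mel spacing, and each band's energy comes from a direct time-domain inner product at frame centres, with no FFT and no separate filterbank stage. The Gabor kernel, the bandwidth rule, the frame hop and the names `mel_gabor_bank`, `mel_wavelet_log_energies` and `tmfwc_like` are all assumptions.

```python
# Hypothetical sketch of time-domain Mel-spaced wavelet features;
# NOT the paper's TMFWC definition.
import math


def hz_to_mel(f):
    """HTK-style Mel scale (assumed; the paper may use another variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def dct_ii(x, n_out):
    """Unnormalised DCT-II, used to decorrelate log band energies."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]


def mel_gabor_bank(n_filters, sample_rate, f_min=100.0, f_max=None):
    """Complex Gabor wavelets centred on Mel-spaced frequencies.

    sigma_f is half the average gap to the neighbouring centres, so low bands
    are narrow (long kernels) and high bands wide (short kernels).
    """
    if f_max is None:
        f_max = 0.45 * sample_rate
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [mel_to_hz(m_lo + i * (m_hi - m_lo) / (n_filters + 1))
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        fc = edges[j]
        sigma_f = 0.25 * (edges[j + 1] - edges[j - 1])     # Hz
        sigma_n = sample_rate / (2.0 * math.pi * sigma_f)  # time spread in samples
        half = int(math.ceil(3.0 * sigma_n))
        offsets = range(-half, half + 1)
        window = [math.exp(-0.5 * (n / sigma_n) ** 2) for n in offsets]
        # Unit-sum window: a real tone of amplitude A at fc gives |response| ~ A/2.
        total = sum(window)
        kernel = [w / total * complex(math.cos(2 * math.pi * fc * n / sample_rate),
                                      -math.sin(2 * math.pi * fc * n / sample_rate))
                  for w, n in zip(window, offsets)]
        bank.append((fc, half, kernel))
    return bank


def mel_wavelet_log_energies(signal, bank, centre):
    """Log energy of each wavelet response at one sample index, by direct
    time-domain inner product; samples outside the signal count as zero."""
    out = []
    for _fc, half, kernel in bank:
        acc = 0j
        for k, h in enumerate(kernel):
            idx = centre + k - half
            if 0 <= idx < len(signal):
                acc += signal[idx] * h
        out.append(math.log(abs(acc) ** 2 + 1e-10))
    return out


def tmfwc_like(signal, sample_rate, n_filters=20, n_coeffs=13, hop=80):
    """Cepstral-style coefficients every `hop` samples from the time-domain bank."""
    bank = mel_gabor_bank(n_filters, sample_rate)
    return [dct_ii(mel_wavelet_log_energies(signal, bank, c), n_coeffs)
            for c in range(0, len(signal), hop)]
```

Evaluating each wavelet only at frame centres keeps the cost proportional to the number of frames times the total kernel length. This sketch does not attempt to reproduce the savings the paper reports.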

🔎 Similar Papers

Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

2024-03-06 · FICC · Citations: 14
