SELEBI: Percussion-aware Time Stretching via Selective Magnitude Spectrogram Compression by Nonstationary Gabor Transform

📅 2026-02-18
🤖 AI Summary
This work addresses “percussion smearing,” an artifact that conventional phase vocoders produce during audio time-stretching and that degrades the quality of percussive components. The authors attribute it to a time-scale mismatch between the temporally smeared magnitude spectrogram and the localized, newly generated phase. They propose SELEBI, a signal-adaptive phase vocoder built on the nonstationary Gabor transform: analysis windows are shortened in intervals that carry significant percussive energy, so a temporally localized magnitude spectrogram is computed directly from the time-domain signal and the temporal structures of magnitude and phase become more consistent. The method needs neither heuristic post-processing nor explicit component separation, and the perfect reconstruction property of the transform keeps synthesis stable. Experiments indicate that SELEBI mitigates percussion smearing and yields natural sound quality.
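To make the smearing problem concrete, the following minimal sketch measures how far a single click spreads in time within a short-time Fourier magnitude spectrogram for a given analysis window length. It is an illustration only, not code from the paper: the function name `magnitude_envelope_width`, the Hann window, the hop size, and the unit impulse standing in for a percussive hit are all assumptions.

```python
import numpy as np

def magnitude_envelope_width(window_length, hop=16, signal_length=4096):
    """Time span (in samples) over which one click leaves energy in the
    STFT magnitude. Hypothetical helper, for illustration only."""
    x = np.zeros(signal_length)
    x[signal_length // 2] = 1.0  # unit impulse as an idealized percussive hit

    window = np.hanning(window_length)
    starts = np.arange(0, signal_length - window_length + 1, hop)

    # Energy of each frame's magnitude spectrum.
    energy = np.array([
        np.sum(np.abs(np.fft.rfft(x[s:s + window_length] * window)) ** 2)
        for s in starts
    ])

    # Frames whose window overlaps the click carry nonzero energy.
    active = starts[energy > 1e-12]
    if active.size == 0:
        return 0
    return int(active[-1] - active[0] + hop)

if __name__ == "__main__":
    for n in (256, 2048):
        print(n, magnitude_envelope_width(n))
```

Under these assumptions the measured span tracks the window length, so a long window spreads the click's magnitude over a much longer interval than a short one. That spread is the temporal smearing of the magnitude spectrogram that the summary refers to.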

📝 Abstract
Phase vocoder-based time-stretching is a widely used technique for the time-scale modification of audio signals. However, conventional implementations suffer from “percussion smearing,” a well-known artifact that significantly degrades the quality of percussive components. We attribute this artifact to a fundamental time-scale mismatch between the temporally smeared magnitude spectrogram and the localized, newly generated phase. To address this, we propose SELEBI, a signal-adaptive phase vocoder algorithm that significantly reduces percussion smearing while preserving stability and the perfect reconstruction property. Unlike conventional methods that rely on heuristic processing or component separation, our approach leverages the nonstationary Gabor transform. By dynamically adapting analysis window lengths to assign short windows to intervals containing significant energy associated with percussive components, we directly compute a temporally localized magnitude spectrogram from the time-domain signal. This approach ensures greater consistency between the temporal structures of the magnitude and phase. Furthermore, the perfect reconstruction property of the nonstationary Gabor transform guarantees stable, high-fidelity signal synthesis, in contrast to previous heuristic approaches. Experimental results demonstrate that the proposed method effectively mitigates percussion smearing and yields natural sound quality.
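The idea of assigning short analysis windows to intervals with significant percussive energy can be sketched as follows. This is a hedged illustration, not the selection rule used by SELEBI, which the abstract does not specify: the function name `assign_window_lengths`, the block-energy-ratio criterion, and every parameter value are assumptions. The sketch only chooses a window length per block; it does not build the nonstationary Gabor transform or the synthesis windows needed for perfect reconstruction.

```python
import numpy as np

def assign_window_lengths(x, block=256, short=256, long=2048, ratio=2.0):
    """Pick an analysis window length for each block of the signal.

    Assumed rule (not from the paper): a block whose energy jumps by more
    than `ratio` relative to the previous block is treated as percussive
    and gets the short window; every other block gets the long window.
    """
    n_blocks = len(x) // block
    energy = np.array([
        np.sum(x[i * block:(i + 1) * block] ** 2) for i in range(n_blocks)
    ])

    floor = 1e-10  # avoids flagging tiny fluctuations in near-silent blocks
    lengths = np.full(n_blocks, long, dtype=int)
    for i in range(1, n_blocks):
        if energy[i] > ratio * (energy[i - 1] + floor):
            lengths[i] = short
    return lengths

if __name__ == "__main__":
    sr = 16000
    t = np.arange(8192)
    x = 0.05 * np.sin(2 * np.pi * 440 * t / sr)  # steady tone
    x[3000] += 1.0                               # one click on top of it
    print(assign_window_lengths(x))
```

In this toy signal only the block containing the click is given the short window, while the steady tone keeps the long window and its frequency resolution. A full implementation would feed such a length map into a nonstationary Gabor analysis.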
Problem

Research questions and friction points this paper is trying to address.

percussion smearing
time stretching
phase vocoder
audio signal processing
magnitude spectrogram
Innovation

Methods, ideas, or system contributions that make the work stand out.

nonstationary Gabor transform
percussion-aware time stretching
selective magnitude spectrogram compression
phase vocoder
perfect reconstruction