Enhancing time-frequency resolution with optimal transport and barycentric fusion of multiple spectrograms

📅 2026-04-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the fundamental trade-off between time and frequency resolution in conventional time–frequency representations such as the short-time Fourier transform (STFT), a trade-off governed by the Gabor–Heisenberg uncertainty principle. To work around this limitation, the authors propose a super-resolution method that fuses several spectrograms computed at different resolutions by taking their barycenter under optimal transport (OT) divergences. They introduce a novel transport cost that preserves time–frequency geometry while being much cheaper to compute than the cost used in standard Wasserstein barycenters, and they derive a block majorization–minimization algorithm that computes the barycenter within the unbalanced OT framework. The input spectrograms may be defined on arbitrary time–frequency grids (any STFT parameters) without alignment to a common grid, and the fused spectrogram can be placed on any user-specified grid, so the result combines the best localization properties of each input. Experiments on controlled synthetic signals and recorded speech show, both qualitatively and quantitatively, that the approach outperforms a state-of-the-art unsupervised fusion method.
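
Because an STFT inherits the time and frequency spread of its analysis window, the trade-off can be checked numerically. The Python sketch below uses NumPy; the helper `gaussian_tf_spread`, the 8 kHz sampling rate and the three window widths are illustrative assumptions, not details from the paper. It measures the standard deviations σ_t and σ_f of the energy densities |g(t)|² and |G(f)|² of Gaussian windows, the case in which the Gabor bound σ_t σ_f ≥ 1/(4π) holds with equality.

```python
import numpy as np

# Illustrative parameters (assumed, not from the paper): 8 kHz sampling,
# 16384-sample frame, Gaussian windows of 2, 8 and 32 ms.


def gaussian_tf_spread(s, fs=8000.0, n=16384):
    """Return (sigma_t in s, sigma_f in Hz) of |g(t)|^2 and |G(f)|^2
    for a sampled Gaussian window g(t) = exp(-t^2 / (2 s^2))."""
    t = (np.arange(n) - n // 2) / fs            # time axis centred on the peak
    g = np.exp(-t**2 / (2.0 * s**2))
    e_t = g**2 / np.sum(g**2)                   # normalised energy density in time
    sigma_t = np.sqrt(np.sum(e_t * t**2))       # mean is 0 by construction

    G = np.fft.fft(g)
    f = np.fft.fftfreq(n, d=1.0 / fs)           # frequency axis in Hz
    e_f = np.abs(G)**2 / np.sum(np.abs(G)**2)   # normalised energy density in frequency
    sigma_f = np.sqrt(np.sum(e_f * f**2))       # |G| is symmetric, so mean is 0
    return sigma_t, sigma_f


gabor_limit = 1.0 / (4.0 * np.pi)
for s in (0.002, 0.008, 0.032):                 # short, medium, long windows
    st, sf = gaussian_tf_spread(s)
    print(f"s={s * 1e3:5.1f} ms  sigma_t={st * 1e3:7.3f} ms  "
          f"sigma_f={sf:7.2f} Hz  product/limit={st * sf / gabor_limit:.4f}")
```

Shrinking the window lowers σ_t and raises σ_f in proportion, so the ratio in the last column stays at 1 for every width. No single window is sharp on both axes, which is what motivates fusing spectrograms computed with several windows.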

📝 Abstract
Time-frequency representations, such as the short-time Fourier transform (STFT), are fundamental tools for analyzing non-stationary signals. However, their ability to achieve sharp localization in both time and frequency is inherently limited by the Gabor-Heisenberg uncertainty principle. In this paper, we address this limitation by introducing a method to generate super-resolution spectrograms through the fusion of two or more spectrograms with varying resolutions. Specifically, we compute the super-resolution spectrogram as the barycenter of input spectrograms using optimal transport (OT) divergences. Unlike existing fusion approaches, our method does not require the input spectrograms to share the same time-frequency grid. Instead, the input spectrograms can be computed using any STFT parameters, and the resulting super-resolution spectrogram can be defined on an arbitrary user-specified grid. We explore various OT divergences based on different transportation costs. Notably, we introduce a novel transportation cost that preserves time-frequency geometry while significantly reducing computational complexity compared to standard Wasserstein barycenters. We adopt the unbalanced OT framework and derive a new block majorization-minimization algorithm for efficient barycenter computation. We validate the proposed method on controlled synthetic signals and recorded speech using both quantitative and qualitative evaluations. The results show that our approach combines the best localization properties of the input spectrograms and outperforms an unsupervised state-of-the-art fusion method.
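
The paper's own transport cost and block majorization–minimization algorithm are not specified on this page, so the following is only a stand-in built from a different, well-known technique: an entropic unbalanced OT barycenter with KL marginal penalties, computed by generalized Sinkhorn scaling (Chizat et al.), using a plain squared-Euclidean cost. It is a minimal NumPy sketch under those assumptions; the function name `unbalanced_barycenter` and the parameters `eps`, `rho` and `n_iter` are hypothetical. What it does share with the abstract is grid flexibility: every input keeps its own support and gets its own cost matrix against a user-chosen target grid, so no common grid is needed.

```python
import numpy as np

# Stand-in technique (NOT the paper's BMM algorithm or transport cost):
# entropic unbalanced barycenter via generalized Sinkhorn scaling,
# squared-Euclidean cost, KL penalty rho on both marginals.


def unbalanced_barycenter(inputs, target, weights, eps=0.01, rho=1.0, n_iter=1000):
    """Entropic unbalanced OT barycenter on a fixed, user-chosen target grid.

    inputs  : list of (points, masses); points has shape (n_k, d), masses (n_k,)
    target  : (m, d) array of barycenter support points
    weights : barycentric weights summing to 1
    eps     : entropic regularization strength
    rho     : KL penalty on both marginals (larger rho -> closer to balanced OT)
    """
    weights = np.asarray(weights, dtype=float)
    fi = rho / (rho + eps)  # exponent produced by the KL proximal step
    # One Gibbs kernel per input, so every input may live on its own grid.
    kernels = []
    for pts, _ in inputs:
        cost = ((pts[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
        kernels.append(np.exp(-cost / eps))
    masses = [np.asarray(a, dtype=float) for _, a in inputs]
    v = [np.ones(len(target)) for _ in inputs]
    q = np.ones(len(target))
    for _ in range(n_iter):
        # Scaling for each input's own marginal: u_k = (a_k / K_k v_k)^fi
        u = [(a / (K @ vk)) ** fi for a, K, vk in zip(masses, kernels, v)]
        ktu = [K.T @ uk for K, uk in zip(kernels, u)]
        # Shared marginal: q is a weighted power mean of the K_k^T u_k
        q = sum(w * z ** (1.0 - fi) for w, z in zip(weights, ktu)) ** (1.0 / (1.0 - fi))
        # Scaling for each input's barycenter-side marginal: v_k = (q / K_k^T u_k)^fi
        v = [(q / z) ** fi for z in ktu]
    return q


# Toy 1-D check: two bumps on different grids, fused on a third grid.
x1 = np.linspace(0.0, 1.0, 41)
x2 = np.linspace(0.0, 1.0, 61)
y = np.linspace(0.0, 1.0, 101)
a1 = np.exp(-((x1 - 0.2) ** 2) / (2 * 0.03**2))
a1 /= a1.sum()
a2 = np.exp(-((x2 - 0.8) ** 2) / (2 * 0.03**2))
a2 /= a2.sum()
q = unbalanced_barycenter([(x1[:, None], a1), (x2[:, None], a2)], y[:, None], [0.5, 0.5])
print("barycenter peak at", y[np.argmax(q)])
```

For spectrogram fusion, each input's points would be (time, frequency) pairs from its own STFT grid, with the two axes rescaled to comparable units, and its masses would be the spectrogram values. Each dense n_k × m kernel grows with the product of the grid sizes; this is the cost of standard Wasserstein barycenters that the abstract says the paper's transport cost is designed to reduce. Small `eps` values would also call for a log-domain implementation to avoid underflow.
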
Problem

Research questions and friction points this paper is trying to address.

time-frequency resolution
spectrogram fusion
uncertainty principle
super-resolution
non-stationary signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

optimal transport
spectrogram fusion
super-resolution time-frequency representation
Wasserstein barycenter
unbalanced optimal transport