🤖 AI Summary
This work addresses the lack of invertible, stable, and perceptually grounded filter banks in machine learning. We propose ISAC: an invertible, strictly stable, complex wavelet-based filter bank enabling perfect reconstruction. ISAC is the first framework to jointly realize auditory-scale mapping (e.g., ERB), user-specified time-domain support, jointly tunable center frequencies and bandwidths, and end-to-end differentiability. Its analysis–synthesis pair is constructed from parameterized FIR convolutional kernels, ensuring native compatibility with deep learning frameworks such as PyTorch. Experiments demonstrate that ISAC significantly improves model robustness and generalization in speech enhancement and source separation tasks. Moreover, it supports zero-latency real-time processing and lossless signal reconstruction. The implementation is publicly available.
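To make the idea of an analysis-synthesis pair with perfect reconstruction concrete, here is a minimal sketch in plain Python. It is not the ISAC construction: the Hann-windowed complex exponential kernels, the circular convolution, the stride of 1 (no decimation), and the helper names `make_filters`, `analysis`, and `synthesis` are all assumptions chosen for illustration. Synthesis uses the canonical dual filters, obtained by dividing each filter's conjugated frequency response by the summed squared magnitude responses.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for tiny examples)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * m * t / n) for t in range(n))
            for m in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * math.pi * m * t / n) for m in range(n)) / n
            for t in range(n)]


def make_filters(num_filters, support, n):
    """Hypothetical FIR kernels: Hann-windowed complex exponentials.

    Each kernel has `support` non-zero taps and is zero-padded to length n.
    Center frequencies are spaced linearly here, purely for simplicity.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * (t + 0.5) / support)
              for t in range(support)]
    filters = []
    for k in range(num_filters):
        freq = k / num_filters  # normalized center frequency in cycles/sample
        g = [window[t] * cmath.exp(2j * math.pi * freq * t) for t in range(support)]
        filters.append(g + [0j] * (n - support))
    return filters


def analysis(x, filters):
    """Circular convolution of x with every kernel (stride 1)."""
    X = dft(x)
    return [idft([a * b for a, b in zip(X, dft(g))]) for g in filters]


def synthesis(coeffs, filters):
    """Reconstruct with the canonical dual filters conj(G_k) / sum_j |G_j|^2."""
    G = [dft(g) for g in filters]
    n = len(G[0])
    # Frame operator symbol: must be strictly positive at every frequency bin.
    S = [sum(abs(Gk[m]) ** 2 for Gk in G) for m in range(n)]
    X_rec = [0j] * n
    for c, Gk in zip(coeffs, G):
        C = dft(c)
        for m in range(n):
            X_rec[m] += C[m] * Gk[m].conjugate() / S[m]
    return idft(X_rec)


n = 32
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.cos(2 * math.pi * 7 * t / n)
          for t in range(n)]
filters = make_filters(num_filters=8, support=8, n=n)
coeffs = analysis(signal, filters)
reconstructed = synthesis(coeffs, filters)
max_error = max(abs(r - s) for r, s in zip(reconstructed, signal))
print(f"max reconstruction error: {max_error:.2e}")
```

In this simplified setting the reconstruction error is at the level of floating-point round-off. The dual filters computed this way need not have the same finite support as the analysis kernels, and a decimated filter bank requires a more careful construction than this stride-1 sketch.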
📝 Abstract
This paper introduces ISAC, an invertible and stable, perceptually motivated filter bank that is specifically designed to be integrated into machine learning paradigms. More precisely, the center frequencies and bandwidths of the filters follow a non-linear, auditory frequency scale; the filter kernels have a user-defined maximum temporal support and may serve as learnable convolutional kernels; and there exists a corresponding filter bank such that both form a perfect reconstruction pair. ISAC provides a powerful and user-friendly audio front-end suitable for any application, including analysis-synthesis schemes.
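As an illustration of what placing center frequencies and bandwidths on a non-linear auditory scale can look like, the sketch below spaces filters uniformly on the ERB-number scale using the Glasberg and Moore formulas. The choice of the ERB scale, the function names, and the parameter values (`num_filters=8`, 50 Hz to 8000 Hz) are assumptions made for this example and are not taken from the ISAC implementation.

```python
import math


def hz_to_erb(f_hz):
    """Map frequency in Hz to the ERB-number scale (Glasberg & Moore)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_to_hz(erb):
    """Inverse of hz_to_erb."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (0.00437 * f_hz + 1.0)


def erb_filter_grid(num_filters, f_min, f_max):
    """Center frequencies equally spaced on the ERB scale, with matching bandwidths."""
    lo, hi = hz_to_erb(f_min), hz_to_erb(f_max)
    step = (hi - lo) / (num_filters - 1)
    centers = [erb_to_hz(lo + k * step) for k in range(num_filters)]
    widths = [erb_bandwidth(fc) for fc in centers]
    return centers, widths


centers, widths = erb_filter_grid(num_filters=8, f_min=50.0, f_max=8000.0)
for fc, bw in zip(centers, widths):
    print(f"center = {fc:8.1f} Hz, bandwidth = {bw:7.1f} Hz")
```

Filters end up densely packed and narrow at low frequencies, and sparser and wider at high frequencies, which is the qualitative behavior an auditory scale is meant to provide.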