ISAC: An Invertible and Stable Auditory Filter Bank with Customizable Kernels for ML Integration

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of invertible, stable, and perceptually grounded filter banks in machine learning. We propose ISAC: an invertible, strictly stable, complex wavelet-based filter bank enabling perfect reconstruction. ISAC is the first framework to jointly realize auditory-scale mapping (e.g., ERB), user-specified time-domain support, jointly tunable center frequencies and bandwidths, and end-to-end differentiability. Its analysis–synthesis pair is constructed from parameterized FIR convolutional kernels, ensuring native compatibility with deep learning frameworks such as PyTorch. Experiments demonstrate that ISAC significantly improves model robustness and generalization in speech enhancement and source separation tasks. Moreover, it supports zero-latency real-time processing and lossless signal reconstruction. The implementation is publicly available.
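The auditory-scale mapping mentioned above can be illustrated with the widely used Glasberg and Moore ERB formulas. The sketch below spaces center frequencies uniformly on the ERB-rate scale and assigns each filter a bandwidth of one ERB. It is only an assumption-laden illustration: the function names, the channel count, and the frequency range are hypothetical choices and are not taken from the ISAC implementation.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (0.00437 * f_hz + 1.0)


def erb_filter_grid(num_channels, f_min, f_max):
    """Center frequencies equally spaced on the ERB-rate scale, with
    one-ERB bandwidths. All parameter names here are illustrative."""
    e_min, e_max = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (e_max - e_min) / (num_channels - 1)
    centers = [erb_rate_to_hz(e_min + k * step) for k in range(num_channels)]
    bandwidths = [erb_bandwidth(fc) for fc in centers]
    return centers, bandwidths


# Hypothetical settings: 40 channels between 50 Hz and 8 kHz.
centers, bandwidths = erb_filter_grid(num_channels=40, f_min=50.0, f_max=8000.0)
```

On such a grid the filters are densely packed and narrow at low frequencies and sparse and wide at high frequencies, which is the non-linear behavior an auditory scale is meant to capture.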

📝 Abstract
This paper introduces ISAC, an invertible and stable, perceptually-motivated filter bank that is specifically designed to be integrated into machine learning paradigms. More precisely, the center frequencies and bandwidths of the filters are chosen to follow a non-linear, auditory frequency scale, the filter kernels have user-defined maximum temporal support and may serve as learnable convolutional kernels, and there exists a corresponding filter bank such that both form a perfect reconstruction pair. ISAC provides a powerful and user-friendly audio front-end suitable for any application, including analysis-synthesis schemes.
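To make the notion of a perfect reconstruction pair concrete, the following NumPy sketch builds a small bank of short FIR kernels on an ERB grid and computes its canonical dual in the frequency domain. It rests on several simplifying assumptions that are not taken from the paper: the kernels are Hann-windowed cosines, the filter bank is undecimated (stride 1), and convolution is circular. In that setting the frame operator is diagonal in the Fourier domain, so the dual filters follow from a pointwise division, and the ratio of its largest to smallest value indicates how stable the inversion is. All names and parameter values are hypothetical.

```python
import numpy as np


def hann_cosine_bank(centers_hz, bandwidths_hz, fs, kernel_len):
    """Hypothetical FIR kernels: Hann-windowed cosines whose support shrinks
    as the bandwidth grows, capped at kernel_len samples."""
    bank = np.zeros((len(centers_hz), kernel_len))
    for k, (fc, bw) in enumerate(zip(centers_hz, bandwidths_hz)):
        support = int(min(kernel_len, np.ceil(2.0 * fs / bw)))
        n = np.arange(support)
        window = np.hanning(support + 2)[1:-1]  # drop the zero endpoints
        bank[k, :support] = window * np.cos(2.0 * np.pi * fc * n / fs)
    return bank


# Illustrative settings (assumptions, not values from the paper).
fs, num_channels, kernel_len, sig_len = 16000, 40, 128, 1024

# ERB-spaced center frequencies and one-ERB bandwidths (Glasberg and Moore).
erb_rate = np.linspace(21.4 * np.log10(1.0 + 0.00437 * 50.0),
                       21.4 * np.log10(1.0 + 0.00437 * fs / 2.0),
                       num_channels)
centers = (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437
bandwidths = 24.7 * (0.00437 * centers + 1.0)

bank = hann_cosine_bank(centers, bandwidths, fs, kernel_len)

# Frequency responses of the analysis filters, zero-padded to signal length.
H = np.fft.fft(bank, n=sig_len, axis=1)

# For an undecimated bank the frame operator is diagonal in frequency.
S = np.sum(np.abs(H) ** 2, axis=0)
A, B = S.min(), S.max()  # frame bounds; B / A measures stability

# Canonical dual (synthesis) filters in the frequency domain.
G = np.conj(H) / S

# Analysis: circular convolution of a test signal with every kernel.
rng = np.random.default_rng(0)
x = rng.standard_normal(sig_len)
coeffs = np.fft.ifft(H * np.fft.fft(x), axis=1).real

# Synthesis: filter each channel with its dual and sum over channels.
x_rec = np.fft.ifft(np.sum(G * np.fft.fft(coeffs, axis=1), axis=0)).real
err = np.max(np.abs(x - x_rec))
```

In this simplified setting the dual filters are generally not limited to the same short support as the analysis kernels, and a decimated filter bank needs a different construction. The sketch only shows why a positive lower frame bound `A` implies invertibility.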

Problem

Research questions and friction points this paper is trying to address.

Designing an invertible auditory filter bank for ML integration
Customizing filter kernels with user-defined temporal support
Ensuring perfect reconstruction in analysis-synthesis audio schemes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Invertible and stable auditory filter bank
Customizable non-linear auditory frequency scale
Learnable convolutional kernels for ML

Authors

Daniel Haider
Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
Felix Perfler
Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
Peter Balazs
Acoustics Research Institute, Austrian Academy of Sciences
Application-oriented mathematics, Acoustics, Frame Theory
Clara Hollomey
University of Applied Sciences St. Pölten, St. Pölten, Austria
Nicki Holighaus
Acoustics Research Institute, Austrian Academy of Sciences