🤖 AI Summary
This work addresses the high adaptation cost and computational inefficiency of existing speech enhancement models when deployed in unseen noisy environments, particularly on resource-constrained devices. The authors propose a lightweight online adaptation method that freezes the backbone network and introduces only a low-rank adapter module, updating fewer than 1% of the model parameters. Leveraging self-supervised learning, the approach enables rapid environmental adaptation with minimal overhead. Evaluated across 111 scenarios encompassing 37 noise types and multiple signal-to-noise ratios, the method achieves an average improvement of 1.51 dB in SI-SDR after just 20 update steps per scenario. It matches or exceeds state-of-the-art methods in perceptual quality while substantially reducing computational and memory demands, and ensures stable convergence without compromising enhancement performance.
📝 Abstract
Recent studies have shown that post-deployment adaptation can improve the robustness of speech enhancement models in unseen noise conditions. However, existing methods often incur prohibitive computational and memory costs, limiting their suitability for on-device deployment. In this work, we investigate model adaptation in realistic settings with dynamic acoustic scene changes and propose a lightweight framework that augments a frozen backbone with low-rank adapters updated via self-supervised training. Experiments on sequentially changing acoustic scenes, spanning 111 environments across 37 noise types and three signal-to-noise ratio ranges, including the challenging [-8, 0] dB range, show that our method updates fewer than 1% of the base model's parameters while achieving an average 1.51 dB SI-SDR improvement within only 20 updates per scene. Compared to state-of-the-art approaches, our framework achieves competitive or superior perceptual quality with smoother and more stable convergence, demonstrating its practicality for lightweight on-device adaptation of speech enhancement models under real-world acoustic conditions.
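The core low-rank adapter idea, freezing the backbone weights and training only a small rank-r product added on top, can be sketched numerically. The layer dimensions, rank, and zero-initialization below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for a single frozen linear layer of the backbone.
d_in, d_out, rank = 512, 512, 2

W_frozen = rng.standard_normal((d_out, d_in))  # backbone weight, never updated
A = rng.standard_normal((rank, d_in)) * 0.01   # low-rank factor A (trainable)
B = np.zeros((d_out, rank))                    # low-rank factor B (trainable);
                                               # zero-init so the adapter starts
                                               # as a no-op

def forward(x):
    # Effective weight is W + B @ A; during adaptation only A and B
    # would receive gradient updates.
    return x @ (W_frozen + B @ A).T

x = rng.standard_normal((1, d_in))
y = forward(x)

# With B zero-initialized, the adapted layer initially matches the backbone.
assert np.allclose(y, x @ W_frozen.T)

trainable = A.size + B.size
total = W_frozen.size + trainable
print(f"trainable fraction: {trainable / total:.4f}")  # well under 1%
```

With rank 2 on a 512x512 layer, the trainable factors hold 2048 of roughly 264k parameters (about 0.8%), which is the kind of sub-1% update budget the abstract describes.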