🤖 AI Summary
To address a key limitation of learnable audio front-ends, whose parameters remain fixed during inference and therefore struggle to adapt to dynamic acoustic environments, this paper proposes an adaptive front-end architecture driven by a closed-loop neural controller. Built on a simplified LEAF front-end, the method incorporates a learnable Per-Channel Energy Normalization (PCEN) module and introduces, for the first time, a lightweight neural controller that adjusts the normalization parameters online from current and buffered past subband energies, enabling input-dependent feature optimization. Compared with conventional fixed or statically learnable front-ends, the approach significantly improves audio classification performance under both clean and challenging conditions (e.g., noise and reverberation), demonstrating the efficacy of closed-loop adaptive modeling for robustness. The core contribution is integrating neural control into the audio front-end while preserving end-to-end differentiability, yielding dynamic adaptation at inference time for self-adaptive representation learning.
📝 Abstract
In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representations. However, their parameters remain fixed once trained, lacking flexibility during inference and limiting robustness in dynamic, complex acoustic environments. In this paper, we introduce a novel adaptive paradigm for audio front-ends that replaces static parameterization with a closed-loop neural controller. Specifically, we simplify the learnable front-end LEAF architecture and integrate a neural controller that adapts the representation by dynamically tuning Per-Channel Energy Normalization (PCEN). The neural controller leverages both the current and the buffered past subband energies to enable input-dependent adaptation during inference. Experimental results on multiple audio classification tasks demonstrate that the proposed adaptive front-end consistently outperforms prior fixed and learnable front-ends under both clean and complex acoustic conditions. These results highlight neural adaptability as a promising direction for the next generation of audio front-ends.
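The controller-driven PCEN idea described above can be sketched in NumPy. The paper does not specify the controller's architecture or the PCEN parameter ranges, so the one-layer MLP controller, the squashing ranges, and the history length below are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def pcen(E, alpha, delta, r, s, eps=1e-6):
    """Per-Channel Energy Normalization over an energy matrix E of shape
    (time, channels). alpha/delta/r are per-channel arrays, s is the
    smoothing coefficient of the running energy estimate M."""
    M = np.zeros_like(E)
    m = E[0]
    for t in range(E.shape[0]):
        m = (1.0 - s) * m + s * E[t]   # first-order IIR smoother
        M[t] = m
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

class Controller:
    """Hypothetical lightweight controller: a one-layer MLP mapping the
    current frame plus `hist` buffered past frames of subband energies
    to per-channel PCEN parameters (alpha, delta, r)."""
    def __init__(self, n_ch, hist=4):
        d_in = n_ch * (hist + 1)
        self.W = rng.normal(scale=0.01, size=(d_in, 3 * n_ch))
        self.b = np.zeros(3 * n_ch)

    def __call__(self, energy_buffer):
        x = np.asarray(energy_buffer).reshape(-1)   # flatten history + current
        h = np.tanh(x @ self.W + self.b)
        a, d, r = np.split(h, 3)
        # squash into plausible PCEN ranges (assumed, not from the paper)
        alpha = 0.5 + 0.5 / (1.0 + np.exp(-a))      # in (0.5, 1.0)
        delta = 1.0 + np.exp(d)                     # > 1
        rate = 0.25 + 0.5 / (1.0 + np.exp(-r))     # in (0.25, 0.75)
        return alpha, delta, rate

# Toy usage: 100 frames of 8-channel subband energies.
E = np.abs(rng.normal(size=(100, 8))) ** 2
ctrl = Controller(n_ch=8)
buf = E[:5]                      # current frame + 4 buffered past frames
alpha, delta, rate = ctrl(buf)   # controller predicts PCEN parameters
out = pcen(E, alpha, delta, r=rate, s=0.04)
print(out.shape)
```

In the actual system the controller would be trained end-to-end with the classifier (PCEN is differentiable in its parameters) and re-evaluated as the energy buffer updates, which is what makes the loop "closed" at inference time.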