🤖 AI Summary
To address the limited representational capacity, low encoding efficiency, and poor few-shot adaptability of audio implicit neural representations (INRs), this paper proposes an audio INR modeling framework based on Kolmogorov–Arnold Networks (KANs), marking the first application of learnable-activation KANs to audio signal representation. It further introduces FewSound, a lightweight hypernetwork architecture that enables parameter-efficient adaptation and cross-sample generalization. Experiments on 1.5-second audio segments show that the KAN-based model achieves a Log-Spectral Distance (LSD) of 1.29 and a PESQ score of 3.57. Compared to the HyperSound baseline, FewSound reduces mean squared error (MSE) by 33.3% and improves scale-invariant signal-to-noise ratio (SI-SNR) by 60.87%. The core contributions are threefold: (i) integrating KANs with audio INRs; (ii) introducing learnable activations for spectral modeling; and (iii) establishing a lightweight hypernetwork framework tailored for few-shot audio reconstruction.
📝 Abstract
Implicit neural representations (INRs) have gained prominence for efficiently encoding multimedia data, yet their application to audio signals remains limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel architecture using learnable activation functions, as an effective INR model for audio representation. KAN demonstrates superior perceptual performance over previous INRs, achieving the lowest Log-Spectral Distance (LSD) of 1.29 and the highest Perceptual Evaluation of Speech Quality (PESQ) score of 3.57 for 1.5 s audio. To extend KAN's utility, we propose FewSound, a hypernetwork-based architecture that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound, with a 33.3% improvement in MSE and a 60.87% improvement in SI-SNR. These results show KAN to be a robust and adaptable audio representation with the potential for scalability and integration into various hypernetwork frameworks. The source code can be accessed at https://github.com/gmum/fewsound.git.
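For readers unfamiliar with the two waveform metrics quoted above, the following is a minimal sketch of how MSE and SI-SNR are commonly computed between a reference signal and its reconstruction. It uses the widely used zero-mean projection form of SI-SNR and is not taken from the FewSound repository; the function names `mse` and `si_snr`, the 16 kHz sample rate, and the synthetic sine signals are all assumptions chosen for illustration, and the paper's own evaluation code may differ in details such as normalization.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def si_snr(reference, estimate, eps=1e-12):
    """Scale-invariant SNR in dB (zero-mean projection form)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    n = len(reference)
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Project the estimate onto the reference to get the "target" component.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything in the estimate that is not explained by the reference.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Illustrative signals only: a 1.5 s, 440 Hz tone at an assumed 16 kHz rate,
# and a "reconstruction" with a small additive 1234 Hz error component.
sample_rate = 16000
num_samples = int(1.5 * sample_rate)
reference = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(num_samples)]
estimate = [
    r + 0.01 * math.sin(2 * math.pi * 1234 * t / sample_rate)
    for t, r in enumerate(reference)
]
# Same reconstruction at half the gain: MSE changes, SI-SNR should not.
scaled_estimate = [0.5 * e for e in estimate]

print("MSE:", mse(reference, estimate))
print("SI-SNR (dB):", si_snr(reference, estimate))
print("MSE, half gain:", mse(reference, scaled_estimate))
print("SI-SNR (dB), half gain:", si_snr(reference, scaled_estimate))
```

The half-gain case shows why the two metrics are reported together: MSE penalizes a pure gain mismatch, while SI-SNR ignores it and measures only the error that cannot be explained by rescaling the reference.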