A Hypernetwork-Based Approach to KAN Representation of Audio Signals

📅 2025-03-04

📈 Citations: 0

✨ Influential: 0

career value

223K/year

🤖 AI Summary

To address the limited representational capacity, low encoding efficiency, and poor few-shot adaptability of audio implicit neural representations (INRs), this paper proposes a novel audio INR modeling framework based on Kolmogorov–Arnold Networks (KANs), marking the first application of learnable-activation KANs to audio signal representation. We further introduce FewSound—a lightweight hypernetwork architecture enabling parameter-efficient adaptation and cross-sample generalization. Experiments on 1.5-second audio segments demonstrate that the KAN-based model achieves a Log-Spectral Distance (LSD) of 1.29 and a PESQ score of 3.57. Compared to the HyperSound baseline, FewSound reduces mean squared error (MSE) by 33.3% and improves scale-invariant signal-to-noise ratio (SI-SNR) by 60.87%. Our core contributions are threefold: (i) pioneering the integration of KANs with audio INRs; (ii) introducing learnable activations for spectral modeling; and (iii) establishing the first lightweight hypernetwork framework tailored for few-shot audio reconstruction.

Technology Category

Application Category

📝 Abstract

Implicit neural representations (INR) have gained prominence for efficiently encoding multimedia data, yet their applications in audio signals remain limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel architecture using learnable activation functions, as an effective INR model for audio representation. KAN demonstrates superior perceptual performance over previous INRs, achieving the lowest Log-SpectralDistance of 1.29 and the highest Perceptual Evaluation of Speech Quality of 3.57 for 1.5 s audio. To extend KAN's utility, we propose FewSound, a hypernetwork-based architecture that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound, with a 33.3% improvement in MSE and 60.87% in SI-SNR. These results show KAN as a robust and adaptable audio representation with the potential for scalability and integration into various hypernetwork frameworks. The source code can be accessed at https://github.com/gmum/fewsound.git.

Problem

Research questions and friction points this paper is trying to address.

Develops KAN for efficient audio signal representation.

Introduces FewSound to enhance INR parameter updates.

Demonstrates KAN's superior performance in audio quality metrics.

Innovation

Methods, ideas, or system contributions that make the work stand out.

KAN uses learnable activation functions for audio.

FewSound enhances INR updates via hypernetwork architecture.

KAN achieves superior perceptual performance in audio.

🔎 Similar Papers

Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs