Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models

📅 2025-06-17

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To counter the growing threat of highly realistic synthetic speech in spoofing attacks, this work integrates Kolmogorov–Arnold Networks (KANs) into the XLSR-Conformer architecture, replacing the conventional Multi-Layer Perceptron (MLP) with a KAN to strengthen the discriminative power of self-supervised speech representations. KANs are grounded in the Kolmogorov–Arnold representation theorem and model nonlinearities with learnable univariate functions rather than fixed activations. Evaluated on the ASVspoof2021 benchmark, the KAN-based model yields a 60.55% relative improvement on the Logical Access (LA) and Deepfake (DF) sets and reaches an equal error rate (EER) of 0.70% on the 2021 LA (21LA) set. These results suggest that KANs are a promising component for SSL-based audio deepfake detection.
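The EER figures above mark the operating point at which the false acceptance rate (spoofs accepted) equals the false rejection rate (bona fide speech rejected). As a hedged sketch, the snippet below approximates EER from detector scores with a simple threshold sweep and computes a relative reduction. The function names, the score convention (higher score = more likely bona fide), and the toy scores are illustrative assumptions, not the paper's evaluation code; official challenge scoring scripts may compute EER differently.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate equal error rate (EER) by sweeping every observed score as a threshold.

    Assumed convention: a higher score means "more likely bona fide", and a
    trial is accepted when score >= threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one bona fide and one spoof score")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction as a fraction, e.g. 0.6055 corresponds to 60.55%."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores, made up for illustration (not from the paper).
bona = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.75]
print(compute_eer(bona, spoof))      # 0.25
print(relative_reduction(2.0, 1.0))  # 0.5
```

The quadratic sweep keeps the logic obvious; for large score lists, sorting once and walking the thresholds would be the usual optimization.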


📝 Abstract

Recent advancements in speech synthesis technologies have led to increasingly sophisticated spoofing attacks, posing significant challenges for automatic speaker verification systems. While systems based on self-supervised learning (SSL) models, particularly the XLSR-Conformer model, have demonstrated remarkable performance in synthetic speech detection, there remains room for architectural improvement. In this paper, we propose an approach that replaces the traditional Multi-Layer Perceptron (MLP) in the XLSR-Conformer model with a Kolmogorov-Arnold Network (KAN), a novel architecture based on the Kolmogorov-Arnold representation theorem. Our results on ASVspoof2021 demonstrate that integrating KAN into SSL-based models yields a 60.55% relative improvement on the LA and DF sets and achieves an equal error rate (EER) of 0.70% on the 21LA set. These findings suggest that incorporating KAN into SSL-based models is a promising direction for advancing synthetic speech detection.
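For reference, the Kolmogorov-Arnold representation theorem states that any continuous function of n variables on a bounded domain can be written as a finite sum of compositions of continuous univariate functions (first line below). In the standard KAN formulation, which this page does not spell out, each layer places a learnable univariate function on every input-output edge instead of applying a fixed activation after a linear map. The contrast with an MLP layer is sketched here; the symbols σ, W, b, and φ are illustrative notation rather than the paper's own.

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

\text{MLP layer: } \mathbf{x}_{l+1} = \sigma\!\left( W_l \mathbf{x}_l + \mathbf{b}_l \right)
\qquad
\text{KAN layer: } x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}\!\left( x_{l,i} \right)
```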

Problem

Research questions and friction points this paper is trying to address.

Detect synthetic speech to counter spoofing attacks

Improve SSL-based models for better detection performance

Enhance architecture using Kolmogorov-Arnold Networks (KAN)

Innovation

Methods, ideas, or system contributions that make the work stand out.

Replacing MLP with Kolmogorov-Arnold Network

Integrating KAN into SSL-based models

Improving synthetic speech detection performance
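The core change, swapping an MLP for a KAN, can be sketched in plain Python. In this hedged sketch, every edge (i, j) carries its own learnable univariate function built from a SiLU base term plus a weighted sum of basis functions, and each output sums these functions over its inputs. Gaussian radial basis functions stand in for the B-splines of the standard KAN formulation to keep the code short. `KANLayer`, `kan_feed_forward`, the grid range, the toy dimensions, and the assumption that the replaced MLP is the Conformer feed-forward module (with its half-step residual) are all hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def silu(x):
    # Numerically stable SiLU: x * sigmoid(x).
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


class KANLayer:
    """Hypothetical minimal KAN layer: output_j = sum_i phi_{j,i}(x_i).

    Each edge function is phi(x) = w_base * silu(x) + sum_k c_k * rbf_k(x).
    Gaussian RBFs on a fixed grid replace the B-splines of the original KAN
    (an assumption made for brevity, not the paper's implementation).
    """

    def __init__(self, in_dim, out_dim, num_basis=8, grid_min=-2.0, grid_max=2.0, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        step = (grid_max - grid_min) / (num_basis - 1)
        self.grid = [grid_min + k * step for k in range(num_basis)]
        self.width = step
        # One base weight and one vector of basis coefficients per (j, i) edge.
        self.base = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.coef = [
            [[rng.gauss(0.0, 0.1) for _ in range(num_basis)] for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def _basis(self, x):
        return [math.exp(-(((x - g) / self.width) ** 2)) for g in self.grid]

    def forward(self, x):
        if len(x) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} inputs, got {len(x)}")
        # Univariate features of each input, shared by all output units.
        feats = [(silu(xi), self._basis(xi)) for xi in x]
        out = []
        for j in range(self.out_dim):
            total = 0.0
            for i, (s, b) in enumerate(feats):
                # Edge (i -> j): its own base weight and basis coefficients.
                total += self.base[j][i] * s + sum(c * bk for c, bk in zip(self.coef[j][i], b))
            out.append(total)
        return out


def kan_feed_forward(x, up, down):
    """Hypothetical KAN stand-in for a Conformer feed-forward MLP, keeping the
    Conformer-style half-step residual: x + 0.5 * FFN(x)."""
    h = down.forward(up.forward(x))
    return [xi + 0.5 * hi for xi, hi in zip(x, h)]


d_model, d_hidden = 4, 8  # toy sizes, not the paper's configuration
up = KANLayer(d_model, d_hidden, seed=1)
down = KANLayer(d_hidden, d_model, seed=2)
y = kan_feed_forward([0.5, -1.0, 0.0, 1.5], up, down)
print(len(y))  # 4
```

A practical model would express the same computation with batched tensor operations and learn the base weights and coefficients by backpropagation; the explicit loops here only make the per-edge structure visible.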

🔎 Similar Papers

Effective Integration of KAN for Keyword Spotting