Reliable Audio Deepfake Detection in Variable Conditions via Quantum-Kernel SVMs

📅 2025-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor generalization and high false alarm rates in audio deepfake detection under scarce labeled data and highly variable recording conditions, this paper proposes a quantum-kernel support vector machine (QK-SVM) method grounded in classically simulatable quantum feature maps. Without altering the feature extraction pipeline (e.g., Mel-spectrogram preprocessing) or introducing additional trainable parameters, the approach replaces the classical SVM kernel with a quantum-derived kernel, keeping the model lightweight while substantially improving cross-domain robustness. Evaluated on four major benchmarks—ASVspoof 2019 LA, ASVspoof 5 (2024), ADD23, and In-the-Wild—QK-SVM achieves consistently lower equal error rates (EER) than the classical SVM, with a maximum relative EER reduction of 56.9% on ADD23. Absolute false alarm rate reductions range from 0.053 to 0.116. This work provides the first empirical validation that quantum kernels enhance inter-class separability in audio deepfake detection.
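The summary reports EER, the operating point at which the false-positive rate equals the false-negative rate, and relative reductions derived from it. As a minimal sketch (not the paper's evaluation code), EER can be computed from raw scores by sweeping thresholds until the two error rates cross:

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal error rate: the operating point where the false-positive
    rate (bona fide flagged as spoof) equals the false-negative rate
    (spoof missed). labels: 1 = spoof, 0 = bona fide."""
    best_gap, best_eer = np.inf, None
    for t in np.sort(np.unique(scores)):
        preds = scores >= t                     # predict spoof above threshold
        fpr = np.mean(preds[labels == 0])       # false alarms on bona fide
        fnr = np.mean(~preds[labels == 1])      # misses on spoof
        if abs(fpr - fnr) < best_gap:
            best_gap, best_eer = abs(fpr - fnr), (fpr + fnr) / 2
    return best_eer

# Perfectly separated toy scores give an EER of 0.
scores = np.array([0.1, 0.2, 0.8, 0.9])
labels = np.array([0, 0, 1, 1])
print(compute_eer(scores, labels))  # -> 0.0

# The summary's 56.9% figure is the relative drop on ADD23
# (abstract: 0.188 classical vs. 0.081 quantum kernel).
print(round((0.188 - 0.081) / 0.188, 3))  # -> 0.569
```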

📝 Abstract
Detecting synthetic speech is challenging when labeled data are scarce and recording conditions vary. Existing end-to-end deep models often overfit or fail to generalize, and while kernel methods can remain competitive, their performance heavily depends on the chosen kernel. Here, we show that using a quantum kernel in audio deepfake detection reduces false-positive rates without increasing model size. Quantum feature maps embed data into high-dimensional Hilbert spaces, enabling the use of expressive similarity measures and compact classifiers. Building on this motivation, we compare quantum-kernel SVMs (QSVMs) with classical SVMs using identical mel-spectrogram preprocessing and stratified 5-fold cross-validation across four corpora (ASVspoof 2019 LA, ASVspoof 5 (2024), ADD23, and an In-the-Wild set). QSVMs achieve consistently lower equal-error rates (EER): 0.183 vs. 0.299 on ASVspoof 5 (2024), 0.081 vs. 0.188 on ADD23, 0.346 vs. 0.399 on ASVspoof 2019, and 0.355 vs. 0.413 In-the-Wild. At the EER operating point (where FPR equals FNR), these correspond to absolute false-positive-rate reductions of 0.116 (38.8%), 0.107 (56.9%), 0.053 (13.3%), and 0.058 (14.0%), respectively. We also report how consistent the results are across cross-validation folds and margin-based measures of class separation, using identical settings for both models. The only modification is the kernel; the features and SVM remain unchanged, no additional trainable parameters are introduced, and the quantum kernel is computed on a conventional computer.
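The abstract stresses that only the kernel changes and that it is computed classically. A minimal sketch of that pattern, assuming a simple product-state angle-encoding feature map (the paper's exact feature map is not specified here), plugs a fidelity kernel K(x, y) = |⟨φ(x)|φ(y)⟩|² into scikit-learn's precomputed-kernel SVM:

```python
import numpy as np
from sklearn.svm import SVC

def angle_encode(x):
    # Product-state feature map: one qubit per feature, RY(x_i)|0>,
    # so each qubit's statevector is [cos(x_i/2), sin(x_i/2)].
    return [np.array([np.cos(v / 2), np.sin(v / 2)]) for v in x]

def quantum_kernel(X, Y):
    # Fidelity kernel |<phi(x)|phi(y)>|^2; for a product state the
    # overlap factorizes over qubits, so it is cheap to simulate.
    K = np.zeros((len(X), len(Y)))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            overlap = 1.0
            for qx, qy in zip(angle_encode(x), angle_encode(y)):
                overlap *= qx @ qy
            K[i, j] = overlap ** 2
    return K

# Toy stand-in for mel-spectrogram feature vectors (4 features).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, np.pi, size=(40, 4))
y_train = (X_train.sum(axis=1) > np.median(X_train.sum(axis=1))).astype(int)

# Only the kernel changes: the SVM itself is a stock classical SVC.
clf = SVC(kernel="precomputed")
clf.fit(quantum_kernel(X_train, X_train), y_train)

X_test = rng.uniform(0, np.pi, size=(10, 4))
preds = clf.predict(quantum_kernel(X_test, X_train))
```

Note the kernel matrix has ones on the diagonal (a state has unit fidelity with itself), which is a quick sanity check when swapping in richer, possibly entangling feature maps.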
Problem

Research questions and friction points this paper is trying to address.

Detect synthetic speech with scarce labeled data
Reduce false positives without increasing model size
Generalize across variable recording conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum-kernel SVMs for audio deepfake detection
Quantum feature maps embed data into Hilbert spaces
Reduced false-positive rates without increasing model size
Lisan Al Amin
Health Resources and Services Administration (HRSA)
Health IT · Deepfake Detection · Quantum Computing
Vandana P. Janeja
University of Maryland, Baltimore County, Baltimore, Maryland, USA