Quantum Kernels for Audio Deepfake Detection Using Spectrogram Patch Features

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing audio deepfake detection methods, which often treat spectrograms as generic images and neglect their intrinsic time-frequency structure. The authors propose Q-Patch, a quantum feature map that combines a compact, time-frequency-aware four-dimensional acoustic descriptor of each local Mel-spectrogram patch with adjacency-aware entanglement. Each patch is encoded into a quantum state by a shallow four-qubit circuit of depth at most three, which keeps quantum kernel construction practical on near-term devices. Under a controlled, balanced evaluation protocol, Q-Patch achieves an AUROC of 0.87, compared with 0.82 for a classical RBF-SVM trained on the same patch-level features. In the induced kernel space, cross-class similarity is around 0.615 while within-class self-similarity is 1.00, which indicates a clear class structure.
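To make the idea of a four-dimensional patch descriptor concrete, here is a minimal sketch. The summary does not say which four statistics Q-Patch uses, so the choice below (mean energy, energy spread, and normalized frequency and time centroids) is purely an assumption for illustration. The function name `patch_descriptor` is hypothetical as well.

```python
import math


def patch_descriptor(patch):
    """Summarize one mel-spectrogram patch as four numbers.

    `patch` is a list of rows (mel bins), each a list of non-negative
    energies over time frames. The four statistics are an illustrative
    assumption, not the descriptor defined in the paper.
    """
    n_bins, n_frames = len(patch), len(patch[0])
    values = [v for row in patch for v in row]
    total = sum(values)
    mean = total / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    if total == 0:
        # Silent patch: place both centroids at the patch center.
        return [mean, std, 0.5, 0.5]

    # Energy-weighted position along the frequency axis, scaled to [0, 1].
    freq_centroid = sum(b * sum(row) for b, row in enumerate(patch)) / total
    freq_centroid /= max(n_bins - 1, 1)

    # Energy-weighted position along the time axis, scaled to [0, 1].
    time_centroid = sum(
        t * patch[b][t] for b in range(n_bins) for t in range(n_frames)
    ) / total
    time_centroid /= max(n_frames - 1, 1)

    return [mean, std, freq_centroid, time_centroid]
```

The two centroids record where in time and frequency the energy sits inside the patch. A generic image feature would not necessarily keep that information, which is the kind of structure the summary says existing methods neglect.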

📝 Abstract

Quantum machine learning has emerged as a promising tool for pattern recognition, yet many audio-focused approaches still treat spectrograms as generic images and do not explicitly exploit their time-frequency structure. We propose Q-Patch, a quantum feature map tailored to audio that encodes local time-frequency patches from mel-spectrograms into quantum states using shallow, hardware-efficient circuits with adjacency-aware entanglement. Each selected patch is summarized by a compact four-dimensional acoustic descriptor and mapped to a four-qubit circuit with depth at most three, enabling practical quantum kernel construction under near-term constraints. We evaluate Q-Patch on an audio spoofing detection task using a controlled, balanced protocol and compare it with size-matched classical baselines. Q-Patch improves discrimination between bona fide and spoofed samples, achieving an area under the receiver operating characteristic curve (AUROC) of 0.87, compared with 0.82 for a radial basis function support vector machine (RBF-SVM) trained on the same patch-level features. Kernel-space analysis further reveals a clear class structure, with cross-class similarity around 0.615 and within-class self-similarity of 1.00. Overall, Q-Patch provides a practical framework for incorporating time-frequency-aware representations into quantum kernel learning for audio authenticity assessment in low-resource settings.
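The kernel construction described in the abstract can be sketched with a small state-vector simulation. This is not the authors' circuit. The gate choice is an assumption: RY angle encoding, CNOTs on the neighbouring pairs (0,1) and (2,3), then a second RY encoding layer, which gives one possible four-qubit, depth-three layout. Feature values are used directly as rotation angles, and the kernel is taken to be the squared state overlap. All function names are hypothetical.

```python
import math

N_QUBITS = 4  # one qubit per descriptor dimension


def apply_ry(state, qubit, theta):
    """Apply RY(theta) to one qubit of a real-valued state vector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    out = state[:]
    bit = 1 << qubit
    for i in range(len(state)):
        if not i & bit:
            a0, a1 = state[i], state[i | bit]
            out[i] = c * a0 - s * a1
            out[i | bit] = s * a0 + c * a1
    return out


def apply_cnot(state, control, target):
    """Apply a CNOT by swapping amplitudes where the control bit is 1."""
    out = state[:]
    cbit, tbit = 1 << control, 1 << target
    for i in range(len(state)):
        if i & cbit and not i & tbit:
            out[i], out[i | tbit] = state[i | tbit], state[i]
    return out


def feature_state(x):
    """Encode a 4-dimensional descriptor x (angles in radians) as a state."""
    assert len(x) == N_QUBITS
    state = [0.0] * (2 ** N_QUBITS)
    state[0] = 1.0  # start in |0000>
    for q, angle in enumerate(x):       # depth 1: angle encoding
        state = apply_ry(state, q, angle)
    state = apply_cnot(state, 0, 1)     # depth 2: entangle neighbouring pairs
    state = apply_cnot(state, 2, 3)
    for q, angle in enumerate(x):       # depth 3: re-encode after entangling
        state = apply_ry(state, q, angle)
    return state


def quantum_kernel(x, y):
    """Fidelity kernel k(x, y) = |<phi(x)|phi(y)>|^2 (amplitudes are real)."""
    overlap = sum(a * b for a, b in zip(feature_state(x), feature_state(y)))
    return overlap ** 2
```

Because every feature state is normalized, `quantum_kernel(x, x)` equals 1 for any input, so a self-similarity of 1.00 is a property of any fidelity kernel of this form. The second encoding layer is a deliberate choice in this sketch: if fixed entangling gates came only after a single encoding layer, they would cancel in the overlap and the kernel would reduce to a product of per-feature terms. A Gram matrix built from this function could be passed to a classical SVM that accepts precomputed kernels.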

Problem

Research questions and friction points this paper is trying to address.

quantum kernels

audio deepfake detection

spectrogram

time-frequency structure

quantum machine learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

quantum kernel

time-frequency representation

spectrogram patch