🤖 AI Summary
Deepfake attacks generated by voice conversion and text-to-speech (TTS) synthesis pose serious threats to automatic speaker verification (ASV) systems. To address this, the authors propose an enhanced AASIST anti-spoofing architecture. The method keeps a frozen Wav2Vec 2.0 encoder to preserve robust self-supervised speech representations in limited-data settings, replaces the original graph attention module with a standardized multi-head attention block that uses heterogeneous query projections to improve feature discriminability, and substitutes the heuristic frame-segment fusion with a trainable, context-aware integration layer. Ablation studies validate the contribution of each component. Evaluated on the ASVspoof 5 corpus, the model achieves a 7.6% equal error rate (EER), outperforming a re-implemented AASIST baseline trained under the same conditions. The implementation is publicly available.
📝 Abstract
Advances in voice conversion and text-to-speech synthesis have made automatic speaker verification (ASV) systems more susceptible to spoofing attacks. This work explores modest refinements to the AASIST anti-spoofing architecture. It incorporates a frozen Wav2Vec 2.0 encoder to retain self-supervised speech representations in limited-data settings, substitutes the original graph attention block with a standardized multi-head attention module using heterogeneous query projections, and replaces heuristic frame-segment fusion with a trainable, context-aware integration layer. When evaluated on the ASVspoof 5 corpus, the proposed system reaches a 7.6% equal error rate (EER), improving on a re-implemented AASIST baseline under the same training conditions. Ablation experiments suggest that each architectural change contributes to the overall performance, indicating that targeted adjustments to established models may help strengthen speech deepfake detection in practical scenarios. The code is publicly available at https://github.com/KORALLLL/AASIST_SCALING.
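The abstract does not spell out how the heterogeneous query projections differ from standard multi-head attention, so the following is only an illustrative sketch of one plausible reading: each head gets its own independent query projection while keys and values are shared across heads. All names and shapes here are hypothetical, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def heterogeneous_mha(X, Wq_heads, Wk, Wv, Wo):
    """Sketch of multi-head attention with heterogeneous queries:
    each head applies its own query projection (Wq_heads[h]), while
    the key/value projections (Wk, Wv) are shared by all heads."""
    dk = Wk.shape[1]
    K = X @ Wk                                       # (T, dk) shared keys
    V = X @ Wv                                       # (T, dk) shared values
    heads = []
    for Wq in Wq_heads:                              # distinct query map per head
        Q = X @ Wq                                   # (T, dk)
        A = softmax(Q @ K.T / np.sqrt(dk), axis=-1)  # (T, T) attention weights
        heads.append(A @ V)                          # (T, dk)
    return np.concatenate(heads, axis=-1) @ Wo       # (T, d) output projection

rng = np.random.default_rng(0)
T, d, H, dk = 6, 16, 4, 8
X = rng.standard_normal((T, d))
Wq_heads = [rng.standard_normal((d, dk)) * 0.1 for _ in range(H)]
Wk = rng.standard_normal((d, dk)) * 0.1
Wv = rng.standard_normal((d, dk)) * 0.1
Wo = rng.standard_normal((H * dk, d)) * 0.1
out = heterogeneous_mha(X, Wq_heads, Wk, Wv, Wo)
print(out.shape)  # (6, 16)
```

In this reading, sharing keys and values keeps the parameter overhead small while letting each head attend from a different query subspace; the paper's actual design may differ in detail.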
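The "trainable, context-aware integration layer" that replaces heuristic frame-segment fusion is likewise unspecified in the abstract. A minimal sketch, under the assumption that it is a learned sigmoid gate blending frame-level and segment-level embeddings (instead of a fixed max/mean rule); every name and shape here is an illustrative guess, not the paper's design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_frame_segment_fusion(frames, segment, Wg, bg):
    """Hypothetical trainable fusion: a per-dimension sigmoid gate,
    computed from each frame embedding concatenated with the segment
    embedding, convexly blends the two streams."""
    T, d = frames.shape
    seg = np.broadcast_to(segment, (T, d))       # tile segment vector per frame
    ctx = np.concatenate([frames, seg], axis=-1) # (T, 2d) fusion context
    g = sigmoid(ctx @ Wg + bg)                   # (T, d) gate in (0, 1)
    return g * frames + (1.0 - g) * seg          # (T, d) fused features

rng = np.random.default_rng(1)
frames = rng.standard_normal((5, 4))             # frame-level embeddings
segment = rng.standard_normal(4)                 # segment-level embedding
Wg = rng.standard_normal((8, 4)) * 0.1           # trainable gate weights
bg = np.zeros(4)
fused = gated_frame_segment_fusion(frames, segment, Wg, bg)
print(fused.shape)  # (5, 4)
```

Because the gate is trained end to end, the network can learn where frame detail versus segment context matters, which is the stated motivation for replacing the heuristic fusion.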