Unveiling Audio Deepfake Origins: A Deep Metric learning And Conformer Network Approach With Ensemble Fusion

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

career value

194K/year

🤖 AI Summary

To address the challenges of traceability and poor cross-domain generalization in audio deepfake detection, this paper proposes Real Emphasis & Fake Dispersion (REFD), the first end-to-end framework for generative system provenance. Methodologically: (i) it integrates Conformer architectures with N-pair metric learning to jointly optimize fine-grained discriminability and robustness; (ii) it introduces a spectrogram enhancement preprocessing module to strengthen perceptibility of synthetic artifacts; and (iii) it employs a score-embedding ensemble strategy to concurrently improve in-domain accuracy and out-of-domain generalization. Evaluated across multiple benchmarks, REFD achieves significant gains over state-of-the-art methods in source identification accuracy, while reducing Fréchet distance by 23.6%, thereby demonstrating superior precision, strong cross-domain generalization, and robustness.

Technology Category

Application Category

📝 Abstract

Audio deepfakes are acquiring an unprecedented level of realism with advanced AI. While current research focuses on discerning real speech from spoofed speech, tracing the source system is equally crucial. This work proposes a novel audio source tracing system combining deep metric multi-class N-pair loss with Real Emphasis and Fake Dispersion framework, a Conformer classification network, and ensemble score-embedding fusion. The N-pair loss improves discriminative ability, while Real Emphasis and Fake Dispersion enhance robustness by focusing on differentiating real and fake speech patterns. The Conformer network captures both global and local dependencies in the audio signal, crucial for source tracing. The proposed ensemble score-embedding fusion shows an optimal trade-off between in-domain and out-of-domain source tracing scenarios. We evaluate our method using Frechet Distance and standard metrics, demonstrating superior performance in source tracing over the baseline system.

Problem

Research questions and friction points this paper is trying to address.

Trace origin of audio deepfakes using advanced AI techniques

Differentiate real and fake speech patterns robustly

Optimize performance for in-domain and out-of-domain scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep metric multi-class N-pair loss framework

Conformer network for global-local dependencies

Ensemble score-embedding fusion optimization

🔎 Similar Papers

Audio Anti-Spoofing Detection: A Survey