🤖 AI Summary
This work addresses the poor interpretability and black-box decision-making prevalent in deepfake audio detection by proposing a multi-task transformer that jointly discriminates genuine from spoofed speech while predicting formant trajectories and voicing patterns over time. Building on a prior speaker-formant transformer, the model incorporates an intrinsic interpretability mechanism, an improved input segmentation strategy, and a redesigned decoding process, using attention visualization to reveal whether decisions rely more on voiced or unvoiced regions. The resulting model has a reduced parameter count and shorter training time than the baseline, offering a more interpretable and efficient solution for deepfake audio forensics without compromising detection accuracy.
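To make the multi-task setup concrete, here is a minimal sketch of one plausible architecture: a shared transformer encoder with frame-level heads for formant trajectories and voicing, plus an utterance-level real/fake classifier. This is an illustrative assumption, not the authors' implementation; all names, layer sizes, and head designs are hypothetical.

```python
# Hypothetical sketch of the multi-task architecture described above;
# dimensions and head designs are assumptions, not the paper's actual model.
import torch
import torch.nn as nn

class MultiTaskDeepfakeDetector(nn.Module):
    def __init__(self, n_feats=80, d_model=256, n_heads=4, n_layers=4, n_formants=3):
        super().__init__()
        self.embed = nn.Linear(n_feats, d_model)  # project acoustic frames
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Frame-level auxiliary heads: formant trajectories and voicing.
        self.formant_head = nn.Linear(d_model, n_formants)  # formant values per frame
        self.voicing_head = nn.Linear(d_model, 1)           # voiced probability per frame
        # Utterance-level head: real vs. fake, from mean-pooled frame states.
        self.classifier = nn.Linear(d_model, 2)

    def forward(self, x):                          # x: (batch, frames, n_feats)
        h = self.encoder(self.embed(x))            # (batch, frames, d_model)
        formants = self.formant_head(h)            # (batch, frames, n_formants)
        voicing = torch.sigmoid(self.voicing_head(h)).squeeze(-1)  # (batch, frames)
        logits = self.classifier(h.mean(dim=1))    # (batch, 2) real/fake scores
        return logits, formants, voicing

# Usage: logits, formants, voicing = MultiTaskDeepfakeDetector()(torch.randn(2, 200, 80))
```

The key design point is the shared encoder: the auxiliary formant and voicing objectives shape the same representations the real/fake classifier reads from, which is what ties the detection decision to phonetically meaningful quantities.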
📝 Abstract
In this work, we introduce a multi-task transformer for speech deepfake detection, capable of predicting formant trajectories and voicing patterns over time, ultimately classifying speech as real or fake, and highlighting whether its decisions rely more on voiced or unvoiced regions. Building on a prior speaker-formant transformer architecture, we streamline the model with an improved input segmentation strategy, redesign the decoding process, and integrate built-in explainability. Compared to the baseline, our model requires fewer parameters, trains faster, and provides better interpretability, without sacrificing prediction performance.
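The claim that the model can highlight whether decisions rely more on voiced or unvoiced regions suggests a simple attribution rule over attention weights. The sketch below, a hedged assumption rather than the paper's actual mechanism, shows one way to compute this given per-frame attention scores and a binary voicing mask (both hypothetical inputs).

```python
# Hypothetical voiced/unvoiced attribution; the paper's exact rule may differ.
import torch

def voiced_vs_unvoiced_attribution(attn: torch.Tensor, voiced: torch.Tensor):
    """Split normalized attention mass between voiced and unvoiced frames.

    attn:   (frames,) nonnegative attention scores for one utterance
    voiced: (frames,) binary mask, 1 where the frame is voiced
    """
    attn = attn / attn.sum()                        # normalize to a distribution
    voiced_mass = attn[voiced.bool()].sum().item()  # attention on voiced frames
    return voiced_mass, 1.0 - voiced_mass

# Example: a voiced_mass above 0.5 would indicate the decision
# leaned more on voiced regions than unvoiced ones.
```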