Time-Domain Voice Identity Morphing (TD-VIM): A Signal-Level Approach to Morphing Attacks on Speaker Verification Systems

📅 2026-04-07
🤖 AI Summary
This study addresses the scarcity of research on identity-morphing attacks in voice biometrics; prior work has focused mainly on image-domain modalities such as the face, fingerprints, and iris. It proposes Time-domain Voice Identity Morphing (TD-VIM), a signal-level attack that blends the voice characteristics of two speakers into a single morphed sample capable of matching both identities. The attack is evaluated on the Multilingual Audio-Visual Smartphone database against two deep-learning-based speaker verification systems and one commercial system, Verispeak. Measured with the Generalized Morphing Attack Potential (G-MAP) metric, attack success reaches 99.40% on iPhone-11 and 99.74% on Samsung S8 under text-dependent conditions at a false match rate (FMR) of 0.1%. These results indicate that current voice authentication systems are highly vulnerable to morphed voice samples.
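To make the phrase "at a false match rate (FMR) of 0.1%" concrete, the sketch below shows one common way to turn an FMR operating point into a score threshold and then count a morph as successful only when it matches both contributing speakers. It is a simplified illustration, not the paper's G-MAP definition; the function names, the assumption that higher scores mean a better match, and the toy scores are all hypothetical.

```python
import numpy as np

def threshold_at_fmr(impostor_scores, fmr=0.001):
    """Score threshold above which roughly `fmr` of impostor scores fall.

    Assumption: higher score = more similar, so the threshold is the
    (1 - fmr) quantile of the impostor score distribution.
    """
    scores = np.asarray(impostor_scores, dtype=np.float64)
    return float(np.quantile(scores, 1.0 - fmr))

def dual_match_rate(scores_vs_a, scores_vs_b, threshold):
    """Fraction of morphs accepted against BOTH contributing speakers.

    Simplified stand-in for a morphing attack metric (hypothetical helper,
    not the G-MAP formula used in the paper).
    """
    a = np.asarray(scores_vs_a, dtype=np.float64)
    b = np.asarray(scores_vs_b, dtype=np.float64)
    return float(np.mean((a > threshold) & (b > threshold)))

# Toy data: 1001 evenly spaced impostor scores in [0, 1].
toy_impostors = np.linspace(0.0, 1.0, 1001)
toy_threshold = threshold_at_fmr(toy_impostors, fmr=0.001)

# Four toy morphs scored against speaker A and speaker B.
toy_rate = dual_match_rate(
    [0.9995, 0.9999, 0.5000, 0.9999],
    [0.9999, 0.2000, 0.9999, 0.9999],
    toy_threshold,
)
print(toy_threshold, toy_rate)
```

Requiring a match against both speakers is what distinguishes a morphing attack from ordinary impersonation: a sample that fools the system for only one of the two identities does not count as a successful morph in this sketch.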
📝 Abstract
In biometric systems, it is common practice to associate each sample or template with a specific individual. Nevertheless, recent studies have demonstrated the feasibility of generating "morphed" biometric samples capable of matching multiple identities. These morph attacks have been recognized as potential security risks for biometric systems. However, most research on morph attacks has focused on biometric modalities that operate within the image domain, such as the face, fingerprints, and iris. In this work, we introduce Time-domain Voice Identity Morphing (TD-VIM), a novel approach for voice-based biometric morphing. This method enables the blending of voice characteristics from two distinct identities at the signal level, creating morphed samples that pose a high risk to speaker verification systems. Leveraging the Multilingual Audio-Visual Smartphone database, our study created four distinct morphed signals based on morphing factors and evaluated their effectiveness using a comprehensive vulnerability analysis. To assess the security impact of TD-VIM, we benchmarked our approach using the Generalized Morphing Attack Potential (G-MAP) metric, measuring attack success across two deep-learning-based Speaker Verification Systems (SVS) and one commercial system, Verispeak. Our findings indicate that the morphed voice samples achieved a high attack success rate, with G-MAP values reaching 99.40% on iPhone-11 and 99.74% on Samsung S8 in text-dependent scenarios, at a false match rate of 0.1%.
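The abstract describes blending two voices at the signal level, controlled by a morphing factor, but does not give the exact procedure. As a rough illustration only, the sketch below assumes the simplest possible time-domain blend: a sample-wise weighted sum of two waveforms. The function name, the truncation to the shorter signal, the peak normalization, and the four factor values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def morph_time_domain(x_a, x_b, alpha):
    """Blend two mono waveforms sample by sample (hypothetical helper).

    alpha is the morphing factor: 1.0 returns speaker A only,
    0.0 returns speaker B only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumption: align by truncating both signals to the shorter length.
    n = min(len(x_a), len(x_b))
    a = np.asarray(x_a[:n], dtype=np.float64)
    b = np.asarray(x_b[:n], dtype=np.float64)
    morphed = alpha * a + (1.0 - alpha) * b
    # Assumption: rescale only if the blend would clip outside [-1, 1].
    peak = np.max(np.abs(morphed)) if n > 0 else 0.0
    if peak > 1.0:
        morphed = morphed / peak
    return morphed

# Synthetic stand-ins for two speakers' utterances (not real speech).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speaker_a = 0.5 * np.sin(2 * np.pi * 220 * t)                      # 1.0 s
speaker_b = 0.5 * np.sin(2 * np.pi * 330 * t[: sample_rate // 2])  # 0.5 s

# Four illustrative morphing factors; the paper's actual values are not stated here.
morphs = {alpha: morph_time_domain(speaker_a, speaker_b, alpha)
          for alpha in (0.2, 0.4, 0.6, 0.8)}
```

In a text-dependent scenario both speakers utter the same phrase, so a real pipeline would likely need some form of temporal alignment before blending; the truncation used here is only a placeholder for that step.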
Problem

Research questions and friction points this paper is trying to address.

Voice Identity Morphing
Speaker Verification
Morphing Attacks
Biometric Security
Time-Domain Signal
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time-Domain Voice Identity Morphing
Speaker Verification
Morphing Attack
Signal-Level Manipulation
G-MAP