🤖 AI Summary
Existing speaker de-identification evaluation focuses solely on identity re-identification risk, overlooking leakage of soft biometric attributes—such as channel type, age, dialect, gender, and speaking style—that compromise speaker privacy even when identity remains anonymous.
Method: We propose the Soft Biometric Leakage Score (SBLS) framework, the first to uniformly quantify the inferability of multiple non-unique attributes from anonymized speech under zero-shot attacks. SBLS integrates attribute inference with pre-trained classifiers, mutual information analysis, and cross-attribute subgroup robustness evaluation to expose security vulnerabilities undetectable by conventional distributional metrics.
Contribution/Results: Evaluated on five state-of-the-art de-identification systems using publicly available models, SBLS reveals significant soft biometric leakage across all systems—enabling high-confidence recovery of sensitive attributes via zero-shot inference alone. This work establishes a novel, rigorous paradigm for speech privacy assessment and provides a reproducible, open-source benchmark tool.
📝 Abstract
We use the term re-identification to refer to the process of recovering the original speaker's identity from anonymized speech outputs. Speaker de-identification systems aim to reduce the risk of re-identification, but most evaluations focus only on individual-level measures and overlook broader risks from soft biometric leakage. We introduce the Soft Biometric Leakage Score (SBLS), a unified method that quantifies resistance to zero-shot inference attacks on non-unique traits such as channel type, age range, dialect, sex of the speaker, or speaking style. SBLS integrates three elements: direct attribute inference using pre-trained classifiers, linkage detection via mutual information analysis, and subgroup robustness across intersecting attributes. Applying SBLS with publicly available classifiers, we show that all five evaluated de-identification systems exhibit significant vulnerabilities. Our results indicate that adversaries using only pre-trained models, without access to the original speech or system details, can still reliably recover soft biometric information from anonymized output, exposing fundamental weaknesses that standard distributional metrics fail to capture.
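The three SBLS elements can be illustrated with a minimal sketch. Note this is not the paper's actual scoring formula: the function names, the chance-level rescaling, the plug-in mutual information estimator, and the toy labels below are all assumptions made for illustration of how attribute-inference accuracy, linkage via mutual information, and worst-case subgroup accuracy might each be computed from classifier outputs on anonymized speech.

```python
# Illustrative sketch only: all names, weightings, and data are hypothetical,
# not the SBLS definition from the paper.
from collections import Counter
import math

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def attribute_leakage(true_attr, predicted_attr):
    """Accuracy of a pre-trained attribute classifier on anonymized speech,
    rescaled so majority-class chance maps to 0 and perfect recovery to 1."""
    n = len(true_attr)
    acc = sum(t == p for t, p in zip(true_attr, predicted_attr)) / n
    chance = max(Counter(true_attr).values()) / n  # majority-class baseline
    return max(0.0, (acc - chance) / (1.0 - chance)) if chance < 1.0 else 0.0

def subgroup_min_accuracy(true_attr, predicted_attr, groups):
    """Worst-case accuracy over intersecting subgroups (robustness check)."""
    accs = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        accs.append(sum(true_attr[i] == predicted_attr[i] for i in idx) / len(idx))
    return min(accs)

# Toy example: sex labels a classifier infers from anonymized utterances,
# with utterances grouped into two intersecting (age range x dialect) subgroups.
true_sex = ["f", "f", "m", "m", "f", "m", "f", "m"]
pred_sex = ["f", "f", "m", "m", "f", "m", "m", "m"]  # classifier output
groups   = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(round(attribute_leakage(true_sex, pred_sex), 3))      # 0.75
print(round(mutual_information(true_sex, pred_sex), 3))      # 0.549
print(round(subgroup_min_accuracy(true_sex, pred_sex, groups), 3))  # 0.75
```

In this toy run the classifier recovers sex well above chance (leakage 0.75), predicted and true labels share about 0.55 bits of mutual information, and the weakest subgroup still sees 75% recovery, so a system could look private on aggregate metrics while leaking heavily for specific subgroups.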