🤖 AI Summary
Existing speaker de-identification evaluation focuses solely on identity re-identification risk, overlooking leakage of soft biometric attributes—such as channel type, age, dialect, gender, and speaking style—that compromise speaker privacy even when identity remains anonymous.
Method: We propose the Soft Biometric Leakage Score (SBLS) framework, the first to uniformly quantify the inferability of multiple non-unique attributes from anonymized speech under zero-shot attacks. SBLS integrates attribute inference with pre-trained classifiers, mutual information analysis, and cross-attribute subgroup robustness evaluation to expose security vulnerabilities undetectable by conventional distributional metrics.
Contribution/Results: Evaluated on five state-of-the-art de-identification systems using publicly available models, SBLS reveals significant soft biometric leakage across all systems—enabling high-confidence recovery of sensitive attributes via zero-shot inference alone. This work establishes a novel, rigorous paradigm for speech privacy assessment and provides a reproducible, open-source benchmark tool.
📝 Abstract
We use the term re-identification to refer to the process of recovering the original speaker's identity from anonymized speech outputs. Speaker de-identification systems aim to reduce the risk of re-identification, but most evaluations focus only on individual-level measures and overlook broader risks from soft biometric leakage. We introduce the Soft Biometric Leakage Score (SBLS), a unified method that quantifies resistance to zero-shot inference attacks on non-unique traits such as channel type, age range, dialect, sex of the speaker, or speaking style. SBLS integrates three elements: direct attribute inference using pre-trained classifiers, linkage detection via mutual information analysis, and subgroup robustness across intersecting attributes. Applying SBLS with publicly available classifiers, we show that all five evaluated de-identification systems exhibit significant vulnerabilities. Our results indicate that adversaries using only pre-trained models, without access to the original speech or system details, can still reliably recover soft biometric information from anonymized output, exposing fundamental weaknesses that standard distributional metrics fail to capture.
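The three SBLS elements can be illustrated with a minimal sketch. Note this is not the paper's actual scoring formula: the function names, the chance-level rescaling, the plug-in mutual information estimator, and the toy labels below are all assumptions made for illustration of how attribute-inference accuracy, linkage via mutual information, and worst-case subgroup accuracy might each be computed from classifier outputs on anonymized speech.

```python
# Illustrative sketch only: all names, weightings, and data are hypothetical,
# not the SBLS definition from the paper.
from collections import Counter
import math

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def attribute_leakage(true_attr, predicted_attr):
    """Accuracy of a pre-trained attribute classifier on anonymized speech,
    rescaled so majority-class chance maps to 0 and perfect recovery to 1."""
    n = len(true_attr)
    acc = sum(t == p for t, p in zip(true_attr, predicted_attr)) / n
    chance = max(Counter(true_attr).values()) / n  # majority-class baseline
    return max(0.0, (acc - chance) / (1.0 - chance)) if chance < 1.0 else 0.0

def subgroup_min_accuracy(true_attr, predicted_attr, groups):
    """Worst-case accuracy over intersecting subgroups (robustness check)."""
    accs = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        accs.append(sum(true_attr[i] == predicted_attr[i] for i in idx) / len(idx))
    return min(accs)

# Toy example: sex labels a classifier infers from anonymized utterances,
# with utterances grouped into two intersecting (age range x dialect) subgroups.
true_sex = ["f", "f", "m", "m", "f", "m", "f", "m"]
pred_sex = ["f", "f", "m", "m", "f", "m", "m", "m"]  # classifier output
groups   = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(round(attribute_leakage(true_sex, pred_sex), 3))      # 0.75
print(round(mutual_information(true_sex, pred_sex), 3))      # 0.549
print(round(subgroup_min_accuracy(true_sex, pred_sex, groups), 3))  # 0.75
```

In this toy run the classifier recovers sex well above chance (leakage 0.75), predicted and true labels share about 0.55 bits of mutual information, and the weakest subgroup still sees 75% recovery, so a system could look private on aggregate metrics while leaking heavily for specific subgroups.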