Evaluating Identity Leakage in Speaker De-Identification Systems

📅 2025-08-19
🤖 AI Summary
This study evaluates how much speaker identity information survives speaker de-identification (SID). Although SID methods aim to conceal who is speaking, the evaluated systems still leak identity. The study introduces a benchmark that quantifies this residual leakage with three complementary measures: equal error rate (EER), cumulative match characteristic (CMC) hit rate, and embedding-space similarity measured via canonical correlation analysis (CCA) and Procrustes analysis. Every state-of-the-art system evaluated leaks identity information: the highest-performing system performs only slightly better than random guessing, while the lowest-performing system reaches a 45% hit rate within the top 50 candidates under CMC. The findings point to persistent privacy risks in current speaker de-identification technologies.

📝 Abstract
Speaker de-identification aims to conceal a speaker's identity while preserving intelligibility of the underlying speech. We introduce a benchmark that quantifies residual identity leakage with three complementary error rates: equal error rate (EER), cumulative match characteristic (CMC) hit rate, and embedding-space similarity measured via canonical correlation analysis (CCA) and Procrustes analysis. Evaluation results reveal that all state-of-the-art speaker de-identification systems leak identity information. The highest performing system in our evaluation performs only slightly better than random guessing, while the lowest performing system achieves a 45% hit rate within the top 50 candidates based on CMC. These findings highlight persistent privacy risks in current speaker de-identification technologies.
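
To make the equal error rate concrete, here is a minimal sketch of how an EER could be estimated from verification scores. It is not the paper's implementation: the function name `equal_error_rate`, the score lists, and the simple threshold sweep are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    genuine_scores: similarity scores for same-speaker trials (hypothetical input).
    impostor_scores: similarity scores for different-speaker trials (hypothetical input).
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: different-speaker trials scored at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In a de-identification setting, the scores would typically come from a speaker verifier comparing de-identified audio against enrolled speakers. An EER near 50% means the verifier is close to chance, which is the desirable outcome for privacy; an EER well below 50% indicates residual identity leakage.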
Problem

Research questions and friction points this paper is trying to address.

Evaluating identity leakage in speaker de-identification systems
Quantifying residual identity leakage through multiple error rates
Assessing privacy risks in state-of-the-art de-identification technologies
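
One of the error rates named above is the CMC hit rate within the top k candidates. The following is a minimal sketch of that computation, assuming a closed-set identification setup; the function name `cmc_hit_rate`, the score-matrix layout, and the tie handling are illustrative assumptions, not details taken from the paper.

```python
def cmc_hit_rate(score_matrix, true_indices, k):
    """Fraction of probes whose true identity ranks within the top k gallery entries.

    score_matrix: one row per probe, one similarity score per gallery speaker
                  (hypothetical layout).
    true_indices: gallery index of the correct speaker for each probe.
    k: rank cutoff, e.g. 50 for a top-50 hit rate.
    """
    hits = 0
    for scores, true_idx in zip(score_matrix, true_indices):
        # Rank is 1 plus the number of gallery entries scoring strictly higher
        # than the true speaker (ties are resolved in the true speaker's favor).
        rank = 1 + sum(s > scores[true_idx] for s in scores)
        if rank <= k:
            hits += 1
    return hits / len(true_indices)
```

With k set to 50, a return value of 0.45 would correspond to the 45% top-50 hit rate reported for the lowest performing system, meaning the true speaker appears among the 50 best-scoring candidates for 45% of the de-identified probes.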
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark with three complementary error rates
Measures embedding-space similarity via canonical correlation
Quantifies identity leakage in de-identification systems
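
This summary does not spell out how the embedding-space similarity is computed, so the following is only a sketch of the standard orthogonal Procrustes problem, under assumed notation: X holds speaker embeddings of the original utterances and Y the embeddings of the corresponding de-identified utterances, with rows paired and both matrices mean-centered. How the paper combines this with CCA is not stated here.

```latex
\begin{aligned}
R^\star &= \arg\min_{R^\top R = I} \lVert X R - Y \rVert_F^2, \\
X^\top Y &= U \Sigma V^\top \;\Rightarrow\; R^\star = U V^\top, \\
d(X, Y) &= \lVert X \rVert_F^2 + \lVert Y \rVert_F^2 - 2\,\operatorname{tr}(\Sigma).
\end{aligned}
```

Under these assumptions, a small residual d(X, Y) means the de-identified embedding space is close to a rotation of the original one, which would suggest that the geometric structure separating speakers has survived de-identification.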